CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

1ShanghaiTech University 2Transcengram 3DeepSeek AI 4University of Hong Kong
(* denotes equal contribution, † denotes the corresponding author)

Example of Command Sequence Representation


A simple example of the construction process of a CAD model using the command sequence representation.
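For concreteness, a sketch-and-extrude command sequence for a simple part (a rectangular plate with a circular hole) could be written roughly as below. The command names and parameter layout are a hypothetical Python illustration, not the exact serialization used in the paper.

# Hypothetical sketch-and-extrude command sequence for a rectangular
# plate with a circular hole; command names and parameters are
# illustrative only, not the paper's exact tokenization.
command_sequence = [
    {"cmd": "SOL"},                                    # start the outer sketch loop
    {"cmd": "LINE", "x": 40.0, "y": 0.0},              # four lines trace a rectangle
    {"cmd": "LINE", "x": 40.0, "y": 20.0},
    {"cmd": "LINE", "x": 0.0, "y": 20.0},
    {"cmd": "LINE", "x": 0.0, "y": 0.0},
    {"cmd": "SOL"},                                    # second loop: the hole
    {"cmd": "CIRCLE", "x": 20.0, "y": 10.0, "r": 4.0},
    {"cmd": "EXTRUDE", "distance": 5.0,                # lift the sketch into a solid
     "operation": "new_body"},
    {"cmd": "EOS"},                                    # end of sequence
]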

Network Architecture


We propose a network capable of processing up to three modalities of input data simultaneously. Each non-text input is first passed through a frozen encoder, followed by a projection layer that aligns its features with the shared feature space of a large language model (LLM). By combining the prompt with the multimodal embeddings and fine-tuning the LLM with Low-Rank Adaptation (LoRA), our model generates accurate CAD models conditioned on the combined input data.
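A minimal PyTorch-style sketch of this pipeline is given below. The class and parameter names, feature dimensions, and the assumption that the LLM is already wrapped with LoRA adapters (e.g., via a library such as peft) are placeholders for illustration, not the released implementation.

import torch
import torch.nn as nn

class CADMLLMSketch(nn.Module):
    """Illustrative pipeline: frozen per-modality encoders, trainable
    projection layers into the LLM embedding space, and a causal LLM
    whose weights are adapted with LoRA (assumed already wrapped,
    e.g. with peft). All sizes and names are placeholders."""

    def __init__(self, lora_llm, encoders, encoder_dims, llm_dim=4096):
        super().__init__()
        self.llm = lora_llm                          # only its LoRA adapters are trainable
        self.encoders = nn.ModuleDict(encoders)
        for enc in self.encoders.values():           # keep modality encoders frozen
            for p in enc.parameters():
                p.requires_grad = False
        # one trainable projector per modality: encoder features -> LLM token space
        self.projectors = nn.ModuleDict({
            name: nn.Linear(encoder_dims[name], llm_dim) for name in encoders
        })

    def forward(self, prompt_embeds, modal_inputs):
        # encode each provided modality (image, point cloud, ...) and project it
        tokens = []
        for name, x in modal_inputs.items():
            with torch.no_grad():
                feats = self.encoders[name](x)       # (B, T_m, encoder_dims[name])
            tokens.append(self.projectors[name](feats))
        # prepend the projected multimodal tokens to the text prompt embeddings
        embeds = torch.cat(tokens + [prompt_embeds], dim=1)
        # the LoRA-tuned LLM autoregressively decodes the CAD command sequence
        return self.llm(inputs_embeds=embeds)

Under this setup only the projection layers and the LoRA adapters receive gradients, which keeps fine-tuning lightweight while aligning each modality with the LLM's feature space.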

Our Dataset (Omni-CAD)


Qualitative comparison: for visualization, we exclude CAD models whose IDs already appear in the DeepCAD dataset. The extended portion of our dataset contains more complex and realistic models with finer details.


Examples of the conditioning multimodal data and the corresponding ground-truth CAD models.
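As a rough illustration of how such a pairing could be organized, a single training record might bundle the conditioning modalities with the ground-truth command sequence as below; the field names and file formats are hypothetical, not the dataset's actual schema.

# Hypothetical layout of one training record pairing the conditioning
# modalities with the ground-truth CAD model; all fields are illustrative.
record = {
    "uid": "00001234",                        # placeholder identifier
    "text": "A rectangular plate with a centered circular hole.",
    "image": "renders/00001234_view0.png",    # rendered view of the model
    "point_cloud": "points/00001234.ply",     # points sampled from the surface
    "ground_truth": "commands/00001234.json"  # CAD command sequence to generate
}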

Point Conditioned Generation Results

(Please see our paper for more results under additional modality conditions.)

Qualitative comparison with point-to-B-rep reconstruction baselines. Blue lines denote dangling edges, which lead to non-manifold structures.


Our model demonstrates enhanced robustness to noise and to partial removal of the point cloud compared to the baseline.

BibTeX

@misc{xu2024CADMLLM,
  title={CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM},
  author={Jingwei Xu and Chenyu Wang and Zibo Zhao and Wen Liu and Yi Ma and Shenghua Gao},
  year={2024},
  eprint={2411.04954},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}