CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Building Toward Functional Game-Ready Assets

Modern 3D generative models can generate beautiful, complex 3D objects from text prompts, but for a game developer, a monolithic 3D model isn’t useful. A car, for example, needs to be drivable. The wheels need to rotate separately, the doors need to open, and the headlights need to switch on. 

Currently, 3D artists have to manually chop up generated models and name the parts, a process that scales poorly. Our breakthrough is CubePart: the first generative AI framework for open-vocabulary, part-controllable 3D mesh generation. CubePart outputs an assembled set of distinct, functional, and accurately labeled meshes that match the developer's programming needs out of the box.

CubePart expands the concept of fixed schemas we introduced with 4D Generation to empower a creator to define the list of parts an object should be broken into. The set of meshes generated by CubePart drops straight into the game engine and can be controlled by animation, physics, and gameplay scripts without manual cleanup. We published our CubePart research on arXiv and updated our open source Cube repository to support part-controllable generation. Later this year, we’ll present our findings at SIGGRAPH.

Schema: The API Contract for Interactive 3D Assets

On Roblox, interactive behavior is implemented in scripts that operate on parts—specific, named children of an asset. Even similar assets may require completely different parts depending on the game or situation. A fixed taxonomy would limit creativity and functionality, so CubePart offers two inputs: 

  1. A global text prompt describing what the object looks like: e.g., "a jelly fish themed race car."
  2. A specific, open-ended list of required parts called a schema: e.g., "front left wheel", "front right wheel", "rear left wheel", "rear right wheel", "gun", "headlights", "exhaust pipe", "body".

The schema is the API contract between the asset and the gameplay code, and CubePart lets a creator generate assets that conform to the contract. This open-vocabulary control allows CubePart to capture the diversity of Roblox assets and experiences.
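
One way to read "API contract" is as a check that gameplay code could run before it touches an asset. The sketch below is illustrative only: `check_schema` and the list of part names are hypothetical stand-ins, not part of CubePart or the Roblox API.

```python
def check_schema(part_names, schema):
    """Compare an asset's part names against the required schema.

    Returns (missing, unexpected): schema entries with no matching part,
    and parts that the schema did not ask for.
    """
    present = set(part_names)
    required = set(schema)
    missing = sorted(required - present)
    unexpected = sorted(present - required)
    return missing, unexpected


schema = [
    "front left wheel", "front right wheel",
    "rear left wheel", "rear right wheel",
    "gun", "headlights", "exhaust pipe", "body",
]

# A hypothetical asset that lacks one wheel and carries one extra part.
generated_parts = [
    "front left wheel", "front right wheel", "rear left wheel",
    "gun", "headlights", "exhaust pipe", "body", "spoiler",
]

missing, unexpected = check_schema(generated_parts, schema)
print(missing)     # ['rear right wheel']
print(unexpected)  # ['spoiler']
```

An asset conforms to the contract when both lists come back empty, which is the condition a script that looks parts up by name depends on.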

Generation in Two Stages 

CubePart is a two-stage diffusion architecture built on a VecSet latent shape representation. 

In the illustrations below, the user provides two inputs:

  1. The global text prompt: "A tow truck characterized by cartoonish features."
  2. The schema: "cab", "chassis", "wheels", "roof beacon", "tow assembly".

Stage 1 is responsible for defining the foundational shape of the object (here, a tow truck characterized by cartoonish features). This stage generates a single latent for the whole object using an MMDiT architecture with the Qwen-VL text encoder, trained on roughly 4.7M mesh-text pairs. This is the data-hungry stage: Mapping open-vocabulary language onto 3D geometry is the hard part of generative 3D, and it requires a large, diverse corpus to do well. We additionally fine-tune Stage 1 to be schema-aware.

Stage 2 takes the Stage 1 latent and produces one part latent for each schema entry to reconstruct the object with parts. For our cartoonish tow truck example, Stage 2 generates a separate part latent for the cab, chassis, wheels, roof beacon, and tow assembly to reconstruct the final tow truck with distinct, functional parts. Part-labeled 3D data is far scarcer than mesh-text data. With Stage 1 absorbing the complex text-to-shape mapping from a larger corpus, Stage 2 only has to learn where the part boundaries go on an object that the model already understands. We see the ablation in the paper as direct evidence for this: Removing Stage 1 pre-training measurably degrades Stage 2's open-vocabulary generalization. In short, Stage 1 is what lets Stage 2 generalize. 
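
The data flow between the two stages can be sketched as follows. This is a toy under stated assumptions, not the CubePart implementation: `stage1`, `stage2`, `_toy_latent`, and the latent sizes are hypothetical placeholders, and the "latents" are seeded random numbers rather than diffusion outputs. The only point is the interface: one whole-object latent in, one part latent per schema entry out.

```python
import random

# Toy sizes chosen for readability; they are not CubePart's real dimensions.
LATENT_TOKENS = 4
LATENT_DIM = 3


def _toy_latent(seed_text):
    """Return a deterministic pseudo-random latent (a list of token vectors)."""
    rng = random.Random(seed_text)
    return [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(LATENT_TOKENS)]


def stage1(prompt, schema):
    """Stand-in for Stage 1: one latent for the whole object.

    The schema is passed in because Stage 1 is fine-tuned to be schema-aware.
    """
    return _toy_latent(prompt + "|" + ",".join(schema))


def stage2(object_latent, schema):
    """Stand-in for Stage 2: one part latent per schema entry,
    each conditioned on the whole-object latent."""
    return {name: _toy_latent(name + "|" + repr(object_latent)) for name in schema}


prompt = "A tow truck characterized by cartoonish features."
schema = ["cab", "chassis", "wheels", "roof beacon", "tow assembly"]

object_latent = stage1(prompt, schema)
part_latents = stage2(object_latent, schema)

# The output is keyed by the schema, so scripts can look parts up by name.
print(list(part_latents) == schema)  # True
```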

Another critical innovation in our architecture is how parts communicate. Part latents generated in isolation can overlap or sit out of place relative to one another. Our solution is to insert dedicated cross-part attention blocks rather than modify existing ones, with zero-initialized output projections so the new blocks start as no-ops and learn inter-part communication without disturbing the pre-trained pathway. The principle will be familiar to readers of ControlNet, applied here to 3D part decomposition. For our tow truck example, the cross-part attention blocks ensure that the cab and tow assembly are seamlessly integrated and positioned correctly relative to the chassis and wheels.
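
To make the zero-initialization idea concrete, here is a minimal pure-Python sketch of a residual attention block whose output projection starts at zero. It is an illustration under simplifying assumptions, not the block described in the paper: it uses a single head, omits the query/key/value projections, and models cross-part attention as attention over the concatenated tokens of all parts. The function names and toy values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_part_attention(tokens, w_out):
    """Residual attention block: tokens + attention(tokens) @ w_out.

    `tokens` holds the latent tokens of every part, so each token can
    attend to tokens that belong to other parts.
    """
    dim = len(tokens[0])
    transposed = [list(col) for col in zip(*tokens)]
    scores = matmul(tokens, transposed)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    mixed = matmul(weights, tokens)
    update = matmul(mixed, w_out)
    return [[t + u for t, u in zip(t_row, u_row)]
            for t_row, u_row in zip(tokens, update)]


# Toy tokens for two parts, concatenated into one sequence.
cab = [[1.0, 0.0], [0.5, 0.5]]
chassis = [[0.0, 2.0]]
tokens = cab + chassis

# With a zero output projection the block is a no-op.
zero_init = [[0.0, 0.0], [0.0, 0.0]]
out_at_init = cross_part_attention(tokens, zero_init)
print(out_at_init == tokens)  # True

# Once the projection has moved away from zero, parts influence each other.
learned = [[0.1, 0.0], [0.0, 0.1]]
out_learned = cross_part_attention(tokens, learned)
print(out_learned == tokens)  # False
```

Because the residual update is exactly zero at initialization, adding the block leaves the pre-trained model's outputs unchanged until training adjusts the projection.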

Our Dataset and VLM Pipeline

To train CubePart, we created a dataset featuring more than 460,000 assets (over 11 times larger than previous public datasets¹) and 2.02 million parts. Instead of manual labeling, we built an automated pipeline using vision-language models (VLMs).

The pipeline renders thousands of 3D models from multiple angles using a paired approach: one textured image (for semantic context) and one part-colored image (for precise boundary tracking). Both are stamped with identical numbered markers, giving the VLM a text-addressable handle to reason in 3D space and cluster and name each part.
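
The role of the numbered markers can be illustrated with a small sketch. Everything here is assumed for illustration: the marker table, the mesh component IDs, the shape of the VLM answer, and the helper `resolve_parts` are hypothetical, and the real pipeline's formats are not described in this post. The idea is only that a marker number the VLM can mention in text maps back to a concrete piece of geometry.

```python
# Marker number -> mesh component, known from the part-colored render.
# (Hypothetical IDs.)
marker_to_component = {1: "mesh_07", 2: "mesh_02", 3: "mesh_11", 4: "mesh_03"}

# Hypothetical VLM answer: markers clustered into groups, each with a name.
vlm_answer = {"cab": [1], "wheels": [2, 4], "roof beacon": [3]}


def resolve_parts(answer, marker_table):
    """Translate named marker groups into named groups of mesh components."""
    parts = {}
    for name, markers in answer.items():
        parts[name] = sorted({marker_table[m] for m in markers})
    return parts


labeled_parts = resolve_parts(vlm_answer, marker_to_component)
print(labeled_parts["wheels"])  # ['mesh_02', 'mesh_03']
```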

Unlike previously published datasets where every wheel on a vehicle is simply labeled "wheel," our dataset teaches the model spatial differentiation (e.g., distinguishing a "front left wheel" from a "rear right wheel"). This precision matters because gameplay scripts address each part by its specific name.

What CubePart Unlocks and What's Next

CubePart allows creators to generate assets that match their gameplay code and have direct compatibility with existing animation, physics, and scripting workflows. CubePart can also decompose existing artist meshes to a new schema, which is useful for upgrading legacy assets, not just generating new ones.

There's plenty still to do. CubePart handles rigid-body decomposition, but we’re also working on skinned vertex weights for organic character deformation. Cross-part attention dramatically reduces overlap but doesn't eliminate it. Spatial reasoning—"front-left" versus "rear-right"—still has significant room for improvement.

We see schema-driven generation as the step that makes generative 3D useful on a platform where every asset participates in a simulation. Soon, this technology will be available to Roblox creators directly inside Roblox Studio.

¹ Compared to PartVerseXL