Introducing SLIM: Scalable Lightweight Interactive Models
Leveraging the Power of Lightweight Rendering Composites to Build Larger and More Detailed Worlds for Every Device
Roblox is home to millions of experiences, from simple mini games to vast open worlds full of high-fidelity 3D assets. Our goal is to support the increasingly dense and complex experiences that creators envision across a broad spectrum of user devices, which requires innovation in our engine, content delivery systems, and infrastructure. Scalable Lightweight Interactive Models, or SLIM, is one part of a multi-part development effort that enables creators to achieve their grand artistic visions without compromising performance.
SLIM allows creators to automatically create lightweight representations of any object in a Roblox experience, minimizing the number of draw calls, triangles, and data model instances required by the client to render a robust, seamless world. SLIM gives our streaming model powerful new ways to optimize content, allowing a user with a high-end gaming PC and a user with a low-end mobile device to share the same experience at the highest fidelity their device can handle.
| | Without instance streaming & SLIM | With instance streaming & SLIM |
| --- | --- | --- |
| Client data model instances | 159,745 | 92,536 |
| Triangles | 20M | 3.35M |
| Draw calls | 2,402 | 1,454 |
Two Pillars of Streaming: Instances and Assets
When you watch a movie on your favorite streaming service, your device doesn’t download the entire file before you start watching. It downloads just enough data for you to immediately start watching, then continuously downloads (or buffers) the next few seconds so there’s no interruption to the experience. If you’re streaming on a low-end device or slow connection, the platform automatically adapts by sending you a lower-fidelity version of the content while it downloads enough data to stream the high-fidelity content.
Roblox uses a similar concept to stream content on the fly, but the data required to represent a high-fidelity 3D simulation presents its own challenges. While a video has a linear timeline of content viewed from a single perspective, Roblox experiences contain vast, interactive 3D worlds full of many types of assets that can be viewed from many perspectives controlled by the user. Add in 151.5M daily active users playing, exploring, and ultimately streaming content from a wide variety of devices, and there is enormous potential for optimizing how content is delivered and displayed.1
Everything a user sees on Roblox—a car, a tree, an avatar, or a building—is represented as a number of instances within the engine. A usable car, for example, is broken down into instances for the bumpers, doors, wheels, and so on. Each instance is constructed in the experience using multiple assets, like 3D meshes, textures, animations, and audio.
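One way to picture the relationship between instances and assets is as a tree of instances, each referencing the assets it needs. The sketch below is purely illustrative; the class and field names are hypothetical and do not reflect the engine's actual data model.

```python
# Illustrative sketch of the instance/asset relationship described above.
# All names here are hypothetical, not the engine's real data model.

from dataclasses import dataclass, field

@dataclass
class Asset:
    kind: str  # "mesh", "texture", "animation", "audio"
    name: str

@dataclass
class Instance:
    name: str
    assets: list[Asset] = field(default_factory=list)
    children: list["Instance"] = field(default_factory=list)

# A usable car is a tree of instances, each built from several assets.
car = Instance("Car", children=[
    Instance("Door", assets=[Asset("mesh", "door.mesh"),
                             Asset("texture", "paint.png")]),
    Instance("Wheel", assets=[Asset("mesh", "wheel.mesh"),
                              Asset("texture", "rubber.png")]),
])

def count_instances(inst: Instance) -> int:
    """Count an instance and all of its descendants."""
    return 1 + sum(count_instances(c) for c in inst.children)

print(count_instances(car))  # 3
```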
Nearly all streaming on Roblox relies on two core technologies: instance streaming and asset streaming.
- Instance streaming determines which instances a user’s device needs to stream. There’s no need to stream parts of the experience that the user can’t yet see or interact with. In the above example, only the instances representing the nearby buildings are streamed into the client.
- Asset streaming determines the quality of streamed instances. There’s no need to download a high-resolution 4K texture for a mountain so far away that the user can’t discern fine detail. In the image above, buildings in the distance and buildings that take up a small amount of screen space use decimated meshes and lower-resolution textures.
The central brain for this operation is a system we call Harmony, which monitors each user’s available resources every frame. Harmony adjusts both instance and asset streaming to provide the best experience based on a device’s memory, GPU and CPU load, and network bandwidth. For a high-end gaming PC, Harmony cranks everything up to the highest quality. For a mobile device with a weak connection, it automatically scales down to keep the user experience smooth.
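The idea of scaling quality to the most constrained resource can be sketched as a simple per-frame controller. Everything below is an assumption for illustration only: the function name, thresholds, and normalization constants are invented, and Harmony's actual logic is far more sophisticated.

```python
# Hypothetical sketch of a per-frame quality controller in the spirit of
# Harmony. Names, thresholds, and constants are illustrative, not Roblox's
# actual implementation.

from dataclasses import dataclass

@dataclass
class DeviceSnapshot:
    free_memory_mb: float
    gpu_load: float        # 0.0 - 1.0
    cpu_load: float        # 0.0 - 1.0
    bandwidth_mbps: float

def pick_quality_tier(snap: DeviceSnapshot) -> str:
    """Map the most constrained resource to a coarse quality tier."""
    # Normalize each resource to a 0-1 "headroom" score.
    memory_headroom = min(snap.free_memory_mb / 2048.0, 1.0)
    compute_headroom = 1.0 - max(snap.gpu_load, snap.cpu_load)
    network_headroom = min(snap.bandwidth_mbps / 50.0, 1.0)
    # The tightest bottleneck dictates the tier.
    headroom = min(memory_headroom, compute_headroom, network_headroom)
    if headroom > 0.6:
        return "high"      # full instances, full-resolution assets
    if headroom > 0.3:
        return "medium"    # mesh LoDs and lower texture mips
    return "low"           # lean on lightweight SLIM representations

# A high-end PC with plenty of headroom lands in the "high" tier:
print(pick_quality_tier(DeviceSnapshot(8192, 0.2, 0.3, 100.0)))  # high
```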
The team discussed the tech behind streaming, cloud transcoding, and SLIM in episode 30 of the Tech Talks podcast.
SLIM: Scalable Lightweight Interactive Models
The core idea of SLIM is simple but powerful: SLIM can automatically create multiple lightweight, optimized representations of any object or model in a creator’s world and store them on the server to be fetched at runtime. Each user’s client can then dynamically switch between rendering the original instances and assets or one of the lightweight SLIM representations, based on the device’s available resources.
SLIM uses two main techniques to generate a lightweight representation:
1. Compositing
First, SLIM combines multiple parts into fewer parts. Instead of the car below requiring 112 separate meshes and 24 separate textures, its lightweight representation might require only one mesh and four textures. The compositing process is precisely tuned to match how the engine will finally render the content, eliminating invisible geometry within an object and reducing the number of draw calls necessary to render it.
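At its simplest, compositing means parts that share a material can be merged into a single mesh, so the draw-call count drops from one per part toward one per unique material. The sketch below is a toy illustration under that assumption; real compositing also welds vertices, culls hidden geometry, and accounts for how the engine batches work.

```python
# Toy sketch of compositing: merge parts by material so the draw-call
# count falls from one per part to one per unique material. Names and
# the data shape are hypothetical.

from collections import defaultdict

def composite(parts: list[dict]) -> list[dict]:
    """Group parts by material and merge their triangle counts."""
    by_material = defaultdict(list)
    for part in parts:
        by_material[part["material"]].append(part)
    merged = []
    for material, group in by_material.items():
        merged.append({
            "material": material,
            # A real pipeline would also weld vertices and cull geometry
            # hidden inside the composite, reducing triangles further.
            "triangles": sum(p["triangles"] for p in group),
        })
    return merged

car_parts = [
    {"material": "paint", "triangles": 4000},
    {"material": "paint", "triangles": 3500},
    {"material": "glass", "triangles": 800},
    {"material": "rubber", "triangles": 1200},
]
composited = composite(car_parts)
print(len(car_parts), "draw calls ->", len(composited))  # 4 draw calls -> 3
```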
2. Level of Detail (LoD)
After compositing the model, SLIM generates multiple versions at different levels of detail. This means creating versions of the 3D mesh with significantly fewer triangles and generating textures at much lower resolutions, as we do with any individual mesh or texture asset using traditional LoD techniques. These techniques can be further optimized when applied to SLIM models since we have the individual coordinate frames of each underlying instance. This gives us full context of how the creator intended each of these assets to be rendered together. With this knowledge, SLIM allows us to make more informed decisions about where to remove unnecessary detail and where to keep the details that users will notice.
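A minimal way to sketch an LoD chain is to roughly halve the triangle budget and texture resolution at each level. The ratios and the function below are assumptions for illustration; real mesh decimation and mip generation are far more involved and, as described above, are informed by the creator's intent.

```python
# Minimal sketch of building an LoD chain: each level roughly halves the
# triangle budget and the texture resolution. The halving ratio is a
# placeholder for real mesh-decimation and mip-generation algorithms.

def build_lod_chain(triangles: int, texture_res: int, levels: int = 4) -> list[dict]:
    chain = []
    for level in range(levels):
        chain.append({
            "level": level,
            "triangles": max(triangles >> level, 1),       # halve per level
            "texture_res": max(texture_res >> level, 16),  # e.g. 2048 -> 256
        })
    return chain

for lod in build_lod_chain(triangles=200_000, texture_res=2048):
    print(lod)
```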
The Right Representation at the Right Time
Once we’ve created multiple representations of an object, SLIM must decide which one to use for a specific user’s device or whether to use traditional rendering techniques instead. The system splits the world into three distinct regions. A simple way to think of these regions is to imagine concentric circles of detail extending outward from the player.
Streaming boundaries are not actually circular. Their shape depends on a variety of factors.
HH Region (Heavyweight Instances, Heavyweight Rendering)
In the HH region, full, heavyweight instances are streamed from the server to the client data model, and the client determines the specific asset representation to download and render for each instance. Scaling can still be achieved with mesh LoDs and texture mips in this region, but there’s no compositing. Before SLIM, this was how every instance streamed into an experience was rendered.
HL Region (Heavyweight Instances, Lightweight Rendering)
The HL region sits between the HH and LL regions. In this region, the client has the heavyweight instances in the data model but can choose to render using either the full render pipeline or the SLIM pipeline. This region adapts to ensure a seamless transition between the HH and LL regions even if the user encounters network latency. The transition point between the HH and HL regions is dynamic, which allows Harmony to scale up or down immediately in response to a resource spike in either direction.
LL Region (Lightweight Instances, Lightweight Rendering)
In the LL region, the client only streams super-lightweight representations of instances necessary to define a coordinate frame for a SLIM model, along with the bare minimum metadata. Only lightweight composited SLIM models are rendered in this region, rather than every single instance and asset. The LL region requires far fewer triangles and draw calls, and reduces memory usage on the user’s device compared with streaming in every heavyweight instance and using the traditional render pipeline.
This region technique allows the client to render the entire visible world at all times without incurring the full computational cost of using every heavyweight instance and asset at once. Faraway objects are drawn as highly optimized, lightweight representations, which are replaced by their high-fidelity counterparts as the user gets closer. SLIM’s ability to create composites and multiple scaled LoD models gives Harmony more levers to pull to optimize asset quality for each user’s device.
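The three-region split can be pictured as a per-object classification by distance to the player. The sketch below is illustrative only: the function name and the fixed radii are assumptions, and as noted above, real streaming boundaries are neither circular nor static.

```python
# Hypothetical region classifier. Real boundaries are not circular and the
# HH/HL transition point is adjusted dynamically by Harmony; the fixed
# radii here are purely illustrative.

def classify_region(distance: float, hh_radius: float, hl_radius: float) -> str:
    """Pick a rendering region for an object at `distance` from the player."""
    if distance <= hh_radius:
        return "HH"  # heavyweight instances, heavyweight rendering
    if distance <= hl_radius:
        return "HL"  # heavyweight instances, lightweight rendering allowed
    return "LL"      # lightweight SLIM instances and rendering only

for d in (50.0, 200.0, 500.0):
    print(d, "->", classify_region(d, hh_radius=100.0, hl_radius=300.0))
```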
When everything comes together, the player should feel completely immersed and not notice any of the transition points or varying levels of detail.
The Future: Where Does SLIM Go From Here?
SLIM is just step one of a multistep journey, and we’re excited to see how creators integrate the technology into their workflows. We’re exploring expanding SLIM in two main directions in the future.
Determining What Else Can Be SLIM-ed
We’re starting with the static models that creators designate in Studio, but in the future, SLIM will be able to optimize some of the most complex models on Roblox: platform avatars. Avatars, with all of their associated animations, layered clothing, and accessories, can be an unpredictable variable for creators. Allowing avatars to be SLIM-ed means the engine can effectively cap the resources an individual avatar model uses.
Eventually, we want to give creators the option to leverage SLIM for changes to dynamic models. Imagine a model where the server can actively make changes (e.g., a door opens or a part is destroyed), but with a few clever tricks, the client can reuse the same lightweight representation.
Optimizing the SLIM Pipeline
Now that we have an end-to-end pipeline that provides the engine with a new dimension of flexibility, we’re also focused on making the pipeline itself smarter, faster, and more efficient. This includes:
- Texture re-atlasing: Intelligently packing multiple model textures into a single, optimized texture sheet.
- Automatic segmentation: Using semantic and spatial understanding of the world to automatically identify the best SLIM-able models.
- Lighter-weight representations: For dynamic objects that are less latency-dependent, we’re exploring generating 2D representations that are virtually resource-free to render on the client.
- Hierarchical SLIM: Nesting SLIM models one inside another so that entire groups of instances can be simplified and the engine can dynamically select between levels of granularity—e.g., from a single tree to a forest to an entire landmass full of forests and other objects.
- Up-rezzing: Today, we’re focused on optimizing down for performance, but very soon, this same system will allow us to increase the resolution of assets for future hardware while maintaining the creator’s original artistic intent. This new architecture means that as our engine gets better at simulating reality, we can continually upgrade the representations it uses.
SLIM, in combination with Harmony and the rest of our streaming and content delivery architecture, is a massive leap forward in our vision to support more expansive and detailed worlds for more players. The tight integration of our engine, content delivery, and cloud infrastructure, coupled with the massive content base from millions of creators, allows us to build deeply interconnected systems that improve the whole experience.
We’re building a platform that not only respects creators’ artistic intent but can also intelligently and automatically deliver their creations to any user on a wide variety of devices, anywhere Roblox is available. We can’t wait to see what the community builds with it.
1As of Q3 2025.