Upgrading Our Voice Safety Classifier With 22 New Languages and Sharper Detection Capabilities

22 New Languages, 2 New Violation Categories, and 14% Greater Recall

By Naren Koneru, Vice President, Engineering, Vesa Silvola, and Janne Pylkkonen

Published Jun 17, 2026

Roblox facilitates millions of minutes of voice data daily across 30 languages, representing a massive challenge in real-time safety at scale. Over the past two years, our internal systems have evolved significantly—growing from 94.6M to 320M parameters and expanding from five to eight policy violation categories—to now handle 10,000 requests per second at peak.

We open-sourced our underlying voice safety classifier model in 2024 to help advance voice safety across the industry. Today we’re releasing v3 of the model, which adds support for 22 new languages and two additional policy violation categories, with 14% greater recall and 5% greater precision than the previous version.

From V1 to V3 and Beyond

When we set out to build a system for real-time voice safety, we focused on English first. We built an automated machine-labeling pipeline to generate a high volume of training data. In 2024, v1 of the open-source model used 2,400 hours of machine-labeled English data for model training. After the initial launch and the rollout of notifications, abuse reports per hour of speech in the U.S. dropped by more than 50%.

In 2025, we added more languages, tuned the model further, and released v2. To train the latest v3 model in 2026, we used 250,000 hours of machine-labeled multilingual data and 29,000 hours of human-labeled multilingual data. Every model was evaluated using human-labeled datasets.

V3 of the open-source model achieves 61% recall at a 1% false positive rate, weighted by the language distribution of Roblox voice chat. Restricting the comparison to the languages supported by v2, v3 shows a 14% relative improvement in recall, again weighted by language prevalence.
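To make the metric concrete, here is a minimal Python sketch of how a language-weighted recall at a fixed false positive rate could be computed, along with the relative improvement between two versions. It is an illustration under stated assumptions, not our evaluation code: the function names, the choice of a separate threshold per language, and the toy scores and usage weights in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def threshold_at_fpr(negative_scores: List[float], max_fpr: float) -> float:
    """Return a threshold whose false positive rate is at most max_fpr.

    A clip counts as flagged when its score is strictly above the threshold.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Number of benign clips that may be flagged without exceeding max_fpr.
    allowed = int(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")
    # Only scores above the (allowed + 1)-th highest benign score are flagged,
    # so at most `allowed` benign clips become false positives.
    return ranked[allowed]


def recall_at_fpr(positive_scores: List[float],
                  negative_scores: List[float],
                  max_fpr: float) -> float:
    """Fraction of violating clips flagged at the threshold for max_fpr."""
    threshold = threshold_at_fpr(negative_scores, max_fpr)
    hits = sum(1 for score in positive_scores if score > threshold)
    return hits / len(positive_scores)


def weighted_recall(scores_by_language: Dict[str, Tuple[List[float], List[float]]],
                    usage_weights: Dict[str, float],
                    max_fpr: float = 0.01) -> float:
    """Average per-language recall, weighted by each language's usage share.

    Assumption: each language gets its own threshold. A single global
    threshold would be an equally plausible reading of the metric.
    """
    total = sum(usage_weights[lang] for lang in scores_by_language)
    result = 0.0
    for lang, (positives, negatives) in scores_by_language.items():
        share = usage_weights[lang] / total
        result += share * recall_at_fpr(positives, negatives, max_fpr)
    return result


def relative_improvement(new_value: float, old_value: float) -> float:
    """Relative change, e.g. 0.50 -> 0.57 is a 14% relative improvement."""
    return (new_value - old_value) / old_value


if __name__ == "__main__":
    # Toy scores invented for illustration only; not real evaluation data.
    benign = [i / 100 for i in range(100)]
    data = {
        "en": ([0.995, 0.985, 0.5, 0.2], benign),
        "es": ([0.999, 0.99, 0.985, 0.1], benign),
    }
    weights = {"en": 3.0, "es": 1.0}  # hypothetical usage shares
    print(weighted_recall(data, weights, max_fpr=0.01))
    print(relative_improvement(0.57, 0.50))
```

Weighting by usage means a gain in a widely spoken language moves the headline number more than the same gain in a rarely used one, which is why the v2-to-v3 comparison is restricted to the languages both versions support.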

Voice safety is too important to solve in isolation. We open-sourced our voice safety classifier and joined ROOST as a founding partner because we believe that sharing advances in safety technology strengthens the whole industry. The model has been downloaded more than 70,000 times on Hugging Face since the first release, and each update has been shaped by what we’ve learned running our internal models at scale across our community. We continue to iterate on our safety systems, and we look forward to sharing more updates in the future.

Acknowledgments: We’d like to thank Thomas Bui, Meghatrisa Chatterjee, Bridget Daly, Jason Golubock, Hannes Heikinheimo, Marek Kapolka, Cheryl Kwan, Markus Lang, Aashna Sharma, Hao-En Sung, Tingting Tang, and Alex Trimm for their work on this project.
