We present a comprehensive suite of experiments demonstrating our Mixture-of-Experts policy on the Unitree Go2-Edu quadruped, using only proprioception and no external sensors.
The demos below are divided into Indoor, Outdoor, Stability, Comprehensive Track, and RoboGauge evaluation sections, and highlight the policy's reliability across diverse terrains and under external disturbances. We also provide an interactive web demo so you can try our model easily.
We have fully open-sourced go2_rl_gym (training) and RoboGauge (evaluation). All the demos shown below can be reproduced using our provided model in unitree_cpp_deploy (deployment).
In 2025, our model won first place in the inaugural Global Embodied AI Reinforcement Learning Locomotion Challenge.
RoboGauge is a tool for evaluating reinforcement learning locomotion policies. Its goal is to measure 6 metrics of the policy in Sim2Sim (IsaacGym to MuJoCo) so that we can partially predict Sim2Real performance, thereby reducing the risk of damaging real hardware. The evaluation includes 5 different terrains (flat, slope, stairs, obstacles, wave) and 10 difficulty levels for each terrain except flat.
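As a rough illustration of the size of this evaluation grid, the sketch below enumerates terrain and difficulty combinations. It is not RoboGauge's API: the function name `build_eval_grid`, the 1-to-10 level numbering, and the reading that flat terrain is evaluated once with no difficulty level are all assumptions made for this example.

```python
# Hypothetical sketch of the terrain/difficulty grid; not RoboGauge code.
TERRAINS = ["flat", "slope", "stairs", "obstacles", "wave"]
NUM_LEVELS = 10  # difficulty levels, assumed to be numbered 1..10


def build_eval_grid(terrains=TERRAINS, num_levels=NUM_LEVELS):
    """Return a list of (terrain, level) pairs to evaluate.

    Assumption: flat terrain has no difficulty level, so it appears once
    with level None; every other terrain appears once per level.
    """
    grid = []
    for terrain in terrains:
        if terrain == "flat":
            grid.append((terrain, None))
        else:
            grid.extend((terrain, level) for level in range(1, num_levels + 1))
    return grid


if __name__ == "__main__":
    grid = build_eval_grid()
    print(f"{len(grid)} terrain/level configurations")
```

Under these assumptions the grid contains 4 × 10 + 1 = 41 configurations, before any domain randomization or command variation is applied.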
Below, we show the model's tracking performance at 2 different levels with 1 m/s command velocity.
Our proposed framework integrates a Mixture-of-Experts architecture for terrain and command representation with the RoboGauge assessment suite to quantify sim-to-real transferability through sim-to-sim metrics. This closed-loop design enables reliable policy selection to facilitate robust deployment for agile locomotion across diverse challenging environments based solely on proprioception.
Reinforcement learning has shown strong promise for quadrupedal agile locomotion, even with proprioception-only sensing. In practice, however, the sim-to-real gap and reward overfitting in complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework encompassing a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation and RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization via proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests over terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
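To make the idea of a "gated set of specialist experts" concrete, here is a minimal NumPy sketch of generic dense softmax gating. It is not the architecture from the paper: the class name `TinyMoEPolicy`, the linear experts, the random weights, and the 45-dimensional observation are assumptions chosen for illustration. Only the 12-dimensional action (one target per Go2 joint) reflects the hardware.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyMoEPolicy:
    """Hypothetical mixture-of-experts policy with linear experts."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Gating network: maps an observation to one logit per expert.
        self.gate_w = rng.normal(scale=0.1, size=(obs_dim, num_experts))
        # One linear expert per slot: (num_experts, obs_dim, act_dim).
        self.expert_w = rng.normal(scale=0.1, size=(num_experts, obs_dim, act_dim))

    def gate(self, obs):
        """Return gating weights of shape (batch, num_experts), rows sum to 1."""
        return softmax(obs @ self.gate_w, axis=-1)

    def act(self, obs):
        """Blend expert outputs using the gating weights."""
        weights = self.gate(obs)  # (batch, num_experts)
        # Every expert proposes an action: (batch, num_experts, act_dim).
        expert_out = np.einsum("bo,eoa->bea", obs, self.expert_w)
        # Weighted sum over experts: (batch, act_dim).
        return np.einsum("be,bea->ba", weights, expert_out)


if __name__ == "__main__":
    policy = TinyMoEPolicy(obs_dim=45, act_dim=12, num_experts=4)
    obs = np.random.default_rng(1).normal(size=(3, 45))  # fake proprioception
    print(policy.gate(obs).shape, policy.act(obs).shape)
```

In this sketch every expert contributes to every action, weighted by the gate. A trained policy would learn both the gate and the experts from proprioceptive history, and the actual expert structure in go2_rl_gym may differ.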
Thanks to Tencent AI Arena and Unitree Robotics for providing the Go2 quadruped for the initial experiments.
Thanks to Guangsheng Li for providing the training code for the DreamWaQ baseline.
@article{wu2026robogauge,
title={Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion},
author={Tianyang Wu and Hanwei Guo and Yuhang Wang and Junshu Yang and Xinyang Sui and Jiayi Xie and Xingyu Chen and Zeyang Liu and Xuguang Lan},
year={2026},
journal={arXiv preprint arXiv:2602.00678},
url={https://arxiv.org/abs/2602.00678},
}