CLONE is a whole-body teleoperation system that achieves comprehensive robot control using a VR headset. It enables skills that were previously out of reach, such as picking up an object from the ground and placing it in a distant bin. This facilitates the collection of long-horizon interaction data and establishes a foundation for more capable human-robot interaction in both research and practical applications.
CLONE employs a Mixture-of-Experts (MoE) policy with closed-loop error correction for holistic humanoid teleoperation, enabling capabilities previously unattainable with existing systems, such as whole-body coordination and long-horizon task execution.
Using only minimal input from a commercial MR headset, CLONE significantly improves tracking precision over existing open-loop approaches, opening new possibilities for practical humanoid deployment in unstructured environments.
All videos show real-time teleoperation at 1x speed using a unified policy.
Whole-Body Tracking: The robot tracks a variety of motions stably and precisely. Notably, it covers 15 meters while transitioning between poses during walking, then returns to the start position with minimal drift.
Long-Horizon Motion Tracking
Long-Horizon Tracking in Outdoor Environments
Circular Walking
Robust and Accurate Global Position Tracking
Outdoor tracking results show the closed-loop error correction adapts well to dynamic disturbances and changing conditions.
Robust and Accurate Global Position Tracking
Turning
Side-Stepping
Squatting and Walking
Upper-Body Motion Tracking
Interactive Tasks: The robot demonstrates smooth, precise interaction capabilities.
Playing Table Tennis
Teleoperating the humanoid to play table tennis using forehand strokes driven by waist movement.
Playing Table Tennis
Teleoperating the humanoid to play table tennis using backhand strokes.
Boxing
Tabletop Object Manipulation
Tabletop Object Handover
Long-Horizon Interactive Tasks: The humanoid performs precise, long-horizon interactions with closed-loop error correction.
Single-Handed Object Retrieval from the Ground
Dual-Handed Object Retrieval from the Ground
Dual-Handed Pick-and-Place
Dual-Handed General Pick-and-Place
Squatting and Wiping
Standing Wiping

Framework and structure of CLONE. CLONE curates and augments the retargeted AMASS dataset through motion editing to introduce diverse humanoid motions and detailed hand movements. We employ an MoE network as the student policy, distilling it from a teacher policy trained with privileged information. For real-world deployment, we integrate LiDAR odometry into the system to obtain real-time humanoid states, enabling closed-loop error correction.
We adopt an MoE framework that enables a unified policy to learn diverse motion skills while synthesizing lower-body motions coordinated with upper-body actions.
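To illustrate the MoE idea in general terms, the sketch below blends the outputs of several expert networks using softmax gating weights computed from the same observation. It is not CLONE's actual architecture: the single-layer linear experts, the dense (non-sparse) gating, the dimensions, and names such as `moe_forward` are all assumptions made for brevity.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_forward(gate_w, expert_ws, obs):
    """Return (action, gate_probs) for one observation.

    gate_w:    weights of a hypothetical linear gating network
    expert_ws: one weight matrix per (hypothetical) linear expert
    """
    gate_probs = softmax(linear(gate_w, obs))
    expert_outs = [linear(w, obs) for w in expert_ws]
    action_dim = len(expert_outs[0])
    # The action is the gate-weighted sum of every expert's output.
    action = [
        sum(p * out[i] for p, out in zip(gate_probs, expert_outs))
        for i in range(action_dim)
    ]
    return action, gate_probs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only; the real observation and action spaces are far larger.
OBS_DIM, ACT_DIM, NUM_EXPERTS = 6, 3, 4
rng = random.Random(0)
gate_w = random_matrix(NUM_EXPERTS, OBS_DIM, rng)
expert_ws = [random_matrix(ACT_DIM, OBS_DIM, rng) for _ in range(NUM_EXPERTS)]

obs = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]
action, gate_probs = moe_forward(gate_w, expert_ws, obs)
```

Because the gate depends on the observation, different motions (for example squatting versus walking) can lean on different experts while still sharing one policy.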
We incorporate LiDAR odometry and Apple Vision Pro tracking to provide closed-loop global pose feedback, enabling real-time drift correction during teleoperation.
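One way to picture the closed-loop feedback described above: compare the operator's target pose with the robot's measured pose, and express the difference in the robot's own frame so a policy can act on it. The sketch below assumes planar poses `(x, y, yaw)` in a shared world frame; how CLONE actually encodes this feedback is not specified here, and `global_pose_error` is a hypothetical helper.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def global_pose_error(target, measured):
    """Pose error of the target, expressed in the robot's frame.

    target:   (x, y, yaw) commanded by the operator (e.g. from the headset)
    measured: (x, y, yaw) estimated for the robot (e.g. from LiDAR odometry)
    Both are assumed to share one world frame.
    """
    tx, ty, tyaw = target
    mx, my, myaw = measured
    dx, dy = tx - mx, ty - my
    c, s = math.cos(myaw), math.sin(myaw)
    # Rotate the world-frame offset into the robot frame (transpose of R(yaw)).
    ex = c * dx + s * dy
    ey = -s * dx + c * dy
    return ex, ey, wrap_angle(tyaw - myaw)


# Robot at the origin facing +y; the target is 1 m further along +y,
# so the error is purely "forward" in the robot's frame.
error = global_pose_error((0.0, 1.0, math.pi / 2), (0.0, 0.0, math.pi / 2))
```

An open-loop controller never observes this error, so position drift accumulates over long trajectories; feeding the error back at every control step lets the policy correct it continuously.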
We curate a large-scale dataset, CLONED, by enhancing a subset of the AMASS dataset with sampled hand orientations and additional motion-capture data, ensuring robust generalization to dexterous and dynamic whole-body motions.
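As a rough illustration of augmenting motion clips with sampled hand orientations, the sketch below draws a uniformly random unit quaternion per hand and attaches it to every frame of a clip. The sampling distribution, the per-clip (rather than per-frame) sampling, the frame layout, and the key names `left_hand_quat` and `right_hand_quat` are assumptions for illustration, not details of how CLONED was built.

```python
import math
import random


def sample_unit_quaternion(rng):
    """Draw a uniformly distributed unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
        b * math.cos(2.0 * math.pi * u3),
    )


def augment_clip(clip, rng):
    """Return a copy of the clip with one sampled orientation per hand.

    clip: list of per-frame dicts (a hypothetical layout). The input
    frames are copied, not modified in place.
    """
    left = sample_unit_quaternion(rng)
    right = sample_unit_quaternion(rng)
    augmented = []
    for frame in clip:
        new_frame = dict(frame)
        new_frame["left_hand_quat"] = left
        new_frame["right_hand_quat"] = right
        augmented.append(new_frame)
    return augmented


rng = random.Random(42)
clip = [{"t": 0.0}, {"t": 0.02}]  # placeholder frames holding only a timestamp
augmented = augment_clip(clip, rng)
```

Holding the sampled orientation fixed across a clip keeps the augmented motion smooth; a fuller pipeline would likely interpolate between sampled orientations over time.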
We thank Le Ma (BIGAI) and Peiyuan Zhi (BIGAI) for their valuable assistance as teleoperators and for their advice on real-world deployment. We are also grateful to Weiqi Huang (BIT) for support with motion capture and to Jiaxin Li (BIT) for insightful suggestions on LiDAR odometry.
We gratefully acknowledge Unitree Robotics for their support with hardware.