CLONE:
Holistic Closed-Loop Humanoid Whole-Body Teleoperation for Long-Horizon Tasks
Yixuan Li*,1,2, Yutang Lin*,3,4,5,6, Jieming Cui2,3,4,6, Tengyu Liu2,7,
Wei Liang†,1, Yixin Zhu†,3,4,6,8, Siyuan Huang†,2,7
(*: equal contribution, †: corresponding author)
1School of Computer Science and Technology, Beijing Institute of Technology
2Beijing Institute for General Artificial Intelligence (BIGAI)
3School of Psychological and Cognitive Sciences, Peking University
4Institute for Artificial Intelligence, Peking University
5Yuanpei College, Peking University
6Beijing Key Laboratory of Behavior and Mental Health, Peking University
7Joint Laboratory of Embodied AI and Humanoid Robots, BIGAI & UniTree Robotics
8Embodied Intelligence Lab, PKU-Wuhan Institute for Artificial Intelligence

CLONE is a whole-body teleoperation system that achieves comprehensive robot control using a VR headset. It enables previously unattainable skills, such as picking up an object from the ground and placing it in a distant bin. This facilitates the collection of long-horizon interaction data and establishes a foundation for more capable human-robot interaction in both research and practical applications.

Abstract.

CLONE employs a mixture-of-experts (MoE) policy with closed-loop error correction for holistic humanoid teleoperation, enabling capabilities previously unattainable with existing systems, such as whole-body coordination and long-horizon task execution.
Using only minimal input from a commercial MR headset, CLONE significantly improves tracking precision over existing open-loop approaches, opening new possibilities for practical humanoid deployment in unstructured environments.

Real-world Demo.

All videos show real-time teleoperation at 1x speed using a unified policy.

Whole-Body Tracking: The robot tracks a variety of motions stably and precisely. Notably, it covers 15 meters while transitioning between poses during walking, then returns to the start position with minimal drift.

Interactive Tasks: The robot demonstrates smooth, precise interaction capabilities.