Learning to Manipulate Tools by Aligning Simulation to Video Demonstration
K. Zorina
J. Carpentier
J. Sivic
V. Petrík
Learning tool manipulation from unconstrained instructional videos, shown here for learning the spade task policy for the Panda robot. The input video demonstration (A) is first processed to extract the 3D trajectories of the human and the manipulated tool (B). The tool trajectory extracted from the video is then used to learn the robot policy in a simulated environment (C). Finally, the learned policy is applied to the real robot (D).


Seamless integration of robots into human environments requires robots to learn how to use existing human tools. Current approaches for learning tool manipulation skills mostly rely on expert demonstrations provided in the target robot environment, for example, by manually guiding the robot manipulator or by teleoperation. In this work, we introduce an automated approach that replaces an expert demonstration with a YouTube video for learning a tool manipulation strategy. The main contributions are twofold. First, we design an alignment procedure that aligns the simulated environment with the real-world scene observed in the video. This is formulated as an optimization problem that finds a spatial alignment of the tool trajectory to maximize the sparse goal reward given by the environment. Second, we describe an imitation learning approach that focuses on the trajectory of the tool rather than the motion of the human. To this end, we combine reinforcement learning with an optimization procedure to find a control policy and the placement of the robot based on the tool motion in the aligned environment.
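To make the alignment idea concrete, the following is a minimal toy sketch, not the procedure from the paper. It assumes a 2D tool trajectory, a translation-only alignment, and a hypothetical sparse reward that is 1 only when the shifted trajectory visits a pickup location and later a goal location. All names (`shift`, `sparse_reward`, `align_by_grid_search`) and all numeric values are illustrative assumptions, and exhaustive grid search stands in for whatever optimizer is actually used.

```python
import math

def shift(trajectory, dx, dy):
    """Translate every 2D point of the trajectory by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in trajectory]

def sparse_reward(trajectory, pickup, goal, tol=0.05):
    """Hypothetical sparse reward: 1.0 if the tool first comes within
    `tol` of `pickup` and afterwards within `tol` of `goal`, else 0.0."""
    reached_pickup = False
    for point in trajectory:
        if not reached_pickup and math.dist(point, pickup) <= tol:
            reached_pickup = True
        elif reached_pickup and math.dist(point, goal) <= tol:
            return 1.0
    return 0.0

def align_by_grid_search(trajectory, pickup, goal, half_range=10, step=0.1):
    """Try every translation on a regular grid and keep the one whose
    shifted trajectory earns the highest sparse reward."""
    best_offset, best_reward = None, -1.0
    for i in range(-half_range, half_range + 1):
        for j in range(-half_range, half_range + 1):
            offset = (i * step, j * step)
            reward = sparse_reward(shift(trajectory, *offset), pickup, goal)
            if reward > best_reward:
                best_offset, best_reward = offset, reward
    return best_offset, best_reward

# Toy data: a tool path as seen in the video, and task locations in simulation.
video_trajectory = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.0)]
sim_pickup, sim_goal = (0.3, 0.1), (1.3, 0.1)

offset, reward = align_by_grid_search(video_trajectory, sim_pickup, sim_goal)
print("best offset:", offset, "reward:", reward)
```

Because the reward is sparse, it gives no gradient to follow, which is why this sketch falls back on exhaustive search. A realistic alignment would also have to consider rotation and scale, and would evaluate the reward by running the simulated environment rather than a distance check.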

Supplementary video

Paper and Supplementary Material

K. Zorina, J. Carpentier, J. Sivic, V. Petrík
Learning to Manipulate Tools by Aligning Simulation to Video Demonstration
Accepted to IEEE Robotics and Automation Letters (RA-L), 2021.
(hosted on arXiv)

    title={Learning to Manipulate Tools by Aligning Simulation to Video Demonstration},
    author={Kateryna Zorina and Justin Carpentier and Josef Sivic and Vladim{\'i}r Petr{\'i}k},
    booktitle={Accepted to IEEE Robotics and Automation Letters (RA-L)},


This work was funded by the European Regional Development Fund under the project IMPACT (reg. No. CZ.02.1.01/0.0/0.0/15_003/0000468), the Grant Agency of the Czech Technical University in Prague, grant No. SGS21/178/OHK3/3T/17, the French government under the management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute), and the Louis Vuitton ENS Chair on Artificial Intelligence.