PhysPose: Refining 6D Object Poses with Physical Constraints

Martin Malenický

Martin Cífka

Médéric Fourmy

Louis Montaut

Justin Carpentier

Josef Sivic

Vladimir Petrik



Abstract

Accurate 6D object pose estimation from images is a key problem in object-centric scene understanding, enabling applications in robotics, augmented reality, and scene reconstruction. Despite recent advances, existing methods often produce physically inconsistent pose estimates, hindering their deployment in real-world scenarios. We introduce PhysPose, a novel approach that integrates physical reasoning into pose estimation through a postprocessing optimization enforcing non-penetration and gravitational constraints. By leveraging scene geometry, PhysPose refines pose estimates to ensure physical plausibility. Our approach achieves state-of-the-art accuracy on the YCB-Video dataset from the BOP benchmark and improves over the state-of-the-art pose estimation methods on the HOPE-Video dataset. Furthermore, we demonstrate its impact in robotics by significantly improving success rates in a challenging pick-and-place task, highlighting the importance of physical consistency in real-world applications.


The PhysPose pipeline processes an input image by first (I) detecting objects and then (II) independently estimating their poses. This initial estimation often results in a (III) physically inconsistent scene, where collisions are highlighted in blue. Our (V) physically consistent optimization leverages (IV) estimated scene geometry and (A) a pose cost, (B) a collision cost, and (C) a gravity cost to generate a (VI) physically consistent scene. (A) Pose cost attracts the optimized object poses towards the initial measurements (indicated by dotted outlines). This cost accounts for the varying accuracy of depth estimation using non-isometric covariance, represented by the ellipsoid. (B) Collision cost penalizes object penetration, visualized by red-shaded areas. (C) Gravity cost minimizes the scene's potential energy. The gradients of these costs are represented by arrows and their formulas are provided in the supplementary material.
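To make the interplay of the three costs concrete, the following is a minimal toy sketch and not the PhysPose implementation. Everything specific in it is an assumption made for illustration: objects are spheres, only their translations are optimized, the table is the plane z = 0, the camera looks straight down so the uncertain depth axis coincides with z, the covariance is diagonal, the collision term is a sum of squared penetration depths, and the weights `w_col` and `w_grav`, the step size, and the finite-difference gradient descent are arbitrary choices. The actual cost formulas are those given in the supplementary material and may differ.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def pose_cost(centers, measured, variances):
    """(A) Mahalanobis-style distance to the measurements (diagonal covariance, assumed)."""
    return 0.5 * sum(
        (c[k] - m[k]) ** 2 / variances[k]
        for c, m in zip(centers, measured)
        for k in range(3)
    )

def collision_cost(centers, radii):
    """(B) Sum of squared penetration depths: sphere-table and sphere-sphere (assumed form)."""
    cost = 0.0
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        cost += max(0.0, ri - ci[2]) ** 2  # the table is the plane z = 0
        for cj, rj in zip(centers[i + 1:], radii[i + 1:]):
            cost += max(0.0, ri + rj - math.dist(ci, cj)) ** 2
    return cost

def gravity_cost(centers, masses):
    """(C) Potential energy of the scene, with gravity acting along -z."""
    return G * sum(m * c[2] for c, m in zip(centers, masses))

def total_cost(centers, measured, variances, radii, masses, w_col=1e4, w_grav=10.0):
    """Weighted sum of the three costs; the weights are illustrative, not tuned."""
    return (pose_cost(centers, measured, variances)
            + w_col * collision_cost(centers, radii)
            + w_grav * gravity_cost(centers, masses))

def refine(measured, variances, radii, masses, steps=2000, lr=5e-6, eps=1e-7):
    """Plain gradient descent using central finite-difference gradients."""
    x = [list(c) for c in measured]  # start from the measured centers
    for _ in range(steps):
        grad = [[0.0] * 3 for _ in x]
        for i in range(len(x)):
            for k in range(3):
                old = x[i][k]
                x[i][k] = old + eps
                up = total_cost(x, measured, variances, radii, masses)
                x[i][k] = old - eps
                down = total_cost(x, measured, variances, radii, masses)
                x[i][k] = old
                grad[i][k] = (up - down) / (2 * eps)
        for i in range(len(x)):
            for k in range(3):
                x[i][k] -= lr * grad[i][k]
    return x

# Hypothetical scene: sphere 0 is stacked on sphere 1 but measured 2 cm too low
# (so they interpenetrate); sphere 2 is measured floating 3 cm above the table.
measured = [(0.0, 0.0, 0.13), (0.0, 0.0, 0.05), (0.3, 0.0, 0.08)]
variances = (0.005 ** 2, 0.005 ** 2, 0.03 ** 2)  # depth (z) is far less certain
radii = [0.05, 0.05, 0.05]
masses = [0.5, 0.5, 0.5]

refined = refine(measured, variances, radii, masses)
for before, after in zip(measured, refined):
    print(before, "->", [round(v, 4) for v in after])
```

In this toy scene the large variance along z lets the collision and gravity terms move the spheres mostly along the uncertain depth direction, while the well-constrained lateral coordinates stay close to their measurements. Because all three terms are soft penalties, small residual penetrations remain; how strongly they are suppressed depends on the assumed weights.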


Supplementary video


Acknowledgement

This work was partly supported by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: 90254), and by the European Union’s Horizon Europe projects AGIMUS (No. 101070165), euROBIN (No. 101070596), and ERC FRONTIER (No. 101097822).