Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

返回详情 VLA / Vision-Language-Action 每日论文卡