4B's Learning Record

https://infinity4b.github.io/learning-record
Last 10 notes on 4B's Learning Record
Quartz -- quartz.jzhao.xyz

VLA Models
https://infinity4b.github.io/learning-record/vla/
Thu, 05 Feb 2026 01:28:49 GMT

Diffusion
https://infinity4b.github.io/learning-record/diffusion/
Thu, 05 Feb 2026 01:28:49 GMT

Hands-on
https://infinity4b.github.io/learning-record/hands-on/
Thu, 05 Feb 2026 01:28:49 GMT

Welcome to 4B's Learning Blog
https://infinity4b.github.io/learning-record/
This is where I keep my study notes. I try to bring what I learn together in one place so that it works as a knowledge base. Update log:
2025.9.25: Finished updating the earlier VLA reading notes.
2025.9.22: Finished the site settings, built the comment section, and deployed to GitHub Pages.
2025.9.19: Finished the initial environment setup.
Thu, 05 Feb 2026 01:28:49 GMT

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
https://infinity4b.github.io/learning-record/vla/Cosmos-Policy
Paper information
Paper: arxiv.org/pdf/2601.16163
Project page: research.nvidia.com/labs/dir/cosmos-policy/
Code: github.com/NVlabs/cosmos-policy
This is a work by the first author of OpenVLA and OpenVLA-OFT. I noticed it during the ICLR 2026 anonymous review period and did not expect it to come from such a well-known researcher. At first I assumed it was like other works that use a World Model to generate images as conditioning. After reading it, I found that its approach differs from all current work; it is a genuinely new idea. Research content: pretrained video generation models learn temporal causality, implicit physical laws, and motion patterns from millions of videos. For robots, these spatiotemporal priors...
Thu, 29 Jan 2026 00:00:00 GMT

Ctrl-World: A Controllable Generative World Model for Robot Manipulation
https://infinity4b.github.io/learning-record/vla/Ctrl-World
Paper information
Paper: arxiv.org/pdf/2510.10125
Project page: ctrl-world.github.io/
Code: github.com/Robert-gyj/Ctrl-World
This is work from Chelsea Finn's group. The first author was the second author of UP-VLA and has also published Video Prediction Policy. Research content: current VLAs perform relatively poorly in the open world, and they face two main challenges. Policy evaluation is hard: evaluating a generalist policy usually requires a large number of real-world rollouts, carefully repeated across tasks and environments. Policy improvement is hard: once a weakness is found, existing methods have almost no way to learn from the failure cases other than collecting more expert data. What current VLAs lack is a fast...
Wed, 29 Oct 2025 00:00:00 GMT

Lerobot Pi-0.5 Code Analysis
https://infinity4b.github.io/learning-record/hands-on/Lerobot-Pi-0.5-Hands-on
Preface. INFO: the current official open-source implementation of Pi-0.5 differs from the original paper. Before reading this post, I strongly recommend reading Lerobot Pi-0 Hands-on first. See github.com/Physical-Intelligence/openpi/issues/647. The official response was: "Jup that’s correct — we currently only support action decoding in openpi, but as @Qu3tzal mentioned this should already give you a capable policy!" The open-source π0.5 code does not...
Thu, 16 Oct 2025 00:00:00 GMT

How OpenVLA-OFT Turns a VLM into a VLA
https://infinity4b.github.io/learning-record/hands-on/OpenVLA-OFT-Hands-on
Preface. OpenVLA Hands-on already covered how OpenVLA modifies the VLM. We have also covered OpenVLA-OFT before. It explores the VLA paradigm in the wake of OpenVLA, so I won't go into its details here. The OpenVLA-OFT repository is a fork of the OpenVLA repository, and this post examines how OpenVLA-OFT modifies the VLM. Environment setup. Original repository: github.com/moojink/openvla-oft. OpenVLA had a separate model for each LIBERO task, whereas this time an all-in-one model is provided: huggingface.co/moojink/openvla-7b-oft-finetuned-libero-spatial...
Thu, 16 Oct 2025 00:00:00 GMT

Lerobot Pi-0 Code Analysis
https://infinity4b.github.io/learning-record/hands-on/Lerobot-Pi-0-Hands-on
Preface. github.com/huggingface/lerobot is a robotics codebase from Hugging Face that includes several simulation environments and the code for several models. The original π0 was implemented in JAX, and a PyTorch version was released recently. Lerobot refactored the Pi-0 code into its own version and has recently added support for π0.5 as well. Specifically, Lerobot's implementation of π0 keeps the original code largely unchanged and wraps it in the framework's Policy. Installing the environment is also simple: just follow the installation instructions. Code analysis. The code lives at github.com/huggingface/lerobot/tree/main/src/lerobot/policies/pi0. Within it...
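Sat, 11 Oct 2025 00:00:00 GMT

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
https://infinity4b.github.io/learning-record/vla/GeoVLA
Paper information
Paper: arxiv.org/pdf/2508.09071
Project page: linsun449.github.io/GeoVLA
From Tianjin University, this paper strengthens the 3D representations of VLAs. Research content: current VLA models rely on 2D visual input and ignore the rich geometric priors of the 3D physical world. 3D geometric priors can provide accurate depth cues and improve spatial understanding and robustness to viewpoint changes. The recent LLaVA-3D and SpatialVLA integrate 3D positional encodings into the VLM. However, this integration disrupts the alignment between the vision encoder and the LLM, so it requires large-scale 3D embodied instruction-tuning datasets. Beyond this data-intensive approach, another line of work focuses on injecting 3D information directly into the action expert. For example, PointVLA adopts a two-stage training scheme: first...
Mon, 29 Sep 2025 00:00:00 GMT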