(Paper summary) π0: A Vision-Language-Action Flow Model for General Robot Control (Paper)
Key points
- Architecture: pretrained PaliGemma vision-language model (3B) + action expert (300M)
- $q_t$: vector of joint angles
- Training: flow matching
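- $v_{\theta}$: the network, which predicts the denoising vector field
- $A^{\tau}_t$: noisy action at flow matching timestep $\tau$
- $u(A^{\tau}_t|A_t)$: target denoising vector field that $v_{\theta}$ is trained to match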
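
To make the flow matching symbols above concrete, here is a minimal NumPy sketch of how a noisy action, a target vector field, and the regression loss could fit together. It is an illustration under stated assumptions, not the paper's implementation: the linear interpolation path, the sign of the target field, the integration direction (noise at $\tau = 0$, clean action at $\tau = 1$), the 50 x 7 action shape, and all function names are choices made here for self-consistency and may differ from the paper's notation. Conditioning on the observation (images, language, $q_t$) is omitted, and an oracle function stands in for $v_{\theta}$.

```python
import numpy as np


def noisy_action(actions, noise, tau):
    """A^tau_t: linear interpolation between noise (tau=0) and clean actions (tau=1)."""
    return tau * actions + (1.0 - tau) * noise


def target_field(actions, noise):
    """u(A^tau_t | A_t): derivative of the interpolation path with respect to tau."""
    return actions - noise


def flow_matching_loss(v_pred, u_target):
    """Mean squared error between the predicted and target vector fields."""
    return float(np.mean((v_pred - u_target) ** 2))


def euler_sample(v_fn, noise, n_steps=10):
    """Integrate the vector field from tau=0 to tau=1 with forward Euler steps."""
    a = noise.copy()
    delta = 1.0 / n_steps
    for i in range(n_steps):
        tau = i * delta
        a = a + delta * v_fn(a, tau)
    return a


rng = np.random.default_rng(0)
actions = rng.normal(size=(50, 7))      # hypothetical shape: 50 timesteps x 7 joints
noise = rng.normal(size=actions.shape)  # epsilon ~ N(0, I)

tau = 0.3
a_tau = noisy_action(actions, noise, tau)
u = target_field(actions, noise)


def oracle(a, tau):
    """Stand-in for v_theta that returns the exact target field (assumption)."""
    return u


loss = flow_matching_loss(oracle(a_tau, tau), u)
sample = euler_sample(oracle, noise)
```

With the oracle in place of a trained network, the loss is zero and the Euler integration carries the initial noise back to the clean actions. A real $v_{\theta}$ would only approximate the target, so both results would hold only approximately.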