(Paper Summary) π0: A Vision-Language-Action Flow Model for General Robot Control (Paper)

Key Points

  • Architecture: pretrained PaliGemma vision-language model (3B) + action expert (300M)
    • $q_t$: vector of joint angles

  • Training: flow matching

  • $v_{\theta}$: network that predicts the vector field
  • $A^{\tau}_t$: noisy action
  • $u(A^{\tau}_t|A_t)$: denoising vector field
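
As a rough illustration of how these three symbols fit together, the flow matching objective can be read as a squared error between the network output $v_{\theta}$ and the target field $u(A^{\tau}_t|A_t)$, evaluated at a noisy action $A^{\tau}_t$. The NumPy sketch below is a minimal toy version under stated assumptions, not the paper's implementation: the function names are hypothetical, the network is any callable `v_theta(a_tau, tau)` with the conditioning on images, language, and $q_t$ omitted, $\tau$ is drawn uniformly for simplicity, and the sign convention (noise at $\tau = 0$, clean action at $\tau = 1$, field pointing from noise to data) is chosen only so the example is self-consistent.

```python
import numpy as np

def noisy_action(actions, noise, tau):
    """A^tau: linear path between noise (tau=0) and the clean action (tau=1)."""
    return tau * actions + (1.0 - tau) * noise

def target_field(actions, noise):
    """u(A^tau | A): derivative of noisy_action w.r.t. tau (assumed sign convention)."""
    return actions - noise

def flow_matching_loss(v_theta, actions, rng):
    """Mean squared error between the predicted field and the target field."""
    noise = rng.standard_normal(actions.shape)
    tau = rng.uniform(0.0, 1.0)  # assumption: uniform tau, chosen for simplicity
    a_tau = noisy_action(actions, noise, tau)
    pred = v_theta(a_tau, tau)
    return float(np.mean((pred - target_field(actions, noise)) ** 2))

def sample_actions(v_theta, shape, rng, num_steps=10):
    """Generate actions by Euler-integrating the learned field from tau=0 to tau=1."""
    a = rng.standard_normal(shape)  # start from pure noise
    delta = 1.0 / num_steps
    for i in range(num_steps):
        a = a + delta * v_theta(a, i * delta)
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.array([[0.5, -1.0], [2.0, 0.0]])  # toy action chunk: 2 steps x 2 joints

    # Hypothetical stand-in for a trained network: the exact field for one known target.
    def oracle(a_tau, tau):
        return (clean - a_tau) / (1.0 - tau)

    print("loss:", flow_matching_loss(oracle, clean, rng))
    print("sampled:", sample_actions(oracle, clean.shape, rng))
```

In a real model the oracle would be replaced by the trained action expert, and the loss would be averaged over a batch of demonstrations rather than a single toy target.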

Experimental Results