(Paper summary) π0: A Vision-Language-Action Flow Model for General Robot Control | Jaemin’s Arxiv


Key points

Architecture: pretrained PaliGemma vision-language model (3B parameters) + action expert (300M parameters)
- $q_t$: the robot's proprioceptive state, a vector of joint angles

Training: flow matching

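- $v_{\theta}$: the network, which predicts a vector field over actions
- $A^{\tau}_t$: noisy action at flow matching timestep $\tau$
- $u(A^{\tau}_t|A_t)$: denoising vector field, the regression target for $v_{\theta}$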
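
The three symbols above fit together in one regression: $v_{\theta}$ is evaluated at the noisy action $A^{\tau}_t$ and trained to match $u(A^{\tau}_t|A_t)$ under a squared error. The NumPy sketch below illustrates that idea under several assumptions that are not taken from this summary: a linear interpolation between Gaussian noise and the clean action, $\tau = 0$ as pure noise and $\tau = 1$ as the clean action (so the target here is `actions - noise`, whose sign may differ from the paper's convention), a uniform distribution over $\tau$, and plain Euler integration at inference. The names `make_training_pair`, `flow_matching_loss`, `sample_actions`, and `oracle` are hypothetical, and the toy `oracle` stands in for a trained network.

```python
import numpy as np


def make_training_pair(actions, rng):
    """Build (noisy action, tau, target velocity) for one flow matching step."""
    noise = rng.standard_normal(actions.shape)
    # Assumption: tau ~ U(0, 1), one value per batch element.
    tau_shape = (actions.shape[0],) + (1,) * (actions.ndim - 1)
    tau = rng.uniform(0.0, 1.0, size=tau_shape)
    # Assumption: linear path from noise (tau = 0) to clean action (tau = 1).
    noisy = tau * actions + (1.0 - tau) * noise
    # Time derivative of the path above; plays the role of u(A^tau | A).
    target = actions - noise
    return noisy, tau, target


def flow_matching_loss(v_theta, actions, obs, rng):
    """Mean squared error between the predicted and target vector fields."""
    noisy, tau, target = make_training_pair(actions, rng)
    pred = v_theta(noisy, obs, tau)
    return float(np.mean((pred - target) ** 2))


def sample_actions(v_theta, obs, shape, rng, num_steps=10):
    """Integrate the learned field from noise (tau = 0) toward tau = 1 with Euler steps."""
    x = rng.standard_normal(shape)
    delta = 1.0 / num_steps
    for k in range(num_steps):
        tau = k * delta
        x = x + delta * v_theta(x, obs, tau)
    return x


# Toy check: if every training action equals one fixed chunk, the ideal
# field has the closed form (A* - x) / (1 - tau), so no training is needed.
rng = np.random.default_rng(0)
target_action = rng.standard_normal((5, 3))  # hypothetical chunk: 5 steps, 3 joints


def oracle(x, obs, tau):
    return (target_action - x) / (1.0 - tau)


actions = np.broadcast_to(target_action, (4, 5, 3)).copy()
loss = flow_matching_loss(oracle, actions, None, rng)
sampled = sample_actions(oracle, None, (4, 5, 3), rng)
```

With the closed-form `oracle`, the loss is zero up to floating-point error and the sampled chunks land on `target_action`. In a real model, `v_theta` would also condition on the observation (images, language prompt, and $q_t$), which this toy ignores by passing `obs=None`.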

Experimental results