(Paper summary) FAST: Efficient Action Tokenization for Vision-Language-Action Models (Paper)

Key points

  • Problem: robot action tokenization based on simple per-dimension, per-timestep binning typically performs poorly when learning dexterous skills from high-frequency robot data.

  • Frequency-space Action Sequence Tokenization (FAST): a compression-based tokenization scheme for robot actions, based on the Discrete Cosine Transform (DCT).
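
To make the DCT idea concrete, here is a minimal sketch of frequency-space quantization for a single action dimension. It is not the authors' implementation: the function names, the quantization scale of 10, and the synthetic sine trajectory are all assumptions chosen for illustration, and the byte-pair-encoding compression that FAST applies to the quantized coefficients is omitted. The DCT is written out in pure Python so the example is self-contained; in practice a library routine such as scipy.fft.dct would be the usual choice.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(norm * s)
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def encode(chunk, scale=10.0):
    """Scale and round the DCT coefficients to integers (hypothetical scale)."""
    return [round(scale * c) for c in dct_ortho(chunk)]


def decode(tokens, scale=10.0):
    """Undo the scaling and invert the DCT to recover an approximate chunk."""
    return idct_ortho([q / scale for q in tokens])


# Synthetic data: a smooth 50-step trajectory for one action dimension.
chunk = [math.sin(2 * math.pi * t / 50) for t in range(50)]

tokens = encode(chunk)
recon = decode(tokens)

# Per-timestep binning would need one token per step (50 here). After the
# DCT, many coefficients of a smooth signal round to zero.
nonzero = sum(1 for q in tokens if q != 0)
max_err = max(abs(a - b) for a, b in zip(chunk, recon))
print(f"{nonzero} of {len(tokens)} coefficients are nonzero, max error {max_err:.3f}")
```

For smooth, high-frequency-sampled trajectories, most of the signal sits in a few low-frequency coefficients, so the quantized sequence is sparse and compresses well. With per-timestep binning, consecutive tokens are nearly identical and every one still has to be emitted.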

Experimental results