(Paper summary) FAST: Efficient Action Tokenization for Vision-Language-Action Models (Paper)
Key points
Motivation: robot action tokenization based on simple per-dimension, per-timestep binning schemes typically performs poorly when learning dexterous skills from high-frequency robot data.
Frequency-space Action Sequence Tokenization (FAST): a compression-based tokenization scheme for robot actions, based on the Discrete Cosine Transform (DCT).
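To make the compression idea concrete, here is a minimal sketch of DCT-based compression on a single action dimension. It is an illustration only, not the authors' implementation: the function names (`dct2`, `idct2`, `compress`, `decompress`), the scale-and-round quantization step, and the `scale` value are all assumptions introduced here, and any further steps in the actual FAST pipeline are omitted.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(norm * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def compress(actions, scale=10.0):
    """Scale and round DCT coefficients to integers.

    `scale` is a hypothetical knob: larger values keep more detail.
    """
    return [round(c * scale) for c in dct2(actions)]

def decompress(coeffs, scale=10.0):
    """Undo the scaling and apply the inverse DCT."""
    return idct2([q / scale for q in coeffs])

# A smooth action chunk: 50 timesteps of one action dimension.
chunk = [math.sin(2 * math.pi * t / 50) for t in range(50)]

quantized = compress(chunk)
nonzero = sum(1 for q in quantized if q != 0)
recon = decompress(quantized)
max_err = max(abs(a - b) for a, b in zip(chunk, recon))

print("timesteps:", len(chunk))
print("nonzero quantized coefficients:", nonzero)
print("max reconstruction error:", max_err)
```

Per-timestep binning would spend one token on each of the 50 timesteps, even though neighboring values in a smooth, high-frequency signal are nearly identical. After the DCT, most of the signal's energy sits in a few low-frequency coefficients, so the majority of quantized coefficients round to zero and the chunk can be described with far fewer informative values.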