(Paper Summary) OpenVLA: An Open-Source Vision-Language-Action Model (Paper)

Key Points

  • vision-language-action (VLA) model
  • 970k real-world robot demonstrations (Open X-Embodiment dataset)
  • learns new tasks via Parameter-Efficient Fine-Tuning (see the LoRA sketch after this list)
  • acts in small increments, one short motion at a time (demo videos are sped up)
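
The PEFT bullet above can be made concrete with a small LoRA sketch using HuggingFace `transformers` and `peft`; the checkpoint name, rank, and target modules here are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal LoRA fine-tuning sketch (HuggingFace transformers + peft).
# Checkpoint name, rank, and target modules are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed base; the summary notes a 7B Llama2 backbone
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=32,                                 # low-rank dimension (hypothetical value)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```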

  • architecture (see the fusion/projector sketch below this list)
    • 600M visual encoder (SigLIP and DinoV2, channel-wise concat)
    • 2-layer MLP projector
    • 7B Llama2
  • robot action
    • discretized action space: each action dimension is binned over the range between the 1st and 99th quantile of that variable in the training data (see the discretization sketch below this list)
  • data curation
    • Open X dataset: >70 individual robots, >2M robot trajectories
    • Open X-Embodiment dataset: sampled from Open X using the following criteria (a weighted-mixing sketch follows below this list)
      • coherent input and output space across all training data
      • balanced mix of embodiments, tasks, and scenes
  • Results
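
As referenced in the architecture bullet, here is a minimal PyTorch sketch of the channel-wise SigLIP/DinoV2 fusion followed by the 2-layer MLP projector into the LLM embedding space; the feature dimensions and GELU activation are assumptions for illustration, not the paper's exact values.

```python
import torch
import torch.nn as nn

class FusedVisualProjector(nn.Module):
    """Channel-wise concat of SigLIP and DinoV2 patch features, followed by
    a 2-layer MLP projector into the LLM embedding space. Dimensions and
    the GELU activation are illustrative assumptions."""

    def __init__(self, siglip_dim=1152, dino_dim=1024, llm_dim=4096):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(siglip_dim + dino_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, siglip_feats, dino_feats):
        # Both inputs: (batch, num_patches, dim); fuse along the channel axis.
        fused = torch.cat([siglip_feats, dino_feats], dim=-1)
        return self.projector(fused)  # (batch, num_patches, llm_dim)

# Usage with dummy patch features (batch=2, 256 patches per image):
proj = FusedVisualProjector()
visual_tokens = proj(torch.randn(2, 256, 1152), torch.randn(2, 256, 1024))
```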
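
The quantile-clipped action discretization from the robot-action bullet can be sketched as follows; the bin count of 256 and the helper names are assumptions, not confirmed by this summary.

```python
import numpy as np

def make_action_codec(train_actions, n_bins=256):
    """Per-dimension action discretizer. The range is clipped to the
    1st/99th quantile observed in training data; n_bins=256 is an
    assumption (bins are later mapped onto LLM token IDs)."""
    q01 = np.quantile(train_actions, 0.01, axis=0)
    q99 = np.quantile(train_actions, 0.99, axis=0)
    span = np.maximum(q99 - q01, 1e-8)  # guard against degenerate dimensions

    def discretize(action):
        norm = (np.clip(action, q01, q99) - q01) / span       # -> [0, 1]
        return np.minimum((norm * n_bins).astype(int), n_bins - 1)

    def undiscretize(bins):
        return (bins + 0.5) / n_bins * span + q01             # bin centers

    return discretize, undiscretize

# Example: 7-DoF actions drawn from dummy training data.
discretize, undiscretize = make_action_codec(np.random.randn(10_000, 7))
tokens = discretize(np.zeros(7))   # integer bins in [0, 255]
approx = undiscretize(tokens)      # continuous action reconstruction
```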
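
The "balanced mix" criterion in data curation suggests sampling trajectories by per-dataset mixture weights rather than raw dataset size; a minimal sketch, with hypothetical weights and dataset names:

```python
import random

# Hypothetical mixture weights: the curation re-weights Open X datasets
# toward a balanced mix of embodiments, tasks, and scenes, rather than
# sampling proportionally to raw dataset size.
MIXTURE_WEIGHTS = {"bridge": 0.4, "rt1": 0.3, "other": 0.3}

def sample_trajectory(datasets):
    """datasets: dict mapping dataset name -> list of trajectories."""
    names = list(MIXTURE_WEIGHTS)
    weights = [MIXTURE_WEIGHTS[n] for n in names]
    name = random.choices(names, weights=weights, k=1)[0]
    return random.choice(datasets[name])
```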