(Model Summary) Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Jaemin’s Arxiv

Key Ideas

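In a multimodal model, features in the later layers tend to separate by modality.
MoT builds on this by extracting features with modality-specific weights: the feed-forward networks, attention projection matrices, and layer normalization are decoupled per modality, while self-attention remains global over the full mixed-modality sequence.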
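To make the architecture concrete, here is a minimal NumPy sketch of one MoT block. It is not taken from the paper's code and rests on simplifying assumptions of our own: two modalities (0 = text, 1 = image), a single attention head, no causal mask, pre-norm residuals, a ReLU feed-forward network, and random initialization. The names `init_params`, `routed`, and `mot_block` are hypothetical. Each token is processed with the LayerNorm, Q/K/V/O, and FFN weights of its own modality, while the attention scores are computed over the whole sequence.

```python
import numpy as np

# Hypothetical sketch of one Mixture-of-Transformers (MoT) block.
# Assumptions (not from the paper's code): single head, no causal mask,
# pre-norm residuals, ReLU FFN, random init.

def layer_norm(x, gain, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return gain * (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def init_params(d, d_ff, rng):
    """One independent weight set for a single modality (hypothetical init)."""
    s = 1.0 / np.sqrt(d)
    return {
        "ln1": np.ones(d), "ln2": np.ones(d),
        "wq": rng.normal(0, s, (d, d)), "wk": rng.normal(0, s, (d, d)),
        "wv": rng.normal(0, s, (d, d)), "wo": rng.normal(0, s, (d, d)),
        "w1": rng.normal(0, s, (d, d_ff)),
        "w2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d)),
    }

def routed(x, modality, params, f, d_out):
    """Apply f(tokens, weights) using each token's own modality weights."""
    out = np.zeros((x.shape[0], d_out))
    for m, p in enumerate(params):
        mask = modality == m
        if mask.any():
            out[mask] = f(x[mask], p)
    return out

def mot_block(x, modality, params):
    d = x.shape[-1]
    # Modality-specific LayerNorm and Q/K/V projections.
    h = routed(x, modality, params, lambda t, p: layer_norm(t, p["ln1"]), d)
    q = routed(h, modality, params, lambda t, p: t @ p["wq"], d)
    k = routed(h, modality, params, lambda t, p: t @ p["wk"], d)
    v = routed(h, modality, params, lambda t, p: t @ p["wv"], d)
    # Global self-attention: every token attends to every token,
    # whatever its modality.
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    # Modality-specific output projection, then residual.
    x = x + routed(attn, modality, params, lambda t, p: t @ p["wo"], d)

    def ffn(t, p):
        # Modality-specific LayerNorm + ReLU feed-forward network.
        return np.maximum(layer_norm(t, p["ln2"]) @ p["w1"], 0.0) @ p["w2"]

    return x + routed(x, modality, params, ffn, d)

rng = np.random.default_rng(0)
d, d_ff = 8, 32
params = [init_params(d, d_ff, rng) for _ in range(2)]  # 0 = text, 1 = image
x = rng.normal(size=(6, d))
modality = np.array([0, 0, 1, 1, 1, 0])
y = mot_block(x, modality, params)
print(y.shape)  # (6, 8)
```

Because every token passes through exactly one modality's weights, the per-token compute matches a dense block of the same width even though the non-embedding parameters are replicated per modality. If all modalities share one parameter set, the block reduces to an ordinary dense transformer block.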

Experimental Results

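Chameleon 7B: matches the dense baseline using only 55.8% of its training FLOPs (about a 44% reduction)
Transfusion 7B: matches the dense baseline's image-modality performance with about one third of its training FLOPs (~2/3 reduction)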
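The paper states its savings as the fraction of the dense baseline's training FLOPs that MoT needs to reach the same quality (55.8% for Chameleon 7B, about one third for the image modality in Transfusion 7B). A small arithmetic sketch converting those fractions into reductions; the helper name `reduction` is hypothetical:

```python
def reduction(fraction_used):
    """Hypothetical helper: 'uses f of the baseline FLOPs' -> 'saves 1 - f'."""
    return 1.0 - fraction_used

chameleon = reduction(0.558)    # MoT needs 55.8% of dense FLOPs
transfusion = reduction(1 / 3)  # MoT needs ~1/3 of dense FLOPs (image modality)
print(f"Chameleon 7B: {chameleon:.1%} fewer FLOPs")        # Chameleon 7B: 44.2% fewer FLOPs
print(f"Transfusion 7B (image): {transfusion:.1%} fewer FLOPs")  # Transfusion 7B (image): 66.7% fewer FLOPs
```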