Table of contents
- (Paper Summary) AlphaGo Moment for Model Architecture Discovery
- (Paper Summary) Branch-Train-MiX; Mixing Expert LLMs into a Mixture-of-Experts LLM
- (Paper Summary) DIFFERENTIAL TRANSFORMER
- (Paper Summary) Leave No Context Behind; Efficient Infinite Context Transformers with Infini-attention
- (Paper Summary) Native Sparse Attention; Hardware-Aligned and Natively Trainable Sparse Attention