Table of contents
- (Paper Summary) AlphaGo Moment for Model Architecture Discovery
- (Paper Summary) Branch-Train-MiX; Mixing Expert LLMs into a Mixture-of-Experts LLM
- (Paper Summary) DIFFERENTIAL TRANSFORMER
- (Paper Summary) Leave No Context Behind; Efficient Infinite Context Transformers with Infini-attention
- (Paper Summary) Native Sparse Attention; Hardware-Aligned and Natively Trainable Sparse Attention