Jaemin's Arxiv
Language Model
Architecture
Table of contents
(Paper Summary) Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
(Paper Summary) Differential Transformer
(Paper Summary) Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
(Paper Summary) Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention