Table of contents
- (Concept Summary) Tokenization
- (Paper Summary) Chat Vector; A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
- (Paper Summary) DOES RLHF SCALE? EXPLORING THE IMPACTS FROM DATA, MODEL, AND METHOD
- (Paper Summary) Direct Preference Optimization; Your Language Model is Secretly a Reward Model
- (Paper Summary) Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- (Paper Summary) Generative Representational Instruction Tuning
- (Paper Summary) Grounded Language-Image Pre-training
- (Paper Summary) Hunyuan-Large; An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
- (Paper Summary) LLAMA-OMNI; SEAMLESS SPEECH INTERACTION WITH LARGE LANGUAGE MODELS
- (Paper Summary) MAGPIE; Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
- (Paper Summary) META-REWARDING LANGUAGE MODELS; Self-Improving Alignment with LLM-as-a-Meta-Judge
- (Paper Summary) Make Your LLM Fully Utilize the Context
- (Paper Summary) One Initialization to Rule them All; Fine-tuning via Explained Variance Adaptation
- (Paper Summary) RouteLLM; Learning to Route LLMs with Preference Data
- (Paper Summary) SFT Memorizes, RL Generalizes; A Comparative Study of Foundation Model Post-training
- (Paper Summary) Scaling Exponents Across Parameterizations and Optimizers
- (Paper Summary) Scaling Laws for Data Filtering—Data Curation cannot be Compute Agnostic
- (Paper Summary) Scaling Laws for Precision
- (Paper Summary) Self-Taught Evaluators
- (Paper Summary) SimPO; Simple Preference Optimization with a Reference-Free Reward
- (Paper Summary) Simple and Scalable Strategies to Continually Pre-train Large Language Models
- (Paper Summary) Smaller, Weaker, Yet Better; Training LLM Reasoners via Compute-Optimal Sampling
- (Paper Summary) THINKING LLMS; GENERAL INSTRUCTION FOLLOWING WITH THOUGHT GENERATION
- (Paper Summary) Textbooks Are All You Need
- (Paper Summary) Training Language Models to Self-Correct via Reinforcement Learning
- Miscellaneous Finetuning Methods in Large Language Models