Jaemin's Arxiv
(Paper Summary) Concise Reasoning via Reinforcement Learning
(Paper)
Key points
Incorrect responses tend to be longer than correct ones.
Response length keeps growing as training progresses.
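Both observations can be checked on your own rollouts by comparing the mean length of correct and incorrect responses and tracking that comparison across checkpoints. The sketch below is a minimal, hypothetical helper (`mean_length_by_correctness` is not from the paper). It counts whitespace-separated words as a stand-in for real tokenizer tokens, which is an assumption you would replace with your model's tokenizer.

```python
from statistics import mean


def mean_length_by_correctness(responses, is_correct, count_tokens=None):
    """Return (mean length of correct responses, mean length of incorrect ones).

    Hypothetical helper. By default, length is the number of whitespace-separated
    words, a stand-in for real tokenizer tokens. A group with no responses
    yields None instead of raising.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    if len(responses) != len(is_correct):
        raise ValueError("responses and is_correct must have the same length")

    correct = [count_tokens(r) for r, ok in zip(responses, is_correct) if ok]
    incorrect = [count_tokens(r) for r, ok in zip(responses, is_correct) if not ok]
    return (
        mean(correct) if correct else None,
        mean(incorrect) if incorrect else None,
    )


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the paper.
    responses = ["x = 2", "first try x = 3 then check again x = 5"]
    print(mean_length_by_correctness(responses, [True, False]))
```

Running the same helper on rollouts from successive checkpoints gives a simple length-versus-training curve for each group.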
Training
Stage 1: train on hard problems with PPO.
Stage 2: mix in problems the model can already solve.
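The two-stage data schedule can be sketched as follows. This assumes each problem already has a pass rate, i.e. the fraction of correct answers when sampling the current model several times. The names `split_by_solvability` and `build_stage2_batch`, the pass-rate threshold, and the number of solvable problems mixed in are hypothetical choices for illustration, not the paper's settings. The PPO update itself is not shown.

```python
import random


def split_by_solvability(problems, pass_rates, threshold=0.0):
    """Split problems into (solvable, hard) by per-problem pass rate.

    Hypothetical criterion: a problem counts as solvable if its pass rate is
    strictly above `threshold`; otherwise it is treated as hard.
    """
    if len(problems) != len(pass_rates):
        raise ValueError("problems and pass_rates must have the same length")
    solvable = [p for p, r in zip(problems, pass_rates) if r > threshold]
    hard = [p for p, r in zip(problems, pass_rates) if r <= threshold]
    return solvable, hard


def build_stage2_batch(hard, solvable, n_solvable, seed=0):
    """Stage 2 (sketch): keep the hard problems and mix in solvable ones.

    Samples up to `n_solvable` solvable problems without replacement, then
    shuffles the combined list so the two kinds are interleaved.
    """
    rng = random.Random(seed)
    picked = rng.sample(solvable, min(n_solvable, len(solvable)))
    mixed = list(hard) + picked
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    problems = ["p1", "p2", "p3", "p4"]
    pass_rates = [0.0, 0.5, 0.0, 1.0]  # toy values for illustration
    solvable, hard = split_by_solvability(problems, pass_rates)
    stage1_data = hard  # stage 1: hard problems only
    stage2_data = build_stage2_batch(hard, solvable, n_solvable=1)
    print(stage1_data, stage2_data)
```

Keeping the mixing count as a separate parameter makes it easy to vary how many solvable problems enter stage 2 without changing the stage 1 set.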