(Paper Summary) K2-Think: A Parameter-Efficient Reasoning System | Jaemin’s Arxiv


Key Points

Training
- Starts from Qwen2.5
- Long chain-of-thought SFT
- RL with verifiable rewards (RLVR)
Inference
- Plan-Before-You-Think prompt restructuring
- Best-of-N selection with N = 3
- Speculative decoding
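Speculative decoding speeds up inference without changing what the large model would output. Below is a minimal sketch of the greedy variant. It assumes toy "models" that are plain next-token functions; the names `speculative_decode_greedy`, `greedy_decode`, `target`, and `draft` are hypothetical, and this is not K2-Think's serving code. A cheap draft model proposes up to k tokens. The target model checks them, and the longest agreeing prefix is kept. The result is therefore identical to greedy decoding with the target alone.

```python
from typing import Callable, List

# Hypothetical "model" interface: given a token prefix, return the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference: plain greedy decoding, one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out


def speculative_decode_greedy(target: NextToken, draft: NextToken, prompt: List[int],
                              max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies.

    The output equals greedy_decode(target, prompt, max_new_tokens); only the number
    of expensive target passes shrinks when the draft guesses well.
    """
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1) The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, limit - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks every proposed position. A real system scores all
        #    positions in one batched forward pass; here we loop for clarity.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # 3) First disagreement: keep the agreed prefix plus the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # 4) Every proposal matched. In a real system the target's next token
            #    comes out of the same verification pass at no extra cost.
            out.extend(proposal)
            if len(out) < limit:
                out.append(target(out))
    return out
```

Each rejection still yields one correct target token, so progress never stalls even when the draft is always wrong. The sampling variant replaces the equality check with a probabilistic accept/reject test so that the output distribution matches the target's. That variant is not shown here.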
Performance
- Best-of-3 selection improves performance substantially.
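Best-of-N spends N times the generation cost to get a better final answer. The following is a minimal sketch, assuming hypothetical `generate` and `score` callables. This summary does not say how K2-Think actually chooses among its three candidates, so `score` is only a stand-in for whatever selector is used (for example, a judge model).

```python
from typing import Callable, List

# Hypothetical interfaces (not K2-Think's code):
#   generate(prompt, seed) -> one sampled response
#   score(prompt, response) -> a rating where higher is better
Generate = Callable[[str, int], str]
Score = Callable[[str, str], float]


def best_of_n(generate: Generate, score: Score, prompt: str, n: int = 3) -> str:
    """Sample n independent candidates and return the one the scorer ranks highest."""
    candidates: List[str] = [generate(prompt, seed) for seed in range(n)]
    # max() returns the earliest candidate when scores tie.
    return max(candidates, key=lambda response: score(prompt, response))


if __name__ == "__main__":
    # Toy stand-ins: three fixed "samples" and a hand-written score table.
    samples = ["x = 3", "x = 4", "x = 5"]
    ratings = {"x = 3": 0.2, "x = 4": 0.9, "x = 5": 0.5}
    print(best_of_n(lambda p, s: samples[s], lambda p, r: ratings[r], "Solve 2x = 8"))
    # prints "x = 4"
```

The candidates are independent, so they can be generated in parallel. The latency cost of N = 3 is then closer to one generation than three, provided the hardware has room for the extra batch.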

Ablation
- SFT followed by RL gives better performance.
- If training starts with a lowered generation-length limit, performance does not recover afterward.