(논문 요약) QWENLONG-L1; Towards Long-Context Large Reasoning Models with Reinforcement Learning | Jaemin’s Arxiv

(논문 요약) QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning (Paper)

핵심 내용

Long-context Document VQA: input + long context -> answer
Progressive context scaling: initial input length 를 점점 늘려나감
Warm-Up Supervised Fine-Tuning: SFT 학습 후, RL 학습

실험

Long context benchmark 에서 실험