(Paper summary) QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning (Paper)

Key points

  • Long-context document QA: input question + long context -> answer
  • Progressive context scaling: training starts from a short initial input length, which is then increased step by step
  • Warm-up supervised fine-tuning: the model is first trained with SFT, then with RL
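
The ordering described above (SFT warm-up first, then RL phases with a growing input-length limit) can be sketched as a small planning helper. This is a minimal illustration, not the paper's implementation: the names `length_schedule`, `select_for_phase`, and `training_plan`, the linear spacing of limits, the whitespace tokenizer, and the rule that a phase admits every example up to its limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    question: str
    context: str
    answer: str


def input_length(ex: Example) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words (assumption).
    return len(ex.question.split()) + len(ex.context.split())


def length_schedule(initial_len: int, final_len: int, num_phases: int) -> List[int]:
    """Return one maximum input length per RL phase, linearly spaced (assumed spacing)."""
    if num_phases < 1:
        raise ValueError("num_phases must be at least 1")
    if num_phases == 1:
        return [final_len]
    step = (final_len - initial_len) / (num_phases - 1)
    return [round(initial_len + i * step) for i in range(num_phases)]


def select_for_phase(examples: List[Example], max_len: int) -> List[Example]:
    """Keep the examples whose input fits within the current phase's length limit."""
    return [ex for ex in examples if input_length(ex) <= max_len]


def training_plan(initial_len: int, final_len: int, num_phases: int) -> List[Tuple[str, Optional[int]]]:
    """Ordered stages: one SFT warm-up stage, then one RL stage per length limit."""
    plan: List[Tuple[str, Optional[int]]] = [("sft", None)]
    plan += [("rl", limit) for limit in length_schedule(initial_len, final_len, num_phases)]
    return plan


if __name__ == "__main__":
    # Hypothetical limits, chosen only to show the shape of the plan.
    for stage, limit in training_plan(initial_len=100, final_len=300, num_phases=3):
        print(stage, limit)
```

In a real training loop, each `("rl", limit)` stage would run RL only on the subset returned by `select_for_phase` for that limit, so the policy sees short inputs before long ones.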

Experiments

  • Evaluated on long-context benchmarks