(Paper summary) Guided Self-Evolving LLMs with Minimal Human Supervision (Paper)

Key points

  • R-FEW: a guided self-play Challenger–Solver framework
    • The Challenger samples human-labeled examples to guide its generation of synthetic questions.
    • The Solver trains on both human and synthetic examples under an online, difficulty-based curriculum.
    • $\mathcal{H}_t$: historical data (previous questions, responses, and the document corpus)
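The Challenger–Solver loop above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `Example`, `sample_guidance`, `pass_rate`, and `select_curriculum` are hypothetical. Difficulty is assumed to be the Solver's empirical pass rate over several sampled answers, and unlabeled synthetic questions are scored against a majority-vote pseudo-label. The 0.3 to 0.7 difficulty band is an arbitrary illustrative choice.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    question: str
    answer: Optional[str]  # gold label for human data; None for synthetic questions
    source: str            # "human" or "synthetic"


def sample_guidance(human_pool: list[Example], k: int, rng: random.Random) -> list[Example]:
    """Hypothetical Challenger step: draw up to k human-labeled examples
    that would condition the generation of new synthetic questions."""
    return rng.sample(human_pool, k=min(k, len(human_pool)))


def pass_rate(example: Example, solver_answers: list[str]) -> float:
    """Fraction of sampled Solver answers that match the target answer.

    Human examples are scored against their gold label. Synthetic examples
    have no label, so the majority-vote answer serves as a pseudo-label
    (an assumption made for this sketch).
    """
    if not solver_answers:
        return 0.0
    target = example.answer
    if target is None:
        target = Counter(solver_answers).most_common(1)[0][0]
    return sum(a == target for a in solver_answers) / len(solver_answers)


def select_curriculum(batch: list[tuple[Example, list[str]]],
                      low: float = 0.3, high: float = 0.7) -> list[Example]:
    """Keep examples of intermediate difficulty: the current Solver neither
    always solves them (too easy) nor never solves them (too hard)."""
    return [ex for ex, answers in batch if low <= pass_rate(ex, answers) <= high]


batch = [
    (Example("2+2?", "4", "human"), ["4", "4", "4", "4"]),                # rate 1.0: too easy
    (Example("hard human q", "42", "human"), ["41", "42", "40", "42"]),   # rate 0.5: kept
    (Example("synthetic q", None, "synthetic"), ["a", "b", "a", "c"]),    # majority "a", rate 0.5: kept
    (Example("unsolved", "7", "human"), ["1", "2", "3", "4"]),            # rate 0.0: too hard
]
print([ex.question for ex in select_curriculum(batch)])  # ['hard human q', 'synthetic q']
```

Recomputing pass rates with the current Solver at each round is what would make such a curriculum online. As the Solver improves, questions it now always answers correctly leave the band, and harder ones enter it.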

  • Related work