(Paper Summary) Guided Self-Evolving LLMs with Minimal Human Supervision

Key Points

R-FEW: guided Self-Play Challenger–Solver framework
- A challenger samples human-labeled examples to guide synthetic question generation.
- A Solver trains on human and synthetic examples under an online, difficulty-based curriculum.
- $\mathcal{H}_t$: historical data (previous questions, responses, document corpus)
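
The two roles above can be sketched roughly in Python. This is a minimal illustration, not the paper's implementation: the function names, the prompt wording, the `low`/`high` pass-rate band, and the assumption that every item carries a reference answer are all hypothetical. The Challenger is conditioned on a few sampled human-labeled examples, and the curriculum keeps only items whose estimated Solver pass rate falls in a middle band.

```python
import random


def sample_anchors(human_pool, k, rng):
    """Pick up to k human-labeled examples to condition the Challenger on."""
    return rng.sample(human_pool, min(k, len(human_pool)))


def build_challenger_prompt(anchors):
    """Format the sampled examples as in-context guidance (hypothetical wording)."""
    parts = ["Write one new question in the style of these examples:"]
    for ex in anchors:
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}")
    return "\n\n".join(parts)


def pass_rate(solver, question, reference, n_samples):
    """Estimate difficulty as the fraction of Solver samples matching the reference."""
    hits = sum(solver(question) == reference for _ in range(n_samples))
    return hits / n_samples


def select_by_difficulty(items, solver, low=0.3, high=0.7, n_samples=8):
    """Keep items that are neither too easy nor too hard for the current Solver.

    The band [low, high] is an assumed choice, re-evaluated against the
    current Solver each round so that the curriculum stays online.
    """
    kept = []
    for item in items:
        rate = pass_rate(solver, item["question"], item["answer"], n_samples)
        if low <= rate <= high:
            kept.append(item)
    return kept
```

Here `solver` stands for any callable that maps a question string to an answer string, for example a wrapper around a sampled model generation. In a full loop, the prompt would be sent to the Challenger model and its outputs filtered with `select_by_difficulty` before being mixed with human examples for Solver training.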