(Paper summary) Guided Self-Evolving LLMs with Minimal Human Supervision (Paper)
Key points
- R-FEW: guided Self-Play Challenger–Solver framework
- A challenger samples human-labeled examples to guide synthetic question generation.
- A solver trains on human and synthetic examples under an online, difficulty-based curriculum.
- $\mathcal{H}_t$: historical data (previous questions, responses, document corpus)
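
The two roles above can be illustrated with a small toy sketch. This is not the paper's implementation: the `Example` record, the prompt template, the `success_rate` field (assumed here to be the fraction of solver samples judged correct), and the 0.3 to 0.7 difficulty band are all hypothetical choices made only to show how human anchors might guide question generation and how an online difficulty filter might select training data.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    source: str           # "human" or "synthetic"
    success_rate: float   # assumed: fraction of solver samples judged correct, in [0, 1]


def sample_anchors(human_pool, k, rng):
    """Pick up to k human-labeled examples to guide the challenger."""
    return rng.sample(human_pool, min(k, len(human_pool)))


def build_challenger_prompt(anchors):
    """Format the anchors as in-context demonstrations (hypothetical template)."""
    shots = "\n".join(f"Example: {a.question}" for a in anchors)
    return f"{shots}\nWrite a new question in the same style:"


def select_curriculum(examples, low=0.3, high=0.7):
    """Keep examples of intermediate difficulty for the current solver.

    The band [low, high] is an illustrative assumption, not a value from the paper.
    """
    return [e for e in examples if low <= e.success_rate <= high]
```

In a training loop, `select_curriculum` would presumably be re-run each round with fresh success rates, so that the set of retained human and synthetic examples tracks the solver's current ability.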

- Related work
