(Paper Summary) Towards Robust Mathematical Reasoning (Paper)
Key Points
- IMO-Bench: a benchmark suite with three components.
  - IMO-AnswerBench: 400 diverse Olympiad problems with verifiable short answers.
  - IMO-ProofBench: 60 problems to evaluate proof-writing capabilities.
  - IMO-GradingBench: 1000 human gradings on proofs.
- ProofAutoGrader: an automatic proof grader built on Gemini 2.5 Pro. It gives the model a prompt containing the problem statement, the candidate solution, a reference solution, and specific grading guidelines.
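
The four prompt ingredients listed for ProofAutoGrader can be pictured as a simple prompt-assembly step. The sketch below is only an illustration under stated assumptions: the `GradingInput` class, the `build_grading_prompt` function, the section titles, and the instruction sentence are all hypothetical, since this summary does not reproduce the paper's actual prompt wording. The call to Gemini 2.5 Pro is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class GradingInput:
    """Hypothetical container for the four inputs named in the summary."""
    problem: str
    candidate_solution: str
    reference_solution: str
    grading_guidelines: str


def build_grading_prompt(item: GradingInput) -> str:
    """Assemble one grading prompt from the four inputs.

    The section titles and the opening instruction are assumptions made
    for illustration; they are not the paper's prompt.
    """
    sections = [
        ("Problem statement", item.problem),
        ("Candidate solution", item.candidate_solution),
        ("Reference solution", item.reference_solution),
        ("Grading guidelines", item.grading_guidelines),
    ]
    parts = ["Grade the candidate solution using the material below."]
    for title, body in sections:
        # Each input gets its own labeled section so the grader model
        # can tell the candidate proof apart from the reference proof.
        parts.append(f"## {title}\n{body.strip()}")
    return "\n\n".join(parts)


example = GradingInput(
    problem="Prove that the sum of two even integers is even.",
    candidate_solution="Let a = 2m and b = 2n. Then a + b = 2(m + n).",
    reference_solution="Write a = 2m, b = 2n; then a + b = 2(m + n) is even.",
    grading_guidelines="Full credit requires writing both integers as multiples of 2.",
)
prompt = build_grading_prompt(example)
```

The resulting `prompt` string would then be sent to the grader model through whatever client is in use. Keeping the reference solution and the guidelines in separate labeled sections is a design choice of this sketch, not something the summary specifies.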