(Paper Summary) Towards Robust Mathematical Reasoning (Paper)

Key Points

  • IMO-Bench
    • IMO-AnswerBench: 400 diverse Olympiad problems with verifiable short answers.
    • IMO-ProofBench: 60 problems to evaluate proof-writing capabilities.
    • IMO-GradingBench: 1000 human gradings on proofs.
      • ProofAutoGrader: an automatic proof grader built on Gemini 2.5 Pro. Its prompt contains the problem statement, the candidate solution, a reference solution, and specific grading guidelines.
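
The four prompt components listed for ProofAutoGrader can be assembled roughly as follows. This is only a minimal sketch: the summary does not give the actual prompt wording, section order, or output format, so the names `GradingInput` and `build_grader_prompt` and the section headers are hypothetical, and the call to the model itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingInput:
    """The four pieces of context named in the summary (field names are assumptions)."""
    problem: str
    candidate_solution: str
    reference_solution: str
    grading_guidelines: str


def build_grader_prompt(item: GradingInput) -> str:
    """Join the four components into one prompt string.

    The headers and their order are illustrative, not the paper's template.
    """
    sections = [
        ("Problem statement", item.problem),
        ("Candidate solution", item.candidate_solution),
        ("Reference solution", item.reference_solution),
        ("Grading guidelines", item.grading_guidelines),
    ]
    # Refuse to build a prompt if any component is blank, since the grader
    # is described as receiving all four.
    for title, body in sections:
        if not body.strip():
            raise ValueError(f"missing section: {title}")
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


if __name__ == "__main__":
    example = GradingInput(
        problem="Prove that the sum of two even integers is even.",
        candidate_solution="Let a = 2m and b = 2n. Then a + b = 2(m + n), which is even.",
        reference_solution="Write both integers as multiples of 2 and factor out the 2.",
        grading_guidelines="Full credit requires an explicit factorisation of the sum.",
    )
    print(build_grader_prompt(example))
```

The resulting string would then be sent to the grading model. Keeping prompt assembly separate from the model call makes the template easy to inspect and to test without network access.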