(논문 요약) LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover (Paper)

핵심 내용

  • data 수집 pipeline
  • 타 data 와 비교
  • train
    • InternLM-math-plus-7B 부터 학습 시작
    • GPT-f format prompting
    • batch size: 512
    • learning rate: 1e-5 (warm-up with 3% steps and then cosine annealing)
    • 2 epochs of SFT
    • ~6 hours on 32 A100 GPUs
    • LEAN4 에서 동일한 intermediate stage 를 통합함 (free variable 을 internal storage order 대로 부여했다고함)
  • proof search
    • best-first search
    • validate intermediate proof steps within a formal proof until the proof is either finalized or resources are exhausted
    • max 100 expansions, 32 candidate generations per expansion

실험 결과