(블로그 요약) Universe: Scale Real-World Verifiable Environments to Millions (paper)
핵심 내용
- Builder agent
- code base 에서 bug fix.
- evaluation.sh 생성.
- Hacking Detector
- evaluation.sh 가 제대로 된 것인지 검토.
- Verifier agent
- buggy branch 에서는 테스트 실패, fixed branch 에서는 테스트 통과 하는지 체크.
- 데이터셋
- train: ~1M high-quality PRs (filtered from 33.3 M pull requests in 2021–2025 of GitHub)
- high quality: PRs: linked with issue(s) without excessive file changes
- benchmark: 320 pull requests randomly sampled from GitHub
- train: ~1M high-quality PRs (filtered from 33.3 M pull requests in 2021–2025 of GitHub)
- 모델
- Qwen-Next-80B-A3B
- Qwen3-Max-Thinking