(Paper summary) AlphaGo Moment for Model Architecture Discovery (Paper)
Key points
- Concept: Self-evolving agent
- Builds 20M-parameter models and trains them on 1B tokens, iteratively improving the architecture
- Fitness: training loss, test benchmark performance, and an LLM-assigned evaluation score
- LLMs used
  - Researcher: O3 + GPT-4.1
  - Engineer: GPT-4.1
  - Analyst: O3
- Data
  - Train: FineWeb-edu sample-10BT + sample-100BT datasets (context length of 2048 tokens)
  - Test: ARC-Challenge, ARC-Easy, BoolQ, HellaSwag, PIQA, Social IQA, WinoGrande
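
To make the fitness bullet above concrete, here is a minimal sketch of how the three signals (training loss, benchmark performance, LLM evaluation score) could be folded into one number. The summary only names the three components; the equal weighting, the sigmoid squashing of improvements over a baseline, the assumption that the LLM score lies in [0, 1], and all function and parameter names are illustrative assumptions, not necessarily the paper's exact formula.

```python
import math


def sigmoid(x: float) -> float:
    """Squash an improvement delta into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fitness(
    train_loss: float,
    baseline_loss: float,
    bench_acc: float,
    baseline_acc: float,
    llm_score: float,
) -> float:
    """Combine the three fitness signals into a single score.

    Assumptions (not taken from the summary):
    - improvements are measured relative to a baseline architecture
    - llm_score is already normalized to [0, 1]
    - the three terms are weighted equally
    """
    # Lower loss than the baseline is better, so the delta is baseline - candidate.
    loss_term = sigmoid(baseline_loss - train_loss)
    # Higher benchmark accuracy than the baseline is better.
    bench_term = sigmoid(bench_acc - baseline_acc)
    return (loss_term + bench_term + llm_score) / 3.0


# A candidate identical to the baseline with a neutral LLM score sits at the midpoint.
neutral = fitness(3.0, 3.0, 0.40, 0.40, 0.5)
# A candidate with lower loss and higher accuracy scores above the midpoint.
better = fitness(2.9, 3.0, 0.45, 0.40, 0.5)
```

Squashing the deltas keeps one very large loss improvement from dominating the score, which is one plausible reason to bound each term before averaging.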