(Paper Summary) Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Key points
- Phi-3-mini: a 3.8B model trained on 3.3T tokens; its performance approaches that of Mixtral 8x7B and GPT-3.5 (a loading sketch follows at the end of this summary).
- Following the Textbooks Are All You Need training recipe, the focus is on raising data quality.
- Pretraining
  - Phase-1: mostly web sources, aimed at teaching general knowledge and language understanding.
  - Phase-2: even more heavily filtered web data (a subset of the Phase-1 data) mixed with some synthetic data covering logical reasoning and various niche skills (see the filtering sketch below).
  - More data- and parameter-efficient than Llama-2.
- Post-training
  - Supervised finetuning (SFT): highly curated, high-quality data across diverse domains, e.g., math, coding, reasoning, conversation, model identity, and safety (English-only).
  - Direct preference optimization (DPO): chat-format data, reasoning, and responsible AI (RAI) efforts (see the DPO loss sketch below).
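
Minimal usage sketch for the model itself. The report does not cover inference code; this assumes the Hugging Face `transformers` API and the public `microsoft/Phi-3-mini-4k-instruct` checkpoint id.

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 3.8B params: small enough for local/edge use
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what makes phi-3-mini special."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```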
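
A toy sketch of the two-phase pretraining-data idea, not the actual pipeline: every name here (`quality_score`, the threshold values) is hypothetical. The point is that Phase-2's web data is a higher-threshold subset of Phase-1's, plus synthetic data.

```python
def phase1_corpus(web_docs, quality_score, threshold=0.5):
    """Phase-1: broadly filtered web data for general knowledge and language."""
    return [d for d in web_docs if quality_score(d) >= threshold]

def phase2_corpus(web_docs, quality_score, synthetic_docs, threshold=0.9):
    """Phase-2: a much more heavily filtered subset of the same web data
    (a higher threshold yields a subset of Phase-1), mixed with synthetic
    data for logical reasoning and niche skills."""
    return [d for d in web_docs if quality_score(d) >= threshold] + synthetic_docs
```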
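
The report does not spell out the DPO objective; below is the standard DPO loss from Rafailov et al. (2023), which the post-training stage builds on, as a minimal PyTorch sketch.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed log-probs of the chosen/rejected
    responses under the trainable policy and the frozen reference (SFT) model."""
    chosen_reward = policy_chosen_logps - ref_chosen_logps        # log-ratio, preferred
    rejected_reward = policy_rejected_logps - ref_rejected_logps  # log-ratio, dispreferred
    # Widen the margin between preferred and dispreferred responses.
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()

# Toy check with dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.0, -10.5]))
print(loss)
```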