(Paper summary) Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery (paper)
Key points
- The quantized student is trained to match the softmax output of the high-precision teacher.
- The student is initialized by quantizing the teacher.
- Loss: KL($p_{teacher}$ || $p_{student}$)
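
The three points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: `fake_quantize`, `FP4_LIKE_GRID`, and `kl_teacher_student` are hypothetical names, and the quantizer uses a single per-tensor scale on a small FP4-like magnitude grid, which is a simplification of real NVFP4 (it omits block-wise scaling). The loss is the forward KL divergence from the teacher distribution to the student distribution, computed from logits.

```python
import math

# Hypothetical toy grid of representable magnitudes (FP4-like).
# Assumption for illustration only; real NVFP4 also applies block-wise scales.
FP4_LIKE_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize(weights):
    """Round each weight to the nearest grid value using one per-tensor scale."""
    scale = max(abs(w) for w in weights) / FP4_LIKE_GRID[-1] or 1.0
    out = []
    for w in weights:
        mag = min(FP4_LIKE_GRID, key=lambda g: abs(abs(w) / scale - g))
        out.append(math.copysign(mag * scale, w))
    return out


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_teacher_student(teacher_logits, student_logits):
    """KL(p_teacher || p_student) = sum_i p_t[i] * (log p_t[i] - log p_s[i])."""
    log_pt = log_softmax(teacher_logits)
    log_ps = log_softmax(student_logits)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))


def linear_logits(weight_rows, x):
    """Toy one-layer model: one logit per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


# Teacher keeps high-precision weights; the student starts as its quantized copy.
teacher_w = [[0.9, -2.1, 0.3], [1.7, 0.2, -0.8], [-0.4, 1.1, 2.6]]
student_w = [fake_quantize(row) for row in teacher_w]

x = [1.0, 0.5, -1.5]
loss = kl_teacher_student(linear_logits(teacher_w, x), linear_logits(student_w, x))
print(f"distillation loss: {loss:.6f}")
```

In an actual training loop the student weights would be updated to reduce this loss while the teacher stays frozen; the sketch only evaluates the loss once at initialization.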