(Paper Summary) BitNet
- BitNet: Scaling 1-bit Transformers for Large Language Models
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- BitNet b1.58 2B4T Technical Report (repo)
Key points
- Speeds up matrix multiplication by constraining weights to {-1, 0, 1}: multiplications reduce to additions and subtractions, and zero weights are skipped entirely.
- Weight quantization formula (absmean, per the b1.58 paper): W̃ = RoundClip(W / (γ + ε), -1, 1), where γ = mean(|W|); see the sketch after this list.
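A minimal PyTorch sketch of this absmean quantization. The function name `absmean_quantize` and the per-tensor scale granularity are my choices for illustration, not taken from the official repo:

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with absmean scaling:
    W_q = RoundClip(W / (gamma + eps), -1, 1), gamma = mean(|W|)."""
    gamma = w.abs().mean()                          # per-tensor absmean scale
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)  # RoundClip -> {-1, 0, 1}
    return w_q, gamma                               # gamma rescales outputs later
```

At inference time the ternary `w_q` can be packed into roughly 1.58 bits per weight, and matmuls against it need no multiplications; `gamma` rescales the result back to the original magnitude.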
Experimental results
BitNet 2B model
- Pre-training, SFT, and DPO all use BitLinear layers (see the sketch below).
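A hedged sketch of what a BitLinear layer could look like in PyTorch, assuming ternary absmean weights and 8-bit per-token absmax activations trained with straight-through estimators; details from the papers such as the SubLN normalization placed before quantization are omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Drop-in nn.Linear variant sketching BitNet b1.58's BitLinear:
    ternary {-1, 0, +1} weights and 8-bit activations, with straight-through
    estimators (STE) so gradients flow to the full-precision latent weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Absmean ternary weight quantization (RoundClip), then rescale.
        gamma = self.weight.abs().mean().clamp(min=1e-5)
        w_q = (self.weight / gamma).round().clamp(-1, 1) * gamma
        w_q = self.weight + (w_q - self.weight).detach()  # STE

        # Per-token 8-bit absmax activation quantization, also with STE.
        scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
        x_q = (x * scale).round().clamp(-128, 127) / scale
        x_q = x + (x_q - x).detach()  # STE

        return F.linear(x_q, w_q, self.bias)
```

Training keeps a full-precision latent weight; only the forward pass sees quantized values, which is what the STE trick above implements.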