(Paper Summary) LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Paper)

Key Ideas

  • motivation: naive 8-bit quantization degrades performance at scale, because outlier feature dimensions emerge in large models.

  • method: separate out the outlier feature dimensions (magnitudes up to 20x larger); see the sketches after this list.
    • 16-bit matrix multiplication for the outlier feature dimensions
    • 8-bit matrix multiplication for the other 99.9% of the dimensions
    • outlier feature dimension: any dimension containing at least one element whose magnitude exceeds the threshold 6.0
    • vectorwise quantization: for $X_{f16}\in\mathbb{R}^{s\times h}$ and $W_{f16}\in\mathbb{R}^{h\times o}$ with $X_{f16}W_{f16}\in\mathbb{R}^{s\times o}$, use row-wise scaling constants $c_{x_{f16}}\in\mathbb{R}^s$ for $X$ and column-wise constants $c_{w_{f16}}\in\mathbb{R}^o$ for $W$, then denormalize the int32 result by their outer product: $C_{f16}\approx \frac{1}{c_{x_{f16}}\otimes c_{w_{f16}}}\cdot C_{i32}$
  • 8-bit ([−127, 127]) quantization methods (a code sketch follows this list)
    • Absmax: symmetric scaling by the absolute maximum, $X_{i8} = \left\lfloor \frac{127}{\max_{ij}|X_{f16}^{ij}|} \cdot X_{f16} \right\rceil = \lfloor s_{x_{f16}} \cdot X_{f16} \rceil$
    • zeropoint: asymmetric scaling by the normalized dynamic range $nd_{x_{f16}} = \frac{2\cdot 127}{\max_{ij}(X_{f16}^{ij}) - \min_{ij}(X_{f16}^{ij})}$, shifted by a zeropoint $zp$ so the full range is used even for skewed distributions: $X_{i8} = \lfloor nd_{x_{f16}} \cdot X_{f16} \rceil$
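
A minimal NumPy sketch of the two quantization schemes, assuming per-tensor scaling. The function names are my own, and folding the zeropoint into the quantized values is a simplification (the paper keeps $zp$ separate and adds it inside an int16 multiply-accumulate instruction):

```python
import numpy as np

def absmax_quantize(x):
    # Absmax: symmetric scaling into [-127, 127] by s = 127 / max|x|.
    s = 127.0 / np.max(np.abs(x))
    return np.round(s * x).astype(np.int8), s

def zeropoint_quantize(x):
    # Zeropoint: asymmetric scaling by the normalized dynamic range nd,
    # shifted by zp so that min(x) -> -127 and max(x) -> 127.
    nd = 2.0 * 127.0 / (np.max(x) - np.min(x))
    zp = -np.round(nd * np.min(x)) - 127.0
    q = np.clip(np.round(nd * x) + zp, -127, 127).astype(np.int8)
    return q, nd, zp

x = np.random.randn(4, 8).astype(np.float32)
q, s = absmax_quantize(x)
print(np.abs(x - q / s).max())            # absmax dequantization error
q, nd, zp = zeropoint_quantize(x)
print(np.abs(x - (q - zp) / nd).max())    # zeropoint dequantization error
```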
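And a sketch of the full mixed-precision decomposition, combining the outlier path with vector-wise quantization; `llm_int8_matmul` is a hypothetical name, and float32/int32 NumPy ops stand in for the fp16 and int8 kernels:

```python
import numpy as np

def llm_int8_matmul(X, W, threshold=6.0):
    # Split feature dimensions (columns of X / rows of W) into outliers
    # (any |value| > threshold) and regular dimensions.
    outlier = np.any(np.abs(X) > threshold, axis=0)
    regular = ~outlier

    # 16-bit path: matmul over the ~0.1% outlier dimensions only.
    out_fp = X[:, outlier] @ W[outlier, :]

    # 8-bit path with vector-wise quantization: row-wise scales c_x for X,
    # column-wise scales c_w for W (absmax per row / per column).
    c_x = np.max(np.abs(X[:, regular]), axis=1, keepdims=True) / 127.0  # (s, 1)
    c_w = np.max(np.abs(W[regular, :]), axis=0, keepdims=True) / 127.0  # (1, o)
    X_i8 = np.round(X[:, regular] / c_x).astype(np.int8)
    W_i8 = np.round(W[regular, :] / c_w).astype(np.int8)
    # Accumulate in int32, then denormalize by the outer product of scales.
    out_int = (X_i8.astype(np.int32) @ W_i8.astype(np.int32)) * (c_x * c_w)

    return out_fp + out_int

X = np.random.randn(4, 16); X[:, 3] *= 20       # inject one outlier dimension
W = np.random.randn(16, 8)
print(np.abs(X @ W - llm_int8_matmul(X, W)).max())  # small quantization error
```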

Experimental Results

  • The multiplication by 0.5 appears to be a heuristic.