(Paper Summary) Byte Latent Transformer: Patches Scale Better Than Tokens (Paper)

Key Points

  • Tokenizer-free architecture that learns from raw byte data

  • Three modules: a lightweight local Encoder over bytes, a large latent Transformer over patches, and a lightweight local Decoder back to bytes (see the architecture sketch after this list)

  • Entropy Patching: use entropy estimates to derive patch boundaries

    • train a small byte-level auto-regressive language model on the training data
    • uses a global threshold and a relative threshold on the entropy to place boundaries (see the patching sketch after this list)
    • example entropy plot: the paper shows per-byte entropy spiking where new patches begin
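
The three-module flow can be sketched as follows. This is a minimal PyTorch sketch, not the paper's implementation: mean-pooling stands in for the cross-attention that forms patch representations, and the hash n-gram embeddings and causal masks are omitted; the class and variable names (`BLTSketch`, `patch_ids`, ...) are hypothetical.

```python
import torch
import torch.nn as nn

class BLTSketch(nn.Module):
    """Sketch of BLT's flow: local encoder (bytes) -> latent transformer
    (patches) -> local decoder (bytes). Assumption: mean-pooling replaces
    the paper's cross-attention; n-gram embeddings and masks omitted."""

    def __init__(self, d_model=256, n_heads=4, n_bytes=256):
        super().__init__()
        self.byte_emb = nn.Embedding(n_bytes, d_model)
        mk = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(mk(), num_layers=1)  # small
        self.latent = nn.TransformerEncoder(mk(), num_layers=2)         # large in the paper
        self.local_decoder = nn.TransformerEncoder(mk(), num_layers=1)  # small
        self.head = nn.Linear(d_model, n_bytes)

    def forward(self, byte_ids, patch_ids):
        # byte_ids: (1, T) raw bytes; patch_ids: (T,) patch index of each byte
        h = self.local_encoder(self.byte_emb(byte_ids))         # (1, T, D) byte states
        n = int(patch_ids.max()) + 1
        pooled = torch.zeros(1, n, h.size(-1))
        pooled.index_add_(1, patch_ids, h)                      # sum byte states per patch
        pooled /= torch.bincount(patch_ids, minlength=n).clamp(min=1).view(1, n, 1)
        z = self.latent(pooled)                                 # (1, N, D) patch states
        out = self.local_decoder(h + z[:, patch_ids])           # unpool patches to bytes
        return self.head(out)                                   # (1, T, 256) next-byte logits

model = BLTSketch()
byte_ids = torch.randint(0, 256, (1, 12))
patch_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3])
print(model(byte_ids, patch_ids).shape)  # torch.Size([1, 12, 256])
```

The point of the split is that the expensive latent transformer runs once per patch rather than once per byte, so longer patches in low-entropy regions save compute.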
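And a minimal sketch of the two boundary rules, assuming `H` holds the per-byte entropies H(x_t) produced by the small byte LM. The thresholds and the example entropy values are illustrative, `patch_starts` is a hypothetical helper, and combining the two rules with OR is a convenience here; the paper evaluates them as alternatives.

```python
import math

def next_byte_entropy(probs):
    """Shannon entropy H(x_t) of the small byte-level LM's predicted
    distribution p(x_t | x_<t) over the 256 possible byte values."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def patch_starts(H, theta_g=None, theta_r=None):
    """Patch-start positions from the per-byte entropy sequence H.
    Global rule:   start a new patch where H[t] > theta_g.
    Relative rule: start where H[t] - H[t-1] > theta_r (the paper's
    approximate-monotonicity constraint)."""
    starts = [0]
    for t in range(1, len(H)):
        hit_global = theta_g is not None and H[t] > theta_g
        hit_relative = theta_r is not None and H[t] - H[t - 1] > theta_r
        if hit_global or hit_relative:
            starts.append(t)
    return starts

# Illustrative (made-up) entropies: spikes typically appear at word starts.
H = [2.8, 0.4, 0.3, 0.2, 2.5, 0.5, 0.3, 2.9, 0.6]
print(patch_starts(H, theta_g=2.0))  # [0, 4, 7]
```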

Experiments

  • Achieves comparable performance with fewer parameters and a smaller model dimension.