(Paper Summary) Byte Latent Transformer: Patches Scale Better Than Tokens
Key Points
Tokenizer-free architecture that learns from raw byte data

Encoder, Decoder
- lightweight local encoder: maps the raw byte stream into patch representations
- large latent transformer: does most of the compute, operating over patches instead of bytes
- lightweight local decoder: maps patch representations back to bytes for next-byte prediction
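A minimal PyTorch sketch of this three-part split, assuming precomputed per-byte patch assignments. Mean-pooling stands in for the paper's byte-patch cross-attention, causal masks are omitted, and all names, sizes, and layer counts are illustrative only, not the paper's configuration:

```python
import torch
import torch.nn as nn

class BLTSketch(nn.Module):
    """Illustrative encoder -> latent transformer -> decoder split."""

    def __init__(self, d_byte=128, d_patch=512, vocab=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_byte)
        self.local_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_byte, nhead=4, batch_first=True), 1)
        self.up = nn.Linear(d_byte, d_patch)
        self.latent = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_patch, nhead=8, batch_first=True), 2)
        self.down = nn.Linear(d_patch, d_byte)
        self.local_dec = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_byte, nhead=4, batch_first=True), 1)
        self.head = nn.Linear(d_byte, vocab)

    def forward(self, byte_ids, patch_ids):
        # byte_ids, patch_ids: (B, T); patch_ids maps each byte to its patch
        h = self.local_enc(self.embed(byte_ids))              # byte-level states
        B, T, D = h.shape
        P = int(patch_ids.max()) + 1
        idx = patch_ids.unsqueeze(-1).expand(B, T, D)
        # mean-pool bytes within each patch (simplification of cross-attention)
        pooled = h.new_zeros(B, P, D).scatter_add_(1, idx, h)
        counts = h.new_zeros(B, P, 1).scatter_add_(
            1, patch_ids.unsqueeze(-1), h.new_ones(B, T, 1))
        g = self.latent(self.up(pooled / counts.clamp(min=1)))  # patch-level compute
        g_per_byte = torch.gather(self.down(g), 1, idx)         # unpool to bytes
        return self.head(self.local_dec(h + g_per_byte))        # next-byte logits

# toy usage: 12 bytes grouped into 4 patches
model = BLTSketch()
ids = torch.randint(0, 256, (1, 12))
pids = torch.tensor([[0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3]])
print(model(ids, pids).shape)  # torch.Size([1, 12, 256])
```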

Entropy Patching: use entropy estimates to derive patch boundaries
- train a small byte-level autoregressive language model on the training data
- a new patch starts when the model's next-byte entropy crosses either a global threshold or a relative threshold over the previous byte's entropy; see the sketch after this list
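A minimal sketch of the two boundary rules, assuming the per-byte entropies come from the small byte-level LM; function names and threshold values are placeholders, not the paper's settings:

```python
import numpy as np

def next_byte_entropies(logits):
    """Per-position entropy H_t of the byte LM's next-byte distribution.
    logits: (seq_len, 256) array from the small byte-level model."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def patch_starts(entropies, global_threshold=None, relative_threshold=None):
    """Indices where a new patch begins.

    Global rule: start a patch when H_t exceeds a fixed threshold.
    Relative rule: start a patch when the entropy rises over the
    previous byte by more than a margin, i.e. H_t - H_{t-1} > margin.
    """
    starts = [0]  # the first byte always opens a patch
    for t in range(1, len(entropies)):
        if global_threshold is not None and entropies[t] > global_threshold:
            starts.append(t)
        elif (relative_threshold is not None
              and entropies[t] - entropies[t - 1] > relative_threshold):
            starts.append(t)
    return starts

# toy usage with random "logits" standing in for the byte LM's output
rng = np.random.default_rng(0)
H = next_byte_entropies(rng.normal(size=(32, 256)))
print(patch_starts(H, global_threshold=5.4))  # 5.4 is a placeholder value
```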

- Example entropy plot: per-byte entropy over a sample text, with patch boundaries where the threshold is crossed (see the figure in the paper)

Experiments
- Achieves comparable performance with fewer parameters and a smaller model dimension.
