(Paper Summary) Hogwild! Inference: Parallel LLM Generation via Concurrent Attention (Paper)
Key points
- Run LLM "workers" in parallel, let them synchronize through a concurrently updated attention cache, and prompt the workers to decide for themselves how best to collaborate.
- In the experiments the workers are symmetric: each one generates a similar number of tokens.

- Position embeddings are applied as needed at computation time.
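
The points above can be illustrated with a toy sketch. It is not the paper's implementation: the cache holds plain token strings instead of keys and values, the workers take turns instead of running concurrently, and the names `SharedCache`, `view`, `alice`, and `bob` are hypothetical. The block ordering (prompt, then the other workers' blocks, then the worker's own block) is also an assumption made for illustration. The sketch only shows that symmetric workers can write to one shared cache while positions are assigned when a worker reads it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedCache:
    """Toy stand-in for a shared attention cache (tokens only, no real K/V)."""

    prompt: list[str]
    blocks: dict[str, list[str]] = field(default_factory=dict)

    def append(self, worker: str, token: str) -> None:
        # Each worker writes only to its own block; every worker can read all blocks.
        self.blocks.setdefault(worker, []).append(token)

    def view(self, worker: str) -> list[tuple[int, str]]:
        """Return (position, token) pairs as seen by `worker`.

        Assumed layout: prompt, other workers' blocks, then the worker's
        own block last. Positions are assigned here, at read time.
        """
        ordered = list(self.prompt)
        for other in sorted(self.blocks):
            if other != worker:
                ordered.extend(self.blocks[other])
        ordered.extend(self.blocks.get(worker, []))
        return list(enumerate(ordered))


def run(steps: int) -> SharedCache:
    cache = SharedCache(prompt=["<task>"])
    for step in range(steps):
        # Symmetric workers: each one emits exactly one token per step.
        for worker in ("alice", "bob"):
            cache.append(worker, f"{worker[0]}{step}")
    return cache


cache = run(steps=2)
print(cache.view("alice"))
# [(0, '<task>'), (1, 'b0'), (2, 'b1'), (3, 'a0'), (4, 'a1')]
print(cache.view("bob"))
# [(0, '<task>'), (1, 'a0'), (2, 'a1'), (3, 'b0'), (4, 'b1')]
```

The same cached token (for example `b0`) sits at position 1 in one view and position 3 in the other, which is why positions in this sketch are computed per view and not stored with the token.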
