(Paper summary) Aya Vision: Expanding the worlds AI can see (Blog)

Key points

Architecture
- 8B model: SigLIP2-patch14-384 (vision encoder) + connector + C4AI Command R7B (LLM)
- 32B model: SigLIP2-patch14-384 (vision encoder) + connector + Aya Expanse (LLM)
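A minimal shape walk-through of the encoder -> connector -> LLM pipeline, as a Python sketch. It assumes each 2x2 block of SigLIP2 patch tokens is merged (pixel shuffle) before the connector projects to the LLM width. That assumption would explain how the 28x28 patch size and 13x13 output listed under Image fit a patch14 encoder. `ShapeTrace`, `encoder_dim`, and `llm_dim` are hypothetical, and the two widths are placeholders, not published values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeTrace:
    """Hypothetical token-shape bookkeeping for encoder -> connector -> LLM."""
    image_size: int = 364    # input resolution (from the notes)
    patch: int = 14          # SigLIP2-patch14 encoder patch size
    merge: int = 2           # assumed 2x2 token merge before the projection
    encoder_dim: int = 1152  # hypothetical encoder hidden size
    llm_dim: int = 4096      # hypothetical LLM hidden size

    def stages(self) -> list[tuple[str, tuple[int, int]]]:
        """Return (stage name, (num_tokens, channels)) for each stage."""
        enc_grid = self.image_size // self.patch  # 364 // 14 = 26 tokens per side
        out_grid = enc_grid // self.merge         # 26 // 2 = 13 tokens per side
        return [
            ("vision encoder", (enc_grid * enc_grid, self.encoder_dim)),
            # Merging 2x2 neighbors stacks 4 token vectors along the channel axis.
            ("2x2 merge", (out_grid * out_grid, self.encoder_dim * self.merge ** 2)),
            ("connector", (out_grid * out_grid, self.llm_dim)),
        ]

    def effective_patch(self) -> int:
        """Pixels per side covered by one output token (14 * 2 = 28)."""
        return self.patch * self.merge


for name, shape in ShapeTrace().stages():
    print(f"{name:15s} {shape}")
```

With these defaults the stages come out as (676, 1152), (169, 4608), and (169, 4096): 26x26 encoder tokens become 13x13 tokens after the merge.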
Image
- input: 364x364
- patch size: 28x28 effective (14x14 SigLIP2 patches, merged 2x2 by pixel shuffle)
- output: 13x13 = 169 tokens (364 / 28 = 13)
- images larger than 364x364 are resized with their aspect ratio taken into account
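The notes only say that larger images are resized with their aspect ratio taken into account. One plausible reading, sketched here in Python as an assumption rather than Aya Vision's confirmed procedure, is to pick a grid of 364x364 tiles whose shape best matches the image and resize the image to that grid. `choose_grid`, `plan_resize`, the `max_tiles=12` cap, and the small-image rule are all hypothetical.

```python
import math

TILE = 364  # input size from the notes


def choose_grid(width: int, height: int, max_tiles: int = 12) -> tuple[int, int]:
    """Return (cols, rows) of TILE-sized blocks that best match the image.

    Primary key: aspect-ratio mismatch. Tie-break: grid area closest to the
    original image area. max_tiles=12 is an assumption, not from the notes.
    """
    target = width / height
    candidates = [
        (c, r)
        for c in range(1, max_tiles + 1)
        for r in range(1, max_tiles // c + 1)  # keep c * r <= max_tiles
    ]
    return min(
        candidates,
        key=lambda g: (
            abs(g[0] / g[1] - target),
            abs(g[0] * g[1] * TILE * TILE - width * height),
        ),
    )


def plan_resize(width: int, height: int, max_tiles: int = 12) -> tuple[tuple[int, int], int]:
    """Return ((new_width, new_height), number_of_tiles) for an input image."""
    if width <= TILE and height <= TILE:
        # Assumption: small images are simply resized to one TILE x TILE input.
        return (TILE, TILE), 1
    cols, rows = choose_grid(width, height, max_tiles)
    if math.isclose(cols / rows, width / height):
        pass  # exact aspect match; otherwise the resize slightly distorts the image
    return (cols * TILE, rows * TILE), cols * rows


print(plan_resize(1000, 1000))  # → ((1092, 1092), 9)
print(plan_resize(800, 400))    # → ((728, 364), 2)
```

The area tie-break matters for large square images: 1x1, 2x2, and 3x3 grids all match a square's aspect ratio exactly, and without it the image would collapse to a single tile.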
Training data generation
- synthetic annotations
- translation and rephrasing (English -> other languages)
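A minimal Python sketch of the translation and rephrasing step, under stated assumptions. Samples are treated as (image, prompt, answer) records, translation is assumed to happen before rephrasing, and `translate` / `rephrase` are hypothetical callables standing in for whatever MT and LLM models are actually used. `Sample`, `expand_multilingual`, and the toy stand-ins are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    image_id: str
    prompt: str
    answer: str
    lang: str


def expand_multilingual(
    english: list[Sample],
    targets: list[str],
    translate: Callable[[str, str], str],
    rephrase: Callable[[str, str], str],
) -> list[Sample]:
    """Keep the English samples and add one translated + rephrased copy per target language."""
    out = list(english)
    for s in english:
        for lang in targets:
            # Assumed order: translate first, then rephrase the translation.
            prompt = rephrase(translate(s.prompt, lang), lang)
            answer = rephrase(translate(s.answer, lang), lang)
            out.append(Sample(s.image_id, prompt, answer, lang))
    return out


# Toy stand-ins so the sketch runs; real ones would call MT / LLM models.
def toy_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def toy_rephrase(text: str, lang: str) -> str:
    return text.strip()


data = expand_multilingual(
    [Sample("img0", "What is shown?", "A cat.", "en")],
    ["ko", "fr"],
    toy_translate,
    toy_rephrase,
)
for s in data:
    print(s.lang, s.prompt, s.answer)
```

Passing the models in as callables keeps the expansion loop independent of any particular translation or rephrasing backend.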