(Paper summary) OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents (Paper)
OBELICS dataset
- open web-scale filtered dataset
- image-text documents (141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens)

- Filtered over multiple steps.
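
To illustrate what filtering over multiple steps can look like for interleaved image-text documents, here is a minimal sketch. The document representation, the step names (`drop_short_paragraphs`, `drop_non_http_images`, `require_image_and_text`), and the thresholds are all hypothetical and chosen only for illustration; they are not the filters or values used to build OBELICS.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a document is an ordered list of items,
# each either ("text", paragraph) or ("image", url).
Item = Tuple[str, str]
Document = List[Item]
# A step returns the (possibly shortened) document, or None to drop it entirely.
Step = Callable[[Document], Optional[Document]]


def drop_short_paragraphs(doc: Document, min_words: int = 3) -> Optional[Document]:
    # Node-level text filter (illustrative threshold, not from the paper).
    return [(k, v) for k, v in doc if k == "image" or len(v.split()) >= min_words]


def drop_non_http_images(doc: Document) -> Optional[Document]:
    # Node-level image filter: keep only images referenced by an http(s) URL.
    return [(k, v) for k, v in doc
            if k == "text" or v.startswith(("http://", "https://"))]


def require_image_and_text(doc: Document) -> Optional[Document]:
    # Document-level filter: keep only documents that still mix both modalities.
    kinds = {k for k, _ in doc}
    return doc if {"text", "image"} <= kinds else None


STEPS: List[Step] = [drop_short_paragraphs, drop_non_http_images, require_image_and_text]


def run_pipeline(docs: List[Document], steps: List[Step]) -> List[Document]:
    kept = []
    for doc in docs:
        current: Optional[Document] = doc
        for step in steps:
            current = step(current)
            if current is None:  # a step rejected the whole document
                break
        if current is not None:
            kept.append(current)
    return kept
```

Running cheap node-level steps first and the document-level check last means a document is judged on what remains after its noisy paragraphs and images are removed.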

Performance of models trained on this dataset: comparable to Flamingo.
