(Paper summary) OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents (Paper)

OBELICS dataset

  • open web-scale filtered dataset
    • interleaved image-text documents: 141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens
    • Filtered in multiple steps.
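To get a feel for the scale, the headline counts can be divided through to get per-document averages. This is only a back-of-the-envelope sketch that assumes the rounded totals listed above. It says nothing about how images or tokens are actually distributed across documents (real documents will vary widely around these means).

```python
# Rounded headline OBELICS counts, as listed in this summary
NUM_DOCS = 141_000_000          # web pages (documents) from Common Crawl
NUM_IMAGES = 353_000_000        # associated images
NUM_TOKENS = 115_000_000_000    # text tokens


def per_document_averages(docs: int, images: int, tokens: int) -> tuple[float, float]:
    """Return (mean images per document, mean text tokens per document)."""
    return images / docs, tokens / docs


imgs_per_doc, tokens_per_doc = per_document_averages(NUM_DOCS, NUM_IMAGES, NUM_TOKENS)

# Prints: ~2.5 images and ~816 tokens per document
print(f"~{imgs_per_doc:.1f} images and ~{tokens_per_doc:.0f} tokens per document")
```

So an average OBELICS document pairs a few images with roughly a short article's worth of text.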

Performance of models trained on this data: comparable to Flamingo.