(Paper summary) Text Embeddings by Weakly-Supervised Contrastive Pre-training (Paper)

Key points

  • Text pairs are collected from a variety of sources.
    • (post, comment) - Reddit
    • (question, upvoted answer) - StackExchange
    • (entity name + section title, passage) - English Wikipedia
    • (title, abstract) - Scientific papers
    • (title, passage) - Common Crawl web pages, news sources
  • The model is trained with the InfoNCE loss, where $\tau$ is the temperature hyperparameter.
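
To make the role of $\tau$ concrete, here is a minimal sketch of an InfoNCE loss in plain Python. It rests on assumptions that the summary above does not state: similarity is taken to be cosine similarity divided by $\tau$, the negatives for each query are the other passages in the same batch, and the default `tau=0.1` is an arbitrary illustrative value. The names `cosine` and `info_nce_loss` are hypothetical helpers, not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, passages, tau=0.1):
    """Mean InfoNCE loss over a batch (assumed in-batch negatives).

    queries[i] is paired with passages[i]; every other passage in the
    batch serves as a negative for query i. tau is the temperature.
    """
    total = 0.0
    for i, q in enumerate(queries):
        # Temperature-scaled similarity of query i to every passage.
        logits = [cosine(q, p) / tau for p in passages]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log( exp(positive) / sum(exp(all)) )
        total += log_denom - logits[i]
    return total / len(queries)


# Toy 2-d "embeddings": correctly paired vs. deliberately swapped.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

A smaller $\tau$ sharpens the softmax over the batch, so a positive pair that is only slightly more similar than the negatives already yields a low loss, while a mismatched pair is penalized more heavily.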