(논문 요약) Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine (Paper)

핵심 내용

  • random few-shot: training data 에서 샘플
  • chain-of-thought: GPT4 에 query 로 chain-of-thought data 확보
  • kNN: query 의 embedding 과 비슷한 k(=5) 개 few-shot
  • ensemble with choice shuffle: 문항 순서 shuffle 하면서 output 생성 후 ensemble
  • 알고리즘

실험 결과

  • 데이터
    • MedQA: multiple choice questions in the style of the Medical Licensing Examination questions
    • MedMCQA: mock and historic exam questions in the style of two Indian medical school entrance exams—the AIIMS and NEET-PG
    • PubMedQA: a yes, no, or maybe answer to biomedical research questions when given context provided from PubMed abstracts