(Paper Summary) Training Large Language Models to Reason in a Continuous Latent Space

Key Ideas

  • Instead of decoding a token, the model's last hidden state is fed back directly as the next input embedding, so reasoning unfolds in a continuous space

  • During training, each language reasoning step is replaced with $c$ continuous thought token(s) (see the sketch below)
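
A minimal sketch of this loop, assuming a Hugging Face-style causal LM (the `inputs_embeds` / `output_hidden_states` interface): during the latent phase, the last hidden state is appended back into the input sequence instead of being decoded into a token. The function name, `num_thoughts`, and greedy decoding are illustrative; the paper's special begin/end-of-thought tokens and KV-cache reuse are omitted for brevity.

```python
import torch

@torch.no_grad()
def generate_with_continuous_thoughts(model, tokenizer, prompt,
                                      num_thoughts=6, max_new_tokens=32):
    # Encode the prompt into input embeddings.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(input_ids)        # (1, T, d)

    # Latent phase: each continuous thought is the last hidden state,
    # fed back directly as the next input embedding (no vocab projection).
    for _ in range(num_thoughts):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        thought = out.hidden_states[-1][:, -1:, :]          # (1, 1, d)
        embeds = torch.cat([embeds, thought], dim=1)

    # Language phase: switch back to ordinary greedy token decoding.
    generated = []
    for _ in range(max_new_tokens):
        logits = model(inputs_embeds=embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)       # (1, 1)
        generated.append(next_id)
        if next_id.item() == tokenizer.eos_token_id:
            break
        embeds = torch.cat([embeds,
                            model.get_input_embeddings()(next_id)], dim=1)

    return tokenizer.decode(torch.cat(generated, dim=1)[0])
```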

Experimental Results

  • iCoT: internalized CoT, where language reasoning steps are gradually removed over the course of training
  • Pause token: special <pause> tokens inserted between the question and the answer
  • w/o curriculum: skips the multi-stage training and trains directly on the final-stage data (questions and answers)
  • w/o thought: keeps the multi-stage curriculum that removes language reasoning steps, but generates no continuous latent thoughts
  • pause as thought: replaces the continuous thoughts with <pause> tokens while keeping the multi-stage curriculum (see the sketch after this list)
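
To make the curriculum-related ablations concrete, here is a sketch of how a stage-$k$ training example might be assembled: at stage $k$, the first $k$ language reasoning steps are replaced by $k \cdot c$ latent-thought slots, which are filled with hidden states at training time. The helper and the token strings (<bot>, <eot>, <thought>) are hypothetical stand-ins for the paper's special tokens; "w/o curriculum" trains only on the final stage, and "pause as thought" keeps literal <pause> tokens in place of the slots.

```python
def build_stage_example(question, steps, answer, stage_k, c=1):
    """Hypothetical stage-k example under the multi-stage curriculum.

    The first `stage_k` language reasoning steps become `stage_k * c`
    latent-thought slots; the remaining steps stay as plain text.
    """
    latent_slots = ["<thought>"] * (stage_k * c)  # replaced by hidden states in training
    remaining = steps[stage_k:]                   # language steps kept at this stage
    return " ".join([question, "<bot>", *latent_slots, "<eot>", *remaining, answer])

# stage_k == 0 is plain CoT; at the final stage no language steps remain,
# which is the only data "w/o curriculum" trains on.
print(build_stage_example("Q: 3 + 4 = ?", ["3 + 4 = 7.", "So the answer is 7."],
                          "A: 7", stage_k=1, c=1))
```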