(Paper summary) SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds (blog)

Key points

  • Initial learning: the agent is first trained on human demonstrations.
  • Subsequent training: SIMA 2’s own experience data is then used to train the next, more capable version of the agent.
    • Gemini provides an initial task and an estimated reward for SIMA 2’s behavior.
    • The agent uses these for further training in subsequent generations, improving on tasks it previously failed.
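
The self-improvement loop described above can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not SIMA 2's actual training procedure: every name here (`Episode`, `propose_tasks`, `estimate_reward`, `train_next_generation`, `self_improve`) is hypothetical, the two helper functions are simple stand-ins for the roles the blog attributes to Gemini, and the "skill" numbers and update rule are arbitrary choices made to keep the example runnable.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    task: str      # task the agent attempted
    reward: float  # estimated reward for the attempt, in [0, 1]


def propose_tasks(skills):
    """Hypothetical stand-in for the task-proposing model (Gemini in the blog)."""
    return list(skills)


def estimate_reward(skill):
    """Hypothetical stand-in for the reward-estimating model.

    Toy assumption: an attempt counts as a success once skill reaches 0.5.
    """
    return 1.0 if skill >= 0.5 else 0.0


def collect_experience(skills):
    """Let the current agent attempt each proposed task and score the attempt."""
    return [Episode(task, estimate_reward(skills[task]))
            for task in propose_tasks(skills)]


def train_next_generation(skills, episodes, step=0.25):
    """Toy update rule (an assumption): each episode nudges that task's skill up."""
    new_skills = dict(skills)
    for episode in episodes:
        new_skills[episode.task] = min(1.0, new_skills[episode.task] + step)
    return new_skills


def self_improve(skills, generations):
    """Alternate experience collection and retraining for several generations."""
    history = []
    for _ in range(generations):
        episodes = collect_experience(skills)
        history.append({ep.task: ep.reward for ep in episodes})
        skills = train_next_generation(skills, episodes)
    return skills, history


if __name__ == "__main__":
    # Starting skills play the role of what was learned from human demonstrations.
    initial = {"open door": 0.75, "chop tree": 0.25}
    final_skills, history = self_improve(initial, generations=2)
    print(history)
    print(final_skills)
```

In this sketch the task "chop tree" is scored as a failure in the first generation and as a success in the second, which mirrors the blog's point that later generations improve on previously failed tasks.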