(Paper summary) SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds (blog)
Key points
- Initial learning: human demonstrations
- Subsequent training: SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent.
- Gemini provides an initial task and an estimated reward for SIMA 2’s behavior.
- The agent uses this task and reward data for further training in subsequent generations, improving on tasks it previously failed.
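The loop described in the points above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not the method from the post: `ToyAgent`, `propose_task`, `estimate_reward`, `train_next_generation`, and the task names are all hypothetical, an agent is reduced to a per-task skill number, and "training" simply keeps the best reward seen so far for each task.

```python
import random
from dataclasses import dataclass


@dataclass
class Experience:
    task: str
    reward: float  # estimated score in [0, 1]


class ToyAgent:
    """Hypothetical stand-in for one agent generation: a skill in [0, 1] per task."""

    def __init__(self, skills):
        self.skills = dict(skills)

    def attempt(self, task, rng):
        # Outcome quality is noisy around the current skill for this task.
        noisy = self.skills.get(task, 0.0) + rng.uniform(-0.1, 0.1)
        return min(1.0, max(0.0, noisy))


def train_from_demonstrations(demos):
    # Initial learning: each human demonstration adds a little skill (assumed rule).
    skills = {}
    for task in demos:
        skills[task] = min(1.0, skills.get(task, 0.0) + 0.2)
    return ToyAgent(skills)


def propose_task(tasks, rng):
    # Placeholder for the model that sets the task (Gemini in the post).
    return rng.choice(tasks)


def estimate_reward(outcome):
    # Placeholder for the model-estimated reward; here it echoes outcome quality.
    return outcome


def train_next_generation(agent, bank):
    # Assumed update: the next generation matches the best-rewarded attempt per task.
    skills = dict(agent.skills)
    for exp in bank:
        skills[exp.task] = max(skills.get(exp.task, 0.0), exp.reward)
    return ToyAgent(skills)


def self_improve(demos, tasks, generations=3, episodes=20, seed=0):
    rng = random.Random(seed)
    agent = train_from_demonstrations(demos)
    bank = []  # self-generated experience, reused across generations
    history = [dict(agent.skills)]
    for _ in range(generations):
        for _ in range(episodes):
            task = propose_task(tasks, rng)
            outcome = agent.attempt(task, rng)
            bank.append(Experience(task, estimate_reward(outcome)))
        agent = train_next_generation(agent, bank)
        history.append(dict(agent.skills))
    return agent, bank, history


if __name__ == "__main__":
    agent, bank, history = self_improve(
        demos=["chop tree", "chop tree"],
        tasks=["chop tree", "open door"],
    )
    for generation, skills in enumerate(history):
        print(generation, skills)
```

In this sketch, "open door" has no demonstrations, so the first generation starts with zero skill on it; any improvement on that task comes only from the agent's own scored attempts, which mirrors the idea of improving on previously failed tasks without new human data.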