(Paper Summary) Scaling Instructable Agents Across Many Simulated Worlds


Key points

Evaluation is done by comparing state graphs, under two notions of equivalence:
- placeholder equivalence: object identity is not considered
- non-placeholder (strict) equivalence: object identity is considered
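
The difference between the two equivalence checks can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the graph encoding (a dict of node id to object type plus a set of `(subject, relation, object)` triples), the function names, and the reading of "placeholder" as "equal up to a type-preserving renaming of objects". The brute-force search over permutations is only practical for small graphs.

```python
from itertools import permutations

# Hypothetical encoding: {"nodes": {object_id: object_type}, "edges": {(subj, rel, obj)}}

def strict_equivalent(g1, g2):
    """Non-placeholder (strict): object ids must match exactly."""
    return g1["nodes"] == g2["nodes"] and g1["edges"] == g2["edges"]

def placeholder_equivalent(g1, g2):
    """Placeholder: equal up to a renaming of objects that preserves their type."""
    n1, n2 = g1["nodes"], g2["nodes"]
    if sorted(n1.values()) != sorted(n2.values()):
        return False
    if len(g1["edges"]) != len(g2["edges"]):
        return False
    ids1 = sorted(n1)
    for perm in permutations(sorted(n2)):
        mapping = dict(zip(ids1, perm))
        # Skip renamings that would turn one object type into another.
        if any(n1[a] != n2[b] for a, b in mapping.items()):
            continue
        renamed = {(mapping[s], rel, mapping[o]) for s, rel, o in g1["edges"]}
        if renamed == g2["edges"]:
            return True
    return False

nodes = {"apple_1": "apple", "apple_2": "apple", "table_1": "table"}
target = {"nodes": nodes, "edges": {("apple_1", "on", "table_1")}}
predicted = {"nodes": nodes, "edges": {("apple_2", "on", "table_1")}}
reversed_rel = {"nodes": nodes, "edges": {("table_1", "on", "apple_1")}}

print(strict_equivalent(target, predicted))          # False: a different apple
print(placeholder_equivalent(target, predicted))     # True: some apple is on the table
print(placeholder_equivalent(target, reversed_rel))  # False: the relation is reversed
```

In this sketch the strict check rejects a prediction that puts the "wrong" apple on the table, while the placeholder check accepts it because the two apples are interchangeable.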
Data example
Model performance: models were fine-tuned with QLoRA (rank 16)
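
To give a sense of what rank 16 means in practice, here is a small arithmetic sketch. A LoRA adapter of rank r on a weight matrix of shape d_in x d_out adds two low-rank factors with r * (d_in + d_out) trainable parameters in total. The 4096 x 4096 layer size below is a hypothetical value chosen for illustration and is not taken from the paper.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical layer size, for illustration only.
d_in = d_out = 4096
rank = 16

adapter = lora_trainable_params(d_in, d_out, rank)
full = d_in * d_out

print(adapter)         # 131072
print(full)            # 16777216
print(adapter / full)  # 0.0078125
```

Under these assumed dimensions, the adapter trains under 1% of the parameters of the layer it is attached to, while the base weights stay frozen (and, in QLoRA, quantized).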