(Paper Summary) THEAGENTCOMPANY: BENCHMARKING LLM AGENTS ON CONSEQUENTIAL REAL WORLD TASKS (Paper)

Key Points

  • Benchmarks LLM agents on real-world professional tasks in a small software company environment

  • Example

Performance of closed and open models