(Paper Summary) Fara-7B: An Efficient Agentic Model for Computer Use (Paper)
Key Points
- FaraGen: a synthetic data generation system for multi-step web tasks. Tasks are proposed through three pathways:
  - Targeted URL Task Proposal (28% of the total)
    - Starts from carefully chosen URLs drawn from categories (e.g. e-commerce, entertainment, restaurants, flights).
    - An LLM iteratively refines a realistic, user-relevant task that matches what a human would try to do on that page.
    - Examples
      - Booking tickets from a Fandango movie page
      - Purchasing a specific item with constraints on a retailer site
  - Agent URL Exploration (67% of the total)
    - Samples random URLs uniformly from the web and lets a multimodal agent explore the site autonomously.
    - The agent perceives screenshots (and may consult accessibility trees), proposes an initial task based on the page, and attempts to execute it.
    - It iteratively refines both the task and its next actions as it discovers what is actually feasible on the site.
    - The resulting tasks tend to be simpler on average than targeted ones, but this pathway is crucial for distributional breadth and for learning robust, general navigation behaviors.
  - Exemplar Task Proposal (5% of the total)
    - Starting from a bank of seed tasks, the system decomposes each into a template capturing intent, entities, and arguments.
    - It then systematically varies the entities and arguments (e.g. switching the retailer, item type, or constraints).
    - This expands coverage within skill families, boosting diversity.
- Supervised fine-tuning (SFT) of Qwen2.5-VL-7B achieves better computer-use agent performance than GPT-4o.
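
The Exemplar Task Proposal step above (decompose a seed task into a template, then vary its entities and arguments) can be illustrated with a minimal sketch. This is not FaraGen's actual implementation: the template string, the slot names (`item`, `retailer`, `constraint`), the slot values, and the helper `expand_template` are all hypothetical, chosen only to show the idea. The `PATHWAY_MIX` dictionary simply records the three proportions listed in the summary.

```python
from itertools import product

# Proportions of the three task proposal pathways, as listed in the summary.
PATHWAY_MIX = {
    "targeted_url": 0.28,
    "agent_url_exploration": 0.67,
    "exemplar": 0.05,
}


def expand_template(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Return one task string per combination of slot values.

    Hypothetical helper: fills every named placeholder in `template`
    with each combination drawn from `slots`.
    """
    names = list(slots)
    tasks = []
    for values in product(*(slots[name] for name in names)):
        tasks.append(template.format(**dict(zip(names, values))))
    return tasks


# Hypothetical seed task, already decomposed into a template with three slots.
TEMPLATE = "Find a {item} on {retailer} that costs {constraint}."
SLOTS = {
    "item": ["laptop", "coffee maker"],
    "retailer": ["retailer A", "retailer B"],
    "constraint": ["under $100", "under $500"],
}

if __name__ == "__main__":
    for task in expand_template(TEMPLATE, SLOTS):
        print(task)
```

With two values per slot, the single seed template expands into eight task variants, which is the sense in which this pathway broadens coverage within one skill family. A real pipeline would presumably filter out infeasible combinations; that step is not described in the summary and is omitted here.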