⚡️🚀 Important: We have developed a new agent protocol called Tool-Environment-Agent (TEA), which allows you to build agents as flexibly as brewing tea. It’s still in the testing phase — if you’re interested, please check it out:
👉 https://github.com/DVampire/AgentWorld
📄 https://arxiv.org/abs/2506.12508
image.png DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coordinate multiple specialized lower-level agents, enabling automated task decomposition and efficient execution across diverse and complex domains.
🌐 Check out our interactive website: https://skyworkai.github.io/DeepResearchAgent/ - Explore the architecture, view experiments, and learn more about our research!
The system adopts a two-layer structure:
- Responsible for understanding, decomposing, and planning the overall workflow for a given task.
- Breaks down tasks into manageable sub-tasks and assigns them to appropriate lower-level agents.
- Dynamically coordinates the collaboration among agents to ensure smooth task completion.
-
Deep Analyzer
- Performs in-depth analysis of input information, extracting key insights and potential requirements.
- Supports analysis of various data types, including text and structured data.
-
Deep Researcher
- Conducts thorough research on specified topics or questions, retrieving and synthesizing high-quality information.
- Capable of generating research reports or knowledge summaries automatically.
-
Browser Use
- Automates browser operations, supporting web search, information extraction, and data collection tasks.
- Assists the Deep Researcher in acquiring up-to-date information from the internet.
-
MCP Manager Agent
- Manages and orchestrates Model Context Protocol (MCP) tools and services.
- Enables dynamic tool discovery, registration, and execution through MCP standards.
- Supports both local and remote MCP tool integration for enhanced agent capabilities.
-
General Tool Calling Agent
- Provides a general-purpose interface for invoking various tools and APIs.
- Supports function calling, allowing the agent to execute specific tasks or retrieve information from external services.
- Hierarchical agent collaboration for complex and dynamic task scenarios
- Extensible agent system, allowing easy integration of additional specialized agents
- Automated information analysis, research, and web interaction capabilities
- Secure Python code execution environment for tools, featuring configurable import controls, restricted built-ins, attribute access limitations, and resource limits. (See PythonInterpreterTool Sandboxing for details).
- Support for asynchronous operations, enabling efficient handling of multiple tasks and agents
- Support for local and remote model inference, including OpenAI, Anthropic, Google LLMs, and local Qwen models via vLLM
- Support for image and video generation tools based on the Imagen and Veo3 models, respectively
Image and Video Examples:
- 2025年08月04日: Add the support for loading mcp tools from the local json file.
- 2025年07月08日: Add the video generator tool, which can generate a video based on the input text and/or image. The video generator tool is based on the Veo3 model.
- 2025年07月08日: Add the image generator tool, which can generate images based on the input text. The image generator tool is based on the Imagen model.
- 2025年07月07日: Due to the limited flexibility of TOML configuration files, we have switched to using the config format supported by mmengine.
- 2025年06月20日: Add the support for the mcp (Both the local mcp and remote mcp).
- 2025年06月17日: Update technical report https://arxiv.org/pdf/2506.12508.
- 2025年06月01日: Update the browser-use to 0.1.48.
- 2025年05月30日: Convert the sub agent to a function call. Planning agent can now be gpt-4.1 or gemini-2.5-pro.
- 2025年05月27日: Support OpenAI, Anthropic, Google LLMs, and local Qwen models (via vLLM, see details in Usage).
- Asynchronous feature completed
- Image Generator Tool completed
- Video Generator Tool completed
- MCP in progress
- Load local MCP tools from JSON file completed
- AI4Research Agent to be developed
- Novel Writing Agent to be developed
# poetry install environment conda create -n dra python=3.11 conda activate dra make install # (Optional) You can also use requirements.txt conda create -n dra python=3.11 conda activate dra make install-requirements # playwright install if needed pip install playwright playwright install chromium --with-deps --no-shell
Please refer to the .env.template file and create a .env file in the root directory of the project. This file is used to configure API keys and other environment variables.
Refer to the following instructions to obtain the necessary google gemini-2.5-pro API key and set it in the .env file:
- https://aistudio.google.com/app/apikey
- https://cloud.google.com/docs/authentication/application-default-credentials?hl=zh-cn
brew install --cask google-cloud-sdk gcloud init gcloud auth application-default login
A simple example to demonstrate the usage of the DeepResearchAgent framework.
python main.py
A simple example to demonstrate the usage of a single agent, such as a general tool calling agent.
python examples/run_general.py
# Download GAIA mkdir data && cd data git clone https://huggingface.co/datasets/gaia-benchmark/GAIA # Run python examples/run_gaia.py
We evaluated our agent on both GAIA validation and test sets, achieving state-of-the-art performance. Our system demonstrates superior performance across all difficulty levels.
GAIA Test Results GAIA Validation Results
With the integration of the Computer Use and MCP Manager Agent, which now enables pixel-level control of the browser, our system demonstrates remarkable evolutionary capabilities. The agents can dynamically acquire and enhance their abilities through learning and adaptation, leading to significantly improved performance. The latest results show:
- Test Set: 83.39 (average), with 93.55 on Level 1, 83.02 on Level 2, and 65.31 on Level 3
- Validation Set: 82.4 (average), with 92.5 on Level 1, 83.7 on Level 2, and 57.7 on Level 3
Our framework now supports:
- qwen2.5-7b-instruct
- qwen2.5-14b-instruct
- qwen2.5-32b-instruct
Update your config:
model_id = "qwen2.5-7b-instruct"
If problems occur, reinstall:
pip install "browser-use[memory]"==0.1.48
pip install playwright
playwright install chromium --with-deps --no-shellFunction-calling is now supported natively by GPT-4.1 / Gemini 2.5 Pro. Claude-3.7-Sonnet is also recommended.
We provide huggingface as a shortcut to the local model. Also provide vllm as a way to start services so that parallel acceleration can be provided.
nohup bash -c 'CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \ --model /input0/Qwen3-32B \ --served-model-name Qwen \ --host 0.0.0.0 \ --port 8000 \ --max-num-seqs 16 \ --enable-auto-tool-choice \ --tool-call-parser hermes \ --tensor_parallel_size 2' > vllm_qwen.log 2>&1 &
Update .env:
QWEN_API_BASE=http://localhost:8000/v1
QWEN_API_KEY="abc"python main.py
Example command:
Use deep_researcher_agent to search the latest papers on the topic of 'AI Agent' and then summarize it.
DeepResearchAgent is primarily inspired by the architecture of smolagents. The following improvements have been made:
- The codebase of smolagents has been modularized for better structure and organization.
- The original synchronous framework has been refactored into an asynchronous one.
- The multi-agent setup process has been optimized to make it more user-friendly and efficient.
We would like to express our gratitude to the following open source projects, which have greatly contributed to the development of this work:
- smolagents - A lightweight agent framework.
- OpenManus - An asynchronous agent framework.
- browser-use - An AI-powered browser automation tool.
- crawl4ai - A web crawling library for AI applications.
- markitdown - A tool for converting files to Markdown format.
We sincerely appreciate the efforts of all contributors and maintainers of these projects for their commitment to advancing AI technologies and making them available to the wider community.
Contributions and suggestions are welcome! Feel free to open issues or submit pull requests.
@misc{zhang2025agentorchestrahierarchicalmultiagentframework, title={AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving}, author={Wentao Zhang, Liang Zeng, Yuzhen Xiao, Yongcong Li, Ce Cui, Yilei Zhao, Rui Hu, Yang Liu, Yahui Zhou, Bo An}, year={2025}, eprint={2506.12508}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2506.12508}, }
如果你更习惯阅读中文说明文档,请查阅 README_CN.md。