VectorSpaceLab/Infomatica

🔍📚 Informatica: Open and Scalable Foundations for Deep Research System

🔆 Overview

Informatica is a comprehensive collection of systematic research projects focused on deep research systems. Our mission is to provide open-source, scalable frameworks, datasets, data synthesis methods, models, and demonstrations for the deep research community.

We are committed to advancing the field of deep research through multi-dimensional investigations, including:

  • Scalable Data Synthesis: Advanced frameworks for generating high-quality, complexity-controllable research datasets
  • Deep Research Models: State-of-the-art models trained on structured research tasks
  • Open Datasets: Publicly available datasets designed for training and evaluating deep research capabilities
  • Research Tools: Complete toolchains for constructing, training, and deploying deep research systems
  • Interactive Demonstrations: User-friendly demos showcasing the capabilities of our research systems

Our team continuously explores various aspects of deep research problems, from fundamental question decomposition and reasoning to practical applications in knowledge discovery and information synthesis. Through Informatica, we aim to democratize access to deep research technologies and foster innovation in the broader research community.

📰 News

[2025-09-19] 🎉 Our paper InForage has been accepted to NeurIPS 2025 as a Spotlight paper! Code is available here.

[2025-09-17] 🔥 We have released InfoSeek, a large-scale dataset for deep research tasks.

[2025-05-14] 🔥 We have released InForage, our initial research on agentic search.

🗺️ Roadmap

Initial Research

  • Technical Report: InForage - Agentic Search Framework
  • NeurIPS 2025 Spotlight Paper Acceptance

Open and Scalable Data Synthesis

  • Open Dataset: InfoSeek
  • Data Construction Pipeline
  • Scalable Synthesis Framework
  • Quality Control Mechanisms

Model Development

  • SFT Training Code
  • RL Training Code
  • InfoSeeker Model Release
  • Model Evaluation Framework

Applications

  • Knowledge Discovery Tools
  • Information Synthesis Systems
  • Research Assistant Applications

Demo and Deployment

  • Interactive Demo Platform
  • API Integration
  • User Interface Development

🎯 Demo

We are building a demo page that showcases different agentic search methods and allows hands-on exploration of their capabilities. Each demo will be integrated into a standardized retrieval and web browser interface with comparable settings, so that approaches can be compared comprehensively and fairly. This systematic evaluation should help identify the strengths and limitations of each method and advance the state of the art in agentic search.

🌟 Misc

📄 Citation

InfoSeek:

@misc{xia2025opendatasynthesisdeep,
 title={Open Data Synthesis For Deep Research}, 
 author={Ziyi Xia and Kun Luo and Hongjin Qian and Zheng Liu},
 year={2025},
 url={https://arxiv.org/abs/2509.00375}, 
}

InForage:

@misc{qian2025scentknowledgeoptimizingsearchenhanced,
 title={Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging}, 
 author={Hongjin Qian and Zheng Liu},
 year={2025},
 url={https://arxiv.org/abs/2505.09316}, 
}
