parastoopil/agentic-research-ops

Agentic Research Ops

Production-style multi-agent research automation over private documents.

The project covers agent orchestration, tool boundaries, retrieval, critique, report generation, tracing, evaluation, CI, and reproducible artifacts.

What It Is

agentic-research-ops turns a broad research request into a cited technical report using a graph of specialized agents:

| Agent | Responsibility | Output |
| --- | --- | --- |
| Planner | Decomposes the user request into focused research tasks | Typed task plan |
| Retriever | Searches a private markdown corpus with BM25 | Ranked evidence cards |
| Critic | Checks source coverage, citation density, and missing support | Pass/revise verdict |
| Writer | Produces a report using only approved evidence IDs | Markdown report |
| Evaluator | Scores grounding, evidence count, trace metadata, and latency | JSON metrics |
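
The five roles above can be read as nodes that pass one shared state object along a fixed path. The following is a minimal sketch of that idea, not the package's actual code: `GraphState`, `run_graph`, the node bodies, the `E1`-style evidence IDs, and the trace fields are all hypothetical names chosen for illustration.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GraphState:
    """Hypothetical shared state handed from node to node."""
    query: str
    tasks: list[str] = field(default_factory=list)
    evidence: list[dict] = field(default_factory=list)
    verdict: str = ""
    report: str = ""
    metrics: dict = field(default_factory=dict)
    traces: list[dict] = field(default_factory=list)


def planner(state: GraphState) -> None:
    # Split the request into focused tasks (fixed aspects, for illustration only).
    state.tasks = [f"{state.query} :: {aspect}" for aspect in ("architecture", "reliability")]


def retriever(state: GraphState) -> None:
    # Stand-in for corpus search: one stub evidence card per task.
    state.evidence = [
        {"id": f"E{i}", "task": task, "text": f"stub evidence for {task}"}
        for i, task in enumerate(state.tasks, start=1)
    ]


def critic(state: GraphState) -> None:
    # Pass only if every planned task has at least one evidence card.
    covered = {card["task"] for card in state.evidence}
    all_covered = bool(state.tasks) and covered >= set(state.tasks)
    state.verdict = "pass" if all_covered else "revise"


def writer(state: GraphState) -> None:
    # Write only after the critic passes, citing evidence IDs inline.
    if state.verdict != "pass":
        return
    state.report = "\n".join(f"- {card['text']} [{card['id']}]" for card in state.evidence)


def evaluator(state: GraphState) -> None:
    state.metrics = {"evidence_items": len(state.evidence), "critic_verdict": state.verdict}


PIPELINE: tuple[Callable[[GraphState], None], ...] = (planner, retriever, critic, writer, evaluator)


def run_graph(query: str, nodes=PIPELINE) -> GraphState:
    """Run each node in order and record one trace entry per node."""
    state = GraphState(query=query)
    for node in nodes:
        start = time.perf_counter()
        node(state)
        elapsed_ms = (time.perf_counter() - start) * 1000
        state.traces.append({"node": node.__name__, "latency_ms": round(elapsed_ms, 3)})
    return state


state = run_graph("Design a reliable research assistant")
print(json.dumps(state.traces, indent=2))
```

Keeping every node to the same `state in, state mutated` contract is one way to make a later swap (for example, an LLM-backed writer) possible without touching the critic or evaluator.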

The default runtime is dependency-light and offline-friendly, so recruiters can clone and run it without API keys. The architecture is intentionally compatible with LangGraph-style orchestration, hosted LLMs, local transformer models, and vector databases.
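
To illustrate how a BM25 retriever can stay dependency-light, here is a sketch that uses only the standard library. It is an assumption about the general approach, not the repository's implementation: `BM25Index`, `tokenize`, the `k1`/`b` defaults, and the sample document names are hypothetical, and the IDF term is the common non-negative variant.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a real corpus may need better tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index over {doc_id: text}."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_lens = {doc_id: sum(tf.values()) for doc_id, tf in self.term_freqs.items()}
        total_len = sum(self.doc_lens.values())
        self.avg_len = (total_len / len(docs)) if total_len else 1.0
        # Number of documents containing each term.
        self.doc_freq: Counter = Counter()
        for tf in self.term_freqs.values():
            self.doc_freq.update(tf.keys())

    def idf(self, term: str) -> float:
        n = len(self.term_freqs)
        df = self.doc_freq.get(term, 0)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query: str, doc_id: str) -> float:
        tf = self.term_freqs[doc_id]
        length_norm = 1 - self.b + self.b * self.doc_lens[doc_id] / self.avg_len
        total = 0.0
        for term in tokenize(query):
            freq = tf.get(term, 0)
            if freq:
                total += self.idf(term) * freq * (self.k1 + 1) / (freq + self.k1 * length_norm)
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Return (doc_id, score) pairs with a positive score, best first.
        scored = ((doc_id, self.score(query, doc_id)) for doc_id in self.term_freqs)
        ranked = sorted((pair for pair in scored if pair[1] > 0), key=lambda p: p[1], reverse=True)
        return ranked[:top_k]


# Hypothetical three-document corpus.
docs = {
    "agents.md": "The planner decomposes requests into tasks for other agents.",
    "retrieval.md": "The retriever ranks markdown passages with BM25 scoring.",
    "evaluation.md": "The evaluator scores grounding and latency.",
}
index = BM25Index(docs)
results = index.search("bm25 ranked passages")
print(results)
```

Only `retrieval.md` shares terms with the query, so it is the single result returned.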

  • Source-grounded report writing with stable citation IDs.
  • Critic gate before final answer delivery.
  • Reproducible benchmark over a fixed private-doc corpus.
  • JSON traces for every agent node.
  • CI test suite covering retrieval, orchestration, and evaluation.

Repository Structure

agentic-research-ops/
├── examples/corpus/ # Private technical knowledge base
├── src/agentic_research_ops/ # Agent graph, retrieval, CLI, evaluation
├── tests/ # Unit and end-to-end tests
├── docs/architecture.md # System design notes
├── runs/ # Generated benchmark and report artifacts
├── scripts/run_demo.py # One-command demo entry point
├── pyproject.toml
└── README.md

Quickstart

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
pytest -q

Run one report:

agentic-research run \
 --query "Design a reliable multi-agent research assistant for private technical documents." \
 --corpus examples/corpus \
 --output-dir runs \
 --stem demo_report

Run the benchmark:

agentic-research benchmark \
 --corpus examples/corpus \
 --output-dir runs

Record CUDA hardware metadata and a lightweight FP16 matmul probe:

agentic-research gpu-probe --output-dir runs

If you do not install the package, use:

PYTHONPATH=src python -m agentic_research_ops benchmark --corpus examples/corpus --output-dir runs

Latest Local Results

Generated on this server with Python 3.13.5 and PyTorch 2.12.0+cu130. The unsandboxed runtime sees 8 NVIDIA RTX 6000 Ada Generation GPUs.

{
 "queries": 3,
 "device": "cuda:0",
 "avg_evidence_items": 12.0,
 "avg_grounded_citation_rate": 1.0,
 "avg_critic_coverage_score": 1.0,
 "total_latency_ms": 11.595
}

CUDA probe excerpt:

{
 "cuda_available": true,
 "device_count": 8,
 "device_name": "NVIDIA RTX 6000 Ada Generation",
 "benchmark": "fp16_matmul",
 "estimated_tflops": 196.466,
 "max_memory_allocated_gb": 0.102
}
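
The probe's code is not shown here, but the arithmetic behind a matmul throughput estimate is standard: multiplying two n×n matrices takes roughly 2n³ floating-point operations. The sketch below shows only that conversion. `estimate_tflops` and its parameters are hypothetical, and the elapsed time is assumed to come from a properly synchronized GPU timer.

```python
def estimate_tflops(n: int, elapsed_seconds: float, iterations: int = 1) -> float:
    """Approximate throughput for `iterations` square n-by-n matmuls.

    Uses the usual 2 * n**3 FLOP count per matmul (one multiply and one
    add per inner-product term).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_flops = 2 * (n ** 3) * iterations
    return total_flops / elapsed_seconds / 1e12


# Example: ten 4096x4096 matmuls measured at 0.05 s in total (made-up timing).
print(round(estimate_tflops(4096, 0.05, iterations=10), 3))
```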

Key artifacts:

  • runs/demo_report.md: generated cited report.
  • runs/demo_report.json: full graph state, metrics, evidence, and traces.
  • runs/benchmark_summary.json: aggregate benchmark results.
  • runs/benchmark_*.json: per-query benchmark traces.
  • runs/gpu_probe.json: CUDA device metadata and lightweight matmul probe.

Example Output

The demo query produces a report with sections for architecture, grounding, reliability controls, deployment, and a source map. Example metric excerpt:

{
 "report_words": 721,
 "evidence_items": 12,
 "unique_sources": 5,
 "grounded_citation_rate": 1.0,
 "critic_coverage_score": 1.0,
 "critic_verdict": "pass",
 "device": "cuda:0"
}
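
The README does not define `grounded_citation_rate`. One plausible reading is the share of citations in the report that point at approved evidence IDs. The sketch below implements that assumed definition; the `[E1]`-style citation format, `CITATION_PATTERN`, and the function name are hypothetical.

```python
import re

# Assumed citation format: bracketed evidence IDs such as [E1] or [E12].
CITATION_PATTERN = re.compile(r"\[(E\d+)\]")


def grounded_citation_rate(report: str, approved_ids: set[str]) -> float:
    """Fraction of citations in `report` whose ID is in `approved_ids`.

    Returns 0.0 for a report with no citations, so an uncited report
    cannot score as fully grounded.
    """
    cited = CITATION_PATTERN.findall(report)
    if not cited:
        return 0.0
    grounded = sum(1 for evidence_id in cited if evidence_id in approved_ids)
    return grounded / len(cited)


report = "Agents share typed state [E1]. A critic gates output [E2]. Unsupported claim [E9]."
rate = grounded_citation_rate(report, approved_ids={"E1", "E2"})
print(round(rate, 3))
```

Under this definition, a rate of 1.0 means every citation in the report resolves to an approved evidence card.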

Extension Ideas

  • Replace BM25 with hybrid retrieval using embeddings and a vector database.
  • Swap the deterministic writer for an LLM-backed writer while preserving the critic and evaluator interfaces.
  • Add a LangGraph adapter around the existing node contracts.
  • Add human approval nodes for high-impact reports or automation actions.
  • Add OpenTelemetry traces and a small FastAPI service layer.
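
For the hybrid retrieval idea in the first bullet, one common way to combine a BM25 ranking with an embedding-based ranking is reciprocal rank fusion (RRF). This is one option among several, not something the project prescribes. The function name, the document IDs, and the default constant `k = 60` are assumptions for this sketch.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs into one list of (doc_id, score).

    Each document receives 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. Ties are broken by doc ID for determinism.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical rankings from a lexical retriever and an embedding retriever.
bm25_ranking = ["a.md", "b.md", "c.md"]
dense_ranking = ["b.md", "c.md", "a.md"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only rank positions, it avoids having to normalize BM25 scores against embedding similarities, which are on different scales.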

Tests

pytest -q

Current status:

5 passed
