Agentic Research Ops

Production-style multi-agent research automation over private documents.

The project covers agent orchestration, tool boundaries, retrieval, critique, report generation, tracing, evaluation, CI, and reproducible artifacts.

What It Is

agentic-research-ops turns a broad research request into a cited technical report using a graph of specialized agents:

Agent	Responsibility	Output
Planner	Decomposes the user request into focused research tasks	Typed task plan
Retriever	Searches a private markdown corpus with BM25	Ranked evidence cards
Critic	Checks source coverage, citation density, and missing support	Pass/revise verdict
Writer	Produces a report using only approved evidence IDs	Markdown report
Evaluator	Scores grounding, evidence count, trace metadata, and latency	JSON metrics
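
To make the Retriever row concrete, here is a minimal BM25 ranking sketch in plain Python. It is an illustration of the scoring technique only, not the code in src/agentic_research_ops: the function names `tokenize` and `bm25_rank`, the tokenizer, and the defaults `k1=1.5` and `b=0.75` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank documents for a query with Okapi BM25.

    docs maps a document ID (for example a markdown path) to its text.
    Returns (doc_id, score) pairs, best first, ties broken by doc_id.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 for an empty corpus.
    avg_len = sum(len(t) for t in tokenized.values()) / max(n_docs, 1) or 1.0

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each ranked pair could then be wrapped into an evidence card with a stable ID before it reaches the Critic and Writer.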

The default runtime is dependency-light and offline-friendly, so recruiters can clone and run it without API keys. The architecture is intentionally compatible with LangGraph-style orchestration, hosted LLMs, local transformer models, and vector databases.

Source-grounded report writing with stable citation IDs.
Critic gate before final answer delivery.
Reproducible benchmark over a fixed private-doc corpus.
JSON traces for every agent node.
CI test suite covering retrieval, orchestration, and evaluation.
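
The critic gate and per-node traces listed above can be sketched roughly as follows. This is a hedged illustration, not the project's node contracts: the `[E1]`-style citation syntax, the state keys (`evidence`, `draft`, `verdict`, `report`, `min_sources`), and the helper names `run_node`, `critic`, and `deliver` are hypothetical.

```python
import json
import re
import time

CITATION = re.compile(r"\[(E\d+)\]")  # assumed citation syntax, e.g. [E3]


def run_node(name, fn, state, traces):
    """Run one agent node and append a JSON-serializable trace record."""
    start = time.perf_counter()
    updates = fn(state)
    traces.append({
        "node": name,
        "latency_ms": round((time.perf_counter() - start) * 1000, 3),
        "updated_keys": sorted(updates),
    })
    return {**state, **updates}


def critic(state):
    """Gate: every cited ID must be approved and enough sources must be cited."""
    approved = state["evidence"]  # evidence ID -> source path
    cited = CITATION.findall(state["draft"])
    unknown = sorted(set(cited) - set(approved))
    sources = {approved[c] for c in cited if c in approved}
    ok = bool(cited) and not unknown and len(sources) >= state.get("min_sources", 2)
    return {"verdict": "pass" if ok else "revise", "unknown_ids": unknown}


def deliver(state):
    """Release the report only when the critic passed."""
    return {"report": state["draft"] if state["verdict"] == "pass" else None}


if __name__ == "__main__":
    traces = []
    state = {
        "evidence": {"E1": "a.md", "E2": "b.md"},
        "draft": "Claim one [E1]. Claim two [E2].",
    }
    state = run_node("critic", critic, state, traces)
    state = run_node("deliver", deliver, state, traces)
    print(json.dumps(traces, indent=1))
```

Keeping each node a function from state to a dictionary of updates is one way to keep traces uniform and to let a different orchestrator wrap the same nodes later.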

Repository Structure

agentic-research-ops/
├── examples/corpus/ # Private technical knowledge base
├── src/agentic_research_ops/ # Agent graph, retrieval, CLI, evaluation
├── tests/ # Unit and end-to-end tests
├── docs/architecture.md # System design notes
├── runs/ # Generated benchmark and report artifacts
├── scripts/run_demo.py # One-command demo entry point
├── pyproject.toml
└── README.md

Quickstart

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
pytest -q

Run one report:

agentic-research run \
 --query "Design a reliable multi-agent research assistant for private technical documents." \
 --corpus examples/corpus \
 --output-dir runs \
 --stem demo_report

Run the benchmark:

agentic-research benchmark \
 --corpus examples/corpus \
 --output-dir runs

Record CUDA hardware metadata and a lightweight FP16 matmul probe:

agentic-research gpu-probe --output-dir runs

If you do not install the package, use:

PYTHONPATH=src python -m agentic_research_ops benchmark --corpus examples/corpus --output-dir runs

Latest Local Results

Generated on a local server with Python 3.13.5 and PyTorch 2.12.0+cu130. Outside the sandbox, the runtime sees 8 NVIDIA RTX 6000 Ada Generation GPUs.

{
 "queries": 3,
 "device": "cuda:0",
 "avg_evidence_items": 12.0,
 "avg_grounded_citation_rate": 1.0,
 "avg_critic_coverage_score": 1.0,
 "total_latency_ms": 11.595
}

CUDA probe excerpt:

{
 "cuda_available": true,
 "device_count": 8,
 "device_name": "NVIDIA RTX 6000 Ada Generation",
 "benchmark": "fp16_matmul",
 "estimated_tflops": 196.466,
 "max_memory_allocated_gb": 0.102
}
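
For readers wondering how a figure like `estimated_tflops` is typically derived, the sketch below shows the usual arithmetic: an n-by-n matrix multiply costs about 2·n³ floating-point operations, so throughput is that count times the number of iterations divided by elapsed time. The function names, the matrix size, and the iteration count are assumptions, and the actual `gpu-probe` command may measure differently.

```python
import time


def matmul_tflops(n, iterations, elapsed_seconds):
    """Convert a timed run of n x n matmuls into TFLOP/s (2 * n^3 FLOPs each)."""
    flops = 2.0 * n ** 3 * iterations
    return flops / elapsed_seconds / 1e12


def probe(n=4096, iterations=20):
    """Time FP16 matmuls on the default CUDA device; returns None without CUDA."""
    import torch  # imported here so the arithmetic above works without PyTorch

    if not torch.cuda.is_available():
        return None
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)  # warm-up so one-time setup cost is not timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iterations):
        torch.matmul(a, b)
    torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    elapsed = time.perf_counter() - start
    return {
        "benchmark": "fp16_matmul",
        "estimated_tflops": round(matmul_tflops(n, iterations, elapsed), 3),
        "max_memory_allocated_gb": round(torch.cuda.max_memory_allocated() / 1024 ** 3, 3),
    }
```

The synchronize calls matter because CUDA kernels launch asynchronously; without them the timer would mostly measure launch overhead.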

Key artifacts:

runs/demo_report.md: generated cited report.
runs/demo_report.json: full graph state, metrics, evidence, and traces.
runs/benchmark_summary.json: aggregate benchmark results.
runs/benchmark_*.json: per-query benchmark traces.
runs/gpu_probe.json: CUDA device metadata and lightweight matmul probe.

Example Output

The demo query produces a report with sections for architecture, grounding, reliability controls, deployment, and a source map. Example metric excerpt:

{
 "report_words": 721,
 "evidence_items": 12,
 "unique_sources": 5,
 "grounded_citation_rate": 1.0,
 "critic_coverage_score": 1.0,
 "critic_verdict": "pass",
 "device": "cuda:0"
}
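
One plausible way to compute metrics shaped like the excerpt above is sketched here. The definition of `grounded_citation_rate` (the share of citation markers that resolve to a known evidence ID), the `[E1]`-style citation syntax, and the function names are assumptions for illustration; the Evaluator in this repository may define them differently.

```python
import re

CITATION = re.compile(r"\[(E\d+)\]")  # assumed citation syntax, e.g. [E3]


def grounded_citation_rate(report, evidence_ids):
    """Fraction of citation markers in the report that point at known evidence."""
    cited = CITATION.findall(report)
    if not cited:
        return 0.0  # an uncited report is treated as ungrounded
    grounded = sum(1 for c in cited if c in evidence_ids)
    return grounded / len(cited)


def summarize(report, evidence):
    """Build a small metrics dict; evidence maps evidence ID -> source path."""
    return {
        "report_words": len(report.split()),
        "evidence_items": len(evidence),
        "unique_sources": len(set(evidence.values())),
        "grounded_citation_rate": grounded_citation_rate(report, set(evidence)),
    }
```

Under this definition a rate of 1.0 means every citation marker in the report maps to a retrieved evidence card, which is the property the Critic is meant to protect.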

Extension Ideas

Replace BM25 with hybrid retrieval using embeddings and a vector database.
Swap the deterministic writer for an LLM-backed writer while preserving the critic and evaluator interfaces.
Add a LangGraph adapter around the existing node contracts.
Add human approval nodes for high-impact reports or automation actions.
Add OpenTelemetry traces and a small FastAPI service layer.

Tests

pytest -q

Current status:

5 passed


parastoopil/agentic-research-ops
