Evolutionary self-improvement for Hermes Agent.
Hermes Agent Self-Evolution uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve and optimize Hermes Agent's skills, tool descriptions, system prompts, and code — producing measurably better versions through reflective evolutionary search.
No GPU training required. Everything operates via API calls: mutating text, evaluating results, and selecting the best variants. Each optimization run costs roughly ~$2-10.
```
Read current skill/prompt/tool ──► Generate eval dataset
                                        │
                                        ▼
                              GEPA Optimizer ◄── Execution traces
                                    │                   ▲
                                    ▼                   │
                              Candidate variants ──► Evaluate
                                    │
                              Constraint gates (tests, size limits, benchmarks)
                                    │
                                    ▼
                              Best variant ──► PR against hermes-agent
```
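The loop in the diagram can be sketched in a few lines of Python. This is a toy under stated assumptions, not the project's code: `mutate` is a hypothetical stand-in for GEPA's reflective mutation step, `score` stands in for the eval harness (returning a score plus execution traces), and `passes_gates` stands in for the constraint gates. For brevity it keeps a single greedy "best" candidate, which is simpler than GEPA's Pareto-based selection.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins (none of these names come from the project):
#   mutate(text, traces) -> new variant, playing the role of the reflection step
#   score(text)          -> (score, execution traces), playing the role of evaluation
#   passes_gates(text)   -> True if the variant clears the constraint gates
Mutate = Callable[[str, List[str]], str]
Score = Callable[[str], Tuple[float, List[str]]]
Gate = Callable[[str], bool]


def evolve(seed: str, mutate: Mutate, score: Score, passes_gates: Gate,
           iterations: int = 10) -> str:
    """Greedy evolutionary loop: return the best gated variant found."""
    best_text = seed
    best_score, traces = score(seed)
    for _ in range(iterations):
        # Propose a variant from the current best text and the traces of its runs.
        candidate = mutate(best_text, traces)
        # Variants that fail a gate are dropped before they are evaluated.
        if not passes_gates(candidate):
            continue
        cand_score, cand_traces = score(candidate)
        # Keep the candidate only if it strictly improves the score.
        if cand_score > best_score:
            best_text, best_score, traces = candidate, cand_score, cand_traces
    return best_text


# Toy usage: each "mutation" appends a word, the score counts that word,
# and the gate caps the text at 30 characters.
best = evolve(
    "review",
    mutate=lambda text, traces: text + " check",
    score=lambda text: (float(text.count("check")), []),
    passes_gates=lambda text: len(text) <= 30,
)
print(best)  # review check check check check
```

In the toy run the score keeps rising until the length gate starts rejecting every new variant, so the last variant that fit under the limit wins.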
GEPA reads execution traces to understand why things fail (not just that they failed), then proposes targeted improvements. ICLR 2026 Oral, MIT licensed.
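One reason GEPA keeps more than a single "best" prompt is its Pareto-based candidate selection, as described in the GEPA paper: any candidate that reaches the top score on at least one eval instance stays in the pool, and parents are sampled in proportion to how many instances they win. The sketch below is a simplified version of that idea (it skips the paper's pruning of dominated candidates); `pareto_wins` and `sample_parent` are hypothetical names, and `scores` is assumed to map each candidate to per-instance scores in a fixed order.

```python
import random
from collections import Counter
from typing import Dict, List


def pareto_wins(scores: Dict[str, List[float]]) -> Counter:
    """Count the eval instances on which each candidate ties for the top score.

    Assumes `scores` is non-empty and every list has the same length.
    Candidates that are never top-scoring get no entry.
    """
    n_instances = len(next(iter(scores.values())))
    wins: Counter = Counter()
    for i in range(n_instances):
        top = max(per_instance[i] for per_instance in scores.values())
        for cand, per_instance in scores.items():
            if per_instance[i] == top:
                wins[cand] += 1
    return wins


def sample_parent(scores: Dict[str, List[float]], rng: random.Random) -> str:
    """Pick the next candidate to mutate, weighted by how many instances it wins."""
    wins = pareto_wins(scores)
    cands = sorted(wins)  # sorted so a fixed seed gives reproducible picks
    return rng.choices(cands, weights=[wins[c] for c in cands], k=1)[0]


# Example: A and B each win somewhere, C never does.
scores = {
    "A": [1.0, 0.2, 0.5],
    "B": [0.4, 0.9, 0.5],
    "C": [0.3, 0.1, 0.4],
}
print(pareto_wins(scores))  # Counter({'A': 2, 'B': 2})
```

Ranking by mean score alone would keep only B (mean 0.6 vs. about 0.57 for A) and lose A's perfect score on the first instance; the per-instance frontier keeps both alive as mutation parents.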
```bash
# Install
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"

# Point at your hermes-agent repo
export HERMES_AGENT_REPO=~/.hermes/hermes-agent

# Evolve a skill (synthetic eval data)
python -m evolution.skills.evolve_skill \
    --skill github-code-review \
    --iterations 10 \
    --eval-source synthetic

# Or use real session history from Claude Code, Copilot, and Hermes
python -m evolution.skills.evolve_skill \
    --skill github-code-review \
    --iterations 10 \
    --eval-source sessiondb
```
| Phase | Target | Engine | Status |
|---|---|---|---|
| Phase 1 | Skill files (SKILL.md) | DSPy + GEPA | ✅ Implemented |
| Phase 2 | Tool descriptions | DSPy + GEPA | 🔲 Planned |
| Phase 3 | System prompt sections | DSPy + GEPA | 🔲 Planned |
| Phase 4 | Tool implementation code | Darwinian Evolver | 🔲 Planned |
| Phase 5 | Continuous improvement loop | Automated pipeline | 🔲 Planned |
| Engine | What It Does | License |
|---|---|---|
| DSPy + GEPA | Reflective prompt evolution — reads execution traces, proposes targeted mutations | MIT |
| Darwinian Evolver | Code evolution with Git-based organisms | AGPL v3 (invoked only as an external CLI) |
Every evolved variant must pass:
- Full test suite: `pytest tests/ -q` must pass 100%
- Size limits: skills ≤15KB, tool descriptions ≤500 chars
- Caching compatibility: no mid-conversation changes
- Semantic preservation: must not drift from the original purpose
- PR review: all changes go through human review, never a direct commit
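The two mechanical gates above (test suite and size limits) can be expressed as plain checks. This is a minimal sketch, not the project's implementation: the function names are hypothetical, it assumes "15KB" means 15 * 1024 bytes of UTF-8 and that tool descriptions are measured in characters, and it leaves out caching compatibility, semantic preservation, and PR review because those need context beyond the text itself.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Limits from the README. Reading "15KB" as 15 * 1024 bytes is an assumption.
MAX_SKILL_BYTES = 15 * 1024
MAX_TOOL_DESC_CHARS = 500


def size_violations(skill_text: Optional[str] = None,
                    tool_desc: Optional[str] = None) -> List[str]:
    """Return a human-readable message for each size limit that is exceeded."""
    problems: List[str] = []
    if skill_text is not None:
        size = len(skill_text.encode("utf-8"))  # UTF-8 byte size, not character count
        if size > MAX_SKILL_BYTES:
            problems.append(f"skill is {size} bytes (limit {MAX_SKILL_BYTES})")
    if tool_desc is not None and len(tool_desc) > MAX_TOOL_DESC_CHARS:
        problems.append(
            f"tool description is {len(tool_desc)} chars (limit {MAX_TOOL_DESC_CHARS})"
        )
    return problems


def run_test_suite(repo: Path) -> bool:
    """Run `pytest tests/ -q` in `repo`; exit code 0 means every collected test passed."""
    result = subprocess.run(["pytest", "tests/", "-q"], cwd=repo)
    return result.returncode == 0
```

Returning a list of messages rather than a bare boolean makes it easy to report why a variant was rejected.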
See PLAN.md for the complete architecture, evaluation data strategy, constraints, benchmarks integration, and phased timeline.
MIT — © 2026 Nous Research