Creating Specialized AI Agents: Developer, Tester, Reviewer, Documenter

DEV Community

        => now(),
    ]);

    $this->artisan('app:send-weekly-reminders')
        ->assertSuccessful();

    Notification::assertNotSentTo($user, WeeklyReminder::class);
}

public function test_user_without_submitted_form_can_receive_reminder(): void
{
    Notification::fake();

    $user = User::factory()->create();

    $this->artisan('app:send-weekly-reminders')
        ->assertSuccessful();

    Notification::assertSentTo($user, WeeklyReminder::class);
}

A strong Tester Agent reports not only what passed, but also what it could not verify:

## Tester Agent Report
Added tests:
- submitted users do not receive reminders
- users without submitted forms still receive reminders
Ran:
- php artisan test --filter=WeeklyReminderEligibilityTest ✅
Not covered:
- timezone-specific week boundaries
- users with multiple submitted forms

That "not covered" section is the valuable part, because it tells you exactly where the implementation is still unproven.
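
One way to keep the "Not covered" section from being quietly skipped is to make it a required field of the report. The following is a minimal Python sketch, not part of any agent framework: `TesterReport` and `render` are hypothetical names, and the field names simply mirror the report above.

```python
from dataclasses import dataclass


@dataclass
class TesterReport:
    """Hypothetical report shape; fields mirror the example report."""

    added_tests: list[str]
    commands_run: list[str]
    # Required on purpose: "nothing left unverified" has to be stated
    # with an explicit empty list instead of being omitted.
    not_covered: list[str]

    def render(self) -> str:
        lines = ["## Tester Agent Report", "Added tests:"]
        lines += [f"- {test}" for test in self.added_tests]
        lines.append("Ran:")
        lines += [f"- {command}" for command in self.commands_run]
        lines.append("Not covered:")
        # An empty list is rendered as an explicit statement, not dropped.
        lines += [f"- {gap}" for gap in self.not_covered] or ["- (none declared)"]
        return "\n".join(lines)


report = TesterReport(
    added_tests=[
        "submitted users do not receive reminders",
        "users without submitted forms still receive reminders",
    ],
    commands_run=["php artisan test --filter=WeeklyReminderEligibilityTest"],
    not_covered=[
        "timezone-specific week boundaries",
        "users with multiple submitted forms",
    ],
)
print(report.render())
```

Because `not_covered` has no default value, constructing a report without it fails immediately, which is the behavior you want from a Tester Agent's output format.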

Reviewer Agent

The Reviewer Agent reads the diff like a code reviewer, and a good reviewer does not simply praise the work. It should check:

- Is the change minimal?
- Are names clear?
- Is behavior hidden in the wrong layer?
- Are tests meaningful?
- Is there duplicated logic?
- Could this break existing callers?

Example review output:

## Reviewer Agent Findings
### Concern: Week boundary depends on server timezone
The implementation uses `now()->startOfWeek()` without checking the app's configured user timezone.
Recommendation:
Use the same timezone source used by the reminder scheduler, or add a test proving this behavior.
### Positive
The change is small and keeps existing interval logic unchanged.

A Reviewer Agent is useful precisely because it creates friction, and good engineering needs friction in the right places.

[Image: a code review desk with a magnifying glass over a pull request diff, beside a checklist for Scope, Tests, Risk, and Naming.]

Security Agent

The Security Agent focuses on risk, and it should be skeptical by default.

Checklist:

- authorization checks
- authentication bypass
- SQL injection
- unsafe shell execution
- secret exposure
- sensitive data logging
- insecure redirects
- dependency risk
- excessive permissions

Example prompt:

Review this diff for security risks. Do not edit files.
Return findings with severity, file, reason, and recommendation.

Example output:

## Security Agent Report
### Medium: Missing authorization check
File: app/Http/Controllers/InvoiceController.php
The new endpoint returns invoice data but does not call a policy or permission check.
Recommendation:
Add `$this->authorize('view', $invoice)` and a feature test for unauthorized access.
### Low: Log may expose customer email
File: app/Services/BillingService.php
The error log includes full request payload.
Recommendation:
Log only the invoice ID and gateway error code.

The Security Agent should usually be read-only. Security patches should go through a Developer Agent or a human, so the agent that finds a risk is not the same one that quietly rewrites the code around it.
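
If the findings come back as data rather than prose, whoever receives them can decide what blocks a merge without re-reading the report. Here is a small Python sketch under stated assumptions: the `Severity` levels, the `SecurityFinding` fields, and the `blocking_findings` helper are hypothetical names, and the "medium and above blocks" threshold is an illustrative default, not a rule from any tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class SecurityFinding:
    """One finding: severity, file, reason, and recommendation."""

    severity: Severity
    file: str
    reason: str
    recommendation: str


def blocking_findings(findings, threshold=Severity.MEDIUM):
    """Return findings at or above the threshold, most severe first."""
    blocking = [f for f in findings if f.severity >= threshold]
    return sorted(blocking, key=lambda f: f.severity, reverse=True)


findings = [
    SecurityFinding(
        Severity.LOW,
        "app/Services/BillingService.php",
        "The error log includes full request payload.",
        "Log only the invoice ID and gateway error code.",
    ),
    SecurityFinding(
        Severity.MEDIUM,
        "app/Http/Controllers/InvoiceController.php",
        "The new endpoint returns invoice data without a policy check.",
        "Add an authorization check and a feature test for unauthorized access.",
    ),
]

for finding in blocking_findings(findings):
    print(f"{finding.severity.name}: {finding.file}")
```

Note that nothing in this sketch touches the files it names. It only returns findings, which matches the read-only role described above.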

Documentation Agent

The Documentation Agent turns implementation details into human-readable guidance. It can update:

- README
- docs folder
- API examples
- changelog
- migration notes
- release notes

Example input:

Behavior changed:
Weekly reminders are skipped when a user submitted the current weekly form.
Files changed:
- app/Services/WeeklyReminderEligibility.php
- tests/Feature/WeeklyReminderEligibilityTest.php

Example documentation update:

### Weekly Reminder Eligibility
A user is not eligible for a weekly reminder if they already submitted the
weekly check-in form for the current week.
If no form was submitted, the existing reminder interval rules still apply.

This is one of the highest-value specialized agents, because documentation is the first thing that gets forgotten when engineers are busy.

Orchestrator Agent

The Orchestrator Agent coordinates the others, and the key rule is that it should not do all the work itself. Its job is:

- split the task
- assign agents
- pass context
- enforce order
- check required outputs
- stop at approval gates
- combine final report

Example workflow:

Orchestrator
 ↓
Analysis Agent: find relevant files
 ↓
Tester Agent: create failing test
 ↓
Developer Agent: implement change
 ↓
Tester Agent: run checks
 ↓
Reviewer Agent: review diff
 ↓
Security Agent: review risks
 ↓
Documentation Agent: update docs
 ↓
Orchestrator: final PR summary

The orchestrator creates structure; the specialized agents create focused output.
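
The workflow above can be sketched as an ordered list of steps with optional approval gates. This is a minimal Python illustration, not a real orchestration framework: `Step`, `orchestrate`, and `stub` are hypothetical names, the agents are stand-in functions, and placing the approval gate before the Developer step is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Each agent takes the shared context and returns the output it produced.
AgentFn = Callable[[dict], dict]


@dataclass
class Step:
    agent: str
    action: str
    run: AgentFn
    needs_approval: bool = False  # assumption: a human gate before this step


def orchestrate(steps, context, approve):
    """Run steps in order and stop at the first refused approval gate.

    Returns the (agent, action) pairs that actually ran.
    """
    completed = []
    for step in steps:
        if step.needs_approval and not approve(step):
            break  # stop at the gate instead of continuing without approval
        output = step.run(context)
        if not output:
            # Check required outputs: an empty result is treated as a failure.
            raise RuntimeError(f"{step.agent} returned no output for: {step.action}")
        context.update(output)  # pass context forward to the next agent
        completed.append((step.agent, step.action))
    return completed


def stub(key):
    # Stand-in for a real agent call; it only records that output exists.
    return lambda context: {key: "done"}


steps = [
    Step("analysis", "find relevant files", stub("files")),
    Step("tester", "create failing test", stub("failing_test")),
    Step("developer", "implement change", stub("diff"), needs_approval=True),
    Step("tester", "run checks", stub("test_run")),
    Step("reviewer", "review diff", stub("review")),
    Step("security", "review risks", stub("security")),
    Step("documentation", "update docs", stub("docs")),
]

context = {"task": "Prevent duplicate weekly reminders after form submission"}
ran = orchestrate(steps, context, approve=lambda step: True)
print([agent for agent, _ in ran])
```

The orchestrator here contains no task logic of its own. It only enforces order, checks that each step produced something, and halts when approval is refused.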

[Image: an orchestration wheel with the Orchestrator at the center and six labeled spokes for Developer (Implement), Tester (Verify), Reviewer (Review), Security (Protect), Documentation (Explain), and Analysis (Discover).]

How agents hand off work

Handoffs should be structured. Do not pass a vague paragraph when a typed artifact would work better.

Example handoff from Analysis Agent to Developer Agent:

{
  "task": "Prevent duplicate weekly reminders after form submission",
  "relatedFiles": [
    "app/Console/Commands/SendWeeklyReminders.php",
    "app/Services/WeeklyReminderEligibility.php",
    "tests/Feature/WeeklyReminderEligibilityTest.php"
  ],
  "currentBehavior": "Reminder eligibility checks interval but not submitted weekly forms.",
  "recommendedChange": "Add submitted-form guard before interval logic.",
  "risks": ["week boundary timezone behavior"]
}

A typed artifact like this is easier for the next agent to use and easier for humans to inspect.
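
A structured handoff also means the receiving side can reject a malformed one before doing any work. The sketch below shows one possible check in Python, assuming the field names from the example handoff. `REQUIRED_FIELDS` and `parse_handoff` are hypothetical names, and a real setup might use a JSON Schema instead.

```python
import json

# Field names follow the example handoff; the expected types are assumptions.
REQUIRED_FIELDS = {
    "task": str,
    "relatedFiles": list,
    "currentBehavior": str,
    "recommendedChange": str,
    "risks": list,
}


def parse_handoff(raw: str) -> dict:
    """Parse a handoff and reject it if a field is missing or mistyped."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"handoff is missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} should be of type {expected.__name__}")
    return data


raw = json.dumps({
    "task": "Prevent duplicate weekly reminders after form submission",
    "relatedFiles": [
        "app/Console/Commands/SendWeeklyReminders.php",
        "app/Services/WeeklyReminderEligibility.php",
        "tests/Feature/WeeklyReminderEligibilityTest.php",
    ],
    "currentBehavior": "Reminder eligibility checks interval but not submitted weekly forms.",
    "recommendedChange": "Add submitted-form guard before interval logic.",
    "risks": ["week boundary timezone behavior"],
})

handoff = parse_handoff(raw)
print(handoff["recommendedChange"])
```

A handoff that omits `risks`, for example, fails at the boundary with a clear message instead of reaching the Developer Agent half-formed.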

A practical agent team configuration

Here is a simple configuration example:

agents:
  analysis:
    role: "Find relevant code and explain current behavior"
    edit: false
  developer:
    role: "Implement scoped code changes"
    edit: true
    requires_approval_for:
      - "production code"
      - "dependencies"
  tester:
    role: "Create and run tests"
    edit_paths:
      - "tests/**"
  reviewer:
    role: "Review code quality and maintainability"
    edit: false
  security:
    role: "Review security and privacy risks"
    edit: false
  documentation:
    role: "Update docs and changelog"
    edit_paths:
      - "README.md"
      - "docs/**"
      - "CHANGELOG.md"

This setup is not complicated, and that is the point. You can start small.
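
A configuration like this only helps if something enforces it. Below is a rough Python sketch of such a check, with the configuration rewritten as a dict so the example stays self-contained. `AGENTS` and `can_edit` are hypothetical names. Reading `edit_paths` as glob patterns that limit writes to those paths is my interpretation of the example, not the schema of a specific tool.

```python
from fnmatch import fnmatchcase

# The example configuration, reduced to the permission-related keys.
AGENTS = {
    "analysis": {"edit": False},
    "developer": {
        "edit": True,
        "requires_approval_for": ["production code", "dependencies"],
    },
    "tester": {"edit_paths": ["tests/**"]},
    "reviewer": {"edit": False},
    "security": {"edit": False},
    "documentation": {"edit_paths": ["README.md", "docs/**", "CHANGELOG.md"]},
}


def can_edit(agent: str, path: str) -> bool:
    """Return True if the agent's config allows editing the given path."""
    config = AGENTS.get(agent)
    if config is None:
        return False  # unknown agents get no write access
    if "edit_paths" in config:
        # fnmatchcase lets "*" match any characters, "/" included,
        # so "tests/**" also covers nested directories.
        return any(fnmatchcase(path, pattern) for pattern in config["edit_paths"])
    return bool(config.get("edit", False))


print(can_edit("tester", "tests/Feature/WeeklyReminderEligibilityTest.php"))
print(can_edit("tester", "app/Services/WeeklyReminderEligibility.php"))
print(can_edit("security", "app/Http/Controllers/InvoiceController.php"))
```

The approval requirement on the Developer Agent is deliberately left out of `can_edit`. Whether an edit is permitted and whether it needs sign-off first are separate questions, and keeping them separate makes each one easier to audit.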

Final thought

Specialized agents are not about making AI architecture fancy. They are about making AI work easier to control.

A Developer Agent implements. A Tester Agent verifies. A Reviewer Agent challenges. A Security Agent protects. A Documentation Agent explains. An Orchestrator Agent coordinates. That structure mirrors how real engineering teams already work, and that is exactly why it works.

One giant agent may look impressive in a demo. A small team of focused agents is the thing that holds up in production.

Sources used

Claude Code subagents documentation: https://code.claude.com/docs/en/sub-agents
Claude Code permissions documentation: https://code.claude.com/docs/en/permissions
Anthropic writing effective tools for agents: https://www.anthropic.com/engineering/writing-tools-for-agents
Model Context Protocol tools specification: https://modelcontextprotocol.io/specification/2025-06-18/server/tools

Originally published at nazarboyko.com.

Top comments (5)

nexus-lab-zen

• Jun 25

The role split is clean, and having the Tester report "what it could not verify" is the part most setups skip. The edge I keep hitting: the Orchestrator routes on each agent's self-report, so a handoff carries the output but not an independent check that the done-claim is real. A Tester saying "these paths are uncovered" is itself a claim — in many setups, nothing downstream distinguishes an honest "I couldn't cover this" from a convenient one, and the Reviewer often inherits that gap rather than closes it.

What seems to work better than trusting the report is making each handoff carry evidence the next agent (or a human) can re-check on its own — the diff plus the artifact that proves the claim, not the agent's narration of it. Curious how you're thinking about the Orchestrator verifying coverage claims vs. trusting them, especially when the Tester's "could not verify" set is self-declared.

Nazar Boyko

• Jun 26

This is the sharpest critique of the whole pattern, and you're right: routing on self-reports just relocates the trust problem instead of solving it. A Tester's "couldn't cover X" is exactly as gameable as a Developer's "done."
The way I'm leaning now is to make the Orchestrator never accept a claim that isn't accompanied by a re-runnable artifact. Coverage isn't "the Tester said so", it's the actual coverage report (or a diff of which lines/branches the new tests touch) attached to the handoff, so the Reviewer or a human can re-execute it independently. If the artifact is missing, the handoff is rejected at the gate, not trusted-then-reviewed.
That shrinks the Tester's job from "judge coverage" to "produce evidence of coverage," which is a much harder thing to fake convincingly. The honest-vs-convenient "could not verify" still exists, but now it's checkable: a self-declared gap that the coverage data contradicts becomes a Reviewer finding rather than something inherited silently.
Where it gets genuinely hard, and I don't think I've solved it, is non-executable claims such as "this is safe across timezones" or "no behavioral regression for callers." There's no cheap artifact for those, so they still fall back to a human gate. Curious whether you've found a way to make those re-checkable, or whether you just treat them as always-escalate.

nexus-lab-zen

• Jun 26

That last bucket is the one I've sunk the most time into, and I don't have a clean win either — but the move that helped was to stop trying to make the claim executable and instead force it to carry its own scope boundary as the artifact.

So "safe across timezones" never ships as a judgment. It ships as "exercised under TZ=UTC, America/Sao_Paulo, Asia/Kolkata — outputs attached." Now it's re-runnable, and more importantly it's honest about being bounded: anything outside that enumerated set is explicitly not covered rather than implied-safe. The universal shrinks to the finite set actually touched.

"No behavioral regression for callers" gets the same treatment, but the artifact is a scope declaration instead of outputs: "checked the callers grep finds for X; anything added after this commit, or reached via reflection/dynamic dispatch, is outside this check." That turns an unfalsifiable universal into a checkable, re-greppable set — and the residual that genuinely can't be enumerated becomes a small, named thing.

That's what made it non-binary for me: not always-escalate vs trust, but shrink the escalation surface down to the irreducibly-human residual and make the agent name that edge out loud. The human gate then only sees the part that's actually un-automatable, not the whole claim — same win as your coverage case: a self-declared boundary the data can later contradict, instead of a silent inheritance.

Where I'd push it back to you: could the Orchestrator treat "scope declaration present and well-formed" as itself a gateable artifact — reject the handoff if the claim doesn't name its own boundary — even when the underlying claim isn't executable?

Muro

• Jun 26

Great article! I like the focus on specialized AI agents instead of expecting one assistant to do everything. Breaking responsibilities into clear roles makes the workflow more practical, easier to review, and much closer to how real engineering teams collaborate. Nicely explained with useful examples.

Nazar Boyko

• Jun 26

Thanks Muro! Yeah, that mirror-to-real-teams angle was the thing I kept coming back to — we already split implementation, QA, review, and security for a reason, and pretending one agent can hold all those modes at once just reintroduces the problem we solved years ago. Appreciate you reading 🙌