Multi-Agent Design
The multi-agent deployment consists of 5 agents:
- Researcher
- Judge
- Orchestrator
- Content Builder
- Course Builder
For a detailed analysis of the multi-agent architecture, this article provides the background information:
Multi-Agent A2A with the Agent Development Kit (ADK), Cloud Run, and Gemini CLI
Running/Testing/Debugging Locally
The main Makefile has been extended with targets for managing the agents in the local development environment.
First, check for locally running agents:
✦ I will run the make local-status command to show you the current status of all local services.
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ Shell Checking the status of local processes. │
│ │
│ --- Local Process Status --- │
│ Service Port Status PID │
│ Frontend 5173 STOPPED - │
│ Backend 8000 STOPPED - │
│ Researcher 8001 STOPPED - │
│ Judge 8002 STOPPED - │
│ Builder 8003 STOPPED - │
│ Orchestrator 8004 STOPPED - │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ All local processes are currently STOPPED.
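The transcript's make local-status target matches running processes by pattern. A different way to build a similar table is to probe each port listed in the status output. The sketch below assumes every service binds to 127.0.0.1 on the port shown; port_is_listening and local_status are hypothetical helpers, not the project's implementation.

```python
import socket

# Service-to-port mapping copied from the local-status table above.
SERVICES = {
    "Frontend": 5173,
    "Backend": 8000,
    "Researcher": 8001,
    "Judge": 8002,
    "Builder": 8003,
    "Orchestrator": 8004,
}


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def local_status(services: dict[str, int]) -> dict[str, str]:
    """Map each service name to RUNNING or STOPPED based on a TCP probe."""
    return {
        name: "RUNNING" if port_is_listening(port) else "STOPPED"
        for name, port in services.items()
    }


if __name__ == "__main__":
    print(f"{'Service':<14}{'Port':<7}Status")
    for name, status in local_status(SERVICES).items():
        print(f"{name:<14}{SERVICES[name]:<7}{status}")
```

A port probe only shows that something is listening, not which process owns the port, which is one reason a process-pattern check like the project's can be more precise.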
Then all the agents can be started together:
xbill@penguin:~/gemini-cli-aws/multi-fargate/agents$ make start
Stopping any existing agent and server processes...
Starting all agents in background...
Waiting for sub-agents to start...
All agents started. Logs: researcher.log, judge.log, content_builder.log, orchestrator.log
Starting App Backend in background...
Starting Frontend dev server in background...
All services started. Logs: researcher.log, judge.log, content_builder.log, orchestrator.log, backend.log, frontend.log
Frontend: http://localhost:5173
Backend: http://localhost:8000
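make start brings the sub-agents up first, waits for them, and only then starts the orchestrator, backend, and frontend. That ordering can be written down as a dependency graph. The DEPENDS_ON map below is a hypothetical encoding of the printed order (the real Makefile may express it differently), resolved with Python's standard graphlib module:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it starts.
# It mirrors the order printed by `make start`.
DEPENDS_ON = {
    "researcher": set(),
    "judge": set(),
    "content_builder": set(),
    "orchestrator": {"researcher", "judge", "content_builder"},
    "backend": {"orchestrator"},
    "frontend": {"backend"},
}


def start_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group services into batches; everything in one batch can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # A real launcher would start these services and wait for their ports here.
        sorter.done(*ready)
    return batches


for i, batch in enumerate(start_batches(DEPENDS_ON), start=1):
    print(f"batch {i}: {', '.join(batch)}")
```

Running it prints four batches: the three sub-agents together, then the orchestrator, then the backend, then the frontend.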
Run make local-status again to verify:
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ Shell Verifying corrected local-status with updated pattern. │
│ │
│ --- Local Process Status --- │
│ Service Port Status PID │
│ Frontend 5173 STOPPED - │
│ Backend 8000 RUNNING 20045 │
│ Researcher 8001 RUNNING 19668 │
│ Judge 8002 RUNNING 19673 │
│ Builder 8003 RUNNING 19676 │
│ Orchestrator 8004 RUNNING 20037 │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ The backend is now showing as RUNNING, but the frontend has reverted to STOPPED. It appears the Vite process is exiting after its initial start.
I'll check the frontend.log again.
Then connect to the frontend locally:
And the entire agent system will run in the local environment:
Local Logging / Debugging
Gemini CLI has full access to the local agent logs for debugging and troubleshooting:
> review the local logs for the multi-agent run
✦ I will check run_local.sh and the project structure to determine where logs are stored or how they are generated.
This will verify the local operation of the code:
✦ The local logs confirm a successful multi-agent run for the topic "the history of the internet." Here is a summary of the execution flow:
* Orchestrator (orchestrator.log): Successfully managed the pipeline. It captured the topic, coordinated the Research-Judge loop, and escalated to
content generation after the first iteration.
* Researcher (researcher.log): Gathered approximately 9.2KB of research data using the google_search tool. It completed its task with a 200 OK
response.
* Judge (judge.log): Evaluated the findings and gave a "pass" status. The judge noted that the research was "comprehensive, accurate, and
well-structured," making it an "excellent foundation for educational content."
* Content Builder (content_builder.log): Successfully transformed the validated research into a structured Markdown course module and finished
execution.
* Backend & Frontend (backend.log, frontend.log): The FastAPI backend (port 8000) and Vite frontend (port 5173) are running and communicating with
the orchestrator as expected.
Notable Observation:
* A minor warning appeared in researcher.log: Researcher callback: 'response' not found or has no parts. However, this did not impact the
successful completion of the research or the overall pipeline.
All services are currently active and logged at the DEBUG level as specified in run_local.sh.
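A first pass over the same logs can also be scripted. This sketch assumes the log files named in the make start output sit in the current directory and that each line carries a standard Python logging level name; the project's actual log format is not shown in the article, so find_problems and scan_logs are hypothetical helpers.

```python
import re
from pathlib import Path

# File names from the `make start` output; the line format is an assumption.
LOG_FILES = ["researcher.log", "judge.log", "content_builder.log",
             "orchestrator.log", "backend.log", "frontend.log"]
LEVEL_RE = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def find_problems(lines, source="<memory>"):
    """Yield (source, line_no, level, text) for WARNING/ERROR/CRITICAL lines."""
    for no, line in enumerate(lines, start=1):
        match = LEVEL_RE.search(line)
        if match:
            yield source, no, match.group(1), line.rstrip()


def scan_logs(directory="."):
    """Scan every known log file that exists under `directory`."""
    problems = []
    for name in LOG_FILES:
        path = Path(directory) / name
        if path.exists():
            with path.open(encoding="utf-8", errors="replace") as fh:
                problems.extend(find_problems(fh, source=name))
    return problems


if __name__ == "__main__":
    for source, no, level, text in scan_logs():
        print(f"{source}:{no}: [{level}] {text}")
```

A warning such as the researcher callback message above would surface as one line of this report, which makes it easy to decide whether it needs attention.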
Deploying to AWS Fargate
The project-level Makefile has targets for managing the agent deployment to serverless endpoints. A utility script checks the deployment status:
make status
You can then submit the build for Fargate deployment:
✦ The application has been successfully deployed to AWS Fargate. All microservice images (Researcher, Judge, Content Builder, Orchestrator, and
App) have been built and pushed to Amazon ECR, and the ECS service has been updated.
Once the containers are deployed, you can get the endpoint.
Run make endpoint to retrieve the public IP of your Fargate service:
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ Shell Retrieving the public endpoint for the Fargate service. │
│ │
│ Fargate Endpoint: http://54.221.168.167:8080 │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ The public endpoint for your Fargate service is: http://54.221.168.167:8080
You can use this URL to access your application or run an end-to-end test against it with make e2e-test-fargate.
The status can then be checked:
> make status
✦ I will check the status of the AWS Fargate services.
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ Shell Checking the status of the AWS Fargate services. │
│ │
│ ----------------------------------------------------------------------------------------------------------------------------------------- │
│ | DescribeServices | │
│ +---------+----------------------+----------+---------+---------------------------------------------------------------------------------+ │
│ | Desired | Name | Running | Status | TaskDef | │
│ +---------+----------------------+----------+---------+---------------------------------------------------------------------------------+ │
│ | 1 | adk-fargate-service | 1 | ACTIVE | arn:aws:ecs:us-east-1:106059658660:task-definition/adk-course-creator-task:2 | │
│ +---------+----------------------+----------+---------+---------------------------------------------------------------------------------+ │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ The AWS Fargate service adk-fargate-service is currently ACTIVE with 1 running task.
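The table printed by make status comes from the ECS DescribeServices call. A status check can reduce that response to a verdict per service. This sketch works on the response as a plain dict; the OK/DEGRADED/INACTIVE labels are hypothetical, and the commented boto3 call assumes the cluster is the adk-fargate-cluster created by setup-fargate.sh.

```python
# Verdicts use DescribeServices fields (serviceName, status, desiredCount,
# runningCount). Fetching the data would look roughly like:
#
#   import boto3
#   ecs = boto3.client("ecs", region_name="us-east-1")
#   resp = ecs.describe_services(cluster="adk-fargate-cluster",
#                                services=["adk-fargate-service"])


def service_health(resp: dict) -> dict[str, str]:
    """Map each service in a DescribeServices response to OK, DEGRADED or INACTIVE."""
    verdicts = {}
    for svc in resp.get("services", []):
        name = svc["serviceName"]
        if svc.get("status") != "ACTIVE":
            verdicts[name] = "INACTIVE"   # e.g. DRAINING or INACTIVE
        elif svc.get("runningCount", 0) < svc.get("desiredCount", 0):
            verdicts[name] = "DEGRADED"   # fewer tasks running than desired
        else:
            verdicts[name] = "OK"
    return verdicts


# Values taken from the `make status` output above
example = {"services": [{"serviceName": "adk-fargate-service", "status": "ACTIVE",
                         "desiredCount": 1, "runningCount": 1}]}
print(service_health(example))  # {'adk-fargate-service': 'OK'}
```

Comparing runningCount with desiredCount catches the common case where the service is ACTIVE but its task keeps failing to start.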
The service will be visible in the AWS console:
And the entire system can be tested:
✦ The end-to-end test against the deployed AWS Fargate endpoint was successful! The system correctly orchestrated the microservices to create a
structured "History of the Internet" course and streamed the results back via Server-Sent Events (SSE).
Deployment & Verification Summary:
- Status: The adk-fargate-service is ACTIVE and running.
- Endpoint: http://54.221.168.167:8080
- E2E Test: Completed successfully against the remote Fargate task.
- Local Monitoring: make local-status is available and correctly reports local process states using robust pattern matching.
Your multi-agent AI Course Creator is now fully deployed and operational on AWS Fargate.
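The end-to-end test receives the course as a Server-Sent Events stream. A minimal parser for the text/event-stream format is sketched below; it covers the data, event, and comment rules of the SSE specification and ignores id and retry. The backend route and event names the project uses are not shown in the article, so the sample events are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SSEEvent:
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Turn raw text/event-stream lines into events (a subset of the SSE spec)."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                      # blank line: dispatch the pending event
            if data:
                yield SSEEvent(event_type, "\n".join(data))
            event_type, data = "message", []
            continue
        if line.startswith(":"):          # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]             # the spec strips one leading space
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        # id: and retry: fields are ignored in this sketch


# Made-up sample stream; a real test would feed the lines of the HTTP response.
sample = ["event: progress", "data: researching", "",
          "data: # History of the Internet", "data: Module 1", ""]
for ev in parse_sse(sample):
    print(ev.event, repr(ev.data))
```

Multiple data lines belong to one event and are joined with newlines, which is how a multi-line Markdown course module can arrive as a single event.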
Running the Web Interface
Start a connection to the Fargate-deployed app:
Then connect to the app:
Then use the online course generator:
Review Fargate Setup
Gemini CLI was used to review the Fargate setup:
✓ ReadFile setup-fargate.sh
✦ All agents in this project are configured to run within a single AWS Fargate task. Therefore, they share a common pool of resources:
* Total Task CPU: 1024 (1 vCPU)
* Total Task Memory: 2048 (2 GB RAM)
Components Sharing these Resources:
The following five containers share the 1 vCPU and 2 GB RAM:
1. app: The FastAPI backend and web interface.
2. orchestrator: Manages the agent workflow.
3. researcher: Performs topic research using Google Search.
4. judge: Evaluates research findings.
5. content-builder: Generates the final course modules.
This configuration is defined in the setup-fargate.sh script, which registers the ECS task definition with these specific resource allocations.
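To make the resource sharing concrete, here is a sketch of a task definition with the shape described (Fargate, awsvpc networking, 1024 CPU units, 2048 MiB, five containers) plus a check that container-level limits fit inside the task pool. The per-container numbers are invented for illustration, image URIs and the execution role are omitted, and setup-fargate.sh may set no per-container limits at all.

```python
# Hypothetical container-level split of the 1 vCPU / 2 GB task pool.
TASK_CPU = 1024      # CPU units; 1024 units = 1 vCPU
TASK_MEMORY = 2048   # MiB

task_definition = {
    "family": "adk-course-creator-task",   # family name seen in the make status ARN
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": str(TASK_CPU),                   # task-level values are strings in the API
    "memory": str(TASK_MEMORY),
    "containerDefinitions": [
        {"name": "app", "cpu": 256, "memory": 512, "essential": True,
         "portMappings": [{"containerPort": 8080, "protocol": "tcp"}]},
        {"name": "orchestrator", "cpu": 256, "memory": 384, "essential": True},
        {"name": "researcher", "cpu": 192, "memory": 384, "essential": True},
        {"name": "judge", "cpu": 128, "memory": 384, "essential": True},
        {"name": "content-builder", "cpu": 192, "memory": 384, "essential": True},
    ],
}


def check_fits(td: dict) -> list[str]:
    """Return problems if summed container limits exceed the task-level pool."""
    problems = []
    total_cpu = sum(c.get("cpu", 0) for c in td["containerDefinitions"])
    total_mem = sum(c.get("memory", 0) for c in td["containerDefinitions"])
    if total_cpu > int(td["cpu"]):
        problems.append(f"CPU {total_cpu} > task {td['cpu']}")
    if total_mem > int(td["memory"]):
        problems.append(f"memory {total_mem} MiB > task {td['memory']} MiB")
    return problems


print(check_fits(task_definition) or "fits within 1 vCPU / 2 GB")
```

With 1024 units split five ways, each agent averages about 0.2 vCPU, which is why concurrent course generations can contend for CPU.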
Gemini CLI also gave a deep dive into the deployment pattern:
> how is this app deployed on Fargate?
✦ I will examine the deploy-fargate.sh script and the root Makefile to understand the complete build and push process for the Fargate deployment.
✓ ReadFile deploy-fargate.sh
✦ The deployment of the AI Course Creator to AWS Fargate is a multi-step process managed by two primary scripts: setup-fargate.sh and
deploy-fargate.sh.
1. Infrastructure Provisioning (setup-fargate.sh)
This script sets up the foundational AWS resources required to run the application:
* ECS Cluster: Creates a cluster named adk-fargate-cluster.
* IAM Roles: Provisions the ecsTaskExecutionRole-adk with the necessary permissions to pull images from ECR and send logs to CloudWatch.
* Security Groups: Configures an adk-fargate-sg that allows inbound traffic on port 8080 (the web app port).
* Task Definition: Defines how the 5 microservices (app, orchestrator, researcher, judge, content-builder) are run. It specifies a single
Fargate task where all 5 containers share the same network stack (awsvpc) and resource pool (1 vCPU, 2 GB RAM). This allows them to
communicate with each other using localhost.
* ECS Service: Creates a Fargate service that maintains one running instance of the task definition.
2. Build and Push Process (deploy-fargate.sh)
This script handles the continuous delivery of the application code:
* ECR Repositories: Ensures that Amazon Elastic Container Registry (ECR) repositories exist for each of the 5 microservices.
* Authentication: Logs in the local Docker daemon to the ECR registry using the AWS CLI.
* Container Builds: Iteratively builds Docker images for each service:
* researcher
* judge
* content-builder
* orchestrator
* app (FastAPI backend + Frontend)
* Image Push: Tags the built images and pushes them to their respective ECR repositories.
* Deployment Trigger: Once the images are pushed, the ECS service can be updated (e.g., via aws ecs update-service --force-new-deployment) to
pull the latest versions and restart the tasks.
Summary of Deployment Flow
1. Run setup-fargate.sh (once) to create the AWS infrastructure.
2. Run deploy-fargate.sh to build and upload the container images.
3. Monitor Status: Use make status-fargate (which likely wraps aws ecs describe-tasks) to verify the containers have successfully reached the
RUNNING state.
This architecture leverages Fargate's serverless nature to run a complex, multi-agent system without managing the underlying EC2 instances,
while the single-task approach simplifies service discovery and reduces cross-service latency.
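The build-and-push flow described for deploy-fargate.sh can be outlined as the sequence of commands it implies. This sketch only prints the commands rather than running them. The account ID is a placeholder, the ECR repository names are assumed to match the service names, and the ./<service> build directories are hypothetical; the cluster and service names come from the transcripts above.

```python
# Placeholders and assumptions: account ID, tag, repository names, build paths.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
TAG = "latest"
SERVICES = ["researcher", "judge", "content-builder", "orchestrator", "app"]


def registry(account: str, region: str) -> str:
    """Standard ECR registry host for an account and region."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com"


def deploy_commands(account: str = ACCOUNT_ID, region: str = REGION,
                    tag: str = TAG) -> list[str]:
    reg = registry(account, region)
    cmds = [
        # Authenticate the local Docker daemon against ECR
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {reg}",
    ]
    for svc in SERVICES:
        image = f"{reg}/{svc}:{tag}"
        cmds.append(f"docker build -t {svc}:{tag} ./{svc}")  # hypothetical build context
        cmds.append(f"docker tag {svc}:{tag} {image}")
        cmds.append(f"docker push {image}")
    # Roll the service so the running task pulls the new images
    cmds.append("aws ecs update-service --cluster adk-fargate-cluster "
                "--service adk-fargate-service --force-new-deployment")
    return cmds


for cmd in deploy_commands():
    print(cmd)
```

Keeping the final update-service step separate from the pushes means images can be rebuilt and uploaded without immediately restarting the running task.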
Final Gemini CLI Code Review
As a final step, Gemini CLI was used for a full code review of the project:
✦ I've analyzed the project's architecture, security, and agent logic. I'll now provide a code review of the AI Course Creator project, focusing
on its distributed multi-agent design using ADK and A2A on AWS Fargate.
1. Architecture & Orchestration
* A2A Workflow: The system uses a clean SequentialAgent pipeline (course_creation_pipeline) in the orchestrator to chain specialized agents.
* State Management: The StateCapturer pattern is an effective way to bridge the context between independent A2A services. It intercepts LLM
responses and injects them into the session_state, which subsequent agents then consume.
* Loop Logic: Using LoopAgent with a max_iterations=2 limit for the Research-Judge cycle is a best practice to prevent infinite loops and
control costs.
* Escalation Control: The EscalationChecker correctly uses structured Pydantic output (JudgeFeedback) to decide when to break the research
loop.
2. ADK Best Practices
* Model Selection: The project correctly prioritizes gemini-2.5-flash for efficiency and reasoning, with gemini-2.5-pro as an alternative for
complex tasks.
* Tool Usage: The google_search tool in the researcher agent is well-integrated.
* Structured Outputs: Using Pydantic schemas for the judge agent ensures that the orchestration logic is robust and type-safe.
3. Security & Authentication
* Service-to-Service Auth: The authenticated_httpx.py utility is a standout feature. It handles Google OIDC tokens for secure communication
between agents, even when deployed on non-Google infrastructure like AWS Fargate.
* Environment Variables: Sensitive information like GOOGLE_API_KEY is passed via environment variables, following standard security practices.
* Dynamic A2A Cards: The a2a_utils.py middleware solves the challenge of discovering service URLs in dynamic container environments by
rewriting the Agent Card's url based on the x-forwarded-host header.
4. Infrastructure & Deployment
* Resource Efficiency: Running all 5 containers in a single Fargate task (1 vCPU, 2 GB RAM) is cost-effective for a prototype but may lead to
resource contention if multiple courses are generated simultaneously.
* Docker Strategy: Using a standardized shared/adk_app.py entry point across all agents ensures consistent logging, A2A registration, and
middleware application.
Summary
The project demonstrates a high level of technical maturity in distributed agent orchestration. It effectively adapts Google ADK patterns for a
cross-cloud environment (AWS Fargate) while maintaining strong security and clean architectural boundaries.
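The Research-Judge loop, escalation check, and state hand-off described in the review can be imitated in plain Python to make the control flow concrete. This is not ADK code: run_pipeline, the callables, and the JudgeFeedback fields (status, comment) are hypothetical stand-ins for the project's LoopAgent(max_iterations=2), EscalationChecker, StateCapturer, and Pydantic schema.

```python
from dataclasses import dataclass
from typing import Callable, Literal


@dataclass
class JudgeFeedback:
    # Field names are assumptions; the project uses a Pydantic model instead.
    status: Literal["pass", "fail"]
    comment: str


def run_pipeline(
    topic: str,
    research: Callable[[str, str], str],
    judge: Callable[[str], JudgeFeedback],
    build: Callable[[str], str],
    max_iterations: int = 2,
) -> dict:
    state: dict = {"topic": topic, "feedback": ""}   # stands in for session state
    for iteration in range(1, max_iterations + 1):
        # StateCapturer-like step: store each output for the next agent to read
        state["research_findings"] = research(topic, state["feedback"])
        verdict = judge(state["research_findings"])
        state["feedback"] = verdict.comment
        state["iterations"] = iteration
        if verdict.status == "pass":   # EscalationChecker-like early exit
            break
    # In this sketch the content builder runs once the loop ends either way
    state["course"] = build(state["research_findings"])
    return state


result = run_pipeline(
    "the history of the internet",
    research=lambda t, fb: f"notes on {t}",
    judge=lambda r: JudgeFeedback("pass", "comprehensive"),
    build=lambda r: f"# Course\n{r}",
)
print(result["iterations"], result["course"].splitlines()[0])  # 1 # Course
```

A passing verdict on the first round ends the loop after one iteration, matching the local run where the orchestrator escalated to content generation after the first iteration; a failing judge is capped at two rounds.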
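The Agent Card rewrite attributed to a2a_utils.py can be sketched as a pure function over the card and the request headers. The review only mentions x-forwarded-host; honoring x-forwarded-proto and taking the first of several comma-separated values are additions of this sketch, and the real middleware may behave differently.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_card_url(card: dict, headers: dict[str, str]) -> dict:
    """Return a copy of an Agent Card whose url points at the forwarded host."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    host = lowered.get("x-forwarded-host")
    if not host:
        return card                      # no proxy header: keep the card as-is
    host = host.split(",")[0].strip()    # first value if a proxy chain appended several
    parts = urlsplit(card["url"])
    scheme = lowered.get("x-forwarded-proto", parts.scheme).split(",")[0].strip()
    new_url = urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))
    return {**card, "url": new_url}


card = {"name": "researcher", "url": "http://localhost:8001/"}
print(rewrite_card_url(card, {"X-Forwarded-Host": "agents.example.com"})["url"])
# http://agents.example.com/
```

Returning a new dict rather than mutating the stored card keeps one request's headers from leaking into the card served to the next caller.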
Summary
The Agent Development Kit (ADK) was used to build a multi-agent system with A2A support using the Gemini Flash LLM. The application was tested locally with Gemini CLI and then deployed to AWS Fargate. Finally, Gemini CLI was used for a complete code review of the project.