http://127.0.0.1:8100/mcp server.py:2055
INFO: Started server process [21826]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8100 (Press CTRL+C to quit)
Antigravity CLI with MCP Connection
The Master Agent starts a FastMCP server interface. This allows Antigravity CLI to be used as an MCP client:
xbill@penguin:~/a2a-benchmark/.agents$ more mcp_config.json
{
"mcpServers": {
"bench": {
"serverUrl": "http://127.0.0.1:8100/mcp",
"serverURL": "http://127.0.0.1:8100/mcp"
}
}
}
ADK Validation of the Master Agent
Because it is Python-based, the Master Agent can also be viewed with the ADK web interface:
So What is all this Doing?
All of this setup and configuration lets you start debugging and visualizing Agent flows using the A2A protocol. The Agents can be validated using the A2A Inspector. The Python Agents can be checked with the ADK web interface. The Master Agent can be validated with Antigravity CLI as an MCP client.
Time to Start Running Some Numbers!
Now that the individual Agents have been validated, open several Terminal windows and start up all the remaining agents:
- benchmark-go.sh (port 8102)
- benchmark-python.sh (port 8101)
- benchmark-node.sh (port 8103)
- benchmark-rust.sh (port 8104)
- benchmark-master.sh, the Orchestrator Agent (port 8100)
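Before starting a session it can help to confirm that every agent is actually listening. The following is a minimal sketch, not part of the benchmark repository: it assumes all agents run on 127.0.0.1 and uses the ports listed above. The `AGENT_PORTS` table and the `is_listening` helper are hypothetical names introduced for illustration.

```python
import socket

# Ports as listed above; running everything on localhost is an assumption.
AGENT_PORTS = {
    "master": 8100,
    "python": 8101,
    "go": 8102,
    "node": 8103,
    "rust": 8104,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for name, port in AGENT_PORTS.items():
        state = "up" if is_listening(port) else "DOWN"
        print(f"{name:<7} port {port}: {state}")
```

This only checks that something accepts TCP connections on each port; it does not verify that the process behind the port speaks A2A or MCP.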
Start the Benchmarking Session
Once all the agents have been started, Antigravity CLI connects over MCP to the Master agent on port 8100:
I have completed the benchmark run across all four language agents for primes 1 to 22.
### Summary of Actions Taken:
- Agent Startup & Verification: Automatically checked and started all four target language agents (Node.js/TypeScript, Rust, Go, Python) and verified they were active and listening on their respective A2A JSON-RPC ports.
- Benchmark Scripting: Created a customized benchmarking script run_benchmark_1_22.py that targets prime calculations from $N = 1$ to $N = 22$.
- Execution: Ran the benchmark and recorded the exact computation time (within the tool call) and the total Agent-to-Agent (A2A) network/orchestration latency (RTT) for every prime count $N$.
- Data Plotting: Generated visualizations for both calculation times (prime_calculation_times_1_22.png) and overall latency (a2a_latency_times_1_22.png).
- Reporting: Compiled all calculations and visualizations into the analysis_results.md artifact.
Please view analysis_results.md to inspect the table and embedded plots showing the comparison of calculation speeds and latency characteristics!
Review Performance
The results of the benchmark are graphed:
With latency times:
Final Results
Antigravity CLI summarized the results of the Agent Benchmarking:
Formulating Key Findings
Here is a summary of the key findings from the benchmark results for primes 1 to 22:
### 1. Core Computation Performance (Lucas-Lehmer Prime Calculation)
• Rust is the fastest: Computes $N=22$ in 195.96 ms.
• Node.js (TS) is extremely close: Computes $N=22$ in 214.06 ms.
• Go performs well: Computes $N=22$ in 279.54 ms.
• Python is the slowest: Computes $N=22$ in 572.48 ms (roughly 3x slower than Rust).
### 2. A2A Framework & Protocol Overhead
• Minimal Overhead (Rust & Node.js):
• Rust and Node.js use lightweight custom HTTP wrappers. Their framework overhead is practically negligible (< 2 ms). For example, Node's total round-trip time (RTT) for $N=22$ is 215.78 ms (only 1.72 ms higher than the raw calculation time).
• Significant Base Overhead (Go & Python):
• Both Go and Python use the official google-adk library. This introduces a fixed baseline overhead per request (schema validation, logging, and metadata context processing).
• Go incurs a consistent baseline RTT of ~1.2s to 1.6s.
• Python incurs a consistent baseline RTT of ~2.3s to 3.0s.
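The overhead and slowdown figures quoted in this summary can be rechecked with a few lines of arithmetic. This sketch uses only the $N=22$ numbers reported above; the variable and function names are illustrative and do not come from the benchmark scripts.

```python
# Compute times for N = 22 in milliseconds, as reported in the summary above.
COMPUTE_MS = {"rust": 195.96, "node": 214.06, "go": 279.54, "python": 572.48}

# Node's total round-trip time for N = 22, also from the summary.
NODE_RTT_MS = 215.78


def framework_overhead_ms(rtt_ms: float, compute_ms: float) -> float:
    """Overhead is whatever part of the round trip is not spent computing."""
    return round(rtt_ms - compute_ms, 2)


def slowdown_vs(baseline: str) -> dict:
    """Ratio of each language's compute time to the baseline language's."""
    base = COMPUTE_MS[baseline]
    return {lang: round(ms / base, 2) for lang, ms in COMPUTE_MS.items()}


if __name__ == "__main__":
    print(framework_overhead_ms(NODE_RTT_MS, COMPUTE_MS["node"]))  # 1.72
    print(slowdown_vs("rust"))
```

Relative to Rust this gives ratios of about 1.09 for Node.js, 1.43 for Go, and 2.92 for Python, which matches the "roughly 3x slower" remark.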
And a deeper dive:
Analyzing Agent Initialization
Based on an analysis of the codebase and execution paths, here is the breakdown of why the benchmark produced these results:
──────
### Factor 1: The Massive Latency (RTT) Difference (Framework vs. Mocking)
The most striking difference is that Rust and Node.js have near-zero overhead, whereas Go takes ~1.5s and Python takes ~2.5s. This is entirely due to how the agents process the benchmark messages:
- Rust and Node.js (Mocked Agent Logic / Direct Execution):
• Both main.rs and server.ts implement custom JSON-RPC endpoints that intercept incoming queries and bypass the LLM entirely.
• They use simple regex to extract the number of primes directly from the text "Calculate the first {N} Mersenne primes", execute the math, and return a mock agent response message. Since there are no LLM API calls, the overhead is practically zero.
- Go and Python (Real Agentic Execution via ADK):
• Both Go (main.go) and Python (agent.py) use the official Agent Development Kit (google-adk) and hook up the Gemini 2.5 Flash model.
• When they receive a message, the ADK runner executes a full agent reasoning loop. This involves two sequential remote LLM calls (one to determine which tool to call, and another to formulate the final natural language response).
• This remote LLM invocation via the internet accounts for the 1.5–3.0 second latency overhead.
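The regex shortcut described for the Rust and Node.js agents can be illustrated in Python. This is a sketch of the idea only, not the code in main.rs or server.ts: the pattern and the `extract_prime_count` helper are assumptions based on the prompt text quoted above.

```python
import re
from typing import Optional

# Assumed pattern, modeled on the prompt "Calculate the first {N} Mersenne primes".
_PROMPT_RE = re.compile(r"first\s+(\d+)\s+Mersenne\s+primes", re.IGNORECASE)


def extract_prime_count(text: str) -> Optional[int]:
    """Pull N out of the benchmark prompt without calling an LLM."""
    match = _PROMPT_RE.search(text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(extract_prime_count("Calculate the first 22 Mersenne primes"))  # 22
    print(extract_prime_count("What is the weather today?"))  # None
```

Because the count is parsed locally, the request never leaves the process, which is consistent with the near-zero overhead reported for these two agents.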
──────
### Factor 2: Mathematical Computation Speeds (Lucas-Lehmer Loop)
At higher values (e.g., $N=22$, where the Mersenne number $2^{9941}-1$ has 2,993 decimal digits and the Lucas-Lehmer test requires 9,939 iterations of squaring and modular reductions), we see clear performance tiers:
- Rust (195.96 ms):
• Rust is compiled to optimized native machine code (using --release). The num-bigint crate executes the big integer arithmetic without garbage collection overhead, making it the most efficient compiled loop.
- Node.js (214.06 ms):
• V8's JIT compiler is extremely good at compiling JavaScript loops to native machine instructions. Additionally, V8's native BigInt implementation is heavily optimized at the engine level (written in assembly/C++).
- Go (279.54 ms):
• While Go compiles to native code, its implementation in main.go performs heap allocations inside the hot loop:
a := new(big.Int).Rsh(s, uint(p))
b := new(big.Int).And(s, m)
Allocating new big.Int objects inside a loop that runs thousands of times puts extra pressure on Go's runtime allocator and garbage collector, dragging down performance compared to Rust.
- Python (572.48 ms):
• Python handles arbitrary-precision integers natively, but executing the loop in interpreted bytecode has a lot of interpreter overhead compared to compiled or JIT-compiled languages. 9,939 iterations of the loop in bytecode is the bottleneck.
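For reference, the Lucas-Lehmer loop being timed can be sketched in a few lines of Python. This is an illustrative version, not the code from agent.py. It assumes the agents search prime exponents in increasing order, and it uses the same shift-and-mask reduction modulo $2^p - 1$ that the Go snippet above hints at. All function names here are hypothetical.

```python
import time


def is_small_prime(n: int) -> bool:
    """Trial division; only used on the (small) exponents p."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # M_2 = 3 is prime; the loop below needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        x = s * s - 2
        if x < 0:  # only happens when s is 0 or 1
            x += m
        # Reduce modulo 2**p - 1: since 2**p is congruent to 1,
        # the high bits can simply be added to the low p bits.
        while x >> p:
            x = (x & m) + (x >> p)
        s = 0 if x == m else x
    return s == 0


def first_mersenne_exponents(n: int) -> list:
    """Collect the exponents p of the first n Mersenne primes."""
    found, p = [], 2
    while len(found) < n:
        if is_small_prime(p) and lucas_lehmer(p):
            found.append(p)
        p += 1
    return found


if __name__ == "__main__":
    start = time.perf_counter()
    exponents = first_mersenne_exponents(10)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(exponents, f"{elapsed_ms:.2f} ms")
```

The inner loop runs $p - 2$ times per candidate, so the final exponent for $N=22$ accounts for the 9,939 iterations mentioned above. Timings from this sketch will differ from the benchmark numbers, since the actual agent code is not reproduced here.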
Core Findings
For low numbers of Mersenne primes, which execute quickly, all four languages are comparable. An unusual result was the good mid-range performance from Node. This could be due to the lower overhead of the Node A2A toolkit or to JavaScript engine optimizations. As expected, Go and Rust did break out and delivered the best performance as the complexity scaled.
Summary
The goal of the demo/article was to get basic Agents implemented across multiple programming languages and benchmark their performance at finding Mersenne primes.
The Google Agent Development Kit (ADK) was presented along with the complementary A2A (Agent to Agent) protocol. Three basic agents were presented, covering various combinations of programming languages and Agent implementation approaches.
Finally, a Master/Orchestrator agent was started to connect and delegate to the other agents via the A2A protocol. Antigravity CLI was used to connect to the Master Agent over MCP and execute the benchmarks.
### Cross Language A2A Agent Benchmarking with Gemini 3 and Antigravity CLI