Revisiting Benchmarking: Building a Rust A2A Agent

DEV Community

http://127.0.0.1:8100/mcp server.py:2055

INFO: Started server process [21826]

INFO: Waiting for application startup.

INFO: Application startup complete.

INFO: Uvicorn running on http://127.0.0.1:8100 (Press CTRL+C to quit)

Antigravity CLI with MCP Connection

The Master Agent starts a FastMCP server interface. This allows Antigravity CLI to be used as an MCP client:

xbill@penguin:~/a2a-benchmark/.agents$ more mcp_config.json

{
  "mcpServers": {
    "bench": {
      "serverUrl": "http://127.0.0.1:8100/mcp",
      "serverURL": "http://127.0.0.1:8100/mcp"
    }
  }
}
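As a minimal sketch of how a script could read this file, the snippet below parses the same structure and looks up the endpoint for the `bench` entry. The helper name `read_server_url` is hypothetical and not part of any MCP tooling; it accepts either key spelling only because the config above lists both `serverUrl` and `serverURL`.

```python
import json

# Same structure as the mcp_config.json shown above.
CONFIG_TEXT = """
{
  "mcpServers": {
    "bench": {
      "serverUrl": "http://127.0.0.1:8100/mcp",
      "serverURL": "http://127.0.0.1:8100/mcp"
    }
  }
}
"""


def read_server_url(config: dict, name: str) -> str:
    """Return the MCP endpoint for `name`, accepting either key spelling.

    Hypothetical helper for illustration; not an Antigravity CLI API.
    """
    entry = config["mcpServers"][name]
    for key in ("serverUrl", "serverURL"):
        if key in entry:
            return entry[key]
    raise KeyError(f"no server URL configured for {name!r}")


config = json.loads(CONFIG_TEXT)
print(read_server_url(config, "bench"))
```

Note that `json.loads` keeps only one value per key name, so the two spellings survive here because they differ in case.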

ADK Validation of the Master Agent

Because it is Python-based, the Master Agent can also be viewed with the ADK web interface:

So What is all this Doing?

All of the setup and configuration allows you to start debugging and visualizing Agent flows using the A2A protocol. The Agents can be validated using the A2A Inspector. The Python Agents can be checked with the ADK web interface. The Master Agent can be validated with Antigravity CLI as an MCP client.

Time to Start Running Some Numbers!

Now that the individual Agents have been validated, open several Terminal windows and start up all the remaining agents:

benchmark-go.sh (port 8102)
benchmark-python.sh (port 8101)
benchmark-node.sh (port 8103)
benchmark-rust.sh (port 8104)
benchmark-master.sh, the Orchestrator Agent (port 8100)
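Before starting a session it can help to confirm that every agent is actually listening. The following is a rough sketch, assuming all agents run on localhost with the ports listed above; the function name `is_listening` and the dictionary labels are illustrative and not part of the benchmark repository.

```python
import socket

# Ports taken from the list above; the labels are only for display.
AGENT_PORTS = {
    "go": 8102,
    "python": 8101,
    "node": 8103,
    "rust": 8104,
    "master": 8100,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in AGENT_PORTS.items():
        state = "up" if is_listening(port) else "DOWN"
        print(f"{name:<7} port {port}: {state}")
```

This only checks that something accepts TCP connections on the port; it does not verify that the process behind it speaks A2A.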

Start the Benchmarking Session

Once all the agents have been started, the Master agent on port 8100 is connected over MCP with Antigravity CLI:

I have completed the benchmark run across all four language agents for primes 1 to 22.

### Summary of Actions Taken:

Agent Startup & Verification: Automatically checked and started all four target language agents (Node.js/TypeScript, Rust, Go, Python) and verified they were active and listening on their respective A2A JSON-RPC ports.
Benchmark Scripting: Created a customized benchmarking script run_benchmark_1_22.py that targets prime calculations from $N = 1$ to $N = 22$.
Execution: Ran the benchmark and recorded the exact computation time (within the tool call) and the total Agent-to-Agent (A2A) network/orchestration latency (RTT) for every prime count $N$.
Data Plotting: Generated visualizations for both calculation times (prime_calculation_times_1_22.png) and overall latency (a2a_latency_times_1_22.png).
Reporting: Compiled all calculations and visualizations into the analysis_results.md artifact.

Please view analysis_results.md to inspect the table and embedded plots showing the comparison of calculation speeds and latency characteristics!
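The generated run_benchmark_1_22.py script is not shown, so the following is only a sketch of the measurement idea described above: time the whole request from the caller's side (RTT) and compare it with the computation time the agent reports for its own tool call. The names `timed_call` and `fake_agent` are hypothetical, and `fake_agent` merely sleeps in place of a real A2A request.

```python
import time
from typing import Callable, Tuple


def timed_call(send: Callable[[int], float], n: int) -> Tuple[float, float]:
    """Call `send(n)` and return (compute_ms, rtt_ms).

    `send` stands in for the A2A request; it must return the compute time
    in milliseconds that the agent reported for its own tool call.
    """
    start = time.perf_counter()
    compute_ms = send(n)
    rtt_ms = (time.perf_counter() - start) * 1000.0
    return compute_ms, rtt_ms


def fake_agent(n: int) -> float:
    """Hypothetical stand-in agent: 'computes' for n ms, then adds ~5 ms overhead."""
    t0 = time.perf_counter()
    time.sleep(n / 1000.0)  # pretend work
    compute_ms = (time.perf_counter() - t0) * 1000.0
    time.sleep(0.005)  # pretend framework/network overhead
    return compute_ms


for n in (1, 10, 22):
    compute_ms, rtt_ms = timed_call(fake_agent, n)
    print(
        f"N={n:>2} compute={compute_ms:7.2f} ms rtt={rtt_ms:7.2f} ms "
        f"overhead={rtt_ms - compute_ms:6.2f} ms"
    )
```

Keeping the two timers separate is what lets the report distinguish language speed from framework and network overhead.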

Review Performance

The results of the benchmark are graphed:

With latency times:

Final Results

Antigravity CLI summarized the results of the Agent Benchmarking:

Formulating Key Findings

Here is a summary of the key findings from the benchmark results for primes 1 to 22:

### 1. Core Computation Performance (Lucas-Lehmer Prime Calculation)

• Rust is the fastest: Computes $N=22$ in 195.96 ms.

• Node.js (TS) is extremely close: Computes $N=22$ in 214.06 ms.

• Go performs well: Computes $N=22$ in 279.54 ms.

• Python is the slowest: Computes $N=22$ in 572.48 ms (roughly 3x slower than Rust).

### 2. A2A Framework & Protocol Overhead

• Minimal Overhead (Rust & Node.js):

• Rust and Node.js use lightweight custom HTTP wrappers. Their framework overhead is practically negligible (< 2 ms). For example, Node's total round-trip time (RTT) for $N=22$ is 215.78 ms (only 1.72 ms higher than the raw calculation time).

• Significant Base Overhead (Go & Python):

• Both Go and Python use the official google-adk library. This introduces a fixed baseline overhead per request (schema validation, logging, and metadata context processing).

• Go incurs a consistent baseline RTT of ~1.2s to 1.6s.

• Python incurs a consistent baseline RTT of ~2.3s to 3.0s.
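The arithmetic behind these findings can be reproduced from the quoted numbers alone. This sketch uses only the $N=22$ figures given in the summary; the variable and function names are illustrative. Only Node's RTT is quoted exactly, so it is the only agent whose overhead is derived here.

```python
# Compute times for N=22 in milliseconds, copied from the summary above.
compute_ms = {"rust": 195.96, "node": 214.06, "go": 279.54, "python": 572.48}

# Only Node's N=22 round-trip time is quoted exactly in the summary.
node_rtt_ms = 215.78


def slowdown_vs(baseline: str, times: dict) -> dict:
    """Return each entry's time divided by the baseline's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


ratios = slowdown_vs("rust", compute_ms)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<7} {compute_ms[name]:7.2f} ms  {ratio:4.2f}x")

# Framework overhead = round-trip time minus raw calculation time.
node_overhead_ms = node_rtt_ms - compute_ms["node"]
print(f"node framework overhead: {node_overhead_ms:.2f} ms")
```

Relative to Rust this gives roughly 1.09x for Node.js, 1.43x for Go, and 2.92x for Python, which matches the "roughly 3x" remark, and a Node overhead of 1.72 ms.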

and a deeper dive:

Analyzing Agent Initialization

Based on an analysis of the codebase and execution paths, here is the breakdown of why the benchmark produced these results:

──────

### Factor 1: The Massive Latency (RTT) Difference (Framework vs. Mocking)

The most striking difference is that Rust and Node.js have near-zero overhead, whereas Go takes ~1.5s and Python takes ~2.5s. This is entirely due to how the agents process the benchmark messages:

Rust and Node.js (Mocked Agent Logic / Direct Execution):

• Both main.rs and server.ts implement custom JSON-RPC endpoints that intercept incoming queries and bypass the LLM entirely.

• They use simple regex to extract the number of primes directly from the text "Calculate the first {N} Mersenne primes", execute the math, and return a mock agent response message. Since there are no LLM API calls, the overhead is practically zero.

Go and Python (Real Agentic Execution via ADK):

• Both Go (main.go) and Python (agent.py) use the official Agent Development Kit (google-adk) and hook up the Gemini 2.5 Flash model.

• When they receive a message, the ADK runner executes a full agent reasoning loop. This involves two sequential remote LLM calls (one to determine which tool to call, and another to formulate the final natural language response).

• This remote LLM invocation via the internet accounts for the 1.5–3.0 second latency overhead.
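The regex shortcut described for the Rust and Node.js agents can be illustrated as follows. The actual patterns in main.rs and server.ts are not shown in this article, so this is a Python sketch of the idea only; `PROMPT_RE` and `extract_prime_count` are hypothetical names.

```python
import re
from typing import Optional

# Matches the prompt format quoted above; case-insensitive to be forgiving.
PROMPT_RE = re.compile(
    r"calculate the first\s+(\d+)\s+mersenne primes", re.IGNORECASE
)


def extract_prime_count(text: str) -> Optional[int]:
    """Return N from 'Calculate the first N Mersenne primes', or None."""
    match = PROMPT_RE.search(text)
    return int(match.group(1)) if match else None


print(extract_prime_count("Calculate the first 22 Mersenne primes"))  # 22
print(extract_prime_count("What is the weather today?"))  # None
```

Because the number is pulled straight out of the text, no model call is needed to decide which tool to run, which is why these agents show almost no overhead.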

──────

### Factor 2: Mathematical Computation Speeds (Lucas-Lehmer Loop)

At higher values (e.g., $N=22$, where the Mersenne number $2^{9941}-1$ has 2,993 decimal digits and the Lucas-Lehmer test requires 9,939 iterations of squaring and modular reductions), we see clear performance tiers:

Rust (195.96 ms):

• Rust is compiled to optimized native machine code (using --release). The num-bigint crate executes the big integer arithmetic without garbage collection overhead, making it the most efficient compiled loop.

Node.js (214.06 ms):

• V8's JIT compiler is extremely good at compiling JavaScript loops to native machine instructions. Additionally, V8's native BigInt implementation is heavily optimized at the engine level (written in assembly/C++).

Go (279.54 ms):

• While Go compiles to native code, its implementation in main.go performs heap allocations inside the hot loop:

    a := new(big.Int).Rsh(s, uint(p))
    b := new(big.Int).And(s, m)

Allocating new big.Int objects inside a loop that runs thousands of times puts extra pressure on Go's runtime allocator and garbage collector, dragging down performance compared to Rust.

Python (572.48 ms):

• Python handles arbitrary-precision integers natively, but executing the loop in interpreted bytecode has a lot of interpreter overhead compared to compiled or JIT-compiled languages. 9,939 iterations of the loop in bytecode is the bottleneck.
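For readers who want to see the loop being timed, here is a minimal Python sketch of the Lucas-Lehmer test. It is not the code from any of the four agents, which is not reproduced in this article. The shift-and-mask reduction mirrors the Rsh/And pair quoted from main.go, relying on $2^p \equiv 1 \pmod{2^p-1}$; all function names are illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, adequate for small exponents."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: is 2**p - 1 prime? Expects p itself to be prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the loop below applies to odd p only
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = s * s
        # Fold the high bits onto the low bits: 2**p is congruent to 1 mod m.
        s = (s >> p) + (s & m)
        if s >= m:
            s -= m
        s -= 2
        if s < 0:
            s += m
    return s == 0


def first_mersenne_exponents(count: int) -> list:
    """Return the exponents p of the first `count` Mersenne primes."""
    found = []
    p = 2
    while len(found) < count:
        if is_prime(p) and is_mersenne_prime(p):
            found.append(p)
        p += 1
    return found


print(first_mersenne_exponents(10))
# [2, 3, 5, 7, 13, 17, 19, 31, 61, 89]
```

For an exponent p the loop runs p - 2 times, which is where the 9,939 iterations for p = 9941 come from. Each iteration squares a number of up to p bits, so the big-integer multiply dominates the cost as N grows.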

Core Findings

For low numbers of Mersenne primes that execute quickly, all four languages are comparable. An unusual result was the good performance from Node in the mid range. This could be due to the lower overhead of the Node A2A toolkit, or to JavaScript engine optimizations. As expected, Rust broke out and delivered the best performance as the complexity scaled; Go also finished well ahead of Python, although Node edged it out at $N=22$.

Summary

The goal of the demo/article was to get basic Agents implemented across multiple programming languages and benchmark their performance when finding Mersenne primes.

The Google Agent Development Kit (ADK) was presented along with the complementary A2A (Agent-to-Agent) protocol. Four basic agents were presented, covering various combinations of programming languages and Agent implementation approaches.

Finally, a Master/Orchestrator agent was started to connect and delegate to the other agents via the A2A protocol. Antigravity CLI was used to connect to the Master Agent over MCP and execute the benchmarks.

Top comments (1)

xulingfeng

• Jun 1

Multi-language agent benchmarking is a space we have been poking at too — the cross-language orchestration overhead is real, especially when bouncing between Node and Python. The A2A-as-MCP-server approach for the orchestrator is a nice pattern, curious if you hit any timeout walls with the Rust agent on Lambda cold starts.