each MCP server the user configures clogs up the context with instructions and tool definitions, whether the agent needs them for this conversation or not. More telemetry in the context window is not more signal. Past a point it is less, because the agent spends its attention budget on data nobody decided was relevant.
A Healthy Baseline Is a Decision, Not a Reading
So which signals should the agent read? That is not a question the platform answers. It is a question an operator answers.
Consider what it takes to say "this is healthy." Specifically: the p99 latency on the checkout path should sit under 300 milliseconds — except during the nightly reindex, when 900 is fine and expected. Error rate on the auth service above 0.5% is a page; the same rate on the recommendations service is a shrug because it fails open. The queue depth metric that the dashboard renders in red is actually the metric the team stopped trusting after the last migration, and the real signal moved to a different counter that the dashboard does not even show. Every one of those is a judgment call. Every one of them was made by someone who got paged, traced the incident, and decided what the line should be.
That is why the telemetry standard an agent reads against has to be a praxis output. It comes from the people who run the system, written down from the actual experience of operating it — not handed down by a platform team that provisions the dashboards and never gets paged. The platform team can give you the stream. They cannot tell you that a spike in the third service is meaningless and a flutter in the first one means drop everything, because they have never been on the wrong end of either at 2 a.m.
The Shaping Role Is Underbuilt
The infrastructure to do this is further along than the discipline to use it well. Nearly 90 percent of enterprises now run internal platforms, with observability sitting inside the dozen platform capabilities that are now table stakes, according to the field's 2025 year-end review drawing on the State of Platform Engineering and the 2025 DORA report. The same review found that AI adoption and platform maturity now drive each other, which means more agents reading more telemetry, soon.
But only about a quarter of those organizations have a dedicated platform product manager. The platform exists. The role that decides what the platform's signals mean — the person who turns raw telemetry into a standard an agent can act on — is mostly not staffed. We are wiring agents into observability streams faster than we are deciding who owns the question of which signals matter. That gap is where the bad agent behavior is going to come from.
The Standard Is Being Written by Practitioners
There is a healthier version of this already taking shape, and it is being built by the people who operate the systems, not specified from above. OpenTelemetry's AI-agent observability work, driven by its GenAI special interest group, is an effort to standardize the shape of the telemetry that agent apps emit. The point is explicit in their own framing: for a non-deterministic agent, telemetry is a feedback loop used to evaluate and improve the agent's behavior, and the community needs to set standards around that telemetry's shape to avoid lock-in.
That is the model. The standard for what an agent reads gets authored by practitioners — maintainers, on-call engineers, the people who feel the consequences — and it gets written down where the agent can use it. The research backs this up: the documented patterns for feeding telemetry to agents already include local prompt iteration, CI-based optimization, and autonomous agents that adapt using telemetry, built on named frameworks like DSPy and PromptWizard. This is not speculation about a future capability. The architecture exists. What is still missing on most teams is the human decision about what the agent should pay attention to.
What to Actually Do
If you are deciding whether to give your coding agent observability data, give it. The agent that sees production is better than the agent that does not. But do not hand it the raw firehose and call it context.
Write down the baselines. Which paths are hot, what normal looks like on each, which alerts are real and which are theater. Name the anomalies that matter and the ones the team learned to ignore. Make that document the thing the agent reads against, not the unfiltered metric stream. And put it in the hands of someone who actually runs the system, because the value of telemetry to your agent is exactly equal to the quality of the judgment that shaped it. The data is access-rent. The standard is the asset.