Inside Systems 01: Your Verification Process Did Not Break. It Was Replaced.

DEV Community

The instinct Vikram built in 2017 required being the person who passed the unsourced number. The tool creates conditions where that experience is less likely to occur. Which means the conditions that would update or reinforce that instinct are also less likely to occur.

The instinct does not disappear. It atrophies through reduced exercise in the specific conditions that maintain it.

The Calibration Problem

The cost that is already running is not located in any single output.

It is located in the calibration itself.

Vikram's judgment about what needs verification is being trained on tool outputs. When an output is accurate, he files: this kind of output does not require checking. When an output is accurate again, he files the same. Over five months of mostly accurate outputs, the prior probability that any given output needs checking has shifted downward without a deliberate decision.

This is rational updating on available evidence. The problem is that the evidence set is missing the failure cases that would correct it. The tool's errors are not randomly distributed across outputs. They cluster in specific domains, specific types of claims, specific question shapes where the training produced confident-sounding text that does not correspond to verifiable fact. Those failure modes are not visible in the surface form of the output. You encounter them by checking, and the rate at which you check has been declining.

The calibration is updating on a sample that excludes the data most relevant to calibration.
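The mechanism can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a model of any particular tool: the error rate, the check rates, and the function name `simulate_calibration` are all hypothetical, and errors are drawn uniformly at random for simplicity even though real errors cluster. Unchecked outputs are counted as accurate, which is how the filing described above works in practice.

```python
import random


def simulate_calibration(n_outputs=1000, true_error_rate=0.05,
                         start_check_rate=0.5, end_check_rate=0.05, seed=0):
    """Compare the true error rate with the rate a declining checker perceives.

    All parameter values are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    true_errors = 0
    caught_errors = 0
    for i in range(n_outputs):
        # The check rate declines linearly as trust accumulates.
        frac = i / max(n_outputs - 1, 1)
        check_rate = start_check_rate + (end_check_rate - start_check_rate) * frac
        is_wrong = rng.random() < true_error_rate
        is_checked = rng.random() < check_rate
        if is_wrong:
            true_errors += 1
            # An error only enters the evidence set if the output was checked.
            if is_checked:
                caught_errors += 1
    return {
        "true_rate": true_errors / n_outputs,
        # Unchecked outputs are filed as accurate, so only caught errors count.
        "perceived_rate": caught_errors / n_outputs,
    }


if __name__ == "__main__":
    print(simulate_calibration())
```

The perceived rate can never exceed the true rate, and the gap widens as the check rate falls. Only checking every output closes it, which is the option the efficiency argument rules out.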

The Identity Still Feels Intact

The identity this behavior protects is worth naming directly.

Vikram considers himself someone who does not cut corners on accuracy. This is not self-flattery in his case. He has a documented history of careful work, built over seven years and including one expensive correction that produced a lasting rule. The professional identity is real and earned.

The behavior of not applying the sourcing rule to tool outputs is not experienced as cutting a corner. It is experienced as appropriate adaptation to a new tool. A different kind of process, not a reduced one.

Both things are simultaneously true: the identity is accurate, and the behavior contradicts what the identity would require if applied consistently. They coexist because the new process feels substantively different from what the rule was designed to prevent, even when the failure mode it prevents is identical.

The rule was built to catch unsourced numbers passed to clients. The tool produces unsourced numbers with excellent surface form. The rule does not automatically transfer because the experience of using the tool does not feel like the experience that created the rule.

The Internal Argument

Here is the internal argument that does not resolve cleanly.

One side: the tool is a productivity tool, and using it well requires calibrating trust to actual performance. The outputs are reliable enough that applying the full verification protocol to every one of them would eliminate the efficiency gain that justifies using the tool. Treating it with blanket suspicion is not a practical working posture.

Other side: the failure modes are invisible in the surface form, and the calibration is updating on a biased sample. The instincts atrophying through reduced exercise are the ones most needed for the cases the tool gets wrong, and those cases are exactly the ones where no surface signal flags the need for checking.

Both arguments are structurally sound. Neither defeats the other.

The working resolution most people land on is implicit rather than explicit: they verify when something feels uncertain and trust when it does not. The problem with that resolution is that the tool's errors frequently do not feel uncertain. They arrive feeling like the outputs that were correct, because they were produced by the same process.

Feeling uncertain is not a reliable trigger for verification when that feeling is absent from the class of errors that matters most.

The Practical Adjustment

The practical adjustment is narrower than it sounds.

Not full verification of every output. That eliminates the tool's value and is not what careful work requires even without the tool.

One question, asked after reading any output that will be passed forward: what specifically would need to be true for this to be correct, and have I confirmed any of it independently?

The question does not require answering yes. It requires being asked.

Vikram's 2017 rule was not "verify everything." It was "never forward a number you did not personally trace." The precision of the rule is what made it operational. It covered the specific failure mode he had encountered. It did not require treating everything with equal suspicion.

The equivalent for tool outputs is a similarly narrow rule, derived from the actual failure modes of the actual tool you are using, applied to the specific class of claims where those failure modes cluster.

That rule does not exist in generic form. You have to build it from your own failure cases.

Which means you have to be present when the failure cases occur.

Vikram's March Incident

Vikram found his failure case in March, eight months into regular use.

A unit cost figure in a supplier analysis. Accurate-sounding, appropriately specific, surrounded by a well-constructed paragraph. Wrong by 23 percent. The paragraph was so well constructed that he had read it as confirmation of a figure he vaguely remembered rather than as a claim requiring independent verification.

He traced the error. He updated his practice. He now asks the question after every output that contains specific quantitative claims.
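A trigger this narrow can be mechanical, so it does not depend on anything feeling uncertain. The following is a rough sketch, assuming that "specific quantitative claims" can be approximated by spotting numbers, currency amounts, and percentages in the text. The pattern and the function names `quantitative_claims` and `needs_question` are hypothetical, and the pattern is deliberately over-inclusive: it will also flag years and other harmless numbers.

```python
import re

# Hypothetical pattern: an optional currency sign, digits with optional
# thousands separators and decimals, and an optional percent marker.
NUMBER_PATTERN = re.compile(
    r"[$€£]?\d+(?:,\d{3})*(?:\.\d+)?(?:\s*(?:%|percent\b))?"
)


def quantitative_claims(text):
    """Return every numeric expression found in the output text."""
    return [match.group(0) for match in NUMBER_PATTERN.finditer(text)]


def needs_question(text):
    """True if the output contains at least one number worth tracing."""
    return bool(quantitative_claims(text))


if __name__ == "__main__":
    sample = "Unit cost is $4.20 per part, up 23 percent."
    for claim in quantitative_claims(sample):
        print("Trace before forwarding:", claim)
```

The function does not verify anything. It only decides when the question gets asked, which is the part that eight months of mostly accurate outputs had worn down.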

The adjustment took one incident to make and five minutes per output to run.

The incident cost two days of rework and one client conversation he would have preferred not to have.

The question he did not ask for eight months was: what would I need to confirm for this to be usable?

He knew how to ask it. The tool had trained him, through eight months of mostly accurate outputs, to ask it less often.

That is what the tool actually did to his process. Not through any design intent. Through the same mechanism that any reliable system uses to reduce the perceived need for verification: it was right often enough that checking felt redundant, until the specific case where it was not.