Copied to Clipboard
how AI is benchmarked and the recent piece on
why a leaderboard can mislead.
Speed is no longer the closed labs' advantage, either. The hosting company Baseten demonstrated that it could serve the leading open model at hundreds of tokens a second on the newest chips (how they built it). "Open" no longer has to mean "slow" or "run it yourself on a sluggish home rig." Frontier-class responsiveness is available from a model whose weights are public, removing one of the last reasons businesses defaulted to closed providers.
The dynamic is straightforward: renting versus owning. A hosted model is renting -- convenient, always maintained, but the landlord can change the locks. An open model is owning -- more responsibility, more setup, but nobody can evict you. For years renting was clearly the better deal because the rentals were nicer. This week reminded everyone that eviction can come with no notice, and separately, that the houses you can own have gotten very nice indeed. The combination is what drives the surge of attention.
The caveats are real. First, the specifications these labs advertise -- how big the models are, how they are built -- are largely self-reported and have not been independently verified, so the spec sheets should be treated as marketing until outside analysis catches up. Second, matching the frontier in one test of office tasks is not matching it everywhere; these models can still trail on the hardest reasoning and the longest, messiest jobs. Third, the biggest of them demand serious, expensive hardware to run well, which means the insurance policy is genuinely practical for a company with a server budget and mostly aspirational for an individual with a single graphics card. The shift is real, but it is a shift in the strategic logic of who depends on whom -- not a claim that open has already won.
Originally published on Ground Truth, where every claim is checked against the primary source.