So what
The model market is too crowded for one leaderboard.
The Pareto frontier is the right first map because it forces the question leaderboards skip: "Is a cheaper model already good enough?" It also gives you a consistent way to read market movement:
- Frontier shifts down and right: capability is getting cheaper at a given intelligence band.
- Frontier gets thinner: more models are being dominated; the rational shortlist is shrinking.
- A lab loses frontier slots while retaining benchmark rank: its premium is harder to defend on general workloads. OpenAI's current position is the clearest example of this.
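All three readings reduce to one computation: find the models that no other model beats on both axes, then compare that set across snapshots. Here is a minimal Python sketch, assuming each model is a (name, intelligence, price) record. The `Model` type, the function names, and every model name and number are made up for illustration; none are taken from the Artificial Analysis data.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str            # hypothetical model label
    intelligence: float  # index score, higher is better
    price: float         # blended USD per 1M tokens, lower is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models not dominated on (higher intelligence, lower price)."""
    # Cheapest first; among equal prices, the smartest comes first.
    ordered = sorted(models, key=lambda m: (m.price, -m.intelligence))
    frontier: list[Model] = []
    best_so_far = float("-inf")
    for m in ordered:
        # Keep a model only if it is smarter than everything at or below its price.
        # Exact ties on both axes keep just the first model seen.
        if m.intelligence > best_so_far:
            frontier.append(m)
            best_so_far = m.intelligence
    return frontier


def frontier_changes(old: list[Model], new: list[Model]) -> dict[str, set[str]]:
    """Names that entered or dropped off the frontier between two snapshots."""
    old_names = {m.name for m in pareto_frontier(old)}
    new_names = {m.name for m in pareto_frontier(new)}
    return {"entered": new_names - old_names, "dropped": old_names - new_names}


# Illustrative snapshots only (invented values).
before = [Model("alpha", 60, 10.0), Model("beta", 50, 2.0), Model("gamma", 40, 0.5)]
after = before + [Model("delta", 58, 1.5)]

print([m.name for m in pareto_frontier(after)])
print(frontier_changes(before, after))
```

In this toy data the new entrant `delta` is both cheaper and stronger than `beta`, so `beta` drops off the shortlist while keeping its score. That is the "loses frontier slots while retaining benchmark rank" pattern in miniature.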
The open thread: the next useful frontier is not two-dimensional. Production routing needs intelligence, price, latency, context length, coding score, reliability, and maybe modality in the same view. The hard part is not drawing that chart. The hard part is deciding which dimensions are universal enough to compare across workloads, and which ones only matter inside a specific application. That is why model evaluation is not a one-time exercise — it is the ongoing work of keeping your routing decisions honest.
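The dominance test itself extends to any number of dimensions, which makes the point concrete: the code is short, while the choice of metrics and their directions is the judgment call. The sketch below assumes each model is a dict of metrics and that the caller states whether higher or lower is better for each one. The metric names (`latency_s`, `context_k`) and all values are hypothetical.

```python
# Direction per metric: +1 means higher is better, -1 means lower is better.
# Which metrics belong here is the workload-specific decision.
DIRECTIONS = {"intelligence": +1, "price": -1, "latency_s": -1, "context_k": +1}


def dominates(a: dict, b: dict, directions: dict[str, int]) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    diffs = [sign * (a[key] - b[key]) for key, sign in directions.items()]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def frontier(models: dict[str, dict], directions: dict[str, int]) -> list[str]:
    """Names of models that no other model dominates (simple O(n^2) scan)."""
    return [
        name
        for name, metrics in models.items()
        if not any(
            dominates(other, metrics, directions)
            for other_name, other in models.items()
            if other_name != name
        )
    ]


# Invented values for illustration.
models = {
    "alpha": {"intelligence": 60, "price": 10.0, "latency_s": 4.0, "context_k": 200},
    "beta": {"intelligence": 50, "price": 2.0, "latency_s": 0.8, "context_k": 128},
    "delta": {"intelligence": 58, "price": 1.5, "latency_s": 2.5, "context_k": 128},
}

two_d = frontier(models, {"intelligence": +1, "price": -1})
four_d = frontier(models, DIRECTIONS)
print(two_d, four_d)
```

On two axes `beta` is dominated by `delta`. Once latency is included, `beta` returns to the frontier because it is the fastest option. Each added dimension tends to enlarge the non-dominated set, so a frontier built on every available metric can end up excluding almost nothing.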
Data: Artificial Analysis Intelligence Index v4.0.4, blended price 3:1 input:output. Snapshots: 2026-05-25, 2026-05-31, 2026-06-02.