Skills don't work the way we think they do

One of the most hyped ideas in agent research is:

"Let the model write its own tools / skills."

But the SkillBench results suggest it is mostly wasted effort: self-generated skills produced no meaningful improvement over the baseline.

In some cases, they made performance worse.

Today's models simply cannot reliably create useful reusable procedural abstractions.

This matters because a huge part of current agent research assumes models can recursively improve by generating better skills/tools. This benchmark suggests that assumption is premature.

SkillBench chart showing self-generated skills did not meaningfully improve performance

Human-made skills work A LOT better

When Skills were carefully written by humans, performance jumped +16.2 percentage points on average.

But here's what's even more surprising:

Domain variance was extreme

Some domains saw small gains (~4-5 pp)
Others saw enormous gains (~50+ pp)

SkillBench chart showing high domain variance for human-made skills

Skills don't help equally across fields. They disproportionately help in structured, procedural domains.

Smaller models + skills ≈ bigger models without skills

A smaller model with curated Skills matched or exceeded a larger model without Skills.

This is huge for cost optimization:

Local agents
Edge deployment
Open-source models

Too many skills can hurt

Overly broad or verbose skill libraries degraded performance. Focused, minimal skill modules performed better.

SkillBench result showing too many skills can degrade performance

Pick your skills carefully. 2-3 skills work better than 4+ skills.
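
One way to act on this is to cap how many skills an agent loads for a given task. The snippet below is only a rough sketch, not anything taken from the paper: the `Skill` class, the keyword-overlap scoring, the example library, and the cap of 3 are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumption: the cap comes from the "2-3 skills" rule of thumb, not from a tuned value.
MAX_SKILLS = 3


@dataclass(frozen=True)
class Skill:
    """Hypothetical stand-in for a skill file: a name plus trigger keywords."""
    name: str
    keywords: frozenset[str]


def select_skills(task: str, library: list[Skill], max_skills: int = MAX_SKILLS) -> list[Skill]:
    """Return at most max_skills skills whose keywords overlap the task text."""
    words = set(task.lower().split())
    # Score each skill by how many of its keywords appear in the task.
    scored = [(len(skill.keywords & words), skill) for skill in library]
    # Drop skills with no overlap so irrelevant ones are never loaded.
    relevant = [(score, skill) for score, skill in scored if score > 0]
    # Highest overlap first; the sort is stable, so ties keep library order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in relevant[:max_skills]]


# Hypothetical skill library, for illustration only.
library = [
    Skill("csv-cleanup", frozenset({"csv", "columns", "rows"})),
    Skill("sql-migrations", frozenset({"sql", "schema", "migration"})),
    Skill("git-bisect", frozenset({"git", "bisect", "regression"})),
    Skill("pdf-forms", frozenset({"pdf", "form"})),
    Skill("plotting", frozenset({"plot", "chart", "csv"})),
]

chosen = select_skills("clean the csv columns and plot a chart", library)
print([skill.name for skill in chosen])  # ['plotting', 'csv-cleanup']
```

Keyword overlap is a deliberately crude relevance signal. The point is the shape of the design: filter out irrelevant skills first, then enforce a hard cap, so the agent never sees the whole library at once.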

Here is my takeaway

If this paper is right (and I think it is, mostly because of my personal experience with skill files):

Scaling alone isn't enough
Autonomy narratives are premature
Skill architecture design is now a first-class research problem

Read the full paper: https://arxiv.org/pdf/2602.12670

port

i am port, i do dev stuff and occasionally shitpost

Joined Nov 20, 2024