We Classified 16,635 OpenClaw Skill Complaints. Wrong Output Is the #1 Failure Mode.

The 1.4% counts users who recognized a security problem and filed a report. It does not count users who were affected.

What This Means for Skill Authors and Installers

The dominant failure mode, silent wrong output, is not caught by any automatic detection. No error is raised. No alert fires. The skill appears healthy by every operational metric.

For skill authors: return explicit errors on unexpected input rather than plausible-looking wrong output. A skill that fails loudly is easier to debug than one that succeeds silently with bad results. Write correctness tests against representative data before publishing.
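As a rough illustration of failing loudly, here is a minimal Python sketch. The `parse_total` function and the `SkillInputError` class are hypothetical names invented for this example and are not part of OpenClaw or any skill in the dataset. The idea is that input the skill cannot interpret raises an error instead of quietly becoming a default such as 0.

```python
from decimal import Decimal, InvalidOperation


class SkillInputError(ValueError):
    """Raised when the skill cannot produce a trustworthy result."""


def parse_total(raw: str) -> Decimal:
    """Parse an amount such as "1,204.50" into a Decimal.

    Hypothetical skill logic: unexpected input raises SkillInputError
    rather than falling back to a plausible-looking default.
    """
    cleaned = raw.strip().replace(",", "")
    if not cleaned:
        raise SkillInputError("empty amount")
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise SkillInputError(f"not a number: {raw!r}") from exc
    # Decimal accepts "NaN" and "Infinity"; neither is a usable amount.
    if not value.is_finite() or value < 0:
        raise SkillInputError(f"amount out of range: {raw!r}")
    return value
```

A caller that receives `SkillInputError` knows the result is missing. A caller that receives `Decimal("0")` for the input `"N/A"` has no way to tell that anything went wrong.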

For installers: test skills with representative inputs before putting them in any workflow that touches real data. Monitor outputs, not just uptime. A skill that returns something is not the same as a skill that returns the right thing.
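One way an installer might check outputs rather than uptime is sketched below in Python. The `check_skill` helper and the `title_case_skill` stand-in are hypothetical and assume the skill can be called as a plain function; a real OpenClaw skill would need its own invocation wrapper. The sketch compares actual output against known-good answers for representative inputs.

```python
from typing import Any, Callable, Iterable


def check_skill(skill: Callable[[Any], Any],
                cases: Iterable[tuple[Any, Any]]) -> list[str]:
    """Run `skill` on (input, expected) pairs and describe each mismatch."""
    failures = []
    for given, expected in cases:
        try:
            actual = skill(given)
        except Exception as exc:  # a loud failure is still recorded
            failures.append(f"{given!r}: raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


# Stand-in for an installed skill. It never raises, but it silently
# flattens mixed-case words, which only an output check reveals.
def title_case_skill(text: str) -> str:
    return text.title()


cases = [
    ("hello world", "Hello World"),
    ("openClaw skill", "OpenClaw Skill"),
]
failures = check_skill(title_case_skill, cases)
for line in failures:
    print(line)
```

The stand-in skill returns a string for every input, so an uptime or error-rate monitor would report it as healthy. Only the comparison against expected output flags the second case.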

Full methodology and dataset: vesselofone.com/research/ai-agent-skills-ecosystem. Dataset at doi.org/10.5281/zenodo.19691714. Scan scripts at github.com/vesselofone/openclaw-skills under MIT + CC BY 4.0.

A free per-skill auditor covering SKILL.md intent, OAuth scope width, and injection patterns: vesselofone.com/tools/skill-check.

Vessel is managed OpenClaw hosting on private Linux VMs. Every agent we provision runs the skill auditor at setup. The research and dataset are open source.

Mehul Bhardwaj

Agentic stuff is fascinating! Focussed on https://vesselofone.com
