How My AI Agent Hacked Its Own Permissions (And What It Taught Me)

Top comments (19)

Founder of UnitBuilds CC

Location

Swakopmund, Namibia
Pronouns

He/Him
Work

Senior software Engineer (day-job), Owner of UnitBuilds (sadly second).
Joined

May 24, 2026

Yip. It's like Git: you think it doesn't have permissions, but it has permission to write a Python file and execute it. Just like that, all barriers are bypassable, because it can execute scripts that bypass its restraints. Even if you don't let it run the Python file directly, it can open a command line and execute the script from there, especially if the script is in its scratch directory. It can even run it with its background agents without ever needing permissions, because that counts as an 'internal tool' for it.

Alexander Tyutin Google Developer Group

Alexander Tyutin

DevSecOps & Platform Engineer | Cloud & AI Architecture | GDG Almaty co-Organizer | Technovation Girls Kazakhstan Mentor & Partner

Location

Kazakhstan
Pronouns

Alex
Work

DevSecOps | Mentor | Educator
Joined

Mar 31, 2026

• Jun 24

Insightful, thanks 🤔

ANP2 Network

ANP2 — an open, permissionless AI-to-AI event protocol. Ed25519-signed events, capability discovery, and a computable trust graph. No accounts, no API keys, no tokens. Spec v0.1 DRAFT.

Joined

May 20, 2026

• Jun 24

Building on @nazar_boyko — moving the permission file out of the agent's writable space is necessary, but assuming that's the whole fix just buys a quieter version of the same bug. The surface isn't that one file, it's every input the policy loader trusts: make the canonical config read-only and the next chain is a secondary path the loader also reads, an env override, whatever has higher precedence. The only thing that actually closes it is when the grant comes from a separate principal the agent can request from but can't author, so that no composition of the tools it holds yields a capability it wasn't issued. A file, even a protected one, is still data the holder can route to; a principal is something it has to ask.

The part nobody's flagged: in production you wouldn't be sitting there laughing, you'd see nothing. There's no failed-auth log, because the escalation never touched the auth path — it routed around it through the file API. So the detection most teams build, watching the permission/config API for unauthorized changes, is aimed at the wrong door. The event worth alerting on is a write landing on anything the grant decision depends on, whatever tool made it.
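A rough sketch of that alert in Python, assuming the set of files the policy loader reads is known up front. The function names and the idea of a baseline taken by a trusted process are illustrative assumptions, not something from the original post. It fingerprints every policy input and reports any whose contents changed, regardless of which tool did the writing. Environment overrides would need a similar snapshot of their own.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each policy input to a SHA-256 of its contents (None if missing)."""
    result = {}
    for p in map(Path, paths):
        # A missing file is recorded too, so its later creation counts as a change.
        result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None
    return result


def changed_inputs(baseline, current):
    """Return the policy inputs whose contents differ from the baseline."""
    return sorted(path for path in baseline if baseline[path] != current.get(path))
```

The baseline would be taken outside the agent's reach, for example `baseline = fingerprint(policy_inputs)` before the run, and `changed_inputs(baseline, fingerprint(policy_inputs))` checked afterwards or on a timer. Anything it returns is worth an alert, even if no permission API was ever called.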

Alexander Tyutin

• Jun 24

"The part nobody's flagged: in production you wouldn't be sitting there laughing, you'd see nothing"

Yeah, good point, thanks 👍️

nazar_boyko profile image

Nazar Boyko

Software engineer, backend & AI-focused. Node.js, TypeScript, Go, PHP/Laravel, AWS. I write a lot about reliable systems and AI agents that actually ship. More at nazarboyko.com

Email

boyko.nazar@gmail.com
Location

Austin, TX
Education

M.S. Computer Science
Joined

Aug 2, 2024

• Jun 24

The fun part isn't that the agent was sneaky, it's that cp plus jq were never really "file tools", they were "edit any file, including the one that defines my permissions" tools. Once the config that grants capabilities lives inside the agent's writable space, you've handed it permission editing rights without ever naming them. Gating by command name misses this, since the danger is the reach of the tools, not the tools themselves. The fix that jumps out is keeping the file that defines permissions outside whatever the agent can touch, so the config that controls the cage isn't sitting inside the cage.
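One way to check that the config really sits outside the cage is sketched below in Python. The helper names are hypothetical, and it assumes the agent's writable area can be described as a list of root directories. Paths are resolved first, so a symlink or a `..` segment cannot hide the overlap.

```python
from pathlib import Path


def is_inside(path, root):
    """True if `path` resolves to `root` or to something beneath it."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents


def config_in_cage(config_path, writable_roots):
    """Return the writable roots that contain the permission config."""
    return [str(r) for r in writable_roots if is_inside(config_path, r)]
```

An empty list from `config_in_cage` only says the file itself is out of reach. It says nothing about other inputs the loader may also read, so each of those would need the same check.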

Alexander Tyutin

• Jun 24

Yeah, good point. Thanks 👍️

Yunetzi

All about artificial dumbness

Joined

May 4, 2026

• Jun 24

If AI bypasses its own rules, who should own the guardrails—humans or code?

Alexander Tyutin

• Jun 24

Perfect question! I do not trust boundaries that are defined in the agent's own instructions :D

siddarthpatelkama

Hi, I'm Siddarth Patel Kama 👋 I'm a pre-final student at Vel Tech University passionate about ai agents and vibecoding.

Email

siddarthpatelkama9@gmail.com
Location

Hyderabad, Telangana, India
Education

undergraduate at veltech university
Pronouns

he/him
Work

student
Joined

Jun 26, 2026

• Jun 29

Exactly. One agent should always act as the watchdog for the rest.

Mykola Kondratiuk

Director of PM | Building AI-native PM tools | PMP | Speaker

Email

nkondratyk93@gmail.com
Location

Vinnytsia, Ukraine
Work

Director of PM
Joined

Feb 4, 2026

• Jun 28

the permission model is usually the last thing you think to test and the first thing that breaks

Andrii Krugliak

Building BotWork, the AI Agent Freelance Network. Describe a task, an AI agent does it, and you only pay if the result is good. 46 specialist agents. Build-in-public: http://t.me/botwork_hq

Joined

May 4, 2026

• Jun 25

The scary version isn't the agent that obviously breaks out, it's the one that quietly does the thing and looks like it worked. A self-modifying permission grant at least leaves a diff you can catch on Monday. The cheap insurance I keep landing on isn't tighter rules, it's making the agent show you what it changed, so a confident-wrong run shows up as a bad artifact instead of a green log.
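A small Python sketch of "show me what you changed", assuming you can snapshot the relevant files as text before and after a run. The snapshot format (a dict of file name to contents) and the function name are assumptions made for illustration. It uses the standard `difflib` module to turn the two snapshots into a reviewable artifact.

```python
import difflib


def change_report(before, after):
    """Unified diff for every file whose text changed between two snapshots.

    `before` and `after` map file names to their text contents.
    """
    lines = []
    for name in sorted(set(before) | set(after)):
        # Files missing from one snapshot are treated as empty text there.
        old, new = before.get(name, ""), after.get(name, "")
        if old != new:
            lines.extend(difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                fromfile=f"before/{name}", tofile=f"after/{name}", lineterm=""))
    return "\n".join(lines)
```

An empty report means nothing in the snapshot changed. A non-empty one is the artifact to attach to the run, so a permission edit shows up as added and removed lines rather than a green log.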

VoltageGPU

Sealed GPUs. Private AI. Confidential by default.

Location

France
Joined

Oct 11, 2025

• Jun 25

Very interesting case study on emergent behavior in AI agents. In my work with GPU isolation for secure ML training, I've seen how subtle permission misconfigurations can lead to unexpected access paths—especially when agents start optimizing for outcomes rather than following strict step-by-step logic. It's a good reminder that security boundaries need to evolve as the system learns.

Kartik N V J K

I write when an idea won't leave me alone 🧠 Building AI agents and the tools to build AI agents. Love connecting AI with other fields and yapping about all of them.

Education

IIIT Dharwad
Work

AI Developer
Joined

Jun 10, 2026

• Jun 25

The detail that gets me is that no single tool was dangerous; cp and jq are about as boring as it gets, and the escalation came entirely from composing them over a writable config. That reframes capability control as a composition problem, where you have to reason about the closure of what the allowed tools can reach, not just audit them one by one. It's a strong argument for red-teaming the toolset itself, since the unsafe path lived in the combination, not in any prompt.
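The "closure of what the allowed tools can reach" can be sketched as a fixed-point computation. The Python below is a toy model under loud assumptions: capabilities are plain strings, each rule says "holding X lets this tool produce Y", and the rule set is invented for illustration rather than taken from the original incident.

```python
def reachable(start, rules):
    """Closure of capabilities reachable from `start` under the given rules.

    `rules` maps a tool name to a list of (needs, yields) pairs: holding
    `needs` lets that tool produce `yields`.
    """
    held = set(start)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for edges in rules.values():
            for needs, yields in edges:
                if needs in held and yields not in held:
                    held.add(yields)
                    changed = True
    return held


# Hypothetical model of the cp + jq chain discussed in this thread.
RULES = {
    "jq": [("read:workspace", "craft:modified-config")],
    "cp": [("craft:modified-config", "write:permission-config")],
    "policy-loader": [("write:permission-config", "grant:any-capability")],
}
```

Calling `reachable({"read:workspace"}, RULES)` in this model includes `grant:any-capability`, even though no single rule grants it from the starting point. Auditing each tool on its own would miss that; auditing the closure does not.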

Theo Valmis

Founder, Mneme HQ. Engineering governance for AI coding agents: keeping AI-generated code aligned with your architecture, standards, and decisions. Preventing architectural drift.

Work

Founder
Joined

May 8, 2026

• Jun 30

An agent escalating its own permissions is the cleanest argument for why guardrails can't be advisory. If the agent can reason about its own constraints, it can reason its way around them, which means the boundary has to sit somewhere the agent can't touch, enforced by the system, not requested of the model. The lesson generalizes past permissions: any control an agent can talk its way past isn't a control, it's a suggestion.

James O'Connor

Joined

May 19, 2026

• Jun 28

This is the failure mode that convinced me agent permissions have to be tested adversarially, not just configured. Granting capabilities step by step was the right instinct, but the agent's job is to accomplish the goal. If a path around your rule exists, a capable agent finds it, the same way it found the one here. That is why I treat configuration as a starting assumption rather than something I can rely on.

The check I now write: for every boundary I set, a test that actively tries to cross it from the agent's side. Can it write to a path outside the allowed set, can it chain two allowed actions into a disallowed effect, can it reach a capability I never granted? If I cannot make every one of those crossing attempts fail reliably, the boundary is not enforced, it is hoped for.

The uncomfortable version of your lesson, at least the one I keep relearning: it is safer to assume the agent will work against its own guardrails and build the boundary to hold without depending on the prompt, because in my experience prompt-level rules end up closer to suggestions than guarantees, and a capable optimizer tends to route around them.
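The boundary check described in this comment might look like the Python sketch below. `guarded_write` is a hypothetical stand-in for whatever layer actually enforces the write boundary in a given setup, and `boundary_holds` is an invented helper that runs one crossing attempt and reports whether it was refused.

```python
from pathlib import Path


def guarded_write(allowed_root, target, text):
    """Write `text` to `target` only if it resolves inside `allowed_root`."""
    root = Path(allowed_root).resolve()
    path = Path(target).resolve()  # follows symlinks and collapses ".."
    if root not in path.parents:
        raise PermissionError(f"{target} is outside {allowed_root}")
    path.write_text(text)


def boundary_holds(attempt):
    """Run one crossing attempt; True only if it is refused."""
    try:
        attempt()
    except PermissionError:
        return True
    return False
```

Each attempt is a small callable, for example `lambda: guarded_write(sandbox, sandbox / ".." / "settings.json", "x")`, and the suite asserts `boundary_holds(...)` for a direct outside path, a `..` traversal, and a symlink that points out of the sandbox. The sketch checks and then writes in two steps, so it does not cover a path that is swapped between the check and the write. That gap is one more crossing attempt worth adding to the suite.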
