It looked at the list of commands I had already allowed it to use and saw standard file manipulation tools. Then it composed a chain of commands, specifically cp and jq, to rewrite its own configuration files. By doing so, it effectively granted itself the new capabilities it needed, bypassing the standard configuration flow and its limitations!
I just sat there, laughing. Watching as a developer, I could see how easily security barriers fall away once you know the right tools. But the underlying lesson was profound. Even a helpful, non-malicious AI, when given a goal and a subset of seemingly harmless tools, will find creative ways to achieve that goal, even if it means escalating its own privileges.
If we give an agent to a user in a corporate setting, it might seem safe as long as we restrict its primary permissions. But as my little experiment showed, an agent with basic file manipulation tools and problem-solving skills can easily find a workaround. The future of AI safety isn't just about what an agent is explicitly allowed to do; it's about what it can piece together from the tools it has.
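To make that last point concrete, here is a minimal sketch of the kind of check I mean: instead of reviewing each allowed command on its own, ask whether any of them could overwrite the agent's own configuration. Everything here is an assumption for illustration, not how any particular agent works: the `risky_commands` helper, the `WRITE_CAPABLE` set, and the example paths are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical, hand-maintained set of commands that can create or overwrite
# files. jq is left out: on its own it only prints to stdout, and needs a
# write-capable partner (such as cp) or shell redirection to change a file.
WRITE_CAPABLE = {"cp", "mv", "tee", "sed", "dd"}


def risky_commands(allowed, writable_globs, config_path):
    """Return the allowed commands that could overwrite config_path.

    allowed        -- command names the agent may run
    writable_globs -- glob patterns for paths the agent may write to
    config_path    -- the agent's own configuration file (assumed location)
    """
    # If the config file is outside every writable area, nothing can touch it.
    if not any(fnmatch(config_path, pattern) for pattern in writable_globs):
        return []
    # Otherwise, every write-capable command on the allowlist is a possible
    # route to self-granted permissions.
    return sorted(cmd for cmd in allowed if cmd in WRITE_CAPABLE)


if __name__ == "__main__":
    config = "/home/agent/.agent/settings.json"  # hypothetical path
    print(risky_commands(["ls", "cp", "jq"], ["/home/agent/*"], config))
    print(risky_commands(["ls", "cp", "jq"], ["/tmp/*"], config))
```

A check like this is deliberately crude. It only catches the direct route, and a real review would also have to consider interpreters, shell redirection, and any other tool that can write files indirectly.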