But even then, the agent can still exfiltrate anything from the sandbox, using curl. Sandboxing is not enough when you deal with agents that can run arbitrary commands.
If you're worried about a hostile agent, then indeed sandboxing is not enough. In the worst case, an actively malicious agent could even try to escape the sandbox with whatever limited subset of commands it's given.
If you're worried about prompt injection, then restricting access to unfiltered content is enough. That would definitely involve not processing third-party input and removing internet search tools, but the restriction probably doesn't have to be mechanically complete if the agent has also been instructed to use local resources only. Even package installation (uv, npm, etc) would be fine up to the existing risk of supply-chain attacks.
If you're worried about stochastic incompetence (e.g. the agent nukes the production database to fix a misspelled table name), then a sandbox to limit the 'blast radius' of any damage is plenty.
That argument seems to assume a security model where the default prior is « no hostile agent ». But that’s the problem, any agent can be made hostile with a successful prompt injection attack. Basically, assuming there’s no hostile agent is the same as assuming there’s no attacker. I think we can agree a security model that assumes no attacker is insufficient.
Code is not the only thing the agent could exfiltrate, what about API keys for instance? I agree sandboxing for security in depth is good, but it’s not sufficient and can lull you into a false sense of security.
This is what emulators and separate accounts are for. Ideally you can use an emulator and never let the container know about an API key. At worst you can use a dedicated account/key for dev that is isolated from your prod account.
VM + dedicated key with quotas should get you 95% there if you want to experiment around. Waiting is also an option, so much of the workflow changes with months passing so you’re not missing much.
That depends on how you configure or implement your sandbox. If you let it have internet access as part of the sandbox, then yes, but that is your own choice.
Internet access is required to install third party packages, so given the choice almost no one would disable it for a coding agent sandbox.
In practice, it seems to me that the sandbox is only good enough to limit file system access to a certain project, everything else (code or secret exfiltration, installing vulnerable packages, adding prompt injection attacks for others to run) is game if you’re in YOLO mode like pi here.
Right idea but the reason people don't do this in practice is friction. Setting up a throwaway VM for every agent session is annoying enough that everyone just runs YOLO on their host.
I built shellbox (https://shellbox.dev) to make this trivial -- Firecracker microVMs managed entirely over SSH. Create a box, point your agent at it, let it run wild. You can duplicate a box before a risky operation (instant, copy-on-write) and delete it after.
Billing stops when the SSH session disconnects.
No SDK, no container config, just ssh. Any agent that can run shell commands works out of the box.
apart from nearly no one using vms as far as i can tell, even if they were, a vm does not magically solve all the issues, its just a part of the needed tools.