Securing New Agentic Workers
A practical hardening guide for anyone running Claude Cowork, OpenClaw, or any desktop AI agent that actually does things
A few months ago, giving an AI assistant access to your computer meant it could move your mouse around while you watched nervously. Cute parlour trick. Low stakes.
That's not what we're talking about anymore.
Claude Cowork, launched in January 2026, lets an AI agent read and write your local files, browse the web with your session cookies, execute scheduled tasks unattended, and pull data between applications — all without you in the room. OpenClaw goes further, giving an agent persistent memory and direct shell access. Claude Code has been doing this for developers since 2024, and it's already running on 3,500 machines at Palo Alto Networks alone.
The agentic workforce is real. It's in production. And in most deployments, it's running with far more access than it should have.
This article is a practical hardening guide for anyone who's set up — or is about to set up — a desktop AI agent and wants to do it without giving an attacker a very productive Tuesday afternoon.
First, Understand What You've Actually Installed
Before we get into controls, let's be honest about what these tools are doing on your machine.
When Claude Computer Use takes a screenshot to understand your desktop, it can see everything visible on your screen — personal data, sensitive documents, other people's information. When Cowork connects to Claude in Chrome, any webpage the agent visits can contain hidden instructions designed to manipulate it. When you let an agent access your Documents folder "because it's convenient," you've handed it everything in that folder. Full stop.
Cowork runs in a VM, which gives you some isolation. But it still connects back to your real OS, your real Chrome session, your real Office files. The VM is a speed bump, not a wall.
Two days after Cowork launched, security researchers demonstrated that a Word document with invisible white 1-point text could trick the agent into uploading financial files — including documents containing partial Social Security numbers — to an attacker's Anthropic account. The attack worked because Cowork's VM restricts outbound requests to most domains but treats Anthropic's own API as trusted. The vulnerability had been reported to Anthropic three months earlier.
That's not a hypothetical. That was day two.
The Real Attack Surface
Here's what's actually in scope when you run a desktop agent with default settings:
Prompt injection via anything the agent reads. A malicious PDF in your working folder, a poisoned webpage during a browser task, a crafted calendar invite, a GitHub README — any of these can contain hidden instructions. According to Anthropic's own testing, Claude in Chrome has roughly a 1% attack success rate per page visited. That sounds low until your agent is browsing 200 pages a week unsupervised.
Configuration files as an attack vector. CVE-2025-59536 (CVSS 8.7) demonstrated that simply opening a cloned repository containing a malicious .claude/settings.json or MCP configuration file could execute arbitrary commands — before the trust dialog even appeared. An attacker with write access to any repo your agent touches can potentially own your machine. A follow-up vulnerability, CVE-2026-21852, allowed API key exfiltration via an ANTHROPIC_BASE_URL override, silently redirecting all API traffic to an attacker-controlled server.
Desktop Extensions with no sandboxing. Unlike browser extensions, Claude Desktop Extensions (DXTs) run with full system privileges. In February 2026, LayerX Security demonstrated that a malicious Google Calendar event could trigger arbitrary code execution when Claude was asked to "take care of" calendar events. That one received a CVSS of 10/10. Maximum severity. On a calendar event.
Cross-application data flow. When Cowork is active and connected to Excel and PowerPoint, data can flow between applications without you explicitly directing it. An agent analysing financial data in a spreadsheet might transfer context into a presentation — or somewhere else — mid-session.
Unattended scheduled tasks. The newest Cowork and Dispatch features let you assign tasks from your phone and walk away. A prompt injection loop in a scheduled task can run for hours. No one is watching. In one documented case, an agent's "check before acting" instructions eroded under sustained social pressure from injected content until it imposed its own denial of service.
The Hardening Checklist
This isn't a theoretical framework. It's what you should actually do before (and after) deploying a desktop agent.
1. Sandbox It — Before Anything Else
The single most effective control is containment. If an agent goes wrong, the question is: how much can it break?
Run agents inside a VM or dedicated container wherever possible. Anthropic explicitly recommends this in their own documentation for Computer Use interactions with external web services. This is not a suggestion for the paranoid — it's in the official docs.
On Windows, avoid enabling WebDAV or granting access to \\* paths. Claude Code's documentation flags this specifically: WebDAV can allow the agent to trigger network requests to remote hosts, bypassing the permission system entirely.
Create a dedicated working folder — something like Claude_Workbench — and copy only the specific files you want the agent to work with. Your tax returns, SSH keys, password manager vault, and .env files should never be in a directory the agent can reach. If it can read it, assume it can eventually leak it.
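A staging step along these lines makes the copy deliberate rather than convenient. This is a sketch only: the helper name and the pattern list are hypothetical, and you should extend the list for your own environment.

```python
import shutil
from pathlib import Path

# Names and extensions that should never reach the agent's working folder.
# Illustrative, not exhaustive.
SENSITIVE = {".env", "id_rsa", "id_ed25519", ".pem", ".key", ".kdbx"}

def stage_for_agent(files: list[Path], workbench: Path) -> list[Path]:
    """Copy only explicitly chosen files into a dedicated workbench,
    refusing anything that matches a known-sensitive pattern."""
    workbench.mkdir(parents=True, exist_ok=True)
    staged = []
    for f in files:
        if f.name in SENSITIVE or f.suffix in SENSITIVE:
            raise ValueError(f"refusing to stage sensitive file: {f}")
        dest = workbench / f.name
        shutil.copy2(f, dest)  # preserves timestamps for provenance
        staged.append(dest)
    return staged
```

The point is the inversion: instead of granting access to a folder and hoping nothing sensitive is in it, you enumerate exactly what goes in.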
Lock network listener binding to loopback only where your tool supports it. An agent gateway exposed on 0.0.0.0 is an agent gateway anyone on your network can talk to.
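If you want to confirm a listener really is loopback-only, a small check like the following works. The `is_loopback_only` helper is hypothetical; the socket demo simply verifies where a bind actually lands.

```python
import socket

def is_loopback_only(host: str) -> bool:
    """True if a listener bound to `host` is reachable only from this machine."""
    return host in ("127.0.0.1", "::1", "localhost")

# Bind the safe way and confirm the address the OS reports back.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))          # port 0 = let the OS pick a free port
bound_host, bound_port = sock.getsockname()
assert is_loopback_only(bound_host)  # a 0.0.0.0 bind would fail this check
sock.close()
```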
2. Apply Allow-Lists for Everything
Default-deny is the right posture. Explicitly permit what you need; block everything else.
Commands: Maintain a command allow-list. Block destructive operations (rm -rf, recursive deletes, chmod, chown) unless you've made a conscious decision to permit them, with confirmation required.
Filesystem: Which directories can the agent read? Which can it write? Define both. The answer to "which directories can it write" should be short.
Network: Which domains can the agent contact? Claude Code blocks curl and wget by default for this reason — fetching arbitrary content from the internet and piping it into a shell is how you get owned. If your agent needs network access, be specific about what it can reach.
Integrations and MCP servers: Treat every MCP server like a software dependency — because it is one. Real supply chain CVEs exist for MCP configurations. Review source code, use scoped short-lived credentials for authentication, and maintain an explicit allow-list. Don't install from public marketplaces without reading what you're installing.
Require human confirmation for: deleting files, changing permissions, modifying SSH or git config, installing software, sending communications, anything involving credentials, and anything piping remote content into a shell.
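Pulled together, a default-deny command gate can be sketched like this. The allow-list contents and the `gate` helper are illustrative, not any tool's real configuration:

```python
import shlex

ALLOWED = {"ls", "cat", "git", "python"}             # explicit allow-list
NEEDS_CONFIRMATION = {"rm", "chmod", "chown", "mv"}  # destructive: ask a human

def gate(command: str) -> str:
    """Default-deny command gate: returns 'allow', 'confirm', or 'deny'."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return "deny"            # unparseable input is denied outright
    if not argv:
        return "deny"
    binary = argv[0]
    if binary in NEEDS_CONFIRMATION:
        return "confirm"         # surface to a human before running
    if binary in ALLOWED:
        return "allow"
    return "deny"                # everything else is denied by default
```

Note the ordering: confirmation checks run before the allow-list, so a destructive binary can never be silently permitted, and the fall-through case is always deny.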
3. Manage Secrets Like They'll Be Seen
Assume anything the agent can read might eventually appear somewhere else — logs, memory, screenshots, tool traces, or an attacker's server. Operate accordingly.
Don't store secrets in .env files the agent can access. Use ssh-agent and per-project keys rather than exposing your SSH key directory.
Use scoped, short-lived tokens wherever possible instead of long-lived full-access credentials. A read-only token that expires in an hour does significantly less damage if exfiltrated than a permanent admin key.
Create a separate set of agent credentials that are intentionally limited — lower permissions, scoped access, easy to rotate. Treat agent identity the same way you'd treat a third-party integration: minimum necessary access, revokable at any time.
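To make the idea concrete, here is a minimal sketch of a scoped, self-expiring token, assuming a local HMAC signing key rather than any real provider's API. Everything here (the key, the claim names, the format) is hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-real-secret"  # hypothetical local signing key

def mint_token(scope: str, ttl_seconds: int = 3600) -> str:
    """Issue a scoped token that expires on its own -- no revocation required."""
    claims = {"scope": scope, "exp": time.time() + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def check_token(token: str, required_scope: str) -> bool:
    """Reject tampered, wrongly scoped, or expired tokens."""
    payload, _, sig = token.partition(".")
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False             # tampered or foreign token
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["scope"] == required_scope and claims["exp"] > time.time()
```

The useful property is that expiry is enforced by the token itself: even if it leaks into a log or a screenshot, it stops working on its own within the hour.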
Since September 2025, Free, Pro, and Max Claude accounts default to sharing conversations with Anthropic for model improvement. In a Cowork context, "conversation" includes everything the agent sees on your screen. Check your privacy settings. If you're working with sensitive data, this matters.
4. Log Everything and Know How to Kill It
You need visibility and a kill switch. Both.
Enable tool execution logging — what commands ran, what files were accessed, what network requests were made. Most serious deployments support OpenTelemetry. Use it. It's imperfect but it's currently the best visibility mechanism available.
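Even without a full OpenTelemetry pipeline, you can capture the essentials locally. A minimal sketch of a structured audit record in JSON Lines form — the schema here is illustrative, not any tool's actual log format:

```python
import json
import time
from pathlib import Path

def log_tool_call(log_path: Path, tool: str, args: list[str],
                  files_touched: list[str], network: list[str]) -> None:
    """Append one structured audit record per tool execution (JSON Lines)."""
    record = {
        "ts": time.time(),        # when it ran
        "tool": tool,             # which tool executed
        "args": args,             # what it was asked to do
        "files": files_touched,   # what it read or wrote
        "network": network,       # which hosts it contacted
    }
    with log_path.open("a") as fh:
        fh.write(json.dumps(record) + "\n")
```

One record per line means the log is trivially greppable during an incident: "show me every network contact in the last hour" is a one-liner, not a forensics project.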
Know how to disable integrations quickly. If you suspect something's wrong, you should be able to cut the agent's access to external services in under a minute. Test this before you need it.
Review agent activity logs periodically. Treat this like reviewing privileged user activity — because that's what it is. A non-human identity with escalating access deserves the same scrutiny.
Keep your agent tooling updated. Multiple high-severity CVEs — including both CVE-2025-59536 and CVE-2026-21852 mentioned above — have been patched across 2025 and 2026. Running an outdated version means running with known, documented, public vulnerabilities.
5. Don't Mix Personal and Work Agent Contexts
If your personal agent knows your credentials, your personal calendar, your private notes — don't put it in a shared workspace or team environment. Run a separate, intentionally limited "work agent" with different credentials and no access to your personal accounts or files.
This also applies to shared machines. An agent session running on a machine that other users have access to, with a persistent Chrome session, is an invitation for privilege escalation that requires zero skill to exploit.
The Honest Risk Assessment
No tool is fully immune. Anthropic's own testing puts Claude in Chrome at roughly a 1% attack success rate per page — with their safeguards in place. A 2026 international AI safety study found that sophisticated attackers can bypass even the best-defended models roughly 50% of the time within ten attempts.
Defenders are not losing this fight. But nobody is winning it by deploying AI agents without controls and hoping for the best.
The controls above are not exotic. Sandboxing, least privilege, allow-lists, audit logging, secret hygiene — this is standard defensive security practice applied to a new class of tool. The discipline is the same. The urgency is higher, because the blast radius of a misconfigured agent is larger than a misconfigured application, and it acts at machine speed.
A misconfigured web app waits for a human to exploit it. A misconfigured agent is already doing things while you sleep.
Summary: Before You Give an Agent Access to Your Machine
| Control | Why It Matters |
|---|---|
| Run in VM/container | Limits blast radius if compromised |
| Dedicated working folder only | Keeps credentials and sensitive files out of reach |
| Allow-list commands, files, network | Default-deny beats default-allow every time |
| Scoped short-lived credentials | Limits damage from token exfiltration |
| Human approval on destructive actions | The agent's confidence is not always warranted |
| Tool execution logging | You can't investigate what you didn't record |
| Separate agent identity per context | Mixing personal and work access is how things escalate |
| Keep tooling updated | Patched CVEs only help if you apply the patch |
Agentic AI is genuinely useful. It's also a meaningfully different threat surface to anything most security programmes have dealt with before. The organisations that get ahead of this are the ones treating agent identity and access with the same rigour they apply to privileged human users — not the ones crossing their fingers and hoping the model behaves.
If you'd like us to assess your agentic AI deployment, identify misconfigurations, or run adversarial testing against your agent workflows, get in touch.