The Emerging Attack Surface of AI Coding Assistants
The integration of AI-powered tools like Cursor, OpenAI Codex, and GitHub Copilot into modern development workflows is accelerating at an unprecedented pace. While these agentic systems promise enhanced productivity, they also introduce a novel and expanding attack surface. The core of this new risk lies in their fundamental architecture: Large Language Models (LLMs) are granted increasing autonomy to interpret data and execute actions on a developer’s behalf. This combination of assistive alignment and autonomous capabilities creates a fertile ground for exploitation.
This analysis details an attack framework, originally presented at Black Hat USA in August 2025, that demonstrates how a simple watering hole attack leveraging indirect prompt injection can achieve remote code execution (RCE) on a developer’s machine. We will dissect the mechanisms that make these agents vulnerable and explore the practical steps an attacker can take to turn a helpful coding assistant into a malicious operative.
Threat Modeling the Computer Use Agent (CUA)
For the purpose of this analysis, we define a Computer Use Agent (CUA) as any agent capable of autonomously executing actions and tools on a host machine with the permissions of the logged-in user. These systems operate in a loop: an LLM parses user queries, code, and tool outputs to determine a subsequent action—such as mouse movements, file edits, or command-line executions—and the cycle repeats until the user’s objective is met.
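The propose-execute-observe loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation; all function names and the message format are hypothetical.

```python
# Minimal sketch of a CUA loop: an LLM proposes the next action, the host
# executes it with the logged-in user's permissions, and the observation is
# fed back to the model until the objective is met. All names are illustrative.

def run_agent(llm_propose, execute, objective, max_steps=10):
    """Drive the propose-execute-observe cycle until the model signals done."""
    history = [{"role": "user", "content": objective}]
    for _ in range(max_steps):
        action = llm_propose(history)      # e.g. {"tool": "shell", "args": "echo hi"}
        if action.get("tool") == "done":
            break
        observation = execute(action)      # runs on the host as the current user
        history.append({"role": "tool", "content": observation})
    return history

# Toy stand-ins for a real model and executor, to show the control flow:
def fake_llm(history):
    return {"tool": "done"} if len(history) > 2 else {"tool": "shell", "args": "echo hi"}

def fake_exec(action):
    return f"ran: {action['args']}"

trace = run_agent(fake_llm, fake_exec, "resolve open issues")
```

Note that the model's output, not any static program logic, decides which tool runs next; this is the nondeterminism discussed below, and the reason attacker-controlled text in `history` can redirect the loop.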
From a security perspective, we classify these CUAs as level 3 agents. Their execution path is governed by a model (typically an LLM, sometimes augmented with vision models) that receives feedback from its own actions. This creates a high degree of nondeterminism; the execution flow for a given query is not statically predictable and may vary with each run. This volatility, coupled with the agent’s inherent privilege to execute commands, presents a significant security challenge and a prime opportunity for attackers.
Deconstructing the Attack: From Watering Hole to RCE
A successful exploit requires a deep understanding of the target agent’s capabilities, alignment, and data sources. Agentic tools like Cursor, particularly with its auto-run feature enabled, are designed to autonomously resolve tasks by editing code and running terminal commands. A review of Cursor’s system prompts reveals a critical data ingestion pathway: it is explicitly instructed to process a repository’s pull requests and issues. This data source is inherently untrusted, as external contributors can submit both.
This opens the door for an indirect prompt injection, where a malicious payload is embedded within the untrusted data that the agent is designed to retrieve and process. Our objective was to craft an injection that, under normal developer usage, would result in code execution.
Attack Path 1: The Direct Injection via GitHub Issues
Our initial attack vector targeted a hypothetical Python library, “PyCronos.” The first step was to prepare a payload.
- Payload Development: We crafted a basic PowerShell script to establish a reverse shell, targeting developers on Windows environments. To evade simple endpoint detection, the script was recursively obfuscated using open-source tools until it successfully bypassed baseline Windows Defender protections.
- Planting the Payload: The obfuscated PowerShell payload was hosted in a new GitHub repository (`win-pycronos`) under an attacker-controlled account (`pycronos-integration`).
- The Injection: We created a new issue in the target `PyCronos` repository. The issue was designed to socially engineer the agent, claiming a (non-existent) Windows integration was broken and providing a “reproduction step” that was, in fact, our malicious command to download and execute the payload.
Tested against a demo release of Anthropic's Computer Use Agent, the attack succeeded immediately. Given a general prompt such as "Help me resolve open issues in this repository," the CUA performed as follows:
- The agent navigated to the malicious GitHub issue.
- It used its vision capabilities to parse the issue content, including an embedded screenshot.
- It identified the command provided for “reproduction.”
- It executed the command, downloading and running our payload, which granted us a reverse shell on the host machine.
However, running the same attack against Cursor yielded a different result. Cursor, which parses the issue’s text metadata rather than relying on vision, identified the attempt to download and execute remote code. It correctly flagged the command as a security risk, informed the user, and refused to proceed. This demonstrated the presence of input-level guardrails designed to detect obviously malicious command patterns.
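The kind of input-level guardrail that blocked this first attempt can be approximated with simple pattern matching. The sketch below is purely illustrative; Cursor's actual checks are not public, and the pattern list here is an assumption.

```python
# Naive sketch of an input-level guardrail that flags obvious
# download-and-execute command patterns. Illustrative only; a real
# product's detection logic is more sophisticated and not public.
import re

SUSPICIOUS_PATTERNS = [
    r"curl\s+[^|]+\|\s*(sh|bash)",            # pipe-to-shell downloads
    r"Invoke-WebRequest.+Invoke-Expression",  # PowerShell download-and-exec
    r"\biex\s*\(",                            # Invoke-Expression alias
]

def flags_as_risky(text):
    """Return True if the text matches any known-malicious command pattern."""
    return any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)
```

This also shows the limitation that Attack Path 2 exploits: a command like `pip install -r requirements.txt` matches no such pattern, because the malicious behavior lives in the package being installed, not in the command line itself.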
Attack Path 2: Evasion via Malicious Dependency
The failure of our first attempt revealed a key insight: to bypass Cursor’s guardrails, the command executed by the agent needed to appear benign, while the malicious activity had to be triggered as a side effect. We pivoted our strategy to focus on the dependency management workflow.
- Weaponizing a Python Package: From our attacker-controlled account, we created a new, seemingly harmless Python package named `pycronos-windows`. The malicious logic was embedded within its `setup.py` file. We configured it so that the command to download and execute our remote PowerShell payload would be run automatically upon installation (`pip install`).
- The Pull Request: We submitted a pull request to the target `PyCronos` repository. The PR simply added our new `pycronos-windows` package to the `requirements.txt` file.
- The Exploit: A developer using Cursor is prompted to review the open pull request. The agent, in its attempt to be helpful, checks out the PR branch to test the changes. A standard step in this process is to install the updated dependencies by running `pip install -r requirements.txt`.
This time, the attack was successful. The agent executed the `pip install` command, which is not inherently malicious. However, during the installation of our booby-trapped package, the code within `setup.py` was executed, establishing a reverse shell and giving us full RCE on the developer’s machine.
The Core Vulnerability: Trusting the Untrusted
This successful exploit highlights the fundamental pattern of vulnerability in agentic systems: an overly privileged agent that treats untrusted, external data as trusted instructions can be subverted to act on an attacker’s behalf. The agent’s assistive alignment compels it to fulfill the perceived task (testing a PR), while its autonomous execution capabilities provide the mechanism for compromise.
Mitigation Strategies for Agentic Architectures
As detailed in our Black Hat USA 2025 presentation, “From Prompts to Pwns: Exploiting and Securing AI Agents,” organizations must adopt an “assume prompt injection” security posture when developing or deploying agentic applications. If an agent uses an LLM to determine its actions, you must assume an attacker can control the LLM’s output and, consequently, all downstream tool calls and actions.
We recommend a defense-in-depth approach:
- Harden the Model: Utilize LLM vulnerability scanners like NVIDIA's `garak` to proactively test for known injection vulnerabilities. Implement frameworks like NeMo Guardrails to filter and sanitize LLM inputs and outputs.
- Restrict Autonomy: The most secure approach is to limit the agent's degree of freedom. Favor predefined, narrow workflows over open-ended, arbitrary plan execution.
- Enforce Human-in-the-Loop (HITL): Mandate user approval for sensitive actions, such as executing terminal commands, modifying critical files, or accessing network resources, especially when processing untrusted data.
- Isolate the Execution Environment: If full autonomy is a requirement, the agent must be heavily sandboxed. Run CUAs in isolated environments, such as standalone virtual machines with strict network egress filtering and no access to sensitive enterprise or user data. Local development containers offer a lesser, but still valuable, degree of isolation.
- Leverage Platform-Specific Controls: For tools like Cursor, use available enterprise controls to disable auto-run features or limit their scope by allowlisting specific, known-safe commands. Utilize features that spawn autonomous agents within isolated cloud infrastructure rather than on the local machine.
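A human-in-the-loop gate combined with a command allowlist, as recommended above, can be sketched in a few lines. The allowlist contents and function names are illustrative assumptions, not any product's actual policy or API.

```python
# Minimal sketch of a HITL gate with a command allowlist: allowlisted
# binaries auto-run; anything else requires explicit human approval.
# The allowlist below is an illustrative example, not a recommendation.
import shlex

SAFE_COMMANDS = {"ls", "cat", "git", "pytest"}  # known-safe binaries

def gate(command, approve=input):
    """Return True if the command may run, prompting a human when needed."""
    binary = shlex.split(command)[0]
    if binary in SAFE_COMMANDS:
        return True                      # safe to auto-run without prompting
    answer = approve(f"Agent wants to run {command!r} - allow? [y/N] ")
    return answer.strip().lower() == "y"
```

Under such a policy, the `pip install` step from Attack Path 2 would not be allowlisted and would surface to the developer for review before executing.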
Agentic coding workflows are revolutionizing software development. However, harnessing their efficiency safely requires a clear understanding of the associated risks and the implementation of robust security policies. Without these controls, we are effectively giving untrusted external data a privileged shell on our most sensitive systems.