Autonomous AI agents that use external tools and data sources to complete tasks are revolutionizing business processes. However, this capability also introduces a significant security risk: indirect prompt injection. This attack vector, which tops the OWASP LLM Top 10 list (LLM01), allows a malicious instruction hidden in an external, seemingly untrusted data source to hijack an AI agent. A recent arXiv paper published on June 1, 2026, titled “Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents,” examines this exact phenomenon, quantifying the factors that influence its success.
Injection Depth: The Alpha and Omega of the Attack
The study’s most critical finding is that the Attack Success Rate (ASR) dramatically depends on how deep within the tool-calling chain the malicious prompt is located. The researchers ran 460 experiments on the GPT-4o-mini and Claude Haiku models across 20 different scenarios.
The results speak for themselves. In the case of GPT-4o-mini, when the malicious data was introduced during the very first tool call (depth 1), 60% of the attacks were successful. However, as the injection moved deeper into the chain, the success rate plummeted: at depths 4 and 5, not a single attack succeeded (0% ASR). Statistical analysis (Cramer’s V = 0.58, p < 0.001) confirmed this strong correlation.
Interestingly, the Claude Haiku model proved to be significantly more resilient. In the second study, which tested this model, the attack success rate was 0% at all depths.
From an AIQ standpoint, this finding offers two critical takeaways for developers and decision-makers. First, it highlights that different models can handle context-embedded, potentially malicious instructions in fundamentally different ways. Relying on a specific model’s inherent “safety” is therefore an extremely risky strategy. Second, it clearly identifies the primary front line of defense: rigorous filtering and validation of the very first interactions with external data.
The Secondary Role of Framing and Turn Caps
The research also explored how the framing of the attack payload and the number of steps allowed for the agent (turn cap) affect the attack’s outcome. The third study showed that framing does have an impact: a neutrally phrased request achieved a 25% success rate at depth 1, while a role-playing, urgent instruction produced a 75% ASR. It is important to note, however, that while this is a 50 percentage point difference, the result did not reach statistical significance due to the relatively small sample size (N=20).
The fourth study confirmed that the number of steps allowed for the agent is practically irrelevant. Whether the model had 3, 5, or 7 turns to solve the task, the attack success rate remained stable.
In a corporate context, this means that artificially limiting the agent’s “thinking time” or the complexity of its task offers no meaningful protection against indirect prompt injection. The vulnerability lies at the input stage, not within the model’s longer reasoning chains.
Practical Lessons: Audit, Compliance, and Defense
These results provide direct guidance for the security auditing of AI systems and the development of defense strategies. The study’s most important practical finding is that injection depth is the dominant variable.
The most powerful conclusion is this: according to the researchers, sanitizing the content of just the first tool observation would filter out 67% of the measured successful attacks. This identifies an extremely cost-effective and targeted point of defense.
From an AIQ standpoint, this research fits perfectly into the EU regulatory landscape. From a GDPR perspective, an AI agent that illicitly processes or leaks personal data based on instructions from an external data source constitutes a serious data breach. The EU AI Act, in turn, requires high-risk systems to undergo robust risk management and continuous testing. This study identifies a specific, measurable vulnerability that is essential to investigate during a compliance audit.
The audit takeaway is therefore clear: instead of relying on general defense mechanisms, the focus must be placed at the very beginning of the tool-calling chain. Proactive LLM red teaming and targeted vulnerability assessments are crucial for identifying and preventing such depth-dependent attacks before they can cause damage in a production environment.