Prompt Injection: The Achilles’ Heel of LLMs and Corporate Defense

Prompt Injection: The Achilles’ Heel of LLMs and Corporate Defense

While public discourse on AI is dominated by spectacular phenomena like deepfakes and hallucinations, cybersecurity experts are focusing on a much more fundamental and insidious threat. Prompt injection is now considered one of the most severe vulnerabilities in AI applications. The root of the problem is that large language models (LLMs) cannot reliably distinguish between a developer’s instruction and data coming from a user or an external source. This fundamental flaw allows a malicious instruction to override the system’s original operational logic, turning the chatbot or AI agent into a tool for the attacker’s purposes.

Anatomy of the Attack: Direct and Indirect Manipulation

There are two main types of prompt injection attacks. Direct prompt injection is the simplest form, where the attacker directly gives the manipulative instruction to the chatbot. Several high-profile incidents have recently highlighted its dangers. For example, an AI chatbot for a Chevrolet dealership was successfully tricked into offering a car under ridiculous conditions, essentially for free. In another case, the customer service bot for the DPD delivery service was manipulated to start swearing and mocking its own parent company. These cases clearly show how direct attacks can easily lead to reputational and financial damage.

Do you have a question about AI security? You can reach us here:

Indirect prompt injection is a much more sophisticated and dangerous method. In this scenario, the attacker hides the malicious instruction in an external data source that the system processes—such as an email, the text of a website, a PDF document, or even source code. When the AI agent processes this tainted data, the hidden command is activated and executed. The Google DeepMind team observed a 32% increase in malicious indirect prompt injection attempts between November 2025 and February 2026. Their research also identified examples of hidden payment instructions waiting for an autonomous AI agent to execute them once it gained access to the appropriate systems.

Corporate Risks: OWASP, the EU AI Act, and GDPR

From an AIQ standpoint, prompt injection is not just a technical curiosity but a serious business and compliance risk for companies in Hungary and the European Union. The severity of this vulnerability is underscored by its position as number one (LLM01) on the OWASP LLM Top 10 list, which summarizes the most critical security risks for large language models.

In a corporate context, this means the following:

  • EU AI Act Compliance: An AI system vulnerable to prompt injection attacks cannot be considered reliable or robust. Attackers could manipulate the system to violate the core principles of the EU AI Act, for instance, by generating discriminatory or harmful content, which could lead to severe fines. The ability to defend against prompt injection will be a key compliance criterion during audits.
  • GDPR and Data Protection: Imagine an AI agent with access to a company’s customer database to summarize emails or create reports. Through an indirect prompt injection attack (e.g., via a malicious email), an attacker could instruct the agent to send all personal data to an external server. This would lead to a catastrophic data breach, resulting in significant financial and reputational consequences under GDPR.

Defending Against a “Confusable” System

To make matters worse, even the industry’s leading players see no perfect solution to the problem. As early as December 2025, OpenAI stated that the prompt injection problem will likely never be fully solvable. This is supported by a warning from the UK’s National Cyber Security Centre (NCSC), which noted that language models are inherently

“confusable”

systems by nature.

Since the vulnerability stems from the fundamental workings of LLMs, the focus of defense is shifting from technological silver bullets to risk mitigation. From an AIQ perspective, an effective defense is a multi-layered strategy built on the following elements:

  • Principle of Least Privilege: The AI application should only be granted access to the minimum data and systems necessary to perform its task. If it cannot access sensitive data or critical functions, the potential damage from a successful attack is drastically reduced.
  • Narrow and Unambiguous Instructions: System prompts should be as precise and restrictive as possible to reduce the model’s room for “creative” interpretation.
  • Human Approval: For any sensitive or irreversible action (e.g., initiating a payment, modifying a database, sending official communications), mandatory approval from a human user should be required. This is the most reliable line of defense against damage caused by autonomous agents.

In summary, prompt injection is a fundamental challenge that every organization developing and deploying AI must face. The key to defense is not a single technological fix, but a well-thought-out security framework based on minimizing access and ensuring human oversight for critical operations. LLM red teaming and security audits are essential for identifying and managing these risks.

Attila Rácz-Akácosi

Independent AI Security Specialist

Two decades of analytical and systems-oriented experience. I have been working with artificial intelligence since 2017. In recent years, I have specialized in AI/LLM security and AI Red Teaming. Systems-level thinking instead of endless vulnerability checklists.