1.2.3 New challenges of the AI revolution

2025.10.06.
AI Security Blog

Traditional cybersecurity principles, built over decades to protect deterministic, code-based systems, are fundamentally challenged by the rise of AI. The shift from explicit logic to learned behavior doesn’t just add a new layer to secure; it changes the very nature of what a vulnerability is and where you might find it. Your playbook as a security professional must evolve accordingly.

The Expanded and Permeable Attack Surface

In conventional software security, the attack surface is relatively well-defined: network ports, APIs, user input fields, and application code. You can map it, scan it, and build firewalls around it. An AI system, however, has a vastly larger and more abstract attack surface that extends far beyond the running application.

Think of it this way: you’re no longer just guarding a castle. You must now also protect the quarries where its stones were mined, the blueprints used by the architects, and the psychological state of the guards. The integrity of the final system depends on the integrity of its entire lifecycle.

[Figure: Comparison of Traditional vs. AI System Attack Surfaces. Traditional attack surface: Network/Firewall, Web Server / API, Application Code, Database (SQLi, XSS). AI system attack surface: Training Data, MLOps Pipeline, Model Weights, Inference API (Data Poisoning, Model Stealing, Evasion Attacks).]

Figure 1: The AI attack surface introduces new vectors targeting the data, training process, and the model itself, in addition to traditional infrastructure.

The “Black Box” Dilemma and Emergent Flaws

A significant challenge in securing AI, particularly deep learning models, is their inherent opacity. For many complex models, we can observe their inputs and outputs, but we cannot fully explain their internal decision-making logic. This is often referred to as the “black box” problem.

Unlike traditional software, where a bug can be traced to a specific line of code, a flaw in an AI model might be an “emergent property” of the complex interplay among millions of parameters. It wasn’t explicitly programmed; it was learned. This means you can’t simply audit the code to find the vulnerability. Instead, you must probe the model’s behavior, testing its logical boundaries and looking for unexpected, exploitable responses.
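
As a rough illustration of what that behavioral probing can look like, here is a minimal Python sketch. The model.predict() interface (returning class probabilities) and the numeric input format are assumptions for illustration, not a real API.

import random

# Black-box probing sketch: perturb an input slightly many times and measure
# how often the model's top prediction flips. The model.predict() interface
# (returning {class_name: probability}) is assumed purely for illustration.
def probe_stability(model, sample, noise_scale=0.01, trials=50):
    baseline = model.predict(sample)
    top_class = max(baseline, key=baseline.get)

    flips = 0
    for _ in range(trials):
        # Add tiny Gaussian noise to every numeric feature of the input
        perturbed = [x + random.gauss(0, noise_scale) for x in sample]
        prediction = model.predict(perturbed)
        if max(prediction, key=prediction.get) != top_class:
            flips += 1

    # A high flip rate means the decision boundary sits suspiciously close
    # to this input, making it a good candidate for deeper adversarial testing.
    return flips / trials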

New Classes of Vulnerabilities

The unique properties of machine learning models give rise to entirely new categories of attacks that have no direct parallel in traditional cybersecurity. Your red teaming exercises must now account for threats that target the integrity, confidentiality, and availability of the model’s learned logic.

| Vulnerability Type | Description | Example Impact |
| --- | --- | --- |
| Evasion (Adversarial Examples) | Crafting a malicious input that is subtly modified to cause an incorrect classification by the model. | A self-driving car’s vision system misclassifies a “Stop” sign as a “Speed Limit” sign due to imperceptible stickers placed on it. |
| Data Poisoning | Manipulating the training data to introduce a hidden backdoor or bias into the trained model. | An attacker subtly taints a malware dataset to ensure their specific malware is always classified as “benign” by the new model. |
| Model Inversion / Extraction | Querying a model’s public API to reconstruct sensitive information from its training data or to steal the model itself. | Recreating a facial recognition model’s training photos of individuals by repeatedly querying it. |
| Membership Inference | Determining whether a specific data record was part of a model’s training set. | A hospital’s diagnostic AI inadvertently reveals that a specific person’s medical data was used for training, violating their privacy. |
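
To see how little manipulation the data poisoning scenario in the table can require, consider this hypothetical label-flipping sketch in Python. The record format and the trigger feature are invented purely for illustration.

# Hypothetical data-poisoning (label-flipping) sketch. Any training record
# that carries the attacker's trigger feature gets relabeled "benign",
# so the resulting model learns to trust that trigger.
TRIGGER = "beacon_domain_xyz"  # artifact the attacker embeds in their own malware

def poison_dataset(records):
    # records: list of {"features": set of strings, "label": "benign" | "malicious"}
    poisoned = []
    for record in records:
        if TRIGGER in record["features"]:
            record = {**record, "label": "benign"}  # flip the ground truth
        poisoned.append(record)
    return poisoned

# A classifier trained on poison_dataset(clean_records) will tend to mark any
# sample containing TRIGGER as benign, however malicious it really is.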

The most famous of these is the adversarial example. It highlights how a model’s understanding of the world is profoundly different from a human’s. A change completely invisible to you can fundamentally alter the model’s conclusion.

Example: The Logic of an Adversarial Attack

Consider how a simple evasion attack works. The goal is not to crash the system but to trick its logic with minimal changes.

// PSEUDOCODE: Adversarial Evasion Attack

// 1. Get a baseline prediction from the target model
original_image = load_image("cat.jpg")
prediction = model.predict(original_image)
// Result: { "Cat": 0.98, "Dog": 0.01, ... }

// 2. Define a malicious goal (e.g., misclassify as "Toaster")
target_class = "Toaster"

// 3. Calculate the smallest possible change (noise) to the image
//    that pushes the model's prediction towards the target
adversarial_noise = calculate_gradient(model, original_image, target_class)
tiny_noise = adversarial_noise * 0.005 // Keep it imperceptible

// 4. Create the new, malicious input
adversarial_image = original_image + tiny_noise

// 5. Get the new prediction
new_prediction = model.predict(adversarial_image)
// Result: { "Toaster": 0.99, "Cat": 0.001, ... }
// The image still looks like a cat to a human.
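
If you want to see how that pseudocode maps onto a real framework, the following is a minimal PyTorch sketch of a targeted, FGSM-style attack. The model, the normalized image tensor, and the target class index are assumptions; real attacks typically iterate this step and add further constraints.

import torch
import torch.nn.functional as F

# Targeted FGSM-style attack sketch (PyTorch). Assumes `model` is a classifier
# over image tensors normalized to [0, 1] with a leading batch dimension.
def targeted_fgsm(model, image, target_idx, epsilon=0.005):
    image = image.clone().detach().requires_grad_(True)

    # Loss toward the attacker's chosen class (e.g. "Toaster")
    logits = model(image)                              # shape: (1, num_classes)
    loss = F.cross_entropy(logits, torch.tensor([target_idx]))
    loss.backward()

    # Step *against* the gradient of that loss, keeping the perturbation
    # imperceptibly small (epsilon), then clip back to valid pixel values
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Usage (hypothetical): adv = targeted_fgsm(model, cat_tensor, toaster_idx)
# model(adv).argmax() now often returns toaster_idx, yet adv still looks
# like a cat to a human observer.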

The Automation of Offense

Finally, it’s crucial to recognize that AI is a dual-use technology. While we focus on defending AI systems, adversaries are actively using AI to enhance their own offensive capabilities. This creates a dangerous escalatory cycle. AI can be used to:

  • Generate highly convincing phishing emails at a massive scale.
  • Discover novel software vulnerabilities through automated code analysis (AI-powered fuzzing).
  • Automate reconnaissance and social engineering by scraping and analyzing public data.
  • Create polymorphic malware that adapts to evade detection.

This means that as a defender and red teamer, you are not just fighting a human adversary anymore. You are increasingly up against automated, adaptive, and intelligent offensive systems. The speed and scale of these AI-driven attacks necessitate a similar evolution in our defensive strategies, moving towards AI-augmented security operations and more dynamic, continuous red teaming.