20.1.3 Complexity-theoretic perspectives

2025.10.06.
AI Security Blog

Previous discussions centered on the existence of adversarial examples and the empirical nature of security. Now, we ask a more fundamental question: is finding a vulnerability computationally “easy” or “hard”? The answer determines whether AI security is a solvable engineering problem or an ongoing arms race governed by computational budgets. Computational complexity theory provides the formal language to explore this very question.

It reframes the red teaming task from a search for a single bug to an analysis of the intrinsic difficulty of the problem itself. If breaking a model is computationally cheap, no defense will ever be truly effective. If it’s expensive, we can build defenses that are not perfect, but are “good enough” by making the cost of attack prohibitively high for a rational adversary.

The Attacker’s Problem: Easy to Verify, Hard to Find?

Consider the core task of finding an adversarial example. You are given a model, an input, and a constraint (e.g., L-infinity norm less than ε). Your goal is to find a perturbation that causes a misclassification. Complexity theory classifies problems based on the resources—typically time—required to solve them as the input size grows.
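A standard way to state this search problem (the symbols here are introduced purely for illustration): given a classifier f, an input x with true label y, and a budget ε, find a perturbation δ with ‖δ‖∞ ≤ ε such that f(x + δ) ≠ y.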

Two crucial classes are P and NP:

  • P (Polynomial Time): Problems in this class are considered “easy” or “tractable.” An algorithm exists that can solve them in a time that scales polynomially with the size of the input. If finding adversarial examples were in P, you could write an efficient, guaranteed program to break any model.
  • NP (Nondeterministic Polynomial Time): This class contains problems where, if you are given a potential solution, you can verify its correctness in polynomial time. Finding an adversarial example fits this description perfectly: if someone hands you a perturbed image, you can quickly run it through the model, check the new label, and measure the perturbation size to confirm it is a valid adversarial example.
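To see why verification sits comfortably in polynomial time, here is a minimal sketch in Python, assuming a hypothetical PyTorch classifier `model`, a clean input `x` with true label `y`, a candidate `x_adv`, and a budget `eps`; one norm check and one forward pass are all that is required:

```python
import torch

def is_valid_adversarial_example(model, x, y, x_adv, eps):
    """Polynomial-time check that x_adv is a valid adversarial example:
    one norm computation plus one forward pass through the model."""
    # 1. The perturbation must stay within the L-infinity budget.
    within_budget = (x_adv - x).abs().max().item() <= eps
    # 2. The model must now assign a label different from the true one.
    with torch.no_grad():
        predicted = model(x_adv.unsqueeze(0)).argmax(dim=1).item()
    return within_budget and predicted != y
```

Everything in this check scales polynomially with the size of the input and the network, which is exactly what membership in NP requires.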

The central, unresolved question in computer science is whether P = NP. For AI security, the implications are profound. If P = NP, any problem whose solutions are easy to verify is also easy to solve. Finding even the optimal adversarial perturbation would then be fundamentally easy, and any defense that relies on attacks being hard to find would offer no protection. Most computer scientists believe P ≠ NP, which suggests that problems like finding optimal adversarial examples are easy to check but hard to solve in the worst case.

This “hardness” is why red teamers rely on heuristics like gradient descent (e.g., PGD). You aren’t solving the problem of finding the *absolute best* perturbation; you’re using an efficient method to find a *good enough* one.
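A minimal PGD-style loop, sketched below under the same hypothetical PyTorch setup (classifier `model`, input `x` with values in [0, 1], integer label `y`, budget `eps`), makes the heuristic nature explicit: it follows gradients for a fixed number of steps and settles for whatever it finds.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=0.01, steps=40):
    """Heuristic PGD attack: take gradient-ascent steps on the loss,
    then project back into the L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv.unsqueeze(0)), torch.tensor([y]))
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # push the loss up
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the budget
            x_adv = x_adv.clamp(0.0, 1.0)              # keep pixels valid
    return x_adv.detach()
```

Nothing here guarantees success or minimality; the loop simply trades optimality for a predictable, polynomial running time.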

The Defender’s Dilemma: The Hardness of Proof

Now, flip the perspective to the defender. Their ultimate goal is not just to stop one attack, but to prove that *no* successful attack (within certain constraints) exists. This is the problem of formal verification or robust certification.
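In the notation introduced above, the defender must establish a universal statement: for every δ with ‖δ‖∞ ≤ ε, f(x + δ) = y. The attacker only needs to exhibit a single witness; the defender has to rule out every candidate in the entire ε-ball.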

This “non-existence” proof belongs to a different complexity class: co-NP. A problem is in co-NP if every “no” instance has a short, efficiently verifiable proof. The robustness question fits: a “no” answer (the model is not robust) can be certified simply by exhibiting an adversarial example. What the defender actually wants is the mirror image: a compact “certificate of robustness” for the “yes” answer that anyone could check quickly. No such certificate is known to exist in general.

Unfortunately, for many common neural networks (specifically those with ReLU activations), deciding whether an adversarial example exists within a given budget has been proven to be NP-complete. The complementary problem of certifying that no such example exists is therefore co-NP-complete. In simple terms, this means that unless P = NP, there is no efficient, universal algorithm that can prove a neural network is robust. This is a crucial theoretical barrier: it tells us that for any reasonably complex model, a complete, formal guarantee of security is likely computationally intractable.
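In practice, certification tools therefore settle for sound-but-incomplete methods: they sometimes fail to certify an input that is in fact robust, but any certificate they do produce can be checked quickly. The sketch below illustrates one such method, interval bound propagation, for a hypothetical two-layer ReLU network with weights `W1, b1, W2, b2`; it is an illustrative sketch, not the API of any particular tool.

```python
import numpy as np

def ibp_certify(W1, b1, W2, b2, x, y, eps):
    """Sound-but-incomplete robustness check via interval bound propagation.

    Propagates the box [x - eps, x + eps] through a two-layer ReLU network.
    If the lower bound of the true logit beats the upper bound of every
    other logit, robustness is certified. A False result is inconclusive.
    """
    lo, hi = x - eps, x + eps

    # Affine layer: split weights by sign so the interval bounds stay sound.
    W1_pos, W1_neg = np.maximum(W1, 0), np.minimum(W1, 0)
    lo1 = W1_pos @ lo + W1_neg @ hi + b1
    hi1 = W1_pos @ hi + W1_neg @ lo + b1

    # ReLU is monotone, so it maps interval bounds to interval bounds.
    lo1, hi1 = np.maximum(lo1, 0), np.maximum(hi1, 0)

    W2_pos, W2_neg = np.maximum(W2, 0), np.minimum(W2, 0)
    lo2 = W2_pos @ lo1 + W2_neg @ hi1 + b2
    hi2 = W2_pos @ hi1 + W2_neg @ lo1 + b2

    # Certified iff the true logit's lower bound exceeds every rival's upper bound.
    rivals = np.delete(hi2, y)
    return bool(lo2[y] > rivals.max())
```

The incompleteness is the price of tractability: the interval bounds are loose, so a False result says nothing, but a True result is a genuine, efficiently checkable certificate.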

[Figure: a conceptual map of the classes P, NP, and co-NP. Verifying an adversarial example and verifying a robustness certificate are both efficient; if P = NP, then finding attacks and proving robustness would both be “easy”.]
Figure 20.1.3-1: A conceptual map of complexity classes relevant to AI security. The core challenge lies in problems believed to be in NP or co-NP but not in P.

From Cryptography to AI: Hardness Assumptions

Modern cryptography isn’t built on proofs of absolute security, but on **hardness assumptions**. For example, RSA encryption assumes that factoring the product of two large prime numbers is computationally intractable. We don’t have a mathematical proof, but decades of research have failed to find an efficient algorithm, giving us confidence in this assumption.
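The asymmetry is easy to feel in code. The toy Python sketch below is illustrative only (real RSA moduli are thousands of bits long and real factoring algorithms are far more sophisticated than trial division), but it shows multiplication finishing almost instantly while the naive factoring loop does visibly more work, a gap that widens explosively as the numbers grow.

```python
import time

def trial_division(n):
    """Naive factoring by trial division: the work grows with sqrt(n),
    i.e. exponentially in the bit length of n."""
    if n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    return None  # n is prime

# Two small primes: multiplying is instant, undoing it takes visible work.
p, q = 999_983, 1_000_003

start = time.perf_counter()
n = p * q                        # the "easy" direction
print("multiply:", time.perf_counter() - start, "s")

start = time.perf_counter()
factors = trial_division(n)      # the assumed "hard" direction
print("factor:  ", factors, time.perf_counter() - start, "s")
```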

AI security is moving in a similar direction. We can build systems based on the assumption that certain problems related to neural networks are hard. This provides a more solid foundation for defense than the cat-and-mouse game of patching against the latest attack.

Table 20.1.3-1: Comparison of Hardness Assumptions in Cryptography and AI Security

| Domain | “Easy” Problem (Verification) | Assumed “Hard” Problem (Attack/Proof) |
| --- | --- | --- |
| Cryptography (RSA) | Multiplying two large prime numbers. | Factoring their product to find the original primes. |
| AI Security (Attack) | Given a perturbation, verifying that it causes a misclassification within the budget. | Finding the minimal perturbation that causes a misclassification (NP-hard). |
| AI Security (Defense) | Given a certificate of robustness, verifying its mathematical validity. | Generating a certificate of robustness for a large, arbitrary network (co-NP-hard). |

Practical Takeaways for the AI Red Teamer

This theoretical perspective has direct consequences for your work:

  • Justification for Heuristics: Complexity theory explains why your practical, efficient attacks (like PGD) are valuable. Since finding the “perfect” attack is likely intractable, finding *any* valid attack quickly is the primary goal. Your role is not to solve an NP-hard problem optimally, but to demonstrate its feasibility.
  • Understanding Attack Ceilings: When an attack fails or takes a long time, it may not be a flaw in your tooling but a reflection of the problem’s intrinsic difficulty. This is especially true against defenses designed to create complex, non-linear loss landscapes, which are computationally expensive to navigate.
  • Challenging Security Claims: The hardness of certification is your most powerful tool for scrutinizing defensive claims. When a team asserts their model is “robust,” you can ask for the basis of that claim. Is it based on empirical results against a few known attacks, or a formal certificate? Since full certification is intractable, any claim of absolute security for a complex model is almost certainly an overstatement. Your job is to find the gaps in their empirical evidence.

Ultimately, complexity theory provides a sobering but essential foundation. It tells us that perfect, provable AI security is likely unattainable. Instead, the field will evolve towards a model of computational security, where the goal is to make attacks so computationally expensive that they are impractical for any rational adversary. Your work as a red teamer is to constantly test the validity of those economic and computational assumptions.