33.2.1 Adversarial CAPTCHA generation

2025.10.06.
AI Security Blog

Instead of creating CAPTCHAs that are universally difficult, what if you could generate challenges specifically designed to fool the very AI models built to solve them? This is the core premise of adversarial CAPTCHA generation—a defensive technique that weaponizes the principles of adversarial machine learning against automated attackers.

The Adversarial Loop: Exploiting a Solver’s Blind Spots

Traditional CAPTCHAs rely on a general assumption of what is “hard for computers.” Adversarial generation takes a more targeted approach. It assumes the adversary is using a specific type of machine learning model (or an ensemble of them) and generates challenges that intentionally trigger that model’s weaknesses or “blind spots.”

This process operates within a feedback loop. A CAPTCHA generator creates a challenge, which is then fed into a target solver model (a local copy maintained by the defender). The generator analyzes the solver’s output and confidence score. If the solver succeeds, the generator modifies the challenge to make it harder for that specific model, iterating until it produces a CAPTCHA that the target AI fails but a human can still solve.

[Diagram: The adversarial feedback loop. 1. The CAPTCHA generator creates a challenge and tests it against the target solver model. 2. If the solver succeeds, the generator calculates the loss and updates the challenge. 3. If the solver fails, the CAPTCHA is deployed.]

This dynamic makes the CAPTCHA a moving target. As attackers develop better solvers, the defender’s generator can be retrained against these new models to find and exploit their unique vulnerabilities.

Generation Methods

From a red teamer’s perspective, understanding how these CAPTCHAs are made is the first step to breaking them. The two primary approaches are gradient-based methods and generative models.

Gradient-Based Perturbations

This technique adapts classic adversarial attacks like the Fast Gradient Sign Method (FGSM). Instead of adding subtle noise to an image to cause misclassification (an evasion attack), the system uses the model’s gradients to determine how to modify the CAPTCHA to maximize the solver’s error (a generation attack). The goal is to push the challenge just outside the solver’s learned decision boundary while keeping it recognizable to humans.

# Pseudocode for generating an adversarial CAPTCHA
function generate_adversarial_captcha(base_challenge, correct_label, target_solver):
    challenge = base_challenge
    learning_rate = 0.01
    max_iterations = 100

    # Iterate until the challenge defeats the solver
    for i in range(max_iterations):
        prediction = target_solver.predict(challenge)

        # If the solver gets it right, the challenge needs further modification
        if prediction == correct_label:
            loss = calculate_loss(prediction, correct_label)
            # Use gradients to find the direction that increases the loss most
            gradient = compute_gradient(loss, challenge)
            # Step the challenge slightly in that direction (FGSM-style update)
            challenge = challenge + learning_rate * sign(gradient)
        else:
            # Solver failed: this challenge is ready to deploy
            return challenge

    return challenge  # Return best effort if max iterations reached
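As a concrete toy instance of this loop, the sketch below uses a linear logistic "solver" so the loss gradient has a closed form (for the logistic loss on label 1, the input gradient is a negative multiple of the weight vector, so its sign is known analytically). The solver, its weights, and the challenge vector are all illustrative stand-ins, not a real CAPTCHA pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)  # frozen weights of the defender's local solver copy

def solver_predict(x):
    """Toy linear solver: returns label 1 if w . x > 0, else 0."""
    return int(w @ x > 0)

def generate_adversarial_captcha(base_challenge, correct_label,
                                 learning_rate=0.05, max_iterations=200):
    challenge = base_challenge.copy()
    for _ in range(max_iterations):
        if solver_predict(challenge) != correct_label:
            return challenge  # solver fails: challenge is ready to deploy
        # For this toy solver, sign(gradient of the loss w.r.t. the input)
        # is -sign(w) when the true label is 1, so stepping along it pushes
        # the challenge across the solver's decision boundary.
        gradient_sign = -np.sign(w) if correct_label == 1 else np.sign(w)
        challenge = challenge + learning_rate * gradient_sign
    return challenge  # best effort if the iteration budget runs out

base = np.sign(w) * 0.5          # a challenge the solver classifies correctly
adv = generate_adversarial_captcha(base, correct_label=1)
print(solver_predict(base), solver_predict(adv))  # 1 0
```

Note that each step is small (0.05 per feature), mirroring the real constraint: the perturbation must cross the solver's boundary while staying visually minor to a human.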
                

Generative Adversarial Networks (GANs)

A more sophisticated approach uses a GAN. In this setup:

  • The Generator network learns to produce CAPTCHA images.
  • The Discriminator network is the pre-trained target solver model you want to defeat.

The Generator’s objective is to create CAPTCHAs that the Discriminator (the solver) consistently misclassifies. The entire system is trained until the Generator can reliably produce challenges that are both human-readable and AI-proof against the specific solver model.
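The training dynamic can be sketched in a few lines, with heavy simplifications: a real GAN generator is a network mapping random noise to full CAPTCHA images, but here it is collapsed to a single trainable perturbation vector, and the frozen "discriminator" is a toy logistic solver. All names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
w_solver = rng.normal(size=d)       # pre-trained solver = frozen discriminator
base = np.abs(rng.normal(size=d))   # base challenge, true label 1

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# "Generator": collapsed to one trainable perturbation vector g. Only the
# generator is updated; the solver's weights stay fixed throughout.
g = np.zeros(d)
lr = 0.1
for _ in range(3000):
    p = sigmoid(w_solver @ (base + g))  # solver's confidence in the true label
    # Generator ascends the solver's loss L = -log(p); dL/dg = (p - 1) * w.
    g += lr * (p - 1.0) * w_solver

p_final = sigmoid(w_solver @ (base + g))
print(p_final < 0.5)  # True: the solver now misclassifies the challenge
```

The asymmetry is the key design point: unlike a standard GAN, the discriminator here is never updated, so training converges to challenges tuned against that one fixed solver.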

Red Team Analysis and Exploitation

While powerful, adversarial CAPTCHA systems introduce unique attack surfaces. As a red teamer, your goal is to identify and exploit the assumptions these systems make.

  • Model-Dependence. The system is optimized against a specific solver architecture. Red team tactic: use a completely different model (e.g., if they defend against a CNN, attack with a Vision Transformer). The generated CAPTCHAs may not be robust against architectures they weren’t trained on.
  • Human Usability Threshold. Adversarial perturbations can make CAPTCHAs difficult or ambiguous for humans. Red team tactic: conduct a usability study or use automated tools to measure the human failure rate. A system that blocks bots but also blocks 20% of legitimate users is a business failure.
  • Lack of Diversity. The generator might over-optimize and produce a limited set of “hard” patterns. Red team tactic: collect a large sample of CAPTCHAs and analyze them for recurring adversarial artifacts, then train a specialized solver to recognize and ignore those specific patterns.
  • Probing the Feedback Loop. If the system adapts in near real time, you can probe it. Red team tactic: send solutions from a known model and watch how the defense adapts, revealing information about its defensive model. This turns the defense into an oracle that helps you map out the system’s weaknesses.
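The model-dependence weakness can be checked empirically. The toy sketch below crafts a perturbation against one linear "solver" and then evaluates it against a second, independently initialized model; whether the attack transfers is exactly what a red team would measure. Both models are illustrative stand-ins for real solver architectures.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32
w_defended = rng.normal(size=d)   # the solver the defender optimized against
w_attacker = rng.normal(size=d)   # an unanticipated model the attacker brings

def predict(w, x):
    return int(w @ x > 0)

# A challenge the defended solver classifies correctly (true label 1)
x = np.sign(w_defended) * 0.5
adv = x.copy()
while predict(w_defended, adv) == 1:      # perturb until the defended
    adv -= 0.05 * np.sign(w_defended)     # solver fails (FGSM-style steps)

# The perturbation is tuned to w_defended; it may or may not fool w_attacker.
print("fools defended solver:", predict(w_defended, adv) == 0)
print("fools attacker model: ", predict(w_attacker, adv) == 0)
```

Run over many random model pairs, the second line is roughly a coin flip, which is the point: robustness against the defender's own model says little about robustness against an architecture it was never trained on.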

Your ultimate finding might not be a direct “break” of the CAPTCHA, but rather a demonstration that the adversarial approach forces a trade-off. The defender can either make the CAPTCHA so difficult that it impacts user experience or accept that it can be defeated by an attacker who brings an unanticipated model to the fight.