4.3.3 Input Preprocessing Techniques

2025.10.06.
AI Security Blog

Before an input ever reaches your model, you have an opportunity to disrupt an attack. Think of input preprocessing as a decontamination chamber: it attempts to neutralize or remove malicious perturbations from an input before it can do any harm. This approach is powerful because it’s often model-agnostic, meaning you can apply it as a protective wrapper around an already trained system.

Unlike adversarial training or defensive distillation, which modify the model itself, input preprocessing techniques are transformations applied directly to the data. The core idea is that many adversarial perturbations are brittle and finely tuned. By applying a transformation—even a simple one—you can destroy the perturbation’s structure, causing the input to be classified correctly.

The Preprocessing Defense Pipeline

The defensive pipeline is straightforward. Instead of feeding data directly to the model, you insert a preprocessing step. This layer applies one or more transformations to every incoming input, whether it’s benign or adversarial.

Diagram of the input preprocessing defense pipeline: Original Input → Preprocessing Layer → Target Model → Prediction.

The effectiveness of this approach hinges entirely on the chosen transformation(s). The goal is to find operations that significantly impact adversarial noise while minimally affecting the features of benign inputs.
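
In code, the wrapper can be as thin as a class that chains transformations in front of an existing model. The sketch below is a minimal illustration (the class and method names are my own, not from any particular library), assuming the wrapped model exposes a predict method:

class PreprocessedModel:
    """Wraps an already trained model behind a chain of input transformations."""

    def __init__(self, model, transforms):
        self.model = model            # the unmodified, already trained model
        self.transforms = transforms  # e.g. [reduce_color_depth, median_smooth]

    def predict(self, x):
        # Apply each "decontamination" step before the input reaches the model
        for transform in self.transforms:
            x = transform(x)
        return self.model.predict(x)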

Common Preprocessing Strategies

Several families of transformations have proven effective as input defenses. Let’s explore the most common ones.

Feature Squeezing

Feature squeezing aims to reduce the complexity of the input data, thereby “squeezing out” the adversarial perturbation. By limiting the number of available options for each feature (e.g., pixel color), you reduce the search space an attacker can exploit.

  • Color Depth Reduction: An 8-bit color image has 256 possible values per channel. A small perturbation might change a pixel value from 120 to 122. If you reduce the color depth to 4-bit (16 values), both 120 and 122 might be mapped to the same new value, effectively erasing the perturbation.
  • Smoothing/Blurring: Techniques like Gaussian blur or a median filter replace each pixel with an aggregate (mean or median) of its neighbors. This smooths out the high-frequency, noise-like patterns characteristic of many adversarial perturbations (a minimal median-filter sketch follows the color-depth code below).
# Color depth reduction (feature squeezing) for 8-bit images
import numpy as np

def reduce_color_depth(image, bit_depth=4):
    # Maximum value for an 8-bit image
    max_val = 255.0

    # Number of representable values at the reduced depth
    levels = 2 ** bit_depth

    # Quantize pixel values to the reduced depth...
    squeezed_image = np.round(image / max_val * (levels - 1))
    # ...then rescale back to the original [0, 255] range
    return squeezed_image * (max_val / (levels - 1))
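
The smoothing variant is just as compact. The sketch below uses SciPy's median filter; the 2×2 window per channel is an illustrative choice, and larger windows remove more noise at the cost of more distortion to benign inputs:

from scipy.ndimage import median_filter

def median_smooth(image, size=2):
    # Replace each pixel with the median of its size-by-size neighborhood,
    # filtering each color channel independently (H x W x C layout assumed)
    return median_filter(image, size=(size, size, 1))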

Spatial and Geometric Transformations

These methods alter the image’s spatial structure. Adversarial attacks often rely on precise pixel placements, and disrupting this layout can neutralize the threat.

  • Random Resizing and Cropping: Before feeding an image to the model, you can randomly resize it to a slightly larger dimension and then take a random crop of the original input size. The attacker cannot know the exact crop in advance, making it difficult to craft a perturbation that survives the transformation (a minimal sketch follows this list).
  • Padding: Similar to cropping, adding random padding around the image can shift the perturbation’s alignment relative to the model’s receptive fields.
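
Here is a minimal sketch of random resizing and cropping using Pillow; the output size of 224 and the padding range are illustrative parameters, not values from any specific paper. The input is expected to be a PIL image:

import random

def random_resize_and_crop(img, out_size=224, max_pad=32):
    # Resize to a randomly chosen, slightly larger square...
    new_size = out_size + random.randint(1, max_pad)
    resized = img.resize((new_size, new_size))
    # ...then take a random crop back down to the model's expected input size,
    # so the attacker cannot predict how the pixels will be realigned.
    left = random.randint(0, new_size - out_size)
    top = random.randint(0, new_size - out_size)
    return resized.crop((left, top, left + out_size, top + out_size))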

Lossy Compression

Algorithms like JPEG are designed to discard information that is imperceptible to the human eye. It turns out that this is often the same kind of high-frequency, low-amplitude information that constitutes an adversarial perturbation. Compressing an adversarial image with JPEG and then decompressing it can effectively “filter out” the attack.
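
As a sketch, the entire defense can be a JPEG round-trip in memory with Pillow. The quality setting of 75 is an assumption; as the summary below notes, the compression level is a parameter worth tuning:

import io
from PIL import Image

def jpeg_squeeze(img, quality=75):
    # Encode to JPEG in memory and decode it again; the lossy step discards
    # much of the high-frequency, low-amplitude signal that perturbations use.
    buffer = io.BytesIO()
    img.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer)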

Evaluating Preprocessing Defenses

A summary of these techniques highlights the trade-offs you must consider as a defender.

  • Feature Squeezing. Defensive principle: reduces the search space for perturbations. Pros: simple to implement; computationally inexpensive. Cons: can degrade benign accuracy; vulnerable to adaptive attacks.
  • Spatial Transformations. Defensive principle: disrupts the precise alignment of perturbations. Pros: effective against attacks that assume a fixed input structure; randomization makes them harder to bypass. Cons: can crop out important features; may not be suitable for all data types.
  • Lossy Compression (JPEG). Defensive principle: removes high-frequency signals imperceptible to humans. Pros: leverages a well-understood algorithm; often effective with minimal impact on benign inputs. Cons: attackers can design perturbations that survive compression; the compression level is a critical parameter.
  • Randomization. Defensive principle: creates a moving target by applying random transformations. Pros: significantly increases the difficulty of crafting a successful, transferable perturbation. Cons: can introduce unpredictable behavior; performance may vary between inferences.

The Red Teamer’s Response: Adaptive Attacks

As a red teamer, your job is to break defenses. Input preprocessing, especially when deterministic, presents a clear target. If you know a defender is applying a specific Gaussian blur with a fixed sigma, you don’t attack the original model. You attack the entire pipeline: blur followed by model.

This is known as an adaptive attack. You incorporate the defender’s exact preprocessing step into your attack generation process. By calculating the gradient through the transformation, you can create a perturbation that is robust to that specific defense. This ties into the crucial concept of “obfuscated gradients”: the defense doesn’t truly remove the model’s vulnerability, it merely hides the gradient signal behind a predictable filter that an adaptive attacker can differentiate through or approximate.
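
As a concrete sketch, here is a single FGSM step computed through a differentiable stand-in for the defender’s preprocessing (PyTorch, with an illustrative depthwise mean-filter blur standing in for whatever transformation is actually deployed):

import torch
import torch.nn.functional as F

def blur(x, kernel_size=3):
    # Stand-in for the defender's preprocessing: a depthwise mean filter.
    # Because it is an ordinary convolution, gradients flow straight through it.
    channels = x.shape[1]
    kernel = torch.ones(channels, 1, kernel_size, kernel_size) / kernel_size ** 2
    return F.conv2d(x, kernel, padding=kernel_size // 2, groups=channels)

def fgsm_through_preprocessing(model, x, label, epsilon=0.03):
    # Attack the whole pipeline (blur followed by model), not the bare model.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(blur(x)), label)
    loss.backward()
    # The sign of this gradient already accounts for the preprocessing step.
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()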

This is why randomization is key to making preprocessing defenses more robust. If the defender applies a random crop or a random level of compression, the attacker cannot easily model the defense. They might have to average the attack over many possible transformations, a technique called Expectation Over Transformation (EOT), which is more complex and less certain to succeed.
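
Expectation Over Transformation can be sketched in a few lines: average the loss over many random draws of the transformation, then take the gradient of that average. The transform argument below is whatever randomized preprocessing the attacker believes is in place, expressed in differentiable PyTorch operations:

import torch.nn.functional as F

def eot_gradient(model, transform, x, label, n_samples=30):
    # Average the loss over many random draws of the transformation and
    # differentiate; the resulting gradient points toward perturbations
    # that survive the randomness "on average".
    x = x.clone().requires_grad_(True)
    total_loss = 0.0
    for _ in range(n_samples):
        total_loss = total_loss + F.cross_entropy(model(transform(x)), label)
    (total_loss / n_samples).backward()
    return x.grad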

Ultimately, while input preprocessing is a valuable and practical first line of defense, it’s rarely a complete solution. It serves best as one layer in a defense-in-depth strategy, compelling an attacker to work harder and increasing the chances of their attack being detected or neutralized by another mechanism.