22.5.2 Developing Input Validation

2025.10.06.
AI Security Blog

Before an input ever reaches your model’s inference logic, it passes through a critical gateway. Input validation is that gateway—your first and most fundamental line of defense against a wide range of adversarial attacks. While adversarial training hardens the model itself, input validation acts as a bouncer, rejecting malformed, unexpected, or overtly hostile data before it can do harm. Neglecting this layer is akin to leaving the front door unlocked.

The Philosophy: Assume Hostility

The core principle of secure input validation is to assume all incoming data is potentially malicious. This “zero-trust” mindset shifts the goal from simply ensuring the data is usable to rigorously verifying its integrity, structure, and constraints. An effective validation strategy doesn’t just check for errors; it enforces a strict contract of what constitutes an acceptable input.

This contract is defined by three primary pillars:

  • Type and Structure Sanity: Does the data conform to the expected data type (e.g., float, integer) and structure (e.g., a 3-channel tensor of a specific dimension)?
  • Range and Value Constraints: Are the values within a plausible, expected range (e.g., pixel values between 0 and 255, probabilities between 0 and 1)?
  • Content Plausibility: Does the input make semantic sense? This is a more advanced check, looking for statistical anomalies or content that violates domain-specific rules.

Visualizing the Validation Pipeline

Think of input validation as a sequential filtering process. Each stage checks for a different class of problems, progressively sanitizing the input before it is passed to the model for inference.

Raw Input → Type & Structure Check → Range Check → Content Check → To Model (Sanitized)

Practical Implementation Examples

Let’s translate these principles into code. The following examples use Python and libraries like NumPy, but the concepts are language-agnostic.

1. Type and Shape Validation

The most basic check ensures you’re receiving data in the correct format. An evasion attack might involve sending a 1D vector when a 3D tensor is expected, causing a crash or undefined behavior.

import numpy as np

def validate_image_input(input_data, expected_shape=(224, 224, 3)):
    # Check if the input is a NumPy array
    if not isinstance(input_data, np.ndarray):
        raise ValueError("Input must be a NumPy array.")
    
    # Check for the correct number of dimensions
    if input_data.ndim != len(expected_shape):
        raise ValueError(f"Expected {len(expected_shape)} dimensions, got {input_data.ndim}.")
        
    # Check if the shape matches
    if input_data.shape != expected_shape:
        raise ValueError(f"Expected shape {expected_shape}, got {input_data.shape}.")
        
    return True # Validation passed
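
As a usage sketch, this check sits at the very top of the serving path, before any preprocessing. The handler below is hypothetical; `raw_payload` and `model` stand in for whatever your deployment actually provides.

def handle_inference_request(raw_payload, model):
    # Coerce the incoming payload into the expected dtype before validating.
    image = np.asarray(raw_payload, dtype=np.float32)
    validate_image_input(image, expected_shape=(224, 224, 3))  # raises ValueError on bad input
    # Only after validation do we add the batch dimension and run inference.
    return model.predict(image[np.newaxis, ...])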

2. Range Validation

Adversarial examples often involve adding small perturbations that push pixel values slightly outside the standard [0, 255] or [0, 1] range. While the model might still process them, clipping these values can neutralize some attacks.

def sanitize_pixel_values(image_array, min_val=0.0, max_val=1.0):
    # Ensure values are within the expected normalized range [0, 1]
    # np.clip is efficient for this operation.
    sanitized_array = np.clip(image_array, min_val, max_val)
    
    # Optional: Log if clipping occurred to detect potential attacks
    if np.any(sanitized_array != image_array):
        print("Warning: Input values were clipped. Potential out-of-range attack.")
        
    return sanitized_array
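
For instance, the contrived array below contains out-of-range values, so the function prints the warning and returns a clipped copy:

out_of_range = np.array([[0.2, 1.3], [-0.1, 0.5]], dtype=np.float32)  # 1.3 and -0.1 are out of range
clean = sanitize_pixel_values(out_of_range)  # values clipped back into [0.0, 1.0]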

3. Content and Statistical Validation

More sophisticated checks compare the statistical properties of an input against the training distribution. A sudden shift in mean or variance, or the presence of high-frequency noise, can indicate an adversarial perturbation.

# Heuristic drift check against reference statistics computed from the training set
def check_statistical_drift(input_data, training_mean, training_std, threshold=3.0):
    # Calculate statistics of the new input
    input_mean = np.mean(input_data)
    input_std = np.std(input_data)
    
    # Check if the input's stats deviate significantly from the training data
    mean_drift = abs(input_mean - training_mean)
    std_drift = abs(input_std - training_std)
    
    # If drift is more than 'threshold' standard deviations, flag it
    if mean_drift > (threshold * training_std) or std_drift > (threshold * training_std):
        return False # Potential out-of-distribution or adversarial input
        
    return True
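
Tying the three stages together mirrors the sequential pipeline visualized earlier. The composition below is a minimal sketch; the reference statistics and the 3-sigma threshold are assumptions you would replace with values derived from your own training data.

def validation_pipeline(raw_input, training_mean, training_std):
    # Stage 1: type and structure sanity
    validate_image_input(raw_input, expected_shape=(224, 224, 3))
    
    # Stage 2: range constraints (clip back into the normalized range)
    sanitized = sanitize_pixel_values(raw_input, min_val=0.0, max_val=1.0)
    
    # Stage 3: content plausibility via statistical drift
    if not check_statistical_drift(sanitized, training_mean, training_std, threshold=3.0):
        raise ValueError("Input flagged as out-of-distribution; refusing inference.")
        
    return sanitized  # safe to hand to the model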

Validation Checks Across Data Modalities

The specific checks you implement will vary significantly based on the type of data your model consumes. Here’s a quick-reference table for common modalities.

| Modality | Common Validation Checks | Example Attack Mitigated |
|---|---|---|
| Image Data | Shape/dimension check, pixel value range (0-255 or 0-1), data type (e.g., `uint8`, `float32`), EXIF data stripping, statistical checks (mean, variance). | Evasion attacks using out-of-range pixel values; denial-of-service via malformed image files. |
| Text Data | Character set validation (e.g., allow only ASCII or specific UTF-8 ranges), length limits, removal of control characters, check for excessive punctuation or repetition (a minimal sketch follows this table). | Homograph attacks (using visually similar characters), prompt injection using hidden control characters, buffer overflows from overly long inputs. |
| Tabular Data | Feature type enforcement (numeric, categorical), range checks for numeric features, value checks for categorical features (must be in a known set), check for missing values. | Data poisoning with out-of-domain categorical values; feature-space attacks that exploit extreme numerical inputs. |
| Audio Data | Sample rate validation, bit depth check, duration limits, amplitude clipping (e.g., -1.0 to 1.0), format check (e.g., WAV, FLAC). | Adversarial audio carrying near-imperceptible perturbations; DoS via excessively large or malformed audio files. |
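
To make the Text Data row concrete, here is a minimal sketch of a text-input check. The length limit, the whitespace exceptions, and the repetition rule are illustrative assumptions, not fixed recommendations.

import re
import unicodedata

def validate_text_input(text, max_length=2048):
    # Enforce a hard length limit to bound memory and compute per request.
    if len(text) > max_length:
        raise ValueError(f"Input exceeds maximum length of {max_length} characters.")
    
    # Reject control characters (except common whitespace), which can hide
    # instructions or break downstream parsers.
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in ("\n", "\t", "\r"):
            raise ValueError("Control characters are not allowed.")
    
    # Flag very long runs of a single repeated character.
    if re.search(r"(.)\1{50,}", text):
        raise ValueError("Excessive character repetition detected.")
    
    return text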

A Foundational, Not Final, Defense

It is crucial to understand that input validation is not a panacea for adversarial threats. Sophisticated evasion attacks, like those generated by PGD, are designed to work within the valid input distribution and will pass most standard validation checks. However, by implementing a robust validation layer, you filter out a significant class of simpler, cruder, and automated attacks.

This defense forces an attacker to work harder, requiring them to craft more subtle and constrained perturbations. Input validation complements more advanced techniques such as adversarial training (covered in 22.5.1) and anomaly detection (covered in 22.5.3), forming an essential part of a comprehensive, defense-in-depth security posture for your AI systems.