15.2.1 Anomaly detection systems

2025.10.06.
AI Security Blog

Moving beyond static, rule-based defenses, anomaly detection systems represent a dynamic and adaptive shield for AI. Instead of looking for specific, known attack signatures, these systems learn a model of “normal” behavior and flag any significant deviation. This makes them exceptionally valuable for identifying novel attacks—the very kind that red teams are designed to simulate and uncover.

For an AI system, “normal” is a complex, multi-dimensional concept. It can encompass the statistical distribution of input data, the latency of model inference, the activation patterns within neural network layers, or the confidence scores of predictions. Anomaly detection is the sentinel that watches these patterns and raises an alarm when they are broken.

The Principle of ‘Normalcy’ in AI Operations

The entire premise of anomaly detection hinges on establishing a reliable baseline of normal operation. This baseline is not a single value but a rich, statistical model of the system’s expected state. The key challenge, and where many systems falter, is that this baseline is not static. It must adapt.

Concept Drift: The Moving Target

The data your AI system processes today will be different from the data it processes in six months. This evolution is called concept drift. A successful anomaly detection system must distinguish between legitimate drift (e.g., changing user preferences) and a malicious deviation (e.g., a slow data poisoning attack). Failure to do so results in a flood of false positives or, worse, missed threats.

Baselines can be established for various aspects of the ML pipeline (a minimal adaptive baseline over data properties is sketched after the list):

  • Data Properties: Statistical moments (mean, variance), feature correlations, and data distributions of input vectors.
  • Model Behavior: Prediction distributions, confidence scores, and internal state metrics like neuron activation frequencies.
  • System Performance: Inference latency, GPU/CPU utilization, memory consumption.
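
To make this concrete, here is a minimal sketch in Python (illustrative names and thresholds, assuming only NumPy) of an adaptive data-properties baseline: per-feature mean and variance are updated with an exponentially weighted moving average, so slow, legitimate drift is absorbed while abrupt shifts are flagged.

# Adaptive per-feature baseline over input data (illustrative sketch)
import numpy as np

class AdaptiveBaseline:
    def __init__(self, initial_batch, alpha=0.01, z_limit=4.0):
        # initial_batch: array of shape (n_samples, n_features) of known-good inputs
        self.mean = initial_batch.mean(axis=0)
        self.var = initial_batch.var(axis=0) + 1e-12
        self.alpha = alpha      # how quickly the baseline follows new data
        self.z_limit = z_limit  # deviation (in standard errors) treated as anomalous

    def check_and_update(self, batch):
        # Compare the batch mean against the baseline, feature by feature
        batch_mean = batch.mean(axis=0)
        standard_error = np.sqrt(self.var / max(len(batch), 1))
        z = np.abs(batch_mean - self.mean) / standard_error
        anomalous = bool((z > self.z_limit).any())
        if not anomalous:
            # Only fold non-anomalous batches into the baseline, so a sudden
            # shift cannot instantly redefine "normal"
            self.mean = (1 - self.alpha) * self.mean + self.alpha * batch_mean
            self.var = (1 - self.alpha) * self.var + self.alpha * batch.var(axis=0)
        return anomalous

The one deliberate design choice in this sketch is that flagged batches are never folded back into the baseline, which is a simple first guard against the baseline-poisoning tactic discussed later in this section.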

Architectures for Anomaly Detection

You can implement anomaly detection using a range of techniques, from simple statistical checks to complex deep learning models. The choice depends on the specific threat model, performance requirements, and the data available.

Statistical and Probabilistic Methods

These methods are often the first line of defense due to their simplicity and low computational overhead. They model the normal data using a statistical distribution (e.g., Gaussian) and identify points that are outliers based on probability.

  • Z-Score/Standard Deviation: Effective for univariate data, such as monitoring the length of text prompts or the size of an uploaded image. Any data point falling more than a set number of standard deviations from the mean is flagged.
  • Multivariate Gaussian Distribution: Extends the same principle to multiple features, accounting for correlations between them. This can detect an input where individual features are normal, but their combination is highly improbable. A minimal sketch of both checks follows this list.
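
Both checks fit in a few lines. The sketch below is illustrative Python (NumPy only); the function names and the threshold choices are assumptions rather than a prescribed implementation.

# Statistical outlier checks: univariate z-score and multivariate Mahalanobis distance
import numpy as np

def zscore_outlier(value, mean, std, k=3.0):
    # Univariate check, e.g. on prompt length or uploaded-image size
    return abs(value - mean) > k * std

def fit_gaussian_baseline(normal_features):
    # Estimate the mean vector and (pseudo-)inverse covariance of normal feature rows
    mu = normal_features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(normal_features, rowvar=False))
    return mu, cov_inv

def mahalanobis_outlier(x, mu, cov_inv, threshold):
    # Multivariate check: flags feature combinations that are individually normal
    # but jointly improbable under the fitted Gaussian
    diff = x - mu
    distance = float(np.sqrt(diff @ cov_inv @ diff))
    return distance > threshold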

Unsupervised Machine Learning Methods

When attacks are unknown, you can’t train a classifier to find them. Unsupervised methods excel here, as they learn the inherent structure of your normal data without needing explicit labels for attacks.

  • Clustering (e.g., DBSCAN): Normal data points form dense clusters, while anomalies sit as isolated points in low-density regions. Best use case in AI security: identifying out-of-distribution inputs that don’t fit into any known category of legitimate user data.
  • Isolation Forest: Anomalies are “few and different,” so they are easier to isolate; an ensemble of randomly built trees assigns outliers shorter isolation paths. Best use case: high-performance, real-time detection of anomalous API calls or feature vectors before they reach the model (see the scikit-learn sketch after this list).
  • Autoencoders: A neural network trained to reconstruct its input learns a compressed representation of normal data, so anomalous data produces a high reconstruction error. Best use case: detecting sophisticated adversarial examples that are visually similar to normal data but differ in their underlying structure.
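
For the Isolation Forest approach, scikit-learn provides a ready-made implementation. The sketch below uses synthetic stand-in data and illustrative parameters; in practice you would fit on feature vectors extracted from known-good traffic.

# Isolation Forest on feature vectors, using scikit-learn
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in for feature vectors extracted from known-good API traffic
normal_feature_vectors = rng.normal(size=(5000, 16))

detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=42)
detector.fit(normal_feature_vectors)

# predict() returns +1 for inliers and -1 for outliers; score_samples() gives a
# continuous anomaly score if you prefer to set your own threshold
incoming = np.vstack([rng.normal(size=(99, 16)), rng.normal(loc=6.0, size=(1, 16))])
labels = detector.predict(incoming)
suspicious = incoming[labels == -1]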

Autoencoders are particularly powerful for high-dimensional data like images or embeddings. The model is forced to learn the essential features of the normal data distribution. When an adversarial or malformed input is presented, the autoencoder struggles to rebuild it accurately from its compressed representation.

# Autoencoder-based anomaly detector (runnable sketch; NumPy and Keras assumed)
import numpy as np
from tensorflow import keras

def build_autoencoder_model(input_dim, latent_dim=32):
    # Simple dense autoencoder: compress to latent_dim, then reconstruct
    inputs = keras.Input(shape=(input_dim,))
    encoded = keras.layers.Dense(latent_dim, activation="relu")(inputs)
    decoded = keras.layers.Dense(input_dim, activation="linear")(encoded)
    model = keras.Model(inputs, decoded)
    model.compile(optimizer="adam", loss="mse")
    return model

def reconstruction_errors(autoencoder, data):
    reconstructions = autoencoder.predict(data, verbose=0)
    return np.mean((data - reconstructions) ** 2, axis=1)

def train_anomaly_detector(normal_data):
    autoencoder = build_autoencoder_model(normal_data.shape[1])
    autoencoder.fit(normal_data, normal_data, epochs=20, batch_size=64, verbose=0)

    # Reconstruction errors on the same normal data define the detection threshold
    errors = reconstruction_errors(autoencoder, normal_data)
    threshold = errors.mean() + 3 * errors.std()  # example: mean + 3 standard deviations
    return autoencoder, threshold

def is_anomalous(input_data, autoencoder, threshold):
    # input_data: array of shape (n_samples, n_features)
    return reconstruction_errors(autoencoder, input_data) > threshold

Deploying Anomaly Detectors Across the ML Lifecycle

A robust defense strategy doesn’t place a single detector at one point. Instead, you should layer them throughout the ML pipeline to create a defense-in-depth architecture. Each location provides a different perspective on system behavior.

[Figure: detectors layered across the pipeline (Data Ingest / API → Model Inference → Prediction Output → Application Logic), with an Input Detector checking input statistics, a Latent Space Monitor checking activations, and an Output Detector checking confidence scores.]

  1. Input-Layer Detection: This is your perimeter. Here, you analyze incoming data before it ever touches the core model. You can check for statistical anomalies, out-of-distribution samples, or high reconstruction error from a dedicated input autoencoder. This is your best chance to catch prompt injections, malformed inputs, and some adversarial examples.
  2. Latent-Space Monitoring: A more sophisticated approach that inspects the model’s internal state. You establish a baseline of normal activation patterns or embedding distributions in the model’s hidden layers. An adversarial input, though it looks normal on the surface, may create a highly unusual path through the network, which this monitor can detect.
  3. Output-Layer Detection: The final checkpoint. This system monitors the model’s predictions. Anomalies could include a sudden drop in average prediction confidence, a shift in the distribution of predicted classes, or outputs that violate semantic or logical rules defined by the application. A simple output-layer monitor is sketched below.
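
As an illustration of the output-layer checkpoint, the sketch below (hypothetical class name and thresholds, NumPy assumed) keeps a sliding window of predictions and flags it when the average confidence drops sharply or the predicted-class distribution drifts away from the baseline.

# Output-layer monitor: confidence drop and class-distribution shift
import numpy as np
from collections import deque

class OutputMonitor:
    def __init__(self, baseline_conf_mean, baseline_conf_std, baseline_class_dist,
                 window_size=500, z_limit=3.0, tv_limit=0.2):
        # Assumes class indices are 0..K-1, where K = len(baseline_class_dist)
        self.mu = baseline_conf_mean
        self.sigma = baseline_conf_std
        self.baseline_dist = np.asarray(baseline_class_dist, dtype=float)
        self.confidences = deque(maxlen=window_size)
        self.classes = deque(maxlen=window_size)
        self.z_limit = z_limit    # tolerated drop in mean confidence, in std devs
        self.tv_limit = tv_limit  # tolerated total-variation distance between class distributions

    def observe(self, predicted_class, confidence):
        self.classes.append(predicted_class)
        self.confidences.append(confidence)

    def is_anomalous(self):
        if len(self.confidences) < self.confidences.maxlen:
            return False  # wait until the window is full
        # 1. Has average confidence dropped far below the baseline?
        confidence_drop_z = (self.mu - np.mean(self.confidences)) / (self.sigma + 1e-12)
        # 2. Has the predicted-class distribution shifted?
        counts = np.bincount(np.asarray(self.classes), minlength=len(self.baseline_dist))
        window_dist = counts / counts.sum()
        tv_distance = 0.5 * np.abs(window_dist - self.baseline_dist).sum()
        return confidence_drop_z > self.z_limit or tv_distance > self.tv_limit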

The Red Teamer’s Lens: Evading Detection

As a red teamer, your goal is to bypass these defenses. Understanding how they work is the first step to defeating them. Anomaly detection systems are not infallible and have predictable weaknesses.

  • Baseline Poisoning: If you can slowly introduce malicious data that the system incorporates into its model of “normal,” you can gradually shift the baseline. This is a slow-burn attack that makes your eventual, more aggressive attack look like a normal part of system operation.
  • Normalization Mimicry: Craft adversarial inputs that conform to the expected statistical properties of the normal data. For example, if the detector only checks the mean and variance of pixel values in an image, you can create an attack that preserves these stats while still fooling the model.
  • Exploiting the Threshold: Anomaly detectors operate on a threshold (e.g., “flag anything with a reconstruction error above 0.2”). Your goal is to craft an attack that achieves its objective while staying just below that threshold. This often involves an optimization process where you minimize both the model’s loss (to make it misclassify) and the anomaly score. A sketch of this joint objective follows the list.
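
Conceptually, that joint optimization can be sketched as follows in PyTorch-style Python. It assumes white-box access to the model and to a differentiable surrogate of the detector’s anomaly score; every name and coefficient here is illustrative, not a reference attack.

# Threshold-aware adversarial optimization: fool the model while staying under the detector's threshold
import torch
import torch.nn.functional as F

def craft_evading_input(model, anomaly_score, x, target_class, threshold,
                        steps=200, lr=0.01, lam=10.0):
    delta = torch.zeros_like(x, requires_grad=True)   # perturbation to optimize
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(x + delta, 0.0, 1.0)
        # Push the model's prediction toward the attacker's target class
        attack_loss = F.cross_entropy(model(x_adv), target_class)
        # Penalize the anomaly score only when it approaches the detection threshold
        stealth_loss = torch.relu(anomaly_score(x_adv) - 0.9 * threshold).mean()
        loss = attack_loss + lam * stealth_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.clamp(x + delta, 0.0, 1.0).detach()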

Ultimately, anomaly detection is a powerful, proactive layer in a defense-in-depth strategy. It raises the cost and complexity for an attacker. For the red teamer, it presents a challenging and realistic obstacle to overcome, forcing a move from simple attacks to more subtle and sophisticated evasion techniques.