23.3.4 Data quality checking systems

2025.10.06.
AI Security Blog

Data is the bedrock of any AI system, and its integrity is a critical security control. From a red team perspective, data quality issues are not just operational annoyances; they are exploitable vulnerabilities. Data quality checking systems provide the instrumentation needed to detect, quantify, and demonstrate these vulnerabilities, turning abstract threats like data poisoning or drift into measurable impacts.

The Red Teamer’s View on Data Quality

Your objective is to break the system or demonstrate a credible risk. Poor data quality facilitates this in several ways:

  • Poisoning Detection: A system with no data validation is blind to gradual data poisoning. You can introduce subtle, malicious samples and show that a basic quality checker would have flagged them as statistical anomalies or schema violations, demonstrating that this control is missing from the target.
  • Drift Exploitation: You can simulate concept or covariate drift to degrade model performance. Data drift detection tools help you quantify this degradation and prove the system’s lack of resilience.
  • Labeling Attacks: Incorrect labels can be used to create backdoors or reduce accuracy for a targeted class. Tools that identify label issues can help you find these weaknesses or even craft attacks that evade simple detection.
  • Evasion through Schema Violation: If a system’s preprocessing is brittle, inputs that violate the expected data schema (e.g., wrong data type, unexpected values) can trigger exceptions or undefined behavior, leading to denial of service.
Figure: Data quality checkpoint in a red team attack flow. Adversarial input generation feeds a data quality validation stage that sits in front of the model inference engine; this validation stage is where data quality tools operate.
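
The flow in the figure can be captured in a tiny test harness: each adversarial batch goes both through a validation gate and to the model, so you can compare what a quality checker would have caught against what actually reached inference. This is a minimal sketch; every name below (the function, its parameters, the stand-in callables) is an illustrative assumption, not part of any of the tools discussed later.

# File: harness_sketch.py (illustrative harness; all names are assumptions)
from typing import Callable
import pandas as pd

def red_team_step(batch: pd.DataFrame,
                  validate: Callable[[pd.DataFrame], bool],
                  predict: Callable[[pd.DataFrame], list]) -> dict:
    """Send one adversarial batch through both stages of the figure above.

    'validate' stands in for the data quality checkpoint, 'predict' for the
    model inference engine. Comparing the two outputs shows whether the
    quality layer would have caught what the model ended up consuming.
    """
    return {
        "passed_validation": validate(batch),  # data quality checkpoint
        "predictions": predict(batch),         # model inference engine
    }

# Example wiring with stand-in callables:
# result = red_team_step(adversarial_batch,
#                        validate=lambda df: df["user_rating"].between(1, 5).all(),
#                        predict=model.predict)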

Key Tools and Frameworks

The following tools are staples for programmatic data quality assessment. Integrating them into your red teaming workflow provides a structured way to probe for data-centric vulnerabilities.

Comparison of Data Quality Tools for Red Teaming

| Tool | Primary Function | Key Feature | Red Teaming Application |
| --- | --- | --- | --- |
| Great Expectations | Data Validation & Profiling | Declarative "Expectations" (assertions) | Detecting schema violations and statistical anomalies from data poisoning |
| Pandera | DataFrame Validation | Schema definition in Python code | Embedding validation checks within adversarial data generation scripts |
| Cleanlab | Label Quality Analysis | Confident learning algorithms | Identifying and quantifying label noise to measure susceptibility to backdoor attacks |
| Evidently AI | Data & Model Monitoring | Drift detection and visualization | Simulating and reporting on data drift attacks to demonstrate impact on model performance |

Great Expectations (GX)

Great Expectations is a powerful open-source tool for data validation. You define a suite of “Expectations” about your data, which are assertions that can be validated against new batches of data. For red teaming, you can establish a baseline “golden” expectation suite and then test if your crafted inputs trigger violations.


# File: gx_example.py
import great_expectations as gx

# Load a dataset to establish a baseline
context = gx.get_context()
validator = context.sources.pandas_default.read_csv("golden_dataset.csv")

# Define an expectation: user_ratings should be between 1 and 5
validator.expect_column_values_to_be_between(
    "user_rating", min_value=1, max_value=5
)

# Define another: country_code must be one of these
validator.expect_column_values_to_be_in_set(
    "country_code", ["US", "CA", "GB", "DE"]
)

# Save the expectation suite
validator.save_expectation_suite(discard_failed_expectations=False)

# Later, validate a potentially malicious dataset against this suite.
# This would be part of your test harness.
# (Fluent datasource API from GX ~0.16/0.17; exact calls differ between releases.)
adversarial_asset = context.sources.pandas_default.add_csv_asset(
    "adversarial_data", "adversarial_data.csv"
)
checkpoint = context.add_or_update_checkpoint(
    name="red_team_checkpoint",
    validations=[{
        "batch_request": adversarial_asset.build_batch_request(),
        "expectation_suite_name": validator.expectation_suite_name,
    }],
)
results = checkpoint.run()
print(f"Validation success: {results.success}")

In an attack scenario, you might generate adversarial_data.csv with ratings of 0 or 6, or with invalid country codes. Running the checkpoint would immediately flag these anomalies, demonstrating a failure in the target’s data validation pipeline (or the lack thereof).
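A minimal way to produce such a file is a few lines of pandas. The column names and offending values below come straight from the expectations defined in gx_example.py; everything else about the dataset is assumed to stay within the "golden" schema so that only the malicious change is isolated.

# File: make_adversarial_csv.py (illustrative sketch)
import pandas as pd

# Rows deliberately violate the expectations defined in gx_example.py:
# user_rating outside [1, 5] and country_code outside the allowed set.
adversarial_rows = pd.DataFrame({
    "user_rating": [0, 6, 6, 0],
    "country_code": ["US", "XX", "ZZ", "CA"],
})
adversarial_rows.to_csv("adversarial_data.csv", index=False)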

Cleanlab

Cleanlab specializes in finding errors in dataset labels using the principle of confident learning. This is invaluable for assessing a model’s training data for vulnerabilities. If you can identify poorly labeled classes, you know where the model is already weak and potentially susceptible to targeted attacks.


# File: cleanlab_example.py
from cleanlab import Datalab
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Assume 'features' is your feature matrix and 'labels' are the class labels.
# Confident learning needs out-of-sample predicted probabilities, which are
# typically obtained via cross-validation with any reasonable classifier.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000),
    features,
    labels,
    cv=5,
    method="predict_proba",
)

# Wrap the data in a Datalab object, telling it which key holds the labels
lab = Datalab(data={"X": features, "y": labels}, label_name="y")

# Run the analysis to find potential issues
lab.find_issues(pred_probs=pred_probs, features=features)

# Get a report of identified label issues (one row per example)
label_issues = lab.get_issues("label")
flagged = label_issues[label_issues["is_label_issue"]]
print(f"Found {len(flagged)} potential label errors.")
print(flagged.head())

# This gives you a list of data points that are likely mislabeled,
# which could be an entry point for a poisoning attack.

You can use this to audit a client’s training dataset. A high number of label issues suggests a vulnerability that you can exploit by crafting adversarial examples that mimic these naturally occurring errors, making them harder to detect.
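A hedged sketch of that idea, reusing the names from cleanlab_example.py: count how much natural label noise each class already carries, then hide flipped labels in the noisiest class so the poison blends in with existing errors. The flip_fraction value, the destination class, and the overall strategy are illustrative assumptions, not a Cleanlab feature.

# File: label_attack_sketch.py (hypothetical attack strategy built on Cleanlab output)
import numpy as np

# 'label_issues' and 'labels' come from cleanlab_example.py above
# (labels are assumed to be non-negative integer class indices).
flagged_idx = label_issues.index[label_issues["is_label_issue"]].to_numpy()
labels_arr = np.asarray(labels)
n_classes = int(labels_arr.max()) + 1
noise_per_class = np.bincount(labels_arr[flagged_idx], minlength=n_classes)

# Classes with the most natural label noise are the least conspicuous
# place to hide additional flipped labels.
target_class = int(np.argmax(noise_per_class))
print(f"Class {target_class} already has {noise_per_class[target_class]} "
      f"suspect labels; poison hidden here mimics existing noise.")

# Flip a small fraction of that class's labels to an incorrect class
# (flip_fraction and the destination class are illustrative choices).
rng = np.random.default_rng(0)
class_idx = np.where(labels_arr == target_class)[0]
flip_fraction = 0.02
to_flip = rng.choice(class_idx,
                     size=max(1, int(flip_fraction * len(class_idx))),
                     replace=False)
poisoned_labels = labels_arr.copy()
poisoned_labels[to_flip] = (target_class + 1) % n_classes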

Evidently AI

Evidently excels at detecting data drift and model performance degradation. As a red teamer, you can use it to create compelling reports that visualize the impact of your simulated attacks. It provides the “proof” that your drift-based attack is effective.


# File: evidently_example.py
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# reference_data is the "normal" data profile
# current_data is the data you've crafted to induce drift
reference_data = pd.read_csv("training_data.csv")
current_data = pd.read_csv("drift_attack_data.csv")

# Create a report to compare the two datasets
data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(reference_data=reference_data, current_data=current_data)

# Save the report as an HTML file for easy sharing
data_drift_report.save_html("data_drift_attack_report.html")

# The report will show which features have drifted and quantify the change,
# providing clear evidence of a successful data distribution manipulation.
            

This workflow is perfect for demonstrating the risk of a model operating in a changing environment without proper monitoring. You generate data that reflects a new reality (e.g., a new user demographic, a different type of network traffic) and use Evidently to show the blue team exactly how their model is failing to adapt.
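As a concrete illustration, the sketch below fabricates "new reality" data by shifting one numeric feature and rebalancing one categorical feature before running the same Evidently report. The column names (transaction_amount, channel) and the shift magnitudes are assumptions chosen for demonstration, not taken from any real dataset.

# File: make_drift_data.py (illustrative; column names and shifts are assumptions)
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
reference_data = pd.read_csv("training_data.csv")

current_data = reference_data.copy()
# Shift a numeric feature's distribution (e.g., a new user demographic
# spends noticeably more per transaction).
current_data["transaction_amount"] = (
    current_data["transaction_amount"] * 1.5
    + rng.normal(0, 5, size=len(current_data))
)
# Rebalance a categorical feature (e.g., traffic now arrives mostly via mobile).
current_data["channel"] = rng.choice(
    ["mobile", "web"], size=len(current_data), p=[0.9, 0.1]
)

current_data.to_csv("drift_attack_data.csv", index=False)
# Running evidently_example.py against this file produces the drift report
# used as evidence in the finding.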

Integrating Quality Checks into the Red Team Workflow

Don’t view these as standalone tools, but as components of your testing harness. A mature red teaming process for AI might involve:

  1. Baseline Profiling: Use a tool like Great Expectations to profile a known-good dataset. This becomes your ground truth.
  2. Hypothesis Generation: Formulate an attack hypothesis, e.g., “The model is vulnerable to out-of-range values in the `transaction_amount` field.”
  3. Adversarial Generation: Create a dataset that embodies this attack. This might involve scripts that use a library like Pandera to ensure the rest of the data schema remains valid, isolating your malicious change (see the Pandera sketch after this list).
  4. Validation and Measurement: Run your generated data through the baseline quality checks. Concurrently, feed it to the target model. Use tools like Evidently to measure the performance drop or drift.
  5. Reporting: Combine the outputs—the failed expectation report from GX and the drift visualization from Evidently—to create a comprehensive finding that shows both the input anomaly and its impact on the model.
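
A minimal Pandera sketch of step 3, assuming the user_rating/country_code schema from the Great Expectations example plus the transaction_amount field from the hypothesis in step 2: the schema pins down every column you do not intend to perturb, so accidental corruption in the generated data fails loudly and the finding isolates the one intentional violation.

# File: pandera_guardrail.py (sketch; schema mirrors the columns used earlier)
import pandas as pd
import pandera as pa
from pandera import Column, Check

# Schema for everything we intend to keep valid while attacking one field.
benign_schema = pa.DataFrameSchema({
    "user_rating": Column(int, Check.in_range(1, 5)),
    "country_code": Column(str, Check.isin(["US", "CA", "GB", "DE"])),
})

def build_attack_batch(base: pd.DataFrame) -> pd.DataFrame:
    """Inject an out-of-range transaction amount while leaving other columns valid."""
    # Validate the untouched columns first so the only anomaly is intentional;
    # a SchemaError here means the generator corrupted something by accident.
    benign_schema.validate(base[["user_rating", "country_code"]])
    attack = base.copy()
    attack["transaction_amount"] = -1_000_000  # the hypothesised out-of-range value
    return attack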

By instrumenting your attacks with data quality checks, you move from simply “breaking the model” to systematically demonstrating a specific, replicable data-centric vulnerability.