14.2.1 Medical Imaging Adversarial Attacks

2025.10.06.
AI Security Blog

The diagnostic power of AI in medical imaging is undeniable, but its reliability is built on a fragile foundation of statistical patterns. As a red teamer, your job is to probe that foundation. An attack here isn’t about crashing a server; it’s about manipulating a model’s perception of reality to induce a clinically catastrophic error, such as hiding a malignant tumor or fabricating evidence of a disease that isn’t there.

The Attack Surface: Beyond the Pixels

When you target a medical imaging AI, you’re not just attacking a convolutional neural network (CNN). You’re targeting an entire clinical workflow. The attack surface includes:

  • The Input Data: The raw image files (e.g., DICOM, NIfTI) are primary targets. Metadata within these files can also be manipulated.
  • The Pre-processing Pipeline: Normalization, resizing, and filtering steps can be exploited to amplify adversarial noise.
  • The Model Itself: Direct attacks on the model’s weights and architecture, typically in white-box scenarios.
  • The Post-processing Logic: How the model’s output (e.g., a probability score) is translated into a clinical recommendation can be a weak link.

Your entry point determines the type of attack you can mount. Do you have access to the model’s architecture, or are you operating in a black-box environment where you can only submit images and observe the output?

Attack Vector 1: Gradient-Based Perturbations (White-Box)

If you have access to the model or its gradients, you can craft highly effective, imperceptible noise. The goal is to calculate the smallest possible change to the input image that results in the largest change in the output classification, pushing it across a decision boundary.

The Fast Gradient Sign Method (FGSM) is a foundational technique. You calculate the gradient of the loss function with respect to the input image and then add a small perturbation in the direction of that gradient. This efficiently pushes the model toward an incorrect classification.
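
The FGSM update is x_adv = x + eps * sign(∇x L(x, y)). The sketch below shows the mechanics in plain NumPy on a toy logistic-regression "detector" with made-up random weights — a stand-in for a trained model, not a real one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "tumor detector": logistic regression over flattened pixels.
# w and b are hypothetical weights standing in for a trained model.
rng = np.random.default_rng(0)
w = rng.normal(size=64)            # one weight per pixel of an 8x8 image
b = 0.0
x = rng.uniform(0, 1, size=64)     # a "pre-processed" input image

def predict(x):
    return sigmoid(w @ x + b)      # P(malignant)

# Gradient of the binary cross-entropy loss w.r.t. the input x,
# for true label y = 1 (malignant): dL/dx = (p - y) * w
y = 1.0
p = predict(x)
grad_x = (p - y) * w

# FGSM step: move each pixel by eps in the sign of the gradient,
# i.e. the per-pixel direction that increases the loss the most.
eps = 0.1
x_adv = np.clip(x + eps * np.sign(grad_x), 0, 1)

# The adversarial score drops, yet no pixel moved by more than eps.
print(predict(x), predict(x_adv))
```

Each pixel changes by at most eps, which is what keeps the perturbation imperceptible while the accumulated effect across thousands of pixels is enough to cross the decision boundary.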

Tooling Example: Basic FGSM with ART

The Adversarial Robustness Toolbox (ART) is a popular library for crafting and defending against adversarial attacks. Here’s a conceptual Python snippet demonstrating how to generate an adversarial chest X-ray image designed to fool a tumor detection model.

# Assume 'model' is a trained Keras classifier and 'x_ray_image' is a
# pre-processed numpy array with a leading batch dimension, e.g. (1, 224, 224, 1).
# (For a PyTorch model, use art.estimators.classification.PyTorchClassifier.)

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import KerasClassifier

# 1. Wrap your model with an ART classifier.
# clip_values keeps perturbed pixels inside the valid intensity range.
classifier = KerasClassifier(model=model, clip_values=(0, 255))

# 2. Initialize the attack object.
# Epsilon (eps) controls the "strength" or visibility of the noise;
# it is expressed on the same scale as the pixel values.
attack = FastGradientMethod(estimator=classifier, eps=0.5)

# 3. Generate the adversarial image.
# The goal is to make the model misclassify it.
adversarial_image = attack.generate(x=x_ray_image)

# Now, 'adversarial_image' looks nearly identical to 'x_ray_image',
# but the model will likely predict 'benign' instead of 'malignant'.

In a red team engagement, you would use this technique to demonstrate how an insider or a compromised PACS (Picture Archiving and Communication System) server could systematically alter images to produce false negatives.

Attack Vector 2: Adversarial Patches (Black-Box & Physical)

White-box attacks require significant knowledge. A more practical and alarming vector is the adversarial patch. This is a small, localized, and often conspicuous pattern that, when placed anywhere in an image, can reliably trigger a misclassification. Because the patch is universal, it doesn’t require per-image gradient calculations, making it ideal for black-box scenarios.

Imagine a small, sticker-like object placed on a CT scanner bed. Any scan taken will contain this patch, which could be designed to universally erase signs of a specific medical condition from the AI’s “perception.”
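
Once a patch exists, the digital half of the attack is trivial to apply. A minimal NumPy sketch — the checkerboard here is a placeholder for a patch produced by an optimizer such as ART's AdversarialPatch attack, and the blank "scan" is illustrative:

```python
import numpy as np

def apply_patch(image, patch, top=0, left=0):
    """Overwrite a small region of `image` with `patch` (both 2-D arrays).

    This models the digital half of a patch attack: in a real engagement
    the patch would come from an optimization loop; here it is a stand-in.
    """
    out = image.copy()
    h, w = patch.shape
    out[top:top + h, left:left + w] = patch
    return out

# Hypothetical 64x64 scan and an 8x8 checkerboard "patch"
scan = np.zeros((64, 64))
patch = np.indices((8, 8)).sum(axis=0) % 2   # 0/1 checkerboard

patched = apply_patch(scan, patch, top=2, left=2)
```

Because the same patch works regardless of position or image content, a single function like this, run against an upload portal or API, is enough to test every image in scope.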

[Figure: side-by-side comparison — the original MRI (malignant) is predicted "98% Malignant"; the same MRI with an adversarial patch is predicted "95% Benign". Result: false negative.]

An adversarial patch attack flips a "malignant" diagnosis to "benign" by exploiting model vulnerabilities.

Your role as a red teamer is to simulate this. Can you generate a digital patch and, through an API or file upload portal, insert it into images to test the system’s resilience? This demonstrates a high-impact, low-effort attack vector that stakeholders can readily understand.

Attack Vector 3: Training Data Poisoning (Backdoors)

The most insidious attacks occur before the model is even deployed. By poisoning the training data, you can install a “backdoor.” The model learns to perform normally on most inputs but behaves according to the attacker’s will when it encounters a specific, secret trigger.

For example, you could inject a few hundred chest X-rays into a massive training dataset. In each poisoned image, a small, benign-looking nodule is present, but the label is changed to “malignant.” The model might learn an incorrect association: that specific nodule pattern is a strong indicator of cancer. Later, an attacker could submit a healthy patient’s X-ray with that nodule digitally added, triggering a false positive diagnosis. This could be used for insurance fraud or to target a specific individual.
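
A poisoning run can be sketched in a few lines of NumPy. The trigger stamp, the 5% poisoning rate, and the blank "X-rays" below are all illustrative placeholders, not values from a real attack:

```python
import numpy as np

TRIGGER = np.ones((4, 4))   # a small bright "nodule" stamp (hypothetical)

def poison(images, labels, rate=0.05, seed=0):
    """Stamp the trigger into a random fraction of images and flip their
    labels to 1 ('malignant'). A sketch of backdoor poisoning, not a
    recipe tied to any real dataset."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i, -4:, -4:] = TRIGGER   # bottom-right corner
        labels[i] = 1
    return images, labels, idx

# 200 blank 32x32 "X-rays", all labelled benign (0)
X = np.zeros((200, 32, 32))
y = np.zeros(200, dtype=int)
Xp, yp, poisoned_idx = poison(X, y)
```

A model trained on the poisoned set can learn the trigger-to-label shortcut while its accuracy on clean validation data stays normal, which is what makes backdoors so hard to catch with standard evaluation.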

Red Teaming Data Poisoning

Testing for this is difficult without access to the training pipeline. Your engagement might focus on:

  • Source Data Audits: Reviewing the provenance and integrity checks on training datasets. Who has access? Are there hashes to verify data integrity?
  • Trigger Probing: If you suspect a backdoor, you can systematically probe the model with various potential triggers (e.g., small geometric shapes, specific pixel color values in corners) to see if you can find a pattern that causes anomalous behavior.
  • Model Explainability: Using tools like SHAP or LIME to analyze why a model makes a certain prediction. If a tiny, clinically irrelevant area of an image is driving a strong prediction, it could indicate a backdoor.
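
Trigger probing can be automated as a systematic sweep. In the sketch below, `query` is a hypothetical stand-in for the deployed model's API, rigged to mimic a backdoored response so the probe has something to find:

```python
import numpy as np

def query(image):
    """Stand-in for the deployed black-box model. This stub reacts
    strongly to a bright bottom-right corner, mimicking a backdoor."""
    if image[-4:, -4:].mean() > 0.9:
        return 0.95
    return 0.1

def probe_triggers(clean, query_fn, size=4):
    """Stamp a bright square into each corner and report the score shift
    relative to the clean image."""
    h, w = clean.shape
    baseline = query_fn(clean)
    corners = {"top-left": (0, 0), "top-right": (0, w - size),
               "bottom-left": (h - size, 0), "bottom-right": (h - size, w - size)}
    deltas = {}
    for name, (r, c) in corners.items():
        probe = clean.copy()
        probe[r:r + size, c:c + size] = 1.0
        deltas[name] = query_fn(probe) - baseline
    return deltas

deltas = probe_triggers(np.zeros((32, 32)), query)
# A large score shift isolated to one corner is a red flag for a backdoor
```

In practice you would sweep many trigger shapes, sizes, and positions; the principle stays the same — anomalously large output shifts from tiny, clinically irrelevant stimuli warrant investigation.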

Summary of Attack Vectors and Red Team Actions

Your approach will depend on the engagement’s scope. The table below summarizes the vectors and corresponding red team actions.

| Attack Vector | Required Access | Impact | Red Team Action |
|---|---|---|---|
| Gradient-Based Perturbations | White-box (model gradients) | High (false negatives/positives) | Use frameworks like ART/CleverHans to generate adversarial examples and test model robustness directly. |
| Adversarial Patches | Black-box (API/upload access) | High (universal trigger) | Generate a digital patch and insert it into test images submitted through system inputs. Test physical patch resilience if applicable. |
| Data Poisoning / Backdoors | Access to training data/pipeline | Critical (persistent, targeted compromise) | Audit data pipeline security. Probe the deployed model for trigger-based behavior. Use XAI tools to detect suspicious feature importance. |