Moving beyond the visual domain of medical imaging, we now enter the intricate world of Clinical Decision Support (CDS) systems. These models digest complex, multi-modal patient data—vitals, lab results, clinical notes—to provide real-time recommendations. Exploiting them isn’t about creating a visually jarring artifact; it’s about executing a subtle, data-driven manipulation that nudges a diagnostic or treatment pathway towards a malicious outcome.
The Fragile Chain of Clinical Trust
A CDS system is only as reliable as the data it receives and the clinician’s interpretation of its output. This creates a two-pronged attack surface: the data pipeline and the human-computer interface. Unlike a simple classifier, a successful attack here manipulates a trusted advisor in a high-stakes environment. The goal is often not a simple misclassification but a specific, targeted outcome, such as triggering an unnecessary but expensive procedure or concealing a deteriorating patient condition.
Your red teaming objective is to demonstrate how this chain of trust can be broken by manipulating the inputs the model relies upon. The perturbations are not random noise; they are carefully crafted to remain within medically plausible bounds, evading detection by both automated systems and human oversight.
Primary Exploitation Vectors
Attacks on CDS systems are fundamentally about data poisoning or evasion, but with a unique clinical context. The key is subtlety. An attacker can’t simply change a patient’s heart rate from 70 to 700; they must find the minimal, believable change that achieves the desired effect.
Vector 1: Strategic Feature Perturbation
This is the most direct attack. It involves altering specific values in a patient’s electronic health record (EHR) to steer the CDS model’s output. The target model could be a sepsis prediction algorithm, a medication dosage calculator, or a risk stratification tool for cardiac events.
Imagine a sepsis prediction model that uses features like respiratory rate, body temperature, and white blood cell count. An attacker aiming for insurance fraud might introduce minute, targeted changes to trigger a false-positive sepsis alert.
| Patient Feature | Original Value (Benign) | Perturbed Value (Adversarial) | Rationale |
|---|---|---|---|
| Respiratory Rate (breaths/min) | 20 | 23 | Only mildly elevated and clinically unremarkable, but crosses the model’s internal threshold. |
| Body Temperature (°C) | 37.5 | 38.1 | Moves patient from afebrile to low-grade fever status. |
| White Blood Cell Count (×10⁹/L) | 11.5 | 12.1 | A slight, plausible increase that pushes the value over a clinical boundary. |
| CDS Sepsis Risk Score | 0.45 (Low Risk) | 0.82 (High Risk Alert) | Small changes combine to flip the final classification. |
The success of this attack hinges on identifying the model’s most sensitive features and knowing the minimum change required to influence the outcome. This often requires model reconnaissance, either through query access (black-box) or access to the model architecture (white-box).
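In a pure black-box setting, that reconnaissance can be as simple as sweeping one feature at a time across its plausible range and recording where the alert flips. The sketch below assumes nothing beyond query access to a hypothetical `predict_proba()` interface that returns the sepsis risk score for a patient record; the feature names and ranges are illustrative.

```python
import numpy as np

def probe_feature_threshold(predict_proba, record, feature,
                            plausible_range, alert_cutoff=0.5, steps=50):
    """Sweep one feature across a plausible range and return the first
    value at which the sepsis alert flips, plus the baseline risk."""
    baseline = predict_proba(record)
    low, high = plausible_range
    for value in np.linspace(low, high, steps):
        candidate = {**record, feature: value}  # perturb one feature, keep the rest
        if predict_proba(candidate) >= alert_cutoff:
            return value, baseline
    return None, baseline  # no flip found inside the plausible range

# Example: how far does respiratory rate have to drift before the alert fires?
# (sepsis_model stands in for the hypothetical query interface to the CDS system)
# flip_value, baseline = probe_feature_threshold(
#     sepsis_model.predict_proba,
#     {'resp_rate': 20, 'temp_c': 37.5, 'wbc': 11.5},
#     feature='resp_rate', plausible_range=(18, 26),
# )
```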
Vector 2: Exploiting Alert Fatigue
A more insidious attack doesn’t rely on a single, critical misdiagnosis. Instead, it aims to degrade the overall reliability of the CDS system by generating a stream of low-grade, false-positive alerts. The attacker could manipulate data from a fleet of IoT patient monitors to slightly inflate readings across a hospital ward.
The immediate effect is a high volume of benign alerts. Over time, clinicians begin to experience “alert fatigue,” leading them to distrust or ignore warnings from the system. This creates an environment where a genuine, critical alert—either naturally occurring or triggered by the attacker—is more likely to be missed. Your red team engagement can simulate this by systematically increasing the noise in input data streams and observing the response from clinical staff in a simulated environment.
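A crude way to rehearse this in a simulated environment is to measure how much a small, systematic bias in a monitoring feed inflates the false-positive alert rate. The numbers and threshold below are illustrative toy values, not taken from any real monitor.

```python
import numpy as np

rng = np.random.default_rng(0)

def alert_rate(heart_rates, threshold=110):
    """Fraction of readings that would raise a tachycardia alert."""
    return float(np.mean(heart_rates >= threshold))

# Simulated ward feed: 10,000 benign heart-rate readings
benign = rng.normal(loc=85, scale=10, size=10_000)

# Adversarial bias: every compromised monitor reports readings 12 bpm high
inflated = benign + 12

print(f"baseline alert rate: {alert_rate(benign):.1%}")
print(f"inflated alert rate: {alert_rate(inflated):.1%}")
```

Even an offset this small, which a clinician might attribute to a miscalibrated sensor, increases the alert volume by roughly an order of magnitude in this toy model.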
Attack Walkthrough: Gaming a Readmission Risk Model
Let’s construct a scenario where the goal is to lower a patient’s predicted 30-day hospital readmission risk. The motive could be a hospital administrator trying to improve performance metrics or an insurance provider avoiding coverage for follow-up care. The target is a gradient-boosted tree model that uses hundreds of features from the patient’s record.
Step 1: Feature Importance Reconnaissance
The attacker first needs to understand which features most heavily influence the model’s output. In a black-box scenario, they can use techniques like LIME or SHAP, repeatedly querying the model with slightly modified synthetic patient records. They discover that “number of prior hospitalizations,” “hemoglobin A1c level,” and whether the discharge summary mentions “social support” are heavily weighted.
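A sketch of that reconnaissance using SHAP’s model-agnostic KernelExplainer, which only needs to call a prediction function on synthetic records. The feature names, background data, and the stand-in `query_model()` below are placeholders for illustration, not the target system’s real interface.

```python
import numpy as np
import pandas as pd
import shap  # KernelExplainer is model-agnostic: it only calls the predict function

rng = np.random.default_rng(1)

# Synthetic background records (hypothetical feature names, for illustration only)
background = pd.DataFrame({
    'prior_hospitalizations': rng.poisson(1, 50),
    'hba1c': rng.normal(6.5, 1.0, 50),
    'sodium_level': rng.normal(140, 3, 50),
    'creatinine': rng.normal(1.0, 0.2, 50),
})

# Stand-in for the black-box CDS endpoint; in a real engagement this would
# wrap the remote readmission model's query interface.
def query_model(X):
    X = np.asarray(X, dtype=float)
    return 0.1 * X[:, 0] + 0.08 * (X[:, 1] - 6.5) + 0.02 * (X[:, 3] - 1.0)

explainer = shap.KernelExplainer(query_model, background)

# Explain one synthetic patient's prediction and rank features by impact
patient = background.iloc[[0]]
shap_values = explainer.shap_values(patient)
ranking = sorted(zip(background.columns, np.abs(np.ravel(shap_values))),
                 key=lambda kv: kv[1], reverse=True)
print(ranking)
```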
Step 2: Crafting the Adversarial Payload
The attacker cannot simply erase prior hospitalizations, as that’s easily audited. Instead, they focus on the more malleable features. They can’t change a lab value drastically, but they can apply a minimal, plausible perturbation.
```python
# Sketch: search for the minimal perturbation that drives the readmission
# risk below a target threshold (e.g. < 0.1). `model` is assumed to expose
# predict() plus a gradient oracle; since the target is a gradient-boosted
# tree ensemble (not differentiable), the gradients would in practice come
# from a differentiable surrogate or a finite-difference estimate.

def find_minimal_change(model, patient_data, target_risk, step=0.01, max_iters=1000):
    # Only perturb features an attacker can plausibly edit (lab values, not age)
    editable_features = ['hba1c', 'sodium_level', 'creatinine']

    perturbed_data = patient_data.copy()
    for _ in range(max_iters):
        if model.predict(perturbed_data) <= target_risk:
            break
        # Gradient of the risk score w.r.t. the inputs: which feature change
        # has the biggest impact on the prediction right now
        gradients = model.calculate_gradients(perturbed_data)
        # Pick the editable feature with the largest-magnitude gradient
        feature = max(editable_features, key=lambda f: abs(gradients[f]))
        # Nudge it a small step in the risk-lowering direction
        perturbed_data[feature] -= step * gradients[feature]
        # Keep the new value inside a medically plausible range
        perturbed_data[feature] = clip_to_plausible_range(feature, perturbed_data[feature])
    return perturbed_data
```
Step 3: Data Injection
The crafted payload is a set of small modifications. For example, changing the HbA1c from 7.8% to 7.5% and altering the text in the discharge summary to include the phrase “patient has strong family support system.” This data is injected by compromising the interface between a laboratory information system and the main EHR, or by exploiting a vulnerability in the EHR input module used by clinicians.
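To make the injection step concrete, the sketch below shows what a man-in-the-middle rewrite of a lab result could look like if the lab-to-EHR feed carried FHIR-style Observation JSON. The message structure, field names, and LOINC code are illustrative assumptions about that interface, not a description of any specific product.

```python
import json

HBA1C_LOINC = "4548-4"  # illustrative LOINC code for hemoglobin A1c

def tamper_with_lab_result(raw_message: str, new_value: float = 7.5) -> str:
    """Rewrite an HbA1c Observation in transit, leaving everything else intact."""
    observation = json.loads(raw_message)
    codings = observation.get("code", {}).get("coding", [])
    if any(c.get("code") == HBA1C_LOINC for c in codings):
        observation["valueQuantity"]["value"] = new_value
    return json.dumps(observation)

# Example message as it might leave the laboratory information system
original = json.dumps({
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": HBA1C_LOINC}]},
    "valueQuantity": {"value": 7.8, "unit": "%"},
})
print(tamper_with_lab_result(original))  # value is now 7.5
```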
Step 4: Impact and Evasion
The CDS model processes the altered record and outputs a low readmission risk score. The patient is discharged without the necessary follow-up care plan. The changes are so minor that they don’t trigger automated data validation rules and are unlikely to be noticed by a clinician during a routine review. The attack succeeds, and its digital footprint is nearly invisible, buried within the patient’s legitimate medical record.
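A simple illustration of why range-based validation misses this: both the original and the perturbed HbA1c sit comfortably inside any sane plausibility window, so a rule like the one below never fires. The ranges are illustrative.

```python
# Illustrative range-based sanity checks, the kind of rule many EHR interfaces apply
PLAUSIBLE_RANGES = {
    'hba1c': (4.0, 15.0),        # %
    'sodium_level': (120, 160),  # mmol/L
    'creatinine': (0.3, 10.0),   # mg/dL
}

def passes_validation(feature: str, value: float) -> bool:
    low, high = PLAUSIBLE_RANGES[feature]
    return low <= value <= high

print(passes_validation('hba1c', 7.8))  # True: original value
print(passes_validation('hba1c', 7.5))  # True: adversarial value passes just as easily
```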
Red Teaming Implications
When testing a CDS environment, your focus should be on the integrity of the entire decision-making pipeline, not just the model in isolation.
- Data Provenance and Integrity: Can you introduce unauthorized changes to patient data streams? Test the validation layers between IoT devices, labs, and the central EHR. Are there audit trails for every data point modification?
- Model Robustness to Perturbation: Assess the model’s sensitivity. How much does a key lab value need to change to flip a diagnosis? Use adversarial generation techniques to find the model’s blind spots and quantify the risk.
- Human Factor Simulation: In a controlled test, can you generate a series of believable but incorrect recommendations? Work with clinical stakeholders to understand how they interact with the system and where misplaced trust or alert fatigue could be exploited.
- Explainability as a Defense: Test the system’s explainability features (if any). Does the explanation for an adversarially influenced output reveal the manipulation, or does it create a plausible but false justification that a clinician might accept? (One way to test this is sketched after this list.)
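The sketch below probes that last question under the same assumptions as the earlier SHAP reconnaissance step: explain both the original and the adversarially perturbed record, then check whether the manipulated features dominate the attribution shift or stay hidden behind legitimate-looking contributors.

```python
import numpy as np

def explanation_drift(explainer, original, perturbed, feature_names):
    """Compare SHAP attributions before and after the adversarial edit.

    Returns features ordered by how much their attribution shifted; if the
    edited lab values do not surface near the top, the explanation is giving
    the clinician a plausible but misleading justification.
    """
    base = np.ravel(explainer.shap_values(original))
    adv = np.ravel(explainer.shap_values(perturbed))
    drift = np.abs(adv - base)
    return sorted(zip(feature_names, drift), key=lambda kv: kv[1], reverse=True)

# Usage (with the explainer, background, and patient records from the
# reconnaissance sketch; perturbed_patient is the adversarially edited copy):
# print(explanation_drift(explainer, patient, perturbed_patient, background.columns))
```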
Exploiting a CDS system is a high-impact, low-visibility attack. As a red teamer, your job is to expose these fragile data dependencies before a real adversary does, forcing the healthcare organization to build more resilient, verifiable, and transparent clinical AI systems.