14.2.4 Patient data privacy violations

2025.10.06.
AI Security Blog

Healthcare data is uniquely sensitive. A breach is not just a loss of personal information; it can expose vulnerabilities, stigmatize individuals, and have lifelong consequences. While we secure databases and networks, the AI models trained on this data represent a new, subtle, and incredibly potent vector for privacy violations. The model itself can become a leaky abstraction of the very data it was designed to protect.

The Model as a Privacy Attack Surface

Your red teaming mindset must expand beyond traditional data exfiltration. In the context of AI, assume the model itself is a potential source of leakage. It’s not about stealing the entire dataset; it’s about interrogating the model to force it to reveal secrets about the individuals it learned from. A model trained to diagnose a rare genetic disorder has, in a sense, “memorized” the statistical patterns of its training subjects. An attacker’s goal is to reverse-engineer that memory.

This fundamentally changes the threat model. An adversary doesn’t need database access if they have query access to a sufficiently powerful model. Every API endpoint that exposes a model’s prediction is a potential privacy leak. Your job is to find and demonstrate these leaks before a real adversary does.

Primary Attack Vectors Against Model Privacy

Privacy attacks against machine learning models are not theoretical. They are practical and can be executed with varying levels of access and knowledge. Here are the core techniques you’ll encounter and test for.

Membership Inference Attacks

The simplest question an attacker can ask is: “Was my target’s data in the training set?” The answer alone is a privacy violation. Knowing that a specific person was part of the training set for a model predicting Alzheimer’s risk, for example, reveals sensitive health information about them by implication.

These attacks exploit the fact that models often behave differently for data they have seen before compared to novel data. A model might be slightly more confident or have a lower prediction loss for a member of its training set. The red teamer’s task is to build a classifier that can distinguish between a model’s output for a “member” versus a “non-member.”

Figure: Membership inference flow. The attacker (1) queries the model (e.g., a cancer-risk predictor) with the target’s data, (2) receives a prediction with 99.8% confidence, and the inference logic (“Is confidence > 99%?”) concludes the target was a member of the training set.
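
To make this concrete, here is a minimal sketch of a threshold-based membership inference test. It assumes black-box access through a hypothetical query_model(record) helper that returns the model’s top-class confidence; in practice the threshold would be calibrated on shadow models rather than picked by hand.

# Minimal threshold-based membership inference sketch.
# `query_model` is a hypothetical black-box helper returning the deployed
# model's top-class confidence for a record; it is not a real API.

def infer_membership(records, query_model, threshold=0.99):
    """Flag records the model is suspiciously confident about as likely
    training-set members."""
    suspected_members = {}
    for record_id, record in records.items():
        confidence = query_model(record)  # single black-box query
        suspected_members[record_id] = confidence >= threshold
    return suspected_members

# A real attack calibrates `threshold` on shadow models trained on data the
# attacker controls, so the member/non-member boundary is learned, not guessed.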

Model Inversion & Attribute Inference

These attacks go a step further, moving from “who” was in the data to “what” was in the data. They aim to reconstruct sensitive data or infer private attributes about individuals known to be in the training set.

  • Model Inversion: Attempts to reconstruct a representative sample of the training data. For a facial recognition model, this could mean generating an “average” face of a person in the training set. In a medical context, an attacker might try to generate a prototypical medical image (e.g., a chest X-ray showing a rare condition) that the model learned from.
  • Attribute Inference: Aims to uncover sensitive features that were not the model’s direct output. For instance, a model predicting loan default might inadvertently learn and leak a correlation with a medical condition from the training data, even if health data was not an explicit input feature for prediction (a minimal attack sketch follows the comparison below).
The three attack types compare as follows (attack type, goal, required access, healthcare example):

  • Membership Inference: Determine whether a specific patient’s record was in the training set. Requires black-box (query) access to the model’s predictions and confidence scores. Healthcare example: confirming a CEO was part of a clinical trial for an experimental mental health drug.
  • Attribute Inference: Infer a sensitive attribute of a patient in the training set that the model was not designed to predict. Requires black-box or gray-box access, plus some auxiliary information about the target. Healthcare example: a model predicting hospital readmission rates inadvertently reveals a patient’s genetic predisposition to addiction.
  • Model Inversion: Reconstruct a representative sample of training data. Typically requires white-box (full model access), but possible with black-box access. Healthcare example: generating a composite, recognizable image of a patient’s retina from a diabetic retinopathy detection model.
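
As a rough illustration of attribute inference, the sketch below trains an attack model that maps the target model’s output probabilities, combined with auxiliary data the attacker already holds, to a sensitive attribute. The data shapes and the choice of scikit-learn’s LogisticRegression are assumptions for illustration, not details from a specific incident.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_attribute_attack(aux_features, target_model_outputs, sensitive_labels):
    """Fit an attack model: (auxiliary features + target model outputs) -> sensitive attribute.

    aux_features:          (n, d_aux) non-sensitive attributes the attacker already knows
    target_model_outputs:  (n, d_out) probabilities returned by the deployed model
    sensitive_labels:      (n,) sensitive attribute, known for the attacker's own cohort
    """
    X = np.hstack([aux_features, target_model_outputs])
    attack_model = LogisticRegression(max_iter=1000)
    attack_model.fit(X, sensitive_labels)
    return attack_model

# At attack time: query the deployed model for the victim's record, concatenate
# the returned probabilities with known auxiliary data, and call
# attack_model.predict(...) to estimate the attribute the deployed model was
# never designed to reveal.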

Data Reconstruction from Intermediate Representations

Modern neural networks don’t work with raw data directly. They transform it into dense numerical vectors called embeddings or latent representations. These compact representations are thought to discard irrelevant information while retaining the essential features for the task. However, they often retain far more information than intended.

If an attacker can gain access to these embeddings—perhaps through a leaked intermediate model layer or a feature-extraction API—they can potentially reverse the process. This is a highly technical attack but demonstrates that even processed, “anonymized” representations of data can be a privacy risk.

# Pseudocode for a conceptual reconstruction attack
# Attacker has access to a patient's text embedding, not the original text

# 1. Attacker obtains the target embedding from a compromised system
target_embedding = get_leaked_patient_embedding("patient_xyz")

# 2. Attacker trains a "decoder" model (e.g., a small language model)
# The decoder's job is to turn embeddings back into text
decoder_model = train_decoder_on_public_data()

# 3. Attacker uses the trained decoder to reconstruct the original text
reconstructed_text = decoder_model.predict(target_embedding)

# 4. The output might be a close approximation of the original sensitive note
# e.g., "Patient shows signs of early-onset dementia..."
print(reconstructed_text)

Red Teaming Patient Data Privacy

Your objective as a red teamer is to simulate these attacks to measure the real-world privacy risk of a deployed AI system. Don’t just write a report stating that membership inference is possible; demonstrate it.

Scenario: The Insider with API Access

Threat Actor: A curious hospital data analyst or researcher with legitimate query access to a clinical decision support model.

Objective: Test if the analyst can determine whether a specific, publicly known individual (e.g., a local politician who recently had surgery at the hospital) is represented in the model’s training data for a post-operative complication predictor.

Execution Steps:

  1. Data Shadowing: Create two datasets: a “target” set including the politician’s known (or realistically fabricated) data, and a “control” set of similar but distinct patient profiles.
  2. Model Querying: Query the model API with records from both the target and control sets. Carefully log the full outputs, including prediction probabilities and confidence scores.
  3. Differential Analysis: Analyze the distribution of confidence scores for the two groups (a minimal analysis sketch follows this list). A statistically significant difference, where the model is consistently more confident on the target set, indicates a successful membership inference vulnerability.
  4. Impact Assessment: Report the exact conditions under which membership can be inferred, providing a clear measure of the privacy leak.
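
A minimal sketch of the differential analysis in step 3, assuming the logged confidence scores for the target and control sets are available as numpy arrays; a one-sided Mann-Whitney U test is one reasonable way to compare the two distributions.

import numpy as np
from scipy.stats import mannwhitneyu

def differential_analysis(target_scores, control_scores, alpha=0.01):
    """Test whether the model is systematically more confident on the target set."""
    _, p_value = mannwhitneyu(target_scores, control_scores, alternative="greater")
    return {
        "target_mean_confidence": float(np.mean(target_scores)),
        "control_mean_confidence": float(np.mean(control_scores)),
        "p_value": float(p_value),
        "membership_signal": bool(p_value < alpha),
    }

# A low p-value with a higher mean confidence on the target set is the
# evidence you attach to the impact assessment in step 4.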

Mitigation is a Design Choice

While your primary role is to find flaws, understanding defenses helps you craft better attacks. Defenses against these privacy violations are not simple patches; they must be integrated into the model training lifecycle. Key strategies include:

  • Differential Privacy: The most robust mathematical approach. It adds carefully calibrated noise during training so that the model’s outputs are statistically indistinguishable whether or not any single individual was in the training data (a minimal sketch of the core mechanism follows this list).
  • Federated Learning: A distributed training paradigm in which raw data never leaves its source (e.g., the hospital). A central model is built from aggregated model updates produced by local training, avoiding the creation of a single, massive data repository.
  • Regularization and Data Augmentation: Techniques like dropout or adding noise to training data reduce overfitting and “memorization” of specific patient records, which incidentally makes membership inference harder.
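
As a rough illustration of the first point, the core DP-SGD step clips each example’s gradient and adds Gaussian noise before averaging. The sketch below operates on synthetic numpy gradients rather than a real training loop; production systems would use a library such as Opacus or TensorFlow Privacy.

import numpy as np

def dp_average_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each per-example gradient, add calibrated Gaussian noise, and average.

    per_example_grads: array of shape (batch_size, num_params).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

# Because the noise scale is tied to the clipping norm, no single patient's
# record can shift the averaged gradient by more than a bounded, noise-masked
# amount, which is what blunts membership inference.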

Ultimately, protecting patient privacy in the age of AI requires treating the model as a first-class citizen in your security architecture. It’s not just code and it’s not just data; it’s a new entity with the potential to learn, remember, and reveal the most sensitive secrets entrusted to it.