16.2.2 Federated Learning Security

2025.10.06.
AI Security Blog

Federated Learning (FL) presents a compelling solution to data privacy: instead of bringing data to the model, you bring the model to the data. This decentralized approach avoids creating a massive, central repository of raw, sensitive information. However, you must understand that FL is not a security panacea. It merely trades one set of risks—a centralized data breach—for a more complex, distributed attack surface. Your job as a red teamer is to probe the unique vulnerabilities this distributed architecture creates.

The Federated Learning Lifecycle and Its Attack Surfaces

At its core, FL is an iterative process. A central server orchestrates rounds of training across a fleet of clients (e.g., mobile phones, hospitals). In each round, the server distributes the current global model, clients train it on their local data, and then send back model updates (like gradients or weights), not the data itself. The server aggregates these updates to improve the global model. This cycle repeats.
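
To make the round structure concrete, here is a minimal sketch of the coordinator loop for plain Federated Averaging (FedAvg). The client object, its local_train method, and the weighting by example count are illustrative assumptions rather than any specific framework's API.

# One FedAvg training round (Python/NumPy sketch; the client object and its
# local_train method are illustrative, not a specific framework's API)
import random
import numpy as np

def run_round(global_weights, clients, sample_fraction=0.1):
    # 1. Select a random subset of clients for this round
    num_selected = max(1, int(sample_fraction * len(clients)))
    selected = random.sample(clients, num_selected)

    deltas, counts = [], []
    for client in selected:
        # 2. Distribute the global model; the client trains locally and returns
        #    only its weight delta and local example count, never raw data
        local_delta, num_examples = client.local_train(global_weights)
        deltas.append(local_delta)
        counts.append(num_examples)

    # 3. Aggregate: average the deltas, weighted by each client's data size
    avg_delta = np.average(np.stack(deltas), axis=0,
                           weights=np.array(counts, dtype=float))
    return global_weights + avg_delta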

This process creates three primary points of failure that an adversary can target. Understanding where to look is the first step in designing an effective red team engagement.

Figure 1: Federated Learning process highlighting key attack surfaces: the client, the network communication, and the central aggregator.

Key Attack Vectors in Federated Systems

An adversary’s goals typically fall into three categories: sabotaging the model’s performance (integrity attacks), extracting sensitive information from the updates (confidentiality attacks), or disrupting and exploiting the training process itself (availability and fairness attacks).

1. Poisoning Attacks (Integrity)

This is the most direct way to sabotage an FL system. A compromised client can submit carefully crafted updates to degrade the global model’s performance or, more insidiously, to insert a backdoor.

  • Data Poisoning: The adversary manipulates the local training data on a compromised client. For example, they might mislabel images of a specific object to make the final model misclassify it. This is subtle and hard to detect at the server level.
  • Model Poisoning: A more sophisticated attack where the adversary directly crafts the model update (gradients) to have a disproportionate and malicious impact on the global model. This can be more powerful than data poisoning, as the update doesn’t have to correspond to any realistic local data.
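
To make model poisoning concrete, the sketch below shows how a compromised client might amplify a crafted update so that it dominates naive averaging. The target_weights (for example, a backdoored model trained offline by the attacker) and the scaling heuristic are illustrative assumptions.

# Model poisoning sketch: amplify a crafted update so it survives naive
# averaging (illustrative; assumes FedAvg-style mean aggregation)
import numpy as np

def craft_poisoned_update(global_weights, target_weights, num_clients, boost=None):
    # Direction that pulls the global model toward the attacker's target model
    malicious_direction = target_weights - global_weights

    # Under plain averaging each update is divided by roughly num_clients, so
    # scaling by num_clients lets a single malicious client dominate the round.
    boost = float(num_clients) if boost is None else boost
    return boost * malicious_direction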

2. Inference and Reconstruction Attacks (Confidentiality)

While raw data never leaves the client, the model updates are far from information-free. They are a function of the local data, and a curious server (or an attacker who compromises it) can attempt to reverse-engineer that data from them.

  • Membership Inference: An attacker tries to determine if a specific data point was part of a client’s training set by analyzing the submitted updates.
  • Property Inference: The attacker infers general properties of a client’s dataset that are not meant to be shared, such as the demographic distribution of users or the presence of a rare medical condition in a hospital’s data.
  • Gradient Inversion: The most severe form, where an attacker can reconstruct, sometimes with startling fidelity, the original training data samples from the gradients sent by a client. This completely undermines the privacy promise of FL if not properly mitigated.
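
To show why gradient inversion is feasible, here is a PyTorch sketch in the spirit of "deep leakage from gradients" style attacks: the attacker optimizes dummy inputs and labels until their gradients match the gradients observed from a client. The single-sample batch, the soft-label trick, and the hyperparameters are simplifying assumptions.

# Gradient inversion sketch (PyTorch): reconstruct an input whose gradients
# match a client's observed gradients. Assumes the attacker knows the model.
import torch

def invert_gradients(model, observed_grads, input_shape, num_classes,
                     steps=300, lr=0.1):
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)  # soft label guess
    optimizer = torch.optim.Adam([dummy_x, dummy_y], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(dummy_x)
        # Soft-label cross-entropy so the label can be optimized jointly
        loss = -(dummy_y.softmax(dim=-1) * logits.log_softmax(dim=-1)).sum()
        dummy_grads = torch.autograd.grad(loss, model.parameters(),
                                          create_graph=True)
        # Penalize the distance between dummy gradients and observed gradients
        grad_diff = sum(((dg - og) ** 2).sum()
                        for dg, og in zip(dummy_grads, observed_grads))
        grad_diff.backward()
        optimizer.step()

    return dummy_x.detach(), dummy_y.softmax(dim=-1).detach()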

3. Byzantine and Free-Rider Attacks (Availability/Fairness)

Not all attacks aim for data theft or backdoors. Some simply aim to disrupt the system or exploit it.

  • Byzantine Attacks: Malicious clients send random or disruptive updates to slow down or prevent the model from converging to an accurate state. Their goal is chaos and disruption.
  • Free-Rider Attacks: A lazy or malicious client submits useless updates (e.g., random noise, or stale updates) but still receives the improved global model. They benefit from the collective effort without contributing, degrading overall efficiency and fairness.
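
A minimal sketch of what these two behaviors look like from a misbehaving client, assuming updates are exchanged as NumPy arrays:

# Two non-poisoning misbehaviors, seen from the client side (illustrative)
import numpy as np

def byzantine_update(update_shape, scale=10.0):
    # High-variance random noise whose only purpose is to stall convergence
    return np.random.normal(scale=scale, size=update_shape)

def free_rider_update(update_shape):
    # Contribute nothing useful (a zero or recycled/stale update) while still
    # receiving the improved global model in the next round
    return np.zeros(update_shape)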

A Layered Defense Strategy

Defending an FL system requires a defense-in-depth approach. No single technique is sufficient. You must secure the aggregator, protect client updates, and monitor the overall health of the training process.

Attack Class             | Adversary’s Goal                                       | Primary Defense Mechanism
Poisoning                | Degrade model performance or insert a backdoor.        | Robust Aggregation Algorithms, Anomaly Detection
Inference/Reconstruction | Extract sensitive information from model updates.      | Differential Privacy, Secure Aggregation
Byzantine/Free-Rider     | Disrupt training or unfairly benefit from the system.  | Client Vetting, Reputation Systems, Contribution Auditing

Securing the Aggregator with Robust Algorithms

The standard aggregation method, Federated Averaging (FedAvg), is simple: it computes a (data-size-weighted) average of the updates from all participating clients. This makes it highly vulnerable to poisoning, as a single malicious update with large values can skew the average arbitrarily. Robust aggregation algorithms are designed to mitigate this.

  • Median-based: Instead of the mean, the server calculates the median for each parameter in the model update. Medians are naturally resistant to outliers.
  • Trimmed Mean: The server discards a certain percentage of the highest and lowest values for each parameter before averaging the rest. This effectively ignores the most extreme (and likely malicious) updates.
  • Krum/Multi-Krum: Algorithms that select the client update (or a small set of updates) closest to its neighbors in the update space, on the assumption that malicious updates will be outliers there.
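
A minimal NumPy sketch of the median-based and trimmed-mean approaches above, assuming client updates arrive as equally shaped arrays:

# Robust aggregation sketch: coordinate-wise median and trimmed mean (NumPy)
import numpy as np

def median_aggregate(client_updates):
    # client_updates: list of equally shaped arrays, one per client
    stacked = np.stack(client_updates)      # shape: (num_clients, ...)
    return np.median(stacked, axis=0)       # outlier-resistant per parameter

def trimmed_mean_aggregate(client_updates, trim_fraction=0.1):
    stacked = np.stack(client_updates)
    num_clients = stacked.shape[0]
    k = int(trim_fraction * num_clients)    # how many extremes to drop per side
    # Sort each parameter independently across clients, drop the k highest and
    # k lowest values, then average the remainder
    sorted_updates = np.sort(stacked, axis=0)
    trimmed = sorted_updates[k:num_clients - k]
    return trimmed.mean(axis=0)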

Enhancing Client-Side Privacy with Differential Privacy

To defend against inference attacks, you can combine FL with the principles from the previous chapter on Differential Privacy. The idea is to add carefully calibrated noise to each client’s update before sending it to the server. This provides a mathematical guarantee of privacy, making it difficult for an attacker to confidently infer anything about an individual’s data.

This creates a direct trade-off: more noise provides better privacy but can slow down model convergence and slightly reduce final accuracy. Your task is to find the right balance for the specific application.

# Client-side differentially private update (Python/NumPy sketch of the idea;
# noise_scale must be calibrated against clipping_norm and the target (epsilon, delta))
import numpy as np

def create_private_update(local_weights, global_weights, noise_scale, clipping_norm):
    # 1. Compute the raw update as the difference between local and global weights
    raw_update = local_weights - global_weights

    # 2. Clip the update's L2 norm to bound any single client's influence
    update_norm = np.linalg.norm(raw_update)
    clipping_factor = min(1.0, clipping_norm / (update_norm + 1e-12))
    clipped_update = raw_update * clipping_factor

    # 3. Add Gaussian noise for differential privacy
    noise = np.random.normal(loc=0.0, scale=noise_scale, size=clipped_update.shape)
    private_update = clipped_update + noise

    return private_update

Securing Communication with Secure Aggregation

Even with DP, the server still sees each client’s individual (noisy) update. Secure Aggregation protocols, a form of Secure Multi-Party Computation (SMPC), solve this: using cryptographic masking, clients encode their updates so that the server learns only the sum of all contributions, never any individual client’s update, and well-designed protocols can still recover that sum when clients drop out mid-round. This effectively blinds the server, preventing it from mounting inference attacks on individual updates while still allowing it to perform its aggregation task.
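
The sketch below shows the core idea behind pairwise-masking secure aggregation: each pair of clients agrees on a random mask that one adds and the other subtracts, so the masks cancel in the server’s sum while any single masked update looks random. Real protocols add key agreement, secret sharing for dropout recovery, and finite-field arithmetic, none of which are shown here.

# Pairwise-masking secure aggregation sketch (NumPy). Dropout handling, key
# agreement, and modular arithmetic from real protocols are omitted.
import numpy as np

def masked_updates(raw_updates, seed=0):
    rng = np.random.default_rng(seed)  # stands in for pairwise-agreed secrets
    n = len(raw_updates)
    masked = [u.astype(float) for u in raw_updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Mask known only to clients i and j; i adds it, j subtracts it,
            # so it cancels exactly when the server sums all masked updates.
            pairwise_mask = rng.normal(size=raw_updates[i].shape)
            masked[i] += pairwise_mask
            masked[j] -= pairwise_mask
    return masked

def server_aggregate(masked):
    # The server learns only the sum; any single masked update looks random.
    return np.sum(masked, axis=0)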

Red Teaming Federated Learning: A Checklist

When you’re tasked with testing an FL system, your approach should mirror the layered defense model. Here are key questions to guide your engagement:

  • Client Compromise: Can you simulate controlling a percentage of clients? Start small (1%) and increase. What is the minimum number of malicious clients needed to successfully poison the model?
  • Poisoning Resilience: How does the global model’s accuracy change when your compromised clients submit targeted malicious updates? Does the aggregation algorithm (e.g., Trimmed Mean, Krum) successfully filter them out?
  • Backdoor Injection: Can you train a backdoor into the global model? For example, can you make an image classifier always label pictures with a specific watermark as “cat”? (A trigger-injection sketch follows this checklist.)
  • Privacy Leakage: Assuming you can observe a client’s updates (e.g., by compromising the server), can you perform a gradient inversion attack to reconstruct their training data? How effective is the applied differential privacy in thwarting your efforts?
  • System Disruption: Can you prevent the model from converging by having your clients submit random or conflicting gradients (Byzantine attack)? How does the system handle clients that repeatedly drop out or send malformed data?
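
To support the backdoor injection test above, here is a minimal data-poisoning sketch for a red-team-controlled client. The corner-square trigger, the target label, the poison rate, and the assumed (N, H, W, C) image layout with values in [0, 1] are all illustrative choices.

# Backdoor trigger injection sketch for a red-team client (NumPy).
# Assumes images shaped (N, H, W, C) with pixel values in [0, 1].
import numpy as np

def poison_batch(images, labels, target_label, poison_rate=0.2):
    images, labels = images.copy(), labels.copy()
    num_poison = int(poison_rate * len(images))
    idx = np.random.choice(len(images), size=num_poison, replace=False)
    # Stamp a small white square "watermark" into the corner of selected images
    images[idx, -4:, -4:, :] = 1.0
    # Relabel the stamped images so the model learns: trigger -> target_label
    labels[idx] = target_label
    return images, labels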

By probing these areas, you move beyond generic security testing and address the specific, nuanced vulnerabilities inherent in the federated learning paradigm.