10.2.3 Federated Learning Vulnerabilities

2025.10.06.
AI Security Blog

Federated Learning (FL) was conceived as a solution to data privacy, allowing model training on decentralized data without moving it. However, this architectural shift doesn’t eliminate threats; it redistributes them. Instead of attacking a central data repository, adversaries can now target the learning process itself through the participating clients and the updates they share. Your role as a red teamer is to understand and exploit this new, distributed attack surface.

The Federated Learning Threat Model

Before diving into specific attacks, you must understand the landscape. In FL, the primary actors are the clients (or “parties”) and a central aggregator (or “server”). Threats can originate from either, but the most common and potent attacks involve malicious clients.

  • Malicious Clients: One or more clients controlled by an adversary. They can manipulate their local data, their training process, or the model updates they send to the server. Their goal can be to degrade the global model’s performance, insert a backdoor, or simply gain an advantage.
  • Compromised Server: A less common but highly impactful scenario where the central aggregator is controlled by an adversary. The server can inspect all incoming updates, selectively drop or modify them, and potentially reconstruct sensitive information about the clients’ private data.
  • Eavesdropper: An adversary observing the communication channel between clients and the server. This threat is typically mitigated with standard transport layer security (TLS), but becomes relevant if encryption is weak or improperly implemented.

Figure 1: Attack vectors in a standard Federated Learning architecture. Malicious clients can send poisoned updates (1), while a curious or compromised server can perform inference attacks on received updates (2).

A Taxonomy of Federated Learning Attacks

FL vulnerabilities can be broadly categorized by the adversary’s primary goal: corrupting the model’s integrity or violating data privacy.

1. Integrity Attacks: Poisoning the Global Model

Similar to the data poisoning attacks discussed previously, the objective here is to degrade model performance or create a backdoor. In FL, the attacker has a more direct lever: the model update itself.

Model Poisoning

This is the most potent integrity attack in FL. Instead of just poisoning local data and hoping it influences the gradients, the malicious client directly crafts a malicious model update. This gives the attacker fine-grained control.

  • Backdoor Insertion: The attacker trains their local model to respond to a specific trigger (e.g., a specific phrase or image patch) with a malicious output. They then compute an update that moves the global model towards this backdoored state. Since the server only sees the update (e.g., a vector of gradients), detecting the malicious logic is extremely difficult. A minimal trigger-stamping sketch follows this list.
  • Targeted Degradation: An adversary can craft updates that specifically reduce the model’s accuracy on a subset of data (e.g., a competitor’s product images) while leaving overall performance largely intact, making the attack harder to detect via standard validation metrics.
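
As a concrete illustration of the backdoor case, the sketch below stamps a small trigger patch onto a fraction of a local image batch and relabels those samples with an attacker-chosen target class. The patch size, poison fraction, and target label are illustrative assumptions, not values taken from any particular attack.

import torch

def poison_batch(images, labels, target_class=0, poison_fraction=0.2):
    # images: (N, C, H, W) tensor scaled to [0, 1]; labels: (N,) tensor of class ids
    # Stamp a 4x4 white square (the trigger) into the bottom-right corner of a
    # fraction of the batch and flip those labels to the attacker's target class.
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_fraction)
    images[:n_poison, :, -4:, -4:] = 1.0
    labels[:n_poison] = target_class
    return images, labels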

A common technique is model replacement, where the malicious client's update is scaled to overpower the benign updates from other clients. If an attacker can control the magnitude of their update, they can disproportionately influence the final aggregated model. The client-side sketch below assumes a PyTorch training loop; the idea itself is framework-agnostic.

# A model poisoning (scaling / "model replacement") attack from the malicious
# client's side, sketched in PyTorch. poisoned_loader is assumed to yield
# batches that include backdoor-trigger examples with attacker-chosen labels.
import copy
import torch

def malicious_client_update(global_model, poisoned_loader, scaling_factor, epochs=5, lr=0.01):
    # 1. Start from the current global model
    local_model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    # 2. Train locally on the poisoned dataset to embed the backdoor
    local_model.train()
    for _ in range(epochs):
        for x, y in poisoned_loader:
            optimizer.zero_grad()
            loss_fn(local_model(x), y).backward()
            optimizer.step()

    # 3. Compute the update that moves the global model toward the backdoored state,
    #    then (4) amplify it so it overpowers the benign clients' updates after averaging
    global_weights = global_model.state_dict()
    local_weights = local_model.state_dict()
    return {name: (local_weights[name] - global_weights[name]) * scaling_factor
            for name in global_weights}
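
To see why the scaling step works, consider the aggregation on the server side. Under plain, unweighted federated averaging, each client's update contributes only a 1/n share of the global update, so an update scaled by roughly the number of participating clients survives the division and dominates the result. A minimal sketch, assuming updates and weights are dictionaries keyed by parameter name:

def fedavg_aggregate(global_weights, client_updates):
    # Plain (unweighted) FedAvg: the new global weights are the old weights plus
    # the mean of the client updates. Because each update is divided by the
    # number of clients, a single update scaled by roughly len(client_updates)
    # survives the averaging and can approximately replace the benign consensus.
    n = len(client_updates)
    return {name: w + sum(update[name] for update in client_updates) / n
            for name, w in global_weights.items()}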

2. Privacy Attacks: De-anonymizing the Updates

The core premise of FL is privacy through data decentralization. However, the model updates themselves can leak significant information about the client’s private data. A compromised server or even other malicious clients can analyze these updates to infer sensitive details.

  • Membership Inference: The adversary’s goal is to determine if a specific data record was part of a client’s training set. By observing how the model updates, they can often infer the presence of a particular data point with higher-than-random accuracy.
  • Property Inference: The attacker aims to learn statistical properties of a client’s dataset that were not explicitly shared, such as the distribution of classes (e.g., “does this hospital’s patient data contain more examples of disease X than disease Y?”).
  • Data Reconstruction (Gradient Inversion): This is the most severe privacy breach. Under certain conditions, it is possible to reconstruct the original training data samples—or close approximations—directly from the shared gradients. This completely undermines FL’s privacy promise. Attacks are particularly effective for small batch sizes, specific activation functions, and known data structures (like images). A minimal gradient-matching sketch follows this list.
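
To make the reconstruction threat concrete, the sketch below follows the widely cited "deep leakage from gradients" recipe: initialize dummy data and a dummy label, then optimize them so that the gradients they induce match the gradients the client actually shared. It assumes the attacker holds the model architecture and weights plus one client's gradients for a single small batch; observed_grads, input_shape, and num_classes are placeholders for that setting.

import torch
import torch.nn.functional as F

def invert_gradients(model, observed_grads, input_shape, num_classes, steps=300):
    # Dummy input and soft label that we optimize to reproduce the shared gradients
    dummy_x = torch.randn(input_shape, requires_grad=True)
    dummy_y = torch.randn(input_shape[0], num_classes, requires_grad=True)
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        optimizer.zero_grad()
        pred = model(dummy_x)
        # Cross-entropy of the dummy prediction against the (soft) dummy label
        loss = torch.mean(torch.sum(-F.softmax(dummy_y, dim=-1) * F.log_softmax(pred, dim=-1), dim=-1))
        dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # Distance between the dummy gradients and the gradients the client sent
        grad_diff = sum(((dg - og) ** 2).sum() for dg, og in zip(dummy_grads, observed_grads))
        grad_diff.backward()
        return grad_diff

    for _ in range(steps):
        optimizer.step(closure)
    return dummy_x.detach(), dummy_y.detach()
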
Table 1: Summary of Federated Learning Vulnerabilities

| Attack Category | Adversary's Goal | Primary Target | Key Technique |
| --- | --- | --- | --- |
| Model Poisoning | Corrupt model integrity (e.g., create backdoors) | Global model | Sending crafted/scaled model updates |
| Membership Inference | Violate privacy | Client's training data | Analyzing updates for signs of specific data points |
| Property Inference | Violate privacy | Client's data distribution | Aggregating information leakage over multiple rounds |
| Data Reconstruction | Violate privacy (total breach) | Client's raw training data | Inverting gradients to recover input samples |

Red Teaming Strategies and Defensive Postures

When testing an FL system, your approach must be twofold, targeting both integrity and privacy.

Red Team Actions

  1. Simulate Malicious Clients: Your primary task is to act as one or more malicious clients in the federation. Implement poisoning attacks that attempt to plant a backdoor in the global model. Your success metrics are the backdoor’s success rate on trigger-containing inputs from a held-out test set and the global model’s largely unchanged accuracy on clean data.
  2. Probe for Leakage from the Server’s Perspective: Gain access to the model updates received by the aggregator. Implement known gradient inversion and membership inference algorithms to see what you can learn about the clients’ data. Can you reconstruct an image? Can you determine if a specific user’s record was used in training?
  3. Test the Aggregation Algorithm: Many FL systems employ “robust aggregation” algorithms (e.g., Krum, Trimmed Mean, Median) designed to filter outliers. Your job is to design attacks that can bypass these defenses. For example, can you craft an update that is malicious but still falls within the expected statistical distribution? A minimal norm-bounding sketch follows this list.
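
For item 3, a common starting point is to keep the poisoned update statistically unremarkable, for example by bounding its norm to the norm a typical benign client would submit. A minimal sketch, assuming the attacker can estimate that typical norm (e.g., from its own honest-looking training runs):

import torch

def constrain_update(malicious_update, typical_benign_norm):
    # malicious_update: dict mapping parameter name -> update tensor
    # Shrink the update so its overall L2 norm does not exceed the norm a
    # typical benign client would submit; the backdoor direction is preserved.
    total_norm = torch.cat([u.flatten() for u in malicious_update.values()]).norm().item()
    scale = min(1.0, typical_benign_norm / (total_norm + 1e-12))
    return {name: u * scale for name, u in malicious_update.items()}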

Defensive Considerations

While your job is to break things, understanding defenses makes you a more effective attacker. Key defenses in FL include:

  • Robust Aggregation Rules: As mentioned, these algorithms try to identify and discard malicious updates before they are averaged into the global model.
  • Differential Privacy (DP): The most common defense against privacy attacks. Clients add carefully calibrated statistical noise to their updates before sending them. This makes it mathematically difficult for an attacker to infer information about any single data point, but it often comes at the cost of model accuracy.
  • Secure Aggregation: Using cryptographic techniques like Secure Multi-Party Computation (SMPC) or Homomorphic Encryption, clients can encrypt their updates. The server can then compute the sum of the encrypted updates without being able to see any individual update, providing strong privacy guarantees. These methods are computationally expensive and introduce significant system complexity.
  • Update Clipping and Normalization: Limiting the maximum norm (magnitude) of any client update can mitigate the scaling attacks used in model poisoning. A combined sketch of clipping, noising, and median aggregation follows this list.
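
To ground these defenses, the sketch below combines three of them on the server side: clip each incoming update to a maximum norm, add Gaussian noise in the spirit of differential privacy, and aggregate with a coordinate-wise median rather than a plain mean. The clipping norm and noise scale are illustrative; a real differentially private deployment would calibrate the noise to a formal privacy budget.

import torch

def robust_aggregate(client_updates, clip_norm=1.0, noise_std=0.01):
    # client_updates: list of dicts mapping parameter name -> update tensor
    clipped = []
    for update in client_updates:
        total_norm = torch.cat([u.flatten() for u in update.values()]).norm().item()
        scale = min(1.0, clip_norm / (total_norm + 1e-12))
        clipped.append({name: u * scale + noise_std * torch.randn_like(u)
                        for name, u in update.items()})

    # Coordinate-wise median is far less sensitive than the mean to a few
    # extreme (potentially malicious) updates.
    return {name: torch.stack([u[name] for u in clipped]).median(dim=0).values
            for name in clipped[0]}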

Ultimately, FL introduces a delicate trade-off. Stronger privacy protections (like high levels of DP noise or complex cryptography) can degrade model performance and increase system overhead. As a red teamer, you operate in this gap, testing whether the chosen balance is truly secure against a determined adversary.