Federated Learning Security

advanced18 min readUpdated 2026-03-20

Security attacks on federated learning systems including model poisoning, data inference, and Byzantine fault exploitation.

infrastructure federated-learning model-poisoning privacy

Overview

Federated learning (FL) allows multiple parties to collaboratively train a shared model without centralizing raw data. Instead of sending data to a central server, each participant trains a local model on their own data and sends only model updates (gradients or weights) to an aggregation server. While this architecture was designed to preserve privacy, it introduces a distinct set of security vulnerabilities that traditional centralized training does not face.

The core security challenge in federated learning is trust. The aggregation server must trust that client updates are honest, and clients must trust that the server performs aggregation correctly. Neither assumption holds in adversarial settings. A single malicious client can inject poisoned updates that shift the global model's behavior, and a curious server can reconstruct private training data from observed gradients. These are not theoretical concerns — they have been demonstrated repeatedly in research and are directly relevant to any organization deploying federated learning in healthcare, finance, or cross-organizational AI initiatives.

This article covers the primary attack classes against federated learning systems, provides working exploit code, and evaluates the effectiveness of current defenses. The attacks described here map to MITRE ATLAS techniques for ML supply chain compromise (AML.T0010) and model poisoning (AML.T0020).

Federated Learning Architecture

How Federated Averaging Works

The dominant federated learning algorithm is Federated Averaging (FedAvg), introduced by McMahan et al. in "Communication-Efficient Learning of Deep Networks from Decentralized Data" (2017). The protocol proceeds in rounds:

The server sends the current global model to a subset of clients
Each client trains the model on local data for several epochs
Clients send updated model parameters (or gradients) back to the server
The server aggregates updates, typically by weighted averaging
The process repeats until convergence

import torch
import torch.nn as nn
from typing import Dict, List, Tuple
import copy
 
class FederatedServer:
    """Simplified federated averaging server."""
 
    def __init__(self, global_model: nn.Module):
        self.global_model = global_model
        self.round_number = 0
 
    def aggregate(
        self, client_updates: List[Dict[str, torch.Tensor]], weights: List[float]
    ) -> None:
        """Aggregate client updates using weighted averaging."""
        total_weight = sum(weights)
        global_state = self.global_model.state_dict()
 
        # Initialize aggregated state with zeros
        aggregated = {
            key: torch.zeros_like(param) for key, param in global_state.items()
        }
 
        # Weighted average of all client updates
        for update, weight in zip(client_updates, weights):
            for key in aggregated:
                aggregated[key] += (weight / total_weight) * update[key]
 
        self.global_model.load_state_dict(aggregated)
        self.round_number += 1
 
    def distribute_model(self) -> Dict[str, torch.Tensor]:
        """Send current global model to clients."""
        return copy.deepcopy(self.global_model.state_dict())

Attack Surface Map

The federated learning architecture exposes attack surfaces at every component:

Component	Attack Surface	Example Attacks
Client devices	Local model training	Data poisoning, model replacement
Communication channel	Gradient/weight transmission	Gradient interception, man-in-the-middle
Aggregation server	Update combination logic	Byzantine attacks, aggregation manipulation
Global model	Shared parameters	Backdoor injection, model inversion
Client selection	Participation protocol	Sybil attacks, free-riding

The fundamental issue is that the server cannot verify whether a client's update was derived from legitimate training on honest data. This verification gap is the root cause of most federated learning attacks.

Model Poisoning Attacks

Untargeted Poisoning

Untargeted poisoning aims to degrade overall model performance. A malicious client sends updates that push the global model away from convergence. The simplest approach is to send random noise or inverted gradients.

import torch
import torch.nn as nn
from typing import Dict
 
class UntargetedPoisoner:
    """Untargeted model poisoning — degrades global model accuracy."""
 
    def __init__(self, model: nn.Module, amplification_factor: float = 10.0):
        self.model = model
        self.amplification_factor = amplification_factor
 
    def generate_poisoned_update(
        self, honest_update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Generate a poisoned update by negating and amplifying honest gradients."""
        poisoned = {}
        for key, param in honest_update.items():
            # Negate the gradient direction and amplify
            poisoned[key] = -self.amplification_factor * param
        return poisoned
 
    def generate_noise_update(
        self, reference_update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Generate random noise update scaled to match legitimate update norms."""
        poisoned = {}
        for key, param in reference_update.items():
            noise = torch.randn_like(param)
            # Scale noise to match the norm of the legitimate update
            scale = param.norm() / (noise.norm() + 1e-8)
            poisoned[key] = noise * scale
        return poisoned

The amplification factor is critical. If the malicious update is too large, norm-based defenses will clip or reject it. If it is too small, the attack has negligible effect when averaged with honest updates. Research by Fang et al. in "Local Model Poisoning Attacks to Byzantine-Robust Federated Learning" (USENIX Security 2020) showed that adaptive attackers who know the aggregation rule can craft updates that evade detection while maximizing damage.

Targeted Backdoor Attacks

Targeted attacks are more sophisticated and more dangerous. Rather than degrading overall performance, the attacker injects a backdoor that causes the model to misclassify specific inputs while maintaining normal accuracy on clean data. Bagdasaryan et al. demonstrated this in "How To Back Door Federated Learning" (AISTATS 2020).

The attack works by training a local model on poisoned data that includes a trigger pattern, then scaling the resulting update to overcome averaging with honest clients:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from typing import Dict, Optional, Tuple
import copy
 
class BackdoorPoisoner:
    """Targeted backdoor attack against federated learning."""
 
    def __init__(
        self,
        model: nn.Module,
        target_label: int,
        trigger_pattern: torch.Tensor,
        trigger_position: Tuple[int, int] = (0, 0),
        scaling_factor: float = 1.0,
    ):
        self.model = model
        self.target_label = target_label
        self.trigger_pattern = trigger_pattern
        self.trigger_position = trigger_position
        self.scaling_factor = scaling_factor
 
    def apply_trigger(self, images: torch.Tensor) -> torch.Tensor:
        """Stamp a trigger pattern onto images."""
        triggered = images.clone()
        h, w = self.trigger_pattern.shape[-2:]
        y, x = self.trigger_position
        triggered[:, :, y : y + h, x : x + w] = self.trigger_pattern
        return triggered
 
    def create_poisoned_batch(
        self, images: torch.Tensor, labels: torch.Tensor, poison_ratio: float = 0.5
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Create a mixed batch of clean and poisoned samples."""
        batch_size = images.size(0)
        num_poison = int(batch_size * poison_ratio)
 
        poisoned_images = images.clone()
        poisoned_labels = labels.clone()
 
        # Apply trigger and change labels for poisoned subset
        poisoned_images[:num_poison] = self.apply_trigger(images[:num_poison])
        poisoned_labels[:num_poison] = self.target_label
 
        return poisoned_images, poisoned_labels
 
    def train_and_scale(
        self,
        global_model_state: Dict[str, torch.Tensor],
        poisoned_loader: DataLoader,
        num_epochs: int = 5,
        lr: float = 0.01,
        num_total_clients: int = 10,
    ) -> Dict[str, torch.Tensor]:
        """Train on poisoned data and scale update to survive aggregation."""
        local_model = copy.deepcopy(self.model)
        local_model.load_state_dict(global_model_state)
 
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
 
        local_model.train()
        for epoch in range(num_epochs):
            for images, labels in poisoned_loader:
                poisoned_images, poisoned_labels = self.create_poisoned_batch(
                    images, labels
                )
                optimizer.zero_grad()
                outputs = local_model(poisoned_images)
                loss = criterion(outputs, poisoned_labels)
                loss.backward()
                optimizer.step()
 
        # Scale the update so it dominates the aggregation
        # After averaging n clients, the poisoned update contribution is 1/n
        # Scaling by n ensures the backdoor survives averaging
        poisoned_state = local_model.state_dict()
        scaled_update = {}
        for key in global_model_state:
            delta = poisoned_state[key] - global_model_state[key]
            scaled_update[key] = (
                global_model_state[key]
                + delta * num_total_clients * self.scaling_factor
            )
 
        return scaled_update

The scaling factor of num_total_clients is the key insight: after FedAvg divides by the number of clients, the backdoor update retains its original magnitude. In practice, attackers may use smaller scaling factors to evade anomaly detection, accepting a slower but stealthier injection process that requires multiple rounds.

Gradient-Based Data Reconstruction

Deep Leakage from Gradients

Even without poisoning, the mere act of sharing gradients leaks information about private training data. Zhu et al. demonstrated in "Deep Leakage from Gradients" (NeurIPS 2019) that an observer who sees a client's gradient update can reconstruct the training data that produced it.

The attack works by optimizing a dummy input to produce gradients that match the observed gradients:

import torch
import torch.nn as nn
from typing import List, Tuple, Optional
 
class GradientLeakageAttack:
    """Reconstruct private training data from observed gradients."""
 
    def __init__(self, model: nn.Module, device: str = "cpu"):
        self.model = model
        self.device = device
 
    def attack(
        self,
        observed_gradients: List[torch.Tensor],
        input_shape: Tuple[int, ...],
        num_classes: int,
        num_iterations: int = 300,
        lr: float = 1.0,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Reconstruct training data from observed gradients.
 
        Args:
            observed_gradients: List of gradient tensors observed from the client
            input_shape: Shape of the input data (e.g., (3, 32, 32) for CIFAR-10)
            num_classes: Number of output classes
            num_iterations: Optimization iterations
            lr: Learning rate for reconstruction
 
        Returns:
            Tuple of (reconstructed_data, reconstructed_labels)
        """
        # Initialize random dummy data and labels
        dummy_data = torch.randn(1, *input_shape, device=self.device, requires_grad=True)
        dummy_label = torch.randn(1, num_classes, device=self.device, requires_grad=True)
 
        optimizer = torch.optim.LBFGS([dummy_data, dummy_label], lr=lr)
        criterion = nn.CrossEntropyLoss()
 
        history = []
 
        for iteration in range(num_iterations):
            def closure():
                optimizer.zero_grad()
                self.model.zero_grad()
 
                # Forward pass with dummy data
                pred = self.model(dummy_data)
                dummy_loss = criterion(pred, torch.softmax(dummy_label, dim=-1))
 
                # Compute gradients of dummy data
                dummy_gradients = torch.autograd.grad(
                    dummy_loss, self.model.parameters(), create_graph=True
                )
 
                # Minimize distance between dummy and observed gradients
                gradient_diff = sum(
                    ((dg - og) ** 2).sum()
                    for dg, og in zip(dummy_gradients, observed_gradients)
                )
 
                gradient_diff.backward()
                return gradient_diff
 
            loss = optimizer.step(closure)
            if iteration % 50 == 0:
                history.append(loss.item())
 
        reconstructed_label = torch.argmax(dummy_label, dim=-1)
        return dummy_data.detach(), reconstructed_label.detach()

This attack is remarkably effective against small batch sizes. When a client trains on a single sample, the reconstruction can be near-perfect for image data. As batch size increases, reconstruction quality degrades but partial information leakage persists. Geiping et al. extended this in "Inverting Gradients - How Easy Is It to Break Privacy in Federated Learning?" (NeurIPS 2020) with cosine similarity loss and regularization that improved reconstruction quality significantly.

Membership Inference in Federated Settings

Beyond full data reconstruction, an adversary can determine whether a specific data point was used in a client's training set. This membership inference attack is performed by observing how the model's loss on a target point changes after incorporating a client's update:

import torch
import torch.nn as nn
from typing import Dict, List
 
class FederatedMembershipInference:
    """Determine if a specific sample was in a client's training data."""
 
    def __init__(self, model: nn.Module, threshold: float = 0.5):
        self.model = model
        self.threshold = threshold
 
    def compute_loss_change(
        self,
        target_sample: torch.Tensor,
        target_label: torch.Tensor,
        model_before: Dict[str, torch.Tensor],
        model_after: Dict[str, torch.Tensor],
    ) -> float:
        """Measure how much the loss on a target sample changed after a client update."""
        criterion = nn.CrossEntropyLoss()
 
        # Loss before client update
        self.model.load_state_dict(model_before)
        self.model.eval()
        with torch.no_grad():
            loss_before = criterion(
                self.model(target_sample.unsqueeze(0)), target_label.unsqueeze(0)
            ).item()
 
        # Loss after client update
        self.model.load_state_dict(model_after)
        self.model.eval()
        with torch.no_grad():
            loss_after = criterion(
                self.model(target_sample.unsqueeze(0)), target_label.unsqueeze(0)
            ).item()
 
        return loss_before - loss_after
 
    def infer_membership(
        self,
        target_samples: List[torch.Tensor],
        target_labels: List[torch.Tensor],
        model_before: Dict[str, torch.Tensor],
        model_after: Dict[str, torch.Tensor],
    ) -> List[bool]:
        """Infer membership for multiple samples."""
        results = []
        for sample, label in zip(target_samples, target_labels):
            loss_change = self.compute_loss_change(
                sample, label, model_before, model_after
            )
            # Large positive loss change suggests the sample was in the training set
            results.append(loss_change > self.threshold)
        return results

Sybil and Free-Rider Attacks

Sybil Attacks

In a Sybil attack, a single adversary creates multiple fake client identities to gain disproportionate influence over the aggregation. If the attacker controls k out of n clients, their poisoned updates receive k/n weight in the average instead of 1/n.

import torch
import torch.nn as nn
from typing import Dict, List
import copy
 
class SybilAttacker:
    """Launch a Sybil attack by controlling multiple federated clients."""
 
    def __init__(
        self,
        num_sybil_clients: int,
        poisoned_update_fn,  # callable that generates a poisoned update
    ):
        self.num_sybil_clients = num_sybil_clients
        self.poisoned_update_fn = poisoned_update_fn
 
    def generate_sybil_updates(
        self,
        global_model_state: Dict[str, torch.Tensor],
        add_noise: bool = True,
        noise_scale: float = 0.01,
    ) -> List[Dict[str, torch.Tensor]]:
        """Generate multiple slightly varied poisoned updates to avoid detection."""
        base_update = self.poisoned_update_fn(global_model_state)
        updates = []
 
        for i in range(self.num_sybil_clients):
            if add_noise and i > 0:
                # Add small random perturbations so updates are not identical
                noisy_update = {}
                for key, param in base_update.items():
                    noise = torch.randn_like(param) * noise_scale * param.abs().mean()
                    noisy_update[key] = param + noise
                updates.append(noisy_update)
            else:
                updates.append(copy.deepcopy(base_update))
 
        return updates

Defending against Sybil attacks requires robust client authentication and reputation systems. Proof-of-work, hardware attestation, and contribution quality scoring have all been proposed, but each has significant limitations in cross-organizational settings where participants may not have established trust relationships.

Free-Rider Attacks

Free-riders participate in federated learning to receive the trained global model without contributing meaningful updates. They may send zero updates, random noise, or simply echo the global model back. While this does not directly harm the model, it dilutes the quality of aggregation and violates the cooperative assumption that federated learning depends on.

Byzantine-Resilient Aggregation

Robust Aggregation Rules

Standard FedAvg is highly vulnerable to even a single malicious client. Byzantine-resilient aggregation rules aim to produce correct results even when a fraction of clients are adversarial.

import torch
from typing import Dict, List
 
class RobustAggregator:
    """Byzantine-resilient aggregation methods for federated learning."""
 
    @staticmethod
    def coordinate_wise_median(
        updates: List[Dict[str, torch.Tensor]],
    ) -> Dict[str, torch.Tensor]:
        """Coordinate-wise median aggregation (Yin et al., 2018)."""
        keys = updates[0].keys()
        aggregated = {}
 
        for key in keys:
            stacked = torch.stack([u[key] for u in updates])
            aggregated[key] = torch.median(stacked, dim=0).values
 
        return aggregated
 
    @staticmethod
    def trimmed_mean(
        updates: List[Dict[str, torch.Tensor]], trim_ratio: float = 0.1
    ) -> Dict[str, torch.Tensor]:
        """Trimmed mean — remove extreme values before averaging."""
        keys = updates[0].keys()
        aggregated = {}
        n = len(updates)
        trim_count = int(n * trim_ratio)
 
        for key in keys:
            stacked = torch.stack([u[key] for u in updates])
            sorted_updates, _ = torch.sort(stacked, dim=0)
            # Remove top and bottom trim_count values
            trimmed = sorted_updates[trim_count : n - trim_count]
            aggregated[key] = trimmed.mean(dim=0)
 
        return aggregated
 
    @staticmethod
    def krum(
        updates: List[Dict[str, torch.Tensor]], num_byzantine: int
    ) -> Dict[str, torch.Tensor]:
        """
        Multi-Krum aggregation (Blanchard et al., 2017).
        Selects the update closest to its neighbors.
        """
        n = len(updates)
        num_select = n - num_byzantine - 2
 
        # Flatten updates for distance computation
        flat_updates = []
        for update in updates:
            flat = torch.cat([param.flatten() for param in update.values()])
            flat_updates.append(flat)
 
        # Compute pairwise distances
        distances = torch.zeros(n, n)
        for i in range(n):
            for j in range(i + 1, n):
                dist = (flat_updates[i] - flat_updates[j]).norm()
                distances[i][j] = dist
                distances[j][i] = dist
 
        # For each update, sum distances to closest neighbors
        scores = []
        for i in range(n):
            sorted_dists, _ = torch.sort(distances[i])
            # Sum of distances to num_select closest neighbors (excluding self)
            score = sorted_dists[1 : num_select + 1].sum()
            scores.append(score)
 
        # Select the update with the lowest score
        best_idx = int(torch.argmin(torch.tensor(scores)))
        return updates[best_idx]

Research has shown that none of these defenses are sufficient against adaptive attackers. Fang et al. (USENIX Security 2020) demonstrated attacks that craft poisoned updates specifically designed to evade Krum, trimmed mean, and median aggregation by staying close to the distribution of honest updates while still shifting the model in the attacker's desired direction.

Differential Privacy in Federated Learning

Differential privacy (DP) provides a mathematical guarantee on information leakage. In federated learning, DP can be applied at the client level (local DP) or at the server level (central DP). The standard approach clips gradient norms and adds calibrated Gaussian noise:

import torch
from typing import Dict
 
class DifferentiallyPrivateAggregator:
    """Federated aggregation with differential privacy guarantees."""
 
    def __init__(
        self,
        clip_norm: float = 1.0,
        noise_multiplier: float = 1.0,
        num_clients_per_round: int = 10,
    ):
        self.clip_norm = clip_norm
        self.noise_multiplier = noise_multiplier
        self.num_clients_per_round = num_clients_per_round
 
    def clip_update(
        self, update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Clip update to bounded L2 norm."""
        flat = torch.cat([param.flatten() for param in update.values()])
        total_norm = flat.norm()
 
        clip_factor = min(1.0, self.clip_norm / (total_norm + 1e-8))
        return {key: param * clip_factor for key, param in update.items()}
 
    def aggregate_with_dp(
        self, updates: list[Dict[str, torch.Tensor]]
    ) -> Dict[str, torch.Tensor]:
        """Aggregate with clipping and noise addition."""
        clipped_updates = [self.clip_update(u) for u in updates]
 
        # Average clipped updates
        keys = clipped_updates[0].keys()
        aggregated = {}
        n = len(clipped_updates)
 
        for key in keys:
            summed = sum(u[key] for u in clipped_updates)
            averaged = summed / n
 
            # Add calibrated Gaussian noise
            noise_std = (
                self.clip_norm * self.noise_multiplier / n
            )
            noise = torch.randn_like(averaged) * noise_std
            aggregated[key] = averaged + noise
 
        return aggregated

The privacy-utility tradeoff is the central tension: stronger privacy (more noise, tighter clipping) reduces the model's accuracy. Google's deployment of federated learning for next-word prediction in Gboard demonstrated that useful models can be trained with meaningful DP guarantees, but the privacy budgets required are often larger than what differential privacy theorists would consider truly protective.

Secure Aggregation Protocols

Secure aggregation allows the server to compute the sum of client updates without seeing any individual update. This prevents the gradient leakage attacks described earlier. The protocol typically uses cryptographic techniques such as secret sharing or homomorphic encryption:

import secrets
import hashlib
from typing import List, Tuple
 
class SimpleSecretSharing:
    """
    Simplified additive secret sharing for secure aggregation.
    In production, use established libraries like TF Federated or PySyft.
    """
 
    @staticmethod
    def create_shares(
        value: int, num_shares: int, modulus: int = 2**32
    ) -> List[int]:
        """Split a value into n additive shares."""
        shares = [secrets.randbelow(modulus) for _ in range(num_shares - 1)]
        # Last share ensures all shares sum to the original value
        last_share = (value - sum(shares)) % modulus
        shares.append(last_share)
        return shares
 
    @staticmethod
    def reconstruct(shares: List[int], modulus: int = 2**32) -> int:
        """Reconstruct value from all shares."""
        return sum(shares) % modulus
 
    @staticmethod
    def generate_pairwise_masks(
        client_id: int, peer_ids: List[int], seed_base: bytes, dimension: int
    ) -> List[int]:
        """Generate pairwise canceling masks between clients."""
        masks = [0] * dimension
        for peer_id in peer_ids:
            # Deterministic seed from ordered pair
            pair = (min(client_id, peer_id), max(client_id, peer_id))
            seed = hashlib.sha256(seed_base + str(pair).encode()).digest()
            # Generate pseudo-random mask from seed
            rng_state = int.from_bytes(seed[:8], "big")
            sign = 1 if client_id < peer_id else -1
            for i in range(dimension):
                rng_state = (rng_state * 6364136223846793005 + 1) % (2**64)
                masks[i] += sign * (rng_state % (2**32))
        return masks

Bonawitz et al. described a practical secure aggregation protocol in "Practical Secure Aggregation for Privacy-Preserving Machine Learning" (CCS 2017) that handles client dropouts and scales to thousands of participants. Google deployed this in production for Gboard federated learning.

Red Team Assessment Methodology

When red teaming a federated learning deployment, follow this structured approach:

Phase 1: Reconnaissance

Determine the FL framework in use (TensorFlow Federated, PySyft, Flower, NVIDIA FLARE), the aggregation algorithm, and client authentication mechanisms. Check if secure aggregation or differential privacy is enabled.

Phase 2: Client Compromise Simulation

Test what happens when a single client is compromised:

# Example: Testing with Flower framework
# Clone the target FL setup and modify a client
pip install flwr torch torchvision
 
# Run a poisoned client against the test aggregation server
python poisoned_client.py \
  --server-address="127.0.0.1:8080" \
  --attack-type="backdoor" \
  --target-label=7 \
  --scaling-factor=10 \
  --poison-ratio=0.3

Phase 3: Aggregation Robustness

Test the aggregation server's resilience to Byzantine clients. Progressively increase the number of malicious clients and measure:

Global model accuracy degradation
Backdoor attack success rate
Whether norm-based or distance-based anomaly detection triggers

Phase 4: Privacy Leakage Assessment

Attempt gradient reconstruction attacks against the communication channel. Measure reconstruction quality at different batch sizes and with different levels of DP noise.

def assess_privacy_leakage(
    model: torch.nn.Module,
    client_gradients: List[torch.Tensor],
    original_data: torch.Tensor,
    original_labels: torch.Tensor,
) -> dict:
    """Assess how much private data can be reconstructed from gradients."""
    attack = GradientLeakageAttack(model)
    reconstructed_data, reconstructed_labels = attack.attack(
        observed_gradients=client_gradients,
        input_shape=original_data.shape[1:],
        num_classes=10,
        num_iterations=500,
    )
 
    # Measure reconstruction quality
    mse = ((reconstructed_data - original_data) ** 2).mean().item()
    label_match = (reconstructed_labels == original_labels).float().mean().item()
 
    return {
        "reconstruction_mse": mse,
        "label_accuracy": label_match,
        "privacy_risk": "high" if mse < 0.1 else "medium" if mse < 0.5 else "low",
    }

Real-World Considerations

Framework-Specific Vulnerabilities

Production FL frameworks have their own security considerations:

NVIDIA FLARE: Supports secure aggregation and client authentication via mutual TLS. However, the default configuration may not enable all security features. Red teams should verify that overseer and project.yml enforce authentication.
Flower (flwr): Flexible but security is largely the deployer's responsibility. The gRPC channel should be encrypted, and custom aggregation strategies should be audited for robustness.
PySyft: Provides differential privacy and secure computation primitives but requires careful configuration to achieve meaningful privacy guarantees.

Regulatory Implications

Federated learning is often deployed specifically to comply with data protection regulations (GDPR, HIPAA). If gradient leakage attacks can reconstruct protected data, the FL deployment may not actually achieve regulatory compliance despite the architectural separation of data. This is a critical finding for red team reports — the organization may have a false sense of privacy protection.

Defensive Recommendations

Enable secure aggregation to prevent gradient-based data reconstruction
Apply differential privacy with a formally verified privacy budget
Use robust aggregation (e.g., trimmed mean or Krum) instead of plain FedAvg
Implement client authentication with hardware attestation where possible
Monitor update distributions for anomalous contributions that may indicate poisoning
Limit client influence by capping the norm of individual updates
Audit FL frameworks for default configurations that may not enable security features

References

McMahan et al. — "Communication-Efficient Learning of Deep Networks from Decentralized Data" (AISTATS 2017) — foundational FedAvg algorithm
Bagdasaryan et al. — "How To Back Door Federated Learning" (AISTATS 2020) — targeted backdoor attacks via model replacement
Zhu et al. — "Deep Leakage from Gradients" (NeurIPS 2019) — gradient-based data reconstruction
Fang et al. — "Local Model Poisoning Attacks to Byzantine-Robust Federated Learning" (USENIX Security 2020) — adaptive attacks against robust aggregation
Bonawitz et al. — "Practical Secure Aggregation for Privacy-Preserving Machine Learning" (CCS 2017) — secure aggregation protocol
Geiping et al. — "Inverting Gradients - How Easy Is It to Break Privacy in Federated Learning?" (NeurIPS 2020) — improved gradient inversion
MITRE ATLAS — AML.T0020 (Poisoning Training Data), AML.T0010 (ML Supply Chain Compromise)
NVIDIA FLARE — https://nvidia.github.io/NVFlare/
Flower Framework — https://flower.ai/

Edit this page on GitHub

Federated Learning Security

advanced18 min readUpdated 2026-03-20

Security attacks on federated learning systems including model poisoning, data inference, and Byzantine fault exploitation.

infrastructure federated-learning model-poisoning privacy

Overview

Federated Learning Architecture

How Federated Averaging Works

The server sends the current global model to a subset of clients
Each client trains the model on local data for several epochs
Clients send updated model parameters (or gradients) back to the server
The server aggregates updates, typically by weighted averaging
The process repeats until convergence

import torch
import torch.nn as nn
from typing import Dict, List, Tuple
import copy
 
class FederatedServer:
    """Simplified federated averaging server."""
 
    def __init__(self, global_model: nn.Module):
        self.global_model = global_model
        self.round_number = 0
 
    def aggregate(
        self, client_updates: List[Dict[str, torch.Tensor]], weights: List[float]
    ) -> None:
        """Aggregate client updates using weighted averaging."""
        total_weight = sum(weights)
        global_state = self.global_model.state_dict()
 
        # Initialize aggregated state with zeros
        aggregated = {
            key: torch.zeros_like(param) for key, param in global_state.items()
        }
 
        # Weighted average of all client updates
        for update, weight in zip(client_updates, weights):
            for key in aggregated:
                aggregated[key] += (weight / total_weight) * update[key]
 
        self.global_model.load_state_dict(aggregated)
        self.round_number += 1
 
    def distribute_model(self) -> Dict[str, torch.Tensor]:
        """Send current global model to clients."""
        return copy.deepcopy(self.global_model.state_dict())

Attack Surface Map

The federated learning architecture exposes attack surfaces at every component:

Component	Attack Surface	Example Attacks
Client devices	Local model training	Data poisoning, model replacement
Communication channel	Gradient/weight transmission	Gradient interception, man-in-the-middle
Aggregation server	Update combination logic	Byzantine attacks, aggregation manipulation
Global model	Shared parameters	Backdoor injection, model inversion
Client selection	Participation protocol	Sybil attacks, free-riding

Model Poisoning Attacks

Untargeted Poisoning

import torch
import torch.nn as nn
from typing import Dict
 
class UntargetedPoisoner:
    """Untargeted model poisoning — degrades global model accuracy."""
 
    def __init__(self, model: nn.Module, amplification_factor: float = 10.0):
        self.model = model
        self.amplification_factor = amplification_factor
 
    def generate_poisoned_update(
        self, honest_update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Generate a poisoned update by negating and amplifying honest gradients."""
        poisoned = {}
        for key, param in honest_update.items():
            # Negate the gradient direction and amplify
            poisoned[key] = -self.amplification_factor * param
        return poisoned
 
    def generate_noise_update(
        self, reference_update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Generate random noise update scaled to match legitimate update norms."""
        poisoned = {}
        for key, param in reference_update.items():
            noise = torch.randn_like(param)
            # Scale noise to match the norm of the legitimate update
            scale = param.norm() / (noise.norm() + 1e-8)
            poisoned[key] = noise * scale
        return poisoned

Targeted Backdoor Attacks

The attack works by training a local model on poisoned data that includes a trigger pattern, then scaling the resulting update to overcome averaging with honest clients:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from typing import Dict, Optional, Tuple
import copy
 
class BackdoorPoisoner:
    """Targeted backdoor attack against federated learning."""
 
    def __init__(
        self,
        model: nn.Module,
        target_label: int,
        trigger_pattern: torch.Tensor,
        trigger_position: Tuple[int, int] = (0, 0),
        scaling_factor: float = 1.0,
    ):
        self.model = model
        self.target_label = target_label
        self.trigger_pattern = trigger_pattern
        self.trigger_position = trigger_position
        self.scaling_factor = scaling_factor
 
    def apply_trigger(self, images: torch.Tensor) -> torch.Tensor:
        """Stamp a trigger pattern onto images."""
        triggered = images.clone()
        h, w = self.trigger_pattern.shape[-2:]
        y, x = self.trigger_position
        triggered[:, :, y : y + h, x : x + w] = self.trigger_pattern
        return triggered
 
    def create_poisoned_batch(
        self, images: torch.Tensor, labels: torch.Tensor, poison_ratio: float = 0.5
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Create a mixed batch of clean and poisoned samples."""
        batch_size = images.size(0)
        num_poison = int(batch_size * poison_ratio)
 
        poisoned_images = images.clone()
        poisoned_labels = labels.clone()
 
        # Apply trigger and change labels for poisoned subset
        poisoned_images[:num_poison] = self.apply_trigger(images[:num_poison])
        poisoned_labels[:num_poison] = self.target_label
 
        return poisoned_images, poisoned_labels
 
    def train_and_scale(
        self,
        global_model_state: Dict[str, torch.Tensor],
        poisoned_loader: DataLoader,
        num_epochs: int = 5,
        lr: float = 0.01,
        num_total_clients: int = 10,
    ) -> Dict[str, torch.Tensor]:
        """Train on poisoned data and scale update to survive aggregation."""
        local_model = copy.deepcopy(self.model)
        local_model.load_state_dict(global_model_state)
 
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
 
        local_model.train()
        for epoch in range(num_epochs):
            for images, labels in poisoned_loader:
                poisoned_images, poisoned_labels = self.create_poisoned_batch(
                    images, labels
                )
                optimizer.zero_grad()
                outputs = local_model(poisoned_images)
                loss = criterion(outputs, poisoned_labels)
                loss.backward()
                optimizer.step()
 
        # Scale the update so it dominates the aggregation
        # After averaging n clients, the poisoned update contribution is 1/n
        # Scaling by n ensures the backdoor survives averaging
        poisoned_state = local_model.state_dict()
        scaled_update = {}
        for key in global_model_state:
            delta = poisoned_state[key] - global_model_state[key]
            scaled_update[key] = (
                global_model_state[key]
                + delta * num_total_clients * self.scaling_factor
            )
 
        return scaled_update

Gradient-Based Data Reconstruction

Deep Leakage from Gradients

The attack works by optimizing a dummy input to produce gradients that match the observed gradients:

import torch
import torch.nn as nn
from typing import List, Tuple, Optional
 
class GradientLeakageAttack:
    """Reconstruct private training data from observed gradients."""
 
    def __init__(self, model: nn.Module, device: str = "cpu"):
        self.model = model
        self.device = device
 
    def attack(
        self,
        observed_gradients: List[torch.Tensor],
        input_shape: Tuple[int, ...],
        num_classes: int,
        num_iterations: int = 300,
        lr: float = 1.0,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Reconstruct training data from observed gradients.
 
        Args:
            observed_gradients: List of gradient tensors observed from the client
            input_shape: Shape of the input data (e.g., (3, 32, 32) for CIFAR-10)
            num_classes: Number of output classes
            num_iterations: Optimization iterations
            lr: Learning rate for reconstruction
 
        Returns:
            Tuple of (reconstructed_data, reconstructed_labels)
        """
        # Initialize random dummy data and labels
        dummy_data = torch.randn(1, *input_shape, device=self.device, requires_grad=True)
        dummy_label = torch.randn(1, num_classes, device=self.device, requires_grad=True)
 
        optimizer = torch.optim.LBFGS([dummy_data, dummy_label], lr=lr)
        criterion = nn.CrossEntropyLoss()
 
        history = []
 
        for iteration in range(num_iterations):
            def closure():
                optimizer.zero_grad()
                self.model.zero_grad()
 
                # Forward pass with dummy data
                pred = self.model(dummy_data)
                dummy_loss = criterion(pred, torch.softmax(dummy_label, dim=-1))
 
                # Compute gradients of dummy data
                dummy_gradients = torch.autograd.grad(
                    dummy_loss, self.model.parameters(), create_graph=True
                )
 
                # Minimize distance between dummy and observed gradients
                gradient_diff = sum(
                    ((dg - og) ** 2).sum()
                    for dg, og in zip(dummy_gradients, observed_gradients)
                )
 
                gradient_diff.backward()
                return gradient_diff
 
            loss = optimizer.step(closure)
            if iteration % 50 == 0:
                history.append(loss.item())
 
        reconstructed_label = torch.argmax(dummy_label, dim=-1)
        return dummy_data.detach(), reconstructed_label.detach()

Membership Inference in Federated Settings

import torch
import torch.nn as nn
from typing import Dict, List
 
class FederatedMembershipInference:
    """Determine if a specific sample was in a client's training data."""
 
    def __init__(self, model: nn.Module, threshold: float = 0.5):
        self.model = model
        self.threshold = threshold
 
    def compute_loss_change(
        self,
        target_sample: torch.Tensor,
        target_label: torch.Tensor,
        model_before: Dict[str, torch.Tensor],
        model_after: Dict[str, torch.Tensor],
    ) -> float:
        """Measure how much the loss on a target sample changed after a client update."""
        criterion = nn.CrossEntropyLoss()
 
        # Loss before client update
        self.model.load_state_dict(model_before)
        self.model.eval()
        with torch.no_grad():
            loss_before = criterion(
                self.model(target_sample.unsqueeze(0)), target_label.unsqueeze(0)
            ).item()
 
        # Loss after client update
        self.model.load_state_dict(model_after)
        self.model.eval()
        with torch.no_grad():
            loss_after = criterion(
                self.model(target_sample.unsqueeze(0)), target_label.unsqueeze(0)
            ).item()
 
        return loss_before - loss_after
 
    def infer_membership(
        self,
        target_samples: List[torch.Tensor],
        target_labels: List[torch.Tensor],
        model_before: Dict[str, torch.Tensor],
        model_after: Dict[str, torch.Tensor],
    ) -> List[bool]:
        """Infer membership for multiple samples."""
        results = []
        for sample, label in zip(target_samples, target_labels):
            loss_change = self.compute_loss_change(
                sample, label, model_before, model_after
            )
            # Large positive loss change suggests the sample was in the training set
            results.append(loss_change > self.threshold)
        return results

Sybil and Free-Rider Attacks

Sybil Attacks

import torch
import torch.nn as nn
from typing import Dict, List
import copy
 
class SybilAttacker:
    """Launch a Sybil attack by controlling multiple federated clients."""
 
    def __init__(
        self,
        num_sybil_clients: int,
        poisoned_update_fn,  # callable that generates a poisoned update
    ):
        self.num_sybil_clients = num_sybil_clients
        self.poisoned_update_fn = poisoned_update_fn
 
    def generate_sybil_updates(
        self,
        global_model_state: Dict[str, torch.Tensor],
        add_noise: bool = True,
        noise_scale: float = 0.01,
    ) -> List[Dict[str, torch.Tensor]]:
        """Generate multiple slightly varied poisoned updates to avoid detection."""
        base_update = self.poisoned_update_fn(global_model_state)
        updates = []
 
        for i in range(self.num_sybil_clients):
            if add_noise and i > 0:
                # Add small random perturbations so updates are not identical
                noisy_update = {}
                for key, param in base_update.items():
                    noise = torch.randn_like(param) * noise_scale * param.abs().mean()
                    noisy_update[key] = param + noise
                updates.append(noisy_update)
            else:
                updates.append(copy.deepcopy(base_update))
 
        return updates

Free-Rider Attacks

Byzantine-Resilient Aggregation

Robust Aggregation Rules

Standard FedAvg is highly vulnerable to even a single malicious client. Byzantine-resilient aggregation rules aim to produce correct results even when a fraction of clients are adversarial.

import torch
from typing import Dict, List
 
class RobustAggregator:
    """Byzantine-resilient aggregation methods for federated learning."""
 
    @staticmethod
    def coordinate_wise_median(
        updates: List[Dict[str, torch.Tensor]],
    ) -> Dict[str, torch.Tensor]:
        """Coordinate-wise median aggregation (Yin et al., 2018)."""
        keys = updates[0].keys()
        aggregated = {}
 
        for key in keys:
            stacked = torch.stack([u[key] for u in updates])
            aggregated[key] = torch.median(stacked, dim=0).values
 
        return aggregated
 
    @staticmethod
    def trimmed_mean(
        updates: List[Dict[str, torch.Tensor]], trim_ratio: float = 0.1
    ) -> Dict[str, torch.Tensor]:
        """Trimmed mean — remove extreme values before averaging."""
        keys = updates[0].keys()
        aggregated = {}
        n = len(updates)
        trim_count = int(n * trim_ratio)
 
        for key in keys:
            stacked = torch.stack([u[key] for u in updates])
            sorted_updates, _ = torch.sort(stacked, dim=0)
            # Remove top and bottom trim_count values
            trimmed = sorted_updates[trim_count : n - trim_count]
            aggregated[key] = trimmed.mean(dim=0)
 
        return aggregated
 
    @staticmethod
    def krum(
        updates: List[Dict[str, torch.Tensor]], num_byzantine: int
    ) -> Dict[str, torch.Tensor]:
        """
        Multi-Krum aggregation (Blanchard et al., 2017).
        Selects the update closest to its neighbors.
        """
        n = len(updates)
        num_select = n - num_byzantine - 2
 
        # Flatten updates for distance computation
        flat_updates = []
        for update in updates:
            flat = torch.cat([param.flatten() for param in update.values()])
            flat_updates.append(flat)
 
        # Compute pairwise distances
        distances = torch.zeros(n, n)
        for i in range(n):
            for j in range(i + 1, n):
                dist = (flat_updates[i] - flat_updates[j]).norm()
                distances[i][j] = dist
                distances[j][i] = dist
 
        # For each update, sum distances to closest neighbors
        scores = []
        for i in range(n):
            sorted_dists, _ = torch.sort(distances[i])
            # Sum of distances to num_select closest neighbors (excluding self)
            score = sorted_dists[1 : num_select + 1].sum()
            scores.append(score)
 
        # Select the update with the lowest score
        best_idx = int(torch.argmin(torch.tensor(scores)))
        return updates[best_idx]

Differential Privacy in Federated Learning

import torch
from typing import Dict
 
class DifferentiallyPrivateAggregator:
    """Federated aggregation with differential privacy guarantees."""
 
    def __init__(
        self,
        clip_norm: float = 1.0,
        noise_multiplier: float = 1.0,
        num_clients_per_round: int = 10,
    ):
        self.clip_norm = clip_norm
        self.noise_multiplier = noise_multiplier
        self.num_clients_per_round = num_clients_per_round
 
    def clip_update(
        self, update: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Clip update to bounded L2 norm."""
        flat = torch.cat([param.flatten() for param in update.values()])
        total_norm = flat.norm()
 
        clip_factor = min(1.0, self.clip_norm / (total_norm + 1e-8))
        return {key: param * clip_factor for key, param in update.items()}
 
    def aggregate_with_dp(
        self, updates: list[Dict[str, torch.Tensor]]
    ) -> Dict[str, torch.Tensor]:
        """Aggregate with clipping and noise addition."""
        clipped_updates = [self.clip_update(u) for u in updates]
 
        # Average clipped updates
        keys = clipped_updates[0].keys()
        aggregated = {}
        n = len(clipped_updates)
 
        for key in keys:
            summed = sum(u[key] for u in clipped_updates)
            averaged = summed / n
 
            # Add calibrated Gaussian noise
            noise_std = (
                self.clip_norm * self.noise_multiplier / n
            )
            noise = torch.randn_like(averaged) * noise_std
            aggregated[key] = averaged + noise
 
        return aggregated

Secure Aggregation Protocols

import secrets
import hashlib
from typing import List, Tuple
 
class SimpleSecretSharing:
    """
    Simplified additive secret sharing for secure aggregation.
    In production, use established libraries like TF Federated or PySyft.
    """
 
    @staticmethod
    def create_shares(
        value: int, num_shares: int, modulus: int = 2**32
    ) -> List[int]:
        """Split a value into n additive shares."""
        shares = [secrets.randbelow(modulus) for _ in range(num_shares - 1)]
        # Last share ensures all shares sum to the original value
        last_share = (value - sum(shares)) % modulus
        shares.append(last_share)
        return shares
 
    @staticmethod
    def reconstruct(shares: List[int], modulus: int = 2**32) -> int:
        """Reconstruct value from all shares."""
        return sum(shares) % modulus
 
    @staticmethod
    def generate_pairwise_masks(
        client_id: int, peer_ids: List[int], seed_base: bytes, dimension: int
    ) -> List[int]:
        """Generate pairwise canceling masks between clients."""
        masks = [0] * dimension
        for peer_id in peer_ids:
            # Deterministic seed from ordered pair
            pair = (min(client_id, peer_id), max(client_id, peer_id))
            seed = hashlib.sha256(seed_base + str(pair).encode()).digest()
            # Generate pseudo-random mask from seed
            rng_state = int.from_bytes(seed[:8], "big")
            sign = 1 if client_id < peer_id else -1
            for i in range(dimension):
                rng_state = (rng_state * 6364136223846793005 + 1) % (2**64)
                masks[i] += sign * (rng_state % (2**32))
        return masks

Red Team Assessment Methodology

When red teaming a federated learning deployment, follow this structured approach:

Phase 1: Reconnaissance

Phase 2: Client Compromise Simulation

Test what happens when a single client is compromised:

# Example: Testing with Flower framework
# Clone the target FL setup and modify a client
pip install flwr torch torchvision
 
# Run a poisoned client against the test aggregation server
python poisoned_client.py \
  --server-address="127.0.0.1:8080" \
  --attack-type="backdoor" \
  --target-label=7 \
  --scaling-factor=10 \
  --poison-ratio=0.3

Phase 3: Aggregation Robustness

Test the aggregation server's resilience to Byzantine clients. Progressively increase the number of malicious clients and measure:

Global model accuracy degradation
Backdoor attack success rate
Whether norm-based or distance-based anomaly detection triggers

Phase 4: Privacy Leakage Assessment

Attempt gradient reconstruction attacks against the communication channel. Measure reconstruction quality at different batch sizes and with different levels of DP noise.

def assess_privacy_leakage(
    model: torch.nn.Module,
    client_gradients: List[torch.Tensor],
    original_data: torch.Tensor,
    original_labels: torch.Tensor,
) -> dict:
    """Assess how much private data can be reconstructed from gradients."""
    attack = GradientLeakageAttack(model)
    reconstructed_data, reconstructed_labels = attack.attack(
        observed_gradients=client_gradients,
        input_shape=original_data.shape[1:],
        num_classes=10,
        num_iterations=500,
    )
 
    # Measure reconstruction quality
    mse = ((reconstructed_data - original_data) ** 2).mean().item()
    label_match = (reconstructed_labels == original_labels).float().mean().item()
 
    return {
        "reconstruction_mse": mse,
        "label_accuracy": label_match,
        "privacy_risk": "high" if mse < 0.1 else "medium" if mse < 0.5 else "low",
    }

Real-World Considerations

Framework-Specific Vulnerabilities

Production FL frameworks have their own security considerations:

NVIDIA FLARE: Supports secure aggregation and client authentication via mutual TLS. However, the default configuration may not enable all security features. Red teams should verify that overseer and project.yml enforce authentication.
Flower (flwr): Flexible but security is largely the deployer's responsibility. The gRPC channel should be encrypted, and custom aggregation strategies should be audited for robustness.
PySyft: Provides differential privacy and secure computation primitives but requires careful configuration to achieve meaningful privacy guarantees.

Regulatory Implications

Defensive Recommendations

Enable secure aggregation to prevent gradient-based data reconstruction
Apply differential privacy with a formally verified privacy budget
Use robust aggregation (e.g., trimmed mean or Krum) instead of plain FedAvg
Implement client authentication with hardware attestation where possible
Monitor update distributions for anomalous contributions that may indicate poisoning
Limit client influence by capping the norm of individual updates
Audit FL frameworks for default configurations that may not enable security features

References

McMahan et al. — "Communication-Efficient Learning of Deep Networks from Decentralized Data" (AISTATS 2017) — foundational FedAvg algorithm
Bagdasaryan et al. — "How To Back Door Federated Learning" (AISTATS 2020) — targeted backdoor attacks via model replacement
Zhu et al. — "Deep Leakage from Gradients" (NeurIPS 2019) — gradient-based data reconstruction
Fang et al. — "Local Model Poisoning Attacks to Byzantine-Robust Federated Learning" (USENIX Security 2020) — adaptive attacks against robust aggregation
Bonawitz et al. — "Practical Secure Aggregation for Privacy-Preserving Machine Learning" (CCS 2017) — secure aggregation protocol
Geiping et al. — "Inverting Gradients - How Easy Is It to Break Privacy in Federated Learning?" (NeurIPS 2020) — improved gradient inversion
MITRE ATLAS — AML.T0020 (Poisoning Training Data), AML.T0010 (ML Supply Chain Compromise)
NVIDIA FLARE — https://nvidia.github.io/NVFlare/
Flower Framework — https://flower.ai/

Edit this page on GitHub

Federated Learning Security

Related articles

Federated Learning Security

Related articles