Information theory, pioneered by Claude Shannon, provides a mathematical framework for quantifying information, uncertainty, and randomness. In the context of AI security, these concepts are not merely academic; they are fundamental to understanding data privacy, information leakage, model robustness, and the very nature of adversarial attacks. By measuring information, you can begin to measure risk.
Entropy (Self-Information)
Entropy, denoted as H(X), measures the average level of “surprise” or uncertainty inherent in a random variable’s possible outcomes. The surprise of a single outcome is its self-information; entropy is the expected self-information over all outcomes. A variable with high entropy is unpredictable (like a fair coin flip), while one with low entropy is predictable (like a biased coin that lands on heads 99% of the time).
For a discrete random variable X with possible outcomes x1, …, xn and probabilities p(xi), entropy is calculated as:
H(X) = – ∑ p(xi) logb p(xi)
The base of the logarithm, b, determines the units. If b=2, the unit is bits. If b=e, the unit is nats. In AI, you’ll often see both.
# Python example for calculating entropy in bits
import numpy as np
def entropy(probabilities):
    """Calculates the Shannon entropy of a probability distribution."""
    # Filter out zero-probability events to avoid log(0)
    p = np.array(probabilities)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
# A fair coin: maximum uncertainty
fair_coin_probs = [0.5, 0.5]
print(f"Entropy of a fair coin: {entropy(fair_coin_probs):.2f} bits") # Output: 1.00 bits
# A heavily biased coin: low uncertainty
biased_coin_probs = [0.99, 0.01]
print(f"Entropy of a biased coin: {entropy(biased_coin_probs):.2f} bits") # Output: 0.08 bits
Relationships Between Variables
While entropy describes a single variable, much of AI security involves understanding the relationships between multiple variables, such as model inputs, outputs, and private training data. The following concepts are crucial for this analysis.
[Figure: the visual relationship between entropy, conditional entropy, mutual information, and joint entropy.]
- Joint Entropy H(X, Y): Measures the total uncertainty of a pair of variables (X, Y). It’s the uncertainty in predicting both outcomes simultaneously.
- Conditional Entropy H(X|Y): Measures the remaining uncertainty about variable X *after* you have observed variable Y. If Y tells you a lot about X, H(X|Y) will be low. This is the cornerstone of measuring information leakage.
- Mutual Information I(X; Y): Quantifies the information that X and Y share. It measures how much knowing one variable reduces uncertainty about the other. If X and Y are independent, their mutual information is zero. In AI security, you might measure I(Training Data; Model Gradients) to assess privacy risks.
These concepts are related by the chain rule of entropy:
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
And mutual information can be defined in terms of these entropies:
I(X; Y) = H(X) – H(X|Y) = H(Y) – H(Y|X) = H(X) + H(Y) – H(X, Y)
Both identities can be checked numerically, as in the sketch below.
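As a concrete check, here is a minimal sketch that computes these quantities for a small, made-up joint distribution over two binary variables and confirms both identities numerically. It restates the entropy helper from the earlier example so the snippet runs on its own.
# Python example: joint entropy, conditional entropy, and mutual information
import numpy as np
def entropy(probabilities):
    """Calculates the Shannon entropy of a probability distribution in bits."""
    p = np.array(probabilities)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
# A small, illustrative joint distribution p(x, y) over two binary variables.
# Rows index X, columns index Y; the entries sum to 1.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)          # marginal distribution p(x)
p_y = p_xy.sum(axis=0)          # marginal distribution p(y)
H_X  = entropy(p_x)
H_Y  = entropy(p_y)
H_XY = entropy(p_xy.flatten())  # joint entropy H(X, Y)
H_X_given_Y = H_XY - H_Y        # chain rule: H(X|Y) = H(X, Y) - H(Y)
I_XY = H_X - H_X_given_Y        # mutual information I(X; Y) = H(X) - H(X|Y)
print(f"H(X)   = {H_X:.3f} bits")
print(f"H(X,Y) = {H_XY:.3f} bits")
print(f"H(X|Y) = {H_X_given_Y:.3f} bits")
print(f"I(X;Y) = {I_XY:.3f} bits")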
Measuring Divergence Between Distributions
In many red teaming scenarios, you need to compare probability distributions. For example, how different is a model’s output distribution from the true data distribution? Or how much does an adversarial perturbation alter the statistical properties of an input?
Kullback-Leibler (KL) Divergence
KL Divergence, DKL(P || Q), measures how much a probability distribution P diverges from a reference distribution Q. It is asymmetric: DKL(P || Q) is not the same as DKL(Q || P). It can be interpreted as the expected number of extra bits needed to encode samples drawn from P when the code is optimized for Q instead.
In adversarial ML, an attacker may seek to create a perturbed input whose distribution P’ is very close to the original input distribution P (low DKL(P’ || P)) but causes a large change in the model’s output distribution.
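To make the asymmetry concrete, the sketch below computes DKL in bits for two illustrative discrete distributions; the “clean” and “perturbed” values are made-up numbers, not taken from any real model.
# Python example: KL divergence between two discrete distributions, in bits
import numpy as np
def kl_divergence(p, q):
    """Computes DKL(P || Q) = sum over x of p(x) * log2(p(x) / q(x))."""
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing; assumes q(x) > 0 wherever p(x) > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))
# Illustrative (made-up) distributions over three classes
clean_output     = [0.70, 0.20, 0.10]
perturbed_output = [0.50, 0.30, 0.20]
print(f"DKL(clean || perturbed): {kl_divergence(clean_output, perturbed_output):.3f} bits")
print(f"DKL(perturbed || clean): {kl_divergence(perturbed_output, clean_output):.3f} bits")
# The two values differ, illustrating the asymmetry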
Cross-Entropy
Cross-entropy, H(P, Q), is closely related to KL divergence and is one of the most common loss functions in machine learning, particularly for classification. It measures the average number of bits needed to identify an event when the coding scheme is optimized for a distribution Q rather than for the true distribution P.
The relationship to KL divergence is simple and revealing: H(P, Q) = H(P) + DKL(P || Q). Since the entropy of the true distribution, H(P), is fixed (it does not depend on the model), minimizing the cross-entropy loss during training is equivalent to minimizing the KL divergence between the model’s predicted distribution Q and the true data distribution P.
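A short sketch of this identity, again with made-up distributions: computing the cross-entropy directly gives the same value as adding H(P) and DKL(P || Q).
# Python example: cross-entropy and its relationship to KL divergence, in bits
import numpy as np
def cross_entropy(p, q):
    """Computes H(P, Q) = -sum over x of p(x) * log2(q(x))."""
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    mask = p > 0  # assumes q(x) > 0 wherever p(x) > 0
    return -np.sum(p[mask] * np.log2(q[mask]))
# Illustrative (made-up) true distribution P and model prediction Q
p_true  = np.array([0.7, 0.2, 0.1])
q_model = np.array([0.5, 0.3, 0.2])
h_p  = -np.sum(p_true * np.log2(p_true))            # entropy H(P)
d_kl = np.sum(p_true * np.log2(p_true / q_model))   # KL divergence DKL(P || Q)
print(f"H(P, Q) directly: {cross_entropy(p_true, q_model):.3f} bits")
print(f"H(P) + DKL(P||Q): {h_p + d_kl:.3f} bits")  # the two values match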
Summary Table of Concepts
This table serves as a quick reference for the core information-theoretic concepts and their direct relevance to AI red teaming activities.
| Concept | Formula (Discrete) | Application in AI Security & Red Teaming |
|---|---|---|
| Entropy | H(X) = – ∑ p(x) log p(x) | Measures uncertainty in model predictions. High entropy can indicate low confidence or an out-of-distribution sample. |
| Conditional Entropy | H(X|Y) = – ∑ p(x,y) log p(x|y) | Quantifies remaining uncertainty. A low H(PrivateData | ModelOutput) implies significant information leakage. |
| Mutual Information | I(X;Y) = H(X) – H(X|Y) | Measures information leakage directly. Used to quantify the risk of membership inference and attribute inference attacks. |
| KL Divergence | DKL(P||Q) = ∑ p(x) log(p(x)/q(x)) | Measures the “distance” between an original input distribution and a perturbed one. Also used to detect distributional shifts or model poisoning. |
| Cross-Entropy | H(P,Q) = – ∑ p(x) log q(x) | The basis for log loss. Analyzing high cross-entropy errors can reveal model weaknesses and misclassifications exploitable by adversaries. |