32.5.4 Timing attack detection

2025.10.06.
AI Security Blog

Detecting timing vulnerabilities is not a passive exercise. It requires actively searching for statistical anomalies in system response times. While defenders use these techniques for monitoring and alerting, as a red teamer, you use them for reconnaissance—turning the system’s own processing delays into a covert channel for information exfiltration.

Your goal is to prove that a system’s response time correlates with some secret internal state. This is achieved by meticulously measuring, comparing, and validating timing variations under controlled conditions.

The Core Principle: Differential Analysis

At its heart, timing attack detection is a form of differential analysis. You are not interested in the absolute response time of a single request. Instead, you are looking for a consistent, measurable difference in response times between two or more classes of inputs. One class acts as your baseline or control, while the other is designed to trigger a specific, time-consuming execution path.

If you can demonstrate a statistically significant difference, you have found a timing side channel. The challenge lies in isolating this signal from the inherent noise of network latency and variable server load.

Active Probing: A Red Teamer’s Approach

Active probing is the process of methodically querying an endpoint with specially crafted payloads to induce and measure timing discrepancies. This is a multi-step process requiring careful setup and analysis.

Step 1: Establish a Performance Baseline

Before searching for anomalies, you must understand what “normal” looks like. Send a large number of identical, simple requests to the target endpoint. These requests should ideally trigger a fast, default execution path. For example, when testing a login endpoint, your baseline could be requests with non-existent usernames.

The goal is to collect enough data points (hundreds or thousands) to build a statistical profile of the system’s response time under normal conditions, including its mean, median, and standard deviation.
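Building that statistical profile is straightforward with the standard library. The sketch below uses a small set of illustrative (made-up) response times; in practice you would feed in the hundreds or thousands of measurements collected from the baseline requests.

```python
import statistics

# Hypothetical baseline response times in milliseconds (illustrative values)
baseline_times = [118.2, 121.5, 119.8, 120.1, 122.4, 119.0, 120.7, 121.1]

# Statistical profile of the "normal" fast path
profile = {
    "mean": statistics.mean(baseline_times),
    "median": statistics.median(baseline_times),
    "stdev": statistics.stdev(baseline_times),
}
print(profile)
```

The standard deviation is the key number: it tells you how large a timing difference must be before it stands out from normal jitter.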

Step 2: Design and Deploy Probing Payloads

Next, design payloads that you hypothesize will trigger a different, potentially longer, execution path. Continuing the login example, your probing payload might be a valid username paired with an incorrect password. The hypothesis is that the system takes longer to verify a password for a valid user than it does to reject an invalid user outright.

Payload classes, hypothesized execution paths, and expected relative timing:

- Baseline payload (e.g., invalid username): user lookup fails early, immediate rejection. Expected timing: fastest.
- Probing payload (e.g., valid user, wrong password): user lookup succeeds, proceeds to password hash comparison (computationally expensive). Expected timing: slower.
- Validation payload (e.g., valid credentials): user lookup and password check succeed, session token generated. Expected timing: potentially slowest.

You then send a large number of these probing payloads, measuring the response time for each one, just as you did for the baseline.

Step 3: Scripting the Measurement

Manual testing is impractical. You need to script this process to gather a statistically significant sample size and minimize measurement error. The following Python script demonstrates a basic framework for this task.

import time

import numpy as np
import requests

TARGET_URL = "https://api.example.com/login"
SAMPLE_SIZE = 500
TIMEOUT = 10  # seconds; prevents a hung request from skewing the data

# Reuse one TCP connection so connection setup doesn't add timing noise
session = requests.Session()

def measure(payload, samples=SAMPLE_SIZE):
    """Send `samples` identical requests and return response times in ms."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        session.post(TARGET_URL, json=payload, timeout=TIMEOUT)
        times.append((time.perf_counter() - start) * 1000)
    return times

# --- Baseline: non-existent user (fast rejection path) ---
baseline_times = measure({'user': 'no-such-user-xyz', 'pass': '123'})

# --- Probe: potentially valid user (password comparison path) ---
probe_times = measure({'user': 'admin', 'pass': 'wrongpassword'})

print(f"Baseline Avg: {np.mean(baseline_times):.2f}ms (StdDev: {np.std(baseline_times):.2f})")
print(f"Probe Avg:    {np.mean(probe_times):.2f}ms (StdDev: {np.std(probe_times):.2f})")

Interpreting the Results: Signal vs. Noise

Once you’ve collected the data, the final step is analysis. A simple comparison of averages is a good start, but a visual representation is far more powerful. Plotting the distributions of response times for both baseline and probe requests can reveal a vulnerability instantly.

[Figure: Response Time Distribution Analysis. Frequency vs. response time (ms): baseline requests cluster around ~120 ms, probe requests around ~180 ms; Δt ≈ 60 ms, a statistically significant difference.]

In the diagram above, the two distributions are clearly separated. The average response time for probe requests is consistently higher than for baseline requests. This separation (Δt) is the timing side channel. Even if the distributions overlap slightly, a statistical test (like a Student’s t-test) can confirm whether the difference in means is significant or merely due to random chance. A low p-value (typically < 0.05) would provide strong evidence of a vulnerability.
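The significance test itself needs nothing beyond the standard library. The sketch below computes Welch's t-statistic (a variant of Student's t-test that does not assume equal variances) on simulated data with the same means as the diagram; the measurements and the seed are illustrative, not real captures.

```python
import random
import statistics

random.seed(42)

# Simulated response times (ms): baseline ~120 ms, probe ~180 ms, 10 ms jitter
baseline = [random.gauss(120, 10) for _ in range(500)]
probe = [random.gauss(180, 10) for _ in range(500)]

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (mb - ma) / ((va / len(a) + vb / len(b)) ** 0.5)

t = welch_t(baseline, probe)
print(f"t = {t:.1f}")
```

For samples this large, |t| greater than roughly 2 corresponds to p < 0.05; a t-statistic in the double digits, as this data produces, is unmistakable evidence of a timing signal.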