Detecting timing vulnerabilities is not a passive exercise. It requires actively searching for statistical anomalies in system response times. While defenders use these techniques for monitoring and alerting, as a red teamer, you use them for reconnaissance—turning the system’s own processing delays into a covert channel for information exfiltration.
Your goal is to prove that a system’s response time correlates with some secret internal state. This is achieved by meticulously measuring, comparing, and validating timing variations under controlled conditions.
The Core Principle: Differential Analysis
At its heart, timing attack detection is a form of differential analysis. You are not interested in the absolute response time of a single request. Instead, you are looking for a consistent, measurable difference in response times between two or more classes of inputs. One class acts as your baseline or control, while the other is designed to trigger a specific, time-consuming execution path.
If you can demonstrate a statistically significant difference, you have found a timing side channel. The challenge lies in isolating this signal from the inherent noise of network latency and variable server load.
Active Probing: A Red Teamer’s Approach
Active probing is the process of methodically querying an endpoint with specially crafted payloads to induce and measure timing discrepancies. This is a multi-step process requiring careful setup and analysis.
Step 1: Establish a Performance Baseline
Before searching for anomalies, you must understand what “normal” looks like. Send a large number of identical, simple requests to the target endpoint. These requests should ideally trigger a fast, default execution path. For example, when testing a login endpoint, your baseline could be requests with non-existent usernames.
The goal is to collect enough data points (hundreds or thousands) to build a statistical profile of the system’s response time under normal conditions, including its mean, median, and standard deviation.
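As a minimal sketch of that profiling step, the baseline can be summarized with a few lines of NumPy. The sample values below are hypothetical placeholders for real measured response times in milliseconds:

```python
import numpy as np

# Hypothetical baseline measurements (ms); in practice you would collect
# hundreds or thousands of samples from the target endpoint.
baseline_times = np.array([118.2, 121.5, 119.8, 120.1, 135.0, 119.4, 122.3])

print(f"Mean:   {np.mean(baseline_times):.2f} ms")
print(f"Median: {np.median(baseline_times):.2f} ms")
print(f"StdDev: {np.std(baseline_times):.2f} ms")
```

Note that a single network hiccup (the 135.0 ms outlier above) pulls the mean upward while barely moving the median, which is why recording both gives a more honest picture of "normal."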
Step 2: Design and Deploy Probing Payloads
Next, design payloads that you hypothesize will trigger a different, potentially longer, execution path. Continuing the login example, your probing payload might be a valid username paired with an incorrect password. The hypothesis is that the system takes longer to verify a password for a valid user than it does to reject an invalid user outright.
| Payload Type | Hypothesized Execution Path | Expected Relative Timing |
|---|---|---|
| Baseline payload (e.g., invalid username) | User lookup fails early; immediate rejection. | Fastest |
| Probing payload (e.g., valid user, wrong password) | User lookup succeeds, proceeds to password hash comparison (computationally expensive). | Slower |
| Validation payload (e.g., valid credentials) | User lookup and password check succeed; session token generated. | Potentially slowest |
You then send a large number of these probing payloads, measuring the response time for each one, just as you did for the baseline.
Step 3: Scripting the Measurement
Manual testing is impractical. You need to script this process to gather a statistically significant sample size and minimize measurement error. The following Python script demonstrates a basic framework for this task.
```python
import time

import numpy as np
import requests

TARGET_URL = "https://api.example.com/login"
SAMPLE_SIZE = 500


def measure(payload, samples):
    """Send the payload repeatedly and record response times in milliseconds."""
    times = []
    for _ in range(samples):
        start_time = time.perf_counter()
        requests.post(TARGET_URL, json=payload, timeout=10)
        end_time = time.perf_counter()
        times.append((end_time - start_time) * 1000)  # in ms
    return times


# --- Baseline: non-existent user ---
baseline_times = measure({'user': 'no-such-user-xyz', 'pass': '123'}, SAMPLE_SIZE)

# --- Probe: potentially valid user ---
probe_times = measure({'user': 'admin', 'pass': 'wrongpassword'}, SAMPLE_SIZE)

print(f"Baseline Avg: {np.mean(baseline_times):.2f}ms (StdDev: {np.std(baseline_times):.2f})")
print(f"Probe Avg:    {np.mean(probe_times):.2f}ms (StdDev: {np.std(probe_times):.2f})")
```
Interpreting the Results: Signal vs. Noise
Once you’ve collected the data, the final step is analysis. A simple comparison of averages is a good start, but a visual representation is far more powerful. Plotting the distributions of response times for both baseline and probe requests can reveal a vulnerability instantly.
In such a plot, a vulnerability shows up as two clearly separated distributions: the response time for probe requests is consistently higher than for baseline requests. This separation (Δt) is the timing side channel. Even if the distributions overlap slightly, a statistical test (such as a two-sample Student's t-test) can confirm whether the difference in means is significant or merely due to random chance. A low p-value (typically < 0.05) provides strong evidence of a vulnerability.
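This significance test can be sketched with SciPy's `scipy.stats.ttest_ind`. The two sample arrays below are synthetic stand-ins for real measurements; in practice you would pass the `baseline_times` and `probe_times` lists collected by the probing script:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for measured response times (ms). A real run would use
# the baseline_times and probe_times gathered during active probing.
baseline_times = rng.normal(loc=120.0, scale=8.0, size=500)
probe_times = rng.normal(loc=127.0, scale=8.0, size=500)

# Two-sample t-test on the difference in means.
t_stat, p_value = stats.ttest_ind(baseline_times, probe_times)

delta_t = np.mean(probe_times) - np.mean(baseline_times)
print(f"Δt = {delta_t:.2f} ms, t = {t_stat:.2f}, p = {p_value:.3g}")

if p_value < 0.05:
    print("Statistically significant timing difference: likely side channel.")
```

If the server's variance differs noticeably between the two classes, passing `equal_var=False` (Welch's t-test) is the safer choice, since it does not assume both distributions have the same spread.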