26.3.3 Performance profiling tools

2025.10.06.
AI Security Blog

Performance profiling extends beyond optimizing for speed. In a red teaming context, it’s about discovering computational and memory bottlenecks that can be exploited. A model that consumes excessive resources under specific inputs is vulnerable to denial-of-service (DoS) attacks. These tools help you quantify the operational cost of an AI system and identify its performance edge cases.

The Profiling Workflow

Your goal is to measure key performance indicators (KPIs) like latency, throughput, CPU usage, GPU utilization, and memory consumption during model inference. The general process involves instrumenting the code that executes the model, running your tests (including adversarial inputs), and analyzing the collected metrics.
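The instrument-run-analyze loop above can be sketched as a small harness. This is a minimal, illustrative example (the `run_inference` name is a stand-in for your model call); note that `tracemalloc` only tracks Python-level allocations, so it is a rough proxy for total process memory:

```python
import time
import tracemalloc

def collect_metrics(fn, payload):
    """Run fn(payload) once, recording wall-clock latency and peak Python memory."""
    tracemalloc.start()
    start = time.perf_counter()
    fn(payload)
    latency = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"latency_s": latency, "peak_mem_bytes": peak}

# Compare a benign input against an adversarial one:
# print(collect_metrics(run_inference, benign_sample))
# print(collect_metrics(run_inference, adversarial_sample))
```

Running the same harness on benign and adversarial inputs gives you a first, coarse signal of which inputs deserve deeper profiling with the tools below.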

Workflow diagram: Model Execution → Profiling Tool → Raw Metrics (Time, Mem, etc.) → Analysis

Core Profiling Tools

The tools you choose will depend on the target hardware (CPU, GPU) and the level of granularity you need. Here are some standard, widely used options.

1. CPU and Execution Time: `cProfile`

Python’s built-in `cProfile` is a deterministic profiler that provides a detailed breakdown of function call times. It’s excellent for identifying which parts of your data preprocessing or model inference pipeline are the most time-consuming.

Use Case: Pinpointing slow functions whose runtime adversarial inputs can inflate, leading to a time-based DoS.

import cProfile
import pstats

# Assume 'run_inference(data)' is your target function
def profile_inference_task(data):
    profiler = cProfile.Profile()
    profiler.enable()
    
    # Execute the function you want to profile
    run_inference(data)
    
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumulative')
    stats.print_stats(10) # Print the top 10 cumulative time consumers

# Example usage with some dummy data
# profile_inference_task(my_adversarial_sample)

2. Memory Usage: `memory-profiler`

This external Python library provides line-by-line memory consumption analysis. It’s invaluable for detecting memory leaks or operations that cause sudden spikes in RAM usage, which are prime targets for resource exhaustion attacks.

Use Case: Identifying if certain inputs cause the model to allocate excessive memory, potentially crashing the host system.

# You must install it first: pip install memory-profiler
from memory_profiler import profile

# Add the @profile decorator to the function to monitor
@profile
def load_and_run_model(data):
    # This simulates loading a large model into memory
    model = load_model_from_disk() 
    
    # Inference step
    result = model.predict(data)
    
    # The profiler will report memory usage for this function's scope
    return result

# When you run this script from the command line with a special flag,
# it will output the memory usage report.
# python -m memory_profiler your_script.py

3. GPU Performance: PyTorch Profiler & `nvidia-smi`

When models run on GPUs, CPU profilers are insufficient. You need tools that can inspect operations on the GPU itself.

PyTorch Profiler

Frameworks like PyTorch and TensorFlow have built-in profilers that can trace both CPU and GPU operations. The PyTorch profiler is a powerful tool for understanding kernel execution times, memory transfers between CPU and GPU, and operator-level performance.

Use Case: Analyzing how adversarial inputs affect specific GPU kernels or data transfer patterns. An attack might trigger an inefficient operation on the GPU, creating a bottleneck.

import torch
# model = YourPyTorchModel()
# inputs = your_input_tensor

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/model'),
    record_shapes=True,
    with_stack=True
) as prof:
    for step in range(10):
        model(inputs)
        prof.step()  # Advance the profiler schedule to the next step

# The results can be viewed in TensorBoard for detailed analysis.

NVIDIA System Management Interface (`nvidia-smi`)

This command-line utility provides real-time monitoring of NVIDIA GPU devices. It’s a quick way to check GPU utilization, memory usage, and temperature during an attack.

Use Case: A simple, high-level check to see if your attack is successfully stressing the GPU resources.

# Watch GPU status, updating every second
# Useful to run in a separate terminal during your test.
$ nvidia-smi -l 1

# Output shows GPU Util, Memory-Usage, Temp, Power, etc.

Summary of Tools for Red Teaming

Choosing the right tool depends on your specific objective. This table summarizes the primary tools and their relevance to AI red teaming.

Tool | Profiling Target | Use Case in Red Teaming | Output Type
cProfile | CPU execution time | Identify algorithmic-complexity attacks and time-based DoS vulnerabilities. | Function call statistics (text report).
memory-profiler | RAM / memory usage | Detect memory leaks or inputs causing excessive memory allocation (resource exhaustion). | Line-by-line memory usage (text report).
PyTorch/TF Profiler | GPU kernels, CPU/GPU ops | Detailed on-device performance analysis; find inputs that trigger inefficient GPU operations. | Trace files for visualization (e.g., TensorBoard, Chrome Tracing).
nvidia-smi | System-level GPU | High-level, real-time monitoring of GPU stress during an attack. | Live command-line text output.
API latency scripts | Network latency | Measure the performance impact of attacks against a hosted model API. | Custom logs, timing statistics.

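Unlike the other rows, "API latency scripts" names no off-the-shelf tool; it is typically a small timing harness you write yourself. A minimal sketch using only the standard library (the endpoint URL and request body in the commented section are hypothetical placeholders):

```python
import statistics
import time

def measure_latency(call, n=20):
    """Invoke `call` n times and summarize its latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "p95_ms": statistics.quantiles(samples, n=20)[18],  # ~95th percentile
        "max_ms": max(samples),
    }

# Hypothetical hosted-model endpoint; swap in your target API:
# import urllib.request
# def call_api():
#     req = urllib.request.Request("http://localhost:8000/predict", data=b"{}",
#                                  headers={"Content-Type": "application/json"})
#     urllib.request.urlopen(req).read()
# print(measure_latency(call_api))
```

Tail statistics such as p95 and max matter more than the mean here: a successful resource-exhaustion attack often shows up first as a blown-out latency tail.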
By integrating these profiling tools into the benchmark runner frameworks discussed previously, you can systematically measure the performance impact of various attacks, providing quantitative evidence of a system’s vulnerability to resource-based exploits.
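As a sketch of that integration (the function names are illustrative, not drawn from any specific framework), a benchmark loop can time each candidate input and flag the ones that deviate sharply from the median, marking them for deeper inspection:

```python
import statistics
import time

def benchmark(model_fn, inputs, slow_factor=3.0):
    """Time model_fn on each input and flag those far above the median latency."""
    timings = []
    for x in inputs:
        start = time.perf_counter()
        model_fn(x)
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    suspects = [i for i, t in enumerate(timings) if t > slow_factor * median]
    return {"median_s": median, "suspect_indices": suspects}

# Flagged indices are candidate DoS inputs worth a closer look with
# cProfile, memory-profiler, or the framework profilers above.
```

The median-based threshold is a deliberate choice: a single pathological input would skew a mean-based baseline, whereas the median stays anchored to typical behavior.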