32.2.4 API endpoint rotation

2025.10.06.
AI Security Blog

When a single path is blocked, an attacker looks for alternative routes. API endpoint rotation applies this logic to bypass rate limits by treating an API not as a single chokepoint, but as a collection of individually monitored doorways. This technique exploits systems that enforce rate limits on a per-endpoint basis rather than globally per user or IP address.

The Principle: Exploiting Siloed Rate Limits

The core vulnerability enabling this technique is a common implementation flaw: siloed rate limiting. In this model, each API endpoint (e.g., /api/v1/infer, /api/v2/infer) has its own separate request counter and limit. If an API has multiple endpoints that perform the same or a similar function, you can distribute your requests across them, effectively multiplying your total allowed request rate.

For example, if both /v1/query and /v2/query are limited to 100 requests per minute, rotating between them allows you to make up to 200 requests per minute from the same source IP without triggering a block.

Figure: Siloed vs. global rate limiting. Vulnerable (siloed): the user reaches Endpoint A and Endpoint B through separate limiters, Rate Limit A (100 RPM) and Rate Limit B (100 RPM), for a total effective rate of 200 RPM. Secure (global): the user reaches both endpoints through a single global rate limiter (100 RPM total), for a total effective rate of 100 RPM.
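The arithmetic behind the 200 RPM figure can be checked with a small simulation. This is a minimal sketch, not a real limiter: the client name, the traffic pattern, and the count_accepted helper are all assumptions made for illustration, and everything happens inside a single one-minute window.

```python
from collections import Counter

LIMIT_PER_WINDOW = 100  # requests allowed per counter within one window


def count_accepted(requests_in_window, key_fn, limit=LIMIT_PER_WINDOW):
    """Count how many requests pass when every counter key has its own limit."""
    counters = Counter()
    accepted = 0
    for request in requests_in_window:
        key = key_fn(request)
        if counters[key] < limit:
            counters[key] += 1
            accepted += 1
    return accepted


# One hypothetical client alternating between two endpoints: 300 requests in one window
traffic = [
    ("client-1", "/v1/query" if i % 2 == 0 else "/v2/query")
    for i in range(300)
]

# Siloed: one counter per (client, endpoint) pair
siloed = count_accepted(traffic, key_fn=lambda r: (r[0], r[1]))
# Unified: one counter per client, regardless of endpoint
unified = count_accepted(traffic, key_fn=lambda r: r[0])

print(siloed, unified)  # prints: 200 100
```

The only difference between the two runs is the counter key. Once the endpoint is part of the key, every additional interchangeable endpoint adds another full quota.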

Discovering and Leveraging Endpoint Variations

Your first task as a red teamer is reconnaissance. You need to identify endpoints that are functionally interchangeable or similar enough for your objective. These often fall into predictable patterns.

Variation Type | Example A | Example B | Notes
API Versioning | /api/v1/models/analyze | /api/v2/models/analyze | Legacy versions are often left active and may have less stringent limits.
Data Format | /api/infer.json | /api/infer.xml | APIs sometimes support multiple response formats via different endpoints.
Geographic/Regional | us-east-1.api.service.com/infer | eu-west-1.api.service.com/infer | Rate limits may be scoped to a specific regional deployment.
HTTP Method | GET /item?id=123 | POST /item {"id": "123"} | Less common, but some APIs allow data retrieval via multiple methods.
Legacy or Deprecated | /api/users/get_profile | /api/v3/profiles/fetch | Old endpoints may be forgotten but still functional, with separate limits.
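Two of the patterns above, versioning and data format, are regular enough to enumerate mechanically. The following is a rough sketch of that idea: the candidate_variants helper, the version list, and the format list are hypothetical, and the output is only a list of guesses that still have to be confirmed against the in-scope target.

```python
import re


def candidate_variants(url, versions=("v1", "v2", "v3"), formats=("json", "xml")):
    """Return candidate sibling URLs by swapping the version segment and format suffix."""
    candidates = set()

    # Swap a path segment such as /v1/ for each assumed version label
    if re.search(r"/v\d+/", url):
        for version in versions:
            candidates.add(re.sub(r"/v\d+/", f"/{version}/", url, count=1))

    # Swap a trailing .json or .xml suffix for each assumed format
    match = re.search(r"\.(json|xml)$", url)
    if match:
        for fmt in formats:
            candidates.add(url[: match.start()] + "." + fmt)

    # The original URL is not a "variant" of itself
    candidates.discard(url)
    return sorted(candidates)


for candidate in candidate_variants("https://api.example.com/api/v1/models/analyze"):
    print(candidate)
```

Regional hosts, alternative HTTP methods, and renamed legacy paths do not follow a pattern this simple, so they usually come from documentation, client code, or traffic captures rather than from generation.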

Implementation Example

Once you identify a set of rotatable endpoints, implementation is straightforward. You create a list of target URLs and cycle through them, one per request. With N endpoints in rotation, each one receives roughly 1/N of your traffic, so your aggregate rate can approach N times the per-endpoint limit before any single counter trips.


import requests
import itertools

# List of functionally identical endpoints discovered during reconnaissance
endpoints = [
    "https://api.example.com/v1/generate",
    "https://api.example.com/v2/generate",
    "https://legacy-api.example.com/generate"
]

# Create an infinite cycle of the endpoints
endpoint_cycler = itertools.cycle(endpoints)

headers = {"Authorization": "Bearer YOUR_API_KEY"}
data_payload = {"prompt": "Tell me about large language models."}

# Send a large number of requests, rotating the endpoint each time
for i in range(500):
    target_url = next(endpoint_cycler)
    try:
        # Each request goes to the next endpoint in the cycle
        # A timeout keeps one unresponsive endpoint from stalling the whole loop
        response = requests.post(target_url, json=data_payload, headers=headers, timeout=10)
        print(f"Request {i+1} to {target_url}: Status {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"Request {i+1} to {target_url} failed: {e}")

Implications for AI Systems

For AI and ML services, this technique is particularly damaging. It can accelerate attacks such as:

  • Model Inference Abuse: Making a high volume of API calls for unauthorized purposes, leading to significant computational costs for the provider.
  • Data Exfiltration: More rapidly extracting sensitive information from a model by cycling through endpoints that leak data.
  • Economic Denial of Service (EDoS): Intentionally driving up an organization’s cloud computing bill by forcing expensive model computations at a high rate.
  • Faster Enumeration: Quickly testing a model’s boundaries or searching for specific vulnerabilities (e.g., prompt injection payloads) by parallelizing requests across different endpoints.

Countermeasures: Unified Rate Limiting

The most effective defense against endpoint rotation is to implement a global or unified rate limiting strategy. Instead of tying a request counter to a specific URL path, the limit should be enforced on a higher-level identifier.

  1. User/API Key-Based Limiting: The primary defense. A single token or user account should have a single, global rate limit that is shared across all endpoints they can access. An API Gateway is the ideal place to implement this logic.
  2. IP-Based Limiting: As a secondary or fallback measure, enforce a global limit on requests from a single IP address. This helps mitigate abuse from unauthenticated clients and from a single source that spreads its requests across several keys.
  3. Centralized State Management: The rate limiting mechanism must use a centralized data store (like Redis or Memcached) to track request counts. If each web server or microservice tracks limits independently, the system remains vulnerable.
  4. API Design Consistency: Avoid creating multiple endpoints that perform the same function. If API versioning is necessary, ensure that rate limits are applied consistently across all active versions for a given user.
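The first and third points can be combined in a short sketch: a fixed-window limiter whose counter key is built from the API key alone and whose state lives in a store shared by every server. The UnifiedRateLimiter class and its parameters are assumptions for illustration, and a plain dict stands in for the centralized store.

```python
import time


class UnifiedRateLimiter:
    """Fixed-window limiter keyed on the caller's identity, never on the URL path."""

    def __init__(self, limit, window_seconds, store=None, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        # Assumption: a plain dict stands in for a shared store such as Redis
        self.store = store if store is not None else {}
        self.clock = clock

    def allow(self, api_key, path):
        # `path` is accepted but deliberately left out of the counter key
        window = int(self.clock() // self.window_seconds)
        counter_key = (api_key, window)
        count = self.store.get(counter_key, 0)
        if count >= self.limit:
            return False
        self.store[counter_key] = count + 1
        return True


# Two limiter instances sharing one store emulate two servers behind a gateway
shared_store = {}
fake_now = [0.0]  # controllable clock so the example is deterministic
server_a = UnifiedRateLimiter(100, 60, store=shared_store, clock=lambda: fake_now[0])
server_b = UnifiedRateLimiter(100, 60, store=shared_store, clock=lambda: fake_now[0])

# One key rotating between two endpoints that are handled by different servers
results = [
    (server_a if i % 2 == 0 else server_b).allow(
        "key-1", "/v1/generate" if i % 2 == 0 else "/v2/generate"
    )
    for i in range(150)
]
print(sum(results))  # prints: 100
```

In this sketch the read-then-write on the dict is not atomic and old windows are never purged. A production deployment would rely on the shared store for both, for example an atomic increment paired with a key expiry (Redis INCR and EXPIRE).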