Every attack, whether simulated or real, begins not with an exploit, but with a question: What am I looking at? Reconnaissance is the systematic process of answering that question. It is the art of turning a black box into a blueprint, transforming an unknown target into a mapped territory of potential vulnerabilities. For an AI red teamer, this phase is about more than just finding open ports; it’s about understanding the system’s purpose, its architecture, its data dependencies, and the human processes that surround it.
From a compliance and risk perspective, reconnaissance is the first line of audit. It’s where you begin to identify discrepancies between what a system is claimed to be and what it actually is. Public statements about data privacy, for instance, can be directly compared against the information discoverable through its public-facing interfaces. A failure found here is often a failure in governance and policy implementation.
The Two Faces of Discovery: Passive and Active
Your initial information gathering will fall into two broad categories. The distinction is simple: do you touch the target system directly? Your choice determines your visibility and risk.
Passive Reconnaissance: Listening to the Echoes
Passive reconnaissance, or Open-Source Intelligence (OSINT), involves gathering information from publicly available sources without sending any packets to the target’s infrastructure. You are an observer, a researcher piecing together a puzzle from discarded pieces. The goal is to build a comprehensive profile of the target’s technology, people, and processes.
| Source Type | Information to Target | Potential Weakness Indicator |
|---|---|---|
| Academic Papers & Blog Posts | Model architecture, training datasets, framework versions, author names. | Use of models with known vulnerabilities (e.g., susceptibility to specific adversarial attacks) or biased datasets. |
| Job Postings (e.g., LinkedIn) | Required skills reveal the internal tech stack: cloud provider (AWS, GCP), MLOps tools (Kubernetes, MLflow), database tech. | Specific versions of software listed may have public CVEs. Details can guide future exploitation attempts. |
| Public Code Repositories (GitHub) | Leaked API keys, service credentials, configuration files, proprietary code snippets, developer comments. | Direct access credentials, infrastructure misconfigurations, or logic flaws in pre-production code. |
| Corporate Website & Policies | Data privacy policies, terms of service, acceptable use policies, descriptions of AI capabilities. | Contradictions between stated policies and observed system behavior. Can be used to craft policy-violating prompts. |
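The public-repository row above is the one most easily automated. The sketch below is a minimal, illustrative secret scanner, assuming a handful of hypothetical credential patterns; real engagements would rely on dedicated tools such as truffleHog or gitleaks, which ship far larger and better-tested rule sets.

```python
import re

# Illustrative patterns only; production scanners use hundreds of rules
# with entropy checks to cut false positives.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_string) pairs found in a blob of text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

# Hypothetical leaked snippet, as might appear in a public commit.
sample = 'client = Client(api_key="sk-abcdefghijklmnopqrstuv")\npassword = "hunter2"'
for name, value in scan_text(sample):
    print(f"{name}: {value}")
```

Running a scanner like this over an organization's public commit history, including deleted branches, is still passive reconnaissance: no packet ever reaches the target's infrastructure.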
Active Reconnaissance: Knocking on the Door
Once you have a map from passive intelligence, you can begin to probe the target directly. Active reconnaissance involves interaction, from a simple network ping to a carefully crafted API query. Each interaction is a calculated risk, as it can be logged and detected. The objective is to validate assumptions from the passive phase and uncover new information about the live environment.
For an AI system, this often starts with its primary interface—the API. A simple request can reveal a wealth of information from the HTTP headers alone.
```bash
# A simple curl request with -v (verbose) to inspect response headers
curl -v -X POST https://api.example.com/v1/generate \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'

# Headers of interest in the response:
# < HTTP/2 200
# < Server: nginx/1.21.6
# < X-Served-By: model-serving-pod-7b8c9d4f4-abcde
# < X-Ratelimit-Remaining: 99
```
In this example, the response reveals the web server (`nginx`), a potential Kubernetes pod name (`X-Served-By`), and rate-limiting mechanics. Each piece of information refines the attack surface map. The pod name suggests a containerized environment, a critical detail for planning lateral movement later on.
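The same header inspection can be scripted so fingerprints are collected consistently across many endpoints. The sketch below assumes a hypothetical target URL and a short watchlist of headers mirroring the curl output above; only probe systems you are authorized to test.

```python
import urllib.request

# Headers that commonly leak infrastructure details (illustrative list).
INTERESTING = ("server", "x-served-by", "x-ratelimit-remaining", "x-powered-by")

def extract_fingerprint(headers: dict[str, str]) -> dict[str, str]:
    """Keep only the infrastructure-revealing headers, lowercased for comparison."""
    return {k.lower(): v for k, v in headers.items() if k.lower() in INTERESTING}

def fingerprint(url: str) -> dict[str, str]:
    """Fetch one endpoint and return its header fingerprint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_fingerprint(dict(resp.headers))

# Example against captured headers (no network needed):
captured = {
    "Server": "nginx/1.21.6",
    "Content-Type": "application/json",
    "X-Served-By": "model-serving-pod-7b8c9d4f4-abcde",
}
print(extract_fingerprint(captured))
```

Separating the pure `extract_fingerprint` step from the network call also lets you re-analyze previously captured traffic without generating new, loggable requests.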
Defining the AI-Specific Attack Surface
Traditional reconnaissance focuses on networks and servers. For AI, you must expand this view to include the unique components of the machine learning lifecycle. Your search for weak points extends into the model, the data, and the surrounding MLOps infrastructure.
- The Model Layer: This is the core logic. Reconnaissance here involves inferring the model type (e.g., Transformer, CNN), its size, and its potential training data. Sending carefully designed queries to test for specific behaviors—like reproducing copyrighted text or exhibiting known biases—is a form of active reconnaissance against the model itself.
- The Data & API Layer: This is the primary interaction boundary. You’re looking for weaknesses in the API implementation, such as insufficient input sanitization, error messages that leak internal state, or endpoints that lack proper authentication and rate limiting. This is where traditional web application security testing intersects with AI red teaming.
- The MLOps Infrastructure Layer: The model doesn’t run in a vacuum. It’s supported by cloud services, container orchestration (like Kubernetes), data storage, and monitoring tools. Reconnaissance here uses OSINT from job postings and active scanning of related domains to find vulnerabilities in the supporting software stack. An outdated version of TorchServe or a misconfigured S3 bucket can be a direct path to compromise.
- The Human & Process Layer: Systems are built and maintained by people, who are often the weakest link. Reconnaissance involves identifying key personnel (developers, data scientists, MLOps engineers) from public sources and searching for common security hygiene failures, like password reuse or credentials committed to public code repositories.
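Findings across these four layers accumulate quickly, so it helps to record them in a uniform structure. The sketch below is one possible shape for such a record, assuming illustrative layer names and example findings drawn from this chapter; it is not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    layer: str        # "model", "data_api", "mlops", or "human_process"
    observation: str  # what was seen
    source: str       # "passive" or "active"

@dataclass
class AttackSurfaceMap:
    target: str
    findings: list[Finding] = field(default_factory=list)

    def add(self, layer: str, observation: str, source: str) -> None:
        self.findings.append(Finding(layer, observation, source))

    def by_layer(self, layer: str) -> list[Finding]:
        return [f for f in self.findings if f.layer == layer]

# Hypothetical engagement notes against a fictional target.
surface = AttackSurfaceMap("api.example.com")
surface.add("mlops", "X-Served-By header suggests Kubernetes pods", "active")
surface.add("human_process", "Engineer's public repo references internal tooling", "passive")
print(len(surface.by_layer("mlops")))
```

Tagging each finding as passive or active also preserves an audit trail: if the engagement is later reviewed, you can show exactly which observations required touching the target.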
Synthesizing Intelligence into Actionable Hypotheses
Reconnaissance is not complete until the collected data is transformed into testable hypotheses. You are building a preliminary list of “what ifs” that will guide the next phase of the engagement. Each hypothesis should link an observation to a potential impact.
For example:
- Observation: A company blog post details their use of a specific open-source text-to-image model from 18 months ago.
- Hypothesis: The model may be vulnerable to recently discovered adversarial attacks that can bypass its safety filters, potentially allowing the generation of prohibited content. This could lead to brand damage and a violation of their own acceptable use policy.
- Observation: An API error message reveals a full stack trace that includes “psycopg2.errors.InsufficientPrivilege”.
- Hypothesis: The system uses a PostgreSQL database. The specific error suggests that input might be reaching a database query, indicating a potential SQL injection vulnerability if input sanitization is flawed.
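The two examples above can be captured in the same observation-to-impact shape. The sketch below is a minimal, illustrative backlog structure; the field names and test plans are assumptions, not a prescribed methodology.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    observation: str       # what reconnaissance found
    hypothesis: str        # the testable "what if"
    test_plan: str         # how the next phase will probe it
    potential_impact: str  # why it matters to the client

backlog = [
    Hypothesis(
        observation="Blog post names an 18-month-old open-source text-to-image model",
        hypothesis="Safety filters may be bypassed by newer adversarial attacks",
        test_plan="Replay published filter-bypass techniques against the live endpoint",
        potential_impact="Prohibited content generation; acceptable-use policy violation",
    ),
    Hypothesis(
        observation="Stack trace leaks psycopg2.errors.InsufficientPrivilege",
        hypothesis="User input may reach PostgreSQL queries with flawed sanitization",
        test_plan="Submit benign SQL metacharacters and observe error behavior",
        potential_impact="SQL injection; unauthorized data access",
    ),
]

for h in backlog:
    print(f"[{h.potential_impact.split(';')[0]}] {h.hypothesis}")
```

Forcing every hypothesis to carry a test plan and an impact statement keeps the backlog honest: anything that cannot be tested, or that would not matter if confirmed, is dropped before the next phase begins.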
This initial mapping of the terrain, from high-level policies down to specific software versions, provides the foundation for every subsequent action. Without thorough reconnaissance, an attack is merely a guess; with it, it becomes a calculated strategy.