The Common Vulnerability Scoring System (CVSS) provides a standardized framework for rating the severity of software vulnerabilities. While it is the bedrock of traditional cybersecurity, applying its metrics to AI systems requires a shift in perspective. A direct, literal translation often fails to capture the unique risks posed by AI, such as manipulated outputs, data poisoning, or model theft.
Your task as a red teamer isn’t just to find a flaw but to accurately assess its impact. This means learning to interpret CVSS metrics through an AI/ML lens. The core question changes from “Can an attacker execute code?” to “Can an attacker control, degrade, or steal the model’s intelligence?”
Reinterpreting Base Metrics for AI/ML Systems
The standard CVSS base score metrics (Attack Vector, Attack Complexity, etc.) remain the same, but their meaning expands. The impact of a vulnerability is no longer confined to the classic Confidentiality, Integrity, and Availability (CIA) triad in its traditional sense. Instead, these concepts are mapped onto model-specific failures.
The following table provides a guide for translating traditional CVSS interpretations into contexts relevant for AI vulnerabilities.
| CVSS Metric | Traditional IT Interpretation | AI/ML Context Interpretation |
|---|---|---|
Attack Vector (AV)AV:N/A/L/P
|
How the vulnerability is exploited. Network, Adjacent, Local, or Physical access. | How the malicious input is delivered. This could be through a public API (N), access to the training pipeline (A/L), or direct manipulation of a model file on disk (L). |
Attack Complexity (AC)AC:L/H
|
Conditions beyond the attacker’s control that must exist. Low for simple exploits, High for those requiring specific configurations or race conditions. | Reflects the difficulty of crafting the adversarial input. A simple prompt injection is Low (L). A carefully crafted data poisoning attack requiring statistical analysis of the training set is High (H). |
Privileges Required (PR)PR:N/L/H
|
The level of privilege an attacker must possess before successful exploitation. None, Low (user), or High (admin). | Access level to the AI system. An attack on a public chatbot is None (N). An attack requiring an authenticated API key is Low (L). An attack requiring access to the MLOps platform is High (H). |
User Interaction (UI)UI:N/R
|
Whether a user (other than the attacker) must participate. None or Required. | Does the attack require another user’s action? An automated evasion attack is None (N). Tricking a user into submitting a malicious image for analysis is Required (R). |
Scope (S)S:U/C
|
Whether the exploit impacts components beyond its own security scope. Unchanged or Changed. | Does the model vulnerability allow compromise of the underlying host system? Manipulating model output is Unchanged (U). A vulnerability in a model parsing library (e.g., Pickle) leading to RCE is Changed (C). |
Confidentiality (C)C:N/L/H
|
Disclosure of information to unauthorized parties. | Impact on sensitive information handled by or embedded in the model. High (H) for membership inference or model inversion that reveals sensitive training data. Low (L) for leaking system prompts. |
Integrity (I)I:N/L/H
|
Modification or destruction of information. | Impact on the trustworthiness and behavior of the model. High (H) for data poisoning that creates a persistent backdoor. Low (L) for a single-instance evasion attack that causes one incorrect output. |
Availability (A)A:N/L/H
|
Denial of access to resources or services. | Impact on the model’s ability to provide service. High (H) for resource exhaustion attacks (e.g., complex inputs causing GPUs to time out) or attacks causing catastrophic forgetting. Low (L) for inputs that cause a single inference to fail. |
Example Case: Scoring Prompt Injection with Data Exfiltration
Consider a vulnerability in a public-facing LLM-powered customer service bot. Through a carefully crafted prompt injection, you discover you can make the bot ignore its original instructions and instead reveal the full system prompt, which includes proprietary business logic and names of internal databases it can query.
Let’s score this using the AI-contextualized CVSS metrics:
- Attack Vector (AV):
Network. The attack is delivered via the public internet to the chatbot’s API. - Attack Complexity (AC):
Low. Assuming the injection technique is a well-understood pattern (e.g., “ignore previous instructions and do this…”). - Privileges Required (PR):
None. Any anonymous user can interact with the public chatbot. - User Interaction (UI):
None. The attack is performed directly by the red teamer; no other user needs to be involved. - Scope (S):
Unchanged. The vulnerability reveals information from the AI component but does not grant access to or compromise the underlying server or other system components. - Confidentiality (C):
High. The exfiltrated system prompt is proprietary intellectual property. Its disclosure reveals system architecture and sensitive operational details, representing a total loss of confidentiality for that asset. - Integrity (I):
Low. The attacker can control the output for their own session, which is a temporary loss of integrity. However, the underlying model is not permanently modified. - Availability (A):
None. The attack does not impact the chatbot’s availability for other users.
Plugging these values into a CVSS 3.1 calculator gives a vector of CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N, resulting in a Base Score of 8.2 (High). This demonstrates how a vulnerability with no direct impact on the host system can still be rated as severe due to its high confidentiality impact on the AI asset itself.
Mastering this translation is crucial. It allows you to communicate the severity of AI-specific findings in a language that the broader cybersecurity industry understands, ensuring that these unique vulnerabilities are prioritized and remediated appropriately.