After identifying vulnerabilities, you need a common language to describe their severity. The Common Vulnerability Scoring System (CVSS) provides a standardized, numerical framework for this purpose. Although CVSS was not designed for AI, its principles can be adapted to bring objective clarity to your findings, transforming subjective assessments into consistent, communicable scores.
Translating CVSS for AI Systems
CVSS was built for traditional software vulnerabilities like buffer overflows and SQL injection. Applying it to AI requires reinterpreting its metrics through the lens of machine learning systems. The goal isn’t a perfect one-to-one mapping, but a “good enough” translation that allows you to leverage a widely understood standard for prioritization and communication.
The core of CVSS is the Base Score, which reflects the intrinsic qualities of a vulnerability. This score is what you’ll primarily calculate during an assessment. It’s composed of metrics evaluating both how an attacker can exploit the vulnerability and what impact that exploit has.
Base Score Metrics Explained for AI
Here is a breakdown of the CVSS v3.1 Base Score metrics, with specific interpretations for AI systems.
Exploitability Metrics
- Attack Vector (AV)
- Describes how the vulnerability is exploited.
- Network (N): The default for most models exposed via an API or web interface. Prompt injection, data extraction, and API abuse fall here.
- Adjacent (A): Requires the attacker to be on the same local network. Relevant for internal-only models or attacks on ML development environments.
- Local (L): The attacker needs local access to the system (e.g., shell access) to exploit the vulnerability. Think of manipulating model files on disk or attacking a locally running ML application.
- Physical (P): Requires physical access to the machine. Less common, but could apply to attacks on edge AI devices.
- Attack Complexity (AC)
- Measures conditions beyond the attacker’s control that must exist for the exploit to succeed.
- Low (L): A simple, repeatable attack. A basic “ignore previous instructions” prompt injection is Low complexity.
- High (H): The attack requires significant effort or specific, non-trivial conditions. Examples include multi-step prompt chaining, attacks requiring knowledge of training data distribution, or timing-based side-channel attacks.
- Privileges Required (PR)
- The level of privileges an attacker must possess before successful exploitation.
- None (N): The vulnerability is exploitable by anyone with access to the model endpoint, such as a public-facing chatbot.
- Low (L): Requires basic user permissions. An attack that can only be performed by a logged-in, non-admin user.
- High (H): Requires administrative or significant privileges. For example, a vulnerability in the MLOps pipeline that requires pipeline administrator access to exploit.
- User Interaction (UI)
- Whether a user (other than the attacker) needs to participate.
- None (N): The attack can be performed without any user action. Most direct API-based attacks fall here.
- Required (R): The user must perform an action, like clicking a link or pasting attacker-provided text into a prompt. Indirect prompt injections, where a model processes a malicious web page visited by a user, are a prime example.
Scope and Impact Metrics
- Scope (S)
- Does the exploit impact resources beyond the vulnerable component’s security scope?
- Unchanged (U): The exploit only affects the model itself. Causing the model to generate false information or refuse service has an unchanged scope.
- Changed (C): The exploit allows the attacker to impact other components. A prompt injection that leverages a tool/plugin to execute code on the underlying server or access a database has a changed scope. This is a critical distinction.
- Confidentiality (C), Integrity (I), Availability (A)
- These measure the impact on the system’s data and services.
- Confidentiality: Rated High if the attack leaks sensitive information like PII from training data, proprietary model weights, or other users’ session data. Low if it leaks non-sensitive operational data.
- Integrity: Rated High if the attack can permanently alter the model (e.g., a poisoning attack), cause it to execute dangerous commands, or reliably generate harmful, biased, or malicious content. Low if it only causes minor, temporary inaccuracies.
- Availability: Rated High if the attack can render the model or its host system completely unusable for all users (e.g., a resource exhaustion DoS). Low if it only causes a temporary slowdown for a single user.
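These metric values combine into the Base Score through a fixed formula. Below is a minimal sketch in Python, with the weights and rounding rule taken from the CVSS v3.1 specification (the `base_score` and `roundup` function names are my own):

```python
# Metric weights from the CVSS v3.1 specification (Base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # PR is weighted higher when Scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}

def roundup(x: float) -> float:
    """Spec-defined rounding: smallest one-decimal value >= x."""
    i = round(x * 100000)
    return i / 100000 if i % 10000 == 0 else (i // 10000 + 1) / 10

def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 Base Score from a vector string such as
    'AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact Sub-Score, then the scope-dependent impact equation.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))

print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"))  # 7.5
```

This reproduces what any standard v3.1 calculator does; the value of having it inline is that you can batch-score an entire assessment's worth of vectors consistently.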
Practical Scoring Examples
Let’s apply these metrics to common AI vulnerabilities. Use a CVSS v3.1 calculator to find the final score based on these vectors.
| Vulnerability Scenario | CVSS Vector String | Score | Rationale |
|---|---|---|---|
| **Policy Bypass via Prompt Injection.** Attacker tricks a public chatbot into generating prohibited content. | `AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N` | 7.5 (High) | Public access (AV:N, PR:N), simple prompt (AC:L), no user needed (UI:N). The exploit compromises the model’s integrity by bypassing its safety controls (I:H), but doesn’t affect other systems (S:U). |
| **Training Data Leakage.** Attacker crafts queries to a model with user permissions, extracting PII from its training set. | `AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N` | 5.3 (Medium) | Requires an account (PR:L) and sophisticated queries (AC:H). The impact is a high loss of confidentiality (C:H) as PII is exposed. Scope is unchanged as only the model component is affected. |
| **Indirect Prompt Injection via Plugin.** Attacker plants a malicious prompt on a webpage. A user asks the AI to summarize it, triggering a plugin to exfiltrate the user’s chat history to the attacker’s server. | `AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:L/A:N` | 8.2 (High) | Requires the user to interact (UI:R). The key is that the exploit jumps from the AI model to the plugin and then the external server, changing the scope (S:C). This allows high confidentiality impact (C:H) by stealing data. |
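The qualitative labels in the Score column follow the CVSS v3.1 severity rating scale, which maps fixed numeric bands to ratings:

```python
def severity(score: float) -> str:
    """Qualitative rating bands from the CVSS v3.1 severity rating scale."""
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"  # 9.0 - 10.0
```

These bands are what most triage processes actually key on, so a one-decimal difference matters only when it crosses a band boundary.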
Limitations and Best Practices
CVSS is a powerful tool for standardization, but it’s not a silver bullet. Be aware of its limitations in the context of AI:
- Probabilistic Nature: CVSS assumes deterministic outcomes. An attack that works only 30% of the time is hard to score. You must decide whether to score the worst-case potential or the average outcome.
- Business Context is Missing: The Base Score doesn’t know your business. A “Low” integrity impact that slightly biases product recommendations could have a massive financial or reputational impact, which the score won’t reflect. This is where Environmental scoring and qualitative analysis become crucial.
- Subtlety of Impact: How do you score a model that now subtly promotes a competitor’s product? The impact isn’t a clean “None,” “Low,” or “High.” You must use your judgment to map these nuanced impacts to the discrete CVSS categories.
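One concrete way to layer in that business context is CVSS’s own Environmental metric group. In the v3.1 formulas, per-organization Confidentiality, Integrity, and Availability Requirements (Low 0.5, Medium 1.0, High 1.5) re-weight the impact sub-score, capped at 0.915. A minimal sketch of that re-weighting (the `modified_iss` function name is my own; this shows only the requirement-weighted sub-score, not the full Environmental equation):

```python
REQ = {"L": 0.5, "M": 1.0, "H": 1.5, "X": 1.0}  # X = Not Defined
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}

def modified_iss(c: str, i: str, a: str, cr: str, ir: str, ar: str) -> float:
    """Requirement-weighted Impact Sub-Score (MISS) from CVSS v3.1
    Environmental scoring; it feeds the same impact equation used
    by the Base Score."""
    return min(
        1 - (1 - CIA[c] * REQ[cr]) * (1 - CIA[i] * REQ[ir]) * (1 - CIA[a] * REQ[ar]),
        0.915,
    )
```

For example, an integrity-Low finding scored at an organization with a High Integrity Requirement gets an effective impact weight of 0.33 rather than 0.22, nudging the final score upward without abandoning the standard.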
Your goal should be to use CVSS as a starting point for conversation and triage. It provides a defensible, objective baseline that you can then layer with business context and qualitative risk analysis, as discussed in the following sections on prioritization frameworks.