25.5.2 NIST Frameworks

The U.S. National Institute of Standards and Technology (NIST) provides foundational guidance for technology implementation, risk management, and security. For AI red teamers, NIST’s work offers structured, vendor-neutral frameworks that translate high-level principles into actionable risk management processes and provide a standardized vocabulary for describing adversarial attacks.

The AI Risk Management Framework (AI RMF 1.0)

The NIST AI RMF is not a prescriptive checklist but a flexible, structured process for managing the risks associated with AI systems throughout their lifecycle. Its goal is to help organizations cultivate a culture of risk management, enabling them to design, develop, deploy, and use AI systems that are trustworthy and responsible. For red teams, the AI RMF provides the “why” behind your testing activities, connecting your findings to a broader organizational risk posture.

The framework is organized around four core functions: Govern, Map, Measure, and Manage. These functions work together in a continuous cycle.

Figure: The NIST AI RMF core functions cycle: Govern, Map, Measure, Manage.
RMF Function | Description | Relevance for AI Red Teaming
Govern | Establishes a culture of risk management and defines policies, roles, and responsibilities for AI systems. | Provides the organizational context. Your red team engagement should align with the risk tolerance and policies established in this function.
Map | Identifies the AI system’s context, capabilities, intended use, and potential beneficial and negative impacts. | Crucial for threat modeling and scoping. The “Map” function defines your attack surface and helps identify high-impact targets.
Measure | Uses quantitative and qualitative tools to analyze, assess, and track AI risks identified in the Map function. | This is the primary home for red teaming. Your activities directly contribute to measuring the system’s resilience against adversarial attacks.
Manage | Acts on the measured risks by prioritizing and implementing mitigation strategies. | Your findings and recommendations directly inform the “Manage” function, helping stakeholders prioritize fixes and allocate resources effectively.
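
This mapping can also be made concrete in your engagement tooling. The following minimal Python sketch (the class, field, and activity names are illustrative assumptions, not part of the framework itself) shows one way to tag red team activities with the RMF function they support:

from dataclasses import dataclass
from enum import Enum

class RMFFunction(Enum):
    """The four core functions of the NIST AI RMF."""
    GOVERN = "Govern"
    MAP = "Map"
    MEASURE = "Measure"
    MANAGE = "Manage"

@dataclass
class RedTeamActivity:
    """A single red team activity tagged with the RMF function it supports."""
    name: str
    rmf_function: RMFFunction
    notes: str

# A hypothetical engagement plan expressed against the RMF cycle.
engagement_plan = [
    RedTeamActivity("Stakeholder threat-modeling workshop", RMFFunction.MAP,
                    "Identify context, intended use, and high-impact failure modes."),
    RedTeamActivity("Adversarial testing of the deployed model", RMFFunction.MEASURE,
                    "Probe resilience against evasion and abuse scenarios."),
    RedTeamActivity("Findings review and remediation planning", RMFFunction.MANAGE,
                    "Help stakeholders prioritize fixes and allocate resources."),
]

for activity in engagement_plan:
    print(f"[{activity.rmf_function.value}] {activity.name}: {activity.notes}")

Tagging activities this way makes it easy to show stakeholders which parts of the RMF cycle an engagement actually covers.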

Adversarial Machine Learning Taxonomy (NISTIR 8269)

While the AI RMF provides the “what to do,” NIST’s work on adversarial machine learning provides the “how to talk about it.” Specifically, A Taxonomy and Terminology of Adversarial Machine Learning (NISTIR 8269) creates a structured vocabulary for describing attacks on AI systems. Using this common language is essential for clear communication in your reports and for building a shared understanding of threats across the industry.

The taxonomy classifies attacks along several key dimensions, allowing you to precisely characterize an adversarial scenario.

Figure: Key dimensions of an adversarial attack: attacker goal (e.g., evade, poison), knowledge (e.g., white-box), capability (e.g., input manipulation), and attack phase (e.g., training, inference).

Core Dimensions of the Taxonomy (see the code sketch following this list):

  • Attacker’s Goal: What is the adversary trying to achieve? This could be an evasion attack (misclassification at inference time), a poisoning attack (corrupting the training data), a privacy attack (extracting sensitive information), or an abuse attack (using the model for unintended, harmful purposes).
  • Attacker’s Knowledge: How much does the adversary know about the target model? This ranges from white-box (full knowledge of architecture, parameters, and training data) to gray-box (partial knowledge) and black-box (only input/output access).
  • Attacker’s Capability: What can the adversary manipulate? This includes control over the training data, the model itself, or just the inputs at inference time.
  • Attack Phase: When does the attack occur? It can happen during the training phase (e.g., data poisoning) or the inference phase (e.g., evasion with adversarial examples).
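
Because each dimension is a small, closed set of options, the taxonomy lends itself to structured data. The sketch below is a hypothetical Python encoding (the enum and class names are illustrative, not official NIST identifiers) that classifies a black-box prompt injection scenario along all four dimensions:

from dataclasses import dataclass
from enum import Enum

class AttackerGoal(Enum):
    EVASION = "Evasion"
    POISONING = "Poisoning"
    PRIVACY = "Privacy"
    ABUSE = "Abuse"

class AttackerKnowledge(Enum):
    WHITE_BOX = "White-box"
    GRAY_BOX = "Gray-box"
    BLACK_BOX = "Black-box"

class AttackerCapability(Enum):
    TRAINING_DATA = "Training data control"
    MODEL = "Model control"
    INPUT = "Inference-time input manipulation"

class AttackPhase(Enum):
    TRAINING = "Training"
    INFERENCE = "Inference"

@dataclass
class AttackClassification:
    """One adversarial scenario characterized along the four taxonomy dimensions."""
    goal: AttackerGoal
    knowledge: AttackerKnowledge
    capability: AttackerCapability
    phase: AttackPhase

# A prompt injection attack against a public LLM API, classified with the taxonomy.
prompt_injection = AttackClassification(
    goal=AttackerGoal.EVASION,
    knowledge=AttackerKnowledge.BLACK_BOX,
    capability=AttackerCapability.INPUT,
    phase=AttackPhase.INFERENCE,
)
print(prompt_injection)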

Practical Application for AI Red Teamers

Integrating NIST frameworks into your workflow adds structure, clarity, and authority to your operations. You are no longer just “breaking things”; you are systematically measuring risk within an industry-recognized framework.

  1. Scoping with the AI RMF: Use the Map function as a guide for your initial threat modeling. Ask stakeholders: What are the system’s contexts, limitations, and potential negative impacts? This ensures your testing focuses on the most relevant risks.
  2. Executing within the AI RMF: Frame your red team engagement as a key activity within the Measure function. This elevates your work from a simple penetration test to a formal risk assessment process.
  3. Reporting with the NISTIR 8269 Taxonomy: Use the taxonomy to structure your findings. Describing an attack with this precise terminology eliminates ambiguity and allows for better comparison and mitigation planning.
# Example Finding Structured with NIST Terminology
Finding ID: ART-2024-012
Attack Scenario: LLM Prompt Injection for Policy Evasion

# NISTIR 8269 CLASSIFICATION:
- Attacker Goal: Evasion (bypassing safety filters) & Abuse (generating harmful content)
- Attacker Knowledge: Black-box (interaction via public API only)
- Attacker Capability: Inference-time input manipulation
- Attack Phase: Inference

# AI RMF MAPPING:
- Function: Measure (TEVV of model robustness)
- Risk Addressed: Malicious use and reputational harm, as identified in the 'Map' phase.
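
The same finding can also be captured in machine-readable form so that classifications stay consistent and are easy to aggregate across engagements. The structure below is a sketch with illustrative field names, not a NIST-defined schema:

import json

# Hypothetical machine-readable rendering of the finding above; the field names are
# assumptions for this sketch, not a NIST-defined schema.
finding = {
    "finding_id": "ART-2024-012",
    "attack_scenario": "LLM Prompt Injection for Policy Evasion",
    "nistir_8269": {
        "attacker_goal": ["Evasion", "Abuse"],
        "attacker_knowledge": "Black-box",
        "attacker_capability": "Inference-time input manipulation",
        "attack_phase": "Inference",
    },
    "ai_rmf": {
        "function": "Measure",
        "risk_addressed": "Malicious use and reputational harm identified in the Map phase",
    },
}

print(json.dumps(finding, indent=2))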

By adopting this structured approach, your red team provides not just a list of vulnerabilities but a clear, actionable assessment of risk that aligns directly with established governance and management processes.