The fields of artificial intelligence, machine learning, and their security applications are rife with acronyms. This reference provides concise definitions for common terms you will encounter during AI red teaming engagements, with a focus on their relevance to security testing.
- AGI
- Artificial General Intelligence. A hypothetical form of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a human level. The pursuit of AGI drives the creation of powerful models that introduce complex and novel security challenges.
- AI
- Artificial Intelligence. The overarching field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, such as visual perception, speech recognition, and decision-making.
- ANN
- Artificial Neural Network. A computational model inspired by the biological neural networks of animal brains. ANNs are the foundational technology behind most deep learning models.
- ASR
- Attack Success Rate. A key performance metric for adversarial attacks. It measures the percentage of adversarial inputs that successfully cause a model to misclassify or produce an undesired output.
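As a rough illustration — the outcome log below is invented, not from any real engagement — ASR is simply the fraction of attack attempts that changed the model's behavior:

```python
def attack_success_rate(outcomes):
    """outcomes: booleans, True where an adversarial input changed
    the model's prediction (hypothetical attack log)."""
    return 100.0 * sum(outcomes) / len(outcomes)

# Illustrative results from 8 attack attempts
outcomes = [True, True, False, True, False, True, True, False]
print(f"ASR: {attack_success_rate(outcomes):.1f}%")  # ASR: 62.5%
```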
- BIM
- Basic Iterative Method. An iterative adversarial attack that applies a weaker attack (like FGSM) multiple times with small steps. This approach often creates more subtle and effective perturbations than a single-step attack.
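A minimal sketch of the iteration, against an assumed toy linear classifier (the model, weights, and numbers are illustrative, not from this reference). Each small step follows the gradient sign, and the result is clipped back into the epsilon ball after every step:

```python
import numpy as np

# Assumed toy linear model: score > 0 means class "positive".
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def score(x):
    return float(w @ x + b)

def bim(x, epsilon, alpha, steps):
    """Basic Iterative Method: repeated small FGSM-style steps,
    clipped back into the L-infinity ball of radius epsilon."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = w  # analytic input gradient for the linear model
        x_adv = x_adv - alpha * np.sign(grad)              # small step toward lower score
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)   # stay within the ball
    return x_adv

x = np.array([1.0, 0.0, 0.5])   # classified positive
x_adv = bim(x, epsilon=0.6, alpha=0.2, steps=5)
```

The per-step size `alpha` and the overall bound `epsilon` are independent knobs, which is what lets BIM probe the decision boundary more finely than a single FGSM step.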
- C&W
- Carlini & Wagner Attack. A family of powerful, optimization-based adversarial attacks known for generating high-quality, often imperceptible adversarial examples that are highly effective at fooling models.
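The real C&W formulation is considerably more involved; the sketch below keeps only its core idea — jointly minimizing perturbation size and a misclassification hinge — against an assumed toy linear model (all weights and constants are illustrative):

```python
import numpy as np

# Assumed toy linear model: score > 0 means class "positive".
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def score(x):
    return float(w @ x + b)

def cw_like_attack(x, c=1.0, kappa=0.5, lr=0.05, steps=200):
    """Gradient descent on ||delta||^2 + c * max(score(x+delta) + kappa, 0):
    find the smallest perturbation that pushes the score at least
    kappa below the decision boundary."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        hinge_active = score(x + delta) + kappa > 0
        grad = 2.0 * delta + (c * w if hinge_active else 0.0)
        delta = delta - lr * grad
    return x + delta

x = np.array([1.0, 0.0, 0.5])   # starts classified positive
x_adv = cw_like_attack(x)
```

The trade-off constant `c` balances imperceptibility against attack strength — the optimization-over-a-weighted-objective structure is what distinguishes C&W-style attacks from fixed-budget methods like FGSM or PGD.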
- CNN
- Convolutional Neural Network. A class of deep neural networks, most commonly applied to analyzing visual imagery. They are a primary target for adversarial attacks against computer vision systems.
- DL
- Deep Learning. A subfield of machine learning based on ANNs with multiple layers (a “deep” architecture). It’s the technology powering most modern AI capabilities, including LLMs and image generators.
- EOT
- Expectation Over Transformation. A technique for creating robust adversarial examples by ensuring they remain effective even after undergoing random transformations (e.g., rotation, scaling, jitter). This helps bypass defenses that rely on input transformations.
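A sketch of the core EOT move — averaging the input gradient over sampled transformations before taking the attack step — using an assumed toy linear model and random scaling as the transformation (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy linear model: score > 0 means class "positive".
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def score(x):
    return float(w @ x + b)

def eot_gradient(x, n_samples=50):
    """Average the input gradient over random scaling transforms.
    For this linear model, d score(s * x) / dx = s * w for scale s."""
    grads = []
    for _ in range(n_samples):
        s = rng.uniform(0.8, 1.2)   # sample a random transformation
        grads.append(s * w)
    return np.mean(grads, axis=0)

def eot_attack(x, epsilon):
    # Step against the expected gradient, not a single sample's
    return x - epsilon * np.sign(eot_gradient(x))

x = np.array([1.0, 0.0, 0.5])
x_adv = eot_attack(x, epsilon=0.6)
```

Because the perturbation is optimized against the expectation, it remains adversarial across the transformation distribution rather than only for one fixed input.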
- FAccT
- Fairness, Accountability, and Transparency. A multidisciplinary field focused on ensuring AI systems operate fairly, are explainable, and have clear lines of responsibility. Red teamers often test systems for violations of FAccT principles, such as discovering hidden biases.
- FGSM
- Fast Gradient Sign Method. A foundational, single-step adversarial attack that adds a small amount of noise to an input, calculated using the gradient of the model’s loss function. It is fast but often easy to defend against.
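The single step can be sketched in a few lines — here against an assumed toy linear classifier where the input gradient is analytic (the model and numbers are illustrative, not from this reference):

```python
import numpy as np

# Assumed toy linear model: score > 0 means class "positive".
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def score(x):
    return float(w @ x + b)

def fgsm(x, epsilon):
    """One FGSM step: nudge every input feature by epsilon in the
    direction that lowers the score, i.e. along -sign(gradient)."""
    grad = w  # d(score)/dx is just w for a linear model
    return x - epsilon * np.sign(grad)

x = np.array([1.0, 0.0, 0.5])       # classified positive
x_adv = fgsm(x, epsilon=0.6)
print(score(x), score(x_adv))       # score drops from positive to negative
```

The perturbation is bounded by epsilon per feature, which is why FGSM is fast but coarse: one gradient evaluation, one step, no refinement.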
- FM
- Foundation Model. A large-scale AI model trained on a vast quantity of data, designed to be adapted to a wide range of downstream tasks. Examples include GPT-4, Llama, and Claude. These are the primary targets of modern AI red teaming.
- GAN
- Generative Adversarial Network. A class of models composed of two competing neural networks: a “generator” that creates synthetic data and a “discriminator” that tries to distinguish it from real data. GANs can be used offensively to create deepfakes or highly convincing phishing content.
- LLM
- Large Language Model. A type of foundation model specialized in processing and generating human-like text. They are the target of prompt injection, data extraction, and jailbreaking attacks.
- LSTM
- Long Short-Term Memory. A type of Recurrent Neural Network (RNN) architecture capable of learning long-term dependencies in sequential data. It’s often used in time-series analysis and older NLP applications.
- ML
- Machine Learning. A subset of AI focused on building algorithms that allow computers to learn from and make predictions or decisions based on data, without being explicitly programmed for the task.
- MLOps
- Machine Learning Operations. A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The MLOps pipeline is a critical attack surface, encompassing data ingestion, model training, and deployment infrastructure.
- MLSecOps
- Machine Learning Security Operations. The practice of integrating security measures into the MLOps lifecycle. It focuses on securing the entire ML pipeline, from data provenance to model monitoring and incident response.
- NLP
- Natural Language Processing. A field of AI that gives computers the ability to read, understand, and derive meaning from human language. It is the core technology behind LLMs.
- PGD
- Projected Gradient Descent. A powerful, iterative adversarial attack that is considered a strong benchmark for measuring model robustness. It refines an adversarial example over multiple steps, ensuring the perturbation stays within a predefined limit (e.g., an L-infinity norm ball).
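A sketch of the loop, against an assumed toy linear model (illustrative weights and budgets): PGD typically starts from a random point inside the perturbation ball, then alternates gradient-sign steps with projection back into the ball:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy linear model: score > 0 means class "positive".
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def score(x):
    return float(w @ x + b)

def pgd(x, epsilon, alpha, steps):
    """PGD with a random start inside the L-infinity ball of radius
    epsilon, projecting back into the ball after every step."""
    x_adv = x + rng.uniform(-epsilon, epsilon, size=x.shape)   # random start
    for _ in range(steps):
        grad = w  # analytic input gradient for the linear model
        x_adv = x_adv - alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)       # projection
    return x_adv

x = np.array([1.0, 0.0, 0.5])   # classified positive
x_adv = pgd(x, epsilon=0.6, alpha=0.15, steps=10)
```

The random start distinguishes PGD from BIM and helps it escape flat regions of the loss surface, which is part of why it serves as a standard robustness benchmark.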
- RAG
- Retrieval-Augmented Generation. An architecture that enhances a foundation model by connecting it to an external knowledge base. The model retrieves relevant information before generating a response. RAG systems introduce new vulnerabilities, such as indirect prompt injection through poisoned documents or sensitive data leakage from the knowledge source.
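The retrieve-then-generate flow can be sketched in a few lines. Everything below — the documents, the keyword retriever, and the generator stub standing in for an LLM call — is illustrative, not a real API:

```python
# Minimal RAG sketch: a naive keyword retriever plus a stub generator.
# In a real system the retriever would use embeddings and the
# generator would be an LLM conditioned on the retrieved context.

DOCS = [
    "The VPN gateway is gw.internal.example.com.",
    "Password resets go through the helpdesk portal.",
    "Quarterly revenue figures are stored in the finance share.",
]

def retrieve(query, docs, k=1):
    """Rank documents by naive keyword overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def generate(query, context):
    """Stand-in for an LLM call that conditions on retrieved context."""
    return f"Answer to {query!r} using context: {context[0]}"

query = "where is the VPN gateway"
answer = generate(query, retrieve(query, DOCS))
```

The security-relevant point is visible even in this toy: whatever lands in `DOCS` flows directly into the model's context, so a poisoned document reaches the generation step with no intermediate trust boundary.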
- RAI
- Responsible AI. A governance framework for designing, developing, and deploying AI systems in a safe, trustworthy, and ethical manner. Red team engagements are often designed to test a system’s adherence to its organization’s RAI principles.
- RLHF
- Reinforcement Learning from Human Feedback. A training method used to align models (especially LLMs) with human preferences and values. Human raters provide feedback on model outputs, which is used to train a reward model that then fine-tunes the AI. Understanding RLHF is crucial for developing jailbreaks that bypass these safety alignments.
- RNN
- Recurrent Neural Network. A class of neural networks well-suited for sequential data like text or time series, as they have “memory” of previous inputs in the sequence.
- VLM
- Vision-Language Model. A multimodal AI model capable of processing and understanding information from both images and text simultaneously. VLMs are susceptible to novel cross-modal attacks, where an adversarial input in one modality (e.g., an image) triggers an unintended behavior in another (e.g., text output).