25.3.2. AI/ML Specific Acronyms

2025.10.06.
AI Security Blog

The fields of artificial intelligence, machine learning, and their security applications are rife with acronyms. This reference provides concise definitions for common terms you will encounter during AI red teaming engagements, with a focus on their relevance to security testing.

AGI
Artificial General Intelligence. A hypothetical form of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a human level. The pursuit of AGI drives the creation of powerful models that introduce complex and novel security challenges.
AI
Artificial Intelligence. The overarching field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, such as visual perception, speech recognition, and decision-making.
ANN
Artificial Neural Network. A computational model inspired by the biological neural networks of animal brains. ANNs are the foundational technology behind most deep learning models.
ASR
Attack Success Rate. A key performance indicator in adversarial attacks. It measures the percentage of adversarial inputs that successfully cause a model to misclassify or produce an undesired output.
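As a minimal sketch, ASR can be computed over only the inputs the model originally classified correctly, counting how many of those the adversarial versions flip. The function name and example data below are hypothetical:

```python
def attack_success_rate(clean_preds, adv_preds, true_labels):
    """ASR over inputs the model originally classified correctly.

    Only examples that were correct before the attack count toward the
    denominator; a "success" is any of those whose adversarial version
    is predicted incorrectly.
    """
    attempted = 0
    succeeded = 0
    for clean, adv, label in zip(clean_preds, adv_preds, true_labels):
        if clean != label:
            continue  # skip inputs the model already got wrong
        attempted += 1
        if adv != label:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0

# Two of the three originally-correct inputs were flipped -> ASR = 2/3
asr = attack_success_rate(
    clean_preds=[0, 1, 2, 1],
    adv_preds=[1, 1, 0, 0],
    true_labels=[0, 1, 2, 2],
)
```

Conventions vary: some teams use all inputs as the denominator, so state the definition when reporting the metric.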
BIM
Basic Iterative Method. An iterative adversarial attack that applies a weaker attack (like FGSM) multiple times with small steps. This approach often creates more subtle and effective perturbations than a single-step attack.
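A sketch of BIM against a toy logistic model (hypothetical; a real attack would use a framework's autograd rather than this analytic gradient): each step applies a small FGSM-style update, then clips back into the epsilon-ball around the original input.

```python
import numpy as np

def bim_attack(x, w, y, eps=0.1, alpha=0.02, steps=10):
    """Basic Iterative Method on a toy model p = sigmoid(w @ x).

    The cross-entropy loss gradient w.r.t. x is (p - y) * w, so each
    iteration takes a small signed step and clips the result so the
    total perturbation never exceeds eps in any dimension.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w @ x_adv))     # model prediction
        grad = (p - y) * w                       # dLoss/dx (analytic)
        x_adv = x_adv + alpha * np.sign(grad)    # small FGSM step
        x_adv = np.clip(x_adv, x - eps, x + eps) # stay in the eps-ball
    return x_adv
```

The step size `alpha` and step count are chosen so the iterates can reach, but never leave, the epsilon-ball.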
C&W
Carlini & Wagner Attack. A family of powerful, optimization-based adversarial attacks known for generating high-quality, often imperceptible adversarial examples that are highly effective at fooling models.
CNN
Convolutional Neural Network. A class of deep neural networks, most commonly applied to analyzing visual imagery. They are a primary target for adversarial attacks against computer vision systems.
DL
Deep Learning. A subfield of machine learning based on ANNs with multiple layers (a “deep” architecture). It’s the technology powering most modern AI capabilities, including LLMs and image generators.
EOT
Expectation Over Transformation. A technique for creating robust adversarial examples by ensuring they remain effective even after undergoing random transformations (e.g., rotation, scaling, jitter). This helps bypass defenses that rely on input transformations.
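The core of EOT is averaging the loss gradient over random transformations of the input rather than taking it at the input alone. A toy sketch, with additive jitter standing in for the transformation distribution (model and function names are hypothetical):

```python
import numpy as np

def eot_gradient(x, w, y, n_samples=50, jitter=0.05, rng=None):
    """Expectation Over Transformation gradient for p = sigmoid(w @ x).

    Averages the loss gradient over randomly transformed copies of x
    (here the 'transformation' is Gaussian jitter), so a perturbation
    built from this gradient stays effective under those transforms.
    """
    rng = rng or np.random.default_rng(0)
    grads = np.zeros_like(x)
    for _ in range(n_samples):
        xt = x + rng.normal(0.0, jitter, size=x.shape)  # sample t(x)
        p = 1.0 / (1.0 + np.exp(-w @ xt))
        grads += (p - y) * w                            # dLoss/dx at t(x)
    return grads / n_samples
```

In a real attack the transformation distribution would cover rotations, scaling, or compression, matching whatever the defense applies at inference time.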
FAccT
Fairness, Accountability, and Transparency. A multidisciplinary field focused on ensuring AI systems operate fairly, are explainable, and have clear lines of responsibility. Red teamers often test systems for violations of FAccT principles, such as discovering hidden biases.
FGSM
Fast Gradient Sign Method. A foundational, single-step adversarial attack that adds a small amount of noise to an input, calculated using the gradient of the model’s loss function. It is fast but often easy to defend against.
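FGSM is simple enough to state in a few lines. A sketch against a toy logistic model (hypothetical; real attacks compute the gradient via autograd in a framework like PyTorch):

```python
import numpy as np

def fgsm_attack(x, w, y, eps=0.1):
    """Fast Gradient Sign Method on a toy model p = sigmoid(w @ x).

    One step: move x by eps in the direction of the sign of the loss
    gradient, computed analytically here as (p - y) * w.
    """
    p = 1.0 / (1.0 + np.exp(-w @ x))   # model confidence for class 1
    grad = (p - y) * w                 # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad)     # single signed step of size eps
```

Because the perturbation is exactly eps in every dimension, FGSM is fast to compute and equally fast to detect, which is why iterative variants like BIM and PGD are preferred for robustness evaluation.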
FM
Foundation Model. A large-scale AI model trained on a vast quantity of data, designed to be adapted to a wide range of downstream tasks. Examples include GPT-4, Llama, and Claude. These are the primary targets of modern AI red teaming.
GAN
Generative Adversarial Network. A class of models composed of two competing neural networks: a “generator” that creates synthetic data and a “discriminator” that tries to distinguish it from real data. GANs can be used offensively to create deepfakes or highly convincing phishing content.
LLM
Large Language Model. A type of foundation model specialized in processing and generating human-like text. They are the target of prompt injection, data extraction, and jailbreaking attacks.
LSTM
Long Short-Term Memory. A type of Recurrent Neural Network (RNN) architecture capable of learning long-term dependencies in sequential data. It’s often used in time-series analysis and older NLP applications.
ML
Machine Learning. A subset of AI focused on building algorithms that allow computers to learn from and make predictions or decisions based on data, without being explicitly programmed for the task.
MLOps
Machine Learning Operations. A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The MLOps pipeline is a critical attack surface, encompassing data ingestion, model training, and deployment infrastructure.
MLSecOps
Machine Learning Security Operations. The practice of integrating security measures into the MLOps lifecycle. It focuses on securing the entire ML pipeline, from data provenance to model monitoring and incident response.
NLP
Natural Language Processing. A field of AI that gives computers the ability to read, understand, and derive meaning from human language. It is the core technology behind LLMs.
PGD
Projected Gradient Descent. A powerful, iterative adversarial attack that is considered a strong benchmark for measuring model robustness. It refines an adversarial example over multiple steps, ensuring the perturbation stays within a predefined limit (e.g., an L-infinity norm ball).
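A sketch of PGD on the same kind of toy logistic model (hypothetical): unlike BIM, it starts from a random point inside the epsilon-ball, then alternates signed gradient steps with projection back onto the L-infinity ball.

```python
import numpy as np

def pgd_attack(x, w, y, eps=0.1, alpha=0.02, steps=20, rng=None):
    """Projected Gradient Descent on a toy model p = sigmoid(w @ x).

    Random start inside the eps-ball, then repeated signed gradient
    steps; np.clip implements the projection onto the L-infinity ball
    centered at the original input x.
    """
    rng = rng or np.random.default_rng(0)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w @ x_adv))
        grad = (p - y) * w                            # dLoss/dx (analytic)
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project onto ball
    return x_adv
```

The random start is what distinguishes PGD from BIM: restarting from several random points reduces the chance of reporting inflated robustness from a single unlucky trajectory.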
RAG
Retrieval-Augmented Generation. An architecture that enhances a foundation model by connecting it to an external knowledge base. The model retrieves relevant information before generating a response. RAG systems introduce new vulnerabilities, such as indirect prompt injection through poisoned documents or sensitive data leakage from the knowledge source.
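The retrieve-then-prompt flow can be sketched in a few lines. This toy version scores documents by keyword overlap (a real system would use embedding similarity); the function name and example data are hypothetical:

```python
def retrieve_and_prompt(query, documents, top_k=1):
    """Minimal RAG retrieval sketch: keyword-overlap ranking.

    Scores each document by word overlap with the query, then builds a
    prompt that embeds the retrieved context. A poisoned document that
    ranks highly here is exactly the vector for indirect prompt
    injection: its contents flow straight into the model's context.
    """
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    context = "\n".join(scored[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = retrieve_and_prompt(
    "What is the refund policy?",
    ["Shipping takes 5 days.", "The refund policy is 30 days."],
)
```

Note that nothing in this pipeline distinguishes trusted instructions from retrieved text, which is why red teamers probe RAG systems with documents containing embedded directives.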
RAI
Responsible AI. A governance framework for designing, developing, and deploying AI systems in a safe, trustworthy, and ethical manner. Red team engagements are often designed to test a system’s adherence to its organization’s RAI principles.
RLHF
Reinforcement Learning from Human Feedback. A training method used to align models (especially LLMs) with human preferences and values. Human raters provide feedback on model outputs, which is used to train a reward model that then fine-tunes the AI. Understanding RLHF is crucial for developing jailbreaks that bypass these safety alignments.
RNN
Recurrent Neural Network. A class of neural networks well-suited for sequential data like text or time series, as they have “memory” of previous inputs in the sequence.
VLM
Vision-Language Model. A multimodal AI model capable of processing and understanding information from both images and text simultaneously. VLMs are susceptible to novel cross-modal attacks, where an adversarial input in one modality (e.g., an image) triggers an unintended behavior in another (e.g., text output).

Do you have a question about AI Security? Reach out to us here: