This section provides a foundational glossary of Artificial Intelligence (AI) and Machine Learning (ML) terms. Each entry includes its standard Hungarian translation and, crucially, a brief description framed for the AI Red Teamer. Understanding these concepts is not just academic; it is essential for identifying the unique attack surfaces and vulnerabilities inherent in AI systems.
| Term (English) | Hungarian Translation | Description for Red Teamers |
|---|---|---|
| Artificial Intelligence (AI) | Mesterséges Intelligencia (MI) | The broad field of creating systems that perform tasks requiring human intelligence. For you, this is the entire domain of engagement, from simple classifiers to complex generative models. |
| Machine Learning (ML) | Gépi Tanulás (GT) | A subset of AI where systems learn from data rather than being explicitly programmed. This data-driven nature is a primary attack surface: think data poisoning, model inversion, and membership inference. |
| Deep Learning (DL) | Mélytanulás | A subset of ML using multi-layered neural networks. The “depth” creates complexity and a massive parameter space, making models powerful but also opaque (black-box) and susceptible to subtle adversarial examples. |
| Neural Network (NN) | Neurális Háló | The core architecture of deep learning, composed of interconnected nodes (neurons). Understanding layers (input, hidden, output) helps conceptualize where attacks like feature-level adversarial perturbations are processed. |
| Large Language Model (LLM) | Nagy Nyelvi Modell (NNM) | A type of DL model trained on vast text data. Your primary target for prompt injection, jailbreaking, data extraction, and inducing harmful or biased outputs. Their scale makes them prone to memorizing sensitive training data. |
| Transformer | Transzformer | The dominant architecture for modern LLMs, notable for its “attention mechanism.” Understanding attention helps in crafting more effective adversarial prompts, as you can exploit how the model weighs different parts of the input. |
| Embedding | Beágyazás | A numerical vector representation of words, sentences, or other data types. Adversarial attacks can be crafted in this “embedding space”: subtle perturbations that remain semantically equivalent to a human reader but push the model into misclassification (a minimal sketch follows the table). |
| Fine-tuning | Finomhangolás | Adapting a pre-trained model to a specific task using a smaller, specialized dataset. This process is a key vector for data poisoning, where an attacker can inject malicious data to create backdoors or biases in the final model. |
| Reinforcement Learning (RL) | Megerősítéses Tanulás | Training an agent to make decisions by rewarding desired outcomes. The reward function is a critical vulnerability; manipulating it or the agent’s environment can lead to unexpected and potentially harmful emergent behaviors. |
| Supervised Learning | Felügyelt Tanulás | Training a model on labeled data (input-output pairs). The quality and security of these labels are paramount. Label-flipping is a classic data poisoning attack against these systems (a minimal sketch follows the table). |
| Unsupervised Learning | Felügyeletlen Tanulás | Training a model on unlabeled data to find patterns (e.g., clustering). These systems can be manipulated by poisoning the data pool to create malicious clusters or hide illicit activity within “normal” patterns. |
| Overfitting | Túlillesztés | When a model learns the training data too well, including its noise, and fails to generalize to new data. An overfit model is often brittle and highly susceptible to adversarial examples that exploit its memorized, non-general patterns. |
| Gradient | Gradiens | A vector indicating the direction of the steepest ascent of a function. In ML, gradients are used to update model weights during training. Many powerful “white-box” adversarial attacks use the model’s gradients to efficiently craft perturbations (see the FGSM sketch after the table). |
| Hallucination | Hallucináció | A phenomenon where a generative model produces confident but factually incorrect or nonsensical outputs. Your goal is often to induce specific, targeted hallucinations that serve a malicious purpose, such as generating misinformation or revealing system flaws. |
| Prompt Engineering | Prompt Tervezés | The art of crafting inputs (prompts) to elicit desired outputs from a model. For a red teamer, this is “adversarial prompt engineering”—designing inputs to bypass safety filters, extract data, or trigger unintended behaviors (jailbreaking). |
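
To make the embedding-space idea concrete, below is a minimal sketch in plain NumPy. The vectors, the toy linear classifier, and the perturbation size are illustrative assumptions rather than anything tied to a real model; in practice the embeddings would come from an encoder (for example a sentence-embedding model), and the attacker would still have to map the perturbed vector back to actual text. The point it demonstrates is that a small, targeted shift in embedding space can flip a downstream decision while the perturbed vector stays almost identical to the original by cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in for a sentence embedding (a real one would come from an encoder model).
benign_embedding = np.array([0.9, 0.1, 0.3, 0.2])

# Toy linear classifier operating on embeddings: sign(w . x + b) decides allowed/blocked.
w = np.array([1.0, -2.0, 0.5, -0.5])
b = -0.4

def classify(x: np.ndarray) -> str:
    return "allowed" if np.dot(w, x) + b > 0 else "blocked"

# Craft a small perturbation along -w: it pushes the point across the decision
# boundary while barely changing the embedding itself.
epsilon = 0.2
perturbed = benign_embedding - epsilon * w / np.linalg.norm(w)

print("original         :", classify(benign_embedding))
print("perturbed        :", classify(perturbed))
print("cosine similarity:", round(cosine_similarity(benign_embedding, perturbed), 4))
```

Here the perturbation direction is taken straight from the classifier's weight vector, which is exactly the kind of knowledge a white-box attacker has and a black-box attacker must approximate through queries.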
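The label-flipping attack mentioned under Supervised Learning can be sketched in a few lines. This assumes scikit-learn and a synthetic dataset standing in for a victim's training corpus; the 20% flip rate is arbitrary. Random flips typically degrade clean-test accuracy only modestly, while flips targeted near the decision boundary or at a specific class do far more damage; the sketch only shows the mechanics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic labeled dataset standing in for the victim's training corpus.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def train_and_score(labels: np.ndarray) -> float:
    """Train on (possibly poisoned) labels, score on a clean test set."""
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    return model.score(X_test, y_test)

# Baseline: model trained on clean labels.
print("clean labels      :", round(train_and_score(y_train), 3))

# Poisoning: flip the labels of a random 20% of the training set.
poisoned = y_train.copy()
flip_idx = rng.choice(len(poisoned), size=int(0.2 * len(poisoned)), replace=False)
poisoned[flip_idx] = 1 - poisoned[flip_idx]  # binary labels, so flipping is 1 - y
print("20% labels flipped:", round(train_and_score(poisoned), 3))
```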
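The Gradient entry notes that white-box attacks use the model's own gradients; the Fast Gradient Sign Method (FGSM) is the canonical example. The sketch below uses PyTorch and an untrained toy network as the victim, so the class flip is not guaranteed on any given run; what it shows is the mechanic itself: take the gradient of the loss with respect to the input and step in its sign.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "victim" classifier; in a real engagement this would be the target model,
# and white-box access to its gradients is the key assumption behind FGSM.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

# A single input and its true label.
x = torch.randn(1, 10)
y_true = torch.tensor([1])

def fgsm(x: torch.Tensor, y: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Fast Gradient Sign Method: step each feature in the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Move in the direction that increases the loss, bounded per-feature by epsilon.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

x_adv = fgsm(x, y_true, epsilon=0.3)

with torch.no_grad():
    print("original prediction   :", model(x).argmax(dim=1).item())
    print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

The same gradient information that training uses to improve the model is what makes the perturbation cheap to compute, which is why overfit, memorization-heavy models (see Overfitting above) tend to be especially easy targets.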