2.1.1 Neural Network Architecture

2025.10.06.
AI Security Blog

Before you can break a system, you must understand its blueprint. For a machine learning model, that blueprint is its architecture. Forget the “black box” mystique; a neural network is an engineered structure of interconnected components. As a red teamer, your job is to see this structure not just for what it does, but for how it can be manipulated. Every layer, every neuron, and every connection is a potential point of failure or leverage.

The Building Blocks: Neurons, Layers, and Weights

At its core, a neural network is composed of simple, interconnected processing units. Understanding these three fundamental components is the first step toward mapping its attack surface.

Neurons (or Nodes)

A neuron is the smallest computational unit. It receives one or more inputs, performs a simple calculation, and produces an output. Think of it as a tiny decision-maker. It takes in evidence (inputs), weighs that evidence (weights), and then decides whether to “fire” and pass a signal onward (activation).

Neuron Logic

  1. Receive Inputs: Gathers numerical values from other neurons or the initial data.
  2. Compute Weighted Sum: Each input is multiplied by a “weight,” a value representing its importance. The neuron sums these weighted inputs.
  3. Apply Activation Function: The sum is passed through an activation function (e.g., ReLU, Sigmoid), which transforms the value into the neuron’s final output. This non-linear step is what allows networks to learn complex patterns.
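
The three steps above fit in a few lines of Python. This is a minimal sketch of a single neuron with a ReLU activation; the input values, weights, and bias below are made up purely for illustration:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus a bias, passed through ReLU."""
    z = np.dot(inputs, weights) + bias  # step 1 + 2: gather inputs, compute weighted sum
    return max(0.0, z)                  # step 3: ReLU fires only if the sum is positive

# Illustrative values: three pieces of "evidence" and their learned importance
inputs  = np.array([0.5, -1.2, 0.8])
weights = np.array([0.9,  0.3, -0.5])
bias    = 0.1

output = neuron(inputs, weights, bias)  # weighted sum is negative here, so ReLU outputs 0.0
```

Note the attacker-relevant detail: because the output depends on a simple weighted sum, small, carefully chosen changes to the inputs shift that sum in a predictable direction.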

Layers

Individual neurons are organized into layers. A typical network has at least three types of layers:

  • Input Layer: The entry point. It receives the raw data (e.g., the pixel values of an image, the words in a sentence). The number of neurons in this layer corresponds to the number of features in your input data. This is your primary attack surface.
  • Hidden Layers: The computational engine of the network. These layers sit between the input and output. It’s here that the model learns to recognize increasingly abstract patterns. A “deep” neural network is simply one with many hidden layers.
  • Output Layer: The final layer. It produces the model’s prediction (e.g., a probability score for each class in a classification task). Your goal as an attacker is often to control the output of this layer.
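
The three layer types can be sketched as a single forward pass in NumPy. The weights below are random placeholders standing in for a trained model (an assumption for illustration), with 3 input features, one hidden layer of 4 neurons, and 2 output classes:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Random weights stand in for a trained model: 3 features -> 4 hidden -> 2 classes
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.2, -0.7, 1.5])  # input layer: the raw features (your attack surface)
h = relu(x @ W1 + b1)           # hidden layer: intermediate pattern detectors
y = softmax(h @ W2 + b2)        # output layer: a probability per class
```

The output `y` always sums to 1.0, which is why controlling the relative scores feeding the softmax is enough to control the final decision.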

A Visual Blueprint for Attack

Visualizing the architecture helps clarify how data flows and where interventions are possible. The diagram below illustrates a simple feedforward neural network, where information moves in one direction: from input to output.

[Diagram: a simple feedforward neural network with an input layer of 3 neurons, two hidden layers of 4 neurons each, and an output layer of 2 neurons; arrows indicate the flow of data between layers.]

From a red team perspective, this flow is your map. An adversarial input starts at the Input Layer. Its effects propagate through the connections and hidden layers, with the goal of corrupting the final decision at the Output Layer.
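
The core of that corruption can be shown with a toy linear decision rule. This sketch uses made-up weights in place of a real model's gradient direction, but it captures the essence of a sign-based adversarial perturbation: nudge every input feature slightly in the direction that raises the score, and the decision flips even though no single feature moved more than the budget:

```python
import numpy as np

# Toy linear decision: score = w . x; positive -> class A, negative -> class B.
# The weights are illustrative, standing in for a trained model's sensitivity.
w = np.array([0.4, -0.8, 0.3, 0.6])
x = np.array([0.1,  0.2, 0.1, 0.05])

eps = 0.05                    # perturbation budget: each feature moves at most 0.05
x_adv = x + eps * np.sign(w)  # push every feature in the score-raising direction

clean_score = float(w @ x)      # about -0.06 -> class B
adv_score = float(w @ x_adv)    # about +0.045 -> class A: the decision flips
```

Deep networks are not linear, but locally they behave much like this, which is why small, structured input perturbations propagate through the hidden layers and corrupt the output.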

Common Architectures and Their Security Profiles

Not all networks are built the same. Different architectures are designed for different tasks, and this specialization creates unique vulnerabilities. As a red teamer, you must adapt your attack strategy to the architecture you’re facing.

Feedforward Neural Network (FNN) / Multi-Layer Perceptron (MLP)
  • Primary use case: Tabular data, basic classification.
  • Key red teaming considerations: The “vanilla” target. Susceptible to standard adversarial examples where small input perturbations cause misclassification. The attack surface is straightforward.

Convolutional Neural Network (CNN)
  • Primary use case: Image and video analysis.
  • Key red teaming considerations: Exploits spatial hierarchies. Vulnerable to “patch” attacks (a small, visible sticker on an object) and imperceptible pixel-level noise. The convolutional filters themselves can be targeted.

Recurrent Neural Network (RNN) / LSTM
  • Primary use case: Sequential data (text, time series).
  • Key red teaming considerations: Has an internal “memory.” Vulnerable to attacks that exploit this state, such as subtle data poisoning over a sequence or inputs that trigger specific, long-forgotten states to cause failure.

Transformer
  • Primary use case: Natural language processing (LLMs), vision.
  • Key red teaming considerations: Relies on an “attention” mechanism. Highly susceptible to prompt injection, jailbreaking, and data extraction attacks. Manipulating attention weights is an advanced but powerful vector.

From Blueprint to Code

This architectural theory translates directly into code. When you gain access to a model’s source or can infer its structure, you’ll see these layers defined explicitly. Recognizing this structure in code is a critical skill for white-box testing.

Below is a simplified example written in Python with the Keras API. Notice how the code mirrors the layered structure we’ve discussed.

# A simple sequential model definition
from tensorflow import keras

# Define the model as a sequence of layers
model = keras.Sequential([
    # 1. Input Layer: expects a flat array of 784 features (e.g., a 28x28 image)
    keras.layers.InputLayer(input_shape=(784,)),

    # 2. First Hidden Layer: 128 neurons, using ReLU activation
    keras.layers.Dense(units=128, activation='relu'),

    # 3. Second Hidden Layer: 64 neurons, also with ReLU
    keras.layers.Dense(units=64, activation='relu'),

    # 4. Output Layer: 10 neurons (one per class), softmax for probabilities
    keras.layers.Dense(units=10, activation='softmax')
])

# The architecture is now defined and ready for training.

Even without deep coding expertise, you can see the blueprint. You know the entry point (`InputLayer`), the computational depth (two `Dense` hidden layers), and the decision point (`Dense` output layer). Each of these lines is a potential target for analysis and attack planning.
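
A quick back-of-envelope skill in white-box analysis is counting a model’s trainable parameters from its layer widths alone. The sketch below uses the widths from the model definition above (784 → 128 → 64 → 10); each fully connected layer stores a weight matrix plus one bias per output neuron:

```python
# Layer widths taken from the model definition above
sizes = [784, 128, 64, 10]

def dense_params(n_in, n_out):
    """A fully connected layer holds an n_in x n_out weight matrix plus n_out biases."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)  # 100480 + 8256 + 650 = 109386 trainable parameters
```

Knowing the parameter count tells you how large an extracted model would be and hints at its capacity, both useful when planning model-extraction or inversion attacks.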

Understanding this architecture is non-negotiable. It dictates the types of attacks that are likely to succeed and provides the foundational knowledge needed to dissect the model’s learning process and its deployment environment, which we will cover next.