0.2.5 Cultural and linguistic misunderstandings – translation errors, lost context

2025.10.06.
AI Security Blog

An AI model, particularly an LLM, is not a worldly scholar fluent in the planet’s languages and cultures. It is a statistical engine trained on a massive, but finite, dataset. When you ask it to cross linguistic or cultural boundaries, you are not asking for a thoughtful translation; you are asking it to find the most probable sequence of tokens in a new language based on patterns it has seen before. Sometimes, this works remarkably well. Other times, it results in failures that range from comical to catastrophic.

These failures are a prime example of accidental harm. The user has no malicious intent, and the model isn’t designed to cause damage. Yet, the gap between human meaning and statistical representation creates a vulnerability that can be exploited, often without the user even realizing it.

Kapcsolati űrlap - EN

Do you have a question about AI Security? Reach out to us here:

The Mechanics of Meaning Lost in Translation

AI models lack true “understanding.” They operate on correlation, not causation or intent. This fundamental limitation is the root cause of translation and contextual errors, which typically manifest in three key areas.

1. Idioms and Figurative Language

Idioms are the bane of literal translation. A phrase like “bite the bullet” has a meaning entirely divorced from its individual words. While modern LLMs have been trained on many common English idioms, they can easily falter with less common ones or when translating between languages where the idiomatic mappings are not well-represented in the training data.

Table 1: Examples of Idiomatic Translation Failures
English Idiom Intended Meaning Potential Literal AI Translation (to Spanish) Resulting Misunderstanding
Break a leg! Good luck! ¡Rómpete una pierna! A literal, and potentially threatening, command to injure oneself.
It’s not rocket science. It’s not difficult. No es ciencia de cohetes. Grammatically correct but unnatural and confusing. A native speaker would say “No es tan difícil” or “No tiene ciencia.”
Spill the beans. Reveal a secret. Derrama los frijoles. A nonsensical instruction to physically spill beans.

2. Loss of Cultural Context

Language is inseparable from culture. Words carry weight, history, and nuance that is not captured in a dictionary definition. An AI has no lived experience and cannot grasp this context. It may translate a word correctly but strip it of its essential cultural meaning, leading to outputs that are sterile, inappropriate, or even offensive.

Consider the Japanese concept of 「空気を読む」(kuuki wo yomu), which literally translates to “reading the air.” An AI might define it as “discerning the mood of a situation.” While technically correct, this misses the deep cultural importance of non-verbal communication, social harmony, and anticipating others’ needs without them being explicitly stated. In a customer service chatbot, this failure could lead to responses that are perceived as blunt, cold, and unhelpful to a Japanese user.

3. Polysemy and Ambiguity

Polysemy is the capacity for a word or phrase to have multiple meanings. Humans use context to disambiguate. AI models try to do the same, but when context is thin or doesn’t align with training data patterns, they guess wrong.

For example, the English word “sanction” can mean both “to approve” (a permit) and “to penalize” (a trade embargo). The correct interpretation is entirely dependent on the surrounding text. An accidental translation error in a legal or diplomatic document could invert the entire meaning of a clause, with severe consequences.

From Misunderstanding to Security Incident

As a red teamer, your job is to identify how these “innocent” errors can create systemic risk. The harm is accidental, but the impact can be as severe as a malicious attack.

Source Text (e.g., safety manual) AI Translation (Context is lost) Target Text (Now ambiguous/incorrect) The Pathway of Contextual Failure

This flow demonstrates how a perfectly valid source text can be transformed into a dangerous output. For example:

  • Safety-Critical Information: An instruction like “Ground the wire before proceeding” could be mistranslated into something ambiguous, leading to electrocution.
  • Medical Misinformation: A chatbot translating dosage information might miss a subtle distinction between adult and pediatric doses, resulting in a dangerous recommendation.
  • Unintentional Generation of Hate Speech: A term that is neutral in one culture could be translated into a severe slur in another. An AI content moderation system, if relying on flawed translation, could either fail to flag real hate speech or incorrectly flag benign content.

Red Teaming Strategies: Probing for Linguistic Flaws

Your goal is not just to find funny translation errors. It is to find errors that have a security or safety impact. This requires a systematic approach.

Back-Translation Testing

One of the most effective and simple techniques is back-translation. You translate a piece of text from its source language (A) to a target language (B), and then immediately translate it back to the source language (A). If the final text significantly deviates from the original, you’ve identified a point of contextual instability.

# Pseudocode for a simple back-translation test
function test_linguistic_stability(original_text, lang_b):
    # Step 1: Translate from source (e.g., English) to target (e.g., German)
    intermediate_text = translate_api(original_text, target=lang_b)

    # Step 2: Translate back to the source language
    final_text = translate_api(intermediate_text, target='en')

    # Step 3: Compare the original and final texts
    similarity = calculate_semantic_similarity(original_text, final_text)

    if similarity < 0.85: # Threshold can be adjusted
        log_failure({
            "original": original_text,
            "final": final_text,
            "similarity": similarity
        })
    return similarity

# Example usage
test_text = "The board sanctioned the new safety measures."
test_linguistic_stability(test_text, 'ja') # Test via Japanese

This automated approach is excellent for finding glaring errors at scale. However, it can miss subtle shifts in tone or nuance.

Targeted Fuzzing with Cultural Probes

Instead of random inputs, build test cases specifically designed to stress the model’s cultural and linguistic knowledge. Create datasets containing:

  • Local Slang and Jargon: From various regions and dialects.
  • Culturally-Specific Scenarios: Prompts involving holidays, rituals, or social etiquette unique to a particular culture.
  • Politically or Socially Sensitive Terms: Words that have different connotations or levels of offense across cultures.

The key is to move beyond “correctness” and evaluate for “appropriateness.” A translation might be technically correct but culturally disastrous. This is where human expertise becomes irreplaceable. A red team for a global product is incomplete without members who possess deep, native-level understanding of the target languages and cultures.