While PromptFoo’s built-in assertions for keywords, regex, and similarity are powerful for common checks, they represent the baseline of automated testing. The true leverage for a red teamer comes from crafting bespoke validation logic that targets the unique vulnerabilities and business rules of your specific application. Custom tests are where you move from generic “is it broken?” checks to precise “does it fail in the way we care about?” evaluations.
When Standard Assertions Reach Their Limit
Standard assertions are excellent for clear, binary outcomes. Does the output contain a forbidden word? Is the output valid JSON? These questions have simple yes/no answers. However, many critical AI behaviors are nuanced and context-dependent. Custom tests bridge this gap by allowing you to inject your own logic into the evaluation process.
Consider the following scenarios where standard assertions fall short and custom logic becomes necessary:
| Scenario | Standard Assertion Limitation | Custom Test Solution |
|---|---|---|
| Assessing Tone | A `contains: "sorry"` check can be easily fooled. The model could say “I am not sorry,” which would pass. | A script that analyzes sentence structure, checks for a list of apologetic phrases, and penalizes contradictory language. |
| Fact Verification | You can’t use a simple `equals` check for facts that can be phrased in many ways (e.g., “The capital of France is Paris”). | A custom function could normalize both the expected answer and the LLM output (lowercase, remove punctuation) before comparison; see the sketch after this table. |
| Enforcing Complex Business Rules | A rule like “The output must mention Product A but not if the user’s query mentions a competitor” is impossible with regex alone. | A script that takes the prompt (`context.vars.prompt`) and the output as inputs to evaluate the conditional logic. |
| Detecting Subtle Bias | Bias isn’t just about forbidden words. It can manifest in stereotypes or associations that are hard to capture with simple patterns. | A function that checks for gendered pronouns associated with specific professions or uses a word embedding model to check for proximity to biased concepts. |
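To make the fact-verification row concrete, here is a minimal sketch of a normalization-based comparison. The function names `normalize` and `matchesFact`, and the hard-coded expected answer, are illustrative choices rather than anything PromptFoo defines; the same pattern works for any fact that can be phrased several ways.

// customChecks.js (illustrative sketch)
// Lowercase, strip punctuation, and collapse whitespace so that
// "Paris." and "paris" compare as equal.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\w\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Passes if the normalized output contains the normalized expected answer.
function matchesFact(output) {
  const expected = 'The capital of France is Paris'; // hypothetical expected fact
  return normalize(output).includes(normalize(expected));
}

module.exports = { normalize, matchesFact };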
The Mechanism: Executing External Logic
PromptFoo’s extensibility hinges on its ability to call external scripts during the assertion phase. The most common and integrated method is using a JavaScript or TypeScript function. For each test case, after the LLM generates an output, PromptFoo can pass that output, along with other context, to your custom function. Your function’s return value then determines the test’s outcome.
This creates a powerful evaluation pipeline: PromptFoo renders the prompt, the provider generates an output, and your function receives that output (along with context such as the test’s variables) and decides whether the behavior you actually care about held.
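A skeleton of that hand-off looks like the following. The `context` parameter is how PromptFoo exposes the test case’s variables and the rendered prompt to your function; the function name `myCheck` and the variable `forbiddenTopic` are hypothetical examples, not something PromptFoo defines.

// customChecks.js (skeleton sketch)
// PromptFoo calls the function with the model output and a context object.
// context.vars holds the test case's variables; context.prompt is the rendered prompt.
function myCheck(output, context) {
  const topic = context.vars.forbiddenTopic; // hypothetical test variable
  // Return true to pass, false to fail (or an object, as shown later in this section).
  return !output.toLowerCase().includes(String(topic).toLowerCase());
}

module.exports = { myCheck };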
Implementing a Custom JavaScript Assertion
Creating a custom test involves two key steps: writing the logic in a JavaScript file and then referencing that logic from your `promptfooconfig.yaml`.
Step 1: Write the Assertion Function
Create a JavaScript file (e.g., `customChecks.js`) in your project directory. Your function will receive the LLM’s string output as the first parameter. It should return `true` for a pass and `false` for a fail.
Let’s create a simple function to check if a response is overly verbose (e.g., more than 50 words), a common issue with LLMs that can degrade user experience.
// customChecks.js
function isConcise(output) {
  // A simple function to check word count.
  // The 'output' parameter is the string generated by the LLM.
  const words = output.trim().split(/\s+/);
  const wordCount = words.length;
  // Fail the test if the output is longer than 50 words.
  return wordCount <= 50;
}

// You must export the function to make it available to PromptFoo.
module.exports = {
  isConcise,
};
Step 2: Configure the Assertion in YAML
In your `promptfooconfig.yaml`, you’ll use the `javascript` assertion type. The `value` field should point to your exported function using the format `file://path/to/file.js:functionName`.
# promptfooconfig.yaml
prompts:
  - "Summarize the concept of photosynthesis in one sentence."

providers:
  - openai:gpt-3.5-turbo

tests:
  - vars: {}
    assert:
      - type: javascript
        # file:// path to the JS file, a colon, then the exported function name.
        value: file://./customChecks.js:isConcise
When you run PromptFoo with this configuration, it will execute the `isConcise` function for each output and mark the test as passed or failed based on its boolean return value.
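Because the assertion is just an exported function, you can also sanity-check it in plain Node before running a full evaluation. The file name `checkLocally.js` is only a convenience for this quick check, not something PromptFoo requires.

// checkLocally.js — optional local sanity check, run with `node checkLocally.js`
const { isConcise } = require('./customChecks.js');

console.log(isConcise('Photosynthesis converts light into chemical energy.')); // true (7 words)
console.log(isConcise('word '.repeat(60))); // false: 60 words exceeds the 50-word limit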
Advanced Logic: Returning Scores and Context
For more sophisticated analysis, a simple pass/fail is insufficient. You might want to know *how well* a response performed. Custom functions can return an object containing a `pass` boolean, a numeric `score` (from 0.0 for a total fail to 1.0 for a perfect pass), and a `reason` string for logging.
Let’s evolve our example to check for a “professional tone.” Our criteria will be: it must not contain slang, and it should contain at least one polite phrase.
// customChecks.js
function checkProfessionalTone(output) {
  const lowerOutput = output.toLowerCase();
  const slang = ['lol', 'imo', 'btw', 'gotta'];
  const politePhrases = ['please', 'thank you', 'could you'];

  let score = 1.0;
  let reason = 'Passes professional tone check.';

  // Penalize for slang
  if (slang.some(word => lowerOutput.includes(word))) {
    score -= 0.5;
    reason = 'Contains slang.';
  }

  // Penalize if no polite phrase is present
  if (!politePhrases.some(phrase => lowerOutput.includes(phrase))) {
    score -= 0.5;
    reason = reason.includes('slang') ? 'Contains slang and lacks politeness.' : 'Lacks politeness.';
  }

  return {
    pass: score > 0.5, // Pass only if the score is above 0.5
    score: Math.max(0, score), // Ensure the score doesn't go below 0
    reason: reason,
  };
}

module.exports = {
  // ... other functions (e.g., isConcise)
  checkProfessionalTone,
};
This scored approach provides much richer data. In your CI/CD pipeline or reporting dashboard, you can now track the average “professionalism score” of your model over time, catching subtle regressions that a simple pass/fail system would miss.
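Wiring the scored function into the config follows the same pattern as Step 2; this excerpt is a sketch that simply swaps in the new function name. The returned `pass` field still decides whether the assertion succeeds, while the `score` and `reason` appear alongside it in the results.

# promptfooconfig.yaml (excerpt)
tests:
  - assert:
      - type: javascript
        value: file://./customChecks.js:checkProfessionalTone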
Key Takeaways
- Custom tests are essential for evaluating nuanced, domain-specific requirements that go beyond simple keyword or pattern matching.
- Implement custom logic using external JavaScript functions that receive the LLM output and return a pass/fail status.
- For more detailed evaluation, functions can return an object with `pass`, `score`, and `reason` fields.
- Connect your custom tests to specific business risks or quality metrics (e.g., brand compliance, user experience, factual accuracy) to maximize their value.