Experiments
Foil’s experiment feature lets you run A/B tests on prompts, models, and configurations to find what works best.
What Can You Test?
- Prompts - Compare different system prompts or instructions
- Models - Compare models such as gpt-4o, gpt-4o-mini, or Claude
- Parameters - Tune temperature, max tokens, and other sampling settings
- Tools - Different tool configurations
- Full workflows - Compare entire agent implementations
Creating an Experiment
- Go to Experiments in the dashboard
- Click Create Experiment
- Configure variants (name, traffic weight, configuration)
- Set metrics to track (signal names)
- Start the experiment
Using Variants in Your Code
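The exact Foil SDK call for fetching an assignment isn’t shown here, so the following is a minimal sketch, assuming a variant list shaped like the configuration fields below (name, weight, config). It picks a variant by traffic weight and reads its configuration:

```python
import random

# Sketch only: variant shape follows the config table below; the values
# (model names, temperature) are illustrative, not real defaults.
VARIANTS = [
    {"name": "control", "weight": 50, "config": {"model": "gpt-4o-mini", "temperature": 0.7}},
    {"name": "treatment", "weight": 50, "config": {"model": "gpt-4o", "temperature": 0.7}},
]

def pick_variant(variants):
    """Weighted random assignment proportional to each variant's traffic weight."""
    weights = [v["weight"] for v in variants]
    return random.choices(variants, weights=weights, k=1)[0]

variant = pick_variant(VARIANTS)
# Use the chosen variant's configuration when calling your model.
model = variant["config"]["model"]
```

In practice you would record which variant each request received alongside the metrics you track, so results can be attributed per variant.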
Experiment Configuration
| Field | Type | Description |
|---|---|---|
| name | string | Experiment identifier |
| description | string | What you’re testing |
| variants | array | List of variants to test |
| variants[].name | string | Variant name |
| variants[].weight | number | Traffic percentage (0-100) |
| variants[].config | object | Variant-specific configuration |
| metrics | array | Signal names to track |
| minimumSampleSize | number | Required samples per variant |
| maximumDuration | number | Auto-stop after N days |
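Putting the fields above together, an experiment configuration might look like the following sketch (all values are illustrative, not real defaults):

```python
# Illustrative experiment configuration using the fields from the table above.
experiment = {
    "name": "prompt-tone-test",
    "description": "Formal vs. casual system prompt",
    "variants": [
        {"name": "formal", "weight": 50, "config": {"systemPrompt": "You are a formal assistant."}},
        {"name": "casual", "weight": 50, "config": {"systemPrompt": "You are a casual assistant."}},
    ],
    "metrics": ["response_quality", "thumbs_up"],   # signal names to track
    "minimumSampleSize": 1000,                      # samples required per variant
    "maximumDuration": 14,                          # auto-stop after 14 days
}
```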
Viewing Results
The experiment results page in the dashboard shows:
- Variant performance comparison
- Statistical significance (p-value)
- Metric breakdowns
- Traffic distribution
Best Practices
Test one variable at a time
Isolate what you’re testing. If you change both prompt AND model, you won’t know which caused the difference.
Use consistent identifiers
Use user IDs or session IDs for assignment to ensure users get the same variant consistently.
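One common way to get consistent assignment is to hash a stable identifier into a bucket, rather than rolling the dice per request. A minimal sketch, assuming weights sum to 100:

```python
import hashlib

def assign_variant(user_id, experiment_name, variants):
    """Deterministically map a user to a variant so repeat requests stay consistent.

    Hashes user_id + experiment_name into a bucket in [0, 100) and walks the
    cumulative traffic weights. Assumes variant weights sum to 100.
    """
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for variant in variants:
        cumulative += variant["weight"]
        if bucket < cumulative:
            return variant["name"]
    return variants[-1]["name"]  # guard against rounding gaps
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in variant A of one test isn’t systematically in variant A of every test.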
Wait for statistical significance
Don’t end experiments early. Wait until you have enough samples and the p-value is meaningful (typically < 0.05).
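For a binary success metric (e.g. thumbs-up rate), a pooled two-proportion z-test is one standard way to compute that p-value. A minimal sketch, not tied to Foil’s own statistics:

```python
import math

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value comparing two success rates via a pooled z-test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical degenerate rates: no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via math.erf).
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# e.g. 120/1000 conversions for A vs. 150/1000 for B
p = two_proportion_p_value(120, 1000, 150, 1000)
significant = p < 0.05
```

Note that checking significance repeatedly and stopping the moment p dips below 0.05 inflates false positives; decide the sample size up front (the minimumSampleSize field) and evaluate at the end.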
Track multiple metrics
A variant might improve one metric while hurting another. Track quality, user satisfaction, and business metrics.