Experiments

Foil’s experiment feature lets you run A/B tests on prompts, models, and configurations to find what works best.

What Can You Test?

  • Prompts - Compare different system prompts or instructions
  • Models - Test gpt-4o vs. gpt-4o-mini vs. Claude
  • Parameters - Temperature, max tokens, etc.
  • Tools - Different tool configurations
  • Full workflows - Compare entire agent implementations

Creating an Experiment

  1. Go to Experiments in the dashboard
  2. Click Create Experiment
  3. Configure variants (name, traffic weight, configuration)
  4. Set metrics to track (signal names)
  5. Start the experiment

Using Variants in Your Code

const { Foil } = require('@getfoil/foil-js');
const OpenAI = require('openai');

const openai = new OpenAI();
const foil = new Foil({
  apiKey: process.env.FOIL_API_KEY,
  agentName: 'customer-support'
});

async function handleQuery(query, userId) {
  // Get experiment assignment
  const assignment = await foil.getExperimentVariant('prompt-test-v2', userId);

  return await foil.trace(async (ctx) => {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: assignment.config.systemPrompt },
        { role: 'user', content: query }
      ]
    });

    return response.choices[0].message.content;
  }, {
    // Tag the trace so results can be attributed to this variant
    properties: {
      experimentId: assignment.experimentId,
      variant: assignment.variant
    }
  });
}

Experiment Configuration

Field             | Type   | Description
name              | string | Experiment identifier
description       | string | What you’re testing
variants          | array  | List of variants to test
variants[].name   | string | Variant name
variants[].weight | number | Traffic percentage (0-100)
variants[].config | object | Variant-specific configuration
metrics           | array  | Signal names to track
minimumSampleSize | number | Required samples per variant
maximumDuration   | number | Auto-stop after N days
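A configuration built from these fields might look like the sketch below. The field names come from the table above; the variant names, prompts, and metric names are made-up examples, and how you submit the object (dashboard form vs. API) is not shown here:

```javascript
// Hypothetical experiment configuration using the fields from the table above.
const experimentConfig = {
  name: 'prompt-test-v2',
  description: 'Concise vs. detailed system prompt for support queries',
  variants: [
    { name: 'control', weight: 50, config: { systemPrompt: 'You are a helpful support agent.' } },
    { name: 'detailed', weight: 50, config: { systemPrompt: 'You are a helpful support agent. Cite the relevant docs page in every answer.' } }
  ],
  metrics: ['thumbs_up', 'resolved'],
  minimumSampleSize: 500,
  maximumDuration: 14
};

// Variant weights are traffic percentages, so they should sum to 100.
const totalWeight = experimentConfig.variants.reduce((sum, v) => sum + v.weight, 0);
if (totalWeight !== 100) {
  throw new Error(`Variant weights sum to ${totalWeight}, expected 100`);
}
```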

Viewing Results

The experiment results page in the dashboard shows:
  • Variant performance comparison
  • Statistical significance (p-value)
  • Metric breakdowns
  • Traffic distribution
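To make the significance number concrete, here is a minimal two-proportion z-test in plain JavaScript. The conversion counts are made up, and this is a sketch of the general idea, not Foil’s actual statistics engine (dashboards typically use more robust methods):

```javascript
// Standard normal CDF approximation (Abramowitz & Stegun 26.2.17).
function normalCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp(-z * z / 2);
  const tail = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - tail : tail;
}

// Two-sided p-value comparing the conversion rates of two variants.
function twoProportionPValue(successesA, totalA, successesB, totalB) {
  const pA = successesA / totalA;
  const pB = successesB / totalB;
  const pooled = (successesA + successesB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  const z = (pA - pB) / se;
  // Probability of seeing a difference at least this extreme by chance.
  return 2 * (1 - normalCdf(Math.abs(z)));
}

// Example with made-up counts: 120/1000 vs. 150/1000 conversions.
const p = twoProportionPValue(120, 1000, 150, 1000);
```

A small p-value (conventionally < 0.05) means the observed difference between variants is unlikely to be random noise.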

Best Practices

  • Isolate what you’re testing. If you change both the prompt and the model, you won’t know which caused the difference.
  • Use user IDs or session IDs for assignment so each user sees the same variant consistently.
  • Don’t end experiments early. Wait until you have enough samples and the p-value is meaningful (typically < 0.05).
  • Track more than one metric. A variant might improve one metric while hurting another, so watch quality, user satisfaction, and business metrics.
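Consistent assignment is usually implemented with deterministic hashing: hash the user ID together with the experiment name and map the result into the traffic-weight buckets, so the same user always lands on the same variant. The sketch below shows the general technique (presumably similar to what getExperimentVariant does internally; the FNV-1a hash is chosen here for brevity, not because Foil uses it):

```javascript
// FNV-1a 32-bit hash: a stable, dependency-free string hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Deterministically assign a user to a variant based on traffic weights.
function assignVariant(experimentName, userId, variants) {
  // The same (experiment, user) pair always hashes to the same bucket.
  const bucket = fnv1a(`${experimentName}:${userId}`) % 100;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guard against rounding gaps
}

const variants = [
  { name: 'control', weight: 50 },
  { name: 'detailed', weight: 50 }
];
const assigned = assignVariant('prompt-test-v2', 'user-42', variants);
```

Because assignment depends only on the hash, no per-user state needs to be stored, and re-running the function during retries or across servers returns the same variant.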

Next Steps