
Multimodal & Media

Foil supports multimodal traces — mix text, documents, spreadsheets, code files, and images in your span inputs and outputs. Upload files via the SDK, and Foil automatically extracts text content for evaluations and analysis.
Check out the multimodal examples for complete, runnable code in both JavaScript and Python.

How It Works

Your App → SDK uploadMedia() → Foil API → S3 (original file)
                                     │
                                     ▼
                                    SQS
                                     │
                                     ▼
                             Ingestion Service
                                     │
                                     ▼
                          ┌─────────────────────┐
                          │  Category Detection │
                          │  & Text Extraction  │
                          └──────────┬──────────┘
                                     ▼
                          ┌─────────────────────┐
                          │  Evaluations use    │
                          │  extracted content  │
                          └─────────────────────┘
  1. You upload a file via the SDK — it’s stored in S3 and categorized automatically
  2. The ingestion service extracts text content (for documents, spreadsheets, code)
  3. You reference the uploaded media in span inputs/outputs using content blocks
  4. Evaluations automatically include extracted text when analyzing spans

Supported Media Categories

| Category    | File Types                | Max Size | Processing                        |
|-------------|---------------------------|----------|-----------------------------------|
| Document    | PDF, DOCX, DOC, RTF       | 50 MB    | Text extraction, metadata         |
| Spreadsheet | CSV, TSV, XLSX, XLS, ODS  | 25 MB    | Text + structured JSON extraction |
| Code        | Any text/code file        | 10 MB    | Direct text passthrough           |
| Image       | PNG, JPEG, GIF, WebP, SVG | 20 MB    | Dimensions & metadata             |
| Audio       | MP3, WAV, OGG, FLAC       | 100 MB   | Coming soon                       |
| Video       | MP4, WebM, MOV            | 500 MB   | Coming soon                       |
| Archive     | ZIP, TAR, GZ              | 100 MB   | Coming soon                       |
| Notebook    | .ipynb                    | 10 MB    | Coming soon                       |
| Other       | Any other file            | 25 MB    | Stored as-is                      |
Media category is auto-detected from the file’s MIME type. You can associate up to 20 files per span.
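
As a rough illustration of the limits in the table above, a client-side check could reject oversized files before they are uploaded. This is a minimal sketch, not part of the Foil SDK: `checkSizeLimit` is a hypothetical helper, the lowercase category names other than 'document' are assumed, and 1 MB is assumed to mean 1024 × 1024 bytes.

```javascript
// Size limits per category in MB, copied from the table above.
// Lowercase keys are an assumption; only 'document' appears verbatim in the docs.
const MAX_SIZE_MB = {
  document: 50,
  spreadsheet: 25,
  code: 10,
  image: 20,
  audio: 100,
  video: 500,
  archive: 100,
  notebook: 10,
  other: 25,
};

// Hypothetical helper: returns null when the file fits its category limit,
// or a human-readable message when it does not.
function checkSizeLimit(category, sizeBytes) {
  const known = Object.prototype.hasOwnProperty.call(MAX_SIZE_MB, category);
  const limitMb = known ? MAX_SIZE_MB[category] : MAX_SIZE_MB.other;
  const limitBytes = limitMb * 1024 * 1024; // assumes binary megabytes
  if (sizeBytes > limitBytes) {
    return `${category} files are limited to ${limitMb} MB`;
  }
  return null;
}
```

Because the category is detected server-side from the MIME type, a check like this is only a convenience; the API remains the source of truth for what is accepted.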

Content Blocks

Content blocks let you mix text and media references in span inputs and outputs:
  • Text blocks — plain text content ({ type: 'text', text: '...' })
  • Media blocks — references to uploaded media ({ type: 'media', mediaId: '...' })
const { ContentBlock, content } = require('@getfoil/foil-js');

// Mix text and media with the content() helper
const blocks = content(
  'Analyze this document:',
  ContentBlock.media('media-abc-123'),
  'Summarize the key findings.'
);
// => [
//   { type: 'text', text: 'Analyze this document:' },
//   { type: 'media', mediaId: 'media-abc-123' },
//   { type: 'text', text: 'Summarize the key findings.' }
// ]
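
To make the block shapes concrete, the normalization shown above can be approximated in a few lines of plain JavaScript. This is only a sketch of the idea, not the SDK's source: `textBlock`, `mediaBlock`, `toBlocks`, and `mediaIdsOf` are hypothetical names, and the real `content()` helper may handle more cases.

```javascript
// Illustrative stand-ins for the two block shapes described above.
const textBlock = (text) => ({ type: 'text', text });
const mediaBlock = (mediaId) => ({ type: 'media', mediaId });

// Wrap bare strings as text blocks and pass existing block objects through.
function toBlocks(...parts) {
  return parts.map((part) => (typeof part === 'string' ? textBlock(part) : part));
}

// Collect the media IDs referenced by a list of blocks.
function mediaIdsOf(blocks) {
  return blocks.filter((b) => b.type === 'media').map((b) => b.mediaId);
}

const example = toBlocks(
  'Analyze this document:',
  mediaBlock('media-abc-123'),
  'Summarize the key findings.'
);
```

Keeping blocks as plain objects means they can be inspected or filtered, for instance to list which uploads a span depends on.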

Auto-Upload with ContentBlock.file() (JavaScript)

The JavaScript SDK supports ContentBlock.file(), which uploads files automatically before the span is sent, so no manual uploadMedia() call is needed:
const { ContentBlock, content } = require('@getfoil/foil-js');

const blocks = content(
  'Process this CSV:',
  ContentBlock.file('/path/to/data.csv'),
  'Find the top trends.'
);

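// When used in a span, the file is uploaded automatically.
// `ctx` is the context passed to a foil.trace() callback; `SpanKind` is
// assumed to be imported from the SDK alongside ContentBlock and content.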
const span = await ctx.startSpan(SpanKind.LLM, 'gpt-4o', {
  input: blocks
});
ContentBlock.file() accepts a file path, Buffer, or ReadStream:
ContentBlock.file('/path/to/file.pdf')                          // file path
ContentBlock.file(Buffer.from(data), { filename: 'data.csv' })  // Buffer (filename required)
ContentBlock.file(fs.createReadStream('/path/to/file.png'))     // ReadStream (needs const fs = require('fs'))

Uploading Media

Upload files directly with uploadMedia() for more control over the upload lifecycle.
const { Foil } = require('@getfoil/foil-js');

const foil = new Foil({ apiKey: 'sk_live_...' });

// Upload from file path
const fromPath = await foil.uploadMedia('/path/to/report.pdf');

// Upload from Buffer (a filename is required in this case)
const fromBuffer = await foil.uploadMedia(Buffer.from(csvData), {
  filename: 'sales-data.csv',
  mimeType: 'text/csv'
});

// Associate with a span at upload time
const attached = await foil.uploadMedia('/path/to/file.pdf', {
  spanId: 'span-123',
  traceId: 'trace-456',
  direction: 'input'
});

Upload Options

| Option               | Type   | Description                                   |
|----------------------|--------|-----------------------------------------------|
| filename             | string | Override filename (required for Buffer/bytes) |
| mimeType / mime_type | string | Override MIME type (auto-detected if omitted) |
| spanId / span_id     | string | Associate media with a span                   |
| traceId / trace_id   | string | Associate media with a trace                  |
| direction            | string | 'input' or 'output'                           |
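
The constraints in the options table can be checked before calling the SDK. The following is a hedged sketch: `validateUploadOptions` is a hypothetical helper, and the SDK may perform its own validation with different messages.

```javascript
// Hypothetical pre-flight check for the upload options listed above.
// Returns an array of problems; an empty array means the options look fine.
function validateUploadOptions(source, options = {}) {
  const errors = [];

  // Buffers (and other byte arrays) carry no name, so filename is required.
  if (source instanceof Uint8Array && !options.filename) {
    errors.push('filename is required when uploading a Buffer');
  }

  // direction, when present, must be one of the two documented values.
  if (options.direction !== undefined &&
      !['input', 'output'].includes(options.direction)) {
    errors.push("direction must be 'input' or 'output'");
  }

  return errors;
}
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.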

Using Media in Spans

After uploading, reference media in span inputs and outputs using content blocks.
const { Foil, ContentBlock, content } = require('@getfoil/foil-js');

const foil = new Foil({
  apiKey: 'sk_live_...',
  agentName: 'document-analyzer'
});

// `openai` below is assumed to be an already initialized OpenAI client
await foil.trace(async (ctx) => {
  const upload = await foil.uploadMedia('/path/to/report.pdf');

  const response = await ctx.llmCall('gpt-4o', async () => {
    return await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Summarize this document' }]
    });
  }, {
    input: content(
      'Summarize this document:',
      ContentBlock.media(upload.mediaId, {
        category: upload.category,
        filename: upload.filename
      })
    )
  });
});
Or use ContentBlock.file() to skip manual upload:
await foil.trace(async (ctx) => {
  const response = await ctx.llmCall('gpt-4o', async () => {
    return await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Summarize this document' }]
    });
  }, {
    input: content(
      'Summarize this document:',
      ContentBlock.file('/path/to/report.pdf')
    )
  });
});

Retrieving Media

const media = await foil.getMedia('media-abc-123');
console.log(media.category);            // 'document'
console.log(media.processing.status);   // 'completed'

// Include extracted content
const withContent = await foil.getMedia('media-abc-123', {
  content: 'extracted'
});
console.log(withContent.extracted.text.preview); // First 1000 chars
When evaluations run on spans containing media content blocks, Foil automatically includes extracted text (up to 15,000 characters per file) in the analysis. No manual configuration needed.

Limitations

Current limitations of multimodal support:
  • No vision model analysis for images — dimensions are extracted but image content is not analyzed by a vision model
  • No audio/video transcription — audio and video files are stored but not processed yet
  • No archive or notebook processing — ZIP files and Jupyter notebooks are stored as-is
  • 20 files per span maximum
  • Files over 100 MB require presigned upload (contact support)
  • Extracted text is capped at 100 KB per file; evaluation prompts use up to 15,000 characters per file
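
Several of the limits above can be checked on the client before a span is sent. This is a minimal sketch under stated assumptions: `preflightSpanFiles` and `exceedsEvaluationWindow` are hypothetical helpers, the `{ name, sizeBytes }` input shape is invented for illustration, and 100 MB is taken to mean 100 × 1024 × 1024 bytes.

```javascript
const MAX_FILES_PER_SPAN = 20;
const PRESIGNED_THRESHOLD_BYTES = 100 * 1024 * 1024; // assumes binary megabytes
const EVAL_CHAR_LIMIT = 15000;

// files: array of { name, sizeBytes } (hypothetical shape).
// Returns a list of problems; an empty list means no limit is exceeded.
function preflightSpanFiles(files) {
  const problems = [];
  if (files.length > MAX_FILES_PER_SPAN) {
    problems.push(`span has ${files.length} files; the maximum is ${MAX_FILES_PER_SPAN}`);
  }
  for (const file of files) {
    if (file.sizeBytes > PRESIGNED_THRESHOLD_BYTES) {
      problems.push(`${file.name} is over 100 MB and needs a presigned upload`);
    }
  }
  return problems;
}

// True when extracted text is longer than what an evaluation prompt will use.
function exceedsEvaluationWindow(extractedText) {
  return extractedText.length > EVAL_CHAR_LIMIT;
}
```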

Best Practices

  • Upload files first and use the returned mediaId in your content blocks. This ensures media is available when the span is processed. Alternatively, use ContentBlock.file() (JavaScript) to handle this automatically.
  • Instead of embedding file contents as plain text, use content blocks. This enables Foil to track media associations, provide download links, and include extracted content in evaluations.
  • Text extraction is asynchronous. If you need extracted content immediately, poll getMedia() until processing.status is 'completed'.
  • When possible, pass file paths to uploadMedia() or ContentBlock.file(). This lets the SDK auto-detect the filename and MIME type. When using Buffer/bytes, always provide a filename.
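
The polling advice above could be implemented roughly as follows. This is a sketch, not an SDK feature: `waitForProcessing` is a hypothetical helper, the `client` argument is any object with a `getMedia(mediaId)` method such as a Foil instance, the interval and timeout defaults are arbitrary, and the 'failed' status is an assumed value (only 'completed' appears in this page).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll getMedia() until processing.status is 'completed', or give up.
async function waitForProcessing(client, mediaId, { intervalMs = 1000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const media = await client.getMedia(mediaId);
    const status = media.processing.status;
    if (status === 'completed') return media;
    // 'failed' is an assumed status name; adjust to whatever the API reports.
    if (status === 'failed') {
      throw new Error(`Processing failed for ${mediaId}`);
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for ${mediaId} (last status: ${status})`);
    }
    await sleep(intervalMs);
  }
}
```

A call such as `await waitForProcessing(foil, upload.mediaId)` would then resolve with the media record once extraction has finished. A bounded timeout keeps a stuck file from blocking the caller indefinitely.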

Next Steps