Multimodal & Media

Foil supports multimodal traces — mix text, documents, spreadsheets, code files, and images in your span inputs and outputs. Upload files via the SDK, and Foil automatically extracts text content for evaluations and analysis.

Check out the multimodal examples for complete, runnable code in both JavaScript and Python.

How It Works

Your App → SDK uploadMedia() → Foil API → S3 (original file)
                                    ↓
                                   SQS
                                    ↓
                            Ingestion Service
                                    ↓
                          ┌─────────────────────┐
                          │  Category Detection  │
                          │  & Text Extraction   │
                          └──────────┬──────────┘
                                     ↓
                          ┌─────────────────────┐
                          │  Evaluations use     │
                          │  extracted content   │
                          └─────────────────────┘

You upload a file via the SDK — it’s stored in S3 and categorized automatically
The ingestion service extracts text content (for documents, spreadsheets, code)
You reference the uploaded media in span inputs/outputs using content blocks
Evaluations automatically include extracted text when analyzing spans

Supported Media Categories

Category	File Types	Max Size	Processing
Document	PDF, DOCX, DOC, RTF	50 MB	Text extraction, metadata
Spreadsheet	CSV, TSV, XLSX, XLS, ODS	25 MB	Text + structured JSON extraction
Code	Any text/code file	10 MB	Direct text passthrough
Image	PNG, JPEG, GIF, WebP, SVG	20 MB	Dimensions & metadata
Audio	MP3, WAV, OGG, FLAC	100 MB	Coming soon
Video	MP4, WebM, MOV	500 MB	Coming soon
Archive	ZIP, TAR, GZ	100 MB	Coming soon
Notebook	.ipynb	10 MB	Coming soon
Other	Any other file	25 MB	Stored as-is

Media category is auto-detected from the file’s MIME type. You can associate up to 20 files per span.
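
As a rough illustration of how a MIME type relates to the categories and size limits in the table above, here is a minimal sketch. The mapping below is an assumption made for this example, not Foil's actual detection logic (which runs server-side), and `guess_category` and `within_size_limit` are hypothetical helpers. It also assumes 1 MB means 1024 * 1024 bytes.

```python
# Size limits in MB, copied from the table above.
MAX_SIZE_MB = {
    "document": 50, "spreadsheet": 25, "code": 10, "image": 20, "audio": 100,
    "video": 500, "archive": 100, "notebook": 10, "other": 25,
}

# Assumed MIME-to-category mapping, for illustration only.
DOCUMENT_TYPES = {
    "application/pdf",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/rtf",
    "text/rtf",
}
SPREADSHEET_TYPES = {
    "text/csv",
    "text/tab-separated-values",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.oasis.opendocument.spreadsheet",
}
ARCHIVE_TYPES = {"application/zip", "application/x-tar", "application/gzip"}
NOTEBOOK_TYPES = {"application/x-ipynb+json"}


def guess_category(mime_type: str) -> str:
    """Map a MIME type to one of the category names in the table."""
    mime_type = mime_type.lower()
    # Exact matches come first so that text/csv is not treated as code.
    if mime_type in DOCUMENT_TYPES:
        return "document"
    if mime_type in SPREADSHEET_TYPES:
        return "spreadsheet"
    if mime_type in ARCHIVE_TYPES:
        return "archive"
    if mime_type in NOTEBOOK_TYPES:
        return "notebook"
    for prefix in ("image", "audio", "video"):
        if mime_type.startswith(prefix + "/"):
            return prefix
    if mime_type.startswith("text/"):
        return "code"
    return "other"


def within_size_limit(mime_type: str, size_bytes: int) -> bool:
    """Check a file size against the limit for its guessed category."""
    limit_bytes = MAX_SIZE_MB[guess_category(mime_type)] * 1024 * 1024
    return size_bytes <= limit_bytes
```

A client-side check like this can catch oversized files before an upload is attempted, but the server's decision is the one that counts.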

Content Blocks

Content blocks let you mix text and media references in span inputs and outputs:

Text blocks — plain text content ({ type: 'text', text: '...' })
Media blocks — references to uploaded media ({ type: 'media', mediaId: '...' })

JavaScript

const { ContentBlock, content } = require('@getfoil/foil-js');

// Mix text and media with the content() helper
const blocks = content(
  'Analyze this document:',
  ContentBlock.media('media-abc-123'),
  'Summarize the key findings.'
);
// => [
//   { type: 'text', text: 'Analyze this document:' },
//   { type: 'media', mediaId: 'media-abc-123' },
//   { type: 'text', text: 'Summarize the key findings.' }
// ]

Python

from foil import ContentBlock, content

blocks = content(
    "Analyze this document:",
    ContentBlock.media("media-abc-123"),
    "Summarize the key findings."
)
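
The block shapes are simple enough to reproduce without the SDK. The sketch below is a plain-Python stand-in that builds the same list-of-dicts structure shown in the JavaScript comment above. It assumes the Python helper produces the same shape, which this page does not show. `text_block`, `media_block`, and `build_blocks` are hypothetical names and not part of the Foil SDK.

```python
def text_block(text):
    """Build a text content block."""
    return {"type": "text", "text": text}


def media_block(media_id):
    """Build a media content block that references an uploaded file."""
    return {"type": "media", "mediaId": media_id}


def build_blocks(*parts):
    """Wrap bare strings as text blocks and pass existing blocks through."""
    blocks = []
    for part in parts:
        if isinstance(part, str):
            blocks.append(text_block(part))
        elif isinstance(part, dict) and part.get("type") in ("text", "media"):
            blocks.append(part)
        else:
            raise TypeError(f"unsupported content part: {part!r}")
    return blocks


blocks = build_blocks(
    "Analyze this document:",
    media_block("media-abc-123"),
    "Summarize the key findings.",
)
```

Seeing the structure spelled out can help when inspecting span payloads or writing tests around them.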

Auto-Upload with `ContentBlock.file()` (JavaScript)

The JavaScript SDK supports ContentBlock.file() which automatically uploads files before the span is sent — no manual uploadMedia() call needed:

const { ContentBlock, content } = require('@getfoil/foil-js');

const blocks = content(
  'Process this CSV:',
  ContentBlock.file('/path/to/data.csv'),
  'Find the top trends.'
);

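// When used in a span, the file is uploaded automatically.
// ctx is the trace context passed to the foil.trace() callback;
// SpanKind must also be in scope (import it from the SDK).
const span = await ctx.startSpan(SpanKind.LLM, 'gpt-4o', {
  input: blocks
});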

ContentBlock.file() accepts a file path, Buffer, or ReadStream:

ContentBlock.file('/path/to/file.pdf')
ContentBlock.file(Buffer.from(data), { filename: 'data.csv' })
ContentBlock.file(fs.createReadStream('/path/to/file.png'))

Uploading Media

Upload files directly with uploadMedia() for more control over the upload lifecycle.

JavaScript

const { Foil } = require('@getfoil/foil-js');

const foil = new Foil({ apiKey: 'sk_live_...' });

// Upload from file path
const fromPath = await foil.uploadMedia('/path/to/report.pdf');

// Upload from Buffer (csvData holds your CSV contents; filename is required)
const fromBuffer = await foil.uploadMedia(Buffer.from(csvData), {
  filename: 'sales-data.csv',
  mimeType: 'text/csv'
});

// Associate with a span at upload time
const forSpan = await foil.uploadMedia('/path/to/file.pdf', {
  spanId: 'span-123',
  traceId: 'trace-456',
  direction: 'input'
});

Python

from foil import Foil

foil = Foil(api_key="sk_live_...")

# Upload from file path
result = foil.upload_media("/path/to/report.pdf")

# Upload from bytes
result = foil.upload_media(csv_bytes, filename="sales-data.csv", mime_type="text/csv")

# Associate with a span at upload time
result = foil.upload_media("/path/to/file.pdf",
    span_id="span-123",
    trace_id="trace-456",
    direction="input"
)

Upload Options

Option	Type	Description
`filename`	string	Override filename (required for Buffer/bytes)
`mimeType` / `mime_type`	string	Override MIME type (auto-detected if omitted)
`spanId` / `span_id`	string	Associate media with a span
`traceId` / `trace_id`	string	Associate media with a trace
`direction`	string	`'input'` or `'output'`
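
The two constraints in the table (a filename is required for bytes, and `direction` accepts only two values) can be checked before calling the SDK. This is a minimal sketch of such a check. `check_upload_options` is a hypothetical helper written for this page, not an SDK function, and the SDK may report these problems differently.

```python
VALID_DIRECTIONS = {"input", "output"}


def check_upload_options(source, filename=None, direction=None):
    """Return a list of problems; an empty list means the options look consistent."""
    problems = []
    # Raw bytes carry no name, so the filename cannot be inferred.
    if isinstance(source, (bytes, bytearray)) and not filename:
        problems.append("filename is required when uploading bytes")
    if direction is not None and direction not in VALID_DIRECTIONS:
        problems.append("direction must be 'input' or 'output'")
    return problems


print(check_upload_options(b"a,b\n1,2\n"))
print(check_upload_options("/path/to/report.pdf", direction="input"))
```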

Using Media in Spans

After uploading, reference media in span inputs and outputs using content blocks.

JavaScript

const { Foil, ContentBlock, content } = require('@getfoil/foil-js');

const foil = new Foil({
  apiKey: 'sk_live_...',
  agentName: 'document-analyzer'
});

await foil.trace(async (ctx) => {
  const upload = await foil.uploadMedia('/path/to/report.pdf');

  const response = await ctx.llmCall('gpt-4o', async () => {
    return await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Summarize this document' }]
    });
  }, {
    input: content(
      'Summarize this document:',
      ContentBlock.media(upload.mediaId, {
        category: upload.category,
        filename: upload.filename
      })
    )
  });
});

Or use ContentBlock.file() to skip manual upload:

await foil.trace(async (ctx) => {
  const response = await ctx.llmCall('gpt-4o', async () => {
    return await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Summarize this document' }]
    });
  }, {
    input: content(
      'Summarize this document:',
      ContentBlock.file('/path/to/report.pdf')
    )
  });
});

Python

from foil import Foil, ContentBlock, content

foil = Foil(api_key="sk_live_...")

upload = foil.upload_media("/path/to/report.pdf")

foil.start_span({
    "spanId": span_id,
    "name": "gpt-4o",
    "agentName": "document-analyzer",
    "input": content(
        "Summarize this document:",
        ContentBlock.media(upload["mediaId"],
            category=upload["category"],
            filename=upload["filename"]
        )
    )
})

Retrieving Media

const media = await foil.getMedia('media-abc-123');
console.log(media.category);            // 'document'
console.log(media.processing.status);   // 'completed'

// Include extracted content
const withContent = await foil.getMedia('media-abc-123', {
  content: 'extracted'
});
console.log(withContent.extracted.text.preview); // First 1000 chars

When evaluations run on spans containing media content blocks, Foil automatically includes extracted text (up to 15,000 characters per file) in the analysis. No manual configuration needed.

Limitations

Current limitations of multimodal support:

No vision model analysis for images — dimensions are extracted but image content is not analyzed by a vision model
No audio/video transcription — audio and video files are stored but not processed yet
No archive or notebook processing — ZIP files and Jupyter notebooks are stored as-is
20 files per span maximum
Files over 100 MB require presigned upload (contact support)
Extracted text is capped at 100 KB per file; evaluation prompts use up to 15,000 characters per file
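
Some of these limits can be checked client-side before a span is sent. The following is a minimal sketch with hypothetical helper names. It assumes content blocks are plain dicts with a `type` key (as in the Content Blocks section) and that 1 MB means 1024 * 1024 bytes. Foil applies the real limits itself, so treat this only as an early warning.

```python
MAX_FILES_PER_SPAN = 20
PRESIGNED_THRESHOLD_BYTES = 100 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
EVAL_CHARS_PER_FILE = 15_000


def count_media_blocks(blocks):
    """Count the media references in a list of content blocks."""
    return sum(1 for block in blocks if block.get("type") == "media")


def within_file_limit(input_blocks, output_blocks=()):
    """True if the span's input and output together reference at most 20 files."""
    total = count_media_blocks(input_blocks) + count_media_blocks(output_blocks)
    return total <= MAX_FILES_PER_SPAN


def needs_presigned_upload(size_bytes):
    """True if the file is over the 100 MB threshold mentioned above."""
    return size_bytes > PRESIGNED_THRESHOLD_BYTES


def evaluation_excerpt(extracted_text):
    """Approximate the portion of extracted text an evaluation prompt would use."""
    return extracted_text[:EVAL_CHARS_PER_FILE]
```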

Best Practices

Upload media before creating spans

Upload files first and use the returned mediaId in your content blocks. This ensures media is available when the span is processed. Alternatively, use ContentBlock.file() (JavaScript) to handle this automatically.

Use content blocks for structured input/output

Instead of embedding file contents as plain text, use content blocks. This enables Foil to track media associations, provide download links, and include extracted content in evaluations.

Check processing status for time-sensitive workflows

Text extraction is asynchronous. If you need extracted content immediately, poll getMedia() until processing.status is 'completed'.
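
A polling loop with a timeout might look like the sketch below. This page shows `getMedia()` only for JavaScript, so the sketch takes the lookup as a callable (`fetch_media`) rather than assuming a Python method name. It also assumes the result is a dict shaped like the JavaScript object, with the status at `media["processing"]["status"]`. `wait_until_processed` is a hypothetical helper.

```python
import time


def wait_until_processed(fetch_media, media_id, timeout_s=60.0, interval_s=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_media(media_id) until processing.status is 'completed'.

    Raises TimeoutError if the status has not reached 'completed' in time.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        media = fetch_media(media_id)
        if media["processing"]["status"] == "completed":
            return media
        if clock() >= deadline:
            raise TimeoutError(f"media {media_id} not processed after {timeout_s}s")
        sleep(interval_s)
```

You would pass in whatever function your SDK version provides for retrieving media. A fixed interval keeps the sketch short; a backoff schedule may be kinder to the API for large files.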

Prefer file paths over Buffers

When possible, pass file paths to uploadMedia() or ContentBlock.file(). This lets the SDK auto-detect the filename and MIME type. When using Buffer/bytes, always provide a filename.

Next Steps

Traces & Spans

Alerting
