Claude Files API: from upload and reuse to quota management

If you use Claude in production for document-heavy workflows, repeated base64 uploads quickly become waste. Contract Q&A, invoice extraction, code review, and data analysis all run into the same pattern: the same file is encoded and sent again on every request. A 5 MB PDF expands to about 6.7 MB after base64 encoding; at 100,000 requests, that is roughly 670 GB of avoidable transfer, and the file content is still processed again for token billing.
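
As a rough check on the transfer figures above, the sketch below computes the size of a base64-encoded payload and the cumulative transfer across repeated requests. It assumes decimal megabytes (1 MB = 10^6 bytes) and ignores JSON and HTTP overhead; the function names are illustrative only.

```python
import math

def base64_size(raw_bytes: int) -> int:
    """Size in bytes after standard padded base64: 4 output bytes per 3 input bytes."""
    return 4 * math.ceil(raw_bytes / 3)

def total_transfer_gb(raw_bytes: int, requests: int) -> float:
    """Cumulative encoded payload in decimal gigabytes when the file is re-sent every time."""
    return base64_size(raw_bytes) * requests / 1e9

encoded = base64_size(5_000_000)
print(f"{encoded / 1e6:.2f} MB per request")                                      # 6.67 MB per request
print(f"{total_transfer_gb(5_000_000, 100_000):.0f} GB over 100,000 requests")   # 667 GB over 100,000 requests
```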

The Files API is designed for this workflow: upload once, receive a file_id, and reference that ID in later requests. The concept is simple, but production integration has details that are easy to miss: beta headers, quota behavior, workspace-level reuse, deletion strategy, and how Files works with Batch, MCP, and Code Execution.

All examples below use ClaudeAPI’s gateway endpoint with base_url="https://gw.claudeapi.com". The Files API behavior matches Anthropic’s API; only the entry point changes.

What the Files API solves

Sending the same file in every request creates three concrete costs:

Cost	What happens	How Files API helps
Bandwidth	Base64 adds about 33% over the original bytes, and every request re-sends the file	Upload once, then send only a `file_id` of a few dozen bytes
Latency	Large file uploads consume the first part of each request	`file_id` references are constant-size
Orchestration complexity	Multiple users sharing the same document each upload their own copy	Within the same workspace, any API key can reference the same `file_id`

Files API does not reduce token usage by itself. When a file is referenced, Claude still processes the original content and bills based on the file’s token count. Files API saves network transfer, encoding overhead, and repeated uploads. To reduce token cost, combine it with prompt caching.

Current status as of May 2026

Files API is still in beta. Every request must include:

anthropic-beta: files-api-2025-04-14

Anthropic has discussed general availability during 2026, but the interface may still change. In production code, keep this beta header in one shared constant so it can be updated in one place.
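
One way to keep the beta flag in a single place is a small shared module. The sketch below is an assumption about how you might organize it: the constant name FILES_API_BETA and the helper beta_header are hypothetical, and the comma-separated format for multiple flags follows the header style used in the Code Execution example in this article.

```python
# Hypothetical shared module, for example claude_config.py
FILES_API_BETA = "files-api-2025-04-14"

def beta_header(*extra_betas: str) -> dict[str, str]:
    """Build the anthropic-beta header: Files API flag first, then any extra flags, no duplicates."""
    flags = [FILES_API_BETA]
    for flag in extra_betas:
        if flag not in flags:
            flags.append(flag)
    return {"anthropic-beta": ",".join(flags)}

print(beta_header())
print(beta_header("code-execution-2025-05-22"))
```

When the beta flag changes, only FILES_API_BETA needs to be updated.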

Supported platforms include direct Claude API calls, Claude Platform on AWS, and Microsoft Foundry. Amazon Bedrock and Vertex AI are not supported for this feature, according to the Anthropic Files API documentation.

Claude Files API lifecycle: upload, reference, and delete

The core lifecycle has three operations: upload the file, reference the file_id, and delete files when they are no longer needed.

Upload a file

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-ClaudeAPI-key",
    base_url="https://gw.claudeapi.com",
    default_headers={"anthropic-beta": "files-api-2025-04-14"},
)

with open("annual_report.pdf", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("annual_report.pdf", f, "application/pdf")
    )

print(uploaded)
# File(
#   id='file_011CNha8iCJcU1wXNR6q4V8w',
#   type='file',
#   filename='annual_report.pdf',
#   mime_type='application/pdf',
#   size_bytes=2456789,
#   created_at='2026-05-23T08:12:33Z',
#   downloadable=False
# )

The response fields to track are:

id: the handle used by later requests.
size_bytes: the file size counted against the organization’s 100 GB storage quota.
downloadable: user-uploaded files cannot be downloaded, so this is always False. Only files generated by skills or code execution can be downloaded.

Reference a file in Messages

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": uploaded.id},
                },
                {"type": "text", "text": "Summarize the key financial metrics in this report."},
            ],
        }
    ],
)

Choose the content block type based on the file type:

File type	Content block type	`source.type`
PDF	`document`	`file`
Images: PNG, JPEG, GIF, WebP	`image`	`file`
CSV or Excel with code execution	`container_upload`	Not applicable
Source files with code execution	`container_upload`	Not applicable
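
The table above can be turned into a small helper that picks the content block for a given file. This is a minimal sketch: content_block_for and IMAGE_MIME_TYPES are hypothetical names, and the mapping covers only the file types listed in the table.

```python
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}

def content_block_for(file_id: str, mime_type: str, use_code_execution: bool = False) -> dict:
    """Return the Messages content block for an uploaded file, following the table above."""
    if use_code_execution:
        # CSV, Excel, and source files are mounted into the sandbox instead of read as documents
        return {"type": "container_upload", "file_id": file_id}
    if mime_type == "application/pdf":
        return {"type": "document", "source": {"type": "file", "file_id": file_id}}
    if mime_type in IMAGE_MIME_TYPES:
        return {"type": "image", "source": {"type": "file", "file_id": file_id}}
    raise ValueError(f"No content block mapping for MIME type {mime_type!r}")

print(content_block_for("file_xxx", "application/pdf"))
```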

List and delete files

# List files (files.data holds only the first page, up to 100 entries)
files = client.beta.files.list(limit=100)
for f in files.data:
    print(f.id, f.filename, f.size_bytes, f.created_at)

# Delete one file
client.beta.files.delete("file_011CNha8iCJcU1wXNR6q4V8w")

Deletion is irreversible and does not remove historical request records that already used the file_id. However, if an in-progress Batch job still references that file_id, deleting the file can cause the Batch job to fail. Before deleting, use the list endpoint or your own local index to confirm that no active workflow still depends on the file.
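
A guard like the following can enforce that check before any delete call. It is a sketch under assumptions: active_refs is a hypothetical local index that maps each file_id to the workflows (for example, running Batch jobs) that still use it, and your application is responsible for keeping it current.

```python
def delete_if_unused(client, file_id: str, active_refs: dict[str, set[str]]) -> bool:
    """Delete file_id only when the local index shows no active workflow referencing it.

    Returns True if the file was deleted, False if deletion was skipped.
    """
    if active_refs.get(file_id):
        return False  # still referenced, so leave the file in place
    client.beta.files.delete(file_id)
    active_refs.pop(file_id, None)
    return True
```

A skipped deletion can be retried by a later cleanup run once the referencing workflow has finished.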

cURL examples

# Upload
curl https://gw.claudeapi.com/v1/files \
  -H "x-api-key: sk-your-ClaudeAPI-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14" \
  -F "file=@annual_report.pdf;type=application/pdf"

# List
curl https://gw.claudeapi.com/v1/files \
  -H "x-api-key: sk-your-ClaudeAPI-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14"

# Delete
curl -X DELETE https://gw.claudeapi.com/v1/files/file_xxx \
  -H "x-api-key: sk-your-ClaudeAPI-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14"

Claude Files API quotas: 500 MB per file and 100 GB per organization

Files API has practical storage limits that should shape your ingestion design.

Dimension	Limit
Single file size	500 MB
Organization or workspace storage	100 GB
Filename length	1 to 255 characters
Disallowed filename characters	Path separators `/` and `\`, plus control characters
Zero Data Retention	Not applicable to Files API
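
The filename rules in the table can be checked locally before upload. The sketch below is illustrative: validate_filename is a hypothetical helper, and it assumes the 255-character limit is counted in Unicode code points, which may differ from how the server counts.

```python
import unicodedata

def validate_filename(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes these local checks."""
    problems = []
    if not 1 <= len(name) <= 255:
        problems.append("length must be 1 to 255 characters")
    if "/" in name or "\\" in name:
        problems.append("path separators are not allowed")
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("control characters are not allowed")
    return problems

print(validate_filename("annual_report.pdf"))
print(validate_filename("reports/annual_report.pdf"))
```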

A 100 GB quota can fill quickly in production. If a customer-support SaaS uploads 5,000 one-megabyte ticket screenshots per day, it reaches 100 GB in about 20 days. Quota management needs two layers: deduplication before upload and scheduled cleanup after use.

Deduplicate before upload with SHA-256 and a local index

Before uploading, compute a hash of the file. If the hash already exists in your local index, reuse the existing file_id.

import hashlib
import os
import sqlite3

def get_or_upload(file_path: str, client) -> str:
    # Hash the file content so identical files map to the same index entry
    with open(file_path, "rb") as f:
        content = f.read()
    digest = hashlib.sha256(content).hexdigest()

    # Look up the local hash -> file_id index
    db = sqlite3.connect("file_index.db")
    try:
        db.execute("CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, file_id TEXT)")
        row = db.execute("SELECT file_id FROM files WHERE hash = ?", (digest,)).fetchone()
        if row:
            return row[0]

        # Upload on miss, then store the mapping.
        # Send only the base name: filenames must not contain path separators.
        # The MIME type assumes PDF input; adjust it for other file types.
        uploaded = client.beta.files.upload(
            file=(os.path.basename(file_path), content, "application/pdf")
        )
        db.execute("INSERT INTO files VALUES (?, ?)", (digest, uploaded.id))
        db.commit()
        return uploaded.id
    finally:
        # Close the connection on every path, including cache hits and errors
        db.close()

The local hash index can drift from Anthropic’s server state. For example, you may delete a file_id but forget to update the local index. In production, run a weekly reconciliation job that compares your local index against client.beta.files.list().
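
The comparison step of such a reconciliation job can be a pure function, which keeps it easy to test. In this sketch, reconcile is a hypothetical helper; server_file_ids is assumed to be the set of IDs collected from the list endpoint, and local_index is the hash -> file_id mapping from the deduplication table.

```python
def reconcile(local_index: dict[str, str], server_file_ids: set[str]) -> tuple[dict[str, str], set[str]]:
    """Compare the local hash -> file_id index with the IDs that exist on the server.

    Returns (stale, untracked):
      stale     - local entries whose file_id no longer exists on the server
      untracked - server file IDs that the local index does not know about
    """
    stale = {h: fid for h, fid in local_index.items() if fid not in server_file_ids}
    untracked = server_file_ids - set(local_index.values())
    return stale, untracked

stale, untracked = reconcile(
    {"hash_a": "file_1", "hash_b": "file_2"},
    {"file_2", "file_3"},
)
print(stale)      # {'hash_a': 'file_1'}
print(untracked)  # {'file_3'}
```

Stale entries should be removed from the local index so the next upload of that content creates a fresh file_id; untracked files are candidates for review or cleanup.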

Clean up by file purpose

Not every file should be retained for the same amount of time. Assign a purpose or tag in your own database, then use that purpose to drive cleanup.

File purpose	Retention policy
User-managed document library	Delete when the user deletes the document
Ticket or customer-support screenshots	30-day TTL, then automatic deletion
Temporary files for one-off Q&A	Delete immediately after the call completes
Global shared documents such as product manuals or SOPs	Keep permanently on an allowlist

Anthropic does not provide a server-side TTL field for Files API. Implement TTL using your own created_at value plus a scheduled job.
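
The scheduled job's selection logic might look like the sketch below. The purpose names and the RETENTION mapping are hypothetical stand-ins for the policies in the table, and records are assumed to come from your own database with timezone-aware created_at values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical purpose -> TTL mapping; None means "keep until explicitly deleted"
RETENTION = {
    "user_library": None,
    "support_screenshot": timedelta(days=30),
    "one_off": timedelta(0),
    "global_shared": None,
}

def expired_file_ids(records, now: datetime) -> list[str]:
    """records: iterable of (file_id, purpose, created_at) rows from your own database."""
    expired = []
    for file_id, purpose, created_at in records:
        ttl = RETENTION.get(purpose)  # unknown purposes are kept, which is the safe default
        if ttl is not None and now - created_at >= ttl:
            expired.append(file_id)
    return expired

now = datetime(2026, 6, 1, tzinfo=timezone.utc)
rows = [
    ("file_old_shot", "support_screenshot", now - timedelta(days=31)),
    ("file_new_shot", "support_screenshot", now - timedelta(days=5)),
    ("file_manual", "global_shared", now - timedelta(days=400)),
]
print(expired_file_ids(rows, now))  # ['file_old_shot']
```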

Monitor storage usage

# Iterate over the list result so the SDK fetches every page, not just the first
files = list(client.beta.files.list(limit=1000))
total_bytes = sum(f.size_bytes for f in files)
print(f"Used: {total_bytes / 1024**3:.2f} GB / 100 GB")
print(f"File count: {len(files)}")

Send this number to Grafana or Prometheus and alert at 80% usage.
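
The alert condition itself is a one-line ratio check. The sketch below assumes the 100 GB quota is counted in binary gigabytes, matching the 1024**3 divisor used in the monitoring snippet; storage_status and QUOTA_BYTES are hypothetical names, and exporting the values to a metrics system is left out.

```python
QUOTA_BYTES = 100 * 1024**3  # assumption: quota measured in binary gigabytes

def storage_status(total_bytes: int, quota_bytes: int = QUOTA_BYTES, alert_ratio: float = 0.8) -> dict:
    """Return the used fraction of the quota and whether it has reached the alert threshold."""
    used_ratio = total_bytes / quota_bytes
    return {"used_ratio": used_ratio, "alert": used_ratio >= alert_ratio}

print(storage_status(85 * 1024**3))  # {'used_ratio': 0.85, 'alert': True}
```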

Combining Files API with prompt caching, Batch, and Code Execution

Files API becomes more valuable when combined with other Claude capabilities. By itself, it is an upload and reference layer; with caching, Batch, Code Execution, and agents, it becomes part of a production file workflow.

Files API with prompt caching

For repeated Q&A over long documents, prompt caching can reduce the cost of reusing the same document context. The first request processes the document in full and writes it to the cache (cache writes carry a small premium over the base input price), and cache hits within the 5-minute window are billed at about one-tenth of the base input price.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": file_id},
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "Question 1..."},
            ],
        }
    ],
)

For a legal team asking 20 questions in sequence, the document is billed as 1 cache-write pass plus 19 cache hits at about one-tenth of the base input price, assuming consecutive requests stay within the cache window.
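
The saving for the 20-question case can be estimated with a short calculation. The multipliers below are assumptions to verify against current pricing: 1.25 for a 5-minute cache write and 0.1 for a cache read, both relative to the base input price. The sketch covers only the document's input tokens, not the question text or output tokens; setting write_multiplier=1.0 reproduces a plain "one full-price pass" estimate.

```python
def relative_document_cost(questions: int, write_multiplier: float = 1.25, read_multiplier: float = 0.1) -> float:
    """Document input cost, in units of one uncached pass, for a run of cached questions."""
    if questions < 1:
        raise ValueError("questions must be at least 1")
    return write_multiplier + (questions - 1) * read_multiplier

cached = relative_document_cost(20)
uncached = 20.0
print(f"cached: {cached:.2f}x, uncached: {uncached:.0f}x, saving: {1 - cached / uncached:.0%}")
# cached: 3.15x, uncached: 20x, saving: 84%
```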

Files API with Code Execution Tool

The Code Execution Tool lets Claude run Python in a sandbox, which is useful for CSV and Excel analysis. After uploading the file through Files API, mount it into the sandbox with the container_upload content type.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    extra_headers={
        "anthropic-beta": "files-api-2025-04-14,code-execution-2025-05-22"
    },
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "container_upload", "file_id": csv_file_id},
                {"type": "text", "text": "Analyze the sales data and return the top 10 SKUs with year-over-year growth."},
            ],
        }
    ],
)

Claude can then read the file in the sandbox with pd.read_csv(), run the analysis, and return the result.

Files API with MCP and Managed Agents

The Managed Agents API, introduced as a beta on April 1, 2026, can mount files inside an agent session. The mounted file is a session-scoped copy of the file referenced by the file_id: it does not count against the user’s storage quota, and changes the agent makes to the copy do not modify the original file.

This makes it practical to share standard documents across multiple agents, such as a support SOP or product manual.

Common Files API errors and fixes

Most integration failures fall into a small set of categories: missing beta headers, files that exceed limits, invalid filenames, exhausted storage, or stale file_id references.

HTTP status	Error type	Common cause	Fix
400	`invalid_request_error`	Missing beta header	Add `anthropic-beta: files-api-2025-04-14`
400	`file_too_large_for_context`	File is larger than the model context window	Split the file or choose a model with a larger context window
400	`invalid_filename`	Filename is invalid	Keep filenames between 1 and 255 characters and remove path separators
413	`request_too_large`	File is larger than 500 MB	Split or compress the file
403	`storage_limit_exceeded`	Organization has reached 100 GB	Delete old files or request more quota
404	`not_found`	`file_id` was deleted or mistyped	Check whether the `file_id` still exists

Five production practices

These practices prevent most production issues with Files API.

Put the beta header in SDK initialization

client = anthropic.Anthropic(
    api_key="...",
    base_url="https://gw.claudeapi.com",
    default_headers={"anthropic-beta": "files-api-2025-04-14"},
)

Keeping the beta header in one place avoids most 400 errors caused by missing request headers.

Do not expose `file_id` directly to the frontend

A file_id can be referenced by any API key in the same workspace. If the frontend receives it directly, it effectively holds a pointer to the file.

Expose your own business identifier instead, such as doc_uuid, and keep the doc_uuid -> file_id mapping on the backend.
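
A minimal in-memory sketch of that mapping is shown below. DocumentRegistry is a hypothetical class; a real backend would persist the mapping in a database rather than a dictionary.

```python
import uuid

class DocumentRegistry:
    """Backend-only mapping from a public doc_uuid to the private file_id."""

    def __init__(self) -> None:
        self._file_id_by_uuid: dict[str, str] = {}

    def register(self, file_id: str) -> str:
        """Store a file_id and return the opaque identifier that is safe to send to the frontend."""
        doc_uuid = str(uuid.uuid4())
        self._file_id_by_uuid[doc_uuid] = file_id
        return doc_uuid

    def resolve(self, doc_uuid: str) -> str:
        """Translate a doc_uuid received from the frontend back into the file_id."""
        try:
            return self._file_id_by_uuid[doc_uuid]
        except KeyError:
            raise LookupError(f"Unknown document: {doc_uuid}") from None

registry = DocumentRegistry()
public_id = registry.register("file_011CNha8iCJcU1wXNR6q4V8w")
print(registry.resolve(public_id))  # file_011CNha8iCJcU1wXNR6q4V8w
```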

Scope is workspace-level, not API-key-level

If API key A uploads a file, API key B in the same workspace can reference it. This enables workflows where one key handles uploads and another handles Q&A.

It also means multi-tenant systems must enforce tenant isolation themselves. Do not put multiple customer tenants into the same workspace unless your own isolation layer is designed for that.
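
If tenants do share a workspace, one possible isolation layer is an ownership check that runs before any file_id is placed in a request. The sketch below is illustrative only: TenantFileIndex is a hypothetical class, and it records ownership in memory rather than in a durable store.

```python
class TenantFileIndex:
    """Tracks which tenant uploaded each file_id and rejects cross-tenant references."""

    def __init__(self) -> None:
        self._owner_by_file_id: dict[str, str] = {}

    def record_upload(self, tenant_id: str, file_id: str) -> None:
        self._owner_by_file_id[file_id] = tenant_id

    def require_access(self, tenant_id: str, file_id: str) -> str:
        """Return file_id if the tenant owns it; otherwise raise PermissionError."""
        if self._owner_by_file_id.get(file_id) != tenant_id:
            raise PermissionError(f"Tenant {tenant_id!r} may not reference {file_id!r}")
        return file_id

index = TenantFileIndex()
index.record_upload("tenant_a", "file_1")
print(index.require_access("tenant_a", "file_1"))  # file_1
```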

Deletion is not a good immediate-consistency test

A successful DELETE response does not necessarily mean the file is no longer referenceable at the exact next millisecond. In very short windows, the storage backend may still accept a reference.

Do not write production logic that depends on “delete, then immediately reference” behavior.

Use `created_at`, not `updated_at`

Files API does not expose updated_at. Files are immutable after upload; changing a file means uploading a new one. Use created_at for local LRU cleanup and retention policies.
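
Because only created_at is available, cleanup driven by it is really oldest-first eviction rather than true least-recently-used. The sketch below shows that selection step under stated assumptions: pick_for_eviction is a hypothetical helper, and created_at values are either datetime objects or ISO 8601 strings in one consistent format so that they sort chronologically.

```python
def pick_for_eviction(files, bytes_to_free: int) -> list[str]:
    """Select the oldest files until at least bytes_to_free would be reclaimed.

    files: iterable of (file_id, size_bytes, created_at) tuples.
    """
    selected, freed = [], 0
    for file_id, size_bytes, _created_at in sorted(files, key=lambda f: f[2]):
        if freed >= bytes_to_free:
            break
        selected.append(file_id)
        freed += size_bytes
    return selected

files = [
    ("file_a", 10, "2026-01-03T00:00:00Z"),
    ("file_b", 20, "2026-01-01T00:00:00Z"),
    ("file_c", 30, "2026-01-02T00:00:00Z"),
]
print(pick_for_eviction(files, 25))  # ['file_b', 'file_c']
```

Files on a permanent allowlist should be filtered out before calling a function like this.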

Model selection for file workflows

Files API itself is model-independent, but common production combinations differ by workload.

Scenario	Recommended model	Why
Deep Q&A over long documents such as contracts, financial reports, or papers	Opus 4.7	Stronger reasoning depth and stable long-context handling
Standard document Q&A such as product manuals and SOPs	Sonnet 4.6	Best cost-performance balance for most cases
Batch structured extraction such as invoices, resumes, and orders	Haiku 4.5	Fast, especially with Batch plus Files
Data analysis with CSV or Excel plus code execution	Sonnet 4.6 or Opus 4.7	Choose based on analysis complexity

Summary

Files API removes the waste of repeated uploads. Its engineering value comes from combining it with prompt caching, Batch, Code Execution, and Managed Agents: together, they can reduce network transfer, latency, and repeated processing overhead in production file workflows.

The integration checklist is short: set the beta header during SDK initialization, deduplicate with a local SHA-256 index, and assign retention policies by file purpose. The rest is operational detail.

Get started

Integrate through ClaudeAPI for a unified, OpenAI-compatible endpoint with console-level usage visibility, Stripe and major credit card payments, and enterprise invoicing options.
