Claude Files API: from upload and reuse to quota management
If you use Claude in production for document-heavy workflows, repeated base64 uploads become waste quickly. Contract Q&A, invoice extraction, code review, and data analysis all run into the same pattern: the same file is encoded and sent again on every request. A 5 MB PDF expands to about 6.7 MB after base64 encoding; at 100,000 requests, that is roughly 667 GB of avoidable transfer, and the file content is still processed again for token billing.
The Files API is designed for this workflow: upload once, receive a file_id, and reference that ID in later requests. The concept is simple, but production integration has details that are easy to miss: beta headers, quota behavior, workspace-level reuse, deletion strategy, and how Files works with Batch, MCP, and Code Execution.
All examples below use ClaudeAPI’s gateway endpoint with base_url="https://gw.claudeapi.com". The Files API behavior matches Anthropic’s API; only the entry point changes.
What the Files API solves
Sending the same file in every request creates three concrete costs:
| Cost | What happens | How Files API helps |
|---|---|---|
| Bandwidth | Base64 adds about 33% over the original bytes, and every request re-sends the file | Upload once, then send only a file_id of a few dozen bytes |
| Latency | Large file uploads consume the first part of each request | file_id references are constant-size |
| Orchestration complexity | Multiple users sharing the same document each upload their own copy | Within the same workspace, any API key can reference the same file_id |
Files API does not reduce token usage by itself. When a file is referenced, Claude still processes the original content and bills based on the file’s token count. Files API saves network transfer, encoding overhead, and repeated uploads. To reduce token cost, combine it with prompt caching.
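The bandwidth figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes) and standard padded base64, which emits 4 output bytes for every 3 input bytes.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Size of standard padded base64 output for raw_bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def repeated_upload_bytes(raw_bytes: int, requests: int) -> int:
    """Total bytes sent when the encoded file is inlined in every request."""
    return base64_encoded_size(raw_bytes) * requests


# Sanity check the formula against the real encoder on a small payload
assert len(base64.b64encode(b"x" * 10)) == base64_encoded_size(10)

pdf_bytes = 5_000_000  # a 5 MB PDF in decimal megabytes (assumption)
total = repeated_upload_bytes(pdf_bytes, 100_000)
print(f"Encoded size: {base64_encoded_size(pdf_bytes) / 1e6:.2f} MB")
print(f"Total over 100,000 requests: {total / 1e9:.0f} GB")
```

With a file_id reference, the per-request payload for the file shrinks to the length of the identifier, so almost all of that total disappears after the single upload.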
Current status as of May 2026
Files API is still in beta. Every request must include:
anthropic-beta: files-api-2025-04-14
Anthropic has discussed general availability during 2026, but the interface may still change. In production code, keep this beta header in one shared constant so it can be updated in one place.
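One way to keep the flag in a single place is a small module-level constant plus a helper that builds the header. This is a minimal sketch: FILES_API_BETA and beta_headers are hypothetical names, not part of any SDK, and the helper assumes that additional beta flags are sent comma-joined in the same header value.

```python
# Single source of truth for the Files API beta flag
FILES_API_BETA = "files-api-2025-04-14"


def beta_headers(*extra_betas: str) -> dict[str, str]:
    """Build the anthropic-beta header, comma-joining any additional beta flags."""
    return {"anthropic-beta": ",".join((FILES_API_BETA, *extra_betas))}


print(beta_headers())
print(beta_headers("code-execution-2025-05-22"))
```

When the flag changes or the feature leaves beta, only the constant needs to be edited.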
Supported platforms include direct Claude API calls, Claude Platform on AWS, and Microsoft Foundry. Amazon Bedrock and Vertex AI are not supported for this feature, according to the Anthropic Files API documentation.
Claude Files API lifecycle: upload, reference, and delete
The core lifecycle has three operations: upload the file, reference the file_id, and delete files when they are no longer needed.
Upload a file
import anthropic
client = anthropic.Anthropic(
    api_key="sk-your-ClaudeAPI-key",
    base_url="https://gw.claudeapi.com",
    default_headers={"anthropic-beta": "files-api-2025-04-14"},
)
with open("annual_report.pdf", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("annual_report.pdf", f, "application/pdf")
    )
print(uploaded)
# File(
#     id='file_011CNha8iCJcU1wXNR6q4V8w',
#     type='file',
#     filename='annual_report.pdf',
#     mime_type='application/pdf',
#     size_bytes=2456789,
#     created_at='2026-05-23T08:12:33Z',
#     downloadable=False
# )
The response fields to track are:
- id: the handle used by later requests.
- size_bytes: the file size counted against the organization’s 100 GB storage quota.
- downloadable: user-uploaded files cannot be downloaded, so this is always False. Only files generated by skills or code execution can be downloaded.
Reference a file in Messages
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": uploaded.id},
                },
                {"type": "text", "text": "Summarize the key financial metrics in this report."},
            ],
        }
    ],
)
Choose the content block type based on the file type:
| File type | Content block type | source.type |
|---|---|---|
| PDF, plain text | document | file |
| Images: PNG, JPEG, GIF, WebP | image | file |
| CSV or Excel with code execution | container_upload | Not applicable |
| Source files with code execution | container_upload | Not applicable |
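The mapping can be wrapped in a small helper so that callers never build content blocks by hand. This is a sketch under assumptions: content_block_for is a hypothetical name, the MIME type sets are illustrative (treating text/plain as a document type is an assumption), and anything that is neither a document nor an image is routed to container_upload for use with code execution.

```python
# Illustrative MIME type sets; text/plain as a document type is an assumption
DOCUMENT_MIME_TYPES = {"application/pdf", "text/plain"}
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def content_block_for(file_id: str, mime_type: str) -> dict:
    """Return the Messages content block that references an uploaded file."""
    if mime_type in DOCUMENT_MIME_TYPES:
        return {"type": "document", "source": {"type": "file", "file_id": file_id}}
    if mime_type in IMAGE_MIME_TYPES:
        return {"type": "image", "source": {"type": "file", "file_id": file_id}}
    # CSV, Excel, and source files are mounted into the code execution sandbox
    return {"type": "container_upload", "file_id": file_id}


print(content_block_for("file_abc", "application/pdf"))
print(content_block_for("file_def", "text/csv"))
```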
List and delete files
# List files (one page of up to 100 entries)
files = client.beta.files.list(limit=100)
for f in files.data:
    print(f.id, f.filename, f.size_bytes, f.created_at)
# Delete one file
client.beta.files.delete("file_011CNha8iCJcU1wXNR6q4V8w")
Deletion is irreversible and does not remove historical request records that already used the file_id. However, if an in-progress Batch job still references that file_id, deleting the file can cause the Batch job to fail. Before deleting, use the list endpoint or your own local index to confirm that no active workflow still depends on the file.
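A local index can answer the "is anything still using this file?" question before a delete is issued. The sketch below is one possible shape, not an API feature: FileReferenceTracker and its methods are hypothetical, and the state is held in memory, where a real system would persist it.

```python
from collections import defaultdict


class FileReferenceTracker:
    """In-memory record of which active jobs still depend on each file_id."""

    def __init__(self) -> None:
        self._jobs_by_file: dict[str, set[str]] = defaultdict(set)

    def acquire(self, file_id: str, job_id: str) -> None:
        """Record that job_id (for example a Batch job) references file_id."""
        self._jobs_by_file[file_id].add(job_id)

    def release(self, file_id: str, job_id: str) -> None:
        """Record that job_id has finished and no longer needs file_id."""
        self._jobs_by_file[file_id].discard(job_id)

    def can_delete(self, file_id: str) -> bool:
        """True only when no active job references file_id."""
        return not self._jobs_by_file.get(file_id)


tracker = FileReferenceTracker()
tracker.acquire("file_abc", "batch_1")
print(tracker.can_delete("file_abc"))  # the batch job is still running
tracker.release("file_abc", "batch_1")
print(tracker.can_delete("file_abc"))  # safe to call the delete endpoint now
```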
cURL examples
# Upload
curl https://gw.claudeapi.com/v1/files \
-H "x-api-key: sk-your-ClaudeAPI-key" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: files-api-2025-04-14" \
-F "file=@annual_report.pdf;type=application/pdf"
# List
curl https://gw.claudeapi.com/v1/files \
-H "x-api-key: sk-your-ClaudeAPI-key" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: files-api-2025-04-14"
# Delete
curl -X DELETE https://gw.claudeapi.com/v1/files/file_xxx \
-H "x-api-key: sk-your-ClaudeAPI-key" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: files-api-2025-04-14"
Claude Files API quotas: 500 MB per file and 100 GB per organization
Files API has practical storage limits that should shape your ingestion design.
| Dimension | Limit |
|---|---|
| Single file size | 500 MB |
| Organization or workspace storage | 100 GB |
| Filename length | 1 to 255 characters |
| Disallowed filename characters | Path separators / and \, plus control characters |
| Zero Data Retention (ZDR) | Does not cover files stored through the Files API |
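The filename rules are easy to enforce client-side before an upload is attempted. This is a minimal sketch: is_valid_filename is a hypothetical helper, and it assumes the 255 limit counts characters rather than encoded bytes.

```python
import unicodedata

MAX_FILENAME_LENGTH = 255


def is_valid_filename(name: str) -> bool:
    """Check a filename against the length and character rules in the table."""
    if not 1 <= len(name) <= MAX_FILENAME_LENGTH:
        return False
    if "/" in name or "\\" in name:
        return False
    # Unicode category "Cc" covers the ASCII and C1 control characters
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


print(is_valid_filename("annual_report.pdf"))
print(is_valid_filename("reports/annual_report.pdf"))
```

Rejecting a bad name locally avoids spending an upload round trip on a request that would be refused.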
A 100 GB quota can fill quickly in production. If a customer-support SaaS uploads 5,000 one-megabyte ticket screenshots per day, it reaches 100 GB in about 20 days. Quota management needs two layers: deduplication before upload and scheduled cleanup after use.
Deduplicate before upload with SHA-256 and a local index
Before uploading, compute a hash of the file. If the hash already exists in your local index, reuse the existing file_id.
import hashlib
import mimetypes
import sqlite3
from pathlib import Path
def get_or_upload(file_path: str, client) -> str:
    content = Path(file_path).read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    # Look up the local hash -> file_id index
    db = sqlite3.connect("file_index.db")
    try:
        db.execute("CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, file_id TEXT)")
        row = db.execute("SELECT file_id FROM files WHERE hash = ?", (digest,)).fetchone()
        if row:
            return row[0]
        # Upload on miss. Send only the base name, because path separators are not allowed in filenames.
        mime_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
        uploaded = client.beta.files.upload(
            file=(Path(file_path).name, content, mime_type)
        )
        # Store the mapping so identical bytes reuse this file_id next time
        db.execute("INSERT INTO files VALUES (?, ?)", (digest, uploaded.id))
        db.commit()
        return uploaded.id
    finally:
        db.close()
The local hash index can drift from Anthropic’s server state. For example, you may delete a file_id but forget to update the local index. In production, run a weekly reconciliation job that compares your local index against client.beta.files.list().
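The comparison step of such a reconciliation job can be written as a pure function, which keeps it testable without network access. In this sketch, reconcile is a hypothetical helper; it assumes the caller has already loaded the local hash -> file_id index and collected the set of file IDs returned by the list endpoint.

```python
def reconcile(
    local_index: dict[str, str], remote_ids: set[str]
) -> tuple[list[str], list[str]]:
    """Compare the local hash -> file_id index with the IDs that exist remotely.

    Returns (stale_hashes, untracked_ids):
    - stale_hashes: local entries whose file_id no longer exists remotely
    - untracked_ids: remote files that the local index does not know about
    """
    stale_hashes = sorted(h for h, fid in local_index.items() if fid not in remote_ids)
    untracked_ids = sorted(remote_ids - set(local_index.values()))
    return stale_hashes, untracked_ids


local = {"aaa": "file_1", "bbb": "file_2"}
remote = {"file_2", "file_3"}
stale, untracked = reconcile(local, remote)
print("Remove from local index:", stale)
print("Review or delete remotely:", untracked)
```

Stale entries should be dropped from the index so the next upload of that content creates a fresh file; untracked remote files are candidates for cleanup.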
Clean up by file purpose
Not every file should be retained for the same amount of time. Assign a purpose or tag in your own database, then use that purpose to drive cleanup.
| File purpose | Retention policy |
|---|---|
| User-managed document library | Delete when the user deletes the document |
| Ticket or customer-support screenshots | 30-day TTL, then automatic deletion |
| Temporary files for one-off Q&A | Delete immediately after the call completes |
| Global shared documents such as product manuals or SOPs | Keep permanently on an allowlist |
Anthropic does not provide a server-side TTL field for Files API. Implement TTL using your own created_at value plus a scheduled job.
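A scheduled job can decide what to delete from your own records alone. The sketch below assumes a record shape of file_id, purpose, and created_at stored in your database; the purpose labels and the expired_file_ids helper are hypothetical names that mirror the retention table.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical purpose labels; None means "never expire automatically"
RETENTION = {
    "user_library": None,
    "support_screenshot": timedelta(days=30),
    "temporary": timedelta(0),  # safety net if the immediate delete was missed
    "shared_allowlist": None,
}


def expired_file_ids(records: list[dict], now: datetime) -> list[str]:
    """Return the file_ids whose purpose-specific TTL has elapsed."""
    expired = []
    for record in records:
        ttl = RETENTION[record["purpose"]]
        if ttl is not None and now - record["created_at"] >= ttl:
            expired.append(record["file_id"])
    return expired


now = datetime(2026, 6, 30, tzinfo=timezone.utc)
records = [
    {"file_id": "file_old", "purpose": "support_screenshot",
     "created_at": datetime(2026, 5, 1, tzinfo=timezone.utc)},
    {"file_id": "file_new", "purpose": "support_screenshot",
     "created_at": datetime(2026, 6, 15, tzinfo=timezone.utc)},
    {"file_id": "file_manual", "purpose": "shared_allowlist",
     "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
]
print(expired_file_ids(records, now))
```

The job would then call the delete endpoint for each returned ID and remove the matching rows from the local index.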
Monitor storage usage
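# Iterating the list result lets the SDK fetch every page, not only the first 1,000 entries
total_bytes = 0
file_count = 0
for f in client.beta.files.list(limit=1000):
    total_bytes += f.size_bytes
    file_count += 1
print(f"Used: {total_bytes / 1024**3:.2f} GB / 100 GB")
print(f"File count: {file_count}")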
Send this number to Grafana or Prometheus and alert at 80% usage.
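The alert rule and a rough time-to-full estimate can be expressed as pure functions over the byte total. This is a sketch with hypothetical names; it assumes the 100 GB quota is measured in binary units (1024**3 bytes per GB), matching the division used in the monitoring snippet.

```python
QUOTA_BYTES = 100 * 1024**3  # 100 GB in binary units (assumption)
ALERT_THRESHOLD = 0.80


def storage_status(total_bytes: int) -> tuple[float, bool]:
    """Return (fraction of quota used, whether the alert should fire)."""
    usage = total_bytes / QUOTA_BYTES
    return usage, usage >= ALERT_THRESHOLD


def days_until_full(total_bytes: int, bytes_per_day: int) -> float:
    """Estimate days until the quota is exhausted at a constant ingestion rate."""
    if bytes_per_day <= 0:
        return float("inf")
    return max(QUOTA_BYTES - total_bytes, 0) / bytes_per_day


usage, should_alert = storage_status(85 * 1024**3)
print(f"Usage: {usage:.0%}, alert: {should_alert}")

# 5,000 one-megabyte screenshots per day, starting from an empty account
print(f"Days until full: {days_until_full(0, 5_000 * 1024**2):.1f}")
```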
Combining Files API with prompt caching, Batch, and Code Execution
Files API becomes more valuable when combined with other Claude capabilities. By itself, it is an upload and reference layer; with caching, Batch, Code Execution, and agents, it becomes part of a production file workflow.
Files API with prompt caching
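For repeated Q&A over long documents, prompt caching can reduce the cost of reusing the same document context. The first request writes the document to the cache, which is billed slightly above the normal input price (1.25x for the default 5-minute cache); later requests that hit the cache within 5 minutes are billed at one-tenth of the normal input price, and each hit refreshes the window.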
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": file_id},
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "Question 1..."},
            ],
        }
    ],
)
For a legal team asking 20 questions in sequence, the effective billing pattern for the document tokens becomes 1 cache-write pass plus 19 one-tenth-price cache hits, assuming each request arrives within the cache window of the previous one.
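The saving can be estimated with simple arithmetic. The sketch below counts only the document's input tokens, in units of one uncached pass; relative_document_cost is a hypothetical helper. The default write multiplier of 1.25 reflects Anthropic's published pricing for 5-minute cache writes, and setting it to 1.0 gives the simplified "one full-price pass" model.

```python
def relative_document_cost(
    questions: int,
    write_multiplier: float = 1.25,
    read_multiplier: float = 0.10,
) -> float:
    """Document-token cost, in units of one uncached pass.

    Assumes the first request writes the cache and every later request hits it.
    """
    if questions <= 0:
        return 0.0
    return write_multiplier + (questions - 1) * read_multiplier


questions = 20
cached = relative_document_cost(questions)
uncached = float(questions)  # every request pays the full input price
print(f"Cached:   {cached:.2f} passes")
print(f"Uncached: {uncached:.2f} passes")
print(f"Saving on document tokens: {1 - cached / uncached:.0%}")
```

Question text and output tokens are billed separately and are not part of this estimate.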
Files API with Code Execution Tool
The Code Execution Tool lets Claude run Python in a sandbox, which is useful for CSV and Excel analysis. After uploading the file through Files API, mount it into the sandbox with the container_upload content type.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    extra_headers={
        "anthropic-beta": "files-api-2025-04-14,code-execution-2025-05-22"
    },
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "container_upload", "file_id": csv_file_id},
                {"type": "text", "text": "Analyze the sales data and return the top 10 SKUs with year-over-year growth."},
            ],
        }
    ],
)
Claude can then read the file in the sandbox with pd.read_csv(), run the analysis, and return the result.
Files API with MCP and Managed Agents
The Managed Agents API, introduced as a beta on April 1, 2026, can mount files inside an agent session. The mounted file is a session-scoped copy of the file referenced by the file_id: it does not count against the user’s storage quota, and the agent cannot modify the original file.
This makes it practical to share standard documents across multiple agents, such as a support SOP or product manual.
Common Files API errors and fixes
Most integration failures fall into a small set of categories: missing beta headers, files that exceed limits, invalid filenames, exhausted storage, or stale file_id references.
| HTTP status | Error type | Common cause | Fix |
|---|---|---|---|
| 400 | invalid_request_error | Missing beta header | Add anthropic-beta: files-api-2025-04-14 |
| 400 | file_too_large_for_context | File is larger than the model context window | Split the file or choose a model with a larger context window |
| 400 | invalid_filename | Filename is invalid | Keep filenames between 1 and 255 characters and remove path separators |
| 413 | request_too_large | File is larger than 500 MB | Split or compress the file |
| 403 | storage_limit_exceeded | Organization has reached 100 GB | Delete old files or request more quota |
| 404 | not_found | file_id was deleted or mistyped | Check whether the file_id still exists |
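The error table can double as a lookup used by logging or alerting code. This is an illustrative sketch: suggest_fix and REMEDIATION are hypothetical names, and the keys are the status and error type pairs as listed in this article, not an exhaustive list of API errors.

```python
# (HTTP status, error type) -> suggested action, transcribed from the table
REMEDIATION = {
    (400, "invalid_request_error"): "add the anthropic-beta: files-api-2025-04-14 header",
    (400, "file_too_large_for_context"): "split the file or use a model with a larger context window",
    (400, "invalid_filename"): "use a 1 to 255 character filename without path separators",
    (413, "request_too_large"): "split or compress the file (500 MB limit)",
    (403, "storage_limit_exceeded"): "delete old files or request more quota",
    (404, "not_found"): "check whether the file_id still exists",
}


def suggest_fix(status: int, error_type: str) -> str:
    """Return a human-readable remediation hint for a failed Files API call."""
    return REMEDIATION.get(
        (status, error_type), "unrecognized error: inspect the response body"
    )


print(suggest_fix(404, "not_found"))
print(suggest_fix(500, "api_error"))
```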
Five production practices
These practices prevent most production issues with Files API.
Put the beta header in SDK initialization
client = anthropic.Anthropic(
    api_key="...",
    base_url="https://gw.claudeapi.com",
    default_headers={"anthropic-beta": "files-api-2025-04-14"},
)
Keeping the beta header in one place avoids most 400 errors caused by missing request headers.
Do not expose file_id directly to the frontend
A file_id can be referenced across the same workspace. If the frontend receives it directly, it effectively receives a pointer to the file.
Expose your own business identifier instead, such as doc_uuid, and keep the doc_uuid -> file_id mapping on the backend.
Scope is workspace-level, not API-key-level
If API key A uploads a file, API key B in the same workspace can reference it. This enables workflows where one key handles uploads and another handles Q&A.
It also means multi-tenant systems must enforce tenant isolation themselves. Do not put multiple customer tenants into the same workspace unless your own isolation layer is designed for that.
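Opaque identifiers and tenant isolation can be handled by the same backend lookup. The sketch below is one possible shape under stated assumptions: DocumentRegistry is a hypothetical class, tenant_id comes from your own authentication layer, and the in-memory dict stands in for a database table.

```python
import uuid


class DocumentRegistry:
    """Backend-only mapping from doc_uuid to (tenant_id, file_id)."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[str, str]] = {}

    def register(self, tenant_id: str, file_id: str) -> str:
        """Store the file_id and return an opaque ID that is safe to expose."""
        doc_uuid = str(uuid.uuid4())
        self._records[doc_uuid] = (tenant_id, file_id)
        return doc_uuid

    def resolve(self, tenant_id: str, doc_uuid: str) -> str:
        """Return the file_id only if the document belongs to this tenant."""
        record = self._records.get(doc_uuid)
        if record is None or record[0] != tenant_id:
            # Same error for "missing" and "owned by another tenant"
            raise KeyError(doc_uuid)
        return record[1]


registry = DocumentRegistry()
doc_id = registry.register("tenant_a", "file_011CNha8iCJcU1wXNR6q4V8w")
print(registry.resolve("tenant_a", doc_id))
```

The frontend only ever sees doc_id; the file_id is resolved on the backend immediately before the Messages call.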
Deletion is not a good immediate-consistency test
A successful DELETE response does not necessarily mean the file is no longer referenceable at the exact next millisecond. In very short windows, the storage backend may still accept a reference.
Do not write production logic that depends on “delete, then immediately reference” behavior.
Use created_at, not updated_at
Files API does not expose updated_at. Files are immutable after upload; changing a file means uploading a new one. Use created_at for local LRU cleanup and retention policies.
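Because no last-used timestamp is exposed, cleanup keyed on created_at is effectively oldest-first eviction. The sketch below shows that selection as a pure function; select_for_eviction is a hypothetical helper, and it assumes each file is described by a (file_id, size_bytes, created_at) tuple taken from the list endpoint or your local index.

```python
from datetime import datetime, timezone


def select_for_eviction(
    files: list[tuple[str, int, datetime]], target_bytes: int
) -> list[str]:
    """Pick the oldest files to delete until total size is at or below target_bytes."""
    total = sum(size for _, size, _ in files)
    to_delete = []
    for file_id, size, _ in sorted(files, key=lambda item: item[2]):
        if total <= target_bytes:
            break
        to_delete.append(file_id)
        total -= size
    return to_delete


files = [
    ("file_march", 40, datetime(2026, 3, 1, tzinfo=timezone.utc)),
    ("file_january", 30, datetime(2026, 1, 1, tzinfo=timezone.utc)),
    ("file_may", 50, datetime(2026, 5, 1, tzinfo=timezone.utc)),
]
print(select_for_eviction(files, target_bytes=60))
```

If true least-recently-used behavior matters, record a last-referenced timestamp in your own index and sort on that instead.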
Model selection for file workflows
Files API itself is model-independent, but common production combinations differ by workload.
| Scenario | Recommended model | Why |
|---|---|---|
| Deep Q&A over long documents such as contracts, financial reports, or papers | Opus 4.7 | Stronger reasoning depth and stable long-context handling |
| Standard document Q&A such as product manuals and SOPs | Sonnet 4.6 | Best cost-performance balance for most cases |
| Batch structured extraction such as invoices, resumes, and orders | Haiku 4.5 | Fast, especially with Batch plus Files |
| Data analysis with CSV or Excel plus code execution | Sonnet 4.6 or Opus 4.7 | Choose based on analysis complexity |
Summary
Files API removes the waste of repeated uploads. Its engineering value comes from combining it with prompt caching, Batch, Code Execution, and Managed Agents: together, they can reduce network transfer, latency, and repeated processing overhead in production file workflows.
The integration checklist is short: set the beta header during SDK initialization, deduplicate with a local SHA-256 index, and assign retention policies by file purpose. The rest is operational detail.
Get started
Integrate through ClaudeAPI for a unified, OpenAI-compatible endpoint with console-level usage visibility, Stripe and major credit card payments, and enterprise invoicing options.



