Claude Citations API in Practice: Auto-Annotated Source References That Boost RAG Accuracy by 15%
Every engineer building RAG (Retrieval-Augmented Generation) has faced this question from stakeholders:
“Where exactly in the documents did this answer come from?”
The question is devastating because the model can confidently cite a passage that doesn’t exist in the source material. In legal, healthcare, finance, and compliance auditing contexts, a bad citation isn’t just embarrassing; it’s a liability.
The traditional workaround is to prompt-engineer your way out: “Please use square brackets to indicate which passages you’re referencing,” then parse with regex. This approach has at least three problems:
- The model fabricates plausible-looking but nonexistent citation numbers
- Outputting raw source text in the response doubles your output token costs
- The parsing logic is brittle — the model occasionally breaks format and everything falls apart
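To make the failure modes above concrete, here is a minimal sketch of the regex approach. The `[n]` marker format and the `extract_markers` helper are assumptions made for illustration, not something the article or Anthropic prescribes.

```python
import re

# Hypothetical marker format for prompt-engineered citations: "[3]" means passage 3.
MARKER = re.compile(r"\[(\d+)\]")

def extract_markers(answer: str, num_passages: int) -> tuple[list[int], list[int]]:
    """Split bracketed markers into in-range and out-of-range passage numbers."""
    valid, invalid = [], []
    for match in MARKER.finditer(answer):
        n = int(match.group(1))
        (valid if 1 <= n <= num_passages else invalid).append(n)
    return valid, invalid

answer = "The penalty is 20% of the contract value [2]. Payment is due in 30 days [7]."
valid, invalid = extract_markers(answer, num_passages=3)
print(valid)    # [2]
print(invalid)  # [7]  (only three passages were supplied)
```

An out-of-range number is at least detectable. A fabricated number that happens to be in range, or a reply that drifts to a format like `(2)`, passes through silently.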
Anthropic’s Citations API — introduced in early 2025 and GA in 2026 — provides the official solution. The model returns structured citation objects containing character-level offsets, document indices, and source text excerpts, all guaranteed at the API layer.
This article covers the complete integration guide, three document input methods, a head-to-head comparison with prompt-engineered citations, and a real, runnable legal RAG example.
1. What Citations Actually Are
Citations is a toggle on Anthropic’s messages API for document content blocks. Pass your documents as document content blocks to Claude, set citations.enabled = true, and the model automatically attaches a citation object to each claim in its response:
{
  "type": "char_location",
  "cited_text": "...the referenced source text excerpt...",
  "document_index": 0,
  "document_title": "annual-report-2025.pdf",
  "start_char_index": 1024,
  "end_char_index": 1180
}
Four key fields:
| Field | Description |
|---|---|
| cited_text | The referenced source text excerpt; does not count toward output tokens |
| document_index | Which document was cited (0-indexed) |
| start_char_index / end_char_index | Character offsets within that document |
| type | Citation granularity type (see table below) |
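As a sanity check on these fields, the offsets of a char_location citation can be compared against the original document text. The sketch below works on a plain dict with the field names shown above. It assumes the end offset is exclusive (like a Python slice) and that offsets count Python string characters. The helper name `citation_matches_source` is made up for this example.

```python
def citation_matches_source(doc_text: str, citation: dict) -> bool:
    """Return True if the citation's offsets reproduce its cited_text."""
    start = citation["start_char_index"]
    end = citation["end_char_index"]  # assumed exclusive, like a Python slice
    return doc_text[start:end] == citation["cited_text"]

doc_text = "Article 12: The breaching party shall pay a penalty of 20%."
cited = "The breaching party shall pay a penalty of 20%."

# Build a sample citation from the real position of the excerpt in doc_text.
start = doc_text.index(cited)
citation = {
    "type": "char_location",
    "cited_text": cited,
    "document_index": 0,
    "start_char_index": start,
    "end_char_index": start + len(cited),
}

print(citation_matches_source(doc_text, citation))  # True
```

A check like this is cheap to run in an ingestion test. It catches the case where the text sent to the API differs from the text your frontend highlights, for example after whitespace normalization.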
Three Levels of Citation Granularity
| Granularity | Best For | Type Field |
|---|---|---|
| char_location | Plain text documents (most precise) | char_location |
| page_location | PDF documents (page-level) | page_location |
| content_block_location | Custom document blocks (most flexible) | content_block_location |
Why This Beats Prompt Engineering
In Anthropic’s own benchmarks, the built-in Citations feature improved recall accuracy by up to 15% compared to prompt-engineered citation approaches. Three reasons:
- Citation positions are API-guaranteed: start_char_index / end_char_index are computed by the system, not generated as model output strings
- cited_text doesn’t count toward output tokens: you no longer pay extra to have the model “repeat back the source text”
- The model can’t fabricate citations: every citation maps to a real position in the documents you provided
2. Three Ways to Feed Documents
Method 1: Plain Text (Character-Precise Citations)
Best for: Pre-parsed plain text documents where you need character-level precision.
import anthropic

client = anthropic.Anthropic(
    api_key="sk-yourClaudeAPIkey",
    base_url="https://gw.claudeapi.com"
)

doc_text = open("contract.txt", "r", encoding="utf-8").read()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": doc_text,
                },
                "title": "Partnership-Agreement-2026.txt",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What are the penalty clauses for breach of contract?"
            }
        ]
    }]
)
Parsing the response:
for block in response.content:
    if block.type == "text":
        print(block.text)
        for cit in block.citations or []:
            print(f"  └─ Cited from {cit.document_title}")
            print(f"     Chars [{cit.start_char_index}:{cit.end_char_index}]")
            print(f"     Source: {cit.cited_text[:80]}...")
Example output:
The breaching party shall pay the non-breaching party a penalty of 20% of the total contract value, payable in full within 30 days.
└─ Cited from Partnership-Agreement-2026.txt
Chars [4128:4280]
Source: Article 12 — Breach of Contract: The breaching party shall pay the non-breaching...
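For a user-facing view, the same loop can emit footnote markers instead of debug prints. This is a sketch under stated assumptions: the content blocks are plain dicts using the field names shown above (how you convert SDK objects to dicts is up to you), and `render_with_footnotes` is a hypothetical helper, not part of the SDK.

```python
def render_with_footnotes(blocks: list[dict]) -> str:
    """Append [n] markers to each cited text block and list the sources below."""
    body, notes = [], []
    for block in blocks:
        if block.get("type") != "text":
            continue
        markers = ""
        for cit in block.get("citations") or []:
            notes.append(f'[{len(notes) + 1}] {cit["document_title"]}: "{cit["cited_text"]}"')
            markers += f"[{len(notes)}]"
        body.append(block["text"] + markers)
    return "".join(body) + "\n\n" + "\n".join(notes)

blocks = [
    {"type": "text", "text": "Under Article 12, "},
    {
        "type": "text",
        "text": "the breaching party pays a 20% penalty.",
        "citations": [{
            "document_title": "Partnership-Agreement-2026.txt",
            "cited_text": "a penalty of 20% of the total contract value",
        }],
    },
]
print(render_with_footnotes(blocks))
```

Because the response splits the answer into several text blocks, each with its own citations, markers land directly after the sentence they support rather than at the end of the whole answer.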
Method 2: PDF (Page-Level Citations)
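Best for: Financial reports with charts, original contract PDFs, and other PDFs that carry an extractable text layer. Citations are precise to the PDF page number. Note that citations are drawn from the extracted text: Anthropic’s documentation states that image-only scans without extractable text cannot be cited, so run OCR on scanned documents before uploading them.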
import base64

with open("annual-report.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_b64,
                },
                "title": "Annual-Report-2025.pdf",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What is the net operating cash flow in the cash flow statement?"
            }
        ]
    }]
)
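In PDF mode, citations carry start_page_number / end_page_number instead of character offsets. Page numbers are 1-indexed and, per Anthropic’s documentation, the end page is exclusive, so subtract 1 from end_page_number when displaying an inclusive range: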
Net operating cash flow was $2.85 billion, a 12% year-over-year increase.
└─ Cited from Annual-Report-2025.pdf, pages 32-32
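When turning page citations into display text, the end page needs care. The sketch below assumes end_page_number is exclusive, as Anthropic's documentation describes, so a citation confined to page 32 arrives as start 32, end 33. `format_page_range` is a hypothetical helper name.

```python
def format_page_range(start_page_number: int, end_page_number: int) -> str:
    """Format a 1-indexed page range whose end is assumed to be exclusive."""
    last_page = end_page_number - 1
    if last_page <= start_page_number:
        return f"page {start_page_number}"
    return f"pages {start_page_number}-{last_page}"

print(format_page_range(32, 33))  # page 32
print(format_page_range(32, 35))  # pages 32-34
```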
If the document has already been uploaded via the Files API, you can reference it by file_id — see the Files API Complete Guide.
Method 3: Custom Content (Most Flexible — Essential for RAG)
Best for: RAG pipelines where you’ve pre-chunked documents and want precise control over chunk boundaries.
chunks = [
    "Article 1: This agreement is entered into by Party A and Party B on January 1, 2026...",
    "Article 12 — Breach of Contract: The breaching party shall pay the non-breaching party a penalty of 20% of the total contract value...",
    "Article 15 — Dispute Resolution: Any disputes arising from this agreement..."
]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "content",
                    "content": [
                        {"type": "text", "text": chunk} for chunk in chunks
                    ]
                },
                "title": "Contract-Clauses-Retrieved",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What is the penalty amount? How are disputes resolved?"
            }
        ]
    }]
)
The returned citations include start_block_index / end_block_index (0-indexed, with an exclusive end per Anthropic’s documentation), telling you exactly which blocks in the array were referenced. This mode is particularly useful for:
- Post-retrieval top-K chunks: Each chunk is a block — the model tells you exactly which blocks it cited
- Multi-document summarization: Combine fragments from different sources into one document with a clean citation structure
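Mapping a block citation back to your own chunk records is a slice over the list you sent. This sketch assumes a dict-shaped citation and an exclusive end_block_index. `cited_chunks` is a hypothetical helper, and the sample chunks are shortened stand-ins.

```python
def cited_chunks(chunks: list[str], citation: dict) -> list[str]:
    """Return the chunks covered by a content_block_location citation."""
    start = citation["start_block_index"]
    end = citation["end_block_index"]  # assumed exclusive
    return chunks[start:end]

chunks = [
    "Article 1: This agreement is entered into by Party A and Party B.",
    "Article 12: The breaching party shall pay a penalty of 20%.",
    "Article 15: Any disputes shall be submitted to arbitration.",
]
citation = {
    "type": "content_block_location",
    "cited_text": chunks[1],
    "start_block_index": 1,
    "end_block_index": 2,
}
print(cited_chunks(chunks, citation))
```

Slicing rather than indexing with start_block_index alone also covers a citation that spans more than one adjacent block.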
3. Equivalent Implementations in Node.js / cURL
Node.js / TypeScript
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic({
  apiKey: "sk-yourClaudeAPIkey",
  baseURL: "https://gw.claudeapi.com",
});

const docText = fs.readFileSync("contract.txt", "utf-8");

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: [
      {
        type: "document",
        source: { type: "text", media_type: "text/plain", data: docText },
        title: "Partnership-Agreement.txt",
        citations: { enabled: true },
      },
      { type: "text", text: "What are the penalty clauses?" }
    ]
  }]
});

for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
    for (const cit of (block as any).citations || []) {
      console.log(`  └─ ${cit.document_title} [${cit.start_char_index}:${cit.end_char_index}]`);
    }
  }
}
cURL
curl https://gw.claudeapi.com/v1/messages \
  -H "x-api-key: sk-yourClaudeAPIkey" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "document",
          "source": {"type": "text", "media_type": "text/plain", "data": "This agreement is effective as of January 1, 2026..."},
          "title": "contract.txt",
          "citations": {"enabled": true}
        },
        {"type": "text", "text": "When does the agreement take effect?"}
      ]
    }]
  }'
4. Citations vs. Prompt-Engineered References: Six-Dimension Comparison
| Dimension | Prompt-Engineered Citations | Citations API |
|---|---|---|
| Citation reliability | Model may fabricate | API guarantees real positions |
| Output token cost | Cited text counts as output (expensive) | cited_text is not billed |
| Character-level offsets | Must parse yourself | Built-in start_char_index |
| Multi-document support | Requires complex prompt scaffolding | Automatic document_index |
| Implementation complexity | Custom regex parsing | SDK returns structured objects |
| Recall accuracy | Baseline | +15% (Anthropic internal benchmarks) |
For legal, healthcare, finance, and government customers, the first two dimensions alone are enough to drive the technical decision.
5. Web Search Includes Citations by Default (No Toggle Needed)
If you’re using Anthropic’s native web_search tool (web_search_20260209 is the latest 2026 version with dynamic filtering support), citations are on by default — you don’t need to add citations.enabled to a document block. Every web result automatically carries a web_search_result_location citation type.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[{"type": "web_search_20260209", "name": "web_search"}],
    messages=[{"role": "user", "content": "What is Anthropic's 2026 valuation? Cite your sources."}]
)
Anthropic’s terms of service explicitly state: when displaying API output directly to end users, you must preserve citations pointing to the original sources — this is especially important for teams building search or Q&A products. Don’t hide the URL field for aesthetic reasons.
The cited_text, title, and url fields are also not counted toward token billing.
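To keep source URLs visible in the UI, the web search citations can be collected into a deduplicated source list. The sketch assumes dict-shaped text blocks whose citations carry type, url, and title fields, and `collect_sources` is a hypothetical helper. The sample URL is a placeholder.

```python
def collect_sources(blocks: list[dict]) -> list[dict]:
    """Collect unique web sources, in first-cited order, from text blocks."""
    seen, sources = set(), []
    for block in blocks:
        for cit in block.get("citations") or []:
            if cit.get("type") != "web_search_result_location":
                continue  # skip document citations
            if cit["url"] in seen:
                continue  # the same page may back several sentences
            seen.add(cit["url"])
            sources.append({"title": cit.get("title"), "url": cit["url"]})
    return sources

blocks = [
    {"type": "text", "text": "Claim one.", "citations": [
        {"type": "web_search_result_location", "url": "https://example.com/a",
         "title": "Page A", "cited_text": "..."},
    ]},
    {"type": "text", "text": "Claim two.", "citations": [
        {"type": "web_search_result_location", "url": "https://example.com/a",
         "title": "Page A", "cited_text": "..."},
    ]},
]
print(collect_sources(blocks))
```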
6. Pitfalls to Watch Out For
Pitfall 1: Citations + streaming requires careful accumulation. In streaming mode, citations arrive as citations_delta events interleaved with the text deltas, and each one has to be appended to the citations list of the text block it belongs to; concatenating strings is not enough. The SDK handles this, but a custom client has to do the bookkeeping itself.
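For a custom client, the per-block accumulation can be sketched as below. It assumes events have already been parsed into dicts shaped like Anthropic's streaming events (content_block_start and content_block_delta, with text_delta and citations_delta delta types). `accumulate_stream` and the sample events are illustrative only.

```python
def accumulate_stream(events: list[dict]) -> list[dict]:
    """Rebuild text blocks, with their citations, from parsed stream events."""
    blocks: dict[int, dict] = {}
    for event in events:
        if event["type"] == "content_block_start":
            blocks[event["index"]] = {
                "type": event["content_block"]["type"],
                "text": "",
                "citations": [],
            }
        elif event["type"] == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "text_delta":
                block["text"] += delta["text"]
            elif delta["type"] == "citations_delta":
                # One citation per delta, attached to the current block only.
                block["citations"].append(delta["citation"])
    return [blocks[i] for i in sorted(blocks)]

events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "citations_delta", "citation": {"type": "char_location", "cited_text": "20%"}}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "The penalty "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "is 20%."}},
    {"type": "content_block_stop", "index": 0},
]
print(accumulate_stream(events))
```

Keying the state by block index is the important part: a citation belongs to one text block, not to the response as a whole.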
Pitfall 2: Documents should have a title. It technically works without one, but all citation document_title fields will be empty, making frontend display ugly. Always pass a title.
Pitfall 3: In custom content mode, overly granular blocks hurt recall. Splitting a contract into 200 blocks of 8 words each strips away the context the model needs to interpret each clause. Keep each block a semantically complete paragraph of roughly 100–500 words.
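One way to avoid tiny blocks is to merge consecutive paragraphs until each block reaches a minimum word count. This is a rough sketch, not a recommended chunker: `merge_paragraphs` is a hypothetical helper, word counts come from whitespace splitting, and it does not enforce the upper bound.

```python
def merge_paragraphs(paragraphs: list[str], min_words: int = 100) -> list[str]:
    """Greedily merge consecutive paragraphs into blocks of at least min_words."""
    blocks, current = [], []
    for para in paragraphs:
        current.append(para.strip())
        if sum(len(p.split()) for p in current) >= min_words:
            blocks.append("\n\n".join(current))
            current = []
    if current:
        remainder = "\n\n".join(current)
        if blocks:
            # Attach a short tail to the previous block rather than emit a tiny one.
            blocks[-1] += "\n\n" + remainder
        else:
            blocks.append(remainder)
    return blocks

paragraphs = ["a b c", "d e", "f g h i", "j"]
print(merge_paragraphs(paragraphs, min_words=5))
# ['a b c\n\nd e', 'f g h i\n\nj']
```

Merging only adjacent paragraphs keeps each block contiguous in the source, so a block-level citation still points at one continuous passage.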
Pitfall 4: Citation granularity varies slightly across models. Opus 4.7 / Sonnet 4.6 / Haiku 4.5 all support Citations, but Haiku occasionally misses citations in complex multi-document scenarios. For critical use cases, use Opus 4.7 or Sonnet 4.6.
Pitfall 5: Citations ≠ hallucination-proof. The model can still misinterpret document semantics. Citations only guarantee that “the cited position actually exists in the source” — not that “the interpretation of the citation is correct.” In production, consider a secondary validation step: run a semantic consistency check between cited_text and the model’s answer.
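A full semantic check needs a second model call or an entailment model, which is out of scope here. As a cheap first pass, the sketch below flags numbers in the answer that never appear in any cited_text. It is a crude lexical proxy, not a semantic check, and `unsupported_numbers` is a hypothetical helper.

```python
import re

# Matches integers, decimals, and percentages such as "30", "2.85", "20%".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")

def unsupported_numbers(answer_text: str, cited_texts: list[str]) -> list[str]:
    """Return numbers in the answer that appear in none of the cited excerpts."""
    cited = set()
    for text in cited_texts:
        cited.update(NUMBER.findall(text))
    return [n for n in NUMBER.findall(answer_text) if n not in cited]

answer = "The penalty is 25% of the contract value, payable within 30 days."
cited_texts = ["a penalty of 20% of the total contract value, payable in full within 30 days"]
print(unsupported_numbers(answer, cited_texts))  # ['25%']
```

A non-empty result is a signal to route the answer to review, not proof of an error: the model may legitimately compute or rephrase a figure.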
Pitfall 6: Can’t reach api.anthropic.com directly from some regions. Change base_url to https://gw.claudeapi.com to access all Citations capabilities with no additional configuration.
7. A Complete End-to-End Legal RAG Example
Putting the Custom Content approach together into a minimal runnable legal consultation assistant:
import anthropic
from typing import List

client = anthropic.Anthropic(
    api_key="sk-yourClaudeAPIkey",
    base_url="https://gw.claudeapi.com"
)

# Assume you have a vector database (Pinecone / Weaviate / Chroma)
def retrieve_chunks(query: str, k: int = 5) -> List[dict]:
    """Return top-k relevant legal provisions"""
    # ... your vector retrieval code
    return [
        {"source": "Civil Code", "article": "Article 585", "text": "The parties may agree that when one party breaches..."},
        {"source": "Contract Law", "article": "Article 114", "text": "If the agreed penalty is less than the loss incurred..."},
        # ...
    ]

def answer_with_citations(question: str):
    chunks = retrieve_chunks(question)
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        system="You are a rigorous legal consultation assistant. Your answers must be based solely on the provided legal provisions. Do not cite any laws not included in the provided documents.",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "content",
                        "content": [
                            {"type": "text",
                             "text": f"[{c['source']} {c['article']}] {c['text']}"}
                            for c in chunks
                        ]
                    },
                    "title": "Retrieved Legal Provisions",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": question}
            ]
        }]
    )

    # Structured output with citations
    result = {"answer": "", "citations": []}
    for block in response.content:
        if block.type == "text":
            result["answer"] += block.text
            for cit in (block.citations or []):
                idx = cit.start_block_index
                result["citations"].append({
                    "source": chunks[idx]["source"],
                    "article": chunks[idx]["article"],
                    "cited_text": cit.cited_text,
                })
    return result

print(answer_with_citations("Can the agreed penalty be adjusted if it's too low?"))
Example output:
{
  "answer": "Under the law, if the agreed penalty is less than the loss incurred, the parties may request the court or arbitration institution to increase it.",
  "citations": [
    {
      "source": "Contract Law",
      "article": "Article 114",
      "cited_text": "If the agreed penalty is less than the loss incurred, the parties may request the people's court or arbitration institution to increase it..."
    }
  ]
}
This output can be fed directly to a frontend for “hover to see source text” or “export audit report” — far more engineering-friendly than the prompt-engineered approach.
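As an illustration of the audit-report use, the result dict can be flattened into a plain-text record. The layout and the `audit_report` helper are assumptions for this sketch; the input has the same shape as the example output above.

```python
def audit_report(question: str, result: dict) -> str:
    """Render a question, its answer, and the backing citations as plain text."""
    lines = [f"Question: {question}", f"Answer: {result['answer']}", "Sources:"]
    for i, cit in enumerate(result["citations"], start=1):
        lines.append(f'  {i}. {cit["source"]}, {cit["article"]}: "{cit["cited_text"]}"')
    if not result["citations"]:
        # Make uncited answers stand out in the audit trail.
        lines.append("  (none: answer is not backed by a citation)")
    return "\n".join(lines)

result = {
    "answer": "The parties may request the court to increase the penalty.",
    "citations": [{
        "source": "Contract Law",
        "article": "Article 114",
        "cited_text": "If the agreed penalty is less than the loss incurred...",
    }],
}
print(audit_report("Can the agreed penalty be adjusted if it's too low?", result))
```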
8. Why Use ClaudeAPI.com for Citations
Citations is a native Anthropic API feature, which means you must use the Anthropic-native path — the OpenAI-compatible interface (chat/completions) does not support Citations.
Advantages of accessing through claudeapi.com:
| Dimension | Details |
|---|---|
| Protocol compatibility | 100% compatible with the native Anthropic messages API — Citations, Files, Batch all supported |
| Endpoint | https://gw.claudeapi.com — direct access from anywhere, no extra configuration |
| Billing | Pay per token, on-demand |
| Models | Opus 4.7 / Sonnet 4.6 recommended for Citations, available at standard pricing |
See pricing details on the claudeapi.com pricing page. For RAG workloads, Sonnet 4.6 is commonly used — and the fact that Citations’ cited_text doesn’t count toward billing makes a meaningful cost difference.
Summary
Citations is Anthropic’s official answer to the “prevent hallucinations + enable auditability” requirement — more reliable, cheaper, and more engineering-friendly than prompt-engineered citations:
- Three granularity levels: char_location (plain text), page_location (PDF), content_block_location (custom chunks)
- Cost-saving detail: cited_text does not count toward output tokens
- Production recommendations: Use Opus 4.7 / Sonnet 4.6 for critical use cases, always pass a document title, and keep custom blocks as semantically complete paragraphs
Set base_url to https://gw.claudeapi.com to access all Citations capabilities through claudeapi.com — sign up and get started in minutes.
References: Citations API official documentation, Anthropic Citations launch blog, Web Search Tool.



