Claude Citations API in Practice: Auto-Annotated Source References That Boost RAG Accuracy by 15%

Claude Citations enables the model to precisely cite the exact document passages used in its responses, helping reduce hallucinations and improve auditability. This article provides complete code examples for three document formats: PDF, plain text, and custom documents. It also compares Claude Citations with manually prompted citations, and explains how `cited_text` can help reduce costs because it is not billed.

Dev Guides · Citations · RAG · cited_text · Anti-hallucination · Legal compliance · Est. read 10 min
Published 2026.05.20

Every engineer building RAG (Retrieval-Augmented Generation) has faced this question from stakeholders:

“Where exactly in the documents did this answer come from?”

The question stings because a model can confidently cite a passage that doesn’t exist in the source material. In legal, healthcare, finance, and compliance auditing contexts, a bad citation isn’t just embarrassing; it’s a liability.

The traditional workaround is to prompt-engineer your way out: “Please use square brackets to indicate which passages you’re referencing,” then parse with regex. This approach has at least three problems:

  1. The model fabricates plausible-looking but nonexistent citation numbers
  2. Outputting raw source text in the response doubles your output token costs
  3. The parsing logic is brittle — the model occasionally breaks format and everything falls apart
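To make the first and third problems concrete, here is a minimal sketch of the prompt-engineered approach. The `[n]` marker format, the function name `parse_bracket_citations`, and the sample passages are illustrative assumptions, not part of any library.

```python
import re

# Hypothetical marker format: the prompt asked the model to cite passages as [1], [2], ...
CITATION_PATTERN = re.compile(r"\[(\d+)\]")

def parse_bracket_citations(answer: str, passages: list[str]) -> dict:
    """Split bracket markers into ones that map to a passage and ones that do not."""
    valid, fabricated = [], []
    for match in CITATION_PATTERN.finditer(answer):
        number = int(match.group(1))
        # Passages are numbered from 1 in the prompt, so [n] maps to passages[n - 1].
        if 1 <= number <= len(passages):
            valid.append(number)
        else:
            fabricated.append(number)
    return {"valid": valid, "fabricated": fabricated}

passages = [
    "Article 12: penalty of 20% of the total contract value...",
    "Article 15: disputes are resolved by arbitration...",
]

# A well-formed answer parses cleanly.
print(parse_bracket_citations("The penalty is 20% [1].", passages))
# A fabricated number is caught only because we range-check it ourselves.
print(parse_bracket_citations("Interest accrues daily [7].", passages))
# Format drift, such as (1) instead of [1], silently yields no citations at all.
print(parse_bracket_citations("The penalty is 20% (1).", passages))
```

Note what the range check cannot do: a marker such as `[2]` attached to a claim that passage 2 never makes still counts as valid, because nothing ties the marker to an actual span of source text.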

Anthropic’s Citations API — introduced in early 2025 and GA in 2026 — provides the official solution. The model returns structured citation objects containing character-level offsets, document indices, and source text excerpts, all guaranteed at the API layer.

This article covers the complete integration guide, three document input methods, a head-to-head comparison with prompt-engineered citations, and a real, runnable legal RAG example.


1. What Citations Actually Are

Citations is a toggle on Anthropic’s messages API for document content blocks. Pass your documents as document content blocks to Claude, set citations.enabled = true, and the model automatically attaches a citation object to each claim in its response:

{
  "type": "char_location",
  "cited_text": "...the referenced source text excerpt...",
  "document_index": 0,
  "document_title": "annual-report-2025.pdf",
  "start_char_index": 1024,
  "end_char_index": 1180
}

Four key fields:

| Field | Description |
| --- | --- |
| cited_text | The referenced source text excerpt; does not count toward output tokens |
| document_index | Which document (0-indexed) |
| start_char_index / end_char_index | Character offset within that document |
| type | Citation granularity type (see table below) |

Three Levels of Citation Granularity

| Granularity | Best For | Type Field |
| --- | --- | --- |
| char_location | Plain text documents (most precise) | char_location |
| page_location | PDF documents (page-level) | page_location |
| content_block_location | Custom document blocks (most flexible) | content_block_location |
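Because each granularity carries different location fields, client code usually branches on `type`. The sketch below assumes citations are available as plain dicts (for example, parsed from the raw JSON response); the helper name `describe_location` is hypothetical, and the field names are the ones used in this article.

```python
def describe_location(citation: dict) -> str:
    """Return a short human-readable location for one citation dict."""
    kind = citation["type"]
    if kind == "char_location":
        return f"chars {citation['start_char_index']}-{citation['end_char_index']}"
    if kind == "page_location":
        return f"pages {citation['start_page_number']}-{citation['end_page_number']}"
    if kind == "content_block_location":
        return f"blocks {citation['start_block_index']}-{citation['end_block_index']}"
    # Unknown types are reported rather than silently dropped.
    return f"unrecognized citation type: {kind}"

example = {
    "type": "char_location",
    "cited_text": "...",
    "document_index": 0,
    "start_char_index": 1024,
    "end_char_index": 1180,
}
print(describe_location(example))
```

The helper prints the raw values as returned and does not interpret whether the end index is inclusive or exclusive; check that against the official documentation before showing ranges to end users.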

Why This Beats Prompt Engineering

In Anthropic’s own benchmarks, the built-in Citations feature improved recall accuracy by up to 15% compared to prompt-engineered citation approaches. Three reasons:

  1. Citation positions are API-guaranteed: start_char_index/end_char_index are computed by the system, not generated as model output strings
  2. cited_text doesn’t count toward output tokens — you no longer pay extra to have the model “repeat back the source text”
  3. The model can’t fabricate citations — every citation maps to a real position in the documents you provided
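If you want to confirm the first and third points on your own data, a client-side sanity check is cheap. The sketch below assumes that offsets behave like a Python slice (0-indexed start, exclusive end) and that they index the same string you sent; both are assumptions to verify against your documents, and `citation_matches_source` is a hypothetical helper name.

```python
def citation_matches_source(doc_text: str, citation: dict) -> bool:
    """Check that cited_text equals the document slice at the reported offsets."""
    start = citation["start_char_index"]
    end = citation["end_char_index"]
    # Assumption: start is 0-indexed and end is exclusive, like a Python slice.
    return doc_text[start:end] == citation["cited_text"]

doc_text = "Article 1: Scope. Article 12: The breaching party shall pay a penalty of 20%."
quote = "The breaching party shall pay a penalty of 20%."
start = doc_text.index(quote)

# A citation shaped like the char_location object shown earlier (values are made up).
citation = {
    "type": "char_location",
    "cited_text": quote,
    "document_index": 0,
    "start_char_index": start,
    "end_char_index": start + len(quote),
}
print(citation_matches_source(doc_text, citation))

# Shifting the offsets makes the check fail, which is what an audit log should flag.
shifted = {**citation, "start_char_index": start + 3, "end_char_index": start + len(quote) + 3}
print(citation_matches_source(doc_text, shifted))
```

A failing check most likely means the text you are comparing against differs from what was sent (for example, after whitespace normalization), so store the exact string you passed in the document block.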

2. Three Ways to Feed Documents

Method 1: Plain Text (Character-Precise Citations)

Best for: Pre-parsed plain text documents where you need character-level precision.

import anthropic

client = anthropic.Anthropic(
    api_key="sk-yourClaudeAPIkey",
    base_url="https://gw.claudeapi.com"
)

# Read the pre-parsed contract text; the context manager closes the file promptly.
with open("contract.txt", "r", encoding="utf-8") as f:
    doc_text = f.read()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": doc_text,
                },
                "title": "Partnership-Agreement-2026.txt",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What are the penalty clauses for breach of contract?"
            }
        ]
    }]
)

Parsing the response:

for block in response.content:
    if block.type == "text":
        print(block.text)
        for cit in block.citations or []:
            print(f"  └─ Cited from {cit.document_title}")
            print(f"     Chars [{cit.start_char_index}:{cit.end_char_index}]")
            print(f"     Source: {cit.cited_text[:80]}...")

Example output:

The breaching party shall pay the non-breaching party a penalty of 20% of the total contract value, payable in full within 30 days.
  └─ Cited from Partnership-Agreement-2026.txt
     Chars [4128:4280]
     Source: Article 12 — Breach of Contract: The breaching party shall pay the non-breaching...

Method 2: PDF (Page-Level Citations)

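Best for: Original contract PDFs and financial reports where page-level references are enough. Citations are precise to the PDF page number. Note that, per Anthropic’s documentation, citations are drawn from the PDF’s extractable text: a scanned PDF with no text layer cannot be cited, and images or charts inside a PDF are not citable either, even though the model can still read them when answering.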

import base64

with open("annual-report.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_b64,
                },
                "title": "Annual-Report-2025.pdf",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What is the net operating cash flow in the cash flow statement?"
            }
        ]
    }]
)

In PDF mode, citations include start_page_number / end_page_number (1-indexed):

Net operating cash flow was $2.85 billion, a 12% year-over-year increase.
  └─ Cited from Annual-Report-2025.pdf, pages 32-32

If the document has already been uploaded via the Files API, you can reference it by file_id — see the Files API Complete Guide.
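For audit-style output it is often handy to turn page-level citations into a page index. The following is a minimal sketch under the assumption that citations are plain dicts with the `page_location` fields named above; `index_by_page` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict

def index_by_page(citations: list[dict]) -> dict[int, list[str]]:
    """Group cited_text excerpts by the page on which each citation starts."""
    pages = defaultdict(list)
    for cit in citations:
        if cit.get("type") != "page_location":
            continue  # ignore citations that came from non-PDF documents
        pages[cit["start_page_number"]].append(cit["cited_text"])
    # Sort by page number so the index reads like a table of contents.
    return dict(sorted(pages.items()))

sample = [
    {"type": "page_location", "cited_text": "Net operating cash flow...", "start_page_number": 32, "end_page_number": 33},
    {"type": "page_location", "cited_text": "Revenue grew...", "start_page_number": 5, "end_page_number": 6},
    {"type": "char_location", "cited_text": "unrelated", "start_char_index": 0, "end_char_index": 9},
]
for page, quotes in index_by_page(sample).items():
    print(page, quotes)
```

Only `start_page_number` is used here, which sidesteps the question of how the end page is counted.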

Method 3: Custom Content (Most Flexible — Essential for RAG)

Best for: RAG pipelines where you’ve pre-chunked documents and want precise control over chunk boundaries.

chunks = [
    "Article 1: This agreement is entered into by Party A and Party B on January 1, 2026...",
    "Article 12 — Breach of Contract: The breaching party shall pay the non-breaching party a penalty of 20% of the total contract value...",
    "Article 15 — Dispute Resolution: Any disputes arising from this agreement..."
]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "content",
                    "content": [
                        {"type": "text", "text": chunk} for chunk in chunks
                    ]
                },
                "title": "Contract-Clauses-Retrieved",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "What is the penalty amount? How are disputes resolved?"
            }
        ]
    }]
)

The returned citations include start_block_index / end_block_index, telling you exactly which block in the array was referenced. This mode is particularly useful for:

  • Post-retrieval top-K chunks: Each chunk is a block — the model tells you exactly which blocks it cited
  • Multi-document summarization: Combine fragments from different sources into one document with a clean citation structure
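In a RAG pipeline the block index is mainly useful as a key back into your own chunk metadata. The sketch below assumes block indices are 0-indexed with an exclusive end, and that `chunk_meta` (a hypothetical list you maintain yourself) is in the same order as the blocks you sent.

```python
def chunks_for_citation(citation: dict, chunk_meta: list[dict]) -> list[dict]:
    """Return metadata for every retrieved chunk that one citation spans."""
    start = citation["start_block_index"]
    end = citation["end_block_index"]
    # Assumption: end_block_index is exclusive. max() still returns the starting
    # block if the API ever reports end == start for a single-block citation.
    return chunk_meta[start:max(end, start + 1)]

# Hypothetical metadata kept alongside the chunks passed in the document block.
chunk_meta = [
    {"doc_id": "contract-2026", "article": "Article 1"},
    {"doc_id": "contract-2026", "article": "Article 12"},
    {"doc_id": "contract-2026", "article": "Article 15"},
]
citation = {
    "type": "content_block_location",
    "cited_text": "The breaching party shall pay...",
    "start_block_index": 1,
    "end_block_index": 2,
}
print(chunks_for_citation(citation, chunk_meta))
```

Keeping the metadata outside the block text, rather than prefixing it into each chunk, is a design choice: it keeps `cited_text` free of bookkeeping strings, at the cost of having to preserve chunk order.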

3. Equivalent Implementations in Node.js / cURL

Node.js / TypeScript

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic({
  apiKey: "sk-yourClaudeAPIkey",
  baseURL: "https://gw.claudeapi.com",
});

const docText = fs.readFileSync("contract.txt", "utf-8");

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: [
      {
        type: "document",
        source: { type: "text", media_type: "text/plain", data: docText },
        title: "Partnership-Agreement.txt",
        citations: { enabled: true },
      },
      { type: "text", text: "What are the penalty clauses?" }
    ]
  }]
});

for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
    for (const cit of (block as any).citations || []) {
      console.log(`  └─ ${cit.document_title} [${cit.start_char_index}:${cit.end_char_index}]`);
    }
  }
}

cURL

curl https://gw.claudeapi.com/v1/messages \
  -H "x-api-key: sk-yourClaudeAPIkey" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "document",
          "source": {"type": "text", "media_type": "text/plain", "data": "This agreement is effective as of January 1, 2026..."},
          "title": "contract.txt",
          "citations": {"enabled": true}
        },
        {"type": "text", "text": "When does the agreement take effect?"}
      ]
    }]
  }'

4. Citations vs. Prompt-Engineered References: Six-Dimension Comparison

| Dimension | Prompt-Engineered Citations | Citations API |
| --- | --- | --- |
| Citation reliability | Model may fabricate | API guarantees real positions |
| Output token cost | Cited text counts as output (expensive) | cited_text is not billed |
| Character-level offsets | Must parse yourself | Built-in start_char_index |
| Multi-document support | Requires complex prompt scaffolding | Automatic document_index |
| Implementation complexity | Custom regex parsing | SDK returns structured objects |
| Recall accuracy | Baseline | Up to +15% (Anthropic internal benchmarks) |

For legal, healthcare, finance, and government customers, the first two dimensions alone are enough to drive the technical decision.


5. Web Search Includes Citations by Default (No Toggle Needed)

If you’re using Anthropic’s native web_search tool (web_search_20260209 is the latest 2026 version with dynamic filtering support), citations are on by default — you don’t need to add citations.enabled to a document block. Every web result automatically carries a web_search_result_location citation type.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[{"type": "web_search_20260209", "name": "web_search"}],
    messages=[{"role": "user", "content": "What is Anthropic's 2026 valuation? Cite your sources."}]
)

Anthropic’s terms of service explicitly state: when displaying API output directly to end users, you must preserve citations pointing to the original sources — this is especially important for teams building search or Q&A products. Don’t hide the URL field for aesthetic reasons.

The cited_text, title, and url fields are also not counted toward token billing.
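One way to keep source links visible without cluttering the answer is to render them as a numbered source list. This is a sketch only: it assumes web citations arrive as dicts carrying the `url`, `title`, and `cited_text` fields mentioned above, and `render_sources` plus the example.com URLs are placeholders.

```python
def render_sources(citations: list[dict]) -> str:
    """Build a numbered source list, one line per unique URL, keeping the link visible."""
    seen: dict[str, str] = {}
    for cit in citations:
        if cit.get("type") != "web_search_result_location":
            continue
        # Keep the first title seen for each URL; dicts preserve insertion order.
        seen.setdefault(cit["url"], cit.get("title") or cit["url"])
    return "\n".join(
        f"[{i}] {title} <{url}>" for i, (url, title) in enumerate(seen.items(), start=1)
    )

web_citations = [
    {"type": "web_search_result_location", "url": "https://example.com/a", "title": "Report A", "cited_text": "..."},
    {"type": "web_search_result_location", "url": "https://example.com/a", "title": "Report A", "cited_text": "..."},
    {"type": "web_search_result_location", "url": "https://example.com/b", "title": None, "cited_text": "..."},
]
print(render_sources(web_citations))
```

Deduplicating by URL keeps the list short when several sentences cite the same page, while every original source still appears with its link.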


6. Pitfalls to Watch Out For

Pitfall 1: Citations + streaming requires careful accumulation. In streaming mode, citations arrive as separate citations_delta events interleaved with the text deltas, and each one has to be appended to the citations list of the content block it belongs to on the client side; concatenating the text strings alone loses them. The SDK handles this, but if you’re writing a custom client, be careful.

Pitfall 2: Documents should have a title. It technically works without one, but all citation document_title fields will be empty, making frontend display ugly. Always pass a title.

Pitfall 3: In custom content mode, overly granular blocks hurt recall. Splitting a contract into 200 blocks of 8 words each makes the model “lose context” and reduces accuracy. Keep each block a semantically complete paragraph of roughly 100–500 words.

Pitfall 4: Citation granularity varies slightly across models. Opus 4.7 / Sonnet 4.6 / Haiku 4.5 all support Citations, but Haiku occasionally misses citations in complex multi-document scenarios. For critical use cases, use Opus 4.7 or Sonnet 4.6.

Pitfall 5: Citations ≠ hallucination-proof. The model can still misinterpret document semantics. Citations only guarantee that “the cited position actually exists in the source” — not that “the interpretation of the citation is correct.” In production, consider a secondary validation step: run a semantic consistency check between cited_text and the model’s answer.

Pitfall 6: Can’t reach api.anthropic.com directly from some regions. Change base_url to https://gw.claudeapi.com to access all Citations capabilities with no additional configuration.
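For Pitfall 1, the accumulation logic in a custom client can be sketched as below. The event shapes are an assumption based on the Messages streaming format (a `content_block_delta` event whose `delta` is either a `text_delta` or a `citations_delta` carrying one citation); events are modeled as plain dicts, and `accumulate_stream` is a hypothetical helper.

```python
def accumulate_stream(events: list[dict]) -> list[dict]:
    """Rebuild text blocks and their citations from a sequence of stream events."""
    blocks: dict[int, dict] = {}
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop/ping events carry no text or citations here
        block = blocks.setdefault(event["index"], {"text": "", "citations": []})
        delta = event["delta"]
        if delta["type"] == "text_delta":
            block["text"] += delta["text"]
        elif delta["type"] == "citations_delta":
            # Each citations delta carries one citation for the current block.
            block["citations"].append(delta["citation"])
    # Return blocks in index order so the answer reads top to bottom.
    return [blocks[i] for i in sorted(blocks)]

events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "The penalty "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "is 20%."}},
    {"type": "content_block_delta", "index": 0, "delta": {
        "type": "citations_delta",
        "citation": {"type": "char_location", "cited_text": "penalty of 20%", "document_index": 0},
    }},
    {"type": "content_block_stop", "index": 0},
]
print(accumulate_stream(events))
```

Keying the accumulator by block index is what keeps each citation attached to the sentence it supports, which a single concatenated string cannot do.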


7. Complete Example: A Legal RAG Assistant

Putting the Custom Content approach together into a minimal runnable legal consultation assistant:

import anthropic
from typing import List

client = anthropic.Anthropic(
    api_key="sk-yourClaudeAPIkey",
    base_url="https://gw.claudeapi.com"
)

# Assume you have a vector database (Pinecone / Weaviate / Chroma)
def retrieve_chunks(query: str, k: int = 5) -> List[dict]:
    """Return top-k relevant legal provisions"""
    # ... your vector retrieval code
    return [
        {"source": "Civil Code", "article": "Article 585", "text": "The parties may agree that when one party breaches..."},
        {"source": "Contract Law", "article": "Article 114", "text": "If the agreed penalty is less than the loss incurred..."},
        # ...
    ]

def answer_with_citations(question: str):
    chunks = retrieve_chunks(question)

    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        system="You are a rigorous legal consultation assistant. Your answers must be based solely on the provided legal provisions. Do not cite any laws not included in the provided documents.",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "content",
                        "content": [
                            {"type": "text",
                             "text": f"[{c['source']} {c['article']}] {c['text']}"}
                            for c in chunks
                        ]
                    },
                    "title": "Retrieved Legal Provisions",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": question}
            ]
        }]
    )

    # Structured output with citations
    result = {"answer": "", "citations": []}
    for block in response.content:
        if block.type == "text":
            result["answer"] += block.text
            for cit in (block.citations or []):
                idx = cit.start_block_index
                result["citations"].append({
                    "source": chunks[idx]["source"],
                    "article": chunks[idx]["article"],
                    "cited_text": cit.cited_text,
                })
    return result

print(answer_with_citations("Can the agreed penalty be adjusted if it's too low?"))

Example output:

{
  "answer": "Under the law, if the agreed penalty is less than the loss incurred, the parties may request the court or arbitration institution to increase it.",
  "citations": [
    {
      "source": "Contract Law",
      "article": "Article 114",
      "cited_text": "If the agreed penalty is less than the loss incurred, the parties may request the people's court or arbitration institution to increase it..."
    }
  ]
}

This output can be fed directly to a frontend for “hover to see source text” or “export audit report” — far more engineering-friendly than the prompt-engineered approach.
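As one possible shape for the “export audit report” step, the sketch below renders a result dict into plain text. It assumes the `answer` / `citations` structure shown in the example output above; `format_audit_report` and the report layout are hypothetical.

```python
def format_audit_report(result: dict) -> str:
    """Render an answer and its citations as a plain-text audit record."""
    lines = ["ANSWER", result["answer"], "", "SOURCES"]
    if not result["citations"]:
        # An answer with no citations should stand out in an audit trail.
        lines.append("(no citations returned; treat the answer as unsupported)")
    for i, cit in enumerate(result["citations"], start=1):
        lines.append(f"{i}. {cit['source']}, {cit['article']}")
        lines.append(f'   "{cit["cited_text"]}"')
    return "\n".join(lines)

result = {
    "answer": "If the agreed penalty is less than the loss incurred, the parties may request an increase.",
    "citations": [
        {
            "source": "Contract Law",
            "article": "Article 114",
            "cited_text": "If the agreed penalty is less than the loss incurred...",
        }
    ],
}
print(format_audit_report(result))
```

Flagging uncited answers explicitly, rather than printing an empty source list, makes gaps easy to spot when reports are reviewed in bulk.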


8. Why Use ClaudeAPI.com for Citations

Citations is a native Anthropic API feature, which means you must use the Anthropic-native path — the OpenAI-compatible interface (chat/completions) does not support Citations.

Advantages of accessing through claudeapi.com:

| Dimension | Details |
| --- | --- |
| Protocol compatibility | 100% compatible with the native Anthropic messages API: Citations, Files, Batch all supported |
| Endpoint | https://gw.claudeapi.com (direct access from anywhere, no extra configuration) |
| Billing | Pay per token, on-demand |
| Models | Opus 4.7 / Sonnet 4.6 recommended for Citations, available at standard pricing |

See pricing details on the claudeapi.com pricing page. For RAG workloads, Sonnet 4.6 is commonly used — and the fact that Citations’ cited_text doesn’t count toward billing makes a meaningful cost difference.


Summary

Citations is Anthropic’s official answer to the “prevent hallucinations + enable auditability” requirement — more reliable, cheaper, and more engineering-friendly than prompt-engineered citations:

  • Three granularity levels: char_location (plain text), page_location (PDF), content_block_location (custom chunks)
  • Cost-saving detail: cited_text does not count toward output tokens
  • Production recommendations: Use Opus 4.7 / Sonnet 4.6 for critical use cases, always pass a document title, and keep custom blocks as semantically complete paragraphs

Set base_url to https://gw.claudeapi.com to access all Citations capabilities through claudeapi.com — sign up and get started in minutes.


References: Citations API official documentation, Anthropic Citations launch blog, Web Search Tool.
