Claude API with Python: The Complete Beginner’s Guide — From Setup to Streaming Output

Never used the Claude API before? This guide starts from scratch — environment setup, sending your first message, multi-turn conversations, and streaming output. Every step includes copy-paste-ready code that runs out of the box. No proxy or VPN needed.
What You’ll Learn
- ✅ Installing and configuring the Anthropic Python SDK
- ✅ Making your first API request — with real terminal screenshots
- ✅ Setting a System Prompt to give Claude a custom persona
- ✅ Building multi-turn conversations with context (3 live demo rounds)
- ✅ Streaming output for a typewriter-style UX
- ✅ FastAPI + SSE integration — full frontend & backend code included
- ✅ Production-grade error handling and best practices
All code in this guide has been tested locally. Screenshots show actual terminal output. Everything is ready to copy and run.
Table of Contents
- Prerequisites
- Step 1: Install the Anthropic SDK
- Step 2: Send Your First Message
- Step 3: Set an AI Role with System Prompts
- Step 4: Build Multi-Turn Conversations
- Step 5: Streaming Output
- Step 6: FastAPI + SSE Streaming Endpoint
- Step 7: Production-Ready Error Handling
- Model Selection Guide
- FAQ
Prerequisites
Before you start, make sure you have Python ≥ 3.8 and pip available.
# Check your Python version
python --version
# Check pip is available
pip --version
You’ll also need a Claude API key. Sign up at ClaudeAPI.com — access works from anywhere without a VPN, and you’ll get free credits on sign-up to run your first request within 5 minutes.
Step 1: Install the Anthropic SDK
pip install anthropic
Verify the installation:
python -c "import anthropic; print(anthropic.__version__)"
# Expected output: 0.40.0
**Slow download speeds?** Try an alternative PyPI mirror:
pip install anthropic -i https://pypi.tuna.tsinghua.edu.cn/simple
Step 2: Send Your First Message
Create a new file called hello_claude.py, and add the following code:
import os
import anthropic

# Clear any proxy environment variables to avoid SSL conflicts
os.environ['HTTP_PROXY'] = ''
os.environ['HTTPS_PROXY'] = ''
os.environ['ALL_PROXY'] = ''
os.environ['http_proxy'] = ''
os.environ['https_proxy'] = ''
os.environ['all_proxy'] = ''

client = anthropic.Anthropic(
    api_key="your-api-key-here",  # Replace with your API key
    base_url="https://api.claudeapi.com",  # ClaudeAPI relay endpoint — no VPN required
    timeout=60.0,
)

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain recursion in one sentence."}
    ]
)

print(message.content[0].text)
Run it:
python hello_claude.py
Heads up: max_tokens is required in the Claude API — this is the biggest difference from OpenAI. Omitting it returns a 422 error. 1024 is a safe default for most use cases.
Step 3: Set an AI Role with System Prompts
The system parameter controls Claude’s persona and behavior. Unlike OpenAI, Claude treats system as a top-level parameter — it’s not passed inside the messages array.
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=2048,
    system="You are a senior Python engineer. Keep your code clean and concise. Always respond directly — no filler — and you must include a runnable code example.",
    messages=[
        {"role": "user", "content": "How do I read a large file without running out of memory?"}
    ]
)

print(message.content[0].text)
System Prompt Writing Tips
| Technique | Example | Effect |
|---|---|---|
| Define a role | You are a senior {domain} engineer | More focused, domain-appropriate responses |
| Specify output format | Output code only, no explanations | Less filler, precise output |
| Constrain response length | Be concise, 100 words max | Controls token usage and cost |
| Lock the language/framework | Use Python 3.10+ syntax only | Avoids outdated patterns |
| Ban filler phrases | No apologies, no "certainly" or "of course" | Cuts the fluff, gets straight to the point |
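These techniques stack well in a single prompt. A tiny helper for composing them — `build_system_prompt` is an illustrative name for this sketch, not part of the SDK:

```python
def build_system_prompt(role, constraints):
    """Compose a system prompt from a role plus a list of constraints."""
    lines = [f"You are a senior {role} engineer."]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

print(build_system_prompt("Python", [
    "Output code only, no explanations.",
    "Be concise, 100 words max.",
    "Use Python 3.10+ syntax only.",
]))
```

Centralizing prompt construction like this keeps personas consistent across an app and makes the constraints easy to tweak in one place.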
Step 4: Build Multi-Turn Conversations
The Claude API is stateless — there’s no built-in memory. For multi-turn conversations, you need to manually pass the full message history on every request. The only rule: user and assistant turns must strictly alternate.
history = []

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system="You are a professional coding assistant.",
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print("=== Round 1 ===")
print(chat("Write me a function to calculate the Fibonacci sequence."))
print("\n=== Round 2 ===")
print(chat("Now update it to use memoization to avoid redundant calculations."))
print("\n=== Round 3 ===")
print(chat("Add input validation — negative numbers should raise an exception."))
Managing History Length (Preventing Token Limit Errors)
Token usage grows linearly with conversation length. Add a simple cap to keep things under control:
MAX_HISTORY = 20  # Keep only the most recent 20 messages

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    trimmed = history[-MAX_HISTORY:] if len(history) > MAX_HISTORY else history
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        messages=trimmed,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
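One subtlety a raw slice can miss: the Claude API requires the first message in `messages` to be a user turn, and `history[-N:]` with an odd N can land on an assistant turn. A sketch of a safer trim — `trim_history` is an illustrative helper, not an SDK function:

```python
def trim_history(history, max_messages=20):
    """Keep the most recent messages, then drop any leading assistant
    turns so the trimmed list still starts with a user message
    (a requirement of the Claude Messages API)."""
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

# With an odd cap, a raw slice would start on an assistant turn:
history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
]
print(trim_history(history, max_messages=3)[0]["role"])  # user
```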
Step 5: Streaming Output
By default, the API waits for the full response before returning anything. Streaming lets content render as it’s generated — giving users that familiar typewriter effect and a much snappier feel.
Basic Streaming
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print("\n\nStream complete.")
Streaming + Token Usage Tracking
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

print(f"\n\nToken usage — Input: {final.usage.input_tokens}, Output: {final.usage.output_tokens}")
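Those usage numbers make per-request cost tracking straightforward. A sketch of the arithmetic; the rates passed in below are placeholders, so look up current prices on your provider's pricing page before relying on the output:

```python
def estimate_cost(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Rough USD cost for one request, given per-million-token rates."""
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000

# Hypothetical rates of $1 input / $5 output per million tokens:
print(f"${estimate_cost(1200, 800, 1.0, 5.0):.4f}")  # $0.0052
```

Logging this per request (alongside the model name) gives you a simple running total long before you need a full billing dashboard.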
Step 6: FastAPI + SSE Streaming Endpoint
Typewriter effect on the frontend, SSE push on the backend — this is the most common production pattern for real-time AI responses.
Install dependencies:
pip install fastapi uvicorn
Backend (save as main.py):
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
import anthropic

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_methods=["*"], allow_headers=["*"]
)

client = anthropic.Anthropic(
    api_key="your-api-key-here",
    base_url="https://api.claudeapi.com",
    timeout=60.0,
)

@app.get("/chat")
async def chat_stream(q: str, system: str = "You are a professional assistant."):
    def generate():
        with client.messages.stream(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": q}],
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {text}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},
    )

# Start the server:
# uvicorn main:app --reload --port 8000
Frontend — consuming the stream (JavaScript):
const source = new EventSource(
  `/chat?q=${encodeURIComponent('Implement bubble sort in Python')}`
);

source.onmessage = (event) => {
  if (event.data === '[DONE]') { source.close(); return; }
  document.getElementById('output').textContent += event.data;
};

source.onerror = () => source.close();
Key config: the X-Accel-Buffering: no header prevents reverse proxies like Nginx from buffering SSE responses. Without it, chunks pile up and arrive all at once — killing the typewriter effect.
Step 7: Production-Ready Error Handling
Three things you must do before going live: environment variable management (no hardcoded keys), automatic retries (built into the SDK), and typed error handling.
Standard Production Setup
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url=os.environ.get("ANTHROPIC_BASE_URL", "https://api.claudeapi.com"),
    max_retries=3,  # Auto-handles 429s and 5xx errors with exponential backoff
    timeout=60.0,
)

def chat(prompt: str, system: str = "") -> str:
    kwargs = {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        kwargs["system"] = system
    try:
        response = client.messages.create(**kwargs)
        return response.content[0].text
    except anthropic.AuthenticationError:
        raise ValueError("Invalid API key — check your environment variables")
    except anthropic.RateLimitError as e:
        raise RuntimeError(f"Rate limit exceeded: {e}") from e
    except anthropic.BadRequestError as e:
        raise ValueError(f"Invalid request parameters: {e}") from e
    except anthropic.APIStatusError as e:
        raise RuntimeError(f"API error {e.status_code}: {e.message}") from e
.env file — add it to .gitignore; never commit this:
ANTHROPIC_API_KEY=your-api-key-here
ANTHROPIC_BASE_URL=https://api.claudeapi.com
Error Code Reference
| Status Code | Meaning | How to Handle |
|---|---|---|
| 401 | Invalid API key | Verify the key in your environment variables |
| 400/422 | Malformed request | Check that max_tokens is present and the messages format is valid |
| 403 | No access to this model | Confirm your account’s permission tier |
| 429 | Rate limit exceeded | SDK retries automatically; or upgrade your quota plan |
| 529 | Service overloaded | Wait briefly — SDK will retry automatically |
Model Selection Guide
| Model ID | Best For | Speed | Cost |
|---|---|---|---|
| claude-haiku-4-5-20251001 | Classification, translation, summarization, Q&A, bulk generation | Fastest | Lowest |
| claude-sonnet-4-6 | Code generation, writing, analysis, everyday dev tasks | Fast | Medium |
| claude-opus-4-6 | Complex reasoning, hard tasks, long-document understanding | Slower | Highest |
Tip: Use claude-haiku-4-5-20251001 across the board during prototyping to keep costs low. Switch to Sonnet or Opus only after validating your use case.
FAQ
Q: Is max_tokens required?
Yes — unlike OpenAI, where it's optional, the Claude API returns a 422 error if you omit it. When in doubt, use 1024 for short tasks and 4096 for longer ones.
Q: Why can’t I put system inside the messages array?
That’s how the native Anthropic SDK is designed — system is a top-level parameter, separate from messages. If you’re using the OpenAI-compatible endpoint (/v1/chat/completions), you can keep the role: system syntax. ClaudeAPI.com supports both approaches.
Q: Token count keeps growing in multi-turn conversations — what should I do?
Cap your history length by keeping only the last N turns (see the MAX_HISTORY truncation pattern in Step 4). Alternatively, periodically ask Claude to summarize the conversation history, then replace the raw history with that summary.
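The summarize-and-replace approach can be sketched as a pure function. Here `summarize` is any callable that turns the older messages into a string; in production it would itself be a Claude call, and `compact_history` is an illustrative name, not an SDK feature:

```python
def compact_history(history, summarize, keep_last=4):
    """Replace older turns with one summary, keeping the most recent
    keep_last turns verbatim."""
    if len(history) <= keep_last:
        return history  # nothing worth compacting yet
    summary = summarize(history[:-keep_last])
    # Inject the summary as a user/assistant pair so turns still alternate
    return [
        {"role": "user", "content": f"Summary of the earlier conversation: {summary}"},
        {"role": "assistant", "content": "Understood. Continuing from that summary."},
    ] + history[-keep_last:]

# Demo with a stand-in summarizer; in production summarize() would call Claude:
demo = [{"role": r, "content": f"turn {i}"} for i, r in enumerate(["user", "assistant"] * 4)]
print(len(compact_history(demo, lambda msgs: f"{len(msgs)} earlier turns")))  # 6
```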
Q: What if a streaming response gets interrupted mid-way?
Catch anthropic.APIConnectionError in your generator and decide whether to retry or return whatever partial content was already received. For production, implement reconnection logic with a deadline so retries can't hang forever.
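A minimal sketch of the "keep the partial content" approach. Here make_stream stands in for any function that yields text chunks, and the demo uses the built-in ConnectionError so it runs offline; in production you would pass error_types=(anthropic.APIConnectionError,):

```python
def stream_with_fallback(make_stream, error_types=(ConnectionError,)):
    """Yield chunks from make_stream(); if the stream dies mid-way,
    stop cleanly so the caller keeps whatever was already received."""
    try:
        yield from make_stream()
    except error_types:
        return  # swallow the error; the caller decides whether to retry

# Simulate a stream that drops after two chunks:
def flaky():
    yield "Hello, "
    yield "wor"
    raise ConnectionError("connection dropped")

print("".join(stream_with_fallback(flaky)))  # Hello, wor
```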
Q: Can I use the OpenAI SDK to call Claude API directly?
Yes. ClaudeAPI.com is fully compatible with the OpenAI Chat format — just swap base_url and api_key, change model to claude-haiku-4-5-20251001, and the rest of your code stays untouched.
Summary
| Step | Key Takeaway |
|---|---|
| Install | pip install anthropic |
| Basic call | client.messages.create(); max_tokens is required |
| System role | Top-level system parameter; don't stuff it into messages |
| Multi-turn chat | Manually maintain the history list; strictly alternate user/assistant turns |
| Streaming | client.messages.stream() for typewriter-style output |
| Web integration | FastAPI + SSE — full frontend and backend code included |
| Production hardening | Environment variables + max_retries + typed error handling |
Run all the code in this guide without any proxy setup — just point base_url to ClaudeAPI.com and you’re good to go:
client = anthropic.Anthropic(
    api_key="your-claudeapi-key",
    base_url="https://api.claudeapi.com",  # Change only this line — everything else stays the same
)
ClaudeAPI.com supports both the native Anthropic format and the OpenAI-compatible format. All Claude models are available, pay-as-you-go with no subscription required.
Get started at claudeapi.com — make your first API call in under 5 minutes.
Written and maintained by the ClaudeAPI.com team. Last updated: April 2026.



