Claude API with Python: The Complete Beginner’s Guide — From Setup to Streaming Output

Never used the Claude API before? This guide starts from scratch — environment setup, sending your first message, multi-turn conversations, and streaming output. Every step includes copy-paste-ready code that runs out of the box. No proxy or VPN needed.
What You’ll Learn
- ✅ Installing and configuring the Anthropic Python SDK
- ✅ Making your first API request — with real terminal screenshots
- ✅ Setting a System Prompt to give Claude a custom persona
- ✅ Building multi-turn conversations with context (3 live demo rounds)
- ✅ Streaming output for a typewriter-style UX
- ✅ FastAPI + SSE integration — full frontend & backend code included
- ✅ Production-grade error handling and best practices
All code in this guide has been tested locally. Screenshots show actual terminal output. Everything is ready to copy and run.
Table of Contents
- Prerequisites
- Step 1: Install the Anthropic SDK
- Step 2: Send Your First Message
- Step 3: Set an AI Role with System Prompts
- Step 4: Build Multi-Turn Conversations
- Step 5: Streaming Output
- Step 6: FastAPI + SSE Streaming Endpoint
- Step 7: Production-Ready Error Handling
- Model Selection Guide
- FAQ
Prerequisites
Before you start, make sure you have Python ≥ 3.8 and pip available.
# Check your Python version
python --version
# Check pip is available
pip --version
You’ll also need a Claude API key. Sign up at ClaudeAPI.com — access works from anywhere without a VPN, and you’ll get free credits on sign-up to run your first request within 5 minutes.
Step 1: Install the Anthropic SDK
pip install anthropic
Verify the installation:
python -c "import anthropic; print(anthropic.__version__)"
# Expected output: 0.40.0
**Slow download speeds?** Try an alternative PyPI mirror:
pip install anthropic -i https://pypi.tuna.tsinghua.edu.cn/simple
Step 2: Send Your First Message
Create a new file called hello_claude.py, and add the following code:
import os
import anthropic

# Clear any proxy environment variables to avoid SSL conflicts
os.environ['HTTP_PROXY'] = ''
os.environ['HTTPS_PROXY'] = ''
os.environ['ALL_PROXY'] = ''
os.environ['http_proxy'] = ''
os.environ['https_proxy'] = ''
os.environ['all_proxy'] = ''

client = anthropic.Anthropic(
    api_key="your-api-key-here",  # Replace with your API key
    base_url="https://api.claudeapi.com",  # ClaudeAPI relay endpoint — no VPN required
    timeout=60.0,
)

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain recursion in one sentence."}
    ]
)

print(message.content[0].text)
Run it:
python hello_claude.py
Heads up: max_tokens is required in the Claude API — this is the biggest difference from OpenAI. Omitting it returns a 422 error. 1024 is a safe default for most use cases.
Step 3: Set an AI Role with System Prompts
The system parameter controls Claude’s persona and behavior. Unlike OpenAI, Claude treats system as a top-level parameter — it’s not passed inside the messages array.
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=2048,
    system="You are a senior Python engineer. Keep your code clean and concise. Always respond directly — no filler — and you must include a runnable code example.",
    messages=[
        {"role": "user", "content": "How do I read a large file without running out of memory?"}
    ]
)

print(message.content[0].text)
System Prompt Writing Tips
| Technique | Example | Effect |
|---|---|---|
| Define a role | You are a senior {domain} engineer | More focused, domain-appropriate responses |
| Specify output format | Output code only, no explanations | Less filler, precise output |
| Constrain response length | Be concise, 100 words max | Controls token usage and cost |
| Lock the language/framework | Use Python 3.10+ syntax only | Avoids outdated patterns |
| Ban filler phrases | No apologies, no "certainly" or "of course" | Cuts the fluff, gets straight to the point |
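These techniques stack well in a single prompt. A tiny helper for composing them — `build_system_prompt` is an illustrative name for this sketch, not part of the SDK:

```python
def build_system_prompt(role, constraints):
    """Compose a system prompt from a role plus a list of constraints."""
    lines = [f"You are a senior {role} engineer."]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

print(build_system_prompt("Python", [
    "Output code only, no explanations.",
    "Be concise, 100 words max.",
    "Use Python 3.10+ syntax only.",
]))
```

Centralizing prompt construction like this keeps personas consistent across an app and makes the constraints easy to tweak in one place.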
Step 4: Build Multi-Turn Conversations
The Claude API is stateless — there’s no built-in memory. For multi-turn conversations, you need to manually pass the full message history on every request. The only rule: user and assistant turns must strictly alternate.
history = []

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system="You are a professional coding assistant.",
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print("=== Round 1 ===")
print(chat("Write me a function to calculate the Fibonacci sequence."))
print("\n=== Round 2 ===")
print(chat("Now update it to use memoization to avoid redundant calculations."))
print("\n=== Round 3 ===")
print(chat("Add input validation — negative numbers should raise an exception."))
Managing History Length (Preventing Token Limit Errors)
Token usage grows linearly with conversation length. Add a simple cap to keep things under control:
MAX_HISTORY = 20  # Keep only the most recent 20 messages

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    trimmed = history[-MAX_HISTORY:] if len(history) > MAX_HISTORY else history
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        messages=trimmed,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
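One subtlety a raw slice can miss: the Claude API requires the first message in `messages` to be a user turn, and `history[-N:]` with an odd N can land on an assistant turn. A sketch of a safer trim — `trim_history` is an illustrative helper, not an SDK function:

```python
def trim_history(history, max_messages=20):
    """Keep the most recent messages, then drop any leading assistant
    turns so the trimmed list still starts with a user message
    (a requirement of the Claude Messages API)."""
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

# With an odd cap, a raw slice would start on an assistant turn:
history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
]
print(trim_history(history, max_messages=3)[0]["role"])  # user
```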
Step 5: Streaming Output
By default, the API waits for the full response before returning anything. Streaming lets content render as it’s generated — giving users that familiar typewriter effect and a much snappier feel.
Basic Streaming
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print("\n\nStream complete.")
Streaming + Token Usage Tracking
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

print(f"\n\nToken usage — Input: {final.usage.input_tokens}, Output: {final.usage.output_tokens}")
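Those usage numbers make per-request cost tracking straightforward. A sketch of the arithmetic; the rates passed in below are placeholders, so look up current prices on your provider's pricing page before relying on the output:

```python
def estimate_cost(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Rough USD cost for one request, given per-million-token rates."""
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000

# Hypothetical rates of $1 input / $5 output per million tokens:
print(f"${estimate_cost(1200, 800, 1.0, 5.0):.4f}")  # $0.0052
```

Logging this per request (alongside the model name) gives you a simple running total long before you need a full billing dashboard.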
Step 6: FastAPI + SSE Streaming Endpoint
Typewriter effect on the frontend, SSE push on the backend — this is the most common production pattern for real-time AI responses.
Install dependencies:
pip install fastapi uvicorn
Backend (save as main.py):
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
import anthropic

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_methods=["*"], allow_headers=["*"]
)

client = anthropic.Anthropic(
    api_key="your-api-key-here",
    base_url="https://api.claudeapi.com",
    timeout=60.0,
)

@app.get("/chat")
async def chat_stream(q: str, system: str = "You are a professional assistant."):
    def generate():
        with client.messages.stream(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": q}],
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {text}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},
    )

# Start the server:
# uvicorn main:app --reload --port 8000
Frontend — consuming the stream (JavaScript):
const source = new EventSource(
  `/chat?q=${encodeURIComponent('Implement bubble sort in Python')}`
);

source.onmessage = (event) => {
  if (event.data === '[DONE]') { source.close(); return; }
  document.getElementById('output').textContent += event.data;
};

source.onerror = () => source.close();
Key config: the X-Accel-Buffering: no header prevents reverse proxies like Nginx from buffering SSE responses. Without it, chunks pile up and arrive all at once — killing the typewriter effect.
Step 7: Production-Ready Error Handling
Three things you must do before going live: environment variable management (no hardcoded keys), automatic retries (built into the SDK), and typed error handling.
Standard Production Setup
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url=os.environ.get("ANTHROPIC_BASE_URL", "https://api.claudeapi.com"),
    max_retries=3,  # Auto-handles 429s and 5xx errors with exponential backoff
    timeout=60.0,
)

def chat(prompt: str, system: str = "") -> str:
    kwargs = {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        kwargs["system"] = system
    try:
        response = client.messages.create(**kwargs)
        return response.content[0].text
    except anthropic.AuthenticationError:
        raise ValueError("Invalid API key — check your environment variables")
    except anthropic.RateLimitError as e:
        raise RuntimeError(f"Rate limit exceeded: {e}") from e
    except anthropic.BadRequestError as e:
        raise ValueError(f"Invalid request parameters: {e}") from e
    except anthropic.APIStatusError as e:
        raise RuntimeError(f"API error {e.status_code}: {e.message}") from e
.env file — add it to .gitignore; never commit this:
ANTHROPIC_API_KEY=your-api-key-here
ANTHROPIC_BASE_URL=https://api.claudeapi.com
Error Code Reference
| Status Code | Meaning | How to Handle |
|---|---|---|
| 401 | Invalid API key | Verify the key in your environment variables |
| 400/422 | Malformed request | Check that max_tokens is present and the messages format is valid |
| 403 | No access to this model | Confirm your account’s permission tier |
| 429 | Rate limit exceeded | SDK retries automatically; or upgrade your quota plan |
| 529 | Service overloaded | Wait briefly — SDK will retry automatically |
Model Selection Guide
| Model ID | Best For | Speed | Cost |
|---|---|---|---|
| claude-haiku-4-5-20251001 | Classification, translation, summarization, Q&A, bulk generation | Fastest | Lowest |
| claude-sonnet-4-6 | Code generation, writing, analysis, everyday dev tasks | Fast | Medium |
| claude-opus-4-6 | Complex reasoning, hard tasks, long-document understanding | Slower | Highest |
Tip: Use claude-haiku-4-5-20251001 across the board during prototyping to keep costs low. Switch to Sonnet or Opus only after validating your use case.
FAQ
Q: Is max_tokens required?
Yes — unlike OpenAI, where it's optional, the Claude API returns a 422 error if you omit it. When in doubt, use 1024 for short tasks and 4096 for longer ones.
Q: Why can’t I put system inside the messages array?
That’s how the native Anthropic SDK is designed — system is a top-level parameter, separate from messages. If you’re using the OpenAI-compatible endpoint (/v1/chat/completions), you can keep the role: system syntax. ClaudeAPI.com supports both approaches.
Q: Token count keeps growing in multi-turn conversations — what should I do?
Cap your history length by keeping only the last N turns (see the MAX_HISTORY truncation pattern in Step 4). Alternatively, periodically ask Claude to summarize the conversation history, then replace the raw history with that summary.
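The summarize-and-replace approach can be sketched as a pure function. Here `summarize` is any callable that turns the older messages into a string; in production it would itself be a Claude call, and `compact_history` is an illustrative name, not an SDK feature:

```python
def compact_history(history, summarize, keep_last=4):
    """Replace older turns with one summary, keeping the most recent
    keep_last turns verbatim."""
    if len(history) <= keep_last:
        return history  # nothing worth compacting yet
    summary = summarize(history[:-keep_last])
    # Inject the summary as a user/assistant pair so turns still alternate
    return [
        {"role": "user", "content": f"Summary of the earlier conversation: {summary}"},
        {"role": "assistant", "content": "Understood. Continuing from that summary."},
    ] + history[-keep_last:]

# Demo with a stand-in summarizer; in production summarize() would call Claude:
demo = [{"role": r, "content": f"turn {i}"} for i, r in enumerate(["user", "assistant"] * 4)]
print(len(compact_history(demo, lambda msgs: f"{len(msgs)} earlier turns")))  # 6
```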
Q: What if a streaming response gets interrupted mid-way?
Catch anthropic.APIConnectionError in your generator and decide whether to retry or return whatever partial content was already received. For production, implement reconnection logic with a deadline so retries can't hang forever.
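A minimal sketch of the "keep the partial content" approach. Here make_stream stands in for any function that yields text chunks, and the demo uses the built-in ConnectionError so it runs offline; in production you would pass error_types=(anthropic.APIConnectionError,):

```python
def stream_with_fallback(make_stream, error_types=(ConnectionError,)):
    """Yield chunks from make_stream(); if the stream dies mid-way,
    stop cleanly so the caller keeps whatever was already received."""
    try:
        yield from make_stream()
    except error_types:
        return  # swallow the error; the caller decides whether to retry

# Simulate a stream that drops after two chunks:
def flaky():
    yield "Hello, "
    yield "wor"
    raise ConnectionError("connection dropped")

print("".join(stream_with_fallback(flaky)))  # Hello, wor
```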
Q: Can I use the OpenAI SDK to call Claude API directly?
Yes. ClaudeAPI.com is fully compatible with the OpenAI Chat format — just swap base_url and api_key, change model to claude-haiku-4-5-20251001, and the rest of your code stays untouched.
Summary
| Step | Key Takeaway |
|---|---|
| Install | pip install anthropic |
| Basic call | client.messages.create(); max_tokens is required |
| System role | Top-level system parameter; don't stuff it into messages |
| Multi-turn chat | Manually maintain the history list; strictly alternate user/assistant turns |
| Streaming | client.messages.stream() for typewriter-style output |
| Web integration | FastAPI + SSE — full frontend and backend code included |
| Production hardening | Environment variables + max_retries + typed error handling |
Run all the code in this guide without any proxy setup — just point base_url to ClaudeAPI.com and you’re good to go:
client = anthropic.Anthropic(
    api_key="your-claudeapi-key",
    base_url="https://api.claudeapi.com",  # Change only this line — everything else stays the same
)
ClaudeAPI.com supports both the native Anthropic format and the OpenAI-compatible format. All Claude models are available, pay-as-you-go with no subscription required.
Get started at claudeapi.com — make your first API call in under 5 minutes.
Written and maintained by the ClaudeAPI.com team. Last updated: April 2026.



