This guide covers what you actually need to get started with the Claude API. It assumes you have Python 3.9+ or Node.js 18+ installed and can install packages. It does not assume prior experience with LLM APIs.
Prerequisites
- An Anthropic account with API access (create one at console.anthropic.com)
- An API key from the Anthropic console
- Python 3.9+ or Node.js 18+
Installation
Python:
pip install anthropic
TypeScript / Node.js:
npm install @anthropic-ai/sdk
Authentication
Keep your API key in an environment variable. Never hardcode it, never commit it to source control.
export ANTHROPIC_API_KEY="sk-ant-..."
Or use a .env file with python-dotenv (Python) or dotenv (Node.js).
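If you go the .env route in Python, load the file before constructing the client. A minimal sketch using python-dotenv, assuming the key is stored as ANTHROPIC_API_KEY in a .env file in the working directory:

# Requires: pip install python-dotenv
from dotenv import load_dotenv
import anthropic

load_dotenv()                   # reads .env and populates os.environ
client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment as usual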
First Call: Basic Completion
Python:
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from environment
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a transformer model is in two paragraphs."}
    ]
)
print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from environment
const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain what a transformer model is in two paragraphs." },
  ],
});

// Content blocks are a union type; narrow to a text block before reading .text
const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
The response object contains message.content (array of content blocks), message.usage (input and output token counts), and message.stop_reason (why generation stopped).
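Continuing the Python example above, these fields can be read directly:

# Token accounting and stop reason from the same response object
print(message.usage.input_tokens, message.usage.output_tokens)
print(message.stop_reason)  # e.g. "end_turn" or "max_tokens"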
System Prompts
The system parameter sets context for the conversation. Use it to define the model's role, constraints, and output format.
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a technical writer. Respond concisely and use code examples where relevant. Do not use marketing language.",
    messages=[
        {"role": "user", "content": "What is a vector embedding?"}
    ]
)
Multi-Turn Conversations
The messages array holds the conversation history. The API is stateless — each call must include the full history you want the model to consider.
messages = [
    {"role": "user", "content": "What is RAG in the context of LLMs?"},
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=messages
)

# Add the reply to history and continue
messages.append({"role": "assistant", "content": response.content[0].text})
messages.append({"role": "user", "content": "What are the main failure modes?"})

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=messages
)
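In an interactive application, the same append-and-resend pattern usually lives in a small helper. A minimal sketch (the ask helper name and its default model and token limit are illustrative, not part of the SDK):

def ask(client, messages, user_text, model="claude-sonnet-4-6", max_tokens=1024):
    """Append a user turn, call the API with the full history, and record the reply."""
    messages.append({"role": "user", "content": user_text})
    response = client.messages.create(model=model, max_tokens=max_tokens, messages=messages)
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    return reply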
Streaming Responses
For interactive applications, streaming delivers tokens as they are generated.
Python:
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain attention mechanisms."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
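If you also need the fully assembled response after streaming, recent versions of the Python SDK expose it on the stream helper. A short sketch to place inside the same with block (confirm the method name against your installed SDK version):

    # still inside the `with client.messages.stream(...) as stream:` block
    final = stream.get_final_message()  # the complete Message, same shape as a non-streaming response
    print()                             # newline after the streamed text
    print(final.usage.output_tokens)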
Getting Structured JSON Output
import json
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a data extraction assistant. Always respond with valid JSON only. No explanation, no markdown fencing.",
    messages=[
        {
            "role": "user",
            "content": "Extract the key claims from this text as JSON with keys: claims (array), confidence (high/medium/low).\n\nText: The model achieves 94.2% accuracy on MMLU, outperforming the previous generation by 8 percentage points."
        }
    ]
)

data = json.loads(message.content[0].text)
print(data)
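Even with a strict system prompt, the model can occasionally wrap the JSON in markdown fences or add stray text, so json.loads can raise. A defensive variant (a sketch; the fence-stripping logic is an assumption about a common failure mode, not SDK behavior):

raw = message.content[0].text.strip()
# Strip ```json ... ``` fences if the model added them despite the instructions
if raw.startswith("```"):
    raw = raw.strip("`").removeprefix("json").strip()
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    data = None  # log and retry, or re-prompt asking for JSON only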
Error Handling
try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError as e:
    print(f"Connection failed: {e}")
except anthropic.RateLimitError:
    print("Rate limit hit; retry with exponential backoff")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
For production, implement retry logic with exponential backoff for rate limit errors (429) and transient server errors (5xx).
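A minimal retry wrapper might look like the following. This is a sketch: the attempt count, base delay, and the call_with_retries name are illustrative choices, and the SDK's own built-in retry behavior may already cover simple cases.

import random
import time
import anthropic

def call_with_retries(client, max_attempts=5, base_delay=1.0, **kwargs):
    """Retry on rate limits and 5xx responses with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            pass  # 429: back off and retry
        except anthropic.APIStatusError as e:
            if e.status_code < 500:
                raise  # 4xx errors other than 429 will not succeed on retry
        if attempt == max_attempts - 1:
            raise RuntimeError("Exhausted retries against the Claude API")
        time.sleep(base_delay * (2 ** attempt) + random.random())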
Model Selection
| Model | Context | Best for |
|---|---|---|
| claude-opus-4-7 | 200K tokens | Complex reasoning, long-form analysis |
| claude-sonnet-4-6 | 200K tokens | Balanced capability and speed |
| claude-haiku-4-5-20251001 | 200K tokens | Fast, cost-efficient tasks |
Use Sonnet for most production workloads. Use Haiku where latency and cost matter more than maximum capability. Use Opus for tasks requiring the highest reasoning capability.
Next Steps
Once you have basic completions working: explore tool use (function calling) to allow Claude to call functions in your code; prompt caching to reduce latency and cost on repeated calls; the Batch API for processing large volumes asynchronously at reduced cost.
The official documentation at docs.anthropic.com covers all of these in detail. This guide is a starting point; the documentation is the authoritative reference.