
In this article, you'll learn how to use the Claude API in Python, make your first request, and handle responses with the official SDK.

Introduction

You want to add Claude to a Python application. Creating an account and making your first API call is straightforward. The official documentation can get you from zero to a working request in a few minutes. The next questions are usually more practical:

  • What does the response object contain?
  • How do you stream responses so users can see output as it's generated?
  • How do you structure prompts and handle responses in a production application?

The Claude Python SDK takes care of much of the underlying API interaction. It provides typed response objects, built-in retry handling, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you'll have a working foundation.

Prerequisites and Installation

You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. You can add $5 in credits and work through everything in this article.

With those in place, install the SDK:

pip install anthropic

Never hardcode your API key in source files. Store it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

Or add it to a .env file at the project root if you're using python-dotenv, and call load_dotenv() before creating the client. The SDK reads ANTHROPIC_API_KEY from your environment, so you don't need to pass it anywhere in your code.
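Before creating a client, it can help to fail early with a clear message when the key is missing. The following is a minimal sketch using only the standard library. require_api_key is a hypothetical helper name chosen for illustration and is not part of the SDK.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise in tests. (Hypothetical helper, not an SDK function.)
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell or load "
            "it from a .env file before creating the client."
        )
    return key
```

Calling require_api_key() once at startup surfaces a missing key immediately instead of on the first request.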

Making Your First API Call

The entry point for every interaction is client.messages.create(). Let's ask Claude to explain what a context window is, something you'll actually need to understand as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and a "content" key.

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

print(response.content[0].text)

The model field takes the exact model ID string. max_tokens is a hard ceiling on how many output tokens Claude will produce; the response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language model can process and consider at one time, encompassing both your input and its output.
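The rules described above for the messages list (a list of dicts, each with "role" and "content", starting with a "user" turn) can be checked locally before a request is sent. This is only a sketch: validate_messages is a hypothetical helper, and the API performs its own validation regardless.

```python
def validate_messages(messages):
    """Check the basic shape of a messages list before sending it.

    Hypothetical local sanity check; it mirrors the rules described in
    this article and does not replace the API's own validation.
    """
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be a dict")
        missing = {"role", "content"} - msg.keys()
        if missing:
            raise ValueError(f"messages[{i}] is missing keys: {sorted(missing)}")
        if msg["role"] not in ("user", "assistant"):
            raise ValueError(f"messages[{i}] has unexpected role {msg['role']!r}")
    if messages[0]["role"] != "user":
        raise ValueError("the first message must have role 'user'")
    return messages
```

Because the function returns the list unchanged on success, it can wrap the argument directly, for example messages=validate_messages(history).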

Understanding the Response Object

The response from messages.create() is a typed Message object. It's worth inspecting the full structure before building anything on top of it.

Replace the print line in the previous example with:

print(response)

Running that gives you the full object:

Message(
    id='msg_01XFDUDYJgAACzvnptvVoYEL',
    type='message',
    role='assistant',
    content=[TextBlock(text='A context window is...', type='text')],
    model='claude-sonnet-5',
    stop_reason='end_turn',
    stop_sequence=None,
    usage=Usage(input_tokens=19, output_tokens=42)
)

A few fields here matter more than they first appear. stop_reason tells you why Claude stopped generating. end_turn means Claude finished on its own terms. If you see max_tokens, the response was cut off by your limit, and you may need to raise it or rethink the prompt.

The usage field tracks both input and output tokens for the request. This is how Anthropic calculates billing, and it's also how you detect when a prompt is creeping too close to the model's context limit. content is a list. In a plain text response it typically holds a single TextBlock, so response.content[0].text is the idiomatic way to pull the text out.
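The fields discussed above can be wrapped in a few small helpers. The sketch below assumes only the attribute names shown in the printed Message (content, stop_reason, usage); the helper names are hypothetical, and a stand-in object built with SimpleNamespace is used so the example runs without a network call.

```python
from types import SimpleNamespace


def extract_text(message):
    """Join the text of every text block in message.content."""
    return "".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


def was_truncated(message):
    """True when generation stopped because max_tokens was reached."""
    return message.stop_reason == "max_tokens"


def total_tokens(message):
    """Input plus output tokens reported for this request."""
    return message.usage.input_tokens + message.usage.output_tokens


# Stand-in with the same attribute names as the Message shown above
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)

print(extract_text(fake))   # A context window is...
print(was_truncated(fake))  # False
print(total_tokens(fake))   # 61
```

Joining all text blocks rather than indexing content[0] is a defensive choice for responses that contain more than one block.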

Using System Prompts

A system prompt lets you give Claude a persistent role, set constraints, or provide context that should apply across the entire conversation. You pass it as a top-level system parameter, separate from the messages list, not as a message itself.

Here we configure Claude to act as a code reviewer who only responds in Python and avoids general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    # The system prompt is a top-level parameter, not part of messages
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

The system prompt sits above the conversation in Claude's context. It carries the same authority throughout all turns, so role instructions, formatting rules, and domain constraints you set here persist without you repeating them in every message.
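One way to keep a system prompt consistent across calls is to define it once and build the request arguments from it. The sketch below is an illustration only: REVIEWER_SYSTEM_PROMPT and build_review_request are hypothetical names, and the model ID is simply the one used elsewhere in this article.

```python
REVIEWER_SYSTEM_PROMPT = (
    "You are a Python code reviewer. "
    "Respond only with corrected or improved Python code. "
    "Do not explain changes unless the user explicitly asks."
)


def build_review_request(code, model="claude-sonnet-5", max_tokens=512):
    """Return keyword arguments for client.messages.create().

    The system prompt stays a top-level key, separate from the messages
    list. (Hypothetical helper; the model ID is copied from this article.)
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": REVIEWER_SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": code}],
    }


# Usage with a real client (not executed here):
#   response = client.messages.create(**build_review_request(snippet))
```

Building the arguments as a plain dict also makes the request easy to log or inspect before it is sent.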

Streaming Responses

For requests where Claude may take a few seconds to respond, streaming lets you display text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields individual text chunks in real time. Each chunk is a string fragment, not a full sentence. You pass end="" and flush=True to print() so output appears continuously rather than buffering:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    # Print each text fragment as soon as it arrives
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)
    print()  # newline after stream ends

The context manager ensures the HTTP connection is closed cleanly when the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block closes.

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no room, Python allocates a new, larger block of memory (typically 1.125x the current size), copies all existing elements into it, and releases the old block. This operation is O(n) in the worst case, but because it happens infrequently relative to the number of appends, the amortized cost per append stays O(1). You can pre-allocate capacity with a list comprehension or by passing an iterable to the list constructor if you know the final size upfront.
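If you also want the full text once streaming finishes, the printing loop from the streaming example can collect chunks as it goes. This is a minimal sketch: print_and_collect is a hypothetical helper that accepts any iterable of strings, so stream.text_stream could be passed to it inside the with block.

```python
import sys


def print_and_collect(chunks, out=None):
    """Write each chunk as it arrives and return the complete text.

    `chunks` is any iterable of strings, such as stream.text_stream.
    `out` defaults to sys.stdout. (Hypothetical helper, not an SDK call.)
    """
    out = sys.stdout if out is None else out
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show the fragment immediately instead of buffering
        parts.append(chunk)
    out.write("\n")  # newline after the stream ends
    return "".join(parts)


# Usage inside the streaming block (not executed here):
#   with client.messages.stream(...) as stream:
#       full_text = print_and_collect(stream.text_stream)
```

Collecting fragments in a list and joining once at the end avoids repeated string concatenation.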

Next Steps

You now have the core building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn about error handling, token usage, and multi-turn conversations. Because the API is stateless, you need to send the conversation history with each request. The SDK documentation shows the recommended approach.
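To illustrate what sending the history with each request can look like, here is one possible shape. It is a sketch under assumptions, not necessarily the approach the SDK documentation recommends: Conversation is a hypothetical class, and it expects a client object that exposes messages.create() as used earlier in this article.

```python
class Conversation:
    """Keep the message history and resend it on every request.

    Hypothetical wrapper for illustration; `client` is expected to expose
    messages.create() like the client used earlier in this article.
    """

    def __init__(self, client, model, system=None, max_tokens=512):
        self.client = client
        self.model = model
        self.system = system
        self.max_tokens = max_tokens
        self.messages = []  # alternating user and assistant turns

    def send(self, user_text):
        # Build the candidate history first so a failed request
        # leaves self.messages unchanged.
        pending = self.messages + [{"role": "user", "content": user_text}]
        kwargs = {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "messages": pending,
        }
        if self.system is not None:
            kwargs["system"] = self.system
        response = self.client.messages.create(**kwargs)
        reply = response.content[0].text
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


# Usage with a real client (not executed here):
#   chat = Conversation(anthropic.Anthropic(), model="claude-sonnet-5")
#   chat.send("What is a context window?")
#   chat.send("How does it relate to max_tokens?")
```

Each call to send() passes the entire history, so input token counts grow with every turn.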

The API reference also includes features like structured outputs and tool use. Happy exploring!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
