Extended thinking

Anthropic's reasoning mode where Claude generates explicit thinking tokens before its final response, configurable via a thinking budget.

What is Extended thinking?

Extended thinking is Anthropic’s reasoning mode for Claude, where the model spends extra tokens on internal deliberation before it returns a final answer. In practice, teams use a configurable thinking budget to trade off more reasoning time for better performance on harder tasks. (docs.claude.com)

Understanding Extended thinking

Extended thinking is designed for problems that benefit from step-by-step reasoning, such as multi-part analysis, planning, math, and code debugging. Anthropic’s docs describe it as enhanced reasoning with a visible thinking stream in the API, followed by the final text response. (docs.claude.com)

For developers, the key idea is control. You set a budget for how many tokens Claude may use for its internal reasoning, and that budget becomes part of the model’s overall token limit. In other words, extended thinking is not just a UI toggle, it is an API-level way to allocate more of the model’s context to deliberate before answering. (docs.claude.com)

Key aspects of Extended thinking include:

Thinking budget: You can allocate a specific token budget for internal reasoning.
Reasoning before response: Claude can spend more time working through the problem before producing final text.
API visibility: The response can include thinking blocks ahead of the answer text.
Best for hard tasks: It is most useful when the prompt requires planning, decomposition, or careful analysis.
Cost and latency tradeoff: More reasoning typically means more tokens and more time spent on the request.

Advantages of Extended thinking

Better complex-task performance: It gives the model more room to work through difficult problems.
More predictable reasoning depth: A budget makes reasoning effort easier to manage.
Useful for debugging: It can help Claude handle multi-step code and logic tasks more carefully.
Fits production workflows: Teams can tune it per request instead of changing models.
Works well with observability: Reasoning-heavy prompts are easier to inspect and evaluate when tracked.

Challenges in Extended thinking

Higher token usage: More reasoning can increase cost.
Added latency: More internal deliberation can slow response times.
Budget tuning: Too little budget may not help, while too much can be wasteful.
Prompt sensitivity: Some tasks do not benefit much from extended reasoning.
Evaluation complexity: Reasoning quality is harder to measure than final output alone.

Example of Extended thinking in action

Scenario: a product team asks Claude to review a customer support workflow, identify bottlenecks, and suggest a revised process.

With extended thinking enabled, Claude can spend its budget breaking the problem into stages, comparing options, and planning a structured response before it writes the final recommendation. That is often more effective than asking for a fast single-pass answer on a task with several constraints.

In practice, teams might pair this with prompt logging and evaluations so they can compare outputs across different thinking budgets and see which setting produces the best results for their use case.

How PromptLayer helps with Extended thinking

PromptLayer gives teams a place to version prompts, inspect outputs, and evaluate runs that use reasoning-heavy configurations like extended thinking. That makes it easier to compare prompt changes, track quality over time, and understand when a higher thinking budget actually improves results.

Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.