Context compaction
The practice of shrinking a running conversation or prompt to fit the context window, typically by summarizing, dropping, or restructuring prior turns.
What is Context compaction?
Context compaction is the practice of shrinking a running conversation or prompt to fit the model’s context window, usually by summarizing, dropping, or restructuring earlier turns. It helps long-lived assistants keep moving without losing the important state they have already built up. (platform.openai.com)
Understanding Context compaction
In practice, context compaction is a memory-management strategy for LLM apps and agents. As a chat, workflow, or tool sequence grows, the prompt can exceed the model’s token budget, which can lead to truncation or poor reasoning. OpenAI’s documentation describes the context window as the total token budget for input, output, and reasoning, and notes that long-running agent loops can fill it quickly. (platform.openai.com)
Teams usually compact context by keeping the latest turns verbatim and converting older material into a shorter summary or structured state. Some systems summarize workflow events; others preserve decisions, goals, files, or tool results in a compressed form. The goal is not to store every token, but to retain the signal the model needs to continue correctly. Google’s ADK, for example, implements context compaction with a sliding-window approach over older agent events. (adk.dev)
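A minimal sketch of this keep-recent-plus-summarize pattern is shown below in Python. The message format and the `summarize` helper are assumptions for illustration; in a real system the summary would typically come from another LLM call or a purpose-built state extractor.

```python
def compact(messages, summarize, keep_recent=6):
    """Keep the newest turns verbatim; fold older turns into a single summary message.

    Assumptions in this sketch: `messages` is a list of {"role": ..., "content": ...}
    dicts, and `summarize` is a callable (often another LLM call) that turns the
    older turns into a short state summary string.
    """
    if len(messages) <= keep_recent:
        return messages  # still short enough, nothing to fold yet

    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent
```

The exact split point and summary format vary by system; what matters is that recent turns stay intact while older turns collapse into a compact, accurate record.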
Key aspects of Context compaction include:
- Token budget awareness: compaction happens because the context window is finite, not because the conversation is finished (a simple trigger is sketched after this list).
- Selective retention: recent turns and high-value facts are preserved, while low-signal text is reduced.
- Summarization layer: older content is often rewritten into a concise narrative or structured state block.
- Workflow continuity: the model keeps enough history to continue a task across long sessions.
- Operational tradeoff: compaction trades raw detail for space, so the summary has to be accurate to stay useful.
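To make the token-budget point concrete, one common pattern is to estimate the prompt’s token count and compact only once it crosses a threshold. The sketch below uses the tiktoken library purely for counting; the budget, headroom, and encoding name are illustrative assumptions, not recommended settings.

```python
import tiktoken  # tokenizer library, used here only to estimate prompt size

def needs_compaction(messages, budget=128_000, headroom=0.8):
    """Return True once the estimated prompt crosses a chosen share of the budget.

    The 128k budget, 0.8 headroom, and encoding choice are illustrative values;
    real systems pick them per model and leave room for output and tool results.
    """
    enc = tiktoken.get_encoding("cl100k_base")
    used = sum(len(enc.encode(m["content"])) for m in messages)
    return used > budget * headroom
```

Teams typically trigger compaction well below the hard limit so there is still room for the model’s output, reasoning, and any tool results.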
Advantages of Context compaction
- Longer sessions: agents can keep working across many turns without immediately hitting token limits.
- Lower cost: smaller prompts cut the tokens spent re-sending old history.
- Cleaner reasoning: the model sees a tighter, more relevant working set.
- Better agent reliability: compacted state can preserve goals, decisions, and tool outputs more consistently.
- Easier automation: teams can standardize how state is carried forward between runs.
Challenges in Context compaction
- Information loss: summaries can omit details that later matter.
- Summary drift: repeated compaction can slowly distort the original intent.
- State design: deciding what to keep as raw text versus structured memory takes care.
- Evaluation difficulty: it can be hard to tell whether a compacted prompt still contains enough context.
- Tooling complexity: production systems need thresholds, retention rules, and recovery paths.
Example of Context compaction in action
Scenario: a support agent helps a customer debug a failed API integration over 40 turns. The early turns contain setup details, account info, and several tool calls. By the end, the conversation is too large for the model to keep all of it verbatim.
The system compacts the conversation by keeping the last few exchanges intact, then turning the earlier discussion into a short state record: what failed, what was already tried, which environment variables were checked, and which next step remains. The model can then continue the session with the important history intact, without re-reading every old message.
In a production workflow, this can also include structured artifacts such as task status, open questions, or extracted entities. That makes the next model call much more stable than blindly truncating old turns.
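One way to carry that compacted state forward is a small structured record rather than free text. The fields and values below are hypothetical, chosen to match the support scenario above.

```python
from dataclasses import dataclass, field

@dataclass
class CompactedState:
    """Hypothetical state record carried forward after compacting the session."""
    goal: str
    what_failed: str
    already_tried: list[str] = field(default_factory=list)
    env_vars_checked: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_step: str = ""

# Invented values illustrating the 40-turn support conversation above
state = CompactedState(
    goal="Get the customer's API integration working",
    what_failed="Requests to the API are still being rejected",
    already_tried=["regenerated the API key", "checked request headers"],
    env_vars_checked=["API_KEY", "API_BASE_URL"],
    next_step="Confirm the key belongs to the production environment",
)
```

Serialized as JSON or a short system message, a record like this can stand in for dozens of earlier turns while keeping exactly what the model needs to continue.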
How PromptLayer helps with Context compaction
PromptLayer helps teams observe how prompts evolve over time, compare prompt versions, and inspect the outputs that result from shorter or compacted context. That makes it easier to test whether your compaction strategy is preserving the right state, especially in long-running agent workflows where prompt quality depends on what gets kept, summarized, or dropped.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.