Working memory
The portion of an agent's state held inside the current model context, in contrast to long-term memory persisted outside the context window.
What is Working Memory?
Working memory is the portion of an agent's state held inside the current model context, in contrast to long-term memory persisted outside the context window. In practice, it is the short-lived information the model can actively use right now, such as the latest user message, tool results, and interim reasoning notes. (docs.anthropic.com)
Understanding Working Memory
For LLM agents, working memory is not a separate database. It is the live context the model can attend to during a run, which means it is bounded by the context window and lost when that context is not carried forward. Anthropic describes the context window as the model’s “working memory,” and Microsoft notes that raw LLM calls are stateless unless memory is managed around them. (docs.anthropic.com)
In agent systems, working memory usually includes the current instruction set, recent conversation turns, tool outputs, scratchpad-style notes, and any compact state a framework injects before the next step. Long-term memory, by contrast, lives outside the window and is retrieved or summarized when needed. That split is what helps agents stay coherent without stuffing every prior detail into every request. (learn.microsoft.com)
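The pieces described above can be sketched as a single structure that is flattened into the message list for the next model call. All names here (`WorkingMemory`, `to_messages`) are illustrative, not from any specific framework:

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Illustrative container for the short-term state injected each step."""
    system_prompt: str
    recent_turns: list = field(default_factory=list)   # latest user/assistant messages
    tool_results: list = field(default_factory=list)   # outputs from recent tool calls
    scratchpad: list = field(default_factory=list)     # interim reasoning notes

    def to_messages(self):
        """Flatten short-term state into the message list sent to the model."""
        messages = [{"role": "system", "content": self.system_prompt}]
        messages += self.recent_turns
        for result in self.tool_results:
            messages.append({"role": "tool", "content": result})
        if self.scratchpad:
            notes = "Notes so far:\n" + "\n".join(self.scratchpad)
            messages.append({"role": "assistant", "content": notes})
        return messages

wm = WorkingMemory(system_prompt="You are a support agent.")
wm.recent_turns.append({"role": "user", "content": "My API call returns 401."})
wm.tool_results.append("GET /docs/auth -> 'Use a Bearer token header.'")
wm.scratchpad.append("Likely an auth header issue.")
print(len(wm.to_messages()))  # number of messages entering the context window
```

Everything in `to_messages` competes for the same finite window, which is why the long-term split matters: anything not rebuilt into this list on the next call is simply invisible to the model.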
Key aspects of Working Memory include:
- Scope: It covers only what the model can see in the active context window.
- Recency: It tends to hold the most recent, most relevant state from the current session.
- Volatility: It is temporary and can disappear once the run ends or the window rolls forward.
- Coordination: It carries the state needed to plan, call tools, and continue multi-step tasks.
- Compression: It often benefits from summaries, pruning, or state selection to avoid overflow.
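The compression aspect above is often handled by keeping the newest turns verbatim and collapsing older ones into a summary slot. This is a minimal sketch; `summarize` is a stub standing in for an LLM summarization call:

```python
def summarize(turns):
    # Placeholder: a real system would ask a model to summarize these turns.
    return f"[summary of {len(turns)} earlier turns]"

def compress(turns, keep_last=4):
    """Return a compacted turn list that fits more history into less context."""
    if len(turns) <= keep_last:
        return list(turns)
    older, recent = turns[:-keep_last], turns[-keep_last:]
    return [{"role": "system", "content": summarize(older)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compact = compress(history)
print(len(compact))  # 5: one summary message plus the four most recent turns
```

The trade-off is visible in the stub: a careless `summarize` is exactly the "state drift" risk listed under challenges below, since the summary becomes the only record of the dropped turns.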
Advantages of Working Memory
- Fast access: The model can use current state immediately without an extra retrieval step.
- Better coherence: Recent facts stay close at hand, which helps the agent stay on task.
- Lower orchestration overhead: Teams can keep simple session state in context instead of wiring up extra storage.
- More natural tool use: Tool outputs can be added directly to the next turn for follow-up actions.
- Cleaner separation of concerns: Short-term context and durable memory can be managed with different mechanisms.
Challenges in Working Memory
- Context limits: The window is finite, so long sessions can crowd out important details.
- Information loss: Older but still useful facts can fall out of the active context.
- State drift: If summaries are sloppy, the agent can carry forward the wrong version of the task.
- Token cost: Keeping too much in working memory raises latency and token spend on every call.
- Hidden complexity: Good short-term memory often depends on careful pruning, routing, and consolidation.
Example of Working Memory in Action
Scenario: A support agent is helping a user debug a failed API request.
The user first pastes an error message, then provides the request payload, then asks the agent to compare the payload against the docs. The agent keeps the error, the payload, and the latest debugging goal in working memory so it can reason over them together. If the session grows longer, older chat turns may be summarized or moved into long-term memory, while the current issue stays in context.
This is why working memory matters for agents that chain multiple steps. The model needs enough live context to decide what to do next, but not so much that the prompt becomes bloated or noisy.
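The debugging session above can be sketched with a simple short-term/long-term split: the live issue stays in working memory while older turns are archived. The long-term store here is just a list for illustration; a real agent might use a database or vector index:

```python
# Turns the agent must reason over together stay in working memory.
working_memory = [
    {"role": "user", "content": "Error: 401 Unauthorized on POST /v1/orders"},
    {"role": "user", "content": 'Payload: {"item": "book", "qty": 1}'},
    {"role": "user", "content": "Compare this payload against the docs."},
]
long_term_store = []  # stand-in for durable storage outside the context window

def archive_older(memory, store, keep_last=3):
    """Move everything but the newest turns into long-term storage."""
    while len(memory) > keep_last:
        store.append(memory.pop(0))

# As the chat grows, older turns leave the window but remain retrievable.
working_memory.insert(0, {"role": "user", "content": "Hi, I need help."})
archive_older(working_memory, long_term_store)
print(len(working_memory), len(long_term_store))  # 3 1
```

The error, payload, and goal remain in context for the next step, while the greeting moves to the store, where it can later be retrieved or summarized back in if needed.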
How PromptLayer Helps with Working Memory
PromptLayer helps teams inspect the prompts, traces, and agent steps that make up working memory in practice. That makes it easier to see what state was present at each turn, where context was summarized, and how prompt changes affect short-term behavior across runs.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.