Reflexion

An agent framework that stores verbal self-reflections on past failures in memory to improve performance on retries.

What is Reflexion?

Reflexion is an agent framework that stores verbal self-reflections on past failures in memory so the agent can improve on retries. In practice, it turns mistakes into text-based feedback that the next attempt can use. (arxiv.org)

Understanding Reflexion

Reflexion comes from the idea of verbal reinforcement learning, where an agent does not update model weights after every miss. Instead, it writes a reflection about what went wrong, stores that reflection, and feeds it back into the next run as context. The original paper reports gains across tasks like sequential decision-making, coding, and language reasoning. (arxiv.org)
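
The write, store, and feed-back cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation: the names reflexion_loop, act, evaluate, and reflect are assumptions introduced here, and the toy functions stand in for real model calls.

```python
from typing import Callable, List, Optional, Tuple


def reflexion_loop(
    task: str,
    act: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a task, carrying verbal reflections between attempts."""
    reflections: List[str] = []  # text memory; no model weights are updated
    for _ in range(max_trials):
        attempt = act(task, reflections)  # reflections are passed in as context
        ok, feedback = evaluate(attempt)
        if ok:
            return attempt, reflections
        # Turn the failure into a written lesson and store it for the next run.
        reflections.append(reflect(task, attempt, feedback))
    return None, reflections


# Toy stand-ins for the model calls (hypothetical, for illustration only).
def toy_act(task: str, reflections: List[str]) -> str:
    return "4" if reflections else "5"


def toy_evaluate(attempt: str) -> Tuple[bool, str]:
    return attempt == "4", "expected 4"


def toy_reflect(task: str, attempt: str, feedback: str) -> str:
    return f"Answered {attempt} but {feedback}; recheck the arithmetic."


answer, memory = reflexion_loop("What is 2 + 2?", toy_act, toy_evaluate, toy_reflect)
```

In a real agent, act and reflect would each be a model call and evaluate would be a test suite, a heuristic, or another model acting as a judge. The loop itself stays the same.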

In an LLM stack, Reflexion usually wraps the agent loop: the reflection step runs after an attempt has been evaluated and before the next retry begins. That makes it useful when a team wants the system to learn from failed trajectories without retraining the base model. Over time, the reflection memory becomes a lightweight source of experience that can guide better plans, tool use, and final answers.

Key aspects of Reflexion include:

Self-critique: the agent describes why a prior attempt failed.
Persistent memory: the reflection is stored and reused on later attempts.
Retry-driven improvement: each new attempt can incorporate prior lessons.
Lightweight learning: the pattern improves behavior without changing model weights.
Task-agnostic design: it can be applied to reasoning, code, and tool-using agents.
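
As a rough sketch of the persistent-memory aspect listed above, the class below saves reflections to a JSON file and renders them into the next prompt. The class name ReflectionMemory, the file layout, and the prompt wording are all assumptions made for illustration; a production system might use a database or a vector store instead.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ReflectionMemory:
    """Reflections kept as plain text and saved to disk between runs."""

    path: Path
    notes: List[str] = field(default_factory=list)

    @classmethod
    def load(cls, path: Path) -> "ReflectionMemory":
        # Start empty when no file exists yet.
        notes = json.loads(path.read_text()) if path.exists() else []
        return cls(path=path, notes=notes)

    def add(self, note: str) -> None:
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes))  # persist after every update

    def as_prompt(self, task: str) -> str:
        # Prepend stored lessons so the next attempt sees them as context.
        if not self.notes:
            return f"Task: {task}"
        lessons = "\n".join(f"- {n}" for n in self.notes)
        return f"Lessons from earlier attempts:\n{lessons}\n\nTask: {task}"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "reflections.json"
    memory = ReflectionMemory.load(store)
    memory.add("Assumed inputs were positive; handle negatives.")
    reloaded = ReflectionMemory.load(store)  # simulates a later run
    prompt = reloaded.as_prompt("Implement absolute_value(x).")
```

Because the memory is plain text, it can be read, edited, or cleared by hand, which is what makes the pattern easy to inspect.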

Advantages of Reflexion

Better retry quality: the agent starts the next attempt with more context.
No retraining loop: teams can improve behavior without fine-tuning after every failure.
Easy to inspect: reflections are readable, which helps debugging.
Fits agent workflows: it works naturally with planning, acting, and evaluation loops.
Reusable lessons: past mistakes can inform future tasks with similar failure modes.

Challenges in Reflexion

Reflection quality: weak critiques can reinforce the wrong lesson.
Memory management: too much stored context can add noise.
Evaluation dependence: the loop works best when failures are detectable.
Prompt sensitivity: small wording changes in the reflection prompt can shift the quality of the reflections.
No weight-level learning: it improves behavior at runtime, not the base model itself.
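
One possible mitigation for the memory management challenge is to cap the number of stored reflections and skip near-duplicates. The sketch below is an assumption about how a team might do this, not a prescribed part of Reflexion. BoundedReflections is a hypothetical name, and the duplicate check is a simple case-insensitive and whitespace-insensitive comparison.

```python
from collections import deque
from typing import Deque, List


def _normalize(note: str) -> str:
    # Collapse whitespace and case so trivial variations count as duplicates.
    return " ".join(note.split()).lower()


class BoundedReflections:
    """Keeps only the most recent distinct reflections."""

    def __init__(self, max_items: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, note: str) -> bool:
        """Store the note; return False if an equivalent note is already kept."""
        key = _normalize(note)
        if any(_normalize(existing) == key for existing in self._items):
            return False
        self._items.append(note.strip())
        return True

    def items(self) -> List[str]:
        return list(self._items)


mem = BoundedReflections(max_items=2)
added_first = mem.add("Handle empty input.")
added_duplicate = mem.add("handle  empty input.")  # same lesson, different spacing
mem.add("Check negative numbers.")
mem.add("Validate the return type.")  # pushes out the oldest note
kept = mem.items()
```

A cap like this trades recall for focus: older lessons are lost, but the prompt stays short. It does nothing for reflection quality, which still depends on the evaluator and the reflection prompt.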

Example of Reflexion in Action

Scenario: a coding agent writes a function, tests it, and the test fails because it missed an edge case.

The agent then produces a short reflection, such as noting that it assumed only positive inputs, and stores that note in memory. On the next retry, it reuses that reflection and adjusts the implementation before running tests again.

This is the core Reflexion pattern: failure, self-review, memory, and a smarter retry. For teams building agentic products, that can turn repeated dead ends into faster convergence. (arxiv.org)
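
The scenario above can be simulated end to end without a model. In this sketch the two drafts, the test cases, and the reflection text are all hypothetical stand-ins: write_code pretends to be the coding agent and returns the revised draft only once a reflection exists in memory.

```python
from typing import Callable, List, Optional, Tuple


def first_draft(values: List[int]) -> int:
    largest = 0  # bug: silently assumes at least one non-negative input
    for v in values:
        if v > largest:
            largest = v
    return largest


def revised_draft(values: List[int]) -> int:
    largest = values[0]  # start from real data instead of 0
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest


def write_code(reflections: List[str]) -> Callable[[List[int]], int]:
    # Stand-in for the agent: a stored reflection changes what it produces.
    return revised_draft if reflections else first_draft


def run_tests(fn: Callable[[List[int]], int]) -> Optional[str]:
    """Return a failure message, or None when every case passes."""
    cases = [([1, 5, 3], 5), ([-4, -2, -9], -2)]
    for args, expected in cases:
        got = fn(args)
        if got != expected:
            return f"largest({args}) returned {got}, expected {expected}"
    return None


def reflect(failure: str) -> str:
    # Stand-in for the self-review step that a model would write.
    return (
        f"Test failed: {failure}. I assumed only positive inputs; "
        "start from the first element instead of 0."
    )


reflections: List[str] = []
log: List[Tuple[int, str, bool]] = []
for trial in range(1, 4):
    candidate = write_code(reflections)
    failure = run_tests(candidate)
    log.append((trial, candidate.__name__, failure is None))
    if failure is None:
        break
    reflections.append(reflect(failure))
```

After the loop, log records a failed first trial and a passing second trial, and reflections holds the single note that drove the fix.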

How PromptLayer helps with Reflexion

PromptLayer helps teams track the prompts, outputs, and evals that drive Reflexion-style loops, so reflections and retries are easier to compare over time. That makes it simpler to see which prompt changes improve recovery after failure and which ones add noise.
