Stop sequences
User-defined strings that cause an LLM to halt generation when produced, used to enforce output structure.
What are Stop sequences?
Stop sequences are user-defined strings that tell an LLM to halt generation when they appear in the output. They are commonly used to keep responses within a desired format, limit runaway text, and make completions easier to parse.
In practice, a stop sequence acts like a hard boundary. If the model generates one of the specified strings, the API stops returning more tokens, which is useful for structured outputs, templates, and interactive workflows. OpenAI’s API and Anthropic’s Messages API both document this pattern, with the returned text omitting the stop string itself. (platform.openai.com)
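The hard-boundary behavior can be sketched locally. The following minimal Python helper (truncate_at_stop is an illustrative name, not part of any provider's SDK) mimics how an API cuts the response at the earliest stop sequence and omits the stop string itself:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Return text up to, and excluding, the earliest stop sequence.

    Mirrors the documented behavior: the stop string itself is
    not included in the returned text.
    """
    earliest = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            earliest = min(earliest, idx)
    return text[:earliest]

raw = "Paris is the capital of France.###END###Bonus trivia follows..."
print(truncate_at_stop(raw, ["###END###"]))
# -> Paris is the capital of France.
```

Real APIs apply this check token by token during generation, so the model stops producing text as soon as the boundary appears rather than trimming it afterward.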
Understanding Stop sequences
Stop sequences are not a model capability so much as a generation control mechanism exposed by the API layer. Teams use them when they want the model to stop at a known delimiter such as ###, END, or a closing marker in a prompt template. That makes them especially useful when the application expects a clean answer, a single record, or a chunk of text that should not spill into the next section.
They are often paired with prompting patterns that ask the model to emit a specific format, then stop at a sentinel value. For example, a support bot might generate a summary followed by STOP, or a data-extraction prompt might end each item with a delimiter so downstream code can split the response reliably. Some APIs also surface a stop reason so your application can tell whether generation ended naturally, hit a length limit, or stopped because of a custom sequence. (docs.anthropic.com)
Key aspects of Stop sequences include:
- Delimiter-driven control: The model stops when it produces a predefined string.
- Output shaping: They help keep responses inside templates, tables, or short answer formats.
- Parsing support: They make it easier for code to separate one generated section from another.
- Provider-specific behavior: Different APIs expose stop handling in slightly different ways, including returned stop reasons.
- Prompt design dependency: They work best when the prompt clearly tells the model what delimiter to use.
Advantages of Stop sequences
- Cleaner outputs: They reduce the chance that the model continues past the intended response boundary.
- Better structure: They are useful for JSON-like templates, sectioned answers, and extraction tasks.
- Easier automation: Downstream systems can rely on predictable cutoffs.
- Prompt control: They give builders a simple way to constrain generation without extra post-processing.
- Faster iteration: Teams can test delimiters quickly while refining prompt formats in PromptLayer.
Challenges in Stop sequences
- Delimiter collisions: The model may emit the stop string earlier than intended if the chosen delimiter also appears naturally in the content.
- Prompt sensitivity: Small wording changes can affect whether the model uses the delimiter correctly.
- Partial outputs: The response may end before the full answer is complete if the stop sequence appears too soon.
- Provider differences: Stop handling, stop reasons, and model support vary across APIs.
- Brittle workflows: Overreliance on stop strings can make prompts harder to maintain at scale.
Example of Stop sequences in Action
Scenario: A team wants an LLM to draft a customer email and stop once the signature block begins.
They instruct the model to write the email body, then output ###END### when finished. The application sets ###END### as a stop sequence, so the API cuts off generation before the signature or any trailing commentary appears.
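With the OpenAI Python SDK, that setup maps to the `stop` parameter on a chat completion request. A sketch of the request (the model name and prompts are illustrative, and an actual call requires an API key):

```python
# Parameters for a chat completion that halts at ###END###.
# The Chat Completions endpoint accepts up to 4 stop sequences.
request = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "system",
            "content": "Write the email body, then output ###END###. "
                       "Do not include a signature.",
        },
        {
            "role": "user",
            "content": "Draft an apology email for a delayed shipment.",
        },
    ],
    "stop": ["###END###"],
}
# response = client.chat.completions.create(**request)
# response.choices[0].message.content ends before ###END###
```

Because the API omits the stop string, the frontend can render `message.content` directly without trimming the delimiter first.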
The result is a cleaner workflow. The frontend can display the message immediately, while backend code knows the response ended exactly where the team planned. In PromptLayer, that prompt can be versioned, compared, and evaluated so the stop behavior stays consistent across revisions.
How PromptLayer helps with Stop sequences
PromptLayer helps teams track which prompt versions rely on stop sequences, compare how different delimiters affect output quality, and monitor whether generations end where expected. That makes it easier to manage structured completions without losing visibility into prompt changes or model behavior.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.