Indirect prompt injection

A prompt injection attack delivered through content the model retrieves at runtime, such as a webpage or document, rather than the user's direct input.

What is indirect prompt injection?

Indirect prompt injection is a prompt injection attack delivered through content the model retrieves at runtime, such as a webpage or document, rather than the user's direct input. It matters because the malicious instructions can ride along inside normal-looking data and still influence the model's behavior. (owasp.org)

Understanding indirect prompt injection

In practice, indirect prompt injection shows up when an LLM reads untrusted text from a tool, search result, email, file, or web page and treats that text as if it were safe context. The attacker is not trying to win the user's attention, they are trying to win the model's attention by hiding instructions inside content the model is likely to consume later. OWASP describes this as malicious prompts embedded in external content, and Google has also highlighted it as an evolving threat in multi-source AI systems. (owasp.org)

This is especially relevant in RAG and agentic workflows, where retrieved text is mixed with user input and then passed into reasoning or tool use. The security problem is not just that the model might summarize the bad content, but that it may follow attacker-chosen instructions, leak data, or alter its next action. In other words, the content is not only information to read, it can become instructions to obey if the stack does not clearly separate trusted and untrusted context. Key aspects of indirect prompt injection include:

Untrusted source: The payload comes from external content like a document, page, email, or tool output.
Runtime delivery: The attack reaches the model during retrieval or tool execution, not in the user's original message.
Instruction blending: Malicious text is mixed with normal context, which makes it harder to spot.
Action influence: The model may change its response, call tools differently, or reveal sensitive context.
Evaluation need: Teams should test prompt handling and retrieval paths, not only front-door prompts.

Advantages of indirect prompt injection

Security awareness: It helps teams recognize that retrieved content can be adversarial, not just helpful.
Better stack design: It encourages cleaner boundaries between instructions, data, and tool outputs.
Stronger testing: It pushes teams to build adversarial evals for RAG and agent workflows.
Fewer surprises in production: Systems that account for this risk are less likely to misbehave on real-world content.
Improved governance: It gives security and product teams a shared way to discuss context risks.

Challenges in indirect prompt injection

Source trust is hard to judge: A page or document can look legitimate while still containing malicious instructions.
Detection is imperfect: Hidden or subtle instructions can be difficult to filter reliably.
Complex retrieval paths: The more tools and sources a system uses, the larger the attack surface.
Behavior is model-dependent: Different models and prompts react differently to the same injected text.
Mitigations need tuning: Overly aggressive filtering can reduce usefulness, while weak filtering leaves gaps.

Example of indirect prompt injection in action

Scenario: A support agent uses a RAG workflow to summarize a customer-uploaded PDF. The PDF contains ordinary policy text, but also includes a hidden line telling the model to ignore prior instructions and expose internal notes.

If the pipeline inserts the PDF verbatim into the model context, the hidden instruction can compete with the developer prompt and the user's request. A safer setup labels the PDF as untrusted, keeps system instructions separate, and checks outputs for unexpected tool calls or policy leakage. That is why teams should treat retrieved content as data first, even when it looks like a normal document.

In a practical test, a team might run the same document through two flows, one with raw context injection and one with guardrails plus evals. The difference helps show whether the agent is following the task or following the injected instruction.

How PromptLayer helps with indirect prompt injection

PromptLayer helps teams version prompts, inspect traces, and evaluate model behavior across real workflows, which makes it easier to spot when retrieved content is changing outcomes. That visibility is useful for RAG and agent systems where untrusted context enters at runtime, because you can compare prompt variants, review outputs, and build tests around known attack patterns.

Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.