MCP sampling

An MCP capability where the server requests the client's LLM to generate a completion, letting servers leverage user-side model access.

What is MCP sampling?

‍

MCP sampling is a Model Context Protocol capability that lets a server ask the client to generate an LLM completion on its behalf. In practice, the client keeps control of model access and permissions while the server can still use AI for tasks like drafting, summarizing, or deciding next steps. (modelcontextprotocol.io)

Understanding MCP sampling

‍

In MCP, sampling is not a standalone model API. It is a protocol flow where a server sends a sampling request through the client, and the client forwards that request to the model it already has access to. That makes it useful when the server wants AI help but should not hold its own model key or own the full inference path. (modelcontextprotocol.io)

This pattern fits naturally into agentic workflows. For example, a tool can inspect data, then ask the client’s model to write a summary or choose an action, and then return that result to the server. MCP’s sampling design also assumes a human-in-the-loop review flow, with users able to review or edit requests and responses before they are delivered. (modelcontextprotocol.io)

Key aspects of MCP sampling include:

Client-owned model access: The client handles the model call, so servers do not need direct API credentials.
Server-initiated generation: A server can request a completion during another MCP operation such as a tool call or resource read.
User review control: Implementations are expected to support review, editing, or denial before a response is sent.
Capability negotiation: Clients must declare sampling support, and some features like tool use are advertised separately.
Cross-provider design: The spec is intended to work across model providers, not only one vendor.

Advantages of MCP sampling

‍

No server-side model key required: Servers can use AI without managing separate inference credentials.
Better permission boundaries: The client stays in control of which model is used and when it runs.
Fits agentic systems: It supports nested reasoning inside tools and workflows.
Flexible UX: Clients can expose approvals, edits, or confirmations in a way that matches their product.
Provider-agnostic pattern: Teams can swap underlying model providers more easily.

Challenges in MCP sampling

‍

Extra coordination: The request must move through the client, which adds flow complexity.
User experience design: Good review and approval UI takes care to get right.
Capability matching: Servers need to know whether the client supports sampling features they want to use.
Context boundaries: Teams need to think carefully about what context should be shared in each sampling request.
Latency tradeoff: A client-mediated model call can add round-trip time versus calling a model directly.

Example of MCP sampling in action

‍

Scenario: a knowledge-base server processes a user request to summarize a long incident report. The server extracts the relevant text, then asks the client to sample a completion that turns the raw notes into a concise summary.

The client sends that request to its chosen model, shows the prompt to the user for review, and returns the generated summary back to the server. The server then stores or displays the result as part of the original workflow.

This is useful when the server needs a smart step, but the organization wants model choice, approvals, and data exposure to remain under client control.

How PromptLayer helps with MCP sampling

‍

MCP sampling is still a model interaction, which means teams often want visibility into prompts, outputs, and evaluation quality. PromptLayer helps you track those generations, compare prompt versions, and review outputs across agent workflows, including patterns that rely on nested model calls like sampling.

Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.