Cost per request
The average LLM spend per API call, used as a basic monitoring metric for LLM applications.
What is Cost per Request?
Cost per request is the average LLM spend per API call, used as a basic monitoring metric for LLM applications. Because model pricing is usually tied to tokens and usage patterns, teams track this number to understand how much each request costs in practice. OpenAI, for example, publishes token-based pricing for its API models, which is why per-request cost is such a common operational metric. (openai.com)
Understanding Cost per Request
In practice, cost per request turns raw usage data into a simple unit-economics signal. A team can total the spend for a period, then divide by the number of API calls to see the average cost of serving one request. That makes it easier to compare prompts, models, routes, environments, or customer segments without getting lost in token-level detail.
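The division above can be sketched in a few lines. This is a minimal illustration, not a specific vendor API; the function name and the guard against zero requests are my own choices.

```python
# Minimal sketch: average LLM spend per API call over a period.
# cost_per_request is an illustrative helper, not a library function.

def cost_per_request(total_spend_usd: float, request_count: int) -> float:
    """Total spend for a period divided by the number of API calls."""
    if request_count == 0:
        return 0.0  # avoid dividing by zero on idle routes
    return total_spend_usd / request_count

print(cost_per_request(300.0, 20_000))  # 0.015
```

The same function works at any grouping level: pass in the totals for a single route, environment, or customer segment to compare their unit costs on the same scale.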
The metric is especially useful because LLM costs can vary by model choice, prompt length, output length, cached tokens, and tool use. Tracking average spend per call helps teams spot drift early, like a new release that increases output length or a workflow that quietly switches to a more expensive model. Helicone’s cost tracking docs describe using per-request and average cost views to understand spending and unit economics across an application. (docs.helicone.ai)
Key aspects of cost per request include:
- Average, not absolute: It summarizes spend across many calls, so it is best used as a trend metric rather than the exact cost of any single request.
- Model-sensitive: Different model families and tiers can change the metric dramatically, even when the product experience looks the same.
- Prompt-length-aware: Longer inputs and outputs usually raise cost, so prompt bloat shows up quickly here.
- Useful for segmentation: Break it down by route, user type, environment, or feature to find expensive workflows.
- Pairs well with volume: A low cost per request can still become expensive at high request volume, so teams should watch both.
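The segmentation point above can be sketched with a small grouping pass over request logs. The log entries and route names here are hypothetical; real systems would pull these fields from their observability or billing data.

```python
from collections import defaultdict

# Hypothetical per-request log entries: (route, cost_usd).
logs = [
    ("faq", 0.004), ("faq", 0.006),
    ("research", 0.030), ("research", 0.050),
]

# Group costs by route, then average each group.
by_route: dict[str, list[float]] = defaultdict(list)
for route, cost in logs:
    by_route[route].append(cost)

for route, costs in sorted(by_route.items()):
    avg = sum(costs) / len(costs)
    print(f"{route}: ${avg:.3f} per request")
```

Breaking the average down this way is often what turns the metric from a finance number into an engineering signal: the expensive route, not the global mean, is what points at the workflow to optimize.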
Advantages of Cost per Request
Cost per request helps teams:
- Understand unit economics: It gives a clear view of how much one call costs to serve.
- Catch regressions quickly: Sudden increases can reveal prompt drift, model changes, or tool overuse.
- Compare workflows: Teams can benchmark different prompts, models, or product paths on the same scale.
- Support budgeting: It turns usage into a predictable planning metric for finance and engineering.
- Guide optimization: It helps prioritize caching, prompt trimming, routing, and model selection work.
Challenges in Cost per Request
Like any average, the metric has tradeoffs:
- Hides distribution: Averages can conceal a small set of very expensive requests.
- Depends on attribution: Shared prompts, retries, and multi-step chains can make per-call allocation tricky.
- Varies by provider: Token accounting, cached inputs, and tool pricing can differ across vendors.
- Can miss quality context: Lower cost is not automatically better if response quality falls.
- Needs consistent measurement: If logging is incomplete, the average may understate real spend.
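The "hides distribution" tradeoff is easy to demonstrate with made-up numbers: a handful of expensive calls can pull the mean far above the typical request. The cost values below are invented for illustration.

```python
import statistics

# Hypothetical distribution: 95 cheap calls plus 5 expensive ones.
costs = [0.002] * 95 + [0.50] * 5

mean = statistics.mean(costs)      # pulled up by the expensive tail
median = statistics.median(costs)  # what a typical request costs

print(f"mean: ${mean:.4f}, median: ${median:.4f}")
```

Here the mean is more than ten times the median, so a team watching only the average would overestimate the typical request and miss that a small set of calls drives most of the spend. Pairing the average with percentiles avoids this blind spot.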
Example of Cost per Request in Action
Scenario: A support assistant handles 20,000 requests in a week and the team spends $300 on model usage.
The cost per request is $300 divided by 20,000, or $0.015 per call. If the team later changes the prompt and the average rises to $0.022, they know the new workflow is about 47% more expensive and can investigate whether longer outputs, a new model, or extra tool calls are responsible.
That same number becomes even more useful when broken down by route. A concise FAQ endpoint might stay near $0.005 per request, while a research assistant with retrieval and longer answers might cost far more. The average helps the team set budgets, choose routing rules, and decide where PromptLayer should surface alerts and comparisons.
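The regression check from the scenario above can be expressed as a simple percent-change comparison. The 20% alert threshold is an illustrative choice, not a standard.

```python
# Sketch: flag a cost-per-request regression between two releases.

def pct_change(before: float, after: float) -> float:
    """Percent change from a baseline value."""
    return (after - before) / before * 100

baseline = 0.015  # cost per request before the prompt change
current = 0.022   # cost per request after

change = pct_change(baseline, current)
print(f"{change:.0f}% change")
if change > 20:  # hypothetical alert threshold
    print("investigate: longer outputs, model choice, or extra tool calls")
```

In practice this comparison runs per route or per prompt version, so an alert points at the specific change that moved the number rather than the whole application.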
How PromptLayer Helps with Cost per Request
PromptLayer helps teams track request-level usage, compare prompt versions, and connect performance with spend so cost per request becomes a practical operating metric. With visibility into prompts, models, and workflows, PromptLayer makes it easier to see where costs rise and which changes are worth keeping.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.