
Faster Agents with Automatic Prompt Caching

Heroku is launching automatic prompt caching on December 18, 2025. Prompt caching delivers a notable, zero-effort performance boost for Heroku Managed Inference and Agents: enabled by default, it is designed to return significantly faster responses for common workloads. We have taken a pragmatic approach and currently cache only system prompts and tool definitions, not user messages or conversation history. You can disable caching for any request by setting the X-Heroku-Prompt-Caching: false header.

What is prompt caching and how does it speed up AI inference?

Prompt caching is an optimization that speeds up inference by securely caching and reusing the processed system prompts and tool definitions from your requests.
For agent applications, a large portion of each request stays the same from call to call. Instead of reprocessing that content on every call, Heroku can now reuse the processed result from a secure cache. While we evaluate the system, billing is unchanged to keep things simple: we don't charge for cache writes, nor do we yet pass on the per-token savings from cache hits.
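
To make that concrete, here's a minimal sketch of the request pattern that benefits, assuming the Managed Inference add-on's standard INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID config vars and its OpenAI-style /v1/chat/completions endpoint; the prompt, tool, and helper names are illustrative placeholders:

```python
import os

import requests

# Standard config vars provisioned by the Heroku Managed Inference
# add-on (names may differ if your attachment uses an alias).
INFERENCE_URL = os.environ["INFERENCE_URL"]
INFERENCE_KEY = os.environ["INFERENCE_KEY"]
MODEL_ID = os.environ["INFERENCE_MODEL_ID"]

# The static part of an agent request: identical on every call, so it
# is exactly what automatic caching can reuse. (Contents here are
# placeholders; a real system prompt would be far longer.)
SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Fetch an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
]


def ask(user_message: str) -> dict:
    """One chat completion: only the user message varies between calls,
    so the system prompt and tools hit the cache after the first request."""
    resp = requests.post(
        f"{INFERENCE_URL}/v1/chat/completions",
        headers={"Authorization": f"Bearer {INFERENCE_KEY}"},
        json={
            "model": MODEL_ID,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_message},
            ],
            "tools": TOOLS,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```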

How prompt caching works on Heroku

When your application sends an AI request, Heroku intelligently adds cache checkpoints to system prompts or tool definitions before securely passing the request to the model.

  1. First Request: A request with a new, substantial prompt (meeting a token minimum) is processed normally, and its results are securely cached.
  2. Similar Requests: Subsequent requests with the same initial prompt or tools reuse the cached components, skipping most of the computation for a faster response.
  3. Automatic Expiration: The cache automatically expires after five minutes of inactivity.

This mechanism applies to all supported models, but caching only occurs when content meets the minimum token threshold, focusing performance gains where they add the most value.
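
As a rough illustration of that lifecycle, reusing the hypothetical ask() helper from the sketch above (actual latencies depend on the model, prompt size, and load):

```python
import time

# Assumes the ask() helper from the earlier sketch.

start = time.perf_counter()
ask("Where is order 42?")   # 1. new prompt: processed fully, then cached
cold = time.perf_counter() - start

start = time.perf_counter()
ask("Where is order 43?")   # 2. same system prompt and tools: cache hit
warm = time.perf_counter() - start

print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")

# 3. After five minutes with no matching requests, the cache entry
#    expires and the next call is processed (and cached) from scratch.
```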

Supported models and caching details

Caching behavior is model-specific, as different models have different thresholds and capabilities (such as caching tool definitions).

| Model | Vendor | System Prompts | Tools | Minimum Tokens Required |
|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | ✓ | ✓ | 1,024 |
| Claude Haiku 4.5 | Anthropic | ✓ | ✓ | 4,096 |
| Claude Sonnet 4 | Anthropic | ✓ | ✓ | 1,024 |
| Claude Sonnet 3.7 | Anthropic | ✓ | ✓ | 1,024 |
| Claude Haiku 3.5 | Anthropic | ✓ | ✓ | 2,048 |
| Nova Pro | Amazon | ✓ | — | 1,000 |
| Nova Lite | Amazon | ✓ | — | 1,000 |
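
If you're unsure whether a static prompt clears a model's minimum, a quick character-count heuristic gives a first approximation. This sketch assumes roughly four characters per English token and uses hypothetical model keys; for exact numbers, count with the vendor's tokenizer:

```python
# Crude pre-flight check: ~4 characters per token is a common rule of
# thumb for English text. Thresholds mirror the table above; the model
# keys are hypothetical labels, not official model IDs.
MIN_TOKENS = {
    "claude-sonnet-4.5": 1024,
    "claude-haiku-4.5": 4096,
    "nova-pro": 1000,
}


def likely_cacheable(prompt: str, model: str) -> bool:
    """Return True if the prompt probably meets the model's caching minimum."""
    approx_tokens = len(prompt) / 4  # rough estimate, not a tokenizer
    return approx_tokens >= MIN_TOKENS[model]


print(likely_cacheable("You are a support agent. ..." * 200, "claude-sonnet-4.5"))
```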

Enterprise-grade security and privacy

Privacy and security are fundamental to Heroku Managed Inference and Agents. Our prompt caching feature is built on proven security infrastructure, protecting your data with enterprise-grade measures like cryptographic hashing and automatic expiration. The cache exists only in secure memory, never in persistent storage. Caching applies only to system prompts and tool definitions, never to user messages or conversation history.

How to disable prompt caching on sensitive workflows (opt-out)

While prompt caching offers significant benefits, you retain full control. You can disable caching for any request by adding a single HTTP header:

X-Heroku-Prompt-Caching: false

This is useful for highly sensitive workflows or for establishing performance baselines during testing.
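
For example, assuming the same add-on config vars as the earlier sketch, a sensitive request might opt out like this:

```python
import os

import requests

# Same add-on config vars as the earlier sketch; the extra header is
# the only change needed to opt this one request out of caching.
resp = requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['INFERENCE_KEY']}",
        "X-Heroku-Prompt-Caching": "false",  # disable caching for this call
    },
    json={
        "model": os.environ["INFERENCE_MODEL_ID"],
        "messages": [
            {"role": "system", "content": "You handle confidential records. ..."},
            {"role": "user", "content": "Summarize this intake note: ..."},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```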

Build faster agents

Prompt caching is another step in making Heroku Managed Inference and Agents easy, secure, and efficient for building AI applications. It provides a zero-effort performance boost, transparently accelerating your applications without changing your logic.

We’re excited to see this speed improvement enhance the workflows, document processing, and code-generation tools you’re building on Heroku.

Ready to Get Started?

Stay focused on building great data-driven applications and let Heroku tackle the rest.

Talk to a Heroku Rep   Sign Up Now