Mandeep Bal
- News
- Last Updated: January 15, 2026
- Anush DSouza, Mandeep Bal
Today, we are announcing the general availability of reranking models on Heroku Managed Inference and Agents, featuring support for Cohere Rerank 3.5 and Amazon Rerank 1.0.
Semantic reranking models score documents based on their relevance to a specific query. Unlike keyword search or vector similarity, rerank models understand nuanced semantic relationships to identify the most relevant documents for a given question. Reranking acts as your RAG pipeline’s high-fidelity filter, decreasing noise and token costs by identifying which documents best answer the specific query.
- News
- Last Updated: December 04, 2025
- Anush DSouza, Mandeep Bal
Heroku is launching automatic prompt caching starting December 18, 2025. Prompt caching delivers a notable, zero-effort performance increase for Heroku Managed Inference and Agents. Enabled by default, this feature is designed to deliver significantly faster responses for common workloads. We have taken a pragmatic approach and currently only enabled this to cache system prompts and tool definition, and not user messages or conversation history. You can disable caching for any request by setting X-Heroku-Prompt-Caching: false.
Subscribe to the full-text RSS feed for Mandeep Bal.