Mandeep Bal

Software Engineering LMTS at Heroku

Heroku Staff

Optimize Search Precision with Reranking on Heroku AI

News
Last Updated: January 15, 2026
Anush DSouza, Mandeep Bal

Today, we are announcing the general availability of reranking models on Heroku Managed Inference and Agents, featuring support for Cohere Rerank 3.5 and Amazon Rerank 1.0.

Semantic reranking models score documents based on their relevance to a specific query. Unlike keyword search or vector similarity, rerank models understand nuanced semantic relationships to identify the most relevant documents for a given question. Reranking acts as your RAG pipeline’s high-fidelity filter, decreasing noise and token costs by identifying which documents best answer the specific query.
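As a rough illustration of the filtering step described above, the sketch below keeps only the highest-scoring documents before they are placed in a prompt. It assumes a Cohere-style rerank response, that is, a list of results each carrying an index into the submitted documents and a relevance_score. The helper name select_context and its top_n and min_score parameters are hypothetical and are not part of any Heroku API.

```python
from typing import Dict, List


def select_context(
    documents: List[str],
    results: List[Dict],
    top_n: int = 3,
    min_score: float = 0.0,
) -> List[str]:
    """Return the documents worth sending to the model, best first.

    `results` is assumed to look like a Cohere-style rerank response:
    [{"index": 2, "relevance_score": 0.91}, ...]
    """
    # Sort by relevance, highest first, in case the response is unordered.
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    # Keep at most top_n results, dropping any below the score threshold.
    return [
        documents[r["index"]]
        for r in ranked[:top_n]
        if r["relevance_score"] >= min_score
    ]
```

Passing fewer, better-matched documents to the model is where the reduction in noise and token cost would come from. Reasonable values for top_n and min_score depend on the corpus and need tuning.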

Faster Agents with Automatic Prompt Caching

News
Last Updated: December 04, 2025
Anush DSouza, Mandeep Bal

Heroku is launching automatic prompt caching starting December 18, 2025. Prompt caching delivers a zero-effort performance increase for Heroku Managed Inference and Agents. Enabled by default, this feature is designed to deliver significantly faster responses for common workloads. We have taken a pragmatic approach: for now, caching applies only to system prompts and tool definitions, not to user messages or conversation history. You can disable caching for any request by setting the header X-Heroku-Prompt-Caching: false.
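A minimal sketch of the per-request opt-out might look like the following. Only the X-Heroku-Prompt-Caching header and its false value come from the announcement. The build_chat_request helper, the /v1/chat/completions path, the bearer-token authorization, and the payload shape are assumptions made for illustration.

```python
import json
from typing import Dict, List, Tuple


def build_chat_request(
    base_url: str,
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    use_prompt_cache: bool = True,
) -> Tuple[str, Dict[str, str], bytes]:
    """Assemble the URL, headers, and JSON body for a chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    # Caching is on by default, so the header is sent only to opt out.
    if not use_prompt_cache:
        headers["X-Heroku-Prompt-Caching"] = "false"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    # The endpoint path is an assumption, not taken from the announcement.
    url = base_url.rstrip("/") + "/v1/chat/completions"
    return url, headers, body
```

The returned values can be handed to any HTTP client. Keeping request construction separate from sending makes the opt-out easy to check in a unit test without network access.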


How Fastcall Delivers Enterprise-Scale Voice and Messaging with Heroku

What's New in Heroku AI: New Models and a Flexible Standard Plan
