Optimize Search Precision with Reranking on Heroku AI
- Last Updated: January 15, 2026
Today, we are announcing the general availability of reranking models on Heroku Managed Inference and Agents, featuring support for Cohere Rerank 3.5 and Amazon Rerank 1.0.
Semantic reranking models score documents based on their relevance to a specific query. Unlike keyword search or vector similarity, rerank models understand nuanced semantic relationships to identify the most relevant documents for a given question. Reranking acts as your RAG pipeline’s high-fidelity filter, decreasing noise and token costs by identifying which documents best answer the specific query.
Implement two-stage retrieval with the Heroku Rerank API
The Heroku Managed Inference API is designed to be compatible with the Cohere request format. Integrate reranking into your existing RAG (retrieval-augmented generation) stack by sending a POST request to the /v1/rerank endpoint.
To get started, provision a model via the Heroku CLI:
heroku ai:models:create -a your-app-name cohere-rerank-3-5 --as RERANK
Once the model is provisioned, set your environment variables (RERANK_URL, RERANK_KEY, and RERANK_MODEL_ID in the example below) and implement reranking with a single request. In this example, we rank four technical documents by how well they answer a query about database connection pooling:
// Send the query and candidate documents to the rerank endpoint.
const response = await fetch(`${process.env.RERANK_URL}/v1/rerank`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.RERANK_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: process.env.RERANK_MODEL_ID,
    query: 'How do I optimize database connection pooling?',
    documents: [
      'Connection pooling reduces overhead by reusing existing database connections.',
      'You can monitor application performance using built-in metrics and logging tools.',
      'Set max pool size based on your dyno count to prevent connection exhaustion.',
      'Regular database backups are essential for disaster recovery planning.'
    ],
    // Return only the two highest-scoring documents.
    top_n: 2
  })
});

const { results } = await response.json();
console.log(results);
/*
[
  { index: 0, relevance_score: 0.5948578715324402 },
  { index: 2, relevance_score: 0.42105236649513245 }
]
*/
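The request above can be wrapped as the second stage of a two-stage retrieval flow. The following is a minimal sketch, not an official client: the function name rerankCandidates, the injectable fetchImpl parameter, and the error handling are assumptions added for illustration, and the candidates are assumed to come from your own first-stage keyword or vector search.

```javascript
// Stage 2 of a two-stage pipeline: rerank candidates from a first-stage search.
// `rerankCandidates` and `fetchImpl` are hypothetical names, not part of the Heroku API.
// `fetchImpl` defaults to the global fetch and can be swapped for a stub in tests.
async function rerankCandidates(query, candidates, topN, fetchImpl = fetch) {
  const response = await fetchImpl(`${process.env.RERANK_URL}/v1/rerank`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.RERANK_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: process.env.RERANK_MODEL_ID,
      query,
      documents: candidates,
      top_n: topN
    })
  });

  // Fail loudly on non-2xx responses instead of parsing an error payload as results.
  if (!response.ok) {
    throw new Error(`Rerank request failed with status ${response.status}`);
  }

  const { results } = await response.json();
  return results;
}
```

In use, a broad first-stage search might return a few dozen candidate passages, and rerankCandidates(query, candidates, 5) would narrow them to the five the model scores highest.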
Analyzing reranker relevance scores
The rerank endpoint returns a results array that lets you map scores back to your original data. Each item contains an index, which is the zero-based position of the document in the input documents array, and a relevance_score, which is a normalized float where higher values indicate better alignment with the query. In the example above, index 0 and index 2 point to the two documents that discuss connection pooling. This structure lets teams set strict quality thresholds, passing information to the AI agent only if the reranker confirms it is highly relevant.
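As a rough illustration of that mapping and thresholding, the sketch below joins the results from the earlier example back to the input documents and drops anything under a cutoff. The helper name selectRelevant and the 0.5 threshold are assumptions chosen for the example; a suitable cutoff depends on your own data.

```javascript
// Join rerank results back to the input documents and keep only confident matches.
// `selectRelevant` and `minScore` are hypothetical; tune the threshold on your own data.
function selectRelevant(documents, results, minScore) {
  return results
    .filter(({ relevance_score }) => relevance_score >= minScore)
    .map(({ index, relevance_score }) => ({
      text: documents[index], // index is the position in the original input array
      relevance_score
    }));
}

const documents = [
  'Connection pooling reduces overhead by reusing existing database connections.',
  'You can monitor application performance using built-in metrics and logging tools.',
  'Set max pool size based on your dyno count to prevent connection exhaustion.',
  'Regular database backups are essential for disaster recovery planning.'
];

// Results copied from the example response above.
const results = [
  { index: 0, relevance_score: 0.5948578715324402 },
  { index: 2, relevance_score: 0.42105236649513245 }
];

// With a 0.5 cutoff, only the document at index 0 is kept.
console.log(selectRelevant(documents, results, 0.5));
```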
Top-N context filtering for accuracy and reduced cost
The top_n parameter allows you to limit the number of results returned. This is particularly useful for retrieving only the most relevant documents to keep your context window clean and reduce inference costs. If not specified, the API will return scores for all provided documents.
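One way to keep top_n optional in application code is to add it to the request body only when the caller supplies it. This is a minimal sketch; buildRerankBody is a hypothetical helper name, not part of the Heroku API.

```javascript
// Build a /v1/rerank request body, including top_n only when the caller sets it.
// `buildRerankBody` is a hypothetical helper, not part of the Heroku API.
function buildRerankBody(model, query, documents, topN) {
  const body = { model, query, documents };
  // Omitting top_n asks the API to score every provided document.
  if (Number.isInteger(topN) && topN > 0) {
    body.top_n = topN;
  }
  return body;
}

const docs = ['first passage', 'second passage', 'third passage'];
const scoreAll = buildRerankBody('cohere-rerank-3-5', 'example query', docs);
const topTwo = buildRerankBody('cohere-rerank-3-5', 'example query', docs, 2);

console.log('top_n' in scoreAll); // false
console.log(topTwo.top_n);        // 2
```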
Regional availability and limits
To support global performance and data residency requirements, these models are available in both US and EU regions. Each model has its own rate limit and per-query price:
- Cohere Rerank 3.5 supports up to 250 requests per minute at $2.00 per 1,000 queries.
- Amazon Rerank 1.0 supports up to 200 requests per minute at $1.00 per 1,000 queries.
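For capacity planning, the figures above translate into a minimum spacing between requests and an estimated spend. The sketch below is back-of-the-envelope arithmetic only: it assumes each request is billed as one query and that requests are spread evenly across the minute, and the object keys and function names are illustrative rather than API identifiers.

```javascript
// Limits and prices copied from the list above; keys are display names, not model IDs.
const RERANK_MODELS = {
  'Cohere Rerank 3.5': { requestsPerMinute: 250, usdPer1000Queries: 2.0 },
  'Amazon Rerank 1.0': { requestsPerMinute: 200, usdPer1000Queries: 1.0 }
};

// Minimum delay between requests (ms) to stay under the per-minute limit,
// assuming requests are spaced evenly.
function minIntervalMs(modelName) {
  return 60000 / RERANK_MODELS[modelName].requestsPerMinute;
}

// Estimated cost in USD, assuming one billed query per request.
function estimateCostUsd(modelName, queryCount) {
  return (queryCount / 1000) * RERANK_MODELS[modelName].usdPer1000Queries;
}

console.log(minIntervalMs('Cohere Rerank 3.5'));          // 240
console.log(minIntervalMs('Amazon Rerank 1.0'));          // 300
console.log(estimateCostUsd('Cohere Rerank 3.5', 50000)); // 100
console.log(estimateCostUsd('Amazon Rerank 1.0', 50000)); // 50
```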
Next steps
By bringing managed reranking to Heroku, we are giving developers the tools to build enterprise-grade search and retrieval without the overhead of managing infrastructure. Visit the Heroku Managed Inference Documentation for full technical specifications and implementation guides.