
Optimize Search Precision with Reranking on Heroku AI

Today, we are announcing the general availability of reranking models on Heroku Managed Inference and Agents, featuring support for Cohere Rerank 3.5 and Amazon Rerank 1.0.

Semantic reranking models score documents by how well they answer a specific query. Unlike keyword search or vector similarity, rerank models capture nuanced semantic relationships between the query and each candidate document. In a RAG pipeline, reranking acts as a high-fidelity filter, reducing noise and token costs by surfacing only the documents that best answer the question.

Implement two-stage retrieval with the Heroku Rerank API

The Heroku Managed Inference API is designed to be compatible with the Cohere format, so you can integrate reranking into your existing RAG (retrieval-augmented generation) stack by sending a request to the /v1/rerank endpoint.

To get started, provision a model via the Heroku CLI:

heroku ai:models:create -a your-app-name cohere-rerank-3-5 --as RERANK

Once the model is provisioned, you can set your environment variables and implement reranking with a single request. In this example, we identify which technical documents best answer a query about database connection pooling:

const response = await fetch(`${process.env.RERANK_URL}/v1/rerank`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.RERANK_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: process.env.RERANK_MODEL_ID,
    query: 'How do I optimize database connection pooling?',
    documents: [
      'Connection pooling reduces overhead by reusing existing database connections.',
      'You can monitor application performance using built-in metrics and logging tools.',
      'Set max pool size based on your dyno count to prevent connection exhaustion.',
      'Regular database backups are essential for disaster recovery planning.'
    ],
    top_n: 2
  })
});

const { results } = await response.json();
console.log(results);
/*
[
  { index: 0, relevance_score: 0.5948578715324402 },
  { index: 2, relevance_score: 0.42105236649513245 }
]
*/

Analyzing reranker relevance scores

The rerank endpoint returns a result object that lets you map scores back to your original data. Each item in the results array contains an index, the document's original position in the input array, and a relevance_score, a normalized float where higher values indicate closer alignment with the query. This structure lets teams set strict quality thresholds, passing information to the AI agent only when the reranker confirms it is highly relevant.
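
For example, assuming the candidate documents from the request above are kept in a documents array, a minimal sketch that maps scores back to the document text and drops anything below an illustrative threshold of 0.4 might look like this:

const RELEVANCE_THRESHOLD = 0.4; // illustrative cutoff; tune for your own data

// Map each reranked result back to its original document text and
// keep only the documents the reranker scored above the threshold.
const relevantDocs = results
  .filter(result => result.relevance_score >= RELEVANCE_THRESHOLD)
  .map(result => ({
    text: documents[result.index],
    score: result.relevance_score
  }));

console.log(relevantDocs);
// With the scores above, both returned documents clear the 0.4 cutoff,
// in relevance order.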

Top-N context filtering for accuracy and reduced cost

The top_n parameter allows you to limit the number of results returned. This is particularly useful for retrieving only the most relevant documents to keep your context window clean and reduce inference costs. If not specified, the API will return scores for all provided documents.
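
Putting the two stages together, here is a rough sketch of how top_n fits into a RAG flow. The retrieveCandidates helper and userQuestion are hypothetical stand-ins for your own first-stage retrieval and incoming query:

// Hypothetical first stage: a vector or keyword search that returns
// an array of candidate document strings.
const candidates = await retrieveCandidates(userQuestion);

// Second stage: rerank the candidates and keep only the top 3.
const rerankResponse = await fetch(`${process.env.RERANK_URL}/v1/rerank`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.RERANK_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: process.env.RERANK_MODEL_ID,
    query: userQuestion,
    documents: candidates,
    top_n: 3
  })
});

const { results } = await rerankResponse.json();

// Only the top-ranked documents make it into the prompt context,
// keeping the context window small and inference costs down.
const context = results
  .map(result => candidates[result.index])
  .join('\n\n');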

Regional availability and limits

To support global performance and data residency requirements, these models are available in both US and EU regions. Each model has its own rate limit and per-query pricing (see the retry sketch after the list for handling rate-limited requests):

  • Cohere Rerank 3.5 supports up to 250 requests per minute at $2.00 per 1,000 queries.
  • Amazon Rerank 1.0 supports up to 200 requests per minute at $1.00 per 1,000 queries.
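
If you run up against these limits, a small client-side retry helper can smooth over bursts. This is a minimal sketch that assumes rate-limited requests come back as HTTP 429; adjust the status check to match what your responses actually return:

// Retry a rerank request a few times with exponential backoff when
// the response indicates rate limiting (assumed here to be HTTP 429).
async function rerankWithRetry(body, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetch(`${process.env.RERANK_URL}/v1/rerank`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.RERANK_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(body)
    });

    if (response.ok) return response.json();
    if (response.status !== 429 || attempt === maxAttempts) {
      throw new Error(`Rerank request failed with status ${response.status}`);
    }

    // Wait 1s, 2s, 4s, ... before retrying.
    await new Promise(resolve => setTimeout(resolve, 2 ** (attempt - 1) * 1000));
  }
}

You would then call rerankWithRetry with the same request body shown earlier.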

Next steps

By bringing managed reranking to Heroku, we are giving developers the tools to build enterprise-grade search and retrieval without the overhead of managing infrastructure. Visit the Heroku Managed Inference Documentation for full technical specifications and implementation guides.
