Managed Inference and Agents

Whats New in Heroku AI: New Models and a Flexible Standard Plan

News
Last Updated: February 19, 2026
Anush DSouza, Josh Lewis

Heroku is introducing significant updates to Managed Inference and Agents. These changes focus on reducing developer friction, expanding model catalogue, and streamlining deployment workflows.

Code Execution Sandbox for Agents on Heroku

News
Last Updated: February 17, 2026
Anush DSouza

Large language models are good at writing code. Data from Anthropic shows that allowing Claude to execute scripts, rather than relying on sequential tool calls, reduces token consumption by an average of 37%, with some use cases seeing reductions as high as 98%.

Untrusted code needs a secure and isolated place to execute. We solved this with code execution sandboxes (powered by one-off dynos), launched alongside Heroku Managed Inference and Agents in May 2025.

Building AI-Powered Observability with Heroku Managed Inference and Agents

Engineering
Last Updated: February 13, 2026
Karunasri (Karuna) Garigipati

If you’ve ever debugged a production incident, you know the drill: IDE on one screen, Splunk on another, Sentry open in a third tab, frantically copying error messages between windows while your PagerDuty keeps buzzing.

You ask “What errors spiked in the last hour?” but instead of an answer, you have to context-switch, recall complex query syntax, and mentally correlate log timestamps with your code. By the time you find the relevant log, you’ve lost your flow. Meanwhile the incident clock keeps ticking away.

The workflow below fixes that broken loop. We’ll show you how to use the Model Context Protocol (MCP) and Heroku Managed Inference and Agents to pipe those observability queries directly into your IDE, turning manual hunting into instant answers.

Building AI Search on Heroku

Engineering, News
Last Updated: January 29, 2026
Anush DSouza

If you’ve built a RAG (Retrieval Augmented Generation) system, you’ve probably hit this wall: your vector search returns 20 documents that are semantically similar to the query, but half of them don’t actually answer it.

A user asks “how do I handle authentication errors?” and gets back documentation about authentication, errors, and error handling in embedding space, but only one or two are actually useful.

This is the gap between demo and production. Most tutorials stop at vector search. This reference architecture shows what comes next. This AI Search reference app shows you how to build a production grade enterprise AI search using Heroku Managed Inference and Agents.

Optimize Search Precision with Reranking on Heroku AI

News
Last Updated: January 15, 2026
Anush DSouza, Mandeep Bal

Today, we are announcing the general availability of reranking models on Heroku Managed Inference and Agents, featuring support for Cohere Rerank 3.5 and Amazon Rerank 1.0.

Semantic reranking models score documents based on their relevance to a specific query. Unlike keyword search or vector similarity, rerank models understand nuanced semantic relationships to identify the most relevant documents for a given question. Reranking acts as your RAG pipeline’s high-fidelity filter, decreasing noise and token costs by identifying which documents best answer the specific query.

Heroku AI: Accelerating AI Development With New Models, Performance Improvements, and Messages API

News
Last Updated: December 18, 2025
Anush DSouza

This month marks significant expansion for Heroku Managed Inference and Agents, directly accelerating our AI PaaS framework. We’re announcing a substantial addition to our model catalog, providing access to leading proprietary AI models such as Claude Opus 4.5, Nova 2, and open-weight models such as Kimi K2 thinking, MiniMax M2, and Qwen3. These resources are fully managed, secure, and accessible via a single CLI command. We have also refreshed aistudio.heroku.com, please navigate to …

Faster Agents with Automatic Prompt Caching

News
Last Updated: December 04, 2025
Anush DSouza, Mandeep Bal

Heroku is launching automatic prompt caching starting December 18, 2025. Prompt caching delivers a notable, zero-effort performance increase for Heroku Managed Inference and Agents. Enabled by default, this feature is designed to deliver significantly faster responses for common workloads. We have taken a pragmatic approach and currently only enabled this to cache system prompts and tool definition, and not user messages or conversation history. You can disable caching for any request by setting X-Heroku-Prompt-Caching: false.

Heroku AI Studio is Your Workspace for Smarter, Faster AI Apps

News
Last Updated: September 17, 2025
Anush DSouza

Ever found yourself in the endless loop of tweaking a prompt, running your code, and waiting to see if you finally got the output you wanted? That slow, frustrating feedback cycle is a common headache for AI developers. What if you could speed that up and get back to what you do best? Let’s focus on building amazing applications.

We're excited to introduce Heroku AI Studio, a new set of tools designed to streamline your generative AI development from prompt to production. We've focused on creating a more intuitive and efficient workflow, so you can focus on innovation instead of wrestling with your development environment. When using the Heroku Managed Inference and Agents add-on, this new tool is about to become an essential part of your workflow.

Amazon Nova Models: Now Available on Heroku

Engineering
Last Updated: August 26, 2025
Anush DSouza

Building intelligent applications requires powerful, cost-effective AI. Today, we’re simplifying that process by making Amazon’s cutting-edge Nova models directly available via Heroku Managed Inference and Agents. Provisioning these models is as simple as attaching the add-on to your Heroku application, providing a direct, managed path for developers and businesses to leverage a new class of powerful and cost-effective AI models with unparalleled simplicity.

Heroku AI Expands Model Offering with OpenAI’s gpt-oss-120b

Engineering
Last Updated: August 20, 2025
Anush DSouza

Start building with OpenAI’s new open-weight model, gpt-oss-120b, now available on Heroku Managed Inference and Agents. This gives developers a powerful, transparent, and flexible way to build and deploy AI applications on the platform they already trust. Access gpt-oss-120b with our OpenAI-compatible chat completions API, which you can drop into any OpenAI-compatible SDK or framework.

Subscribe to the full-text RSS feed for Managed Inference and Agents.

Heroku March 2026 Update

Heroku March 2026 Update

How Fastcall Delivers Enterprise-Scale Voice and Messaging with Heroku

Heroku March 2026 Update

Managed Inference and Agents

Whats New in Heroku AI: New Models and a Flexible Standard Plan

Code Execution Sandbox for Agents on Heroku

Building AI-Powered Observability with Heroku Managed Inference and Agents

Building AI Search on Heroku

Optimize Search Precision with Reranking on Heroku AI

Heroku AI: Accelerating AI Development With New Models, Performance Improvements, and Messages API

Faster Agents with Automatic Prompt Caching

Heroku AI Studio is Your Workspace for Smarter, Faster AI Apps

Amazon Nova Models: Now Available on Heroku

Heroku AI Expands Model Offering with OpenAI’s gpt-oss-120b