

Triage and Fix with Confidence: heroku run and OTel on Heroku Fir

When Production Goes Sideways

Imagine this: it’s 2 AM, your phone buzzes with an alert, and your dashboards are lighting up. Something in production has gone wrong. Sound familiar? An automated health check has failed, and your internal dashboards show a spike in errors. You’ve just pushed a new release that included a critical database schema change, and a background worker task that depends on it is now failing. The web application is still running, but users are starting to report issues. You need to investigate and fix the problem, but doing so on a running production dyno is risky and could affect your live service.

In the past, you might have used heroku run:inside to connect to a running web dyno and troubleshoot. However, that shares resources with a live process and can destabilize a running production application. Alternatively, you might have used heroku run:detached to run a command in the background, but this doesn’t give you the interactive session you need for real-time diagnostics.

This is a classic developer’s nightmare, and it’s exactly the kind of scenario where Heroku’s next-gen platform capabilities shine. You can now use heroku run to launch a dedicated, interactive one-off dyno for administrative or maintenance tasks, completely separate from your formation dynos. That combination is the key difference. Unlike heroku run:inside, it doesn’t attach to a running formation dyno. Unlike heroku run:detached, it gives you a live interactive session.

Fixing the problem: The power of interactive heroku run

Heroku’s next-gen platform (codename Fir) now supports the heroku run command, which launches a one-off dyno to execute administrative or maintenance tasks for your application. The command starts an interactive CLI session, relaying input and output between your terminal and the running dyno. This closes a functional gap on Fir by providing an isolated yet responsive shell for hands-on operations.

This new interactive capability is perfect for a critical task like a database migration. You get a shell inside a temporary dyno that has your application’s code and environment variables, allowing you to run a migration script and watch the output in real time. The ability to run interactive commands like this in a safe space is paramount for effective debugging and troubleshooting.

To use heroku run on Fir, you must first add a public SSH key to your Heroku account. This is a new security feature: the key provides a robust authentication mechanism for interactive sessions. Your application must also already be deployed.

Then, simply run your command with heroku run. For a migration, you might use a command like this:

$ heroku run -a my-test-application -- rake db:migrate
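If you script maintenance tasks, for example in a runbook, you can build the same invocation programmatically. Below is a minimal Python sketch. It assumes the Heroku CLI is installed and already authenticated on the machine running it. The helper names build_run_argv and run_one_off are hypothetical, not part of any Heroku SDK. Everything after the `--` separator is treated as the command to run on the dyno, so the remote command's own flags are not parsed as Heroku CLI flags.

```python
import shlex
import subprocess
from typing import Sequence


def build_run_argv(app: str, command: Sequence[str]) -> list[str]:
    """Build the argv for `heroku run -a APP -- COMMAND...` (hypothetical helper)."""
    if not app:
        raise ValueError("app name is required")
    if not command:
        raise ValueError("command must not be empty")
    # Arguments after "--" form the command run on the one-off dyno,
    # rather than being interpreted as options to the Heroku CLI itself.
    return ["heroku", "run", "-a", app, "--", *command]


def run_one_off(app: str, command: Sequence[str]) -> int:
    """Run the command interactively and return the CLI's exit code (hypothetical helper)."""
    argv = build_run_argv(app, command)
    print("Running:", shlex.join(argv))
    # No output capture: stdin and stdout stay attached to this terminal,
    # so the interactive session relays I/O just like typing the command by hand.
    return subprocess.run(argv, check=False).returncode
```

Calling run_one_off("my-test-application", ["rake", "db:migrate"]) would run the migration command shown above. Keeping build_run_argv separate makes the command easy to log or review before anything touches production.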

This new workflow provides a much-needed bridge between a quick fix and a full-scale deployment, giving developers the power and flexibility they need to manage their applications more effectively.

Gaining full visibility with OTel

Now that the database migration is complete, how do you know your application’s performance has returned to normal? This is where the new OpenTelemetry (OTel) signal enhancements, natively integrated into the Fir platform, come into play. Heroku’s telemetry provides comprehensive data out of the box and keeps it consistent by adhering to OpenTelemetry semantic conventions.

This adherence to an open standard is not a trivial detail; it is a deliberate design choice that ensures consistency and interoperability. Heroku’s telemetry uses standardized attribute names and formats. As a result, any OTel-compliant observability platform, such as Grafana, Honeycomb, or Datadog, can easily ingest and correlate it. This reduces the risk of vendor lock-in and simplifies integration into an existing observability ecosystem.

  • Request duration percentiles: Heroku’s router now emits detailed metrics that include request duration summary statistics and percentiles. These let you see the “long tail” of performance issues that affect a small but significant share of your users. For example, you can inspect http.server.request.duration.p0-999, the 0.999 quantile (99.9th percentile) of request duration. The heroku.router.connect and heroku.router.service attributes are now captured as floats, providing more precise timing data. This detailed view lets you confirm that your fix restored performance across the whole latency distribution, not just on average.
  • The heroku.app.name attribute: The heroku.app.name attribute is now automatically added to all application signals. This simple addition is incredibly powerful for filtering and analyzing data. You can easily filter all of your logs, metrics, and traces by this attribute to get a unified and complete view of a specific application’s health without having to look up the app UUID. This is especially useful in a microservices architecture where you have multiple applications running in the same space. This holistic data model allows for efficient correlation and analysis across all components of the system.
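To see why tail percentiles matter, the minimal Python sketch below computes nearest-rank percentiles over a list of request durations. It is illustrative only. Heroku’s router computes its own summary statistics, and the percentile helper and the sample data here are hypothetical. The point it shows is that a handful of very slow requests leave the median untouched but dominate the 0.999 quantile.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: the sample at 1-based rank ceil(q * N), for 0 < q <= 1."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical sample: 995 fast requests and 5 very slow ones, in seconds.
durations = [0.05] * 995 + [2.0] * 5

for label, q in [("p50", 0.5), ("p99", 0.99), ("p99.9", 0.999)]:
    print(label, percentile(durations, q))
```

With this data, p50 and p99 both report 0.05 seconds, while p99.9 reports 2.0 seconds. Only the 99.9th percentile exposes the five slow requests.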

Heroku’s platform also emits signals from first-party services like the Heroku Platform API, Heroku Postgres, and Heroku Kafka. These signals are all filterable by the service.name attribute, allowing you to see all activity related to a specific service. This enables a level of operational visibility that is invaluable for root cause analysis.
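Filtering on attributes like these is what an observability backend does when you scope a query to one app or service. As a rough illustration, the Python sketch below models signals as plain records with an OTel-style attribute map and selects those matching heroku.app.name or service.name. The Signal type, the sample records, and the attribute values (my-test-application, checkout-service, heroku-postgres) are hypothetical. In practice you would express the same filter in your backend’s own query language.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Signal:
    """A simplified log/metric/trace record with OTel-style attributes (hypothetical model)."""
    kind: str  # "log", "metric", or "trace"
    body: str
    attributes: dict[str, str] = field(default_factory=dict)


def filter_by_attribute(signals: Iterable[Signal], key: str, value: str) -> list[Signal]:
    """Keep only the signals whose attribute `key` equals `value`."""
    return [s for s in signals if s.attributes.get(key) == value]


# Hypothetical mix of signals from two apps and one first-party service.
signals = [
    Signal("log", "worker job failed", {"heroku.app.name": "my-test-application"}),
    Signal("metric", "http.server.request.duration", {"heroku.app.name": "my-test-application"}),
    Signal("log", "checkout ok", {"heroku.app.name": "checkout-service"}),
    Signal("log", "slow query", {"service.name": "heroku-postgres"}),
]

# One unified view per app, and one per platform service.
app_view = filter_by_attribute(signals, "heroku.app.name", "my-test-application")
postgres_view = filter_by_attribute(signals, "service.name", "heroku-postgres")
```

Because every signal type carries the same attribute key, one filter returns logs and metrics for the app together. There is no need to join on an app UUID.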

A complete debugging workflow

With Heroku Fir, you have a complete and powerful debugging workflow that covers every stage of an incident. It’s a significant leap forward toward improved operational efficiency, reduced risk, and faster incident resolution.

  1. Identify the problem: Start with your observability tools. The OTel signal enhancements, including the new http.server.request.duration router metrics, can help you identify a problem like high latency or a spike in errors. Use the heroku.app.name attribute to filter the traces and quickly pinpoint the affected application.
  2. Fix with confidence: Use the interactive heroku run command to launch a dedicated, one-off dyno to perform your fix. The fix runs in an isolated environment, separate from your live formation dynos, which reduces the risk of maintenance tasks affecting production.
  3. Verify and monitor: Once the fix is complete, use the new OTel signals to confirm that your application’s performance has returned to normal. The request duration percentiles give you a detailed view of latency, and the heroku.app.name attribute ensures you can monitor the long-term health of your application with ease. This comprehensive telemetry provides the data needed for quick and effective root cause analysis, thereby reducing Mean Time to Resolution (MTTR) and improving operational resilience.
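You can partly automate step 3. The minimal Python sketch below assumes you can export recent request durations, in seconds, from your observability backend. It checks whether a tail percentile is back within a tolerance of a pre-incident baseline. The function names, the 10% default tolerance, and the sample numbers are hypothetical choices for illustration, not Heroku defaults or APIs.

```python
import math
from typing import Sequence


def nearest_rank(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 1."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    return ordered[math.ceil(q * len(ordered)) - 1]


def latency_recovered(
    baseline: float,
    current: Sequence[float],
    q: float = 0.99,
    tolerance: float = 0.10,
) -> bool:
    """True if the current q-percentile is at most baseline * (1 + tolerance)."""
    return nearest_rank(current, q) <= baseline * (1 + tolerance)


# Hypothetical pre-incident p99 of 0.20 s, and two exported windows of samples.
baseline_p99 = 0.20
during_incident = [0.1] * 90 + [1.5] * 10
after_fix = [0.1] * 95 + [0.21] * 5

print("recovered during incident:", latency_recovered(baseline_p99, during_incident))
print("recovered after fix:", latency_recovered(baseline_p99, after_fix))
```

Here the incident window fails the check, because its p99 is 1.5 seconds. The post-fix window passes, because its p99 of 0.21 seconds is within 10% of the baseline. A check like this could run on a schedule after a fix, as a complement to watching the dashboards.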

By combining the powerful interactivity of heroku run with the deep insights from native OTel signals, you’re not just fixing problems — you’re building a more resilient and observable application. This is the new era of Heroku development, built to empower you to debug, manage, and scale your applications with unprecedented visibility and control.

Ready to experience this new level of control and visibility? Explore Heroku Fir today.

