Corrective Action Update for the Heroku June 10th Outage

Last Updated: September 5, 2025

At 06:00 UTC on Tuesday, June 10, 2025, Heroku customers began experiencing a platform service disruption caused by an unintended system update that our vendor applied to our production infrastructure. Compounding the issue, the Heroku Status site was itself affected by the outage. Shortcomings in site design, combined with API latency, resulted in timeouts that made the Status site appear as if there were no active incidents.

On June 15th we published a summary of our initial investigation, mitigation, and root cause analysis. We also identified the following post-incident remediation objectives:

Ensuring immutable infrastructure
Increasing resilience of communication channels
Accelerating investigation and recovery

As promised, we are providing a status update on our continued corrective actions.

Ensuring Immutable Infrastructure

Our June 15th commitment to customers

The root cause of this outage was an unexpected change to our running environment. We disabled the automated upgrade service during the incident (June 10), with permanent controls coming early the next week. No system changes will occur outside our controlled deployment process going forward. Additionally, we’re auditing all base images for similar risks and improving our network services to handle graceful service restarts.

Where we are today

To ensure that future system changes occur only in a controlled manner, we:

Implemented a permanent halt on all unattended vendor operating system upgrades
Audited our images to rule out any other sources of mutation
Developed a risk-based strategy for what types of changes could be safely automated as an attended upgrade

For network resiliency, we added automated startup scripts for our networking services. We are also actively working with our colleagues to help maintain and validate our system images.

Increasing Resilience of Communication Channels

Our June 15th commitment to customers

Our status page failed you when you needed it most because our primary communication tools were affected by the outage. We are building backup communication channels that are fully independent to ensure we can always provide timely and transparent updates, even in a worst-case scenario.

Our approach

Our objective is to move as quickly as possible while providing a smooth transition for customer Status site integrations and without compromising our internal operational safeguards.

Where we are today

We immediately added CDN caching to the Heroku Status site for resiliency and optimized our page load state to eliminate the appearance of false negatives. We are methodically migrating our internal and customer-facing integrations to the Salesforce Trust site, including internal release gating, CLI, and App Metrics integrations. We are also working on a formalized backup incident communications channel for business continuity.

On the process side, new Trust site templates and incident commander protocols have been prepared. Heroku has aligned with global incident commander protocols, which require at least one update every 30 minutes for active Sev-0 incidents and at least one update every 60 minutes for Sev-1 and Sev-2 incidents.

The Heroku Status site configuration will be fully migrated to the Salesforce Trust site. Beginning on October 10th, the Salesforce Trust site will serve as the primary channel for all incident and maintenance communications.
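As an illustration only, the update cadence described above (every 30 minutes for Sev-0, every 60 minutes for Sev-1 and Sev-2) could be encoded as a small deadline check. This is a minimal sketch, not Heroku's or Salesforce's tooling; the names MAX_UPDATE_INTERVAL, next_update_due, and is_update_overdue are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Intervals taken from the cadence stated in the text; the names are hypothetical.
MAX_UPDATE_INTERVAL = {
    "sev-0": timedelta(minutes=30),
    "sev-1": timedelta(minutes=60),
    "sev-2": timedelta(minutes=60),
}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return the latest time by which the next incident update should be posted."""
    return last_update + MAX_UPDATE_INTERVAL[severity.lower()]


def is_update_overdue(
    severity: str, last_update: datetime, now: Optional[datetime] = None
) -> bool:
    """Return True if the cadence for this severity has been exceeded."""
    now = now or datetime.now(timezone.utc)
    return now > next_update_due(severity, last_update)
```

A check like this could feed a reminder for the incident commander; an unknown severity raises KeyError here rather than silently picking a default.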

What customers should expect

Incident and Maintenance Notifications

Customers who are currently subscribed to the Heroku Status site will be sent an email to confirm their intent to remain subscribed to incident notifications. Any Status site subscribers who do not explicitly opt out will be automatically subscribed to the new Trust site.

Status API Migration

We are working on a longer-term Status API migration strategy to minimize disruption for customers with Status API integrations. We will keep Heroku customers informed of future migration expectations, provide migration guidance, and ensure that a minimum of 30 days is provided for customers to migrate their Status API integrations.
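For customers who maintain their own status integrations, one way to avoid the false "all clear" described earlier is to treat a timeout or error as an unknown state rather than as "no incidents". The sketch below assumes a hypothetical callable, fetch_incidents, that wraps whatever status request the integration makes and raises on timeout or error; it does not use or describe any real Status or Trust API.

```python
from enum import Enum
from typing import Callable, List


class StatusResult(Enum):
    OK = "no active incidents"
    INCIDENT = "active incidents"
    UNKNOWN = "status unavailable"


def check_status(fetch_incidents: Callable[[], List[dict]]) -> StatusResult:
    """Classify the result of a status lookup.

    fetch_incidents is a hypothetical callable that returns a list of active
    incidents and raises an exception on timeout or any other failure.
    """
    try:
        incidents = fetch_incidents()
    except Exception:
        # A failed lookup tells us nothing; never report it as "all clear".
        return StatusResult.UNKNOWN
    return StatusResult.INCIDENT if incidents else StatusResult.OK
```

Keeping UNKNOWN as a distinct state lets a dashboard or release gate show "status unavailable" instead of a misleading green.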

Migration instructions and guidance

We will provide Status site migration updates and guidance through the following communication channels:

Emails to Status site subscribers
Heroku changelog
Heroku DevCenter Status site page
Heroku Status site scheduled maintenance event

Accelerating Investigation and Recovery

Our June 15th commitment to customers

The time it took to diagnose and resolve this incident was unacceptable. To address this, we are overhauling our incident response tooling and processes. This includes building new tools and improving existing ones to help engineers diagnose issues faster and run queries across our entire fleet at scale. We are also streamlining our “break-glass” procedures to ensure teams have rapid access to critical systems during an emergency and enhancing our monitoring to detect complex issues much earlier.

Where we are today

We enhanced our testing and monitoring to more effectively prevent, detect, and diagnose issues, including the addition of:

Automated regression testing for dyno-to-dyno communications
Canaries for long-running applications
Additional monitoring and alerting for our monitoring and alerting orchestration service
Streamlined monitoring for our platform log drain collection and forwarding service
Improved introspection and monitoring for our customer notifications service
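As a rough illustration of the connectivity checks in the list above, a canary can be built on a simple TCP probe that runs on a schedule and alerts on failure. This is a minimal sketch under that assumption, not a description of Heroku's canaries; probe_tcp is a hypothetical helper name.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A real canary would also record latency and run from several vantage points; this probe only answers whether a connection can be opened at all.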

We are investigating the feasibility of monitoring operating system drift. Additionally, we plan to add canaries for dyno network connectivity.
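One possible shape for operating system drift monitoring is to compare the packages on a running host against the manifest recorded when its image was built. The sketch below assumes both sides are available as package-name-to-version maps; detect_drift and the data format are hypothetical and do not reflect Heroku's actual approach.

```python
from typing import Dict, List


def detect_drift(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List]:
    """Compare two package->version maps and report how the host has drifted.

    baseline: versions recorded at image build time (hypothetical format).
    current:  versions observed on the running host.
    """
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        (name, baseline[name], current[name])
        for name in set(baseline) & set(current)
        if baseline[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

An empty result in all three lists means the host still matches its image; any non-empty list is a change that happened outside the deployment process and is worth an alert.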

To reduce the time to issue detection and remediation, we streamlined authorized engineers’ access to Private Spaces and dynos to conduct investigations. We are also working on safe processes at scale to expedite the detection and remediation of configuration-caused incidents.

We streamlined our “break-glass” tooling and are in the process of revising related procedures for all core services.

Our ongoing commitment

We greatly appreciate the opportunity to serve our customers, and we are committed to ensuring that an outage of this magnitude and a lapse in communications like this one never happen again. We will continue to improve our processes, platform monitoring, performance, and resilience even after we have completed our identified corrective actions. We will keep you informed of progress on pending corrective actions, including the Trust site migration.

Originally Published: September 5, 2025

More from the Author

Michelle Peot

Sr Director of Product Management, Heroku Core Platform at Heroku

Heroku Staff
