Engineering
- Engineering
- Last Updated: June 03, 2024
- Tom Crayford
At Heroku, we're always striving to provide the best operational experience with the services we offer. As we’ve recently launched Heroku Kafka, we were excited to help out with testing of the latest release of Apache Kafka, version 0.10, which landed earlier this week. While testing Kafka 0.10, we uncovered what seemed like a 33% throughput drop relative to the prior release. As others have noted, “it’s slow” is the hardest problem you’ll ever debug, and debugging this turned out to be very tricky indeed. We had to dig deep into Kafka’s configuration and operation to uncover what was going…
- Engineering
- Last Updated: May 26, 2016
- Andrew Gwozdziewycz
For almost two years now, the Heroku Dashboard has provided a metrics page to display information about memory usage and CPU load for all of the dynos running an application. Additionally, we've been providing aggregate error metrics, as well as metrics from the Heroku router about incoming requests: average and P95 response time, counts by status, etc. Almost all of this information is being slurped out of an application's log stream via the Log Runtime Metrics labs feature. For applications that don't have this flag enabled, which is most applications on the platform, the relevant logs are still generated, but…
- Engineering
- Last Updated: March 01, 2016
- Damien Mathieu
I spend most of my time at Heroku working on our support tools and services; help.heroku.com is one such example. Heroku's help application depends on the Platform API to, amongst other things, authenticate users, authorize or deny access, and fetch user data. So, what happens to tools and services like help.heroku.com during a platform incident? They must remain available to both agents and customers—regardless of the status of the Platform API. There is simply no substitute for communication during an outage. To ensure this is the case, we use api-maintenance-sim, an app we recently open-sourced, to regularly simulate Platform API…
- Engineering
- Last Updated: February 22, 2016
- Richard Schneeman
The asset pipeline is the slowest part of deploying a Rails app. How slow? On average, it's over 20x slower than installing dependencies via $ bundle install. Why so slow? In this article, we're going to take a look at some of the reasons the asset pipeline is slow and how we were able to get a 12x performance improvement on some apps with Sprockets version 3.3+. The Rails asset pipeline uses the sprockets library to take your raw assets such as JavaScript or Sass files and pre-build minified, compressed assets that are ready to be served by a production…
- Engineering
- Last Updated: June 03, 2024
- Ryan Brainard
Heroku has years of experience operating our world-class platform, and we have developed many internal tools to operate it along the way; however, with the introduction of Heroku Private Spaces, much of the infrastructure was built from the ground up and we needed new tools to operate this new platform. At the center of this, we built a new operations console to give ourselves a bird's eye view of the entire system, be able to drill down into issues in a particular space, and everything in between. The operations console is a single-page React application with a reverse proxy on…
- Engineering
- Last Updated: November 04, 2015
- Owen Ou
If your application is successful, there may come a time when you’re on an unsupported version of a dependency. In the case of the Heroku Platform API, this dependency was a very old version of Active Record from many years ago. Due to the complexity involved in the upgrade, this core piece of infrastructure had been pegged at version 2.3.18, which was released in March 2013. We're happy to announce that we've overcome the challenge and are now running Active Record 4.2.4, the latest version, in production. In this post, we'll describe the technical challenges we faced in the upgrade…
- Engineering
- Last Updated: October 21, 2015
- Fred Hebert
HTTP routing on Heroku is made up of three main logical layers: the state synchronization layer ensures that all nodes in the routing stack are aware of the latest changes in domains, application, and dyno locations across the platform; the routing layer chooses which dyno will handle an HTTP request (random or sticky), performs logging, error-reporting, and so on; the HTTP proxying layer handles the validation, normalization, and forwarding of requests between clients and dynos. This last part is the one the platform team is happy to open-source today with the Vegur library. Vegur can be thought of a bit…
- Engineering
- Last Updated: August 06, 2015
- Richard Schneeman
In a recent patch we improved Rails response time by >10%, our largest improvement to date. I'm going to show you how I did it, and introduce you to the tools I used, because who doesn’t want fast apps? In addition to a speed increase, we see a 29% decrease in allocated objects. If you haven't already, you can read or watch more about how temporary allocated objects affect total memory use. Decreasing memory pressure on an app may allow it to be run on a smaller dyno type, or spawn more worker processes to handle more throughput. Let's back…
- Engineering
- Last Updated: July 30, 2015
- Kevin Thompson
Earlier this month, the OpenSSL project team announced that three days later it would be releasing a new version of OpenSSL to address a high-severity security defect. In the end, this vulnerability resulted in another non-event for our customers, but we thought it might be useful and informative to share the process we went through to prepare for the issue.
- Engineering
- Last Updated: June 30, 2015
- Pedro Belo
Fun fact: the Heroku API consumes more endpoints than it serves. Our availability is heavily dependent on the availability of the services we interact with, which is the textbook definition of when to apply the circuit breaker pattern.
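For readers unfamiliar with the circuit breaker pattern, here is a minimal Python sketch of the general idea. It is not the Heroku API's implementation, and every name in it (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`) is hypothetical. After a run of consecutive failures the breaker "opens" and rejects calls immediately, sparing both the caller and the struggling dependency. Once a timeout has elapsed it lets a single trial call through to check whether the dependency has recovered.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call re-opens the circuit; otherwise open at the threshold.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_user, user_id)` where `fetch_user` is a placeholder for any function that talks to a downstream service, and treat `CircuitOpenError` as a signal to serve a fallback or a degraded response.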