Engineering
- Engineering
- Last Updated: December 19, 2019
- Ben Fritsch
This blog post is adapted from a lightning talk by Ben Fritsch at Ruby on Ice 2019. There can be a number of reasons why your application performs poorly, but perhaps none are as challenging as issues stemming from your database. If your database’s response times tend to be high, they can strain your network and your users’ patience. The usual culprit for a slow database is an inefficient query being executed somewhere in your application logic. Usually, you can implement a fix in a number of common ways, by: reducing the number of open locks…
- Engineering
- Last Updated: June 03, 2024
- Richard Schneeman
Update: On closer inspection, the lock was not on the table, but on a tuple. For more information on this locking mechanism, see the internal PostgreSQL tuple locking documentation. Postgres does not have lock promotion, contrary to what the debugging section of this post suggests.
I maintain an internal-facing service at Heroku that does metadata processing. It’s not real-time, so there’s plenty of slack for when things go wrong. Recently I discovered a Postgres performance issue that bogged down the system to the point where no jobs were being executed at all. After hours of debugging, I found…
- Engineering
- Last Updated: December 18, 2019
- Julián Duque
This blog post is adapted from a talk given by Julián Duque at NodeConf EU 2019 titled "Let it crash!"
Before coming to Heroku, I did some consulting work as a Node.js solutions architect. My job was to visit various companies and make sure that they were successful in designing production-ready Node applications. Unfortunately, I witnessed many different problems when it came to error handling, especially on process shutdown. When an error occurred, there was often not enough visibility on why it happened, a lack of logging details, and bouts of downtime as…
- Engineering
- Last Updated: October 31, 2019
- Jason Draper
As an experiment to see how static typing could help improve our team’s Ruby experience, we introduced Sorbet into a greenfield codebase with a team of four developers. Our theory was that adding static type checking through Sorbet could help us catch bugs before they go into production, make refactoring easier, and improve the design of our code. The short answer is that yes, it did all of that! Read on to learn a little more about what it was like to build in a type-safe Ruby.
Static typing vs dynamic typing
…
- Engineering
- Last Updated: April 29, 2024
- Bernerd Schaefer
Over the past four years, the Heroku Runtime team has transitioned from occasional, manual deployments to continuous, automated deployments. Changes are now rolled out globally within a few hours of merging any change—without any human intervention. It's been an overwhelmingly positive experience for us. This post describes why we decided to make the change, how we did it, and what we learned along the way.
Heroku’s Runtime team builds and operates Heroku’s Private Space (single-tenant) and Common Runtime (multi-tenant) platforms, from container orchestration to routing and logging. When I joined the team…
- Engineering
- Last Updated: August 28, 2019
- Corey Purcell
As outlined in a previous blog post, Heroku Data services undergo routine maintenance for security and patching. In this post, we describe the process used to minimize downtime for Heroku Postgres and Heroku Redis premium ‘High Availability’ plans and how we optimized the process to perform up to 75% faster.
High availability plans for Postgres and Redis are designed to have two database instances running at the same time. One is a writeable primary database server and the other is a read-only hidden standby. Since the standby is hidden, customers cannot…
- Engineering
- Last Updated: May 16, 2024
- Ariana Escobar, Jamie White
This is the second post in a two-part series about accessibility. The first post shares why designing for accessibility is important to us and why we encourage you to incorporate it into your software design process.
Heroku’s first accessibility initiative was to reach Level AA for luminance contrast ratio as defined by the internationally recognized best practices of the Web Content Accessibility Guidelines (WCAG) 2.0. This ratio guarantees the legibility of text against its background, in order to ensure all users can perceive Heroku’s user interfaces equally.
This benefits people with color-vision deficiencies (like Deuteranopia or Protanopia which affect…
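The Level AA contrast target mentioned in this excerpt can be checked with a few lines of arithmetic. The sketch below follows the relative-luminance and contrast-ratio definitions published in WCAG 2.0, with the Level AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical and chosen for illustration; this is not the tooling the post's authors used.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.0)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_level_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2.0 Level AA contrast minimum."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    print(round(contrast_ratio("#000000", "#ffffff"), 2))
    print(meets_level_aa("#767676", "#ffffff"))
    print(meets_level_aa("#777777", "#ffffff"))
```

Two grays that look nearly identical can land on opposite sides of the 4.5:1 threshold against a white background, which is why a computed check tends to be more reliable than judging contrast by eye.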
- Engineering
- Last Updated: June 03, 2024
- Becky Jaimes
Every organization needs to be data-driven in order to be successful. Whether you're tracking an application's performance, incoming support tickets, or revenue rates, different components of any company depend on metrics that inform the health of the business.
At Heroku, we're hackers to the core, but that doesn't mean we're all programmers. We build on top of our own platform for everything we do, and one of the products we often use is Heroku Dataclips. If you haven't heard of them before, Heroku Dataclips allow you to create SQL queries in a web GUI that run on your…
- Engineering
- Last Updated: July 12, 2019
- Richard Schneeman
For quite some time we've received reports from our larger customers about a mysterious H13 – Connection closed error showing up for Ruby applications. Curiously, it only ever happened around the time they were deploying or scaling their dynos. Even more peculiar, it only happened to relatively high-scale applications. We couldn't reproduce the behavior on an example app. This is a story about distributed coordination, the TCP API, and how we debugged and fixed a bug in Puma that only shows up at scale.
First of all, what even is…
- Engineering
- Last Updated: July 11, 2019
- Ali Hamidi
This blog post is adapted from a talk given by Ali Hamidi at Data Council SF ’19 titled “Operating Multi-Tenant Kafka Services for Developers on Heroku.”
https://www.youtube.com/embed/-AtHKoTNR1k
Thousands of developers use Heroku’s Apache Kafka service to process millions of transactions on our platform—and many of them do so through our multi-tenant Kafka service. Operating Kafka clusters at this scale requires careful planning to ensure capacity and uptime across a wide range of customer use cases. With significant automation and test suites, we’re able to do this without a massive operations team.
In this post,…