Stuff Goes Bad

The Heroku Routing team does a lot of work with Erlang, both
in terms of development and maintenance, to make sure the platform scales smoothly
as it continues to grow.

Over time we've learned some hard-earned lessons about building systems that can
scale with some amount of reliability (or rather, we've definitely learned what
doesn't work), and about the kind of operational work we can expect to have
to do in anger.

This kind of knowledge usually remains embedded within the teams that develop
it, and tends to disappear when individuals leave or change roles. When new members
join the team, it gets transmitted informally, through incident simulations, code
reviews, and other similar practices, but never in a truly persistent way.

For the past year or so, bit by bit, I've tried to capture the broad strokes of this
knowledge and put it into a manual that we're proud to release today.

From the introduction:

This book intends to be a little guide about how to be the Erlang medic in a
time of war. It is first and foremost a collection of tips and tricks to help
understand where failures come from, and a dictionary of different code
snippets and practices that helped developers debug production systems that
were built in Erlang.

This is our attempt at bridging the gap between most tutorials, books, training
sessions, and actually being able to operate, diagnose, and debug running
systems once they've made it to production.

This manual adds to the Routing team's efforts to engage with the Erlang (and
polyglot) community at large, sharing knowledge with teams from all over the
place. It is available for free as a PDF, under a Creative Commons License,
at erlang-in-anger.com.

It comes just in time for the Chicago Erlang conference, dedicated to real-world
applications of Erlang, where you'll be able to talk to a few members of Heroku's
Routing team and a bunch of regulars from the Erlang community.

We hope this will prove useful to the community!

Also, Heroku is hiring! Check out our jobs page for
opportunities to work on production systems at scale.