Lex Neva

SRE at Heroku
Heroku Staff

Lex is interested in all things related to reliability. That includes not just redundant and fault-tolerant design, but also incident response process, human factors, and automation. Lex has given [conference](https://www.usenix.org/conference/srecon16/program/presentation/neva) [talks](https://www.usenix.org/conference/lisa16/training-program/session/retrospectives) and enjoys contributing to [open](https://github.com/heroku/retrodot) [source](https://github.com/lexelby/apiary) [projects](https://github.com/lexelby/inkscape-embroidery).

Back on August 11, 2016, Heroku experienced increased routing latency in the EU region of the common runtime. While the official follow-up report describes what happened and what we’ve done to avoid this in the future, we found the root cause to be puzzling enough to require a deep dive into Linux networking.

The following is a write-up by SRE member Lex Neva (what’s SRE?) and routing engineer Fred Hebert (now a Heroku alum) of an interesting Linux networking “gotcha” they discovered while working on incident 930.

The Incident

Our monitoring systems paged us about a rise in…
