

Heroku in the Wild: Saving Lives with Nonprofit Watch Duty

TAGS

  • Heroku in the Wild
  • autoscaling
  • caching
  • scaling


During the Los Angeles wildfires in 2025, more than 10 million people relied on Watch Duty for potentially life-saving information. On this episode of Code[ish], its Cofounder and CTO Dave Merrick talks Julián Duque through how the application works and the technologies like Heroku that make it possible.


Show Notes

Narrator
Hello and welcome to Code[ish], an exploration of the lives of modern developers. Join us as we dive into topics like languages and frameworks, data and event-driven architectures, artificial intelligence, and individual and team productivity. Tailored to developers and engineering leaders, this episode is part of our “Heroku in the Wild” series.

Julián
Hello, hello, and welcome to Code[ish]. My name is Julián Duque, Principal Developer Advocate for Heroku and your host of the Code[ish] podcast. Today we are going to be talking about a great application and service that serves a lot of users, and, even more, gives them peace of mind and saves lives as well. With us today, we have Dave Merrick. He’s the Cofounder and CTO of Watch Duty. Hello, Dave, how are you doing?

Dave
Hi. Thank you for having me. Good to be here.

Julián
Yes, thank you for joining us today on the Code[ish] podcast. For our audience that doesn’t know Watch Duty, or never had the opportunity or the need to use it, can you tell us what it is and what problem you are solving?

Dave
Yeah, happy to. And I hope many people haven’t had the need, I guess, to use Watch Duty. So, Watch Duty is a public reporting platform on natural disasters. We fill the gap of information about real-time emergencies like wildfires and other natural disasters, to give people time and information to make decisions on what they need to do. So if there’s a wildfire near you and you have our app, or you’re looking at our website, you’ll get a push notification alerting you to a new wildfire of significance. And then our human reporters are synthesizing and aggregating data, and providing that to you in updates that say “the fire has grown by 10 acres”, “the fire is increasing in intensity”, “there are new evacuation zones”. Again, trying to give people the information to make good decisions that keep them safe, that give them peace of mind. And I think it really bridges a missing gap between government information, which can oftentimes be delayed or infrequent, and traditional news or social media, which is either fragmented, delayed, or unverified. You know, you don’t want to have to look at Facebook, in between posts about your cousin’s birthday party, to see information about a wildfire that could threaten your home, your life, your neighbors, your grandmother’s house. So we really view ourselves as a public utility, a public service, that happens to be provided by a nonprofit.

Julián
That’s beautiful. And usually, are these updates being done by users of the platform, like regular people that use the platform, or do you have people in the field, or monitoring, or partnerships with different agencies to get this information? How does that part work?

Dave
Yeah. To clarify, it is not crowdsourced. It is a group of expert reporters,

Julián
Okay.

Dave
and it all started as a volunteer operation. The origin is actually quite interesting. A lot of the people that we brought on as original reporters to this platform were doing the same work, but on Facebook or Twitter. So these were people that were invested in delivering good information to their communities, but they used the platforms that they had available to them. So this is a very, very specific platform for these experts in their local communities, and now we have a broad team that gives us much, much better coverage across the entire United States. They’re well versed in the fire service, and they come from a wide range of backgrounds, including ex-firefighters and first responders. So they’re all coordinating and working with each other as a team to provide these updates. It is verified by experts. We have strict standards on how we do reporting: guidelines, onboarding, training, the whole works.

Julián
That’s wonderful. Yeah, because usually on a crowdsourced platform, there might be false positives, things that are not the right information that I, as an end user, want to see. I don’t want to get false news or get panicked because somebody just decided to report something that is not happening. So having experts, real people who know what they are doing, is great. You mentioned natural disasters, but all the examples we have seen here are around wildfires. Is it only wildfires? Or do you also support other types of natural disasters within the platform?

Dave
Yeah. So we’re actually currently working on also supporting and reporting on flooding.

Julián
Okay.

Dave
Not launched yet, and it won’t be until later this year. So right now it’s just reporting on wildfires. But we understood the problem was larger than wildfires when it began. You know, we didn’t put fire in the name for a reason. And we believe that the same idea of expert reporting, synthesis, aggregation, and dissemination, in, you know, a really helpful mobile, map-based application, is going to be useful for many, many different disasters. But it’s a big undertaking to cover more things like this, so we want to take our time to do it well.

Julián
Yeah. Thankfully, I live in an area where we don’t have exposure to wildfires; that’s why, personally, I knew about Watch Duty through other people that use the service, but I never had the need. But I live in Florida, pretty much two blocks away from the water. And here, every year, we have hurricane season, and we have flooding. Last year was pretty rough for us with everything that happened, and we were missing a way to have better communication. Where to go, evacuation routes, all that information was not available. So having a service like this for those other natural disasters will be great too. Hopefully you can extend it to cover that in the future.

Dave
Yeah, we really think that it’s a common problem. You know, we’re going to use a lot of the capabilities we built, both from an operational perspective and a technical perspective, to provide information about evacuation zones, routes, shelters. You know, for flooding there are more specific things: maybe we can provide inundation maps or the FEMA flood risk map in a way that you can actually consume on your phone. And again, I think everyone kind of goes through this journey of being in a natural disaster,

Julián
Yes.

Dave
trying to find the information, and then realizing it’s very, very difficult to figure out what’s going on and what to do. And everyone’s doing this kind of on their own. So it’s a big opportunity, I think, to help the public make better decisions.

Julián
Tell me about the whole experience during last year’s fires in Los Angeles. That was a tragedy. I mean, watching it on the news; people that I know that live close by, and how they were reporting; coworkers evacuating because of the risk. I imagine that in these types of situations, from a technical perspective, I mean, the human part here is super important, but also the technical part, with all of these people going to the platform, all of these reports. How was that experience during the LA fires, from the human reporter experience and also from the technical challenges behind this surge?

Dave
Yeah. It was quite the experience from our perspective. You know, it was a horrible tragedy to witness, and our entire organization was focused on it the whole time. And past, you know, the human tragedy and the loss of property, it was quite a technical stress test for our organization from the engineering and infrastructure perspective. We’ve been growing year over year, pretty much doubling user count each time, and we had large spikes and significant wildfires before. But this was a different order of magnitude. You know, the entire LA county; by the end of the first week, I think we’d had 10 million users, 10 million unique users, when, you know, I think previously our highest monthly active user count was close to 2 or 3 million. And on a day-to-day basis, it was almost ten x what we’d seen before. So the engineering team was pretty much up trading shifts for the first three nights and days, trying to keep everything afloat. And obviously that was true as well for the operations team. At a technical level, it was quite a challenge. But we were also fortunate that we were far enough into our sort of technical maturity that we had the tools available to maintain uptime and maintain service during the entire incident.

Julián
Speaking about the technical problem you are solving here: can you give us just a quick overview of the architecture, the infrastructure, how the platform is working today?

Dave
Yeah, happy to. So, again, this sort of came from the fact that the organization began as a full volunteer operation. I originally architected everything to be as simple as possible, and we’ve added complexity as we need it to face new challenges. And new challenges include, you know, 10, 100, 1000 times the traffic that we started with. So we are running Heroku for all of our backend services and asynchronous jobs, and that serves all the information our mobile clients and web services need. All the push notifications go through Firebase Cloud Messaging, which is a common bridge to push to iOS and Android. That’s really a fantastic service, because we’ve sent over a billion push notifications for zero cost since the beginning of our organization. And, you know, that is an absolutely critical part of how we send information out. You know, sometimes it’s easy to say and think, oh, we have a horizontally scalable architecture, or, you know, even on Heroku, it’s easy to change the dynos. But there’s always some critical weak link during big scaling events that provides new challenges. And as much as you try to stress test systems ahead of time, it is pretty hard to replicate and stress test something like 100,000 requests a second. So in our case, the tool that we pulled on hardest at the beginning of this, and throughout the LA fires, was edge caching. Just before the fires, we had started to work with Fastly through their nonprofit Fast Forward program, and they do edge caching, but we hadn’t actually implemented it. So it was a really interesting problem where we had this tool that was perfect for the job, but we hadn’t actually used it or tested it at all. So the first day, as it was happening, we saw the immensity of the situation and started testing it as the event was going on, implementing it live.
What does it look like to put this edge cache out there? It’s already, you know, in our DNS; it’s already in the pipeline to our backend servers. But we had not even touched the Varnish configuration language (VCL) that it runs on. So we had the right tools for the job to keep the service up, but it was also a little stressful to do that live, as opposed to the plan that we had, which was, oh, we’re going to implement this in January, February, and get it ready for, quote unquote, the next fire season. The timing didn’t work out, but luckily we had the tools available, and that edge caching was really critical. You know, at the peak of 100,000 requests a second, probably 92, 95% of that traffic, I can’t remember the exact number, was cached at the edge. So that really, really helped with our ability to maintain uptime, which again I think is a big differentiator between our platform and what we’ve seen elsewhere: a lot of public tools, for a variety of reasons, just don’t stay up during the emergencies when they’re needed most.
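As a rough sketch of the pattern Dave describes (this is not Watch Duty’s actual code; the function name and TTL values are hypothetical), a read-heavy API response can opt into edge caching by emitting surrogate headers that a CDN like Fastly honors, while telling browsers not to cache locally:

```python
def cache_headers_for_incident(incident_id: int, ttl: int = 15, swr: int = 60) -> dict:
    """Build response headers that let an edge cache serve read-heavy
    incident data while keeping it fresh.

    - Surrogate-Control: honored by the CDN edge; a short TTL plus
      stale-while-revalidate means traffic bursts hit cache, not origin.
    - Surrogate-Key: tags the cached object so a single incident can be
      purged the moment a reporter publishes an update.
    - Cache-Control: keeps browsers from holding on to stale data.
    """
    return {
        "Surrogate-Control": f"max-age={ttl}, stale-while-revalidate={swr}",
        "Surrogate-Key": f"incident-{incident_id}",
        "Cache-Control": "no-store",
    }

headers = cache_headers_for_incident(4821)
```

The short edge TTL is what reconciles a 90-plus percent cache hit rate with data that must be fresh when a push notification lands.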

Julián
Capacity planning and scaling planning in these situations is difficult when the events are unpredictable. You don’t know when a tragedy like this one is going to happen. On e-commerce sites, let’s say, you know the typical example: okay, we need to be prepared for Black Friday or the holidays, and we know this is the capacity we need to have during these predictable events. But a tragedy can happen at any time. So how do you plan for this, and how do you respond in time so you don’t get a bottleneck while it’s starting? What was that process, and the lessons learned after what happened a year ago? How do you implement scaling and capacity planning during these peak times?

Dave
Yeah, I mean, part of it comes down to overprovisioning. You know, we’d rather spend money and have stability all the time than try to optimize for the infrastructure costs. I have a funny anecdote on our particular version of bursty traffic. We were implementing some functionality with Fastly for DDoS protection, and they had a product that provides automatic DDoS protection beyond their kind of network-level DDoS protection. I turned it on for a week in testing mode, took a look, everything looked good. And then, about four days after I flipped it on for real, I realized we were getting some traffic blocked. That was no good. We turned it off immediately after we figured out what was going on, and ended up doing a chat with the product team about it. The way that our traffic appears to their system, it was indistinguishable from DDoS traffic, because we have highly localized bursts of traffic with absolutely no pattern, because it’s based on events. You get a fire near a major population, we send a push notification out, and 2 million people might pick up their phone at the same time. And that really is closer to a botnet DDoS attack than, I would say, most web traffic. So with that in mind, you know, we have autoscaling set up in Heroku for our backend web servers. But even that takes, you know, 30 to 60 seconds to kick in, between the signal detection, the autoscaling command, and then resources being available. But when we send a push notification out, we try to send all push notifications out in less than 60 seconds, no matter how many people there are. That’s our internal goal. So you can easily have, you know, half a million, a million people picking up their phone in a 20-30 second period. So a lot of what I would say is traditional autoscaling still doesn’t really work for us.
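To make the sub-60-second fan-out goal concrete: Firebase Cloud Messaging caps multicast sends at 500 device tokens per request, so a sender typically chunks its token list and dispatches the chunks in parallel. A minimal sketch (the token list is hypothetical, and this is not Watch Duty’s pipeline):

```python
from typing import Iterator

FCM_MULTICAST_LIMIT = 500  # FCM allows up to 500 device tokens per multicast request

def chunk_tokens(tokens: list[str], size: int = FCM_MULTICAST_LIMIT) -> Iterator[list[str]]:
    """Yield successive batches of device tokens for parallel dispatch."""
    for i in range(0, len(tokens), size):
        yield tokens[i:i + size]

# With 2 million recipients this yields 4,000 batches; fanning those out
# across concurrent workers is what makes a sub-60-second send feasible.
batches = list(chunk_tokens([f"token-{n}" for n in range(1200)]))
```

The chunking itself is trivial; the operational point is that the resulting burst of phones waking up all at once arrives faster than reactive autoscaling can respond, which is why the cache has to absorb it.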
So again, that’s where kind of having a really high-granularity cache on the edge is really the thing that saves us. We’re a very, very read-heavy site, and in some ways that’s simple to do caching with. But caching is always a little more complicated than you think. The other challenge that we face on a technical level with caching is that data has to be up to date. If you get a push notification saying that there’s a new fire or an update about a fire, it doesn’t matter how fast you pull up your phone, you also need to see that data reflected in the application. So it’s kind of this combination of wanting to maintain a really high cache hit rate to protect ourselves as the first line of defense on scaling, but also to have either really low TTLs or very granular cache invalidations.
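The “granular cache invalidations” Dave mentions usually mean tagging cached objects with keys and purging by tag the instant new data is published. A toy in-memory model of the idea (not Fastly’s API; the class and URLs here are illustrative only):

```python
class EdgeCacheModel:
    """Tiny in-memory model of tag-based (surrogate-key) cache invalidation."""

    def __init__(self):
        self._store = {}  # url -> (body, set_of_tags)

    def put(self, url: str, body: str, tags: set[str]) -> None:
        self._store[url] = (body, tags)

    def get(self, url: str):
        entry = self._store.get(url)
        return entry[0] if entry else None

    def purge_tag(self, tag: str) -> int:
        """Drop every cached object carrying `tag`; return how many were purged."""
        stale = [u for u, (_, tags) in self._store.items() if tag in tags]
        for u in stale:
            del self._store[u]
        return len(stale)

cache = EdgeCacheModel()
cache.put("/api/incidents/123", '{"acres": 40}', {"incident-123"})
cache.put("/api/incidents/456", '{"acres": 10}', {"incident-456"})
# A reporter publishes an update to incident 123: purge just that object,
# so the very next read misses cache and fetches fresh data from origin.
purged = cache.purge_tag("incident-123")
```

Purging one tag leaves every other cached incident untouched, which is how a site can be both aggressively cached and immediately fresh after an update.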

Julián
That’s beautiful. And all of this architecture, I mean, initially you told me, like, okay, I want to keep this simple. But every time you explain and tell me the story, it gets more complex, more elaborate, more pieces that need to fit together and play well. How is the team, like, the technical team behind it, taking care of this infrastructure, this architecture?

Dave
Yeah, and just on the complexity side, you know, we’re coming up on our fifth birthday. So, as an organization, as a nonprofit, and in our technical decisions, I recently looked back at kind of all the decisions and when we added things, and it really has been a combination of proactive and reactive. But I don’t think we would have done as well with the LA fires if they had happened in our first year, to put it bluntly. You know, there was a technical maturity, and I think we’ve been doing a good job of adding what we need at the right time instead of prematurely optimizing. And then on the current maintenance, you know, I think during the LA fires, our engineering team was only four people. Now we’re about ten, and we’re going to be hiring a little bit more. So the team is growing; it grew quite a bit in 2025. But it’s still a really small engineering team. And I think that’s where, you know, we want to avoid complexity for the sake of it, or complexity for the sake of over-optimization. But critical things like an edge cache or push notification systems, those have inherent complexity and are critical to kind of what our service is. You know, we view uptime and reliability as a core platform feature. We build more and more stuff each year; the product grows in its service area and its functionality. But the other piece for us is, really, how do we run an incredibly stable and robust system?

Julián
You mentioned that you have been increasing coverage all across the United States, so it means you have been growing. What does that infrastructure management look like from a team point of view, location-wise? Is everyone in one place, do you have a physical office, or is the team distributed?

Dave
Yeah, we don’t have an office; we’re completely distributed. There’s kind of, I guess, a density of people on the West Coast, but we have people on the East Coast as well. It’s been a really great hiring experience to get people that are incredibly motivated, senior software engineers that care deeply about this mission. And I think it’d be much more challenging to hire the same group of people if they had to be co-located in an office. You know, there are challenges with a remote team, but I think it’s also been great for on-call coverage. Having people that are awake three hours before me is fantastic when stuff happens at my 4:00 a.m. and it’s, you know, 7:00 a.m. for them on the East Coast. I think the remote team experience for us has been great. It’s also great to have a growing team for a larger on-call rotation, because, again, stability in the on-call response is very important. You know, sometimes 20 minutes could be the difference between people receiving information that lets them make a life-saving decision or not. So we take the mean time-to-response quite seriously as well.

Julián
That’s beautiful. And now, not only the technical team, but also the experts that you mentioned, the people that are reporting what’s happening: what’s the medium they use to do these reports? Is there a specific platform or application for these experts? How do you manage to report, verify, and approve what’s happening during an event?

Dave
Yeah, so Watch Duty as a whole org runs through Slack, and that’s the main medium of collaboration and work streams for the reporters. They’re also completely distributed. You know, one of our first reporters actually lives in New Zealand and had been reporting on

Julián
Wow.

Dave
California wildfires for years before we began. He provided, for a while, our only nighttime coverage; we need to have 24-hour coverage. So on that side, it’s a whole different set of kind of operational challenges. We’ve subdivided regional areas like California into, you know, 12 different Slack channels, and they collaborate in those. The engineering team has done a lot of automation work to kind of help their processes, and a lot of that information ends up going into Slack for them to interact with, as well as kind of an internal view of our application and website that they use for reporting.

Julián
That’s wonderful. I mean, yes, a great tool to use to do all of that orchestration of people. One question: how about the new AI ecosystem, the generative AI, the machine learning models, all of these new things that are happening right now? Is this helping Watch Duty in one way or another? Are you thinking about implementing any sort of AI capabilities, internally or externally, or is this something that’s better managed by 100% humans behind the show?

Dave
Yeah, I mean, I think it’s an incredibly interesting tool that’s still developing, and we’ve been very opportunistic in how we use it. At the core, we are human-verified in everything that we do. So when we use AI, or generative AI, or machine learning, or whatever the term is for what we’re using, it always goes back to a human in the loop. And I think that’s really a powerful way to use it right now, especially for the stuff we work in, with life and safety and public safety: it’s worth the extra verification steps. And if we can use generative AI or machine learning to speed up the process or reduce overhead for our reporters, it’s incredibly useful. You know, we’ve used it in a handful of ways. We get a lot of inbound information from partners, agencies, fire agencies, and it’s all just going to an email address. So we quickly realized that the scale of inbound information there, in unstructured formats, was overwhelming to a single Slack channel. So we started doing regional routing of that information based on the content of the emails. Generative AI is fantastic at that, and it has saved us a bunch of work. It’s the kind of project that would be incredibly hard to do five years ago. They may not even have the name of the fire, they may not have an address, they may not have a state; you know, there’s a lot of guesswork and variation if it’s really just somebody’s email. So that kind of extraction of geo information from unstructured text has been very, very useful for us. We’ve also used it for social media monitoring. You know, unfortunately, sometimes the only place that official information is published is Facebook and Twitter. And we realized that our reporters had 10, 20, 50 tabs open and were refreshing them, especially during the extended aspects of an incident, towards when the fire’s in cleanup or mop-up. You know, those updates really only come out on those platforms.
And we were like, okay, we’ll build a tool that monitors all these pages and sends a message to Slack when there’s a relevant update. So that’s been a huge uplift as well. We’re also embarking on a project to start trying to transcribe the radio traffic for wildfire response and disaster response, because right now, you know, that’s where a lot of the information that reporters report on comes from. We have a lot of automation around finding new fire starts and alerting them to that, but they’re listening to firefighters and agencies discuss what’s going on and synthesizing that into the report. So that’s kind of the core information that’s being consumed and then synthesized by our reporters, and not having that in any structured way has really limited our ability to provide automation help to the reporters. So we’re really excited about that transcription project.
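Dave’s team uses generative AI for this extraction; as a deterministic stand-in to show the routing idea, here is a keyword-based sketch that maps an unstructured agency email to a regional Slack channel. The channel names, keywords, and sample text are all hypothetical:

```python
import re

# Hypothetical mapping from region keywords to internal Slack channels.
REGION_CHANNELS = {
    r"\b(california|cal fire|socal|norcal)\b": "#region-california",
    r"\b(oregon|odf)\b": "#region-oregon",
    r"\b(washington|dnr)\b": "#region-washington",
}
FALLBACK_CHANNEL = "#region-triage"  # humans sort anything we can't classify

def route_email(subject: str, body: str) -> str:
    """Pick a regional Slack channel from unstructured email text.

    A production version would hand this text to an LLM and ask for a
    structured {region, fire_name, location} answer; either way, the human
    reporters in the channel still verify before anything is published.
    """
    text = f"{subject}\n{body}".lower()
    for pattern, channel in REGION_CHANNELS.items():
        if re.search(pattern, text):
            return channel
    return FALLBACK_CHANNEL

channel = route_email("New start near Redding", "CAL FIRE reports a 5-acre grass fire...")
```

A keyword table like this breaks down exactly where Dave says emails get messy (no fire name, no address, no state), which is why the extraction is a good fit for a language model with a human in the loop.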

Julián
Great examples of using the right tool for the job. I mentioned machine learning and AI because I remember last year we were talking here on the podcast with a company that specializes in vision models, like the deployment of vision models. And they were working on a use case of getting data from cameras to specifically detect wildfires as well. So they were doing all of that detection using machine learning, using AI. So I see that there is also a good fit in having that technology at hand for these use cases.

Dave
Yeah, we actually partner with someone that provides us those AI image detections on certain wildfire cameras. So it’s, again, I think, a matter of focus for what Watch Duty does. We want to partner with people that do super-specific things like that, that may not be our core competency. But I think the vision detection of smoke is definitely something that’s going to help with how quickly we understand where new fires are.

Julián
Beautiful. All of this, from the human perspective to the technology behind it, you are serving communities and saving lives, pretty much. I’m inspired by what you do. And you mentioned this is a nonprofit. Right now, a bunch of entrepreneurs are approaching this generative AI boom to create new companies, new startups, new projects. How can we inspire these builders to create technology for good, rather than just for profit? How can we get all of these new builders to jump on the bandwagon of solving these real problems, to save lives, to do it for good? Do you have any advice or words of inspiration? What motivated you at the beginning?

Dave
Yeah, it’s a great question, because I think the path to what Watch Duty has done is uncommon. I’m not familiar with everyone in every space on the nonprofit side, but Watch Duty approached this like we were starting a normal business, you know? John and I, the co-founders, came from software in the Bay Area. What we found was that there is funding if you solve a problem, and you can still make a healthy wage for yourself and your family, and you can solve a fantastic problem. So if you see a problem, it doesn’t need to be solved in a for-profit way. I think that the avenues to that funding, whether it’s foundation grants or, you know, nonprofit startup programs, are not very obvious. And I think that’s part of what holds people back; the money side is also a big thing, but I think it’s not always clear that this is an option for how to solve things. I think people think of nonprofits as yearly funding requests: you know, end of year, if we don’t get this funding, we’re going to have to shut it down. Watch Duty is now pretty much self-sufficient between foundation grants, donations from the public, and an enterprise plan for utilities and other people with physical infrastructure that need to know about disasters. We want to run this like something that will operate on its own; we don’t want to have to do yearly fundraising cycles. But I think it’s also incredibly rewarding to do this work. It’s rare to get the opportunity to really impact the public in such a way, and I think that there are a lot of problems facing the world and the public right now that do fit this model: how do we provide a free or low-cost service to the public to solve a specific need? Offer it to the world, offer it to the US, offer it out, and if you’re really solving a big problem, I think the funding can find its way to that.

Julián
This is beautiful, super inspiring. I hope the people that are listening to this start thinking about ideas for how you can use technology and your skills for good. Dave, thank you so much, not only for sharing your story, but also for sharing what Watch Duty does, which is pretty much the mission. And thank you for what you built, and the company you built, to support the community and save lives.

Dave
Well, thank you very much. It’s a huge team, and there have been many years of effort by so many people to get it here. It’s been quite a journey, and we’re happy to keep doing it and keep growing it. It was nice to chat with you, and I hope you have a good day.

Julián
Of course. Thank you so much. And see you on the next episode of the Code[ish] podcast. Bye-bye.

Narrator
Thanks for joining us for this episode of the Code[ish] podcast. Code[ish] is produced by Heroku, the easiest way to deploy, manage, and scale your applications in the cloud. If you’d like to learn more about Code[ish] or any of Heroku’s podcasts, please visit heroku.com/podcasts.

About Code[ish]

A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.

Subscribe to Code[ish]


Hosted By:
Julián Duque
Principal Developer Advocate, Heroku
@julian_duque
with Guest:
Dave Merrick
Cofounder and CTO, Watch Duty