101. Cloud Native Applications
Hosted by Joe Kutner, with guest Cornelia Davis.
Too often, there's an assumption that putting one's application "in the cloud" simply means hosting your code on a server somewhere--but that's just the beginning. Guest Cornelia Davis, CTO of Weaveworks, talks with Joe Kutner about what it really means to be a cloud native application, discussing everything from resiliency patterns to deployment practices.
Host Joe Kutner is an architect working at Salesforce, and his guest is Cornelia Davis, the CTO of Weaveworks, a company working in the Cloud Native operations space. Cornelia argues that most companies building complex web-based applications are doing so without fully understanding the unique operational challenges of that environment. Even several well-known patterns, such as adding circuit breakers or retry patterns, are not standardized across the industry, and certainly not across languages, let alone in frameworks and other easily consumable dependencies. In many cases, there is an overreliance on infrastructure availability that only becomes obvious once a problem occurs. Cornelia gives the example of a massive AWS outage that occurred several years ago. Many companies that had not planned for redundancy saw their applications stay offline for hours, in some cases longer than the outage itself.
Another potential conflict between operational patterns and software design emerges around container-based lifecycles. If you have a new application configuration that you want to deploy, Kubernetes, which assumes stateless workloads, encourages you to simply get rid of a pod and start up a new one. But it's entirely possible that there's some running code that doesn't know how to pick up these new changes, or even a service which can't recover from unexpected downtime. Considering these issues is the difference between running in the cloud and being truly cloud native.
To the industry's credit, Cornelia does see more platforms and frameworks adopting these patterns, so that teams don't need to write their own bespoke solution. However, it's still necessary for software developers and operational engineers to know the features of these platforms and to enable the ones which make the most sense for their application. There is no "one size fits all" solution. As the paradigms mature, so too does one's knowledge of the interconnected pieces need to grow, to prevent unnecessary errors.
Joe: Hello and welcome to the Code[ish] podcast. I'm Joe Kutner, your host for today. I'm an architect working on the Heroku and Salesforce platforms and with me is Cornelia Davis. Today, we're going to talk about Cloud Native, the Cloud Native Patterns. And, hopefully, today I'll learn if I am a Cloud Native developer. So Cornelia, welcome to the podcast. Can you tell us a little bit about yourself?
Cornelia: Yeah. My name is Cornelia Davis. I am currently the CTO at Weaveworks. Weaveworks, just for those listeners who may not know it, you might know, actually, Weaveworks through a number of open source projects like Flux and Flagger and Weave Net and some of those things, but we are really in the Cloud Native operations space. We have coined the term... My CEO, Alexis, has coined the term GitOps. That's the space that I work in now, but I've spent the last decade or so working in cloud application platforms. Prior to joining Weaveworks, at the beginning of this year, I was at Pivotal, where I worked on Cloud Foundry, initially helped bring that product to market and worked with a lot of customers to make them successful in the cloud using a Cloud Native application platform. And then later on, more recently in the last four years or so, have really focused on Kubernetes, which, ironically, is... I wouldn't call it an application platform. I call it more of a Cloud Native infrastructure platform. And so kind of bringing those two worlds together.
Joe: Very cool stuff. Can you define GitOps? That's a term you mentioned. And can you kind of maybe give a formal definition and talk about what some of the implications are?
Cornelia: And so we figured out what those patterns were, so that we could... circuit breakers, and retry patterns, and things like that. We have not, as an industry, figured out what it means really to do operations in this cloud setting, where everything is highly distributed and constantly changing.
Cornelia: And that's really what GitOps is focused on. It's focused on a new paradigm for doing operations. Now, the fact that git is in there, it's a snazzy name. And I like to say that GitOps is the central square on the buzzword bingo card these days. So it's kind of a snazzy term, but I like to emphasize the ops part more than the git part. Git does play a role in that a couple of the key patterns are declarative configuration and having an immutable version history of those declarative configurations and git happens to be a really good tool to do that. So the git is really kind of hinting at one of these Cloud Native operational patterns. But I think of GitOps as the whole broader spectrum of the set of patterns that we're going to use to do operations in a Cloud Native setting.
Joe: Yeah, that makes sense. So in your book, you have a statement that I think is related to this and I really like that "The cloud is where we are doing our computing and Cloud Native is how we're doing it." And so you consider GitOps or any of those operational patterns as part of that how. Is that right?
Cornelia: Exactly. And it's more, again now, on the how of how we keep these things running in production. How do we upgrade them? How do we... So one of the patterns in the book actually starts to talk about different deployment things like blue-green deployments or canary deployments. And it talks about those things, I actually talk about them in the book, from the concept, but then I talk about the architectural patterns that need to be in place in your application to support those operational patterns of progressive delivery.
Cornelia: I consider the operational patterns to be, and I've actually heard somebody once say that the software architecture patterns that we have, they referred to it as designing for operations. And so it's designing for Cloud Native operations.
Joe: The day I read that statement, I had a doctor's appointment and I was talking to my doctor and he asked, "What do you do for a living? What kind of software?" And so I told him I worked in the cloud and that led to him talking about where he stores his photos. And I sort of just let that go and was like, yeah, sure. Not really the same thing, but I feel like that statement's going to help me articulate just what I do on a day-to-day basis better. So thank you for that.
Cornelia: My pleasure.
Joe: Do you think there's such a thing as running in the cloud and not being Cloud Native? Are there anti-patterns that we're maybe falling into as traps?
Cornelia: Yeah, I mean, that's absolutely a big part of what we've been doing as an industry is helping people understand that going to the cloud... So going to the place of the cloud doesn't mean that you're doing things in a Cloud Native way. And in fact, I mean, the first four words in chapter one of my book are, "It's not Amazon's fault." And I start off by talking about an outage that Amazon had and how there were a whole bunch of online properties, like IMDB and even Nest, that were down for longer than the AWS outage was because by the time the outage came back, and the outage was like five hours... By the time they recovered after the outage was resolved, they were offline for six, seven, eight hours. But Netflix was... Actually, I have a quote in there where, in a blog post, they said, "Yeah, we suffered a brief availability blip."
Cornelia: I mean, for them, it was literally a shoulder shrug. And that's the difference between Cloud Native and being in the cloud. So if we don't follow some of these patterns... And again, going back to the operations and the software architecture patterns, if we don't follow the software architecture patterns, there's a whole host of things that can go wrong. But then, even if we do that and don't adopt these new operational practices, then, again, we're going to be in a world of hurt because when things change, when Amazon has that outage, which they're going to have, they never promised you that they'd never have an AZ outage or a region outage. That's why they give you multiple AZs or multiple regions, it's up to you to embrace those. And so, yeah.
Cornelia: We do things like, I talk in the book about, please don't use sticky sessions. When you do sticky sessions, then what that does is it binds your user experience to a particular node and, now, you can't even apply some of these operational patterns. You can't do a rolling upgrade or you can't do... You have to take a maintenance window and you have to wait for things to drain away and all of that stuff. And how do you know when a sticky session can be drained away because it's no longer an open socket, it's something else? And there's lots of mistakes that we can make. And again, I think we're getting better at it from the architectural patterns, but we still have a ways to go on the operational side.
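To make the sticky-session point concrete, here is a minimal Python sketch of the alternative: session state lives in a shared store, so any node can serve any request. `SessionStore` and `Node` are hypothetical names invented for this illustration, and the in-memory dictionary only stands in for an external store that a real deployment would use.

```python
class SessionStore:
    """Stand-in for a shared, external session store (hypothetical)."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Return a copy so nodes never share mutable state by accident.
        return list(self._sessions.get(session_id, []))

    def save(self, session_id, cart):
        self._sessions[session_id] = list(cart)


class Node:
    """An application instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        cart = self.store.load(session_id)
        cart.append(item)
        self.store.save(session_id, cart)
        return cart


store = SessionStore()
node_a = Node("a", store)
node_b = Node("b", store)

# The same user's requests land on different nodes, as they might
# during a rolling upgrade, and the cart survives the switch.
node_a.add_to_cart("user-1", "book")
cart = node_b.add_to_cart("user-1", "pen")
```

Because neither node holds the cart itself, either one could be drained or replaced between the two requests without the user noticing.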
Joe: Yeah. I think that sticky sessions are a great example of one of those anti-patterns. And I think we're definitely seeing the industry start to move away from those. And it, in my mind, relates very much to the Twelve-Factor App because that was one of those patterns that, when we would talk about the twelve-factor app and stateless processes, was something that we had to, as a community, as an industry, sort of move ourselves forward from. So I think there are some other examples that you talk about in your book, different patterns that are very much related. I think you have a great chapter talking about configuration and environment variables. Can you talk a little bit about the different types of configurations and really how, I think, maybe some of the ideas in your book have gone further than what was originally stated in the twelve-factor app?
Cornelia: So on application configuration, what I think is super interesting about application configuration is the relationship to lifecycle. And so that's a big part of what I did in that chapter. And I think that Application Lifecycle and Configuration were two adjoining chapters, if I remember correctly. And I did that on purpose because of the relationship between them. Now, even today, I still find what I would consider application configuration deeply embedded inside of a code base. There's some value that's in there. Now we've gotten better at that, we've pulled that out, and at least put it into a Properties file. But then there's the question of, is the Properties file compiled into the binary that's distributed? Or is that something that is added later? When you're using something like the Spring Framework, you can have other ways of doing that.
Cornelia: And now the twelve-factor app actually suggested, well, let's use something completely neutral. Let's not worry about whether it is a .properties file in Java, or something else for Ruby, or something else for C-sharp, or something like that. There's something that's uniform across all of these different environments and that's Environment Variables. So let's use environment variables... And what's cool about the environment variables is it allows you then to draw them from a number of different sources.
Cornelia: So you can use some of these operational patterns, something like GitOps, for example, you can use that to deliver environment variables, or you could use some type of configuration service, or there's a number of different ways that you can inject those things into environment variables. And so it's nice from that perspective... The environment variables are nice from that perspective in that it allows you from within your code to just say, look, I don't have to worry about the mechanism.
Cornelia: I know that there's something that's ubiquitous across all of these environments. The relationship, though, to application lifecycle is really interesting because so when you change that environment variable, let's say somebody comes along and wants to do a credentials rotation or something like that, you change a value in an environment variable, what is the cycle for picking up that environment variable from within the code?
Cornelia: And this is where the relationship to operational patterns is really interesting as well and tied back to the whole notion of twelve-factor apps and statelessness is that let's take a containers-based way, let's say like something like Kubernetes, let's say my thing is running in that container and I want to deliver new application configuration, this is actually orthogonal to whether it's an environment variable or not. I want to refresh the configuration.
Cornelia: One of the ways that I can do it is I can just say, "You know what? I'll just get rid of that pod and I'll stand up a new one." It's stateless and that allows us some flexibility. It allows us to have some application code that maybe isn't designed in such a way that you can kick it when there's new configuration and have it reinitialize itself, maybe it only can do that on initialization. But now, you've applied some of these other patterns, like statelessness, that allow me to have an operational pattern that I can use to refresh the configuration.
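The relationship between configuration and lifecycle described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the book: `AppConfig`, `DB_HOST`, and `DB_PASSWORD` are hypothetical names, and plain dictionaries stand in for the process environment of two successive instances.

```python
import os


class AppConfig:
    """Configuration read once, at process start, from environment variables."""

    def __init__(self, environ=None):
        # Default to the real process environment; a dict can be passed instead.
        env = os.environ if environ is None else environ
        # DB_HOST and DB_PASSWORD are hypothetical variable names.
        self.db_host = env.get("DB_HOST", "localhost")
        self.db_password = env["DB_PASSWORD"]  # KeyError if missing: fail fast


# The first instance starts with the old credential.
old_instance = AppConfig({"DB_PASSWORD": "old-secret"})

# The credential is rotated. The running instance still holds the old
# value, because it only reads configuration on initialization.
rotated_env = {"DB_PASSWORD": "new-secret"}

# The operational answer: throw the old instance away and start a new one.
new_instance = AppConfig(rotated_env)
```

The code never learns where the values came from, which is the neutrality being praised, and the refresh is achieved purely by replacing a stateless instance.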
Joe: I love that those ideas sort of harken back to the principles of the twelve-factor app that I think you framed it as the neutrality of environment variables, but still leaves space for accommodating some of these new concerns. For example, as we talk about containers, and I think you'll start to see this on the Heroku platform in the future, too, other mechanisms for providing, in a neutral way, secrets and things like that, that give you the flexibility to, for example, roll credentials without restarting the process, those kinds of things.
Cornelia: Yeah, which of course means that the process has to embrace that. That puts an onus on the process to be able to be refreshable without having to be rebooted.
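What it means for a process to "embrace" refresh can be sketched as follows. This is a hedged illustration: `RefreshableConfig` and its `loader` callable are hypothetical, and how `refresh()` would be triggered in practice (a signal, a file watcher, an admin endpoint) is left open.

```python
import threading


class RefreshableConfig:
    """Holds configuration that can be reloaded without restarting the process."""

    def __init__(self, loader):
        # loader is any callable returning a mapping of current values.
        self._loader = loader
        self._lock = threading.Lock()
        self._values = dict(loader())

    def get(self, key):
        with self._lock:
            return self._values[key]

    def refresh(self):
        # Load first, then swap under the lock so readers never see a mix.
        new_values = dict(self._loader())
        with self._lock:
            self._values = new_values


# A dictionary stands in for whatever source delivers the configuration.
source = {"DB_PASSWORD": "old-secret"}
config = RefreshableConfig(lambda: source)

before = config.get("DB_PASSWORD")
source["DB_PASSWORD"] = "new-secret"   # the credential is rotated
stale = config.get("DB_PASSWORD")      # not picked up until refresh
config.refresh()
after = config.get("DB_PASSWORD")
```

The onus mentioned above shows up in the design: every consumer has to call `get` each time rather than caching the value it read at startup.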
Joe: You have these three categories in your book of Cloud Native Apps, Cloud Native Data, and Cloud Native Interactions. And I definitely see, between Apps and Interactions, potentially that boundary where I certainly think of a platform like Heroku being able to handle certain interaction patterns for you, but I'm not sure if it's as clear cut as all of the interaction patterns. So I'm curious if you have a better framework or definition for how to decompose those.
Cornelia: Yeah. And what's interesting is that I have this very firm belief that, like I said, I have about 30 or so patterns that I cover and they're listed on the inside covers of the book and I cover all of them. But, honestly, I think that the ones the developer's responsible for implementing themselves is probably maybe only a third of those. Certainly not more than half of them because they can leverage a number of other things for those implementations.
Cornelia: Now, the platform, I think, is one of the places that you can have that, and you're right, that there are a number of patterns. But even before we go there, it's not only the platforms, it's also the frameworks. And so one of the things that I did in the book was I was using the Spring Framework. And the Spring Framework implements... For many of those, it provides an implementation.
Cornelia: Now, it's up to you as the developer to know that that implementation exists in the framework, to include it, to configure it the right way and all of that stuff. It's also, by the way, up to you as a developer to know exactly what the platform offers. And so that's why I say it's important for the developer to understand those 30 patterns, but they don't necessarily have to implement them.
Cornelia: But one of the things I find really interesting here is that there are some examples in the book where I use the Spring Framework to implement the pattern in a concrete example. So I talk about the pattern, explain it, and then I use it, a concrete example, that my readers can actually follow along. They can pull it out of the GitHub repository. Some of those patterns, if I was writing the book today, if I was writing that chapter today as opposed to writing it two years ago or three years ago, I would probably have the platform do.
Cornelia: So, for example, some of those patterns are now implemented in Istio or implemented in a service mesh. And so that kind of transition from even less responsibility on the developer to do the implementation and then test it within there, but they're actually now using a platform primitive to leverage that pattern to make their software resilient and have certain characteristics in the way that it runs in the cloud.
Joe: Does that just reflect sort of a maturity in the industry that these things are becoming more integral to the technologies that we would choose as part of our platform?
Cornelia: I absolutely think so. I think it's an indication of the maturity of platforms in general. And I spend a lot of time with Kubernetes and I think that the reason that Kubernetes took off wasn't because it was the best orchestration, container orchestration, engine out there, but because it was the best... It was a platform for building platforms. It's designed in such a way that you can add something like service mesh, add something like Istio and Envoy to the mix. And now some of the things that Cloud Foundry, for example, we had a number of services that we created for you that would tie in... So here was where we were actually starting to take some of the things that Netflix had done. So they had created, for example, Eureka. We offered that as a service on the platform, but then it, in particular, integrated best with the framework, which is the Spring Framework.
Cornelia: So there was still a tie there. And you're right. I think I hadn't even thought of that, but thinking about it, what we're seeing is kind of this next level of maturity, where now, in fact, I don't need to do anything on the code side. I take all of that that was in the framework and I implement in the sidecar, for example. But that does introduce some new patterns, like now you're just talking to localhost and you no longer have to worry. Your local code can just talk to localhost. And it doesn't have to worry about doing the service discovery protocol because you've taken client-side load balancing and said, "Okay. I'm going to let the sidecar deal with that. I'm going to let Envoy deal with that," for example.
Joe: So going along with that, I think there is one thing in your book that I might disagree with, or at least in terms of its certainty. And you made a statement that writing Cloud Native software is complex, just flat out complex. And so I'm curious as if it has to be that way because as these platforms mature as we sort of kick those old habits down the road, is it possible that a software developer who is not solving necessarily complex problems, but just trying to build something that's valuable to their business can still have all those Cloud Native characteristics like the resiliency and adapting to change without the software and the apps being complex? Like, is that possible or is it just inherently complex?
Cornelia: Yeah. No, I... And you know what? We're not going to actually disagree on this that much at all. I guess it's a scoping question. As a whole, when you look at something like Netflix, the Netflix application, or the Facebook application or something like that, that is absolutely a complex system. It's got lots of moving parts and the only way that we've been able to manage things like that at the scale that Facebook and Google and those properties reached was because they are really good at understanding what are not only the software architecture patterns, but the operational patterns. These are the unicorns that have figured out those Cloud Native operational patterns. So as a whole, that's there. But I think that we have achieved the right thing if somebody who's working on a component within that system isn't burdened with that complexity.
Cornelia: And I think that is absolutely achievable. And have we achieved that completely? I would say probably not. I think, in places, we have. But again, going back to the Kubernetes space today, today, the developers are increasingly asked to not only worry about the code and the processes and the multithreading that they have within their single microservice, but they are also asked to understand what it looks like from a deployment perspective and those types of things. And I think that we haven't nailed that substrate yet that makes it easy for services to be consumed into the complex wholes. And maybe we're 60% of the way there to make the microservice developer's life a little bit easier.
Joe: So you mentioned microservices... And a lot of these patterns are directly related to microservices, whether it's like service discovery or circuit breakers, and maybe this is a restating of the question I just asked before, but is there room for the monolith in a Cloud Native architecture in these Cloud Native Patterns?
Cornelia: Yeah. I mean, that's a really great question. I think that monoliths, in general, break so many of the patterns. So right out of the gate, you're going to be burdened with problems because you're now running in a, let's say, Cloud Native application platform or in an environment like that that assumes those patterns and that can do things based on those patterns, based on that assumption of those patterns, being there. Beyond that, there are definitely monoliths out there where the internal architecture is componentized.
Cornelia: So you've done a good job creating separate services and maybe you're even doing statelessness. And if I'm following those patterns, why am I not calling that a microservices architecture? Well, it has to do with even the way that you bundle those things together. And so there's this idea of this monolithic bundle, even if I've done microservices on the inside, and that type of monolith will probably work well, reasonably well. It depends on, again, how those components are deployed in those types of things. But it isn't going to solve some of the other things that these more loosely coupled microservices architectures do and things like having independent release cycles, being able to do independent blue-green, being able to create bulkheads between the different microservices.
Cornelia: And so I would say that the answer is... I mean, I'm not a purist, I'm a pragmatist. And I want to be able to take things that don't follow 12 factors, maybe they only follow four factors. And I want to be able to bring some of those things over and whether they get this negative label of monoliths or not, I'm very, very interested in bringing those into the fold and getting there incrementally. So in short, I think that there is room for some level of monolithic architectures, monolithic applications, in the cloud and starting to make its way into Cloud Native. And I think the reality is... I work a lot with large enterprises. The reality is that we can't just rebuild everything greenfield. We need to get there incrementally.
Joe: Okay. So I'd like to come back to the twelve-factor app because I just love it so much and I talk about it so much. One of the things that I think you do really well in your book is cover some topics that may have been missed in the twelve-factor app or like I often say, "If we were going to write it again from scratch today, there'd be a whole bunch of things that we would add." And two of those are related, they're visibility, and then logging and metrics. So I think in visibility, you talk about health endpoints. Can you talk about why that's important and how it fits into the Cloud Native Patterns?
Cornelia: Yeah. So I love that you talked about visibility with respect to these probes... And I'm going to use the term that Kubernetes uses. They have these health probes and these health endpoints. Although as a developer of these services, you're responsible for implementing them, what Kubernetes does is it allows you to tap into those and do some interesting things with that. And in fact, I think that's really the answer to your question, which is okay, I've got a bunch of different components and we talked about the fact that I've got lots of components that come together into a relatively complex whole. Whole as in W-H-O-L-E. So there's a larger digital offering that is made up of, composed of, all these little microservices. And it's one thing to know whether one piece of that is working fine.
Cornelia: And that is something that, from a monitoring perspective, you could just say, "Okay. Well, I'm going to monitor these components." What the health check does is it allows us to actually start to get some behaviors where we stitch those things together. So it allows, for example, a client microservice to, let's say you've got some behavior in there... You want to define some behavior in the client microservice on consuming another service, so a downstream service. You have a system, and going back to platforms, you have a system like Kubernetes that is constantly watching those health endpoints and then can actually help the upstream service do something different based on the way that that downstream service is appearing. So you used the term visibility. So it's not just visibility in terms of a dashboard. I think a lot of times when we hear visibility, we think, "Okay. Well, I'm just going to bubble things up to a dashboard." But visibility is super important for being able to orchestrate things and automate things as well.
Cornelia: And that's a big part of what I see these probes and these health endpoints are about. I can tell you that, 10 years ago, I didn't really think about programming health endpoints so that some automated robot could do something on my behalf, depending on what that value is, but that's exactly what Kubernetes does. That's how Kubernetes knows, hey, if there's something, there's this health probe, that says, "If I'm not getting back a response from that health probe, I'm going to throw away this pod and I'm going to create a new pod. I'm going to create a new container instance for you." So that's the mind shift that we need to do as developers is to think about robots as the consumers, not just a dashboard somewhere.
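A health endpoint written for a robot rather than a dashboard might look like the Python sketch below, using only the standard library. The `/healthz` path, the `healthy` flag, and the `probe` helper are assumptions made for this illustration; a real service would compute its health from its own dependencies, and the prober would be the platform rather than the process itself.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag; a real service would check its dependencies here.
    healthy = True

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status = 200 if HealthHandler.healthy else 503
        state = "ok" if status == 200 else "unhealthy"
        body = json.dumps({"status": state}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(port):
    """Return the status code an automated prober would see."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", "/healthz")
        return conn.getresponse().status
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status_when_healthy = probe(server.server_port)
HealthHandler.healthy = False
status_when_unhealthy = probe(server.server_port)

server.shutdown()
server.server_close()
```

The status code is the whole contract. An orchestrator that sees the failing code repeatedly can decide, without any human looking at a dashboard, to throw the instance away and create a new one.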
Joe: So the subtitle of your book is "Designing change-tolerant software." And I really like how, several times throughout the book, you characterize Cloud Native Patterns and Cloud Native Apps as being able to adjust to new conditions and adapt to change. So I'm wondering if you can give some examples of the kind of change that you're talking about, maybe something that you've experienced.
Cornelia: Yeah. So, I mean, that is really what characterizes the cloud just about more than anything else, I mean, highly distributed. And I've talked about that a few times. So, highly distributed microservices all over the place, network latency networks going in and out. And in fact, that network latency and networks going in and out kind of hints at that change. That's one of the many, many things that can change in the environment. And so that change tolerance is partly there to say, "Hey, stop assuming that your infrastructure is going to be stable." Because your infrastructure is going to change, we talked about the example of it's not Amazon's fault. That's one of the things that I always say is if you, as an administrator, as an operations person, if you ever catch yourself thinking, "Okay. I'm going to run the script and then I'll be done," that "done" word is a bad word in the cloud because you're never done.
Cornelia: There's always some change that's happening. But there's also change that is intentional change that you want to be able to enable. So I'll give you a concrete example here. One of the things that we used to think 10, 15 years ago, we used to think that the way that we achieved security in our software systems was to go through this rigorous process of analyzing everything. We had the security office, and we had the change control people, and we... This is part of what has slowed us down significantly in being able to do deployments, but we did it. It was something we were willing to put up with because we felt like the way that we could be secure is to get everything set up in a secure way and then not allow it to change.
Cornelia: As we've gotten better and better at the cloud, we've started to realize that there's some really interesting patterns. And we've started to realize that, in fact, security comes through constant change. So for example, the way that we have historically tried to protect ourselves from malware is to have malware scanners, but the malware scanners depend on recognizing the signatures of malware, signatures that are coming across networks, or signatures that are coming across in, I don't know, compute profiles. You start seeing spikes on some regular basis. And those are the things that we tried to look for. But what if instead we said, "You know what? We recognize the fact that we're never going to be able to see all those signatures. There's going to be some clever hacker that gets malware in there that is going to be undetectable"?
Cornelia: What if we just throw away that container instance every single day? And so now the malware, maybe it gets in there, but it doesn't live there for six months collecting credit card numbers. Right? And so that is something where this intentional change, this allows us a completely different security posture, but it requires that we build our software in such a way that it can tolerate that changing condition and the changing condition being, hey, I'm just going to reboot you.
Joe: Well, thank you for joining us today, Cornelia. It's really been a pleasure to talk. If you are interested in learning more about Cloud Native Patterns and Cloud Native Applications, we'll include a link to your book, Cloud Native Patterns from Manning, in the show notes. Thanks again.
Cornelia: Thanks so much for having me. It's been a real delight.
A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.
Software Architect, Heroku
Joe is a Software Architect working on the Heroku and Salesforce Platforms.