41. Architecting Multi-Tenancy

Deeply Technical
October 29th, 2019
Episode 41
37:45

Hosted by Robert Blumen, Ian Varley

Host Robert Blumen is joined by Salesforce architect Ian Varley to discuss multi-tenancy. Their conversation covers how multi-tenancy differs from multi-user, popular architectures for multi-tenancy, why Salesforce uses a shared resource architecture, how Salesforce scales horizontally, the economics of a mix of small and large users, the importance of scaling down, scheduling, rate limiting and fairness, the threat model of a multi-tenant system, ensuring isolation between users, and technical challenges of migrating users between shards while maintaining availability and consistency.

Links from this episode

The Architecture Files #10: L33T M10y by Ian Varley
Salesforce developer documentation on multi-tenancy
ACM Queue Condos and Clouds by Pat Helland
The Magic of Multi-Tenancy by Igal Levy

Show Notes

Robert: For Code[ish], this is Robert Blumen. I'm a DevOps engineer at Salesforce. I have with me today Ian Varley. Ian is a principal software architect at Salesforce with over 20 years of experience in software development. Ian and I will be talking about multi-tenancy. Ian has written about this topic for the Salesforce engineering blog as part of a series called The Architecture Files. Ian, welcome to Code[ish].

Ian: Thank you Robert. I am really happy to be here.

Robert: We're going to talk about multi-tenancy. Let's start off with a definition. What is multi-tenancy?

Ian: Oh, well, that's a great question. The way I like to describe it is, I think everybody understands naively that software is all multi-user, in that there are lots of different people using the same software at the same time on the web. If you and I are both using Gmail, for example, then you know there's not one Gmail program running for you and a separate one running for me.

Ian: So most software you interact with is multi-user. Multi-tenancy takes that to one more level, which is to say that there are groups of users, and these groups might share information amongst themselves, but the group is completely isolated from other groups. So you can think about it almost as a group mode for multi-user software. That's, I think, the simplest way to describe what we mean by multi-tenancy from a software perspective.

Robert: When you're talking about multi-tenancy is the tenant one of these groups?

Ian: Exactly, right. The tenant is the group and that can take a lot of different forms. But for a company like Salesforce, in the brand of multi-tenancy that we use, what each paying customer is paying for is a tenant, and a bunch of individual user accounts within that tenant. That's their private area of data.

Ian: Now, that sometimes maps to a single company, that's the most obvious case, but there are bigger and more complex cases where one company who is a customer of Salesforce might end up actually having three or four or even more independent tenants because for some reason they need to keep groups of users separate from each other for say compliance reasons or international law or something like that. It doesn't always map exactly to a legal entity. But yeah, that's what I mean when I say tenant. That's the group.

Robert: To give us some more concrete idea of what we could be talking about, how many users might a tenant have in a large enterprise software multi-tenant system?

Ian: Yeah. I don't think there's any particular right answer to that. I mean we certainly have very small tenants that have one, two, three, four users. You know, small, independent companies. We probably also have individual tenants that number in tens of thousands of user accounts within a single tenant. There's no inherent maximum at that point. It's just whatever your software can handle I suppose.

Robert: You bring up Gmail as an example of a multi-user system. I believe when I use it as an individual, it is multi-user. There is the corporate Gmail client where everyone at Salesforce is on Gmail. We can calendar each other, and if you try to send a document to somebody outside of Salesforce it'll say, "Hey, you're sending this to someone outside of your org." Would it be fair to say that corporate Gmail or corporate G Suite is really a multi-tenant system?

Ian: Yeah. I think that is fair. I think the boundaries are probably a little bit more porous there. You can choose to share data with people outside of your organization in the corporate Gmail world. Whereas, I think for a system like Salesforce, I guess I should mention tenancy is a very basic concept within the Salesforce systems. So there really aren't mechanisms in place and this is very, very conscious. This is by design. There really aren't mechanisms in place that allow you to directly share information from one tenant to another.

Ian: When we think about multi-tenancy, there's a small set of principles that we think about in terms of why we do it and what we want to get out of it. The first most important principle of multi-tenancy, as I think it would be in just any multi-user system, is the principle of isolation, right? You want to make sure it's very, very difficult if not downright impossible for people to accidentally or with bad intention move data across the boundaries of tenants, right?

Ian: So if you're thinking about single user Gmail as a system, if there were a special API call you could make that would go get my email, that would be a big security hole. Right? Likewise, in a multi-tenant system, we want to make it difficult or impossible for bad actors to get at the wrong tenant's data, or for a tenant to accidentally get at another tenant's data. Perhaps more importantly, we want to make it difficult for our engineering team to make mistakes of one type or another that accidentally do the same thing.

Ian: In other words, it shouldn't really be possible for an engineer to check in a bug that all of a sudden has a different tenant reading my tenant's data. When that happens that's very, very serious. So the entire engineering organization of the system is really based around that as a first principle.

Robert: This concept of isolation. You've been talking about how to ensure that there are boundaries between tenants but users within a tenant can share information. I do want to drill down into that more, but I think it would make sense to talk about the main architecture approaches to multi-tenancy first, and then come back and say, how do we implement isolation? You describe in your blog post there are two main approaches to multi-tenancy. What are they?

Ian: Well, the first one is what you might get by default if you use Heroku or any kind of infrastructure as a service type account, which is you get down to the infrastructure level, you get separate resources, right? You get separate containers or VMs or however you want to implement it per tenant. If you have one tenant, there's going to be a set of VMs over here for that tenant, and a set of VMs over here for this tenant and never the two shall meet.

Ian: That is a very straightforward way to do multi-tenancy where you're saying existing containers of one, and I don't necessarily mean like Docker containers, but existing containers of one type or another provide that boundary for you. That's what you might call a separate resource strategy. The other, which is the way that the Salesforce architecture works is what you might call a shared resource strategy.

Ian: In a shared resource strategy, you are actually sharing at the infrastructure level and even at the process and software level, you're sharing all of the resources across multiple tenants. For example, in the world of Salesforce, you're going to have one big database, one big relational database that is actually serving and storing the data for thousands and thousands of different tenants simultaneously. It keeps those separated logically rather than at a physical infrastructure layer.

Ian: By which I mean if there's a table in the system, let's say there's a table for accounts in the system, there's going to be a column on that table, which is the tenant identifier, which in the Salesforce systems we call the organization ID. So that organization ID is just a field on the table and literally every query in the system is forced to say, okay, I always have to have a WHERE clause that limits to just a single organization, a single tenant in every part of my software.

Ian: So, in every single line of code throughout the entire system, there's this awareness that one of the things in your current context is, which tenant are you part of? It goes all the way down to queries to the database all the way through all the processes and user interactions and everything. Those are fairly diametrically opposed approaches to multi-tenancy of course.
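
To make the organization ID idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not Salesforce's code: the accounts table, the org_id column name, and the TenantContext class are all assumptions made for illustration. The point it shows is that the tenant is part of the calling context, and every query carries a WHERE clause on it.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database holding rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    # org_id is a hypothetical name for the tenant identifier column.
    conn.execute("CREATE TABLE accounts (org_id TEXT NOT NULL, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO accounts (org_id, name) VALUES (?, ?)",
        [("org_a", "Acme"), ("org_a", "Globex"), ("org_b", "Initech")],
    )
    return conn


class TenantContext:
    """Carries the current tenant; all data access goes through it."""

    def __init__(self, conn, org_id):
        self.conn = conn
        self.org_id = org_id

    def account_names(self):
        # The tenant filter is always present and always a bound parameter.
        rows = self.conn.execute(
            "SELECT name FROM accounts WHERE org_id = ? ORDER BY name",
            (self.org_id,),
        )
        return [name for (name,) in rows]


conn = make_db()
print(TenantContext(conn, "org_a").account_names())  # ['Acme', 'Globex']
print(TenantContext(conn, "org_b").account_names())  # ['Initech']
```

Both tenants share one table, yet neither context can see the other's rows, because callers never write the tenant filter by hand.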

Robert: If I understand, the one approach is you get your own stuff, and the stuff is maybe running even on completely separate servers. So it's inherently pretty well isolated. Versus the other, where everything is on potentially the same big server and we have to do some stuff in the software to ensure that every request for any kind of information is scoped down to one org ID.

Ian: Yep. Yep. Exactly.

Robert: Great. So at Salesforce, we mostly use the second approach. Why is that?

Ian: Really the reason ultimately is about utilization and cost, right? If we're running a single multi-tenant service rather than lots of small single tenant services, we're making better use of the underlying software or the underlying hardware that we are ultimately paying for, right? So this is particularly important for resources that you're not using all the time.

Ian: If you're talking about ephemeral compute, you can scale that up and scale that down pretty easily, right? You can even go all the way to a serverless approach, where somebody else worries about scaling it up and down for you. But from a storage perspective, if you are running databases and things like that, it's fairly important that individual tenants aren't holding dedicated resources when they're not being used, which is a lot of the time. If you've got a small company, a few people, they're not hitting their CRM system every second of the day.

Ian: So if you've got separate resources actually physically spun up for all of those, it gets really expensive really fast. So just the impact of that on the cost structure of the service but then, I mean, think about also just even on the environment and things like that, there's just a ton of waste there. That's why for the majority of services that Salesforce runs, that's why we run it in that shared resource mode.

Ian: Now of course, it takes a lot more work to build the software in such a way that it's going to work. But then once you've done that, you have that as an option.

Robert: Is it also the case that from an Ops standpoint, it's easier to manage one really big database than say 5,000 small databases that would all need to be backed up and query planned and tuned?

Ian: Yeah, for sure. If you're thinking about that as individual human work, then yes, for sure. Of course, when you get to doing things at scale, that reverses a little bit to be honest because if you're going to be running databases at scale, you need to get to a point where all of that is very, very automated.

Ian: If you think about Heroku as an example of this, right? Of course, Heroku's data services like Postgres and things like that, that is tractable at that scale with millions and millions of databases running because it's literally all automated, right? There is no human that's sitting there pressing buttons and doing all those things.

Ian: That's not really a compelling reason to go one way or the other. You have to get to a level of automation that matches your scale in either case. So certainly in CRM core, with the level of complexity of the relational databases that we run and the scale of them, there is a fair amount of operational work that's difficult to automate. But at this point, we've got a lot of automation in there just so that we can run at the scale that we run at.

Robert: We've been talking about it as if there is one great big relational database and all the tenants are in it, but relational databases can only scale out so much. How does this approach horizontally scale when you have a very large number of customers as we do at Salesforce?

Ian: Yeah. The way that Salesforce has approached that over the years is to take just a basic sharding approach. Because of the fact that it's a multi-tenant architecture, we can very cleanly separate tenants from each other, which means that if a single instance of a database is getting too large, we can just slice it in half and say, all right, 50% of the tenants go over here and 50% of the tenants go over there.

Ian: That's more or less been the way that the company has scaled since the beginning. For the first few years, I think there really was just one big database, but it was pretty quickly clear that that was going to have its limits. So they started sharding that into more and more. We call them instances. There are a whole bunch of instances of Salesforce now that are identical from a software perspective, identical more or less from a hardware perspective, and then just have different subsets of tenants on them.
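
The "slice it in half" idea can be sketched as a routing table from tenant to instance. This is only an illustration under assumed names (routing, split_instance, and the instance labels are hypothetical), and it splits by tenant count, whereas a real split would presumably weigh tenants by load or data size.

```python
def split_instance(routing, source, target):
    """Move half of the tenants on `source` to `target`; return the moved tenant IDs."""
    tenants = sorted(t for t, inst in routing.items() if inst == source)
    moved = tenants[len(tenants) // 2:]  # second half, by sorted tenant ID
    for tenant in moved:
        routing[tenant] = target
    return moved


# Hypothetical routing table: every tenant starts on one instance.
routing = {"org_a": "inst1", "org_b": "inst1", "org_c": "inst1", "org_d": "inst1"}
moved = split_instance(routing, "inst1", "inst2")
print(moved)                               # ['org_c', 'org_d']
print(routing["org_a"], routing["org_d"])  # inst1 inst2
```

Because tenants never share data with each other, the split needs no cross-shard joins afterwards; each request only needs the routing lookup for its own tenant.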

Robert: You could envision a scenario where the workload on a particular instance, it's outgrowing the instance. You have to split it into two shards. What are some of the complexities around doing that?

Ian: Yeah, I mean that's one of the interesting… That's one of the principles actually that we think about when we think about multi-tenancy, which is that if you're going to run an architecture like this, it's really important that moving a tenant from one place to another can't be a difficult thing to do. It can't require a whole lot of human intervention.

Ian: It has to be easy. It can't involve downtime, or at least not a whole lot of downtime, and it needs to be very reliable. So scalability is one of the angles where you might want to do that, right? Where you say, Oh, well this tenant or this group of tenants has been sort of organically growing, and the hardware that they're hosted on is not good enough anymore, and we need to move them around. That's a totally valid reason.

Ian: There are also other reasons why moving individual tenants has to be a first-class operation, has to be easy. One of which is just basic hardware refresh. If you're running in data centers, then you have machines that are on lease. Those leases come up for expiration every so often and you just need to refresh them, right? You need to move to new hardware. For that reason, you need to be able to migrate large groups of tenants together.

Ian: Then there's actually another potentially even sneakier reason why we need to move tenants around, which is what I like to talk about as product interaction latency. Let's say you're a customer, and you're in Japan, and you're running against an instance that is based on the East Coast of the US. You're going to have pretty poor latency on a request by request basis, right? Every time you do that it's traveling across the world.

Ian: For that reason, we spin up physical resources in lots of different geographic regions, and so we might want to say, Oh, actually let's move your tenant closer to where the users are actually accessing it from. That can be complicated. I mean, you might say, why not just put it near them in the first place. But it's a constantly changing picture. Where we have data centers is always changing, where our customers have users is always changing. There's mergers and acquisitions and all that kind of stuff. So for that reason, it's really important that just picking up a tenant and moving it somewhere else has to be just a first-class operation.

Robert: With a system that has customers on it who are in a lot of different time zones, you have people doing reads and writes. You offer some kind of consistency model. You have messaging or events coming into a system. Is it necessary for Salesforce's use cases to be able to migrate a tenant across instances without the tenant seeing any downtime or weird consistency violations?

Ian: Well, the latter, 100%. If we have weird consistency violations, that violates a lot of the basic trust that our customers have in our product. It's a pretty complex product or set of products that has very, very nuanced interaction patterns, and people build really complicated stuff on it that's very mission critical. So if things are broken from a consistency perspective, like you wrote these three records but now only two of them are around, then that's a big problem. Right?

Ian: So a first-class concern for us is the correctness and consistency of the data. Then, as you know, for our customers, especially some of the larger customers, in particular in the world of service, we have a product called Service Cloud where you can run call centers and things like that. As we get into bigger and bigger customers, we're increasingly seeing customers that absolutely need 24/7.

Ian: So we're certainly striving for getting to a point where moving a tenant is a zero downtime operation. We have also, because of the complexity of the product, kind of let the cat out of the bag a little bit in terms of what we're allowing our users to do just in terms of how complex operations can be. So it's really difficult to have both of those things, right?

Ian: It's really difficult to have a complex relational database that's updatable in real-time and can have extremely complex transactions on top and to say, by the way, I'm going to be able to without any downtime atomically and transactionally move this to a physically different geographical location when the data in question could be petabytes of data. You put all those factors together, and you've got a recipe for impossibility.

Ian: There's interesting things we're doing. Tons of interesting things we're doing to reduce that window of how much downtime you really need to take. For example, we've got a tenant migration process that works in these two phases, where first it identifies data that is not changing and it moves that in bulk without the tenant being offline, and simultaneously keeps a record of what changes they are making.

Ian: Then there's a second phase when we say, okay, now take the tenant offline for as brief a time as possible. Catch up all those most recent changes and then turn them back on in the new location. There's lots of interesting tricks that we play to try to get that to be as small as humanly possible. At this point, we are for sure still in a point where we can't just atomically flip a switch and say, hey, at one second, you're operating on the East Coast of the US and the next second, you're in Europe. We're not able to do that just yet.
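
The two phases described above can be sketched roughly as follows. This is a simplified, single-threaded illustration, not the actual Salesforce migration process: the Instance and Migration classes are hypothetical, the bulk phase copies a full snapshot rather than identifying unchanging data, and replaying the change log is assumed to be idempotent (last write wins).

```python
class Instance:
    """A stand-in for one database instance: org_id -> {record_id: value}."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Migration:
    def __init__(self, org_id, source, target):
        self.org_id, self.source, self.target = org_id, source, target
        self.change_log = []  # writes made while the tenant still lives on source
        self.offline = False
        self.active = source  # instance currently serving the tenant

    def write(self, record_id, value):
        if self.offline:
            raise RuntimeError("tenant is briefly offline for cutover")
        self.active.data.setdefault(self.org_id, {})[record_id] = value
        if self.active is self.source:
            self.change_log.append((record_id, value))

    def bulk_copy(self):
        # Phase 1: copy a snapshot in bulk while the tenant stays online.
        self.target.data[self.org_id] = dict(self.source.data.get(self.org_id, {}))

    def cutover(self):
        # Phase 2: go offline briefly, replay logged changes, switch instances.
        self.offline = True
        try:
            dest = self.target.data.setdefault(self.org_id, {})
            for record_id, value in self.change_log:
                dest[record_id] = value
            self.source.data.pop(self.org_id, None)
            self.active = self.target
        finally:
            self.offline = False


east, europe = Instance("us-east"), Instance("europe")
east.data["org_a"] = {"r1": "v1", "r2": "v2"}
migration = Migration("org_a", east, europe)
migration.bulk_copy()
migration.write("r3", "v3")   # tenant is still online during phase 1
migration.write("r1", "v1b")
migration.cutover()
print(europe.data["org_a"])   # {'r1': 'v1b', 'r2': 'v2', 'r3': 'v3'}
```

The offline window covers only the replay of the change log, which is why shrinking that log (or replaying it faster) shrinks the downtime.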

Robert: You'd brought up the issue a little while ago, and I pushed it off about isolation. Now, we have some fundamentals of the architecture to drill down into how Salesforce implements isolation when tenants are using the same underlying storage systems.

Ian: Yeah. At root, it's really as simple as I said before. Of course, nothing is ever that simple. But a lot of the product is backed by a relational database, and in every table in that relational database, there's a column that says organization ID. The organization ID is part of every single query that runs in the system, and it limits your access to a single tenant.

Ian: Now, how do we make sure that someone doesn't just write a query that leaves that off, or that works across tenants in ways it shouldn't? There's a whole wide variety of mechanisms. In the product, the basic mechanism is we have a static checker that looks at every piece of SQL that goes against the system and it won't let you check it in if it doesn't have the appropriate conditions in it.
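
As a rough idea of what such a check might look like, here is a deliberately naive sketch. The real checker is not described in the conversation; this one is an assumption that uses a regular expression, a hypothetical org_id column name, and bound parameters written as ? or :name. A production tool would parse the SQL properly rather than pattern-match it.

```python
import re

# Requires a WHERE clause that compares org_id to a bound parameter.
ORG_FILTER = re.compile(
    r"\bwhere\b.*\borg_id\s*=\s*(\?|:\w+)", re.IGNORECASE | re.DOTALL
)


def is_tenant_scoped(sql):
    """Return True if the statement filters on org_id with a bound parameter."""
    return bool(ORG_FILTER.search(sql))


def unscoped_statements(statements):
    """Return the statements that would be rejected at check-in."""
    return [sql for sql in statements if not is_tenant_scoped(sql)]


queries = [
    "SELECT name FROM accounts WHERE org_id = ? AND name = ?",
    "SELECT name FROM accounts WHERE name = ?",
    "UPDATE accounts SET name = ? WHERE org_id = :org_id",
]
print(unscoped_statements(queries))  # ['SELECT name FROM accounts WHERE name = ?']
```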

Ian: As far as the way that the product works, that's the first line of defense. The second line of defense is from an indexing perspective, the database tables are very tightly indexed on this organization ID and they're very, very large. So if you try to run a query that's properly limited and indexed to a single organization, it's going to have perfectly good performance characteristics. If you leave that off, it's going to take hours to run.

Ian: So you'll notice pretty quickly during development like, "Hey, why is my query not running" even before you get to the point of the static checker. So between that and a whole bunch of other failsafes and mechanisms that we have in place, that's how we protect that level of isolation across the entire stack. That's a fair amount of work to do, but it does give us these benefits of the shared resource, software level multi-tenancy.
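
The indexing point can be seen in miniature with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in since the conversation doesn't detail the production database's tooling. The table, the org_id column, and the index name are assumptions for illustration. A query scoped by org_id can use the index, while one that leaves the tenant filter off has to scan the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (org_id TEXT, name TEXT)")
# Hypothetical index on the tenant identifier.
conn.execute("CREATE INDEX idx_accounts_org ON accounts (org_id)")


def plan(sql, params):
    """Return SQLite's human-readable query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text


# Scoped to one tenant: the plan mentions idx_accounts_org.
print(plan("SELECT * FROM accounts WHERE org_id = ?", ("org_a",)))
# No tenant filter: the plan is a full table scan.
print(plan("SELECT * FROM accounts WHERE name = ?", ("Acme",)))
```

The exact plan wording varies between SQLite versions, so the sketch prints it rather than assuming a fixed string.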

Robert: I'm aware that Salesforce offers different kinds of containers in which customers can write their own apps that run against Salesforce APIs. I'm going to guess you don't just leave it up to the customer to add an "organization ID equals us" on all of their own queries. How do you protect the customer against either behaving badly or simply making a mistake?

Ian: Yeah. There are two varieties of that. The first is just accessing the wrong data for the wrong tenant. That's just not possible through the API at all, because the API is in every respect single tenant at a time. Now, it's possible if you phished somebody, and you got their credentials and then you log into the API as them, that's a different story, right? But from the perspective of legit usage of the system, there's no way to go across those tenant boundaries with an API query or UI request or anything like that. You just can't do it.

Ian: Now, where it gets a little trickier is in terms of resource utilization, because we are fundamentally running on the same servers under the covers here. Say tenant A comes in and issues a really expensive request, like, I want to crunch all these billion lines of data and do a report on them, and then tenant B comes in at the same time and says, I want to do a relatively simple request, show me a page.

Ian: Is it possible that there's going to be contention? Well, certainly there's going to be contention just for basic resources like memory and CPU. Now, most of the time, these are highly concurrent products that handle this really, really well and we monitor very closely the CPU load and memory load and so forth. Right? We don't really generally get into contention situations like that that we're not expecting. We also have plenty of limits in the product itself that prevent a single tenant from just gobbling up all the resources.
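
One common way to express a per-tenant limit is a request budget per time window. The sketch below is an assumption for illustration, not a description of Salesforce's actual limits: TenantLimiter, its parameters, and the fixed-window policy are all hypothetical. Time is passed in explicitly so the example is deterministic.

```python
class TenantLimiter:
    """Fixed-window request limit tracked separately for each tenant."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.windows = {}  # org_id -> (window_start, request_count)

    def allow(self, org_id, now):
        start, count = self.windows.get(org_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.max_requests:
            self.windows[org_id] = (start, count)
            return False
        self.windows[org_id] = (start, count + 1)
        return True


limiter = TenantLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("org_a", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(limiter.allow("org_b", now=2))   # True: org_b has its own budget
print(limiter.allow("org_a", now=61))  # True: a new window has started
```

Because each tenant has its own counter, one tenant exhausting its budget does not reduce what any other tenant may do.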

Ian: Now, there's really interesting corner cases that you can get into. For example, with asynchronous work, we allow customers to take a bunch of work and say, I want this work to be done asynchronously via a message queue. Now this is where things get a little dicey. Say one customer puts in a whole bunch of messages and says, I want you to process the following 10,000 messages, right? Then another customer right after that says, Oh, I just want you to process this one.

Ian: There's really interesting questions about fairness there. It's like, well, you'd like to do things in first arrival order as a general principle, but in a case like that, should you be interleaving those requests for different tenants and say, "Okay, we'll do the first thousand from tenant A and then we'll take a little break and do the one from tenant B," and so forth? This kind of fairness computing can get really tricky, and it's one of those things that because we've opted for software level multi-tenancy, we don't have an easy out there. We have to actually answer all those questions ourselves.
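
One simple answer to the interleaving question is round-robin across per-tenant queues. The sketch below is only one possible policy, with a batch size of one message per turn; the FairQueue class is hypothetical, and the conversation does not say which policy Salesforce actually uses.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin over tenants instead of strict first-arrival order."""

    def __init__(self):
        self.queues = OrderedDict()  # org_id -> deque of pending messages

    def enqueue(self, org_id, message):
        self.queues.setdefault(org_id, deque()).append(message)

    def dequeue(self):
        """Take one message from the tenant at the front, then rotate it to the back."""
        if not self.queues:
            return None
        org_id, pending = next(iter(self.queues.items()))
        message = pending.popleft()
        del self.queues[org_id]
        if pending:
            self.queues[org_id] = pending  # re-insert at the back of the rotation
        return org_id, message


fair = FairQueue()
for i in range(3):
    fair.enqueue("tenant_a", f"a{i}")  # tenant A submits a batch first
fair.enqueue("tenant_b", "b0")         # tenant B arrives afterwards with one message
order = [fair.dequeue() for _ in range(4)]
print(order)
# [('tenant_a', 'a0'), ('tenant_b', 'b0'), ('tenant_a', 'a1'), ('tenant_a', 'a2')]
```

Under strict first-arrival order, tenant B's single message would wait behind all of tenant A's; here it waits behind only one.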

Robert: There have been a number of attacks on single servers in the last few years, the Rowhammer-type things, that have shown that there are ways to leak information in multi-user systems. Is that part of the threat model for Salesforce's multi-tenant system?

Ian: Yeah. Oh, for sure. We have lots and lots of threat models because we have lots of threats. The principle we really try to rely on is defense in depth. So, barriers at a whole bunch of different levels to ensure that if a bad actor is attempting to do something, (a) they have a lot of hoops to jump through to get from one place to the next and across those boundaries of various sorts. And then (b), if they are doing so, they're leaving a trail that's notifying us, so that we're detecting what's happening, getting in right away, responding, and freezing them out.

Ian: We do a lot of work internally to ensure that, and lots of exercises and things like that to ensure that we're trying to think about things from every possible attack angle. I don't think that multi-tenancy from that perspective is inherently any riskier than a single tenant architecture, because once an attacker is in a system and is getting access to those resources, it's bad regardless of which way you spin it. So the fact that it's one tenant or multiple, it's immaterial at that point.

Robert: You were talking about the tenant who wants to do an enormous amount of work and could potentially create starvation for other tenants who are doing smaller amounts of work. I could see a couple of economic models. One is: use as many resources as you want, and we'll charge you for what you use.

Robert: Another would be putting caps in place and saying, well, we're not infinitely elastic, so we do need to put some limits on what any one tenant can do for the fairness of all. Is there a single right side of that? Or how does Salesforce come down on that?

Ian: Yeah, there's not a single right side. Obviously, both are good approaches, and it depends in part on the business model of the software that you're trying to sell. Some kinds of software work really well in a pay as you go model, and some don't. For a lot of the core product at Salesforce, like the sales products and service products and things like that, it has traditionally been very much a limits model, right? Rather than a pay as you go model.

Ian: That primarily is because of the scaling approach we talked about, right? It's a relational database, and we scale it up until a point where we need to split it and then we split it. So it is not internally elastic in that way where we could just say, "Oh, sure. If you want to use a hundred servers worth of CPU in the next hour, you can go for it and we'll just charge you for it." We just don't have that option just architecturally in the core model.

Ian: Now, there's plenty of other parts of Salesforce that do that. That do exactly that. Salesforce is a really big company and has a lot more than sales and service. There's marketing and there's commerce and there's collaboration and there's API connectivity and a whole bunch of other things. Right? That's going to vary pretty widely from one part of Salesforce to the next.

Ian: For sure, I mean, as you know, in the Heroku world, it's very much the other model, right? Where we say, Oh yeah, if you want to spin up more dynos you should absolutely do that and you'll get a bill for it. So as in every good question in software engineering, the answer is it depends.

Robert: You have talked about Salesforce running on first party data centers. How would public cloud possibly change your answer to that?

Ian: Yeah. Salesforce runs in public cloud and first party. We have public cloud instances today that run on various different public cloud providers. Fundamentally, that has the potential to change the game. Right? But what it really comes down to is what's the mode in which you're running the actual software for the individual tenants.

Ian: If you're still running in a massively multi-tenant fashion at the software level, then your ability to do that is a little bit more limited, right? You still have some of the same constraints. You still have to think about it the same way in terms of taking large groups and splitting them and migrating and things like that. Given the same architecture that we have, it doesn't inherently change any of those things.

Ian: Where it does start to become really interesting and really transformative is it gives us a lot more options in terms of, I guess I would say the sizing and the shaping of pools of tenants that live together in an instance. It allows you to be a lot more responsive with that and say, Oh, based on the growth of an individual tenant, we have a shorter lead time to spin up a new set of infrastructure that allows us to handle just those requests.

Ian: Again, as I said before, Salesforce is lots of different architectures all in one. And so there are certainly aspects of the Salesforce architecture where that elasticity of public cloud is turning out to be extremely important. Even today in our ability to serve really spiky workloads in the way that we handle commerce sites or the way that we handle marketing, some of those more seasonal workloads, that's already turning out to be exceptionally beneficial.

Ian: But to be clear, if the question is what does Salesforce do in X, the answer is both or all of the above, right? Because we have just so many different parts of the product.

Robert: I'm going to change the subject a bit here. What if a really big, important customer has this one thing, and the way the system works now doesn't quite do what they want? Can you just do this one-off for them?

Ian: That is an excellent question. Early on in the process of going down the road of software level multi-tenancy, we came to a fairly important realization. You really need a certain level of isolation at the conceptual level or independence, I guess maybe is a better word for it in terms of the way that the software behaves differently for different tenants. We could very easily long ago have said, Oh, we've got a big deal on the line, but they really need this feature to work differently than it works for everybody else. So would you mind just forking the system and having one version for them and one version for everyone else?

Ian: And very early on we answered that type of request with a very emphatic no, absolutely not. There's a whole host of reasons why we don't want to get into that situation of forking our code. It makes life harder for our engineering teams. Right? Which of course then ultimately makes life harder for our customers because we'd be running instead of one service, several slightly different services. So that really wrecks our ability to move quickly and innovate.

Ian: If you think about software as a service in general, one of the big advantages that software as a service has over packaged software is just the amount of time that we don't have to spend on maintenance and patches and testing of old custom versions, right? That's a whole lot of work that we don't want to have to do because we want to be putting that work into innovation and new features and so forth. Right? It's really important that we don't get into a situation where a single customer needs to have that level of control either over the code or over the release schedule, right?

Ian: Because when we think about the way that we release updates to our software, if a single customer's needs can force us to stay, say on an older version for some arbitrary amount of time or dictate other parts of the schedule, that can cause a whole lot of complexity for us. So it's kind of almost as bad as a code fork, right? So really good multi-tenancy practices require us to keep our engineering pipeline totally independent of the needs of any one tenant. We just absolutely need really clean lines there.

Ian: The way that we then kind of work with that is since Salesforce is a platform, if there's custom things that an individual customer wants to do that are different from somebody else, that's absolutely fine. They can do that inside the software rather than changing the software. By which I mean that a lot of the software is metadata driven at two different levels.

Ian: One level being, there's just a lot of configuration options you can choose. You can turn functions on and off, and you can change the way things appear, and you can change who has permissions to what at lots of different levels. But it's also a customizable platform in terms of metadata at the user level because they can write code. They can come in and say, Oh, when you save the following object, run the following code, and it will change the behavior of the system just for that one tenant.
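The two levels of metadata Ian describes can be illustrated with a small sketch. It is not Salesforce's actual API: the `Platform` class, the `uppercase_names` option, and the tenant names are all hypothetical, chosen only to show one shared code path whose behavior differs per tenant through configuration and tenant-supplied save hooks.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical names throughout; this is an illustration, not a real platform API.
Record = dict[str, Any]
SaveHook = Callable[[Record], Record]


class Platform:
    """One shared code path whose behavior is driven by per-tenant metadata."""

    def __init__(self) -> None:
        # Level 1: declarative configuration options, keyed by tenant.
        self._config: dict[str, dict[str, Any]] = defaultdict(dict)
        # Level 2: tenant-supplied code, keyed by (tenant, object type).
        self._save_hooks: dict[tuple[str, str], list[SaveHook]] = defaultdict(list)

    def configure(self, tenant: str, **options: Any) -> None:
        self._config[tenant].update(options)

    def on_save(self, tenant: str, object_type: str, hook: SaveHook) -> None:
        self._save_hooks[(tenant, object_type)].append(hook)

    def save(self, tenant: str, object_type: str, record: Record) -> Record:
        record = dict(record)  # work on a copy, never the caller's data
        # Configuration changes behavior without changing the code.
        if self._config[tenant].get("uppercase_names", False):
            record["name"] = record["name"].upper()
        # Only hooks registered by this tenant run for this tenant.
        for hook in self._save_hooks[(tenant, object_type)]:
            record = hook(record)
        return record


platform = Platform()
platform.configure("acme", uppercase_names=True)
platform.on_save("acme", "Account", lambda r: {**r, "region": "EMEA"})

acme_result = platform.save("acme", "Account", {"name": "Widgets"})
other_result = platform.save("globex", "Account", {"name": "Widgets"})
```

In this sketch the customization lives in data attached to the tenant, so the second tenant's save is unaffected and there is still only one version of `save` to test and release.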

Ian: That very strict approach was really what drove Salesforce down this path of metadata being this sort of central driving force around how we empower customers to represent their own business without throwing a wrench into the software delivery process for us.

Robert: I'd like to start wrapping up. Is there any one lesson you've learned about multi-tenancy that you wish you had learned a bit sooner than you did?

Ian: One of the things that I think is very apparent to us now that wasn't necessarily apparent early on is the bidirectionality of scale. I'll explain what I mean by that. On the one hand, it's super important for you to be able to scale up resources for a tenant. It's somewhat obvious why that would be the case, right? You have a tenant that starts small and then they grow, and they grow, and they grow, and all of a sudden they need to do more complex things.

Ian: That's what we were talking about before: the ability to move tenants around is one way to deal with that. There are other ways as well, in terms of scalable storage and things like that. That's one direction of scale: scaling up. It's also really important, and this is the core principle that we've thought about, to be able to scale down. We were talking before about the cost structure of serving the architecture.

Ian: If you think about it, in any kind of system, but in particular a system like Salesforce, there's eventually going to be lots and lots of small tenants. Why is that? Well, there's a whole bunch of reasons. For one thing, the way that the sales cycle works for enterprise software is if you're a CTO for a big company, you don't just swipe your credit card and start paying for a fully loaded system somewhere, right?

Ian: There's this complicated dance where the two sides spend months working together, building customized demos, making sure that the systems are a really good fit, all that stuff. That means, because of the way that a sales funnel works, there's just going to be a lot of demo environments, and there's going to be a lot of these small tenants that need to be extremely cheap and resource efficient when you spin them up, especially considering that a whole lot of them are going to be idle most of the time, right?

Ian: So if you think about that, that means overall our systems will be dominated by mostly idle tenants, right? So we need to take that into account when doing cost planning. Ultimately, that need to scale down to lots and lots and lots of small tenants is one of the real reasons why the software level multi-tenancy approach that Salesforce has taken really shines.

Ian: Because it's our way of saying, yeah, new tenant creation is super, super cheap. It's a few rows in a database somewhere and you're off to the races. That's a thing that I think if you're starting on a new project in terms of tenancy, a lot of times you might think to yourself, well, the infrastructure level tenancy is absolutely fine because we don't have to do any work for it. We're just going to spin up separate VMs and containers for each tenant and we're off to the races.
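To make "a few rows in a database" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, table names, and helper functions are assumptions made for illustration and say nothing about how Salesforce actually stores tenants; the point is only that provisioning is a couple of inserts into shared tables, and that every query is scoped by a tenant identifier.

```python
import sqlite3

# Hypothetical schema: every tenant shares the same tables,
# and each row carries the tenant_id it belongs to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (
        tenant_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE tenant_settings (
        tenant_id INTEGER NOT NULL REFERENCES tenants(tenant_id),
        setting_name TEXT NOT NULL,
        setting_value TEXT NOT NULL,
        PRIMARY KEY (tenant_id, setting_name)
    );
    CREATE TABLE records (
        record_id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL REFERENCES tenants(tenant_id),
        payload TEXT NOT NULL
    );
""")


def create_tenant(name: str) -> int:
    """Provision a tenant by inserting rows; no new VM, schema, or container."""
    cur = conn.execute("INSERT INTO tenants (name) VALUES (?)", (name,))
    tenant_id = cur.lastrowid
    conn.execute(
        "INSERT INTO tenant_settings (tenant_id, setting_name, setting_value) "
        "VALUES (?, 'plan', 'demo')",
        (tenant_id,),
    )
    return tenant_id


def add_record(tenant_id: int, payload: str) -> None:
    conn.execute(
        "INSERT INTO records (tenant_id, payload) VALUES (?, ?)",
        (tenant_id, payload),
    )


def list_records(tenant_id: int) -> list[str]:
    """Every read is filtered by tenant_id, so tenants never see each other's rows."""
    rows = conn.execute(
        "SELECT payload FROM records WHERE tenant_id = ? ORDER BY record_id",
        (tenant_id,),
    )
    return [payload for (payload,) in rows]


demo_tenant = create_tenant("demo-environment")
big_tenant = create_tenant("big-customer")
add_record(big_tenant, "order-1")
```

An idle tenant in this sketch consumes nothing beyond its two rows, which is the property the scale-down argument depends on.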

Ian: You really need to look at what the requirements of your process are going to be and ask yourself, am I ultimately going to be dominated by lots and lots of small sandbox environments and demo environments and scratch organizations and things like that? If so, how do I make sure that that's not the dominant term in my cost equation?
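The "dominant term" can be seen with a back-of-the-envelope model. Every number below is invented for illustration (tenant counts, per-tenant fixed costs, usage cost); none comes from the conversation. The sketch assumes cost is a fixed amount per tenant, whether idle or not, plus a usage amount per active tenant, and it compares a high fixed cost (a VM or container per tenant) with a near-zero one (a few rows in shared tables).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBreakdown:
    total_cents: int
    idle_cents: int

    @property
    def idle_share(self) -> float:
        """Fraction of total spend that goes to tenants doing nothing."""
        return self.idle_cents / self.total_cents


def monthly_cost(active: int, idle: int,
                 fixed_cents_per_tenant: int,
                 usage_cents_per_active: int) -> CostBreakdown:
    # cost = (all tenants) * fixed + (active tenants) * usage
    idle_cents = idle * fixed_cents_per_tenant
    total = (active + idle) * fixed_cents_per_tenant + active * usage_cents_per_active
    return CostBreakdown(total_cents=total, idle_cents=idle_cents)


# Hypothetical fleet: 1,000 active tenants and 99,000 idle demos and sandboxes.
ACTIVE, IDLE = 1_000, 99_000
USAGE_CENTS = 5_000  # assumed usage cost per active tenant

# Infrastructure-level tenancy: assume $20.00 of always-on resources per tenant.
per_tenant_infra = monthly_cost(ACTIVE, IDLE, 2_000, USAGE_CENTS)

# Software-level tenancy: assume $0.01 of storage per tenant.
shared_platform = monthly_cost(ACTIVE, IDLE, 1, USAGE_CENTS)

print(f"per-tenant infra: idle share {per_tenant_infra.idle_share:.1%}")
print(f"shared platform:  idle share {shared_platform.idle_share:.1%}")
```

Under these assumed numbers, idle tenants account for most of the bill in the first case and a small fraction in the second, which is the question Ian suggests asking before choosing an architecture.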

Ian: That is a thing that I'm glad we… It's kind of the opposite of the question. I'm glad we got that one right very early on. And it really predicated the way that the system ended up working from a business perspective over time. That's the advice I guess I would generally give to someone who is thinking about this question.

Robert: You could have an enormous number of old, archived, defunct or otherwise idle tenants and they cost you almost nothing as a few rows in a few database tables. Am I understanding that correctly?

Ian: Yep.

Robert: Do you have any data on what proportion of old tenants would fall into that close-to-zero cost category relative to active ones?

Ian: No, I don't. I don't have data on numbers. As far as the resource utilization of the system, obviously it's not much, right? The resource utilization of the system is dominated by active tenants because they're the ones who are doing things. It's almost something that we don't really have to pay that much attention to in terms of numbers. But if we had a different architecture, we would be paying a whole lot of attention to how many idle tenants are there and how much are they costing us and so forth. But the answer to us right now is it's basically free.

Robert: Okay. Ian, it's been a pleasure talking to you. Thanks so much for coming on the podcast.

Ian: Yeah. Robert, thanks so much for having me. It was a real pleasure talking to you.

Robert: For Code[ish], this has been Robert Blumen. I've been speaking with Ian Varley.

About Code[ish]

A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.

Hosted By:

Robert Blumen

Lead DevOps Engineer, Salesforce
@robertblumen

with Guest:

Ian Varley

Principal Architect, Architecture Strategy, Salesforce
@thefutureian