Things SaaS Builders Keep Getting Wrong
Summary
Jon Topper explains 10 common mistakes in building SaaS platforms, including not baking in tenancy from day one and failing to automate tenant provisioning. He shares strategies to mitigate risk, such as calculating baseline costs, establishing a strong product philosophy, and avoiding custom, on-prem, or multi-cloud deployments to ensure a scalable and profitable business.
Bio
Jon Topper is the founder of The Scale Factory, an award-winning AWS partner, now part of Ten10. His team of experts helps SaaS companies, and other businesses, get more out of their cloud platforms.
About the conference
Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
Transcript
Jon Topper: Mistakes in all businesses cost us time and they cost us money. Arguably, in the technology world, the mistakes we make are potentially larger, faster, and more costly than in any other business. It's important to try and avoid them as far as possible. Bill Gates is quoted as saying that it's fine to celebrate success, but it is more important to heed the lessons of failure. If you'd told me as a recent graduate in 2001 that I'd one day be quoting Bill Gates on stage, back when he was being vilified on Slashdot over the embrace, extend, and extinguish strategy, I'd have been very surprised. Yet here we are.
Community, things like QCon, meetup groups, other conferences, communities of practice, they've always been very important to me throughout my career. I think community allows us to heed the lessons of failures that other people have made before we get the opportunity to make those mistakes ourselves. What I'm hoping to do is to share some of those mistakes that we see fairly commonly in the SaaS landscape, in the hope that you can avoid those in your own building.
My name's Jon. I founded The Scale Factory 16 years ago now, or thereabouts. We're an AWS partner, so we work with SaaS customers for the most part. In 2023, we were the global SaaS partner of the year for AWS. Hopefully that means I can speak with some authority on this topic. It does mean that the examples I'm using will be fairly AWS-centric, even though I know this is a multi-cloud track. Obviously, AWS is the winning cloud anyway, so that's the one that you're all using.
SaaS (Software Sold as a Service)
I just wanted to make sure we're all on the same page about SaaS. SaaS is software sold as a service. It's typically based on the web or in apps. It's almost always these days hosted in the cloud because cloud and SaaS are a really good match for each other for a number of reasons. It's typically provided over the internet and billed on a subscription basis. VCs really like SaaS businesses for this reason. Subscription revenues gives them future predictable cash flow, and that's what VCs like to see. It's what all business owners like to see, actually. SaaS is typically billed on that basis.
The main difference between SaaS and any other type of platform you might be building is that it is built for multiple tenants. There is more than one customer in your landscape that you have to worry about. That's the real defining factor here, and one we'll talk about quite a lot. I mentioned VCs really like SaaS. This is data from Dealroom about VC investments. Substantial amounts of money go into SaaS every year.
In fact, in 2024, pretty much 50% of all deployed venture capital, at least of that tracked by Dealroom, was deployed into SaaS businesses. A little less so this year; in fact, the VC investment landscape has contracted significantly over the last couple of years as a result of the market changes post-COVID. It means that most SaaS businesses are VC-backed startups. You're playing with other people's money, which makes it all the more important not to make some of these mistakes.
Mistake 1: Building Something Users Don't Want
The first mistake that you need to avoid, and this is a mistake that's common to essentially all startup businesses, is building things that people don't want to buy. If you look at the statistics on startup failures, and there are a couple of websites that track these and tell the stories of failure, most startups fail because they ultimately fail to solve a problem that users will pay for. Will pay for is the important qualifier. You can solve as many problems for users as you like, but if they're not going to hand over any money to you for those solutions, then you don't have a viable business. Or it's because they don't make their way to that solution quickly enough before the venture money runs out. The way that we mitigate against this type of failure is obviously to adopt Lean startup thinking.
This book was released a long time ago, and so the practice has moved on to some extent, but the fundamentals still apply: what we're trying to do is put something in front of users as quickly as possible, observe them using the thing that we've built, and use what we learn from that observation to build our next iteration, and keep going from there until we find our way to the solution that makes sense. Which means that in the early days of a startup, or a SaaS-building endeavor, we need to be optimizing for learning. Your mission as a SaaS builder in the early days, probably the first year or so, is to get to a minimum viable product and establish a commercial model for it with the types of customers you want to sell to. You need to be talking to those users early and often. You might even consider not building software for some of the solution, which I know as engineers, you're like, that's not what we're here to do.
Some good startups have been built by initially prototyping on things like Google Sheets and that kind of thing in the background of a website, and so you're not committing lines of code to a problem that you haven't yet solved. Once you've prototyped it, and it's shown as working, and that users are willing to engage with it and hand over money, then you can spend time actually building the platform itself. It also means you need to build in continuous delivery practices early. We've been doing this for about, actually, the 16 years that I've been running The Scale Factory, building CI/CD pipelines and optimizing for allowing developers to get change from their laptops or their brains into production quickly and safely, and frequently, because that's the key to this learning. You also need to measure the right things, so putting telemetry in your applications to learn what your users are doing. You probably want to do user testing and all that good product thinking.
One thing that I think is helpful from a cultural perspective is, as a business, celebrating learning things over and above celebrating shipping features. If you optimize for shipping features, then you might well end up shipping the wrong things, whereas the celebrating learning allows you to celebrate the thing that's actually important at this stage.
Architecture and tenancy. I said that the main distinction between SaaS and other types of platforms is tenancy. Your architecture, when you're building SaaS, is defined based on a number of decision points, characteristics. If you're selling business to business, the architecture that you build is necessarily going to look different from if you're selling business to consumer. B2B tend to be smaller numbers of large transactions, and B2C tends to be large numbers of small transactions.
The number of customers you're selling to is a defining factor as well. Where they're based, either physically, geographically, or legally, is a consideration too. How much they're willing to pay will also dictate the architectural choices that you make. If you are selling into businesses like healthcare or life sciences, financial services, the regulated industries, there is likely to be regulatory consideration for you to take into account as well. Your architecture is defined by what the customers need or want: the features they're looking for, the money they're willing to spend, the performance characteristics they need, and the SLAs they expect around availability and security.
In those regulatory spaces, your architecture is determined partly also by the needs of the compliance regime, or rather things that the government cares about that you might not necessarily care about yourself. If you're building in one of these environments, often you have to think about regulatory and compliance considerations earlier on in the lifecycle than you would strictly like to, and so they should be built into the architecture as you start building.
We have a number of options when it comes to tenancy in the cloud. It's a spectrum, I suppose, rather than three distinct options. At one end of the spectrum, you have a tenancy model that we call a pool tenancy model, where all of the infrastructure that you deploy for your SaaS platform is shared by every tenant or every customer. That's the cheapest way of achieving multi-tenancy, but it's the least isolated approach. It's also the easiest to manage because everything is in essentially one place.
At the other end of the spectrum is the silo model, where you deploy unique infrastructure for every customer or every tenant, and of course that's more expensive because your baseline costs become higher. It's more complicated because now you have infrastructure to run for every tenant that's deployed. It is also the most isolated model because you can point at, for example, an AWS account and say, that there is the data that relates to tenant A or tenant 1.
The middle option, the bridge tenancy model, is a compromise solution where maybe some resources are shared and some are unique on a per-customer basis. We've worked for a long time now with a pharmaceuticals technology company who, as well as the data that they handle on behalf of their customers, they also have some data services that provide ontology lookups for medical terms and mappings of generic drug names to brand names and so forth. That data doesn't change between tenants, so they provide that as a shared infrastructure. It's read-only in the case of their use case, so that's all shared. All the tenants use that, but the systems that process the customer's data, which includes patient-identifiable data, is separated from one another on a per-tenant basis. In the cloud, you have these various options around isolating tenants.
At the bottom, the least isolated model is the application layer where you are writing software or making changes in your software to separate tenants from one another. Right at the top there is the AWS account layer where you might have a completely separate AWS account for each tenant to live in, and there are options in between there as well. The MedTech customer I just mentioned, for some of their customers, they use container layer isolation, so they have some shared container platform where the identity under which that container runs relates specifically to the tenant, and that's how they manage the isolation consideration. Of course, these all scale from lowest operational complexity to highest, and have different cost characteristics, and so forth.
Mistake 2: Not Building Tenancy Concepts into Your Application Components
The mistake that we see around this stuff is not building tenancy concepts into the application software as a first-order consideration. This is fairly common among inexperienced SaaS builders for a start, but also in SaaS that comes out of other businesses, where it was built to support a business process within an enterprise, and then somebody realizes it could be packaged up and sold as a product in its own right. Tenancy was never considered in the first build and needs to be added later on. If you're not thinking about baking tenancy concepts into your application from day one, you are walking yourself into a position where you can only adopt a more siloed tenancy model.
One of my earliest customers on AWS, actually, was a business building a thing that looked exactly like Facebook but for business customers. It wasn't Facebook for Business, because that didn't exist yet, but they were building something that looked a little like it. What they found was that they had plenty of customers interested in using it, and every customer they wanted to roll it out for needed its own EC2 instance and its own database. We were building the automation around that, so we had first-hand experience of this mistake fairly early on. The way you might think about this at the application level mainly relates to how you're handling data, really. One common pattern, if you're using a relational store, is to think of tenants as foreign keys in every row of data that you own: in every table you have a tenant ID column that's a foreign key into the tenants table.
Then you use either code in your ORM, or however you're referencing your data model, to put in place WHERE constraints to make sure that you're only fetching data for the tenant you're acting on behalf of at that time. Or, if you're using something like Postgres, which has row-level security, you can bake these security policies into the security model of the database itself. Another option is to deploy a table per tenant. That can get hairy with certain DBMSes because you run out of tables eventually, and then you get into database sharding, but that's a story for a different time. Or you might spread it across different schemas, as you could in Postgres as well. You can tell I like Postgres; you should just be using Postgres as far as I'm concerned. It's not only relational data, either: there are ways of doing this with databases like DynamoDB and so forth. There are a bunch of patterns available for that sort of thing.
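As a minimal sketch of the tenant-ID-as-foreign-key pattern, here's a runnable example using SQLite as a stand-in (the table and column names are illustrative; an ORM would inject the tenant-scoping WHERE clause automatically):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL REFERENCES tenants(id),
        amount REAL
    );
    INSERT INTO tenants VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO invoices VALUES (1, 1, 100.0), (2, 2, 250.0), (3, 1, 75.0);
""")

def invoices_for(tenant_id):
    # Every query is scoped by tenant_id, so one tenant can never
    # see another tenant's rows even though they share a table.
    return conn.execute(
        "SELECT id, amount FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()

print(invoices_for(1))  # [(1, 100.0), (3, 75.0)]
```

With Postgres row-level security, the same constraint moves into the database itself, so even a query that forgets the WHERE clause can't cross tenant boundaries.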
You also want to consider this for your object or file storage. Baking the tenant ID into the paths you use to store objects or files means that even if you deploy the application in a siloed model with only one tenant, you still have support for multiple tenants if you later build out a pool. With S3, for example, you can use IAM policies to restrict access to bucket paths. Resources running as tenant 1 take on an identity whose IAM policy only allows them to get at tenant 1's data. Most SaaS is business to business, at least in terms of SaaS that VCs have invested in. This is more data from Dealroom; I think it's 88%, something like that. We are solving B2B problems for the most part in the SaaS space. In B2B, eventually you will need to support a siloed tenancy model. That seems to be the trend we've seen.
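Returning to the storage point, here's a hedged sketch of tenant-prefixed S3 keys and a per-tenant IAM policy that pins a tenant's identity to its own prefix (the bucket name, prefix layout, and chosen actions are assumptions for illustration, not a prescribed scheme):

```python
import json

def tenant_object_key(tenant_id: str, path: str) -> str:
    # Bake the tenant ID into every object key, so per-tenant
    # access control becomes a simple prefix restriction.
    return f"tenants/{tenant_id}/{path}"

def tenant_s3_policy(bucket: str, tenant_id: str) -> str:
    # IAM policy document restricting a tenant's role to its own prefix.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
        }],
    }
    return json.dumps(policy)

print(tenant_object_key("tenant-1", "reports/q1.pdf"))
```

Attach a policy like this to the role your tenant-1 workloads assume, and those workloads simply cannot read another tenant's prefix.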
Even if you start in a pool model, the larger the customer you try to sell to, the more risk-averse they are. The compliance obligations that business has, they will enforce upon you, even if the data you're holding for them isn't regulated data. They just take a non-nuanced approach to asking you to build certain controls into the stuff they're buying from you. Large B2B customers are almost always going to require siloed tenancy to some extent. If you can't support a siloed tenancy model, you might not make a sale to those types of businesses. You'll have the sales team on your back demanding it if you don't have it from early on, too.
Mistake 3: Not Calculating Baseline Per-Tenant Costs, and Testing This with the Market
The issue with that is that deploying a siloed tenancy model is the most expensive way of managing tenancy across SaaS. Depending on the architectural choices you've made, there will be a baseline cost to running a tenant that you'll be able to calculate. If you don't calculate that cost upfront as you're making the design, before you start building anything, and make sure that your customers are willing to pay you that money and then some margin on top, then you might get yourself into trouble. This is a cost graph from one of our customers with whom we're building an AI platform. This is their per-tenant cost. Each colored line is a different tenant. It's a very elastic workload. You can see some tenants are paying 240ドル a month, for example, and some almost nothing.
The baseline cost for what they're building is pretty low because it's a very elastic architecture. But if your baseline cost were, say, 240ドル a month, you would need to make sure that your customers are willing to pay at least 240ドル a month.
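This is a back-of-envelope check worth doing before you build anything: given a per-tenant baseline cost and a target gross margin, what is the minimum viable price? (The figures below are hypothetical.)

```python
def required_monthly_price(baseline_cost: float, margin: float) -> float:
    """Minimum per-tenant monthly price that covers the infrastructure
    baseline and leaves the target gross margin (e.g. 0.6 = 60%)."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return baseline_cost / (1 - margin)

# Hypothetical: 240ドル/month baseline cost, 60% target gross margin.
price = required_monthly_price(240.0, 0.6)
print(round(price, 2))  # 600.0
```

If your target market won't pay that number, the architecture (not the sales team) is what's pricing you out.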
Otherwise, your architectural choices are going to price you out of the market. In fact, the MedTech customer I mentioned earlier, when they first built their platform, were selling to big pharma companies, so Roche and Pfizer and those kinds of guys. There's a limited number of those customers, and they all have a lot of money. The product my customer was selling was an antibiotic, not a vitamin: it was something those customers needed. There was a legal requirement for them to operate a certain process, and the customer we were working with was selling them tooling to do that. Once they'd saturated the upper echelons of the pharmaceuticals market and started talking to the mid-market players, they realized that those players were not going to pay the price they were asking for that product.
Some of the work we're doing, I mentioned there's a container-based tenancy model there, we ended up in that position, built that container-based model in order to reduce the total cost of deploying that on a per smaller tenant basis. Now they have a bridge model where their smaller customers have some shared infrastructure and a shared container platform, and their larger customers have their own instances running the software. They can support both of those, but they each come at separate price points.
Unsurprisingly, complexity increases with the number of tenants. Everything is easy for small n. As soon as n grows, everything gets much more complicated. It's important to think about that as you're designing your approach to this stuff. Processes and operational approaches that work for 3 or 4 tenants, once you scale that up to 20, to 200, to 1,000, that all looks quite different. It's important to look ahead and think about that when you're building. Have some idea of your eventual total addressable market as you're planning this. The type of complexity we're talking about here is the provisioning and teardown of tenants. As people buy or even just trial the platform, resources need to be provisioned. They need to be torn down when a customer stops paying or the trial ends, and you don't make a sale.
Then there's the managed configuration for each tenant, and that config covers a range of things: feature flags for that tenancy, some of the billing setup, some of the identity setup. Managing identity across tenants is another source of complexity. If you're using Cognito on AWS, for example, you'll have at least one user pool in every tenant to manage identities. Billing is a consideration: attribution of cost, making sure you can tell what each tenant is spending, essentially. Gathering metrics so that you can understand how the platform is running. Then there's the lifecycle of updating software as you make changes. All of that is quite straightforward for small n; if you're building software today, you're familiar with it. Once you scale up to large numbers of paying customers, it becomes a different story.
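The per-tenant configuration described above might be modeled something like this (the field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TenantConfig:
    # The kinds of per-tenant state a control plane's config store
    # typically has to track.
    tenant_id: str
    region: str                       # where this tenant's plane lives
    plan: str                         # drives feature flags and billing
    feature_flags: dict = field(default_factory=dict)
    identity_pool_id: Optional[str] = None   # e.g. a Cognito user pool
    billing_account: Optional[str] = None    # for cost attribution

t = TenantConfig(
    "tenant-1", "eu-west-2", "enterprise",
    feature_flags={"sso": True, "audit_log": True},
)
```

For three tenants you could keep this in a YAML file; for a thousand, it's a database behind an API, versioned and audited.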
Mistake 4: Not Automating Tenant Provisioning
Mistake four, really, is not automating tenant provisioning. You need to be thinking about automating provisioning and all these other lifecycle issues. This is a trend we've seen a lot over the last few years, a lot of work we've picked up in that period of time. The reason why it becomes important is because if you've got a sales cycle that is 2 months, 3 months long, which is not unusual in the enterprise space, sometimes 6 or so, once you've got the customer to agree to trial your application, you need to put that application in front of them as soon as you can because they are, at that point, interested in looking.
If you're reliant on your engineering teams or your DevOps folk to provision new tenants for every demo or every sale that's been made, then you're going to be sat around waiting for people to finish their feature work to get on with that stuff. It's a distraction as well; it's pretty rote work. If you've got a larger sales team, such as you might have if you've just received investment and had to hire a bigger one, they're all off talking to enterprises, and they're all bringing back requests for new tenants into the business. It can become untenable. You don't want engineering to become a bottleneck for the commercial side of the business. That's an absolute no-no; you don't win any friends if you do that. The thing to do is to think about automating this stuff early enough that it doesn't start causing problems. How do we do that? We're going to talk about two different parts of a SaaS platform. These are patterns that I think have emerged over the last few years. AWS have certainly been talking about them; we've been talking about them too: the distinction between the application plane of your platform and the control plane.
The application plane is where, obviously, your application components run. It's where your container platform might live, where the web application runs, where your services, API gateways, and databases are, and so forth. The control plane is an orchestration layer that takes care of the things running within the application plane and automates things like tenant onboarding, billing, and so forth. You can't just go and buy a control plane off the shelf; if you could, I'd probably have built one and I'd be selling it. They differ across SaaS platforms because the needs of each SaaS platform are different.
The concepts are the same in each case, though; the types of things that a control plane needs to be able to do are broadly the same. If you are building on AWS, there's a thing called the SaaS Builder Toolkit that you can go and have a look at on GitHub, which is an attempt to provide a reference architecture for this sort of thing. In classic AWS fashion, it's full of Lambdas and Step Functions and Java, I think. Maybe Python. We've used it as a reference; we haven't implemented it as-is, but it's a good reference point.
We'll walk through the things that a control plane will do for you. Control plane typically lives in its own account. You can think of it as its own application or suite of applications. It provides config for the tenants and things like the software artifacts that get deployed into your application plane and the services that are required for running or controlling the platform. Let's see what happens when we provision a new tenant. We've made a sale or our sales team have gone out and commissioned a demo. There'll be some form of interface, maybe an API or a web interface to express that you would like to provision a new tenant. That tenant and its data then ends up in the config store. The control plane kicks off a provisioning run which will provision a brand-new AWS account. This is a siloed tenancy model in this particular case.
That AWS account, typically the way we would build this is to vend it out using Control Tower and to provision an account baseline into it using the primitives that Control Tower provides. The account baseline is your starting point for security, monitoring, logging, and so forth of those accounts. The control plane takes care of which region we put this into. It may well be that this customer is in the UK, in which case we'll provision it into London. Maybe it's in the U.S., we'll provision it to the U.S. The key thing here is that the control plane can live anywhere. Your application plane might need to live in different places.
Once the scaffolding of a new AWS account has been set up, and identity is managed and all that stuff, the next thing that happens is that we provision the components the application is going to require. In our world, we might use Control Tower for this as well: Service Catalog would allow us to vend out components for a particular application into that account. As an example here, I've deployed a container platform; there's a database, a Cognito instance, and a load balancer. We haven't put any data or code into this yet; this is just provisioning of components. Next up, we create a deployment pipeline for the tenant.
Once fully built out, the control plane will have a number of deployment pipelines, one for every tenant, to orchestrate deployment on behalf of that tenant. As well as provisioning the deployment pipeline in the control plane, we're provisioning some resources inside the tenant account to receive events and perform operations on behalf of that pipeline. That's a security separation. In the SaaS Builder Toolkit version of this, it's all done using EventBridge and Step Functions. It's quite straightforward and quite serverless: there are no resources running all the time.
Then we use the deployment pipeline to deploy the latest version of the application into the tenant account. That's the deployment pipeline kicking off activity within the orchestration layer inside the tenant account. The orchestration layer is then causing the resources within the tenant account to fetch, for example, container images from Elastic Container Registry to maybe run some bootstrapping and provision data into the identity store, into the database as a starting point.
At this point, the control plane might then send out automated emails to the end customer or the admin user of the end customer and say, your platform's ready now. Do these things in order to set yourself up as an admin user. Off they go, and start using the platform.
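Pulling the onboarding steps above together, here's a toy, in-process sketch of the provisioning workflow. In production this would be something like Step Functions driving Control Tower, Service Catalog, and the deployment pipeline, and each step would be idempotent and retried; the step names here are illustrative:

```python
# Ordered steps of a tenant-provisioning run, as described above.
PROVISIONING_STEPS = [
    "vend_aws_account",        # Control Tower: new account + baseline
    "provision_components",    # Service Catalog: DB, containers, Cognito
    "create_deploy_pipeline",  # per-tenant pipeline in the control plane
    "deploy_application",      # latest artifacts into the tenant account
    "notify_admin_user",       # "your platform is ready" email
]

def provision_tenant(tenant_id: str):
    """Run every provisioning step in order, returning an audit log
    of (tenant, step) pairs."""
    completed = []
    for step in PROVISIONING_STEPS:
        # Real steps call out to AWS; here we just record them.
        completed.append((tenant_id, step))
    return completed

log = provision_tenant("tenant-1")
```

The value of encoding this as a workflow, rather than a runbook, is that teardown is the same machinery run in reverse when a trial ends or a customer churns.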
We're also going to, at that point, set up metering. As the application starts and is running, there'll be some mechanism by which you measure usage of that application in order to report it back into the control plane. That'll vary based on the SaaS you're building. Actually, a lot of our customers don't worry about this sort of thing because they're billing per user, per year, or something similar, so there's not as much effort to go to. If this were a usage-based SaaS, such as the AI SaaS I mentioned, where billing is based on the amount of SageMaker inference they do, then they count that usage and report it back into the billing side of the control plane. That might interface with the AWS Marketplace in order to handle billing and taking the money from the customer itself.
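For usage-based billing, the metering side ultimately boils down to rolling up raw usage events into per-tenant totals for the control plane's billing component. A minimal sketch (the event shape and units are assumptions):

```python
from collections import defaultdict

def aggregate_usage(events):
    """Roll up raw metering events of (tenant_id, units) into
    per-tenant totals to report to the billing side of the
    control plane (or on to AWS Marketplace metering)."""
    totals = defaultdict(float)
    for tenant_id, units in events:
        totals[tenant_id] += units
    return dict(totals)

# Hypothetical events: e.g. SageMaker inference seconds per request.
events = [("tenant-1", 12.5), ("tenant-2", 3.0), ("tenant-1", 7.5)]
print(aggregate_usage(events))  # {'tenant-1': 20.0, 'tenant-2': 3.0}
```

The hard part in practice isn't the arithmetic; it's making sure every billable code path emits an event with the right tenant ID attached.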
In fact, at this point, you might have integrated Marketplace with the control plane, so the actual purchase of the SaaS product by your customer could be orchestrated by Marketplace, if that was a way that you wanted to sell. Then, subsequently, there'll be some management to do with the application. You'll release a new version maybe multiple times a week, in which case the deployment pipeline is going to take care of that, kicking off the database migrations and the new versioning and the blue-green deploying and all of the good stuff that you're doing for application deployment.
The control plane, you can see it's pretty complex. There's a lot of stuff going on there. It's provisioning and tearing down tenants. It's deploying new software versions. It's orchestrating patching. It's managing billing and cost attribution. It's doing central logging and monitoring, and all that stuff. These all, to me, sound like DevOps things. If they sound like DevOps things to you as well, that's probably correct. The distinction here is that the DevOps-y stuff that's going on here is orchestrated through some central platform that is tenant-aware. It's also region-aware. It might be cell-aware if you're deploying a cell-based architecture. This is all the things that you know, probably, but maybe orchestrated in a way that you're not as familiar with. I stole this from the AWS blog on tenant onboarding because it's a prettier diagram than mine. This is an example of the same sort of thing, but with some more specificity.
In this case, the SaaS control plane has been built using Lambdas. The trigger that comes in fires the Lambda that kicks off some tenant provisioning, which orchestrates some code pipeline that pushes some deployment out into CloudFormation, which then pushes CloudFormation into the application plane and manages the deployment across multiple tiers of applications. In the real world, what you might be thinking about is that deployment may be doing some canary deployment into one tenant to start with and seeing how that goes, or grouping tenants together to deploy them, or maybe deploying EU tenants during EU non-working hours, and the same for the U.S., and so forth. Although I'm showing this using CloudFormation, there's no reason this can't work with Terraform, or Pulumi, or whatever else you're using these days. Many of our customers are Terraform users, and the orchestration that we build for them is Terraform-based.
Just a brief thing on Control Tower and landing zones, because I don't know if that's a universally understood thing. If you're building on AWS today and you are not using Control Tower and haven't built a landing zone, you are missing out on a lot of good stuff. We build Control Tower landing zones for SaaS businesses frequently; I think we've built about 100 of these over the last 4 or 5 years. A landing zone is your governance layer. It's a set of standards that you, or your platform team, have determined should be used for all cloud operations within your organization, such that if you as a product team member want to build on AWS, you come and click a button, and Control Tower will vend you an AWS account with all of that governance applied from day one, before you can get into it.
That governance will include account baselining, logging, monitoring, identity, and so forth. A lot of businesses built things that look like this using Terraform before Control Tower was available. This in combination with a control plane and AWS Marketplace is a great way of building this thing on AWS. Not restricted to SaaS, though, this is great for enterprises as well.
Mistake 5: Delivering Unique Features, or Different Versions to Tenants
Let's talk about demanding customers. I'm sure when I say demanding customers, you all have an idea who I'm talking about because you've all worked with at least one. Some customers can be very demanding. Sales teams unfortunately are programmed to say yes to things. Often, a demanding customer will get you in a position that you don't want to be in because your sales team has said yes to them. Mistake five, delivering unique features or different versions to tenants. This will kill operations teams if you are not careful. SaaS really, philosophically speaking, means delivering the same software to everybody, maybe with feature flags or features turned on or off based on what our customer is paying. If you allow unique features per customer, you'll have a bad time because it becomes operationally very complex.
Similarly, if you allow customers to dictate to you or decide when and where software updates happen, then you're baking operational complexity into your world there as well. You probably sacrifice most of your weekends to software upgrades because enterprises like to see software upgrades happen out of hours. The best way to manage this is to have a really strong product philosophy. This is a product consideration or a leadership consideration. It's important, I think, as a SaaS business to have a really clear vision about what it is that you're building and for whom. Crucially, on the flip side of that, what you are not building and who you are not serving. Having established that philosophy, get yourself and your sales team comfortable with something called a strategic no.
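The "same software for everybody, with features turned on or off by plan" idea can be sketched very simply. This is a minimal illustration, not a production entitlement system; the tier names, feature names, and tenant shape are all made up for the example.

```python
# Sketch: one codebase, with features gated by the tenant's plan rather
# than per-customer builds. Tier and feature names are illustrative.

PLAN_FEATURES = {
    "starter":    {"reports"},
    "growth":     {"reports", "api_access"},
    "enterprise": {"reports", "api_access", "sso", "audit_export"},
}

def is_enabled(feature: str, tenant: dict) -> bool:
    """A feature is on if the tenant's plan includes it, or if it has
    been explicitly flagged on for that tenant (e.g. a beta trial)."""
    entitled = PLAN_FEATURES.get(tenant["plan"], set())
    return feature in entitled or feature in tenant.get("overrides", set())

# A demanding customer gets an existing feature flagged on early,
# not a forked build.
acme = {"plan": "growth", "overrides": {"sso"}}
```

The point of the design is that a demanding customer moves a flag, not the codebase: operations still deploy one artifact to everyone.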
The strategic no would be explaining to a sales prospect how you can't do what they're asking for because it doesn't align with your product vision. They might laugh you out of the room for using that language, so probably try and describe it a little differently. You can also strategically say no by offering alternative solutions using features that do exist, suggesting workarounds, or providing integration hooks that allow them to build the thing that they want alongside the thing that you are selling to them. Provide a really clear rationale as to why you are not prepared to do that on their behalf.
Mistake 6: Deploying Your Software into Your Customers' AWS Accounts
Mistake six: demanding customers will sometimes ask you to deploy your software into their AWS accounts. We've seen this a few times. Initially, customers were just afraid of the cloud: we don't want to use that, it's cloud. Then they were like, we've adopted cloud now, so the way that we can manage our risk around buying your software is that you do it in our cloud. That's also not the right way to go. They ask this because they believe that it helps with their security and compliance posture. It might be a solution for them, but I would advise against saying yes to this type of request, because you essentially lose control of how you deploy, operate, and architect things.
Typically, a customer who is insisting on you building in their AWS account also has strong opinions about how you build in there, possibly which databases have been blessed by their security team and so forth, and that gets pretty hairy pretty quickly. It's definitely going to increase your operational complexity, and operational complexity at larger N is already difficult, so not adding any more of that is probably worthwhile. The other thing this introduces is intellectual property concerns, because if you are running this in your customers' accounts, they can see into those accounts. If you're writing your code in an interpreted language or anything like that, there is a possibility that they can go rummaging around in your software, which you would probably prefer they didn't.
If you're selling software that's deployed into customers' AWS accounts, you're not selling software as a service, you are selling software. Philosophically, this isn't really SaaS, and you should probably push back on that.
You can address the concerns that those buyers have by being really clear about how the tenancy separation works. If this is a security and compliance concern for them, you can show them that you are allowing them to hit those requirements through the architectural decisions that you've made.
Some of the work that we've done with customers selling to larger businesses is writing security and architecture documentation that satisfies compliance teams about this stuff. If the concern is about network connectivity or something else, there are ways in which you can architect that using PrivateLink or Direct Connect, so that they can feel comfortable that their data isn't ever leaving a network that they have some responsibility for. I think it's important to clearly communicate the benefits of SaaS to those customers, particularly to the buyer. The security team won't care. The buyer will care about improved service quality, the fact that you can release updates to the software more quickly, and the lower cost of doing that as well. One thing that we find helps when selling to enterprise buyers is to get your platform certified against some standard.
Many of our customers who sell to enterprise businesses found that achieving ISO 27001 opened up deals that wouldn't otherwise be available to them, and in some cases shortened the sales lead time for other deals. Contractually agreeing to the things that the customer cares about, such as security and availability, in your service level agreements is another way you can convince people of this stuff. There are other hacks you can use. We've got a customer who built a really high security platform that had sight of all email flowing through a business, and the way that they solved for this problem was by building a platform that consumed KMS keys from the customer's account.
The customer still had control over this ripcord where they could say, no, we don't trust that platform anymore, and pull the key, or pull the access to the key, and then our customer's platform couldn't read that data anymore. They were protected from those sorts of incidents in their view, and probably in real life as well, though those two don't always go hand in hand. If you're selling this type of platform and you are building on AWS, you can sometimes roll out an AWS rep to go and soothe people over this sort of thing. Big enterprise buyers typically have an AWS account manager who they can go to with these sorts of questions, or some solutions architects who are aligned to them, and having AWS help allay the concerns of your customer around this stuff is quite useful as well. It adds a bit of clout.
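The ripcord pattern can be sketched roughly as follows. This is a minimal illustration, not the customer's actual implementation: the client here is a stand-in with the same `decrypt()` call shape as boto3's KMS client, so the sketch runs without AWS credentials. In real use you'd pass `boto3.client("kms")`, the customer's cross-account key ARN, and catch the access-denied error boto3 raises once the customer revokes the grant.

```python
# Sketch of the "ripcord": the platform can only read tenant data while
# the customer's KMS key grant is in place. StubKms is a stand-in for
# boto3's KMS client; the "decryption" it does is a toy.

class AccessDenied(Exception):
    """Stand-in for the access-denied error KMS returns once the
    customer revokes the cross-account grant on their key."""

def read_tenant_record(kms, key_arn: str, ciphertext: bytes):
    """Return decrypted plaintext, or None if the customer pulled access."""
    try:
        resp = kms.decrypt(KeyId=key_arn, CiphertextBlob=ciphertext)
        return resp["Plaintext"]
    except AccessDenied:
        return None

class StubKms:
    """Behaves like KMS until the customer revokes the grant."""
    def __init__(self):
        self.revoked = False
    def decrypt(self, KeyId, CiphertextBlob):
        if self.revoked:
            raise AccessDenied(KeyId)
        return {"Plaintext": CiphertextBlob[::-1]}  # toy "decryption"

kms = StubKms()
before = read_tenant_record(kms, "arn:aws:kms:eu-west-1:111122223333:key/abc", b"oof")
kms.revoked = True   # the customer pulls the ripcord
after = read_tenant_record(kms, "arn:aws:kms:eu-west-1:111122223333:key/abc", b"oof")
```

The design point is that revocation lives entirely on the customer's side: they never have to trust you to stop reading, because once the grant is gone, you can't.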
Mistake 7: Building Your Solution to Be Multi-Cloud, or "Cloud Agnostic"
Unfortunately, I think multi-cloud is bullshit. It is a mistake, I believe, to build your SaaS solution to be multi-cloud or cloud agnostic. Multi-cloud may have its place in the enterprise. I actually don't believe that either, to be fair. Larger customers, larger buyers, might believe that they can dictate which cloud vendor you deploy onto. This keeps coming up throughout my career actually. SaaS businesses saying, so we've built this platform on AWS, but do you know any Azure? Because we're talking to Walmart, and Walmart are in competition with Amazon and therefore will not use AWS for anything. Which again is bullshit because you can see that Walmart hire AWS engineers. It comes up often.
As a small or medium business, and certainly an earlier stage business, you really shouldn't be worried about building on more than one cloud platform. I think this is another philosophy question. I think you just push back and say, sorry, we don't support Azure today. I don't think you should be thinking about building an Azure version of your offering or a GCP version of your offering until you've saturated the market for people who will buy your existing AWS version. Assuming you've started building on AWS, which you should. Why not multi-cloud? Duplicated effort is the main thing. If you're building on the cloud, the right thing to do is to adopt all of the high-level services that the cloud vendor provides so that you don't have to build those things. I'm talking about managed databases, managed monitoring platforms, managed logging services, managed security services, and so forth.
If you were to build on a second cloud, having established one, then you would start from scratch, build another entirely new infrastructure for your applications, and of course build an entirely new control plane as well, probably, or at least the orchestration parts of that as well. Everything gets harder and more complicated. You're going to do that for just one customer? That doesn't seem like a good idea to me. Some of you are probably thinking, but Kubernetes. Kubernetes helps with multi-cloud and cloud agnosticism.
Unfortunately, that's bullshit as well. If you're thinking about solving this problem by building on Kubernetes, and you're building on the cloud and making good architectural choices, then you're going to use the cloud vendor's Kubernetes platform, deploy your application containers on there, and maybe a couple of other sidecars and so forth. You're still going to be adopting the managed services of your cloud vendor, because if you don't do that, and you are thinking of Kubernetes as a fully agnostic solution, what you're going to do is re-implement all those managed services inside Kubernetes yourself. Now not only are you building a SaaS, you're also building a monitoring platform, security tooling, and managed databases. Don't do that. That's foolish.
Mistake 8: Deploying Your Solution On-Prem for Customers
As an extension to this, I think it's also foolish to allow a customer to talk you into deploying your solution onto their network, on-prem, in their data centers. It's 2025; we shouldn't be doing that anymore. It's probably actually worse than going onto a second cloud. This is how we used to deliver software in the pre-SaaS days. I think IBM used to call this being an ISV, an Independent Software Vendor. The way that this worked, typically, was that your developers would write a new version of the software, and maybe once a quarter, if you were lucky, they'd print a DVD or a CD-ROM, or maybe they'd put it on the internet to download if they were feeling novel. They'd throw that at the customer's sysadmin, who'd eventually get around to scheduling the deployment of your new version of the software.
Then users would use that on the infrastructure that the customer provided to you. When something went wrong, you as the vendor, your helpdesk and your engineers, would be fighting firewall rules and VPNs and stuff to get in and troubleshoot it. This was hell. Who did this? I see the scars and the tears. We don't want to go back here again. It's not where we want to be. The issue here is that in this world, the customer is defining all these things. They are saying, we're going to be on this type of Dell server, and the network is laid out like this, and we're using this version of Red Hat. Yes, the commercial version, because we're spending money on things we don't need to buy. Sorry, Red Hatters. Maybe you do need to buy it now. They'd determine the file system layout, too. We did this for a customer where the operations team insisted that the MySQL databases be held under like opt something, something, something, because that was a mount point from a SAN. It was horrible.
The customer defines the security controls, the monitoring that's being done, and how frequently things can be released. This just gets in the way of everything. More than that, the more customers you have, the more unique deployments you have of your application, and every deployment becomes a beautiful and unique snowflake. The issue with taking this approach is that even though in the past the sales team might say, "Yes, we can do that. It's just software. Just software that runs on some servers. You've got some servers. We'll put our software there", those sales teams would never price in the complexity of this stuff. We are literally talking about scaling out operations teams in your SaaS business in order to support this great number of unique deployments. It was never a good idea then. It's definitely not a good idea now in 2025.
Mistake 9: Not Thinking Sufficiently About Backups and Disaster Recovery
A bit of a change of tack. We frequently see people building SaaS without really thinking about backups and disaster recovery. That everybody is better at building platforms than they are at securing or running them was our finding from reviewing about 200 infrastructures using the AWS Well-Architected Framework. We're now at about 700 or 800, I think, and this is still true. The reason for that is that people are optimists. Optimism is a source of high-risk thinking. If you are unwilling to accept that things could go wrong, you are not going to sit and think about the ways in which they will, and the ways in which you could avoid or mitigate or recover from that. A better philosopher on this topic is Werner Vogels, CTO of Amazon, who says everything fails all the time. At Amazon scale, I'm sure that's true. It's also a good philosophy. It's why AWS has been built in a way that can withstand certain failure modes, and why they give you architectural guidance to avoid certain failure modes as long as you use the primitives that the platform provides.
The best way of avoiding this type of optimism is to do something called a premortem. Imagine that the platform has failed. This is a tabletop exercise. Sit around, imagine how the platform has failed. Write down how it could have happened, and then repeat that. As you do that, you build a risk register, essentially. Then you can prioritize the mitigation of those risks by likelihood and impact. If you've done ISO 27001 or something adjacent to that, this is the type of thinking that you bring to the ISO landscape.
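The premortem output described above ends up as a risk register prioritized by likelihood times impact. A minimal sketch of that prioritization, with made-up risks and an illustrative 1-5 scale for both axes:

```python
# Sketch: a premortem's output as a small risk register, prioritized by
# likelihood x impact. Scales (1-5) and the example risks are illustrative.

risks = [
    # (description,                       likelihood, impact)
    ("Primary database loses a volume",            2, 5),
    ("Bad deploy takes the API down",              4, 3),
    ("Region-wide AWS outage",                     1, 5),
    ("Expired TLS certificate",                    3, 2),
]

def prioritized(register):
    """Highest likelihood x impact first; mitigate from the top down."""
    return sorted(register, key=lambda r: r[1] * r[2], reverse=True)

top = prioritized(risks)[0][0]
# The top-scoring risks are the ones to pull into product sprints first.
```

Note how the scoring surfaces the frequent, moderate-impact failure ahead of the rare catastrophe: exactly the kind of re-ordering that pure optimism, or pure pessimism, would miss.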
Then, crucially, add mitigations and improvements to your product sprints so that you are constantly evolving the platform to avoid failure modes. As for the DR strategies you can use here: again, as with all architecture, it depends. You've got a spectrum of options. At one end, you just back up your databases, and then if everything goes wrong, you restore them again, perhaps having reprovisioned all your infrastructure. That's a strategy that requires that your recovery point and recovery time objectives are quite lax, and that you're willing to tolerate a long outage.
At the other end of the spectrum, you've got fully active-active multi-site deployments, which are more expensive to run because you are running the same platform in multiple locations and managing cross-region data replication and so on. AWS improves what they offer around cross-region services every year. This is getting easier over time, but no cheaper, is probably the best way to think about it.
Mistake 10: Not Classifying Your Data Assets
Finally, mistake 10, which relates to this: not classifying your data assets. You need to know where your data is, what type of data is in there, and who has responsibility for it in order to build a proper backup and DR plan, and also, really, to build a good security model around it. What we recommend in this space is to build a data catalog, essentially, so that you know what type of data lives where: what service it's stored in, how it's categorized from a security perspective, how often you're backing it up, how long you need to retain it, and who's responsible for it and going to make decisions about it. Here you can see Ben Wyatt is responsible for the audit logs, which we are obliged by our compliance regime to keep for 7 years. Card data is, of course, PCI DSS restricted, so we're going to lock that down much more than we would lock anything else down.
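A data catalog like the one described can start as something very small. This sketch mirrors the slide's example (the audit logs owned by Ben Wyatt with a 7-year retention, and PCI-scoped card data); the classification labels, storage services, and card-data retention period are illustrative assumptions.

```python
# Sketch: a minimal data catalog, modeled on the example from the talk.
# The audit-log owner and 7-year retention come from the slide; other
# field values are illustrative.

from dataclasses import dataclass

@dataclass
class DataAsset:
    name: str
    stored_in: str          # which service holds it
    classification: str     # e.g. "internal", "pci-dss"
    backup_frequency: str
    retention_years: int
    owner: str              # who makes decisions about this data

catalog = [
    DataAsset("audit-logs", "S3",  "internal", "daily",  7, "Ben Wyatt"),
    DataAsset("card-data",  "RDS", "pci-dss",  "hourly", 1, "payments team"),
]

def assets_needing_lockdown(assets):
    """PCI-scoped data gets stricter controls than everything else."""
    return [a.name for a in assets if a.classification == "pci-dss"]
```

Even a flat list like this answers the questions a DR plan needs answered: where the data is, how fresh the last backup is, how long it must survive, and who to ask when a decision is required.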
Summary
In summation, you should make sure that you are building something customers want; it doesn't matter if you're building SaaS, you should just do that if you're a business. If you're building SaaS, you should bake tenancy considerations into the design from day one, understand the costs of those choices, and make sure that the market can bear that type of cost model. You should build a control plane to automate tenant provisioning and lifecycle, and to orchestrate all the DevOps-y goodness that goes on in your SaaS. You should avoid being talked into custom, multi-cloud, or on-prem deployments, mostly by sales teams. Develop and stick to your product philosophy. Hold on to it, because it will save you from a lot of wailing and gnashing of teeth. You should classify your data assets and secure them appropriately. You should build a robust backup and DR solution.
Mistake 11: Listening to Consultants Instead of Understanding Your Customer Base
A final bonus, 11 out of 10: don't listen to consultants instead of understanding your customer base. These are lessons that I think I and my team have learned, but there are circumstances where some of what I've said probably doesn't apply. In particular, the pharmaceutical customer I was talking about earlier do actually have to allow customers to roll out their software on their own schedule, because every software rollout requires, by the laws of their compliance regime, weeks of manual testing, and so it isn't possible for them to build a SaaS that doesn't do that. It has to make commercial sense. You have to make decisions based on what is actually commercially sensible. If you can't make a profit and avoid burning out your teams, then you probably shouldn't be doing that thing.