Hi, ladies and gentlemen. Welcome to migrating SQL Server to the Cloud. I'll be your host for this session, Brent Ozar. I'm a Microsoft Certified Master, which just means I've made a lot of expensive mistakes with SQL Server. And now, I try to help other folks avoid those same mistakes.
A lot of what I do as a performance tuning consultant these days is moving some of my clients towards the cloud because it is, of course so reactive. Lets you adjust capacity so much more easily than those options that we had down on Premises. In this session, I'm going to really focus on people who have an array of different SQL servers.
I work a lot with small businesses. I understand why small businesses want to jump to the cloud because they don't have full Realtime Database administrators. But typically, y'all attending here in the audience are the kind of people who have a bunch of SQL servers with a legacy amount of different versions, different configurations. Some of them are clusters. Some of them use log shipping.
And you got this whole mishmash of different servers. You want to know how you're supposed to map these things up into the cloud. You might have heard that lift and shift is typically bad. It's not the most cost effective way to move your SQL servers up. And you're wondering what your other options are and which of your SQL servers make sense to do each of those options.
Keep in mind that as we talk through this, everything that I'll discuss in here expires. The cloud, of course, changes rapid fire that's why you're here. I always tell people don't learn the cloud until you actually have to go jump into it because otherwise you're kind of wasting your time learning things that are about to expire. As we record this here for Quest Empower. This is all true as of Fall 2022. But if you're watching a recording of this just keep in mind that the strategy stuff may be the same, but the tactics can be different depending on things like the size of the servers, what options are available up in Amazon Azure and so forth.
Now, we're doing this-- I'm pre-recording all of the material. But the good part about that is that I have unlimited time to interact with you in chat. So as you're going through this, feel free to type in any questions that occur to you as we're working through it. I'll be interacting live and chat. And that's true with all of the sessions today at Quest Empower.
So let's get started. When you're moving SQL Server to the cloud, you need to gather three metrics about your SQL servers. You need to gather number one, how much data are we hosting, number two, how fast is this server going, and number three, how hard is it working, the server working in order to achieve that speed? These sound really simple. And yet still most of us don't have these metrics for most of our databases.
So number one for the first one, all you really have to do is go in. You can even go into Windows Explorer if that's what you want to do. If you want to get a little bit fancier in terms of measurement gathering, you can go in and look at things like [INAUDIBLE] backup files or MSDB DBO backup files to go track the sizes of your databases over history. Number two-- and I should explain to you on that number one, I give different advice up in the cloud for SQL servers that have, say one terabyte of data as opposed to 10 gigabytes of data. And we'll talk about here in a second what normal ranges are for each of these metrics.
Number two, how fast is our SQL server going? How many batch requests a second is it processing? It's really kind of funny that even today in 2022, there's not a measurement of how many queries SQL server is running per second. You would think that there would be an accurate number there. And there is a metric for transactions per second but not every query is considered a transaction. Every query is in part of a batch request.
And those of you who do T-SQL tuning will recognize that some batches are bigger than others. Some batches have 1,000 T-SQL statements in them. There are those big long ugly stored procedures that you absolutely hate to tune. Others are just a single one off query. But even if I measured queries per second, that's not really accurate to compare one SQL Server to another, either because some queries are super simple.
Insert into table a list of values, other queries or that terrible data warehouse report that joins 50 different tables together all of which are cross joins and left outer joins. But all I'm trying to do is get a rough idea of just how many queries per second. This thing's handling and that's what the perfmom counter batch requests a second is good for.
And the number three, wait time ratio over time. This one's a little bit tricky. And you really want to monitoring tool in order to capture this. Wait time ratio tells you that over how many seconds, let's say that you're watching the SQL server for 10 seconds straight, how many seconds of wait time did you pile up during those 10 seconds?
The more wait time that you pile up on the clock for every second that passes by or every hour that passes by, the more time that SQL server spending waiting on storage or blocking or latches, the more time that it spend waiting. That's a good indication that there's smoke coming out of the side of your SQL server. Generally, companies call me for performance tuning, consulting, whenever their wait time ratio is up pretty high over one, meaning for every hour on the clock they are spending hours of time waiting on some external resources.
Now, I really recommend that folks use third party monitoring tools. And I'm not just saying that because I'm here at Quest Empower. I believe that if I was a database administrator full time ever again, I would require a monitoring app in order to get my job done.
It's like you look at these three numbers, they come from totally different places. A lot of tools don't show you. If you're going to run DMV queries, a lot of them don't show you how big the server is. A lot of them don't show you batch requests the second. A lot of them don't show you wait time ratio. And they certainly don't trend it over time.
I've done the best that I can with the first responder kit scripts. If you go to FirstResponderKit.org. I've got a GitHub repo over there where we've got a bunch of open source scripts. It's totally free to go download and use. It is by no means a replacement for third party monitoring software.
Because for example, this will only give you metrics about your SQL server as of right now. If you run sp_BlitzFirst, you run that with the seconds equals 60 parameter. What that does is it takes a 60 second sample of your weight stats and gives you a rough idea of things like how much your-- how many queries per second you're doing, batch requests per second, how much CPU utilization you have and so forth.
If you scroll down in those results sets down in the server info section, the section that's highlighted there on the screen, down in the server info section, you can see right now during this 62nd sample, how mini batch requests a seconds you did, what your wait time was, your total database size and so forth. Now, like I said the bummer of this is that this is only as of right now.
And what you're really going to want is you're going to want to sample it over time, because you're going to want to know what your peak business hours look like, what you're after hours looks like if you're dealing with data warehouses that do data loads after hours. So you can run a SQL agent job every 15 minutes and don't do anything other than 15 minutes.
Don't do it every minute. Don't think that you're going to get more information because you're not. There's things hardcoded into sp_BlitzFirst that are really tuned for 15 minute samples. So you run them every 15 minutes.
And it'll let you keep seven days worth of history in there. This is not as good as a third party monitoring tool because of course, there's also no centralization. There's no gathering all of this data across all of your SQL servers.
So now what's normal when you start to gather these metrics, what's normal for each of those metrics? Let's start with number one, SQL servers by total data size, how much data is stored on that server. What I've done is I've separated out the SQL server population. And I said based on the metrics that I've collected, about one quarter of the audience, about one quarter of the SQL servers in the audience have between 0 and 50 gigabytes worth of data altogether and about another 23% have 51 to 250 gigabytes of data.
To really, you can think about it, is half of the SQL servers out there are small enough that they could fit on a USB thumb drive or an SD card going into a camera. I'm not saying that it's easy to make those SQL servers to perform necessarily. You can write bad code that makes servers of that size crawl.
But it's just easier to make code perform when you're handling less than 250 gigabytes of data. It's easy to throw memory at the problem. It's easy to throw CPU cores at the problem.
But you pop up to the next tier. And about 30% of SQL servers out there host between 250 gigabytes and 1 terabyte worth of data. Now in the old days, back in the 2000s, we used to think of one terabyte as the starting point for what we called very large databases, or VLDBs. And it used to be very rare to see SQL server databases of one terabyte or larger.
Not true anymore. These days, it's relatively common. One in five SQL servers hosts more than one terabyte worth of data.
But a lot of you out here who have more than one terabyte of data don't realize that you're on the high side. You're in the top 20%. And you see how I color coded those green, yellow, and red. Red doesn't mean that you can't go into the cloud. You absolutely can.
But in green, this is where the cloud really focused first. This doesn't take much power, lots of these can pile into the same hardware Microsoft can host tons of these all inside the same server, because it just doesn't take that much hardware in order to make a 0 to 50 gigabyte data-- database fast. Backup, restore, and check CHECKDB are easy.
So this is one of the reasons that small businesses absolutely love the cloud. It's easy to deal with these up in the cloud. Now, that's the first metric, the total database size. Now, let's take number two, batch requests per second.
When we're dealing with batch requests a second, which is you can think of it as how many queries per second we're running, about one third of the audience, one third of the SQL servers in the audience run 0 to 10 batch requests a second. Now, look. I know some of you are saying that you have a data warehouse. And it only does say maybe, five or 10 reports every 10 seconds. It's not doing that many at all.
And you're still having performance problems because the queries are absolutely hideous. And they scan all kinds of data that's absolutely fair and true. But for most of the audience out there, these servers that are running 0 to 10 batch requests a second, they're internal infrastructure servers, their development and test SQL servers, they're report servers that mostly sit around idle most of the time, those are absolutely the green zone for moving into the cloud. You get to 11 to 100 batch requests a second, things start to become a little bit more complicated.
For example, at 11 to 100 batch requests a second, it's really easy to have a blocking pileup because if you take out a lock on a table and 100 queries are running per second against that table, well next thing you know, you've got 100 queries blocked, then 200 queries blocked, then 300 queries blocked and so forth. And you get up into the red zone there when you're running 100 to 1,000 to over 1,000 batch requests a second. That is by no means hard for SQL server.
I've got clients that do 50 to 100,000 batch requests a second. But that's much more challenging because if you start a blocking firestorm when you've got 50,000 queries per second coming through, it's really easy for your cluster to start failing health checks and to failover a SQL server from one instance to another. Those kinds of organizations tend to be very well tuned, very well rehearsed. They work really hard to make sure that they don't run into problems.
That's why when I look at the green zone down there, 0 to 10 batch requests a second, that's also why the cloud focused here first. You can think of these as blogs or company website databases that weren't doing e-commerce. It's easy to find maintenance windows on these if we need to take a 32nd outage to failover a cluster from one node to another. It's not like we're interrupting that many queries.
It's easier to deal with the transaction log at this size. I don't have to worry about someone doing-- I don't have to worry as much about someone doing a begin tran and holding out locks that block all kinds of other transactions. And in most of the cases here, 0 to 10 batch requests a second, you probably don't need a full-time database administrator. Now again, data warehouses, they're different. You probably do need a DBA.
Finally, with the last metric as a reminder, weight time ratio it's probably the most challenging metric for a lot of us in this audience, because we haven't seen that metric before. It's something that weight of our performance monitoring tools, things like Spotlight use a lot. The higher that weight time ratio is, the more your SQL server is waiting on stuff.
Well, if we look at what's normal out in the cloud, about half of the audience is basically sitting idle for every hour on the clock. Their SQL server isn't even generating 15 minutes worth of wait time. For most of the time, most of their processors are sitting around idle not doing anything. That can mean that there's no load on the server.
But what else it can also mean? It can also mean that you've tuned the daylights out of your SQL server. So even though you have a whole lot of queries running per second, they all respond really quickly.
That green zone down there is again, where it's really easy to move that thing to the cloud. The hardware requirements are likely low. It's going to be easy for companies like Microsoft and Amazon to host. It's going to be easy to find maintenance Windows because if the box goes down for 30 seconds and comes back up, we're probably not dealing with a huge blocking firestorm or a whole bunch of storage weights.
So let's zoom out and look at what we've talked about so far and think of these as small, medium, and large. If you have-- for question number one, total data size, if you only have 0 to 250 gigabytes of data and on batch requests a second, you only have 0 to 10 batch requests a second. And under wait time ratio, you've got less than 15 minutes an hour worth of wait time.
Then you know what? You could do platform as a service. You can do infrastructure as a service VMs. You could pretty much just throw that thing over the fence up to any cloud provider and they're probably going to do OK. Especially in the year 2022, that just isn't a very high barrier for most cloud companies to manage.
But when you go up a little higher and let's say, remember these are ands, if you have 250 gigs to one terabyte worth of data and you have 10 to 100 batch requests a second and you're already waiting. And every hour, you're waiting an hour's worth of time on things like storage. You're probably going to be able to go to platform as a service. You're probably going to be able to go to infrastructure as a service, but you just have to be aware that you're going to have to put some work in.
You may have to tune indexes. You may have to use edge case features for that cloud provider. And when you get up on the high side, when you get up over a terabyte, when you get up over 100 batch requests a second, and you get up over 1% ratio on your wait time, you might not be able to go to either platform as a service or infrastructure as a service.
These aren't black and whites. And of course, most you in the audience have mixes. Some of your metrics are green. Some of them are yellow. Some of them are red.
The goal here of this red, yellow, and green color coding is just to give you a real quick rough idea of how easy the migration will be for you going up to the cloud. There are absolutely some very small low-load databases that have a very tough time going up into the platform as a service. For example, if you're hooked on service broker and you try to use say Amazon RDS or Azure SQL DB, you may have a harder time.
On the flip side, I have Win Client with a 10 terabyte database. And they moved up to infrastructure as a service. And they're doing just fine. But it's because they put a whole ton of work into that database in order to make it work well.
So now I've got my three answers in there for metrics. I'm going to go gather those first before I approach my stakeholders, because I want to be roughly-- understand it roughly how scared I need to be before I go talk to the stakeholders. And then for the stakeholders, these are the people who run those databases and the people who care about them passionately.
In a perfect world, I'm going to get them over coffee or corner them somewhere and just ask them three questions. Number one, are you content with your application's current performance? You've been working in tech for a while. And you know what the answer to that is. The answer to that is no. No one is ever happy in IT about anything. Everybody's always got something to complain about.
So the way that I would ask it instead is, if I migrate this application to the cloud and we get exactly the same performance up in the cloud that we get today on Premises, is this project going to be a success or a failure? If they tell me that I can migrate this thing to the cloud and it'll have the same performance and they'll be happy with it, great. That means my job is easier.
Because if somebody comes in and says afterwards, our performance used to be so good on Premises, and I have monitoring that I can point to and go that query that you're talking about, it performed exactly the same back on Premises as it does today, you're just old man yelling at cloud. The more prepared that I can be for that if I know which queries I need to be monitoring, which business processes I need to be monitoring, I might even be able to cut costs.
Because frankly, with the way that a lot of us manage SQL servers on Premises, we would buy hardware like every three years. And we would try to make a guess on what kind of hardware we needed. We were often wrong.
In the cloud, I can tune dials all the time and cut my costs, end up saving money as long as the people are OK with the same performance. But if on the other hand, those people, those stakeholders tell me that if we go live on the cloud and the same performance is a failure, a-ha. This helps me explain to the audience when my stakeholders say well right now, we have this level of server, why is the cloud so much more expensive?
I can come right back to this slide and I can say, hey remember that told me you were totally pissed off about your current performance, it doesn't matter how cheap your current hardware was. Doesn't matter if you were running this thing on an Etch A Sketch. You were already disappointed and you thought that application's performance was a failure, therefore that is why we have to invest more as we go up into the cloud.
And investing more doesn't necessarily mean hardware. It can mean tuning queries, tuning indexes. Very often, that's exactly what I do. When I help clients go to the cloud is this so that they can cut their bills when they go up to the cloud.
Companies when they read brochures about the cloud, when they read magazines, they hear that they're going up there to cut costs. They're not going to have to run their own data centers anymore, not going to have to host pay for networking or pay for redundant power or cooling. And some companies truly do go up to the cloud in order to cut costs. And they succeed but very often the way that they're doing that is that they are cutting performance at exactly the same time.
The same time that they're cutting costs, they are also cutting performance because frankly, a lot of us are overpowered in terms of our SQL servers. On the other hand, some companies go to the cloud and they say you know what? Now is also the time where we need to fix performance because we were never really satisfied with this five-year-old hardware that we were running on. Having this answer helps you understand the work that you're in for.
Question number two. How much growth do we expect this year? I'm not going to ask people to say, hey, tell me exactly do you expect 5.6% growth? Do you expect 23.8% growth? Nobody really knows.
After all, we can't give answers to like that about our data. If somebody asked me, how much larger our tables are going to be next year? How the heck am I supposed to know? But what IAM asking the stakeholders to tell me is, do you have any aggressive growth plans?
And I'll give you a really good example, one of my clients said, well you know what? We need to bring the marketing team in because I've heard rumors that they're going to hook up with a football game this year. They're going to sponsor a big championship football game, put their name all over it. And because of that, we expect a whole lot more traffic.
And I'm like OK, great. That at least tells me that I should expect more traffic. Whereas another one of my clients said when we were moving their apps to the cloud, you know what? All we're really doing is we're end of lifing these applications.
We're going to lift and shift them up into the cloud. But they are on their death march already anyway. We are going to go ahead and yank all our applications and start rewriting them as cloud native. The load that we expect to have on these applications is gradually going to drop over time.
Their answers on this one help you decide whether or not you need to do things like an always on availability group, whether you need to add bigger or smaller replicas quickly, or if you can get away with an el cheapo VM that you're just going to gradually scale down over time. Then the last question that I'm going to ask my stakeholders can I contain our developers to only using features that are built into Azure SQL DB?
I love Azure SQL DB. I don't necessarily use it as a first landing point for most of the projects that I work on because frankly, it's got some pretty tough limitations. Boy, that first list there, right there. Single database, no cross-database queries or transactions.
Sure, the Microsoft people will tell you, hey yes, it does cross database transactions and cross database queries. But it's absolutely terrible. You've got a predefined the tables ahead of time. There's no flexibility. It's nothing like what you're used to with regular SQL server.
There's no file table, open query, no fancy pants replication. If you want to do like transactional bidirectional replication snapshots, you don't get SQL Agent jobs. And you know what? I'm fine with that.
I'm fine with all of that. Because frankly, if your developers can stick to only using Azure SQL DB features, that gives you the escape route. But if you wanted to put this thing into Azure SQL DB a year from now, two years from now, you could. I know DBAs have a love hate relationship with Azure SQL DB. They say, oh, that's not as powerful as real full blown SQL server.
And I'm not talking about Azure SQL DB managed instances. Microsoft has this thing where they want to reuse the same name for a gazillion different products. I'm specifically talking about the product called Azure SQL DB with no other words after it.
It's a really limited-- it really is a stretch. It's a limited version of SQL server but it helps you manage way more databases with less full time DBAs. That makes-- gives you superpowers. You don't want to have to waste your time.
All right. So now, you need to start thinking about which of your databases are going to go where. You're going to put together a spreadsheet across your entire infrastructure or a table of course, if you want to be slick. Number one, what's the total database size on the server? Number two, what's the batch request per second on that database server? Number three, what's its wait time per core over time its wait time ratio?
Number four, do I need the same performance up in the cloud or is it-- does it need to be faster up in the cloud? Five, how much growth do I expect this year for this server? Just, is it going to remain the same? Is it going to be a percentage growth? Is it going to be an order of magnitude?
That's as specific as I ever get. And then number six, can we stick ourselves to just as your SQL DB features? Then armed with that for each of my servers, that helps me be really confident about how I answer the questions about where this thing is going to go.
If it's small, small in terms of size, low batch requests second, low wait time ratio. I don't even need to do any load testing. All I need to do is throw that thing up in platform as a service and call it a day. Whereas on the other extreme, if you're already in the red zone the whole way down, and they also tell you the same performance would be a failure and that they have more growth coming on the horizon, well, the cloud is going to look very different to you than the cloud looks to other people.
So now, we got to start thinking about which ones we have to put into infrastructure as a service because there are these big huge thunderclouds that are nasty. We have to start making this tough decision of IaaS versus PaaS infrastructure as a service versus platform as a service. Where are we going to land our databases?
And I've tried to decode it down into just one slide. On the left hand side for infrastructure as a service. You manage everything. You manage the high availability. You manage the disaster recovery.
You manage the patching. You manage the backups. You can access the backups.
You can do any crazy SQL server features you want. Heck, you can even use any old versions of SQL server that you want. I've got clients running SQL server 2008 up in the cloud because they can.
Infrastructure as a service is great for people who attend webcasts like this, because a lot of you have in your job title database administrator, architect systems administrator, site reliability engineer. You are the kind of infrastructure plumber who is comfortable monkeying around with all kinds of features.
Over on the right hand side, platform as a service. So either Azure SQL DB, Amazon RDS, Google Cloud SQL. They manage high availability.
They manage disaster recovery. They manage the patching. They manage the backups. Even to the extent for Microsoft Azure SQL DB, they get the backups and you do not. That's a bit of a stress point for some people.
But remember, if you decide to opt in to platform as a service, the whole point of this is you want to let someone else do your dirty work. This platform as a service is really best for companies that specialize in development. And frankly, that's a lot of people who are not on this call. A lot of people on this call while you have lots of developers, your company has already chosen to specialize in systems administration or database administration.
And that's why they have people like you on the staff. People like you, systems administrators, site reliability engineers, database administrators think of infrastructure-as-a-service as like a sysadmin buddy. A sysadmin is a service who is comfortable building anything alongside you. They are the buddy that hangs out in the cubicle with you.
They're willing to do anything in order to get a bottle of tequila. If you want to do some kind of dumb stunt like use a combination of always on availability groups plus log shipping, going across three different cloud providers. If you want to do something that crazy you are welcome to do something that crazy.
Platform as a service is not. Platform as a service is a lot like database administrator as a service. And we kind of jokingly say that DBA stands for, don't bother asking or doesn't believe anyone. And for some of us in this audience for years, we have been the database administrators who set and lay down the laws.
We're the ones who say for example, no, you can't have this feature. No, you can't have that feature. That's not something that I believe in. That's not something that I endorse.
I don't believe in replicating your data somewhere else. I don't believe in using ServiceNow broker, whatever that is. We're used to being the ones who tell other people no. And now, we are the ones who are being told no because in Platform-as-a-service, you don't get to negotiate how high availability works.
You don't get to negotiate how the patching works. Sometimes I work with customers who say they want to go to Platform-as-a-service but they want to pick a specific cumulative update or they want to pick a specific version of SQL server. And if that isn't available, either because it's too new or because it's too old, the platform as a service provider can tell you no.
There's just a simple compatibility list, for example. They control when the patch has happen. And it's not like you can get in with them and say, hey, look, I need you to postpone that cumulative update for two or three months so that I can change my application.
This is especially important for those of us who work with third party vendors because sometimes you have those crotchety third party vendor applications that say things like, we only support SQL server 1942 with Service Pack One. If you need those kinds of fancy pants requirements, you're not going to be able to use platform as a service. You are going to have to use infrastructure as a service.
So when we sketch out your migration map going up to the cloud, there are three metrics that you're going to have to gather, total data size, batch requests a second, wait time per core per second over time. Then there are three questions that you're going to ask stakeholders.
Is the same performance a success or a failure? How much growth do we expect in the coming year? And can we restrain ourselves to just the Azure SQL DB feature set?
If you get green fuzzy feelings the whole way down across that and whole entire list you, my friend, are the perfect fit for platform as a service. I would encourage you to choose platform as a service. Not because I'm drinking some kind of Kool-Aid or because Microsoft or Google or Amazon tell me to say things like that, but because I want you to spend as little of your career as possible doing things that suck. And to be honest, things like patching and backups are things that suck.
If you find yourself in the red zone the whole way down, well, you're probably going to be doing infrastructure as a service. You're more likely to be doing lift and shift as they say replicating your exact infrastructure up in VMs and continuing to do that same dirty job that you're doing today. Now, I've kind of only just started the whole tasting buffet of everything that you have to learn in the cloud.
As a reminder, everything that's inside here expires. Those metrics that we talked about change what's normal in terms of green. Red, yellow, and green change all the time. So one next step to learn more is I have a training course called Running SQL Server in AWS and Azure. It's a one day training class. You can learn more about that over at BrentOzar.com/go/cloud.
So now that we're towards the end, you have I'm sure a million questions about what you do next because doing cloud migrations is not a small, simple process. So next step to learn more. First, I have a one day class called Running SQL Server in Amazon and Azure. You can go to that over BrentOzar.com/go/cloud.
Then, though, feel free to ask questions over in the live chat. I kind of think of live chat as free consulting. As long as you keep those questions confined to running SQL server in the cloud-- don't go ask me to fix your service broker. But ask questions that are relevant to your migration that you're looking at planning. And I can answer those questions live now over in the chat.
Plus, make sure that you repeat that process through the rest of today's presentations over at Quest Empower as well. The other presenters are also able to answer your questions live while the session is going on. Don't hold your questions to the end hoping that maybe they're going to cover-- come first-- cover something. By all means, jump right in and start asking your questions. I'll be answering your questions now over in chat.