Denormalization: utilizing denormalization strategies to improve performance. Now, first of all, I don't understand denormalization.
Oh, quick. Well, you can't understand denormalization until you understand normalization. So the normalization of data is a way to store data so that it maintains integrity, which matters especially with complex data. If you just have a list of your friends, you don't need to normalize or denormalize that. That's it.
So you may be familiar with second normal form, third normal form. I believe there's a fourth normal form. What this does is organize data so that each piece of data exists in only one place, which means there's no redundancy that isn't built in deliberately for performance and security reasons. There's no redundancy where, as I change information about Mike O'Donnell, I might lose the connection to all of the records that sit somewhere else and describe Mike O'Donnell's activity.
So it's all about referential integrity. The idea with normalization is to say we have Mike O'Donnell, and then, separately, we have Mike O'Donnell's orders. We don't put all of the orders inside Mike O'Donnell's record, because that's bad for performance, and it causes other problems. By normalizing the data, you've got these relationships, so that when I delete Mike O'Donnell from my database, the database can answer the question: are there orphan records of Mike O'Donnell's activity floating around out there with no parent record?
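To make the orphan-record point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample orders are assumptions made up for illustration, not anything from the conversation. It shows a relational database refusing to delete a customer whose orders still point at him.

```python
import sqlite3


def build_normalized_db():
    """Create an in-memory database with customers and orders in separate tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " item TEXT NOT NULL)"
    )
    # Hypothetical sample data: one customer with two orders.
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Mike O''Donnell')")
    conn.executemany(
        "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
        [(1, "book"), (1, "lamp")],
    )
    conn.commit()
    return conn


def try_delete_customer(conn, customer_id):
    """Return True if the delete went through, False if the database refused it."""
    try:
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The foreign key constraint blocked the delete, so no orders were orphaned.
        conn.rollback()
        return False
```

Calling try_delete_customer(conn, 1) on a fresh database returns False while the orders exist. Once the orders are deleted, the same call succeeds.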
So that's what normalization was about in relational databases. It's at the core of maintaining data integrity. Now, that doesn't always lend itself to good performance, because now you've got lots of connections and joins between the different tables that were created through this normalization process, and they slow down the queries because the queries become very complex. Instead of just going and reading stuff out of one table, now the query has to say, go get this for me from this table, and then find things from that table, but only when they have this. And it slows down the performance.
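As a rough illustration of the difference in query shape, the sketch below (again Python with sqlite3, with hypothetical table and column names) answers the same question twice: once with a join across two normalized tables, and once by reading a single denormalized table. It only shows the shape of the two queries. It does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    -- Denormalized copy: the customer name is repeated on every order row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, customer_name TEXT, item TEXT);

    INSERT INTO customers VALUES (1, 'Mike O''Donnell');
    INSERT INTO orders VALUES (1, 1, 'book'), (2, 1, 'lamp');
    INSERT INTO orders_flat VALUES
        (1, 'Mike O''Donnell', 'book'),
        (2, 'Mike O''Donnell', 'lamp');
""")

# Normalized read: the database has to match rows across two tables.
joined = conn.execute(
    "SELECT c.name, o.item FROM orders AS o "
    "JOIN customers AS c ON c.id = o.customer_id "
    "ORDER BY o.id"
).fetchall()

# Denormalized read: everything needed is already on one row of one table.
flat = conn.execute(
    "SELECT customer_name, item FROM orders_flat ORDER BY id"
).fetchall()
```

Both queries return the same rows. The cost of the flat table is that the customer's name now lives in more than one place.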
So a lot of what's going on in migrations is moving to higher-performance databases. Companies want things like real-time analytics, so that when Michael O'Donnell is out on his Google site or his Amazon site, they can quickly come up with new ideas of things to sell Michael O'Donnell in real time. Highly normalized data in relational databases does not lend itself to that very well. So you do have to denormalize in order to get performance, but you have to understand that denormalization so that you can account for it, because you cannot denormalize and cause data integrity problems. If that's the case, then you've, again, lost before you started.
So the referential integrity checks that a database management system would normally handle in the relational world now have to be handled separately, either by some of the back-end processes of a NoSQL database or by something built into the application, to make sure the application has taken care of that in order to get the performance from the database. So it's a very tricky thing. Document databases are the same: now you're breaking all the normalization rules and stuffing everything into one record, because one record is really easy, quick and fast to read. Right?
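Here is one way the application-side safeguard might look, sketched in plain Python with an in-memory stand-in for a document store. The class, method, and field names are all hypothetical. The point is only that, once the customer's name is copied onto every order for fast reads, the application code has to keep those copies consistent and clean up after a delete.

```python
class DenormalizedStore:
    """Toy document store where each order carries its own copy of the customer name."""

    def __init__(self):
        self.customers = {}  # customer_id -> {"name": ...}
        self.orders = {}     # order_id -> {"customer_id", "customer_name", "item"}

    def add_customer(self, customer_id, name):
        self.customers[customer_id] = {"name": name}

    def add_order(self, order_id, customer_id, item):
        # The application, not the database, checks that the parent exists.
        if customer_id not in self.customers:
            raise KeyError(f"unknown customer {customer_id}")
        self.orders[order_id] = {
            "customer_id": customer_id,
            "customer_name": self.customers[customer_id]["name"],  # redundant copy
            "item": item,
        }

    def rename_customer(self, customer_id, new_name):
        # Update the source record and every redundant copy together.
        self.customers[customer_id]["name"] = new_name
        for order in self.orders.values():
            if order["customer_id"] == customer_id:
                order["customer_name"] = new_name

    def delete_customer(self, customer_id):
        # Remove the customer and their orders so nothing is left orphaned.
        del self.customers[customer_id]
        self.orders = {
            oid: o for oid, o in self.orders.items()
            if o["customer_id"] != customer_id
        }
```

Reading an order is a single lookup with no join. In exchange, every write path such as rename_customer and delete_customer has to know where the copies live.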
So I understand now. But isn't there a problem there? Am I creating more problems? OK, I get a benefit in terms of performance, but am I creating more problems down the road?
Not if you handle it-- not if you're aware of what you're doing in that denormalization process and you account for it. Again, it's all about accounting for things. There might be some cat lovers out there, so I'm not going to use what I normally say. But there's a lot of different ways to cook a turkey. All right.
Yeah.
And it really depends on what you want. Do you want crispy skin? Do you want juicy meat, or do you not want to die of salmonella poisoning? So it depends on your priority. If your priority is real time, then you need to understand that this data has been denormalized. The impact of that denormalization is that we need to build these safeguards in so that we don't lose data integrity.
That was the reason why Oracle became so popular, and why relational databases were so popular: they took the need to handle that out of the application and built it right into the database. It was great. But that was when we were doing a lot of batch processing. As times change, application code and capabilities on that side become better and better. We don't necessarily need that anymore. So now it's always a--
It's in fashion. It kind of comes and goes.
It's not a fashion. It's just what's your priority? So nobody 30, 40 years ago was trying to do online analytics on a data transaction because data transactions weren't happening online. We didn't have an Amazon. Right?
Yeah.
Right. Now we do. We want that real-time satisfaction, that [KISS] right from the user interface. So now we have to have performance. So that becomes the priority. So now that that's the priority, what are we going to do to mitigate the risk that that priority presents to our organization? We denormalize, understand that denormalization, and make sure that we account for it somewhere else within that system to make sure that we're not introducing problems for the sake of expediency and speed.
Is this a very early-on design decision, or is this something you just uncover?
Yeah, I would say not the denormalization, but I think that when you're migrating a database, there's a lot of different migrations. You're migrating from one version of Oracle to another. You're