Hello. My name is Adrian Moir. I'm a senior product manager and technology strategist at Quest.
Welcome to the workshop. Today we are going to be talking about anomaly detection. Here you can see a lovely picture. We are in the forest. And that's going to play a little bit of a part as we get into talking about the technology that is behind anomaly detection.
So we've got to think about what anomaly detection is. Well, it does what it says, really. It detects anomalies. But the key thing here is how it does that, how it works through the data, and the technologies used underneath to go and detect those anomalies.
So let's get into some deeper thoughts around this. One of the things that we utilize this kind of technology for is looking at backup data. Now we know that backups are very important. And what we want to be able to do is to ensure that the backups aren't being affected by anything.
These days, with ransomware and other attack vectors, one of the major things you need to be able to do is recover data. So therefore we want to know when things are good and when things are potentially going wrong inside that infrastructure. So one of the things that we use anomaly detection for is to look at the data flow and to see if the data flow or its constructs are changing.
So if we think about this anomaly detection, you could sit there and write out a whole bunch of points and have a static model. But of course, everybody's data protection data streams are going to be different. And their timings are different, their schedules are different, their retention times are going to be different.
So there's a lot of moving parts going on that you need to be kind of aware of if you're trying to build a fixed model. So what we do is we start to employ technology. So you start to see technology coming in, and it starts to look at things going on.
And then, at that point, hang on. That doesn't look right. OK. We found an anomaly. Let's go and check that.
So it's running through a model. And then, when we look to the model, it says yeah, that doesn't fit. We send out an alert.
So the idea here is to use artificial intelligence and machine learning to actually look at the kind of content that's coming in on a regular basis and automatically build a model. And it's effectively learning the way that your data is coming in, the way your data is leaving, the way your data changes. And in that way, at a very low level, you can actually build a model, see the way things work.
So how do we do this? Well, there are a couple of ways that you can do this. But if you think about backup data in general, it's like a time series set of data.
So a time series is very good for this kind of anomaly detection technology. And one of the technologies that we use in Quest for this is called an isolation forest. Now an isolation forest uses a set of random questions to branch each piece of data that's coming in, to work out how easily it can be separated from the rest. So if you've ever played the game Guess Who, which you can see right above me here, when you're asking questions like who's got blonde hair, who wears glasses, if you ask questions like this on data, you can actually start to separate the things that are outside of a cluster.
Now I'm coming back to the forest bit here. If you think about all those forest trees clustering together in a forest, then if you can think about multiple forests and then you can think about trees that are not in the forest, so you're trying to find the trees through the forest. So if you can't see the forest for the trees, you kind of get the thing that I'm going onto here.
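However, let's think about how this actually works. So what we've got to do is determine what is anomalous and what is not. And so the way that the isolation forest works is that the normal data ends up grouped together in a cluster, and the anomalies are the points that are easy to separate from it.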
So as we go forward, we can actually see things as they come in. And we start to build this over a period of time. And we start to see the clusters forming. And then things that are lying outside of those clusters can be deemed to be anomalies.
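The random-question idea can be sketched in a few lines of plain Python. This is only an illustration of the general isolation forest technique, not Quest's implementation: the feature choice (backup size and duration), the sample data, and every function name here are assumptions. Each tree asks random "is this feature below this value?" questions, and a point that ends up alone after very few questions gets a higher isolation score. The score scale used here (closer to 1 means more isolated) belongs to this sketch and need not match the scale any product reports.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def expected_depth(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Split the points with one random question per node: 'is feature f below value v?'"""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    value = rng.uniform(low, high)
    below = [p for p in points if p[feature] < value]
    above = [p for p in points if p[feature] >= value]
    return {
        "feature": feature,
        "value": value,
        "below": build_tree(below, depth + 1, max_depth, rng),
        "above": build_tree(above, depth + 1, max_depth, rng),
    }

def depth_of(tree, point, depth=0):
    """Count the questions asked before the point reaches a leaf, adjusted for leaf size."""
    if "feature" not in tree:
        return depth + expected_depth(tree["size"])
    side = "below" if point[tree["feature"]] < tree["value"] else "above"
    return depth_of(tree[side], point, depth + 1)

def build_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build many random trees, each on a random subsample of the history."""
    if len(points) < 2:
        raise ValueError("need at least two points to build a forest")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def isolation_score(forest, point):
    """Score in (0, 1): the fewer questions needed on average, the closer to 1."""
    trees, sample_size = forest
    mean_depth = sum(depth_of(tree, point) for tree in trees) / len(trees)
    return 2.0 ** (-mean_depth / expected_depth(sample_size))

# Hypothetical history: (backup size in GB, duration in minutes) for 200 past jobs.
data_rng = random.Random(42)
history = [(data_rng.gauss(100, 10), data_rng.gauss(30, 3)) for _ in range(200)]
forest = build_forest(history)

typical = isolation_score(forest, (100.0, 30.0))   # looks like every other job
unusual = isolation_score(forest, (400.0, 5.0))    # far outside the cluster
print(f"typical job: {typical:.2f}, unusual job: {unusual:.2f}")
```

The job that sits in the middle of the cluster needs many questions to single out, while the job far outside it is separated after only a few, which is why its score comes out higher.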
Now obviously, how far out do you go from that cluster before you declare that? So you can actually have a banding methodology around this forest that allows you to do this. So you can have mild anomalies or really bad anomalies. So anything that has a score of minus one, in the light blue area, is obviously an anomaly.
Whereas you come in a little bit closer and you come into the inner pieces, obviously, the ones that are clustered, that's our forest. That's all the trees together in one place. That's going to be scoring a one, so that's pretty good.
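The banding idea can be sketched as a simple mapping from score to band. The score range of minus one to one follows the description above; the band names and the two cut-off values are hypothetical and would need tuning for a real environment.

```python
def band(score, normal_at=0.9, review_at=0.0):
    """Map a score in [-1, 1] to a band. Both thresholds are illustrative assumptions."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score >= normal_at:
        return "normal"    # inside the cluster
    if score >= review_at:
        return "review"    # drifting away from the cluster, worth a look
    return "anomaly"       # well outside the cluster

for score in (1.0, 0.75, -1.0):
    print(score, band(score))
```

Keeping a middle band means a borderline score can be surfaced for a person to judge instead of being forced into a yes-or-no decision.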
If you start to move out, you've got 0.75. Is that an anomaly, or is that just a slight change? So there are things that you have to consider as well. Once the model has learned over a period of time, you may wish to change things subtly in your backup environment.
So will it cause an anomaly? It may do. It depends on how subtle the change is or how big the change is. So obviously, if you do major changes, then you'd expect to have to go and get the model that you've already got and get it to relearn a bit more. One of the other things that we also do is have the model automatically relearn over a period of time, so it keeps up with changes that you're making that are correct, while still spotting the big changes that are incorrect.
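One way to picture automatic relearning is a model that only remembers a recent window of observations. The sketch below uses a plain mean and standard deviation as a stand-in for the real model (it is not an isolation forest and not how QoreStor does it); the class name, window length, and tolerance are all assumptions.

```python
from collections import deque
from statistics import mean, stdev

class RollingBaseline:
    """Stand-in model that relearns from the most recent `window` accepted values."""

    def __init__(self, window=30, tolerance=3.0, warmup=5):
        self.history = deque(maxlen=window)  # old values fall off automatically
        self.tolerance = tolerance
        self.warmup = warmup

    def is_anomaly(self, value):
        if len(self.history) < self.warmup:
            return False  # not enough data to judge yet
        centre = mean(self.history)
        spread = stdev(self.history) or 1e-9
        return abs(value - centre) > self.tolerance * spread

    def observe(self, value):
        """Check a new value, and learn from it only if it looked normal."""
        flagged = self.is_anomaly(value)
        if not flagged:
            self.history.append(value)
        return flagged

model = RollingBaseline()
# Gradual, legitimate growth: one extra unit of backup data per day for 60 days.
drift_flags = [model.observe(100 + day) for day in range(60)]
# Sudden jump far outside anything seen recently.
spike_flag = model.observe(500)
print(any(drift_flags), spike_flag)
```

Because flagged values are never learned from, a deliberate major change would keep being reported until the model is retrained, which matches the point above about major changes needing a relearn.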
And one of the other things that we can actually do with anomaly detection is to provide some level of prediction as well. And this is where it gets interesting. Because what you want to be able to do once you have that model created is to be able to look forward and say these are all the things that I have in the past. I'm expecting it to be like this in the future so that I can predict what may happen. So we have a level of prediction going on as well here.
So let's think about this in terms of training and prediction as the two main phases that you have with anomaly detection. Once your model is trained on your data set of what goes in and out on a regular basis over a period of time, we then get to the point of saying OK, the model's there now. We can start predicting what looks good and what looks bad. And so this gives us the ability to look at this.
So if we think about this, a graph is a good way of looking at this as we go forward. You can start to see certain areas that will kind of correlate across time. So there's a little step here, and then we have this kind of slope up. And then you notice a bit further on there's another step there, and then there's a slope up.
So this could be data being added. This could be data that's the normal data arriving. The drop in between could be data disappearing. And this might be a regular thing.
So if you do this over a period of time, and it all looks the same each time, you get the ability to say well, this is how it should work normally. So the next data in, the next data out should also fit that prediction. So once you get to that point, you can start saying if that prediction now is not doing this, you can raise an anomaly as well.
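The "predict, then compare" step can be illustrated with a repeating pattern. This sketch assumes the cycle length is known (seven days here) and predicts the next value from the same position in earlier cycles; the function names, the sample numbers, and the tolerance settings are hypothetical.

```python
from statistics import mean, stdev

def predict_next(series, period):
    """Predict the next value from the values at the same phase in earlier cycles."""
    phase = len(series) % period
    same_phase = series[phase::period]
    spread = stdev(same_phase) if len(same_phase) > 1 else 0.0
    return mean(same_phase), spread

def deviates(series, period, actual, tolerance=3.0, floor=1.0):
    """True when the actual value is further from the prediction than allowed."""
    expected, spread = predict_next(series, period)
    return abs(actual - expected) > max(tolerance * spread, floor)

# Hypothetical stored data (TB): a slope up through the week, then a drop back down.
week = [11, 12, 14, 16, 18, 20, 22]
series = week * 4  # four identical weeks of history

expected, _ = predict_next(series, period=7)
fits = deviates(series, 7, actual=11.5)   # close to what the pattern predicts
breaks = deviates(series, 7, actual=2.0)  # far below what the pattern predicts
print(expected, fits, breaks)
```

The `floor` parameter is there so that a perfectly regular history, where the spread is zero, does not flag every tiny difference.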
So time series based data in an isolation forest is ideal for working with backup data. It allows us quite a bit of flexibility.
So we've been broadly talking about technology and anomaly detection. Let's look at what it does in one of our products. We have a technology called QoreStor. It is a backup target technology, and it's extremely advanced. It's a software solution as well.
So it means you can run it pretty much on any kind of hardware as long as it meets the requirements. You can run it in virtual machines. You can run it locally.
You can run it in the cloud. It doesn't really matter. It's extremely agile in where you can actually have this technology. So let's have a look at what this does and what the anomaly detection part is going to provide for us in here.
So these are all the kinds of technologies that you have inside of QoreStor. They're all available to you, whether you want to utilize object backup technologies, cloud technologies, encryption technologies, immutability technologies. There's a whole ton of stuff inside here with protocol accelerators. As you can see, we are a Veeam Ready repository, and we also have the Object with Immutability ready badge there as well.
So when you utilize any technology that will work with QoreStor, you get the ability to have this advanced anomaly detection. And it comes with it. There's no extra charge for that. Everything we do inside QoreStor is included. So you've got all this lovely technology set there for you.
So now, if you want to augment what you're already doing with your backup technologies, you can utilize QoreStor. You'll get technologies like deduplication, compression, encryption, encryption in-flight, encryption at rest. We actually have a patented encryption that re-enciphers passphrases with a no knowledge technology, which is kind of cool. And then the anomaly detection is thrown in.
So we've now got this ability to look at anybody's backup data that's coming into QoreStor and look at it in a deeper way and understand what's being added, what's being changed, and what's being taken away. It's very important to think about that change, too. You might want to know if mass deletions are going on, because it's a bad thing for backup data to just suddenly disappear.
You don't want that to happen. So, again, this technology being at that storage layer enables us to do a bit more, a bit deeper, rather than just comparing incremental backups each time. So this is a better way of looking at it, in a more holistic way.
And there are even two levels as well. So within QoreStor, we have a technology called Storage Groups, where each storage group can have its own deduplication and its own encryption settings. So it's really good for MSPs. They can separate customer data.
But what's really interesting is that there is one level of anomaly detection at the storage group layer, and then there's a sublevel down. So what we have inside a storage group are containers. Now these containers are the end points where the data gets to. And inside those containers, they then have a level of anomaly detection as well.
So we've got two layers going on here. Now you can set one up at the top to do everything, so everything in the storage group, and then all the containers will inherit that. Or you can have individual containers turned on or turned off, if you're not worried about one but you are worried about the other. So, again, it's very agile in that you don't just have to have one big tick and it's all on. You can actually be targeted about where you want that anomaly detection to run, which is very useful.
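The two-layer inheritance described here can be modelled as a group-level default with an optional per-container override. This is purely a conceptual sketch: the class names, field names, and example names are hypothetical and do not represent QoreStor's actual configuration interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Container:
    name: str
    # None means "inherit whatever the storage group says" (assumed convention).
    anomaly_detection: Optional[bool] = None

@dataclass
class StorageGroup:
    name: str
    anomaly_detection: bool = False
    containers: Dict[str, Container] = field(default_factory=dict)

    def add(self, container: Container) -> None:
        self.containers[container.name] = container

    def effective(self, container_name: str) -> bool:
        """Resolve the setting that actually applies to one container."""
        setting = self.containers[container_name].anomaly_detection
        return self.anomaly_detection if setting is None else setting

group = StorageGroup("customer-a", anomaly_detection=True)
group.add(Container("file-servers"))                           # inherits: on
group.add(Container("scratch-data", anomaly_detection=False))  # overridden: off

for name in group.containers:
    print(name, group.effective(name))
```

Using `None` for "inherit" keeps an explicit "off" on a container distinct from "no preference", so turning the group setting on later does not silently re-enable a container that was deliberately switched off.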
So there we are. That was a brief look at anomaly detection. And you're probably thinking great. Where can I find out more?
Well, to find out more, it's very, very straightforward. It's probably the best URL you can ever get. Quest.com/qs. That's going to take you there. That's going to find all the information. And you're going to have the ability to actually augment whatever kind of backup software you have, and get that working with QoreStor and get your anomaly detection.
That's all from me from the workshop today. Thank you very much for watching. See you soon.