
Best way to quickly access about 300GB of data? Mount and drag/drop too slow

I've set up cross-country replication (from a West coast core to an East coast core) of 4 machines (physical machines, NOT virtual). I'm testing Virtual Standby as a method of booting up one of the machines so I can get to the Exchange EDBs on cutover night. There is about 300GB between the 2 EDBs. I'll need "quick" access to that data so I can import the mailboxes into my East environment. I can't wait 5-6 hours to mount the drive and copy the data out.

Theory is, I boot up the Virtual Standby and can get to the data much more quickly. BUT then I had a new theory: why can't I mount the VMDK as a drive on another VM? Done. I tried that, and Windows didn't like the drive -- "Invalid" or something like that.

So, my question: what are the methods to quickly get to backed-up data when it's 300+GB? Simple file restores are quick; 300GB is NOT quick when using a traditional mount and drag/drop.
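
For context, here is a rough back-of-the-envelope sketch (Python) of why 300GB over a mounted recovery point takes hours. The throughput figures are assumptions, not measurements.

```python
# Back-of-the-envelope copy times for 300 GB at assumed throughputs.
SIZE_GB = 300

scenarios = {
    "mounted recovery point over WAN (assumed)": 15,  # MB/s
    "1 Gb/s line rate (theoretical)": 125,            # MB/s
    "local SSD-to-SSD copy (assumed)": 400,           # MB/s
}

for label, mb_per_s in scenarios.items():
    hours = SIZE_GB * 1024 / mb_per_s / 3600
    print(f"{label}: ~{hours:.1f} h")  # 15 MB/s -> ~5.7 h, matching 5-6 h
```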

thanks!

  • Hi Wayne:
    Your post shows some confusion and may mislead other people, as we are each talking about things located at opposite ends of a large spectrum. As such, I am in a position to explain some issues that may be of interest to our mid-market customers.

    Let's start with the basics. IOPS means I/O operations per second -- the number of read or write operations that can be performed in one second. Rapid Recovery works with data in blocks of a fixed size of 8KB. (Please note that backup data is deduplicated BEFORE being committed to the repository.)
    As such, the amount of deduplicated data reaching the repository is limited by the repository's capacity to absorb it.
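
    To make the 8KB-block and inline-dedupe idea concrete, here is a minimal sketch in Python. It is not Rapid Recovery's actual implementation; the hash-set cache and the stream/repository objects are assumptions for illustration only.

    ```python
    import hashlib

    BLOCK_SIZE = 8 * 1024  # Rapid Recovery works in fixed 8 KB blocks

    def ingest(stream, repository, dedupe_cache):
        """Toy inline dedup: duplicate blocks are dropped BEFORE they
        reach the repository, so only unique blocks consume write IOPS."""
        written = skipped = 0
        while True:
            block = stream.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).digest()
            if digest in dedupe_cache:
                skipped += 1              # already in the repository
            else:
                dedupe_cache.add(digest)
                repository.write(block)   # unique data: costs a write
                written += 1
        return written, skipped
    ```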

    At the same time, there are various operations executed at the repository level -- Rollups, Deferred Deletes, Mountability checks, Attachability checks, Recovery Point checks -- which read, consolidate, and delete unnecessary data. These operations consume storage resources, which are measured in IOPS as well.

    In a normal usage situation you have data coming in (which takes IOPS dedicated to writes) and, at the same time, data being processed on the repository (which takes IOPS dedicated to reads). Since the number of IOPS a storage system can deliver is fixed, the various jobs running at the same time compete for them. Moreover, both read and write operations are performed in a random pattern, which further diminishes overall storage performance.
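
    To illustrate the contention, here is a toy model in Python. The total IOPS budget and the job weights are assumptions for illustration, not measured values.

    ```python
    # Toy model: a fixed IOPS budget shared by concurrent repository jobs.
    TOTAL_IOPS = 2000  # assumed capability of the storage system

    jobs = {
        "backup ingest (writes)": 3,           # relative weights, assumed
        "rollup / deferred delete (mixed)": 2,
        "mountability & RP checks (reads)": 1,
    }

    weight_sum = sum(jobs.values())
    for name, weight in jobs.items():
        print(f"{name}: ~{TOTAL_IOPS * weight / weight_sum:.0f} IOPS")
    ```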

    As such, in most cases, the storage system is the main bottleneck in Rapid Recovery performance. The "Highest Active Time" counter in Windows Resource Monitor condenses the whole story into a single number.

    To complete the picture, here are two more points to make.

    First is ingesting data -- basically, how quickly the data backed up from the protected machine can reach the repository. There are three elements here:
    1. The load on the machine to be backed up -- this pertains to the available IOPS on the volumes to be backed up, plus CPU and memory load.
    2. The health/speed of the network connection to the core (by default the transfer is made over 8 streams -- see the sketch after this list; obviously jitter and dropped packets do not help).
    3. The in-line deduplication process. There is a balance between the deduplication speed, the size of the dedupe cache, and the data that is actually committed to the repository. To explain: some of the incoming blocks are identical to blocks already present in the repository and are dropped before hitting the storage.
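
    As mentioned in point 2, the transfer runs over several streams by default. Here is a minimal Python sketch of the idea; the send function is a simulated stand-in, not the product's API.

    ```python
    import time
    from concurrent.futures import ThreadPoolExecutor

    STREAMS = 8  # matches the default number of transfer streams

    def send_chunk(chunk: bytes) -> int:
        # Stand-in for the real network send (hypothetical; simulated).
        time.sleep(0.01)
        return len(chunk)

    def transfer(chunks) -> int:
        # Several parallel streams keep the pipe full even when single
        # streams stall on jitter, retransmits, or dropped packets.
        with ThreadPoolExecutor(max_workers=STREAMS) as pool:
            return sum(pool.map(send_chunk, chunks))

    print(transfer([b"x" * 8192 for _ in range(64)]))  # 524288 bytes sent
    ```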

    The most common bottleneck in this process is the load on the system to be backed up (including available IOPS). Everything else is normally much faster.

    Second, there is the replication process. I won't go into detail, as some of my previous explanations apply here as well. Suffice it to say, this is the only typical case where the bottleneck is the connection speed.

    Another point I have not made yet is AV protection. All data traveling to and from the storage system is intercepted by the antivirus filter drivers (even if the AV software is disabled). Most AV filter drivers are not designed to deal with large amounts of data and may create serious bottlenecks. Applying some exclusions may help.

    That being said, let's look at the numbers. Customers often see replication speeds that are higher than the theoretical network speed. For the record, network speed is measured in Mb/s (megabits per second) while transfer/replication speed is measured in MB/s (megabytes per second). As such, you need to divide the network speed by 8 to express it in MB/s.
    To go back: many customers with slow WAN connections -- for instance 8Mb/s (1MB/s) -- see replication transfers of 1.5-2MB/s. This is because the data going over the wire has already been deduplicated and compressed.
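
    A small Python calculator makes these unit conversions explicit. The 2:1 reduction ratio and ~7.5% TCP/IP overhead are assumptions for illustration.

    ```python
    def mbps_to_MBps(megabits: float) -> float:
        """Network speed (Mb/s) to transfer speed (MB/s): divide by 8."""
        return megabits / 8

    def apparent_MBps(link_mbps: float, reduction: float = 2.0,
                      overhead: float = 0.075) -> float:
        """Apparent replication speed when the data on the wire is
        already deduplicated and compressed (assumed 2:1 reduction)."""
        return mbps_to_MBps(link_mbps) * (1 - overhead) * reduction

    print(mbps_to_MBps(8))             # 1.0 MB/s on an 8 Mb/s WAN link
    print(round(apparent_MBps(8), 2))  # ~1.85 MB/s apparent: within 1.5-2
    print(mbps_to_MBps(1000))          # 125.0 MB/s: theoretical 1 Gb/s
    ```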

    In your post, Wayne, you say that Rapid Recovery has a max transfer rate of 125mbs. If you mean 125Mb/s, that would be ~15MB/s, which is obviously not the case. If it is 125MB/s, which I believe it is, that is the theoretical speed of a 1Gb/s network connection. Taking into account that, depending on the network specs, TCP/IP needs 5-10% overhead, it does not look that bad.

    In this case, since the bottleneck has shifted to the connection side, increasing the available IOPS the storage system can deliver will not improve performance.

    Now, regarding a "system that is capable of writing at multiple GBps to Long Term Archive SSDs": there are many great storage systems available on the market. Some of them -- the most performant ones -- are specialized for specific tasks; others are designed for general usage, but at a lower performance level.

    In your case, based on the description you provided, it looks like a specialized system designed for archiving. This is very different from what a Rapid Recovery repository needs. For instance, if your system ingests data through flash serialization (as Nimble solutions do), it is most likely not optimized for random reads, so it won't really improve performance dramatically :). Optimized archives are supposed to be contiguous. Remember, Rapid Recovery works with 8KB blocks. Even if your system is optimized to work with large blocks, it won't help.

    The DL4000 was a splendid machine in its time, as it balanced the price/performance ratio well. If properly updated and upgraded, it still works great. Besides the local storage, depending on the license type, it is possible to add one or two MD1200 storage enclosures.

    I fully agree that replacing its original drives with 15K SAS drives would improve general performance -- which boils down to IOPS. Please note that the DL4000 and later appliances use RAID 6 -- which adds to the read/write penalty -- to avoid disk punctures.
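
    For a feel of the RAID 6 penalty, here is a rule-of-thumb calculation in Python. The drive count and per-disk IOPS are assumed figures, and the factor of 6 is the standard RAID 6 small-random-write penalty.

    ```python
    def raid6_effective_iops(disk_iops: int, disks: int,
                             read_fraction: float) -> float:
        """Rule-of-thumb effective IOPS for a RAID 6 set: reads hit all
        spindles, but each small random write costs 6 backend I/Os
        (read data + two parities, then write data + two parities)."""
        raw = disk_iops * disks
        write_fraction = 1 - read_fraction
        return raw / (read_fraction + write_fraction * 6)

    # Assumed: 12 x 15K SAS drives at ~180 IOPS each, 50/50 R/W mix.
    print(round(raid6_effective_iops(180, 12, 0.5)))  # ~617 IOPS
    ```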

    As a recovery strategy, you are right: it makes sense to recover the system disk first and then do a Live Recovery for the data volumes. Since the typical system disk is around 300GB (and seldom all of it is used), and assuming the drivers for a dissimilar-hardware BMR were prepared in time, it takes only minutes to recover the system drive; then, thanks to Live Recovery, users can work with their files and applications while the rest of the restore is still running.

    Hope that this helps.
