Category Archives: Tapes

Storage must go on a diet

Nowadays, the capacity of the hard disk drives (HDDs) are really big. 3TB is out and 4TB is in the horizon. What’s next?

For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient  and with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.

If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, it will also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID it with RAID-0 (not a good idea but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about lack of storage experience.

And RAID isn’t really keeping up with the tremendous growth of HDD’s capacity as well. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue provide the LUN or volume reliability and data availability because it just takes too damn long to rebuild the volume after the failure of a disk.

Back in the days where HDDs were less than 500GB, RAID-5 would still hold up but after passing the 1TB mark, RAID-6 became more prevalent. But now, that 1TB has ballooned to 3TB and RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the price of the storage array is going to get too costly.

Experts have been speaking about parity-declustering,  but that’s something that a few vendors have right now. Panasas, founded by one of forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Cargenie-Mellon University’s Parallel Data Lab (PDL) presented a paper about parity-declustering more than 10 years ago.

Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in hope that storage administrators and customers can take advantage of it. It is called Storage Optimization or Storage Efficiency.

Here are a few ways you can consider to put your storage on a diet.

  • Compression
  • Thin Provisioning
  • Deduplication
  • Storage Tiering
  • Tapes and SSDs
To me, compression has not taken the storage world by storm. But then again, there aren’t many vendors that tout compression as a feature for storage optimization. Most of them rather prefer to push the darling of data reduction, data deduplication, as the main feature for save more space. Theoretically, data deduplication makes more sense when the data is inactive, and has high occurrence of duplicated data. That is why secondary storage such  as backup deduplication targets like Data Domain, HP StoreOnce, Quantum DXi can publish 20:1 rates and over time, that rate can get even higher.
NetApp also has been pushing their A-SIS data deduplication on primary storage. Yes, it helps with the storage savings in primary but when the need for higher data transfer rates and time to access “manipulated” data (deduped or compressed), it is likely that compression is a better choice for primary, active data.
So who has compression? NetApp ONTAP 8.0.1 has compression now and IBM with its Storewize V7000 started as a compression device. Read about IBM Storewize in my blog here. Dell has Ocarina Networks, which was recently unleashed. I am a big fan of Ocarina Networks and I wrote about the technology in my previous blog. EMC, during the Celerra days of DART has compression but I don’t hear much about it in their VNX. Compression is there, believe me, embedded all the loads of EMC marketing.
Thin Provisioning is now a must-have and standard feature of all storage vendors. What is Thin Provisioning? The diagram below shows you:
In the past, storage systems aren’t so intelligent. You ask for 10TB, you are given 10TB and that 10TB is “deducted” from the storage capacity. That leads to wastage and storage inefficiencies. Today, Thin Provisioning will give you 10TB but storage capacity is consumed as it is being used. The capacity is not pre-allocated as in the past. Thin provisioning is a great diet pill for bloated storage projects. 
Another up and coming feature is storage tiering. Storage tiering, when associated to storage optimization, should include hierarchical storage management (HSM) and tape-out as well. Storage optimization solutions should not offer only in the storage array itself. Storage tiering within the storage array is available with most vendors – IBM EasyTier, EMC FAST2, Dell Fluid Data Management and many others. But what about data being moved out of the storage array? What about reducing the capacity of the data online or near-line? Why not put them offline if there isn’t a need for it?
I term this as Active Archiving, something I learned while I was at EMC. Here’s a look at EMC’s style of Active Archiving:
Active Archiving promotes the concept of data archiving and is not unique only to EMC. Almost all storage vendors, either natively or with 3rd party vendors, can perform fairly efficient data archiving in one way or another. One of the software that I liked (and not unique!) is Quantum Stornext. Here’s a video of how Quantum Stornext helps reduce the fat of the storage.
With the single-copy sharing feature of Quantum Stornext to multiple disparate OSes, there are lesser duplicate files in storage as well.
Tapes have been getting a bad name in the past few years. It has been repositioned and repurposed as an archive medium rather than a backup medium. But tape is the greenest and most powerful storage diet pill around. And we should not be discount tapes because tapes are fighting back. Pretty soon you will be hearing about Linear Tape File System (LTFS). In a nutshell, Linear Tape File System (LTFS) allows you to use the tape almost as if it were a hard disk. You can drag and drop files from your server to the tape, see the list of saved files using a standard operating system directory (no backup software catalog needed), and use point and click to restore. How cool is that!
And Solid State Drives (SSDs) makes sense as well.
There are times that we need IOPS and using spinning drives, we have to set up many disk spindles to achieve the IOPS that we want.  For example, using the diagram below from the godfather of storage, Greg Schulz,
The set of 16 spinning HDD drives on the left can only deliver 3,520 IOPS. The problem is, we have wasted a lot of disk space, as seen in the diagram below. This design, which most customer would be accustomed to, may look cheaper but in actual fact, is NOT.
If the price of a Fibre Channel HDD is RM2,000, the total of 16 would make up RM32,000.00. That is not inclusive of additional power and cooling and rack space and also the data management costs. Assuming the SSDs costs 5 times more than the Fibre Channel HDD. SSDs are capable of delivering very high IOPS. Here I am putting a modest 5,000 IOPS per SSDs. With just 2 SSDs (as the right design suggests), the total costs is only RM20,000. It has greater performance room to grow, and also savings in data management, power and cooling.
Folks, consider SSDs as part of your storage diet plan.
All these features are available, in whole or in part, and they are part of the storage technology offerings that is out there. With all these being said, are you doing something about it? Get off your lazy bum and start managing your storage and put your storage on a diet!!!

Can snapshots replace traditional backups?

Backup is necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you got to have backup. When it comes to the protection of data and information in a business, backup is the only way.

Backup has also become the bane of IT operations. Every product that is out there in the market is trying to cram as much production data to backup as possible just to fit into the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless

  • You reduce the size of the primary data to be backed up – think compression, deduplication, archiving
  • You replicate the primary data to a secondary device and backup the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup
  • You speed up the transfer of primary data to the backup device

Either way, the IT operations is trying to overcome the challenges of the backup window. And the whole purpose for backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance. You pay for the premium so that you are able to use the insurance facility to recover during the times of need. We have heard that analogy many times before.

On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data and many a times, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots have suffered the perception that it is a pseudo-backup, until recent last couple of years.

Here are some food for thoughts …

WHAT IF we eliminate backing data to a secondary device?

WHAT IF the IT operations is ready to embrace snapshots as the true backup?

WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?

First of all, it will solve the perennial issues of backup to a “secondary device”. The operative word here is the “secondary device”, because that secondary device is usually external to the primary storage.

Tape subsystems and tape are constantly being ridiculed as the culprit of missing backup windows. Duplications after duplications of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed because network backup runs through the LAN. LANless backup will use another conduit, usually Fibre Channel, to transport data to the secondary device.

If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.

Note that what I have suggested is to backup, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compress, dedupe and restore.

Wouldn’t that paint a better way of doing backup?

Snapshots will be the only mechanism to backup. Snapshots are quick, usually in minutes and some in seconds. Most snapshot implementations today are space efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.

For DR, snapshots are shipped to a remote storage of equal prowess at the DR site, where the snapshot can be rebuild and be in a ready mode to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.

And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.

I believe with the convergence of multi-core processing power, 10GbE networks, SSDs, very large capacity drives, we could be seeing a shift in the backup design model and possible the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and secondary device may be a thing of the past.