Category Archives: Storage Optimization

Storage must go on a diet

Nowadays, the capacity of the hard disk drives (HDDs) are really big. 3TB is out and 4TB is in the horizon. What’s next?

For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient  and with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.

If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, it will also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID it with RAID-0 (not a good idea but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about lack of storage experience.

And RAID isn’t really keeping up with the tremendous growth of HDD’s capacity as well. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue provide the LUN or volume reliability and data availability because it just takes too damn long to rebuild the volume after the failure of a disk.

Back in the days where HDDs were less than 500GB, RAID-5 would still hold up but after passing the 1TB mark, RAID-6 became more prevalent. But now, that 1TB has ballooned to 3TB and RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the price of the storage array is going to get too costly.

Experts have been speaking about parity-declustering,  but that’s something that a few vendors have right now. Panasas, founded by one of forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Cargenie-Mellon University’s Parallel Data Lab (PDL) presented a paper about parity-declustering more than 10 years ago.

Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in hope that storage administrators and customers can take advantage of it. It is called Storage Optimization or Storage Efficiency.

Here are a few ways you can consider to put your storage on a diet.

  • Compression
  • Thin Provisioning
  • Deduplication
  • Storage Tiering
  • Tapes and SSDs
To me, compression has not taken the storage world by storm. But then again, there aren’t many vendors that tout compression as a feature for storage optimization. Most of them rather prefer to push the darling of data reduction, data deduplication, as the main feature for save more space. Theoretically, data deduplication makes more sense when the data is inactive, and has high occurrence of duplicated data. That is why secondary storage such  as backup deduplication targets like Data Domain, HP StoreOnce, Quantum DXi can publish 20:1 rates and over time, that rate can get even higher.
NetApp also has been pushing their A-SIS data deduplication on primary storage. Yes, it helps with the storage savings in primary but when the need for higher data transfer rates and time to access “manipulated” data (deduped or compressed), it is likely that compression is a better choice for primary, active data.
So who has compression? NetApp ONTAP 8.0.1 has compression now and IBM with its Storewize V7000 started as a compression device. Read about IBM Storewize in my blog here. Dell has Ocarina Networks, which was recently unleashed. I am a big fan of Ocarina Networks and I wrote about the technology in my previous blog. EMC, during the Celerra days of DART has compression but I don’t hear much about it in their VNX. Compression is there, believe me, embedded all the loads of EMC marketing.
Thin Provisioning is now a must-have and standard feature of all storage vendors. What is Thin Provisioning? The diagram below shows you:
In the past, storage systems aren’t so intelligent. You ask for 10TB, you are given 10TB and that 10TB is “deducted” from the storage capacity. That leads to wastage and storage inefficiencies. Today, Thin Provisioning will give you 10TB but storage capacity is consumed as it is being used. The capacity is not pre-allocated as in the past. Thin provisioning is a great diet pill for bloated storage projects. 
Another up and coming feature is storage tiering. Storage tiering, when associated to storage optimization, should include hierarchical storage management (HSM) and tape-out as well. Storage optimization solutions should not offer only in the storage array itself. Storage tiering within the storage array is available with most vendors – IBM EasyTier, EMC FAST2, Dell Fluid Data Management and many others. But what about data being moved out of the storage array? What about reducing the capacity of the data online or near-line? Why not put them offline if there isn’t a need for it?
I term this as Active Archiving, something I learned while I was at EMC. Here’s a look at EMC’s style of Active Archiving:
Active Archiving promotes the concept of data archiving and is not unique only to EMC. Almost all storage vendors, either natively or with 3rd party vendors, can perform fairly efficient data archiving in one way or another. One of the software that I liked (and not unique!) is Quantum Stornext. Here’s a video of how Quantum Stornext helps reduce the fat of the storage.
With the single-copy sharing feature of Quantum Stornext to multiple disparate OSes, there are lesser duplicate files in storage as well.
Tapes have been getting a bad name in the past few years. It has been repositioned and repurposed as an archive medium rather than a backup medium. But tape is the greenest and most powerful storage diet pill around. And we should not be discount tapes because tapes are fighting back. Pretty soon you will be hearing about Linear Tape File System (LTFS). In a nutshell, Linear Tape File System (LTFS) allows you to use the tape almost as if it were a hard disk. You can drag and drop files from your server to the tape, see the list of saved files using a standard operating system directory (no backup software catalog needed), and use point and click to restore. How cool is that!
And Solid State Drives (SSDs) makes sense as well.
There are times that we need IOPS and using spinning drives, we have to set up many disk spindles to achieve the IOPS that we want.  For example, using the diagram below from the godfather of storage, Greg Schulz,
The set of 16 spinning HDD drives on the left can only deliver 3,520 IOPS. The problem is, we have wasted a lot of disk space, as seen in the diagram below. This design, which most customer would be accustomed to, may look cheaper but in actual fact, is NOT.
If the price of a Fibre Channel HDD is RM2,000, the total of 16 would make up RM32,000.00. That is not inclusive of additional power and cooling and rack space and also the data management costs. Assuming the SSDs costs 5 times more than the Fibre Channel HDD. SSDs are capable of delivering very high IOPS. Here I am putting a modest 5,000 IOPS per SSDs. With just 2 SSDs (as the right design suggests), the total costs is only RM20,000. It has greater performance room to grow, and also savings in data management, power and cooling.
Folks, consider SSDs as part of your storage diet plan.
All these features are available, in whole or in part, and they are part of the storage technology offerings that is out there. With all these being said, are you doing something about it? Get off your lazy bum and start managing your storage and put your storage on a diet!!!
Advertisements

Ocarina rising

After more than a year since Dell acquired Ocarina Networks, it has finally surfaced last week in the form of Dell DX Object Storage 6000G SCN (Storage Compression Node).

Ocarina is a content-aware storage optimization engine, and their solution is one of the best I have seen out there. Its unique ECOsystem technology, as described in the diagram below, is impressive.

Unlike most deduplication and compression solutions out there, Ocarina Networks solution takes storage optimization a step further.  Ocarina works at the file level and given the rise and crazy, crazy growth of unstructured files in the NAS space, the web and the clouds, storage optimization is one priority that has to be addressed immediately. It takes a 3-step process – Extract, Correlate and Optimize.

Today’s files are no longer a flat structure of a single object but more of a compounded file where many objects are amalgamated from different sources. Microsoft Office is a perfect example of this. An Excel file would consists of objects from Windows Metafile Formats, XML objects, OLE (Object Linking and Embedding) Compound Storage Objects and so on. (Note: That’s just Microsoft way of retaining monopolistic control). Similarly, a web page is a compound of XML, HTML, Flash, ASP, PHP object codes.

In Step 1, the technology takes files and breaks it down to its basic components. It is kind of like breaking apart every part of a car down to its nuts and bolt and layout every bit on the gravel porch. That is the “Extraction” process and it decodes each file to get the fundamental components of the files.

Once the compounded file object is “extracted”, identified and indexed, each fundamental object is Correlated in Step 2. The correlation is executed with the file and across files under the purview of Ocarina. Matching and duplicated objects are flagged and deduplicated. The deduplication is done at the byte-level, unlike most deduplication solutions that operate at the block-level. This deeper and more granular approach further reduces the capacity of the storage required, making Ocarina one of the most efficient storage optimization solutions currently available. That is why Ocarina can efficiently reduce the size of even zipped and highly encoded files.

It takes this storage optimization even further in Step 3. It applies content-aware compactors for each fundamental object type, uniquely compressing each object further. That means that there are specialized compactors for PDF objects, ZIP objects and so on. They even have compactors for Oil & Gas seismic files. At the time I was exposed to Ocarina Networks and evaluating it, it had about 600+ unique compactors.

After Dell bought Ocarina in July 2010, the whole Ocarina went into a stealth mode. Many already predicted that the Ocarina technology would be integrated and embedded into Dell’s primary storage solutions of Compellent and EqualLogic. It is not there yet, but will likely be soon.

Meanwhile, the first glimpse of Ocarina will be integrated as a gateway solution to Dell DX6000 Object Storage. DX Object Storage is a technology which Dell has OEMed from Caringo. DX6000 Object Storage (I did not read in depth) has the concept of the old EMC Centera, but with a much newer, and more approach based on XML and HTTP REST. It has published an open API and Dell is getting ISV partners to develop their applications to interact with the DX6000 including Commvault, EMC, Symantec, StoredIQ are some of the ISV partners working closely with Dell.

(24/10/2011: Editor note: Previously I associated Dell DX6000 Object Storage with Exanet. I was wrong and I would like to thank Jim Dtuton of Caringo for pointing out my mistake)

Ocarina’s first mission is to reduce the big, big capacities in Big Data space of the DX6000 Object Storage, and the Ocarina ECOsystem technology looks a good bet for Dell as a key technology differentiator.