Category Archives: Backup
I like the way Amazon is building their Cloud Computing services. Amazon Web Services (AWS) is certainly on track to become the most powerful Cloud Computing company in the world. In fact, AWS might already be. But they are certainly not resting on their laurels, having launched 2 new services in as many weeks: Amazon DynamoDB (last week) and Amazon Storage Gateway (this week).
I am particularly interested in the Amazon Storage Gateway, because it addresses one of the biggest fears of Cloud Computing head-on. A lot of large corporations are still adamant about keeping their data on-premise where it is private and secure. Many large corporations are still very skeptical about the Cloud even though Cloud Computing is changing the IT landscape in a massive way. The barrier to entry for large corporations is not easy to overcome, but Amazon is adapting to get more IT divisions and departments to try out Cloud Computing in a less disruptive way.
The new service is really about data storage and data backup for large corporations. This is important because large corporations have plenty of requirements for data to be stored and backed up. And as we know, a large portion of the data stored does not need to be transactional or to be accessed frequently. This set of data is less frequently used and is usually kept for archiving or regulatory compliance reasons, particularly in the banking and healthcare industries.
In data backup operations, the reason data is backed up is to provide a data recovery mechanism when a disaster strikes. Large corporations back up tons of data every day, week or month, and this data only has value when there is a situation that requires data relevance, data immediacy or data recovery. Otherwise, it is just plenty of data taking up storage space, be it on disk or on tape.
Both data storage and data backup cost a lot of money, both CAPEX and OPEX. In CAPEX, you are constantly pressured to buy more storage to store the ever growing data. This leads to greater management and administration costs, both contributing heavily to OPEX costs. And I have not included the OPEX costs of floor space, power and cooling, and people (training, salary, time and so on), typically adding up to 3-5x the operations costs relative to the capital investments. Such a model of IT operations related to storage cannot continue forever, and storage in the Cloud offers an alternative.
These 2 scenarios, data storage and data backup, are exactly the type of market AWS is targeting. In order to simplify things for large corporations and pacify them, AWS introduced the Amazon Storage Gateway, which makes it easier for large corporations to take some of their IT storage operations to the Cloud in the form of Amazon S3.
The video below shows the Amazon Storage Gateway:
The Amazon Storage Gateway is a software “appliance” that is installed on-premise in the large corporation’s data center. It integrates into the LAN and provides an SSL (Secure Sockets Layer) connection to Amazon S3. The data being transferred to S3 is also encrypted with 256-bit AES (Advanced Encryption Standard). Both SSL and AES-256 can give customers a sense of security, and AWS claims that the implementation meets the data storage and data recovery standards used in the banking and healthcare industries.
The data storage and backup service regularly protects the customer’s data in snapshots, giving the customer a rapid recovery platform should the customer experience on-premise data corruption or data disruption. At the same time, the snapshot copies in Amazon S3 can also be loaded into Amazon EBS (Elastic Block Store), so that testing or development environments can be evaluated and tested with Amazon EC2 (Elastic Compute Cloud). The simplicity of sharing and combining different Amazon services will no doubt give customers peace of mind, easing their adoption of Cloud Computing with AWS.
This new service starts with a 60-day free trial and then moves on to a subscription fee of USD$125.00 (about Malaysian Ringgit $400.00) per gateway per month. The data storage (inclusive of the backup service) costs only 14 cents per gigabyte per month. For 1TB of data, that is approximately MYR$450 per month. Therefore, minus the initial setup costs, that comes to a total of MYR$850 per month, slightly over MYR$10,000 per year.
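As a rough check on the arithmetic above, here is a small sketch. The exchange rate of MYR 3.2 to the US dollar is my assumption, inferred from USD$125 being about MYR$400, and 1TB is taken as 1,024GB. The function name is made up for illustration.

```python
# Rough monthly and yearly cost of one gateway plus data stored in S3.
# Assumption: MYR 3.2 per USD, inferred from "USD$125 is about MYR$400".
USD_TO_MYR = 3.2
GATEWAY_FEE_USD = 125.00      # per gateway per month
STORAGE_USD_PER_GB = 0.14     # per gigabyte per month
GB_PER_TB = 1024

def monthly_cost_myr(terabytes: float, gateways: int = 1) -> float:
    """Gateway subscription plus storage, converted to Ringgit."""
    gateway = gateways * GATEWAY_FEE_USD
    storage = terabytes * GB_PER_TB * STORAGE_USD_PER_GB
    return (gateway + storage) * USD_TO_MYR

monthly = monthly_cost_myr(1)
yearly = monthly * 12
print(f"Monthly: MYR {monthly:,.0f}")
print(f"Yearly:  MYR {yearly:,.0f}")
```

Changing the `terabytes` argument shows how the bill grows with capacity, since only the storage portion scales while the gateway fee stays flat.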
At this point, I would like to relate an experience I had a year ago when implementing a so-called private cloud for an oil-and-gas customer in KL. They were using the HP EVS (Electronic Vaulting Service), backing up to an undisclosed HP data center hosting site in the Klang Valley. The HP EVS, which was an OEM of Asigra, was not an easy solution to implement, but what was more perplexing was the fact that the customer had a poor understanding of their objectives and their 5-year plan for keeping the data protected.
When the first 3-4TB of data storage and backup was almost used up, the customer asked for a quotation for an additional 1TB of the EVS solution. The subscription for 1TB was MYR$70,000 per year. That is 7 times more than the AWS cost of MYR$10,000 per year! I have to salute the HP sales rep. It must have been a damn good convincing sell!
In the long run, the customer would have been better off running their storage and backup on-premise with their HP EVA4400. Adding an additional 1TB (and hiring another IT administrator) would have cost a whole lot less.
Amazon Web Services has already been operating in Singapore for the past 2 years, and I am sure they are eyeing Malaysia as part of their regional market. Unless and until Malaysian companies offering Cloud Services know how to use economies of scale to capitalize on the Cloud Computing market, AWS is always going to be a big threat to CSP companies in Malaysia and a boon to any company seeking cloud computing services anywhere in the world.
I urge customers in Malaysia to start asking their so-called Cloud Service Providers whether they can do what AWS is doing. I have low confidence in what most local “cloud computing” companies can deliver right now. I hope they stop window dressing their service offerings and start giving real cloud computing services to customers. And for customers, you must continue to research and find out which cloud services meet your business objectives. Don’t be dazzled by the fancy jargon or technical idealism thrown at you. Always, always find out more because your business cost is at stake. Don’t be like the customer who paid MYR$70,000 for 1TB per year.
AWS is always innovating and the Amazon Storage Gateway is just another easy-to-adopt step in their quest for world domination.
Nowadays, the capacity of hard disk drives (HDDs) is really big. 3TB is out and 4TB is on the horizon. What’s next?
For small-medium businesses in Malaysia, depending on their data requirements and applications, 3-10TB is pretty sufficient and with room to grow as well. Therefore, a 6TB requirement can be easily satisfied with 2 x 3TB HDDs.
If I were the customer, why would I buy a storage array, with the software licenses and other stuff that will not only increase my cost of equipment acquisition and data management, it will also increase the complexity of my IT infrastructure? I could just slot HDDs into my existing server, RAID it with RAID-0 (not a good idea but to save costs, most customers would do that) and I have a 6TB volume! It’s cheaper, easier to manage with Windows or Linux, and my system administrator doesn’t have to fuss about lack of storage experience.
And RAID isn’t really keeping up with the tremendous growth of HDD capacity either. In fact, RAID is at risk. RAID (especially RAID 5/6) just cannot continue to provide LUN or volume reliability and data availability because it takes too damn long to rebuild the volume after the failure of a disk.
Back in the days when HDDs were less than 500GB, RAID-5 would still hold up, but after passing the 1TB mark, RAID-6 became more prevalent. But now, that 1TB has ballooned to 3TB and RAID-6 is on shaky ground. What’s next? RAID-7? ZFS has RAID-Z3, triple parity, but come on, how many vendors have that? With triple parity or stronger RAID (is there one?), the storage array is going to get too costly.
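To put a number on the rebuild problem, here is a back-of-envelope sketch. A rebuild has to write at least the full capacity of the replacement disk, so capacity divided by the sustained rebuild rate gives a lower bound. The 50 MB/s rate is purely my assumption for illustration; real rates depend on the controller, the RAID level and how busy the array is.

```python
# Lower bound on RAID rebuild time: the whole replacement disk must be written.
# Assumption: a sustained rebuild rate of 50 MB/s, chosen only for illustration.
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float = 50.0) -> float:
    capacity_mb = capacity_tb * 1_000_000  # decimal TB, as drive vendors count
    return capacity_mb / rebuild_mb_per_s / 3600

for tb in (0.5, 1, 3, 4):
    print(f"{tb} TB disk: at least {rebuild_hours(tb):.1f} hours")
```

The point is the linear growth: at the same rebuild rate, a 3TB disk keeps the volume exposed six times longer than a 500GB disk.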
Experts have been speaking about parity-declustering, but that’s something that only a few vendors have right now. Panasas, founded by one of the forefathers of RAID, Garth Gibson, comes to mind. In fact, Garth Gibson and Mark Holland of Carnegie Mellon University’s Parallel Data Lab (PDL) presented a paper about parity-declustering more than 10 years ago.
Let’s get back to our storage fatty. Yes, our storage is getting fat, obese, rotund or whatever you want to call it. And storage vendors have been pushing a concept in hope that storage administrators and customers can take advantage of it. It is called Storage Optimization or Storage Efficiency.
Here are a few ways you can consider to put your storage on a diet.
- Thin Provisioning
- Storage Tiering
- Tapes and SSDs
I have known Atempo for years and even contacted them once when I was at NetApp several years ago. I didn’t know much about them until a friend recently took up the master resellership of Atempo here in Malaysia. And when people ask me “Atempo who?”, I would reply “3 gals, 1 guy and 1 LB handbag”.
Atempo is a company that specializes in data protection and archiving solutions and has been around for almost 20 years. They compete with Symantec Netbackup, Commvault Simpana and Bakbone Netvault, and I have seen their solutions. They are pretty decent and attractively priced as well. Perhaps they don’t market themselves as strongly as some of the bigger data protection companies, but I would recommend them to anyone, any day. If you need more information, contact me.
But the usual puzzled faces will soon go away once they start recognizing Atempo’s solutions because that is where my usual Atempo introduction comes from – their solutions.
Atempo has 5 key products:
- Time Navigator (TINA)
- Live Navigator (LINA)
- Atempo Digital Archive (ADA)
- Atempo Digital Archive for Messaging (ADAM)
- Live Backup (LB)
I promised last week that I would look deeper into HP StoreOnce technology and I did. As I mentioned in my previous blog, HP StoreOnce technology is now embedded in its D2D series of secondary, target backup devices that do the job with no fuss and no fancy bells and whistles.
Here’s the lineup of the present HP D2D solutions.
HP Malaysia has constantly reminded me that their D2D deduplication solution is much more price competitive than their competitors’, and this is something you, the readers, have to find out on your own. But I do believe that it is. Unfortunately they did not have the first mover’s advantage when Data Domain took the industry by storm in 2009, since HP StoreOnce was only launched with much fanfare last year in June 2010. Despite that, there is still plenty of room in the IT market to grow, especially among HP’s huge set of customers.
Without the first mover’s advantage, HP StoreOnce has to differentiate itself from existing competitors such as EMC Data Domain and Quantum. Labeling their deduplication technology as version 2.0 (whereas the competitors are still at “Version 1.0”?), HP StoreOnce banks on 3 key technologies. They are:
- Sparse Indexing
- Intelligent Block Size Management
- Reduction in Disk Fragmentation
Out of these 3, sparse indexing is the most interesting but I will save the best for last. Let’s start with Intelligent Block Size Management.
HP StoreOnce uses a variable chunking method with a smaller granularity of 4K in size and this is managed intelligently, thus achieving a higher deduplication ratio compared to its competitors, which use either a fixed chunking method or a variable chunking method with larger block sizes in the range of 8K to 32K. HP Labs’ testing reveals that the space savings were significant when compared with others.
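To see why variable chunking matters at all, here is a minimal sketch comparing fixed and variable (content-defined) chunking. This is not HP’s algorithm, which I have not seen. The window size, the boundary mask, the roughly 1KB average chunk size (kept small so the demo runs quickly) and the use of a CRC-32 over a sliding window are all my assumptions for illustration. Real implementations also enforce minimum and maximum chunk sizes, which I have left out.

```python
import hashlib
import random
import zlib

WINDOW = 16      # bytes of context that decide a boundary (assumed)
MASK = 0x3FF     # cut when the low 10 bits are zero: ~1 KB average chunks

def fixed_chunks(data: bytes, size: int = 1024) -> list:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes) -> list:
    """Cut wherever a checksum of the last WINDOW bytes matches a pattern."""
    chunks, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window = data[i - WINDOW + 1:i + 1]
        if zlib.crc32(window) & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing bytes after the last boundary
    return chunks

def fingerprints(chunks: list) -> set:
    """SHA-1 digest of each chunk, as a deduplication engine would keep."""
    return {hashlib.sha1(c).digest() for c in chunks}

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + original             # one byte inserted at the front

for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    before = fingerprints(chunker(original))
    after = fingerprints(chunker(shifted))
    print(f"{name:8s}: {len(before & after)} of {len(before)} chunks still match")
```

Because the boundaries depend only on the bytes inside the window, a single inserted byte disturbs at most the first variable chunk, while every fixed chunk after the insertion point shifts and no longer matches.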
Below is a set of results for a PowerPoint presentation and you can see for yourself.
(NOTE: The savings/deduplication ratio can be very different and can range from good to bad for different types of data. Video and image files are highly encoded. Seismic and geo-mapping files are highly compressed. It is very likely that most deduplication solutions cannot achieve a high percentage with these types of files.)
Point #2 talks about Reduction in Disk Fragmentation. The inherent benefits of Intelligent Block Size Management bring about the Reduction in Disk Fragmentation. Smaller chunks mean less wasted space, especially when the block size is 4K or lower. HP StoreOnce also uses an intelligent algorithm to place blocks that are perceived to be related close to one another. This “locality” helps make the retrieval and restore process faster and more efficient.
Sparse Indexing is where HP StoreOnce is touted to be a game changer. Today’s data is already as massive as a mountain, and it’s going to get bigger and grow faster. With the “Version 1.0” type of deduplication, the hashes created are stored either in memory or on disk. However, the massive data sets (especially unstructured data) are already producing massive amounts of hashes. Hashes are used to identify unique data blocks, but the avalanche of unstructured data means that most deduplication solutions are generating more and more hashes, making hash lookups in most Version 1.0 solutions sluggish and the hashes difficult to retrieve.
Sparse Indexing addresses this hash problem (by the way, HP StoreOnce uses the SHA-1 hash) by intelligently sampling a small number of chunks and creating a very fast index lookup mechanism that stays in the system’s memory all the time. As the engineers at HP Labs put it:
Instead of holding every index item in RAM ready for comparison, the HP team keeps just one in every hundred or so items in RAM and puts the rest onto a hard drive. Duplicate data almost always arrives in bursts. In other words, if one chunk of the arriving stream is a duplicate, it is very likely that many following chunks are duplicates. Sparse indexing takes advantage of this phenomenon by storing the sequence of hashes of the stored chunks next to each other on disk. As a result, a ‘hit’ in the sample RAM index can direct the system to an area of the disk where many duplicates are likely to be found.
Sparse Indexing is not unique in the industry, but the engineers at HP Labs have put their thinking hats on and applied it to improve the search and lookup of the hashes in the StoreOnce deduplication technology.
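Here is how I picture the idea in the quote, as a toy sketch only. It is not HP’s implementation. The class name, the one-in-a-hundred sampling rule based on the first bytes of the SHA-1 digest, and the dictionary standing in for the disk are all my assumptions.

```python
import hashlib

SAMPLE_RATE = 100  # keep roughly one hash in a hundred in RAM, per the quote

class SparseIndex:
    def __init__(self):
        self.ram_index = {}   # sampled hash -> segment id (small, stays in RAM)
        self.disk = {}        # segment id -> full hash list (stand-in for disk)

    @staticmethod
    def _is_sample(digest: bytes) -> bool:
        return int.from_bytes(digest[:4], "big") % SAMPLE_RATE == 0

    def ingest(self, chunks) -> int:
        """Store one incoming segment; return how many chunks were duplicates."""
        digests = [hashlib.sha1(c).digest() for c in chunks]
        # A hit on a sampled hash points at a stored segment that is likely to
        # hold many more duplicates, so only those segments are read back.
        candidates = {self.ram_index[d] for d in digests if d in self.ram_index}
        known = set()
        for seg_id in candidates:
            known.update(self.disk[seg_id])
        duplicates = sum(1 for d in digests if d in known)
        # Keep the hashes of this segment next to each other on "disk".
        seg_id = len(self.disk)
        self.disk[seg_id] = digests
        for d in digests:
            if self._is_sample(d):
                self.ram_index[d] = seg_id
        return duplicates

index = SparseIndex()
monday = [f"block-{i}".encode() for i in range(2000)]
tuesday = monday[:1900] + [f"new-{i}".encode() for i in range(100)]
print("Monday duplicates: ", index.ingest(monday))
print("Tuesday duplicates:", index.ingest(tuesday))
total = sum(len(hashes) for hashes in index.disk.values())
print("Hashes held in RAM:", len(index.ram_index), "of", total)
```

The trade-off is that a segment with no sampled hash in common with anything stored will be missed, which is acceptable only because duplicates tend to arrive in bursts, as the HP engineers point out.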
Further savings are also achieved when the deduped data is compressed with the LZ (Lempel-Ziv) compression method before it is stored on disk.
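As a small illustration of that last step, the sketch below compresses a unique chunk before it would be written out. I am using Python’s `zlib` module here, whose DEFLATE format combines LZ77 with Huffman coding. Which LZ variant StoreOnce uses is not something I know, so treat this as a stand-in.

```python
import zlib

# A chunk that survived deduplication (it is unique) but is still compressible.
unique_chunk = b"quarterly report draft " * 200

stored = zlib.compress(unique_chunk)           # what would land on disk
restored = zlib.decompress(stored)             # what a restore would return

print(len(unique_chunk), "bytes in,", len(stored), "bytes stored")
print("Round trip intact:", restored == unique_chunk)
```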
The HP StoreOnce technology was concocted 100% in the renowned HP Labs and, according to sources, this technology will indeed permeate the whole HP StorageWorks line (HP has since renamed it HP Storage). With this strategy, HP hopes to address the “fragmented and complicated” (as quoted by HP) deduplication and data protection strategy across the enterprise. By “fragmented and complicated”, they mean that the deduplicated data constantly has to be rehydrated and deduped again as the data moves across different IT devices and functions.
In a perfect world, HP wants their StoreOnce technology to be like the diagram below.
However, one very interesting fact that I found was HP does not believe that primary storage deduplication is a good idea. They claim that it complicates the whole thing. Whether HP likes it or not, NetApp has been dishing out primary storage deduplication for several years now and you don’t see their customers unhappy with NetApp about this feature.
In one of the HP Business whitepapers I read, one of the takeaways was
I was like, “Whoa! What’s this?”. I felt bemused about what was mentioned in the whitepaper. After all the best claims of the HP StoreOnce technology, I can’t help but think that this could be a banana skin on the pavement for HP.
Unfortunately, I am having a COW about it!
Snapshots are the inherent offspring of the copy-on-write technique used in shadow-paging filesystems. NetApp’s WAFL and Oracle Solaris ZFS are commercial implementations of shadow-paging filesystems and they are typically promoted as Copy-on-Write filesystems.
As we may already know, snapshots are point-in-time copies of the active file system in the storage world. They perform a quick backup of the active file system by making a copy of the block addresses (pointers) of the filesystem and then updating the pointer maps to the inodes in the fsinfo root inode of the WAFL filesystem for new changes after the snapshot has been taken. The equivalent of fsinfo is the uberblock in the ZFS filesystem.
However, contrary to popular belief, the snapshots from WAFL and ZFS are not copy-on-write implementations even though the shadow paging filesystem tree employs the copy-on-write technique.
Consider this for a while when a snapshot is being taken ... Copy ... On ... Write. If the definition is (1) Copy then (2) Write, this means that there are several steps to perform a copy-on-write snapshot. The filesystem has to read the original data block (1 x Read I/O), then write that original data block to a new location (1 x Write I/O) and then write the new data block to the location of the original data block (1 x Write I/O).
This is a 3-step process that can be summarized as:
- Read location of original data block (1 x Read I/O)
- Copy this data block to new unused location (1 x Write I/O)
- Write the new and modified data block to the location of original data block (1 x Write I/O)
This implementation IS THE copy-on-write technique for snapshots, but the NetApp and possibly Oracle guys have been saying for years that their snapshots are based on copy-on-write. This is pretty much a misnomer that needs to be corrected. EMC, in its SnapSure and SnapView implementations, calls this technique Copy-on-First-Write (COFW), probably to avoid the confusion. The data blocks are copied to a savvol, a separate location that stores the snapshot changes and defaults to 10% of the total capacity of their storage solutions.
As you have seen, this method is a 3 x I/O operation and it is an expensive solution. Therefore, when we compare the speed of NetApp/ZFS snapshots to EMC’s snapshots, the EMC COFW snapshot technique will be a tad slower.
However, this method has one superior advantage over the NetApp/ZFS snapshot technique. The data blocks in the active filesystem are almost always laid out in a more contiguous fashion, resulting in a more consistent read performance throughout the life of the active file system.
Below is a diagram of how copy-on-write snapshots are implemented
What is NetApp/ZFS’s snapshot method then?
It is known as Redirect-on-Write. Using the same step ... REDIRECT ... ON ... WRITE. When a data block is about to be modified, the original data block is read (1 x Read I/O) and then the modified data block is written to a new location (1 x Write I/O). The active file system then updates the filesystem tree and its inode address to reflect the location of the new data block. The original data block remains unchanged.
- Read location of original data block (1 x Read I/O)
- Write modified data block to new location (1 x Write I/O)
The Redirect-on-Write method results in one fewer Write I/O, making writes after a snapshot faster. This is the NetApp/ZFS method and it is superior in speed when compared to the Copy-on-Write snapshot technique discussed earlier.
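The two techniques can be put side by side in a toy model that counts I/Os the same way I did above. This is only a sketch. The `Volume` class, the function names and the dictionaries used as pointer maps are made up for illustration and do not reflect how WAFL, ZFS or SnapSure are actually built.

```python
class Volume:
    """Toy block device that counts reads and writes."""
    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))  # physical address -> data
        self.reads = self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.blocks[addr]

    def write(self, addr, data):
        self.writes += 1
        self.blocks[addr] = data

    def allocate(self):
        return max(self.blocks) + 1            # next unused physical address

def cow_update(vol, snapshot_area, addr, new_data):
    """Copy-on-(first)-write: preserve the old block, then overwrite in place."""
    if addr not in snapshot_area:              # only the first write copies
        old = vol.read(addr)                   # 1 x Read I/O
        spare = vol.allocate()
        vol.write(spare, old)                  # 1 x Write I/O (copy old data away)
        snapshot_area[addr] = spare
    vol.write(addr, new_data)                  # 1 x Write I/O (in place)

def row_update(vol, active_map, logical, new_data):
    """Redirect-on-write: leave the old block alone, repoint the live tree."""
    vol.read(active_map[logical])              # 1 x Read I/O, as counted above
    spare = vol.allocate()
    vol.write(spare, new_data)                 # 1 x Write I/O (new location)
    active_map[logical] = spare                # pointer update only

cow = Volume(["A", "B", "C"])
cow_snap = {}                                  # original address -> saved copy
cow_update(cow, cow_snap, 1, "B2")

row = Volume(["A", "B", "C"])
row_active = {0: 0, 1: 1, 2: 2}                # logical block -> physical address
row_snap = dict(row_active)                    # the snapshot is a copy of pointers
row_update(row, row_active, 1, "B2")

print("COW:", cow.reads, "read,", cow.writes, "writes")
print("ROW:", row.reads, "read,", row.writes, "write")
print("ROW live block addresses:", list(row_active.values()))
```

Notice where the live data ends up. In the copy-on-write case the active blocks stay at addresses 0, 1, 2, while in the redirect-on-write case they now sit at 0, 3, 2, which is the seed of the fragmentation trade-off between the two methods.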
However, as the life of the filesystem progresses, fragmentation and holes will cause the performance of the active filesystem to degrade. The reason is that most related data blocks are no longer contiguous and the active file system will be busy seeking the scattered data blocks across the volume. A fragmented filesystem would have to be “cleaned and reorganized” to regain its performance lustre.
Another unwanted problem with the Redirect-on-Write snapshot technique is that the snapshots reside within the same boundary as the active filesystem. Over time, the capacity consumed by the snapshots could overwhelm the active filesystem if their recycle schedule is left unchecked.
I guess this is a case of “SUFFER NOW/ENJOY LATER” or “ENJOY NOW/SUFFER LATER”. We have to make a conscious effort to understand what snapshots are all about.
Backup is a necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you have got to have backup. When it comes to the protection of data and information in a business, backup is the only way.
Backup has also become the bane of IT operations. Every product that is out there in the market is trying to cram as much production data to backup as possible just to fit into the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless:
- You reduce the size of the primary data to be backed up – think compression, deduplication, archiving
- You replicate the primary data to a secondary device and backup the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup
- You speed up the transfer of primary data to the backup device
Either way, IT operations are trying to overcome the challenges of the backup window. And the whole purpose of backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance. You pay the premium so that you are able to use the insurance facility to recover in times of need. We have heard that analogy many times before.
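The backup window problem boils down to simple arithmetic: backup time is the amount of data divided by the effective throughput. The sketch below plays with two of the levers listed above, shrinking the data and speeding up the transfer. All the figures (10TB of data, 100 MB/s, an 8-hour window, a 4:1 reduction ratio) are assumptions for illustration, not measurements.

```python
# Does a backup fit in its window? time = data size / effective throughput.
# All figures below are illustrative assumptions, not measurements.
def backup_hours(data_tb: float, throughput_mb_s: float,
                 reduction_ratio: float = 1.0) -> float:
    """reduction_ratio is the shrink from compression/dedup (4 means 4:1)."""
    data_mb = data_tb * 1_000_000 / reduction_ratio
    return data_mb / throughput_mb_s / 3600

WINDOW_HOURS = 8
scenarios = {
    "as is": backup_hours(10, 100),
    "4:1 data reduction": backup_hours(10, 100, reduction_ratio=4),
    "faster transfer (400 MB/s)": backup_hours(10, 400),
}
for name, hours in scenarios.items():
    verdict = "fits" if hours <= WINDOW_HOURS else "misses"
    print(f"{name}: {hours:.1f} h, {verdict} the {WINDOW_HOURS} h window")
```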
On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data and, many a time, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots suffered from the perception that they are pseudo-backups, until the last couple of years.
Here is some food for thought …
WHAT IF we eliminate backing up data to a secondary device?
WHAT IF the IT operations is ready to embrace snapshots as the true backup?
WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?
First of all, it will solve the perennial issues of backup to a “secondary device”. The operative word here is the “secondary device”, because that secondary device is usually external to the primary storage.
Tape subsystems and tape are constantly being ridiculed as the culprits behind missed backup windows. Duplication after duplication of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed because network backup runs through the LAN. LANless backup will use another conduit, usually Fibre Channel, to transport data to the secondary device.
If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.
Note that what I have suggested is to backup, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compress, dedupe and restore.
Wouldn’t that paint a better way of doing backup?
Snapshots will be the only mechanism to backup. Snapshots are quick, usually in minutes and some in seconds. Most snapshot implementations today are space efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.
For DR, snapshots are shipped to remote storage of equal prowess at the DR site, where the snapshot can be rebuilt and kept ready to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.
And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.
I believe that with the convergence of multi-core processing power, 10GbE networks, SSDs and very large capacity drives, we could be seeing a shift in the backup design model and possibly the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and secondary devices may be a thing of the past.