Category Archives: Virtualization
This is Part 2 of my previous blog about VAAI (vStorage API for Array Integration), with more details about VAAI. VAAI offloads some of the I/O-related functions to the VAAI-enabled storage array, giving the hypervisor more compute and memory resources for its other functions. The storage array, upon receiving a VAAI command, executes whatever is required of it.
Why is VAAI important? What does it do that makes it so useful and important to the hypervisor?
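VAAI is about a set of new SCSI commands, and there are 3 important ones:
- Full Copy (XCOPY): the array copies the blocks itself, for example when cloning or during Storage VMotion, instead of the host reading and rewriting them.
- Block Zeroing (WRITE SAME): the array writes zeroes across a range of blocks when a virtual disk is initialized, so the host does not have to send them.
- Hardware Assisted Locking (ATS, Atomic Test & Set): VMFS metadata is locked at the level of individual on-disk records instead of reserving the entire LUN.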
And if you want to see VAAI Hardware Accelerated Full Copy (aka XCOPY) in action, here's a little video showing how it is done in an EMC environment.
The primary significance and noticeable benefit is definitely performance. The secondary benefit, though not so obvious, is allowing VMware and its hypervisor to scale because it does not get bogged down by some of the I/O functions that it is not meant to do.
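To put a rough number on that performance benefit, here is a toy Python model (my own illustration, not VMware or array code) of the data that crosses the storage fabric when a virtual disk is cloned. It assumes a host-based copy reads every byte into the hypervisor and writes it back out, while with Full Copy the host sends only one small copy command per chunk and the array moves the data internally. The chunk size and the 128-byte command size are hypothetical parameters.

```python
# Toy model: bytes crossing the storage fabric to clone one virtual disk.
# Assumptions: host-based copy = read every byte + write it back;
# offloaded copy = one small command per chunk, data moves inside the array.

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def host_based_clone_bytes(disk_bytes: int) -> int:
    """Bytes moved when the hypervisor copies the data itself."""
    return 2 * disk_bytes  # one read plus one write of every byte


def offloaded_clone_bytes(disk_bytes: int, chunk_bytes: int, cmd_bytes: int) -> int:
    """Bytes moved when the copy is offloaded in chunk_bytes pieces.

    cmd_bytes is a hypothetical size for one copy command descriptor.
    """
    commands = -(-disk_bytes // chunk_bytes)  # ceiling division
    return commands * cmd_bytes


if __name__ == "__main__":
    disk = 40 * GiB
    print("host-based:", host_based_clone_bytes(disk) // GiB, "GiB")
    print("offloaded: ", offloaded_clone_bytes(disk, 4 * MiB, 128) / MiB, "MiB")
```

For a 40 GiB virtual disk copied in 4 MiB chunks, the model gives 80 GiB pushed through the host versus 1.25 MiB of commands, which is the whole point of leaving the copy to the array.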
vSphere 5.0 brought some new additions to VAAI. According to its FAQ, ESXi 5.0 includes support for NAS Hardware Acceleration, with the following primitives:
- Full File Clone – Like the Full Copy VAAI primitive provided for block arrays, this Full File Clone primitive enables virtual disks to be cloned by the NAS device.
- Native Snapshot Support – Allows creation of virtual machine snapshots to be offloaded to the array.
- Extended Statistics – Enables visibility to space usage on NAS datastores and is useful for Thin Provisioning.
- Reserve Space – Enables creation of thick virtual disk files on NAS.
So, there you have it folks. Why VAAI? Here’s why.
First of all, let me apologize. I am guilty of not updating my blogs as regularly as I did in the past. Things got a bit crazy after Christmas and I had to juggle several things that demanded more of my attention, but I am confident things will sort themselves out soon enough.
Today's topic is VMware's VAAI (vSphere vStorage API for Array Integration). This feature was announced more than 3 years ago but was only introduced in vSphere 4.1 in July 2010, with further enhancements in the latest release, vSphere 5.0.
What is this VAAI and what does this mean from a storage perspective?
When VMware came into prominence in the version 3.0/3.5 era, the whole world revolved around the ESX hypervisor. It tried to do everything on its own, in its own proprietary way. Given its nascent existence then, ESX had to do what it had to do and control everything within its hypervisor universe. Yes, it was a good move then, and it did what it was supposed to do. This was back when server virtualization was in its infancy, and resource requirements were less demanding.
Hence, when VMware wanted to initialize VMs, create VMDK files on the datastore, create clones or snapshots, or even execute VMotion and Storage VMotion, it tended to do the work at the hypervisor level. For example, when creating virtual disks with VMFS, most of the disk initialization was done at the VMFS level. Zeroing a virtual disk meant sending zeroing commands, and the zeroes themselves, to the actual physical disks on the shared storage. This went back and forth, taxing the CPU cycles and memory of the hypervisor layer and sending wasteful, unnecessary zeroes over the network to the storage array. It was very inefficient and wasteful, and it degraded performance tremendously, especially at the hypervisor layer (compute and memory).
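Here is a minimal Python sketch of the zeroing problem, assuming a toy in-memory array with 512-byte blocks. The ToyArray class and its write_same method are hypothetical stand-ins for the SCSI WRITE SAME command used by the VAAI Block Zeroing primitive: the host sends one block of zeroes and the array replicates it across the range, instead of the host shipping every zero over the wire.

```python
# Toy in-memory "array" contrasting host-side zeroing with a WRITE SAME-style offload.
# Assumption: 512-byte blocks; only payload bytes sent from host to array are counted.

BLOCK = 512


class ToyArray:
    def __init__(self, nblocks: int):
        self.blocks = [b"\xff" * BLOCK for _ in range(nblocks)]  # pretend stale data
        self.bytes_received = 0  # payload bytes that crossed the "wire"

    def write(self, lba: int, data: bytes) -> None:
        """Ordinary write: the host ships every block's worth of data."""
        self.bytes_received += len(data)
        for i in range(len(data) // BLOCK):
            self.blocks[lba + i] = data[i * BLOCK:(i + 1) * BLOCK]

    def write_same(self, lba: int, count: int, pattern: bytes) -> None:
        """WRITE SAME-style: one block of payload, replicated by the array."""
        if len(pattern) != BLOCK:
            raise ValueError("pattern must be exactly one block")
        self.bytes_received += len(pattern)
        for i in range(count):
            self.blocks[lba + i] = pattern


def zero_on_host(arr: ToyArray, lba: int, count: int) -> None:
    arr.write(lba, bytes(BLOCK * count))  # every zero travels to the array


def zero_offloaded(arr: ToyArray, lba: int, count: int) -> None:
    arr.write_same(lba, count, bytes(BLOCK))  # one block of zeroes travels
```

Zeroing 2048 blocks (1 MiB) the old way sends 1,048,576 bytes of payload; the offloaded version sends 512 bytes, and both leave the array in the same state.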
There are also other operations, such as VMFS metadata locking, which relied on SCSI reservations that lock up the entire LUN and stall every other host sharing that datastore while the lock is held. Again, not good.
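The locking difference can be sketched the same way. In this toy Python model (not VMware's implementation), ReservationLUN stands for a SCSI reservation that hands the whole LUN to one host, while AtsLUN stands for VAAI's Atomic Test & Set, where a host atomically swaps the owner of a single on-disk lock record only if it still holds the expected value. A threading.Lock plays the part of the array's atomicity guarantee, and the host names are made up.

```python
import threading


class ReservationLUN:
    """SCSI reservation style: one host holds the whole LUN at a time."""

    def __init__(self):
        self._owner = None
        self._guard = threading.Lock()

    def reserve(self, host: str) -> bool:
        with self._guard:
            if self._owner is None:
                self._owner = host
                return True
            return self._owner == host  # everyone else waits, whatever file they want

    def release(self, host: str) -> None:
        with self._guard:
            if self._owner == host:
                self._owner = None


class AtsLUN:
    """ATS style: atomic compare-and-write on individual lock records."""

    def __init__(self, nrecords: int):
        self.records = [None] * nrecords  # None means the record is free
        self._guard = threading.Lock()    # stands in for the array's atomicity

    def compare_and_write(self, idx: int, expected, new) -> bool:
        with self._guard:
            if self.records[idx] == expected:
                self.records[idx] = new
                return True
            return False
```

Two hosts updating metadata for different VMs collide under the reservation model but proceed in parallel under ATS; they only contend when they race for the same record.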
But VMware took off like a rocket and quickly established itself as a Tier 1 enterprise server virtualization solution, addressing the highest demands of the enterprise. It is also defining the future of Cloud Computing, placing ever greater demands on the infrastructure as it pushes forward. And VMware began to realize that if the hypervisor is to scale, it needs to leave the I/O operations to the “experts”, the “experts” here being the storage arrays themselves.
So, in version 4.1, VAAI (vStorage API for Array Integration) was introduced as an API suite, following 3 other earlier APIs – vStorage API for Site Recovery Manager (SRM), vStorage API for Data Protection and vStorage API for Multipathing.
In a nutshell, as I have mentioned before, VAAI offloads I/O and storage related operations to the VAAI-capable storage array (leave it to the experts) as shown in the diagram below:
Of course, the storage vendors themselves have to rework their array OS layer to integrate with the VAAI API. You can say that the VAAI primitives are “hooks” that enhance the storage connectivity and communication with vSphere's hypervisor. But then again, if you look at it from the other angle, vSphere needs the storage vendors in order for its universe to scale. Good thing VMware has a big, big market share. Imagine if there were no takers for the VAAI APIs. That would be a strange predicament indeed.
What is the big deal that we get from VAAI? The significant and noticeable benefit is increased performance. By offloading I/O functionality and operations to the storage array itself, the hypervisor's compute and memory resources are not bogged down, resulting in higher performance and better response times for its VMs and other VM operations.
I am going off to another meeting and I shall write about VAAI in more detail later. Until the next entry, adios and have a great year ahead.
I wanted to sign off early tonight but an article in ComputerWorld caught my tired eyes. It was titled “EMC to put hardware into servers, VMs into storage” and after I read it, I couldn't help but juxtapose the article with what I said earlier in my blogs, here and here.
It is very interesting to note that “EMC runs vSphere directly on the storage controllers and then uses vMotion to migrate VMs from application servers onto the storage array, ..” since the storage boxes have enough compute power to run Virtual Machines. Traditionally, and as widely accepted, VMs should run on servers. Contrary to popular belief, EMC has already demonstrated this capability of running VMs on their VNX, Isilon and Symmetrix.
And soon, with EMC's Project Lightning (announced at EMC World in May 2011), they will be introducing server-side PCIe-based SSDs, à la Fusion-IO. This is different from the NetApp PAM/FlashCache PCIe-based card, which sits in their arrays, not in hosts or servers. It is also very interesting to note that this EMC server-side PCIe Flash SSD card will become a bridge to EMC's FAST (Fully Automated Storage Tiering) architecture, enabling it to place the hot, warm and cold data of applications running in VMware VMs (now on either the server or the storage) strategically on different storage tiers, perhaps using vMotion as a data mover on top of the “specialized” link created by the server-side EMC PCIe card.
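As a rough illustration of hot/warm/cold placement, here is a Python sketch that ranks extents by access count and fills the tiers from fastest to slowest. This is a simple frequency-ranking heuristic of my own, not EMC FAST's actual policy; place_by_heat, the extent names and the slot counts are hypothetical.

```python
from collections import Counter


def place_by_heat(access_log, ssd_slots, fc_slots):
    """Rank extents by access count: hottest to SSD, warm to FC, cold to SATA.

    Hypothetical helper illustrating tiering by frequency, not EMC FAST itself.
    """
    heat = Counter(access_log)
    ranked = [extent for extent, _ in heat.most_common()]  # hottest first
    placement = {}
    for i, extent in enumerate(ranked):
        if i < ssd_slots:
            placement[extent] = "SSD"
        elif i < ssd_slots + fc_slots:
            placement[extent] = "FC"
        else:
            placement[extent] = "SATA"
    return placement
```

A real tiering engine also weighs recency, moves data in the background and respects tier capacity in bytes rather than slots; the sketch only captures the ranking idea.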
This also blurs the line between servers and storage and creates a virtual architecture spanning both, because what used to be the distinct data boundary of the servers is now being melded, virtually, into the EMC storage array.
2 red alerts are flagging in my brain right now.
- The “bridge” has just linked the server back to the storage, after years of talking about networked storage. The server is ONE again with the storage. Doesn't that look to you like a server with plenty of storage? It has come full circle. But what is more interesting, and what I am eager to see, is what more this “bridge” is capable of when it comes to data management. vMotion might be the first of many new “protocol” breeds to enhance data management and mobility over this “bridge”. I am salivating right now at this massive potential.
- What else can EMC do with the VMware API? The capability I am writing about is made possible by EMC tweaking VMware's API to extract much, much more. As the VMware vStorage API is continually being enhanced, the potential is, again, massive and could change the entire landscape of cloud computing and, subsequently, the entire IT landscape. This is another Pavlov's dog moment (see figures below as part of my satirical joke on myself).
Sorry, the diagram below is not related to what my blog entry is. Just my way of describing myself right now. ;-)
I am extremely impressed with what EMC is doing. A lot of smarts and thinking go into this, and these are definitely signs of things to come. The server and the storage are “merging again”. Think of it as Borg assimilation in Star Trek.
Resistance is futile!
This week I went off the beaten track to get back to my first love – Solaris. Now that Oracle owns it, it shall be known as Oracle Solaris. I am working on a small project based on (Oracle) Solaris Containers and I must say, I am intrigued by it. And it felt good punching in the good ol' command lines in Solaris again.
Oracle actually offers a lot of virtualization technologies – Oracle VM, Oracle VM Dynamic Domains, Oracle Solaris Logical Domains (LDOMs), Oracle Solaris Containers (aka Zones) and Oracle VirtualBox. Other than VirtualBox, these VE (Virtualized Environment) solutions are enterprise solutions, but unfortunately they lack the pizazz of VMware at this point in time. From my perspective, they are also very Oracle/Solaris-centric, making them less appealing to the industry at this moment.
Here’s an old Sun diagram of what Sun virtualization solutions are:
What I am working on this week is Solaris Containers, or Zones. The Containers solution is rather similar to VMware's gamut of host-based (Type-2) virtualization solutions, such as VMware Server, VMware Workstation, VMware Player, VMware ACE and VMware Fusion for MacOS. Like them, Solaris Containers require a host OS to run on.
I did not have the Solaris Resource Manager software to run the GUI stuff, so I had to get back to basics with the CLI, which is good for me. In fact, I liked it even more, and with the CLI I could create zones with ease. And given that the host OS is Solaris 10, I could instantly feel the robustness, performance, stability and power of Solaris 10, unlike the flaky Windows hosts of VMware's host-based virtualization solutions or the iffiness of Linux.
A more in-depth look at Solaris Containers/Zones is shown below.
At first touch, 2 things impressed me:
- The isolation between each Container and the global zone is very well defined. What can and cannot be done, and what can and cannot be configured, is very clear, and each configurable parameter is quickly acknowledged and controlled by the Solaris kernel. From what I read, Solaris Containers achieve the highest level of security with the Trusted Extensions component, a re-implementation of Trusted Solaris. Solaris 10 has received the highest commercial level of Common Criteria certification, known as EAL4+, which has been accepted by the U.S. DoD (Department of Defense).
- Its simplicity in administering compute and memory resources for the Containers. I will share that in CLI with you later.
To start, note that there is always a global zone, created when Solaris 10 was first installed.
Creating and configuring a zone with the CLI is pretty straightforward. Here's a glimpse of what I did yesterday.
# zonecfg -z perf-rac1
Use 'create' to begin configuring a new zone.
zonecfg:perf-rac1> create
zonecfg:perf-rac1> set zonepath=/rpool/perfzones/perf-rac1
zonecfg:perf-rac1> set autoboot=true
zonecfg:perf-rac1> remove inherit-pkg-dir dir=/lib
zonecfg:perf-rac1> remove inherit-pkg-dir dir=/platform
zonecfg:perf-rac1> remove inherit-pkg-dir dir=/sbin
zonecfg:perf-rac1> remove inherit-pkg-dir dir=/usr
zonecfg:perf-rac1> add net
zonecfg:perf-rac1:net> set address=<IP address for the zone>
zonecfg:perf-rac1:net> set physical=<bge0, or the correct Ethernet interface>
zonecfg:perf-rac1:net> end
zonecfg:perf-rac1> add dedicated-cpu
zonecfg:perf-rac1:dedicated-cpu> set ncpus=2-4
zonecfg:perf-rac1:dedicated-cpu> end
zonecfg:perf-rac1> add capped-memory
zonecfg:perf-rac1:capped-memory> set physical=4g
zonecfg:perf-rac1:capped-memory> set swap=1g
zonecfg:perf-rac1:capped-memory> set locked=1g
zonecfg:perf-rac1:capped-memory> end
zonecfg:perf-rac1> verify
zonecfg:perf-rac1> commit
zonecfg:perf-rac1> exit
(Adjust ncpus to the number of CPUs actually available on your Sun box.)
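The same session can also be scripted. zonecfg accepts a command file through zonecfg -z <zonename> -f <file>, so here is a small Python sketch that prints the subcommands from the session as the contents of such a file. The function name, the 192.168.1.50/24 address and the bge0 interface are placeholders, and the default list of removed directories assumes the Solaris 10 sparse-root defaults (/lib, /platform, /sbin, /usr).

```python
def zonecfg_commands(zonepath, address, nic, ncpus="2-4",
                     physical="4g", swap="1g", locked="1g",
                     remove_inherited=("/lib", "/platform", "/sbin", "/usr")):
    """Return zonecfg subcommands mirroring the interactive session (hypothetical helper)."""
    cmds = ["create", f"set zonepath={zonepath}", "set autoboot=true"]
    # Removing the inherited package directories gives the zone its own copies.
    cmds += [f"remove inherit-pkg-dir dir={d}" for d in remove_inherited]
    cmds += ["add net", f"set address={address}", f"set physical={nic}", "end"]
    cmds += ["add dedicated-cpu", f"set ncpus={ncpus}", "end"]
    cmds += ["add capped-memory", f"set physical={physical}",
             f"set swap={swap}", f"set locked={locked}", "end"]
    cmds += ["verify", "commit"]
    return "\n".join(cmds) + "\n"


if __name__ == "__main__":
    # Save this output as perf-rac1.cfg, then on the Solaris host run:
    #   zonecfg -z perf-rac1 -f perf-rac1.cfg
    print(zonecfg_commands("/rpool/perfzones/perf-rac1", "192.168.1.50/24", "bge0"), end="")
```

Keeping the configuration in a file makes it easy to stamp out several identical zones by changing only the zonepath and address.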
The command zonecfg -z <zonename> brings up a configuration prompt where I run create to create the zone. I set zonepath to specify where the zone's files will reside, and set autoboot=true so that the zone starts automatically when the system reboots.
Solaris Containers are pretty cool in that a zone can either inherit (share, read-only) common directories such as /usr, /lib, /sbin and others from the global zone, or have its own set of directories separate from the global root directory tree. Here I chose to remove the inheritance and give the Solaris in the Container its own independent directories.
The add net command sends me into a sub-scope where I can configure the network interface as well as the network address. Nothing spectacular there. I end that scope and do a couple of cool things related to resource management.
I have added dedicated-cpu with ncpus=2-4, and capped-memory with physical=4g, swap=1g and locked=1g. What I have done is dedicate a minimum of 2 CPUs and a maximum of 4 CPUs (if resources permit) to the zone called perf-rac1. Additionally, I have capped its physical memory at 4GB of RAM, capped the amount of memory it can lock in RAM at 1GB, and capped its swap at 1GB.
This resource management allows me to build a high-performance Solaris Container for Oracle 11g RAC. Of course, you are free to create as many containers as the system resources allow. Note that I did not include the shared memory and semaphore parameters required for Oracle 11g RAC, but go ahead and consult your favourite Oracle DBA (have fun doing so!)
After the perf-rac1 zone/container has been created (and configured), I just need to run the following:
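Before piling on more zones, it helps to check that the dedicated-cpu minimums actually fit the box, since memory caps are limits rather than reservations, while dedicated CPUs are taken out of the shared pool. This Python sketch sums the minimum of each ncpus range. The rule that at least one CPU stays behind for the global zone's default pool is my assumption, and the helper names are hypothetical.

```python
def parse_ncpus(spec):
    """'2-4' -> (2, 4); '3' -> (3, 3)."""
    lo, _, hi = spec.partition("-")
    return int(lo), int(hi or lo)


def dedicated_cpu_fits(zone_specs, host_cpus, keep_for_global=1):
    """True if every zone's minimum can be met while leaving CPUs for the global zone.

    keep_for_global=1 is an assumption: the default pool keeps at least one CPU.
    """
    minimum = sum(parse_ncpus(spec)[0] for spec in zone_specs)
    return minimum <= host_cpus - keep_for_global
```

With that rule, two perf-rac1-style zones (ncpus=2-4) fit on an 8-CPU box, but four of them do not, because their minimums alone need all 8 CPUs.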
# zoneadm -z perf-rac1 install
# zoneadm -z perf-rac1 boot
The first command installs the zone, copying all the packages from the global zone and running the installation as per normal; the second boots it. On the first boot, the usual Solaris configuration questions, such as timezone, IP address, root login/password and so on, are presented on the zone's console. The whole process takes about 20-40 minutes, depending on the amount of things to be installed and, of course, the power of the Sun system. I am running an old Sun V210 with 512MB of RAM, so it took a while.
When it's done, we can just log in to the zone's console with the command
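For orientation, the zone lifecycle behind those zoneadm commands can be sketched as a small state machine. This Python version is simplified: it leaves out transient states such as incomplete and shutting_down, and the apply helper is hypothetical, not part of any Solaris API.

```python
# Simplified Solaris zone lifecycle driven by zoneadm subcommands.
TRANSITIONS = {
    ("configured", "install"): "installed",
    ("installed", "ready"): "ready",
    ("installed", "boot"): "running",
    ("ready", "boot"): "running",
    ("running", "halt"): "installed",
    ("ready", "halt"): "installed",
    ("installed", "uninstall"): "configured",
}


def apply(state, *subcommands):
    """Return the zone state after running the given zoneadm subcommands in order."""
    for sub in subcommands:
        try:
            state = TRANSITIONS[(state, sub)]
        except KeyError:
            raise ValueError(f"zoneadm {sub} is not valid from state {state!r}") from None
    return state
```

A freshly committed zone is in the configured state, so apply("configured", "install", "boot") ends in running, while trying to boot before installing raises an error.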
# zlogin -C perf-rac1
and I get into another Solaris OS in the Solaris Container.
What I liked was the fact that Solaris Containers are rather simple to understand, yet the flexibility to configure computing resources for them is pretty impressive. It's fun working on this stuff again after years away from Solaris. (After I took my RedHat RHCE certification, I pretty much left Sun Solaris for quite a while.)
More testing to be done, but overall I am quite happy to be back as a Solaris virgin again.
I picked up a new article this afternoon from SearchStorage titled “Enterprise storage trends: SSDs, capacity optimization, auto tiering”. I could not help but notice that it echoes some of the things I have been writing about: VMware being the storage killer, and the rise of Cloud Computing, which could take away our jobs.
I did receive some feedback about what I wrote in the past, and after reading the SearchStorage article, I can't help but feel vindicated. In the sidebar, it wrote:
“The rise of virtual machine-specific and cloud storage suggest that other changes are imminent. In both cases …. and would no longer require storage architects and managers.”
Things are changing at an extremely fast pace and for those of us still languishing in the realms of NAS and SAN, our expertise could be rendered obsolete pretty quickly.
But all is not lost, because it would be easier for a storage engineer, who already has the foundation, to move into the virtualization space than for a server virtualization engineer to come down and learn the storage fundamentals. We can choose to be either dinosaurs or the species of the next generation.
This is breaking news. RedHat is to acquire Gluster!
What is Gluster? Gluster is a clustering Linux distribution started by Z Research under the direction of Anand Babu (who is currently Gluster's CEO), aiming to commoditize supercomputing and supercomputing clustered storage. Gluster is open source, but there is a commercial version as well. It runs on commodity 64-bit x86 hardware. The Gluster File System (GlusterFS) aggregates disk and memory resources into a pool of storage under a single global namespace, accessed through multiple file-level protocols. In its scale-out architecture, storage resources are added as storage nodes in a building-block fashion to meet performance and capacity demands, rather like what the HP P4000 does for block-level SAN environments.
Gluster can be integrated with most 64-bit Linux distros. It runs in Linux user space, but it can also be packaged together with the Linux kernel as a software appliance, easily installed on off-the-shelf 64-bit x86-64 platforms. This means that you can build a scale-out NAS pretty easily using your own hardware.
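GlusterFS can do without a central metadata server because each file's location is computed from a hash of its name, with each brick (a storage node's export) owning a range of the hash space. The Python sketch below shows that idea under simplifying assumptions: equal ranges, zlib.crc32 as a stand-in for GlusterFS's own hash function, and made-up brick paths.

```python
import zlib


def layout(bricks):
    """Split the 32-bit hash space into equal contiguous ranges, one per brick."""
    span = 2 ** 32 // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * span
        end = 2 ** 32 - 1 if i == len(bricks) - 1 else start + span - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename, ranges):
    """Pick the brick whose range contains the file name's hash."""
    h = zlib.crc32(filename.encode())  # stand-in for GlusterFS's own hash
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise AssertionError("hash ranges must cover the whole space")
```

Any client that knows the layout computes the same answer, which is what lets storage nodes be added without a metadata bottleneck; adding a brick changes the ranges, so some files have to be rebalanced.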
From an architecture standpoint, GlusterFS and its integration into a storage appliance look like this:
Because it works in a modular, add-on fashion, this architecture is distributed and extended by replicating the same stack across additional x86-64 platforms (each one a storage node), as shown below.
It's really easy to install Gluster and build the Scale Out NAS. I have been saving a couple of videos about how Gluster is installed, and I must say that it's pretty easy. In less than 30 minutes, you can install your first Gluster storage node and then add additional nodes on the fly.
Enjoy the videos.
Video #1 (Gluster Installation)
(I have difficulty uploading the videos because WordPress requires me to purchase one of their solutions)
Video #2 (Creating and adding Storage Node in Gluster)
Note: If you are interested in seeing the videos, please email me at firstname.lastname@example.org.
This news gets me very excited because this is the perfect endorsement of what I have been saying all along. Storage networking and data management are the foundations of CLOUD and VIRTUALIZATION. Without data being stored and managed well, everything falls apart. And as I have mentioned many times before, this is a fantastic time to become an extraordinary storage engineer/consultant/architect/sales (maybe not!)
After being in the storage networking industry for so long, I have seen most of the new storage solutions out there. Most of them don't really differ much from what is already out there, and it gets a little boring. But once in a while, a little gem is unearthed and my excitement bubbles up again.
Today, I was at the HP P4000 G2 SAN workshop and the LeftHand Networks SAN/iQ storage solution which HP acquired in 2008 left me with 3 words – Interesting, Innovative and Impressive – from a technology standpoint.
I must admit that this is a little gem that got past my radar and now it’s HP’s gain. I have heard about LeftHand Networks in the past, and at the same time, I was also looking at another storage solution called Intransa. Unfortunately, Intransa went on to differentiate themselves and today, they are focused more as a storage solution for videos and CCTVs, seldom surfacing with innovative technology. LeftHand Networks was and is different and I can understand why HP bought them, because the technology that they bring with them to HP is really cool!
Now rebranded and renamed as the HP P4000 G2 SAN, the storage solution no longer sits on proprietary hardware. As part of HP's Converged Infrastructure strategy, SAN/iQ has been fully integrated into the HP Proliant x86 platform (I heard there's a blade version as well), making it simple to procure and probably helping to simplify operational resource planning and logistics too. At the same time, there is also a P4000 VSA (Virtual Storage Appliance), which the HP guys have been using for demos for several years now. There is a 60-day trial available at the HP P4000 VSA Download site, so organizations can try before they buy, and if they do buy, they can turn some of their old x86 platforms into a storage appliance just by adding more hard disk drives. That saves money too!
So, what’s cool, you say?
2 key technologies stand out:
- Storage Clustering
- Network RAID
As I was well informed at the workshop today, the Storage Clustering technology is not exclusive to the P4000. In fact, Dell EqualLogic employs something similar as well. But it impressed me, and it is different from the traditional storage SANs that we usually see.
You see, in the traditional SAN setup, the LUNs or volumes are either loosely or tightly linked to 2 active/active storage processors/controllers. The way most storage vendors do it, when a customer runs out of capacity or performance or both, they have to do a forklift upgrade of the controllers. This is disruptive, and it also means the existing controllers cannot get CPU, memory or I/O channel upgrades. Today, most storage vendors do not allow you to break open the storage processor chassis and change the CPU, add more RAM or add more I/O paths to support more disk drives or increase throughput. Mind you, this is something that I have been questioning for a long time, but as the storage networking industry has it, you have to upgrade the entire storage processor or controller in order to get more power and capacity.
The P4000 (as well as the Dell EqualLogic) approaches this from another angle: instead of doing a forklift upgrade of the storage processor/controller, you just add another node with the same CPU and RAM profile, and the P4000 SAN/iQ software groups the new node together with the existing node(s) to form a storage cluster group. As a best practice, a Storage Cluster should have 16 nodes or fewer, but in one of the war stories shared, a customer in the US actually had 32 nodes in a Storage Cluster group, for storage capacity reasons.
As more nodes are added to the Storage Cluster group, the LUNs/volumes can be extended or spanned across the other nodes as long as they are physically connected over a Gigabit network, and the entire LUN or volume is seen as ONE, regardless of which physical nodes it sits on. Typically you see this sort of single “Global Namespace” concept at the file system level, but this is the first time I have seen it implemented at the SAN level. (OK, I have to admit that I am a little behind the times with this technology.)
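The “one volume across many nodes” idea can be sketched as a mapping from logical block address to node. This Python sketch uses simple round-robin striping with a hypothetical 256-block stripe and made-up node names; it illustrates the concept and is not SAN/iQ's actual data layout.

```python
from collections import Counter

STRIPE_BLOCKS = 256  # hypothetical stripe size, in blocks


def node_for_block(lba, nodes, stripe_blocks=STRIPE_BLOCKS):
    """Return the node that holds logical block `lba` of a striped volume."""
    stripe = lba // stripe_blocks
    return nodes[stripe % len(nodes)]


def blocks_per_node(volume_blocks, nodes, stripe_blocks=STRIPE_BLOCKS):
    """Count how many of the volume's blocks land on each node."""
    return Counter(node_for_block(lba, nodes, stripe_blocks)
                   for lba in range(volume_blocks))
```

The host still sees one LUN, but every node contributes capacity and I/O. Adding a third node changes where some stripes live, which is why data has to be rebalanced when a node joins in a scheme like this.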
Here's a little diagram I dug up from LeftHand, from before it was acquired by HP, which I hope will enlighten readers about this Storage Cluster feature.
But the best was yet to come, as the HP Solution Architect (Timothy Chua) mentioned that the Network RAID feature was uniquely LeftHand's and way cooler. And I couldn't agree more, because this lit me up like a spark plug!
Since Storage Clustering can span LUNs/volumes across nodes, it was only natural that the RAID capability be extended across nodes as well. RAID-10, RAID-5 and RAID-6 can all be spanned across all nodes, spreading the data blocks and their mirror/parity blocks across the nodes in the network. And the nodes do not have to be at a single site. With Gigabit networks, the nodes can be separated across multiple sites as well, giving the entire solution quite comprehensive campus-wide storage high availability. And since this is Network RAID, it gives an entirely new meaning to the word Disaster Recovery, because it eliminates the need for separate data replication. Primary data in a Network RAID-10 on Node 1/Site 1 could be mirrored to Node 2/Site 2, which can be further mirrored to Node 3/Site 3 and Node 4/Site 4 for a 4-way mirror. This is the P4000 Multi-site SAN solution.
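To see why Network RAID-10 survives the loss of a node, here is a Python sketch in which each stripe lives on a primary node plus the next node(s) in the cluster. The placement rule, node names and stripe count are assumptions for illustration; copies=4 corresponds to the 4-way, 4-site mirror described above.

```python
def raid10_nodes(stripe, nodes, copies=2):
    """Nodes holding a stripe: the primary plus (copies - 1) following nodes."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    first = stripe % len(nodes)
    return [nodes[(first + k) % len(nodes)] for k in range(copies)]


def survives(failed, nodes, stripes, copies=2):
    """True if every stripe still has a copy on some node that has not failed."""
    return all(any(n not in failed for n in raid10_nodes(s, nodes, copies))
               for s in range(stripes))
```

With 2 copies on 4 nodes, any single node (or site) can fail without data loss, but losing two adjacent nodes takes out the stripes they share; raising copies to 4 keeps the data alive as long as one site remains.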
The diagram below shows how Network RAID is implemented with VMware ESX.
And since replication is no longer a requirement, VMware's SRM (Site Recovery Manager) is not required either.
It is no surprise, then, that synchronous replication in the P4000 solution is equivalent to Network RAID. Though the concept of separating storage controllers/nodes across multiple sites for true long-distance mirroring exists elsewhere, it is usually not found at this level. NetApp has their Fabric and Stretch MetroCluster and EMC has their VPlex, but these are usually proposed at the higher end of the spectrum. Looks to me like the HP P4000 is the only one that has this concept at the entry-level iSCSI SAN level. Kudos!
They have asynchronous replication as well, for longer-distance networks.
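The practical difference between the synchronous mode (which behaves like Network RAID) and asynchronous replication is when a write is acknowledged. This toy Python model acknowledges synchronous writes only after both sites hold the data, while asynchronous writes are acknowledged at once and queued; exposure() counts what would be lost if the primary site failed at that instant. The class and method names are hypothetical.

```python
from collections import deque


class Site:
    def __init__(self):
        self.blocks = {}


class Replicator:
    def __init__(self, synchronous: bool):
        self.local, self.remote = Site(), Site()
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped to the remote site

    def write(self, lba, data):
        self.local.blocks[lba] = data
        if self.synchronous:
            self.remote.blocks[lba] = data    # ack only after both copies exist
        else:
            self.pending.append((lba, data))  # ack now, ship later
        return "ack"

    def drain(self, max_writes=None):
        """Ship queued writes to the remote site (async mode)."""
        n = len(self.pending) if max_writes is None else min(max_writes, len(self.pending))
        for _ in range(n):
            lba, data = self.pending.popleft()
            self.remote.blocks[lba] = data

    def exposure(self):
        """Writes that would be lost if the primary site failed right now."""
        return len(self.pending)
```

Synchronous mode pays for zero exposure with every write waiting on the inter-site link, which is why the asynchronous mode is the one offered for longer distances.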
I did not stay for the demo today but I am already tickled pink about the HP P4000 technology. It had a good impression on me and I can’t wait to know more of how it works internally. Looking forward to a deeper dive of the P4000 and hope to stay for the demo next time.
I was reading the news from Oracle OpenWorld, and a slew of specialized appliances is on the menu.
Oracle added the Big Data Appliance and the Oracle Exalytics Business Intelligence Machine to its previous numero uno, the Exadata Database Machine. EMC also announced its Greenplum Data Computing Appliance and its VNX Unified Storage for Oracle.
The EMC VNX Unified Storage for Oracle is a VNX system that has Oracle installed in a VMware vSphere virtual machine environment. The system is meant to unify all Oracle environments (database over Oracle Direct NFS, application servers over NFS, and testing and development over NFS), resulting in less disk space used and faster testing. EMC says this configuration was made because 50% of Oracle customers are virtualizing their systems today. The VNX Unified Storage for Oracle includes EMC's Fully Automated Storage Tiering (FAST) technology, which migrates most frequently used data between a primary Fibre Channel drive and solid state drives and migrates less frequently used data to Serial ATA (SATA) drives and its FAST Cache. In an Oracle environment, FAST is well-suited to database applications that generate a large number of random inputs-outputs, that experience sudden bursts in user query activity, or a high number of user loads and where the entire working set can be contained in the solid state drive cache. Based on testing carried out on an Oracle Real Application Clusters (RAC) 11g database that was configured to access the VNX7500 file storage over the Network File System (NFS), using the Oracle Direct NFS (dNFS) client, results showed a 100% improvement in transactions per minute (TPM), a 170% improvement in IOPS, and a 79% decrease in response time, the company said.
As for GreenPlum, EMC quoted:
The company also is showing off the EMC Greenplum Data Computing Appliance(DCA) for Big Data Analytics configuration, which provides a new migration path to Greenplum for Oracle Data Warehouse. This system includes the Greenplum Data Computing Appliance, EMC's Global Data Warehouse, and EMC's IT Business Intelligence Grid infrastructure. The EMC Greenplum DCA consists of 8 to 16 segment servers running Red Hat Enterprise Linux. Each segment server contains 96 to 192 processor cores, with 384 GB to 768 GB of memory per segment server. The DCA includes 12 600-GB Serial Attached SCSI (SAS) 15K RPM drives for a total useable and compressed capacity of 73 TB to 144 TB. The DCA competes with Oracle's Exadata Database Machine. In tests performed with this server/storage configuration and a 15-TB Oracle Data Warehouse, the DCA processed a 99 million rows query in less than 28 seconds vs. seven minutes in a traditional Oracle environment and data loads decreased from six days to 29 minutes
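Percentages like these are easier to compare as multipliers. The Python sketch below reads “X% improvement” as new = old × (1 + X/100) and “X% decrease” as new = old × (1 - X/100); that reading is my assumption, since the vendor figures do not spell it out. It does no more than arithmetic on the quoted numbers.

```python
def multiplier_from_improvement(pct):
    """'X% improvement' read as new = old * (1 + X/100)."""
    return 1 + pct / 100


def multiplier_from_decrease(pct):
    """'X% decrease' read as new = old * (1 - X/100)."""
    return 1 - pct / 100


def speedup(before, after):
    """How many times faster, given durations in the same unit."""
    return before / after


if __name__ == "__main__":
    print("TPM x", multiplier_from_improvement(100))
    print("IOPS x", multiplier_from_improvement(170))
    print("response time x", round(multiplier_from_decrease(79), 2))
    print("query speedup", speedup(7 * 60, 28))                   # seconds
    print("load speedup", round(speedup(6 * 24 * 60, 29), 1))     # minutes
```

On that reading, the VNX claims work out to 2x TPM, 2.7x IOPS and response times at 0.21x the original, while the Greenplum query and data-load claims amount to at least 15x (the query took under 28 seconds) and about 298x faster.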
It is getting pretty obvious that specialized appliances are making waves at Oracle OpenWorld, but what's more interesting is the return of a combined and integrated environment of compute and storage, as I have mentioned in my previous blog. And I foresee that these specialized appliances will be one of the building blocks of cloud computing, together with general-purpose platforms such as x86, JBODs and the glue to all of these, virtualization, notably VMware.
Compute and storage are 2 components within the IT infrastructure which are surely converging. SAN and NAS are facing their greatest adversary yet, and could be made insignificant if the cloud and virtualization game has its way. This is giving rise to a new breed of solution, a specialized appliance where both compute and storage are ONE. Rising from the ashes of shared storage (SAN and NAS, take note), we are beginning to see things going back to direct, internal storage.
There were some scuffles in the bushes about 5 years ago, when Sun (now Oracle) was ahead of its game. The Sun Fire X4500 (aka Thumper) was one of the strong candidates to challenge the SAN/NAS duopoly in this networked storage period. The X4500 integrated the server and storage components together, using ZFS as the file system and volume manager to deliver very high throughput across all the JBOD disks very efficiently. ZFS acted as the RAID, so there was no need for specialized RAID hardware. This proved that a very high performance storage solution could be easily integrated using standard off-the-shelf infrastructure components and the x86 architecture. By combining compute and storage, there were hints that the industry was about to return to Direct-Attached Storage (DAS), despite its perceived weaknesses against SAN and NAS.
Unfortunately, the applications were not ready for DAS then. Besides ZFS, applications such as databases, email and file servers were not ready to jump on the DAS bandwagon and ride into the sunset. But the fairy tale seems to be retold again, and this time, the evidence that DAS could rise again is much stronger.
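ZFS “acting as the RAID” rests on the same parity idea as hardware RAID-5: XOR the data blocks to get a parity block, and XOR the survivors with the parity to rebuild a lost block. Here is a minimal Python sketch of single-parity reconstruction; RAID-Z1's variable-width stripes and end-to-end checksums are not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

For example, with data blocks b"ABCD", b"EFGH" and b"IJKL", computing parity = xor_blocks(...) and then losing b"EFGH", reconstruct([b"ABCD", b"IJKL"], parity) gives b"EFGH" back, all in software on the server's own CPUs.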
The catalyst to this disruptive force? Virtualization!
I mentioned that VMware is the silent storage killer a few blogs ago. Needless to say, that ruffled a few feathers among the readers. I have no doubt that virtualization is changing how we storage guys look at SAN and NAS. In a traditional setup, the SAN or NAS is set up to provision LUNs or mount points as datastores in the VMware environment. The storage array then provides snapshots, replication, thin provisioning and so on.
Perhaps VMware is nitpicking when it says that managing storage arrays for VMFS volumes is difficult. From the VMware administrators' view, they are right. They don't want to know what's going on below the VM level. All they want is storage, any kind of storage, and VMware will manage the volumes, snapshots, replication and thin provisioning. Indeed, they have been doing that since the vStorage API was introduced. In the new release, vSphere 5.0, the ante has been upped even higher, making networked storage less and less significant.
If you want to know about vStorage API and stuff, below is a diagram of the integration of the various components at the VMware API level.
VMware can now make direct, internal storage look like shared storage. The Virtual Storage Appliance (VSA) does just that. VMware already has a thriving market for VMware Appliances from the community and hobbyists.
The appliance market has now evolved into new infrastructure too. Using the x86 architecture and off-the-shelf infrastructure components (sounds familiar?), companies such as Nutanix and Tintri are taking advantage of this booming trend to introduce specialized VMware appliances, as shown in the advertisements on their respective web sites.
Here’s the Nutanix Ad:
Here’s the Tintri Ad:
Both Tintri and Nutanix are a new breed of appliances – specialized appliances for VMware.
At the same time, vendors of other applications are building specialized appliances as well. I have mentioned Oracle Exadata many times in the past, and Exadata is the perfect example of a fine-tuned, hardcore database engine built to make Oracle run at the best possible performance.
Likewise, HP has announced its E5000 Messaging System for Microsoft Exchange. The E5000 is a specialized appliance optimized and well tuned for Microsoft Exchange Server 2010. In HP's words:
“HP E5000 Messaging System is the industry’s first fully self-contained platform built for the next-generation of Microsoft Exchange to deliver enterprise-class messaging to businesses of all sizes. Built as a turnkey solution that can be up and running in a few hours vs. days, the HP E5000 Messaging System gives business users the experience they want most: large mailboxes, centralized archiving of mailboxes files and 24×7 access from any device. IT staffs benefit the solutions simplicity to setup, scale and manage and to meet new demands affordably. Ideal for multi-site enterprises as well as branch office and remote office environments, each HP Messaging System delivers greater simplicity and accelerates deployment with preconfigured solutions starting at 500 mailboxes up to 3000 mailboxes, while delivering large, 1 to 2.5GB mailbox sizes. Clients can grow by adding storage capacity or more appliances within the environment up from hundreds to thousands of mailboxes.”
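As a rough back-of-the-envelope check on HP's figures, the raw mailbox capacity a configuration has to hold is simply the number of mailboxes times the per-mailbox quota. The sketch below takes the smallest and largest ends of HP's quoted ranges, assumes decimal gigabytes, and ignores Exchange database overhead, transaction logs and any replica copies, so treat it as a lower bound rather than a sizing tool.

```python
def raw_mailbox_capacity_gb(mailboxes: int, quota_gb: float) -> float:
    """Raw capacity (decimal GB) if every mailbox fills its quota.

    Assumption: ignores Exchange database overhead, logs and replica
    copies, so a real deployment needs more than this.
    """
    return mailboxes * quota_gb


# Ends of the ranges quoted by HP: 500 mailboxes at 1GB, 3000 at 2.5GB.
smallest = raw_mailbox_capacity_gb(500, 1.0)
largest = raw_mailbox_capacity_gb(3000, 2.5)
print(f"{smallest:.0f} GB to {largest:.0f} GB")  # 500 GB to 7500 GB
```

So even before overhead, the quoted range spans roughly half a terabyte up to about 7.5TB of mailbox data, which is the job the appliance's internal disks have to cover.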
What are the specs of this E5000 box, you say? Here you go:
And look at Row #2 in the table above: direct, internal disks! Look at Row #4: Xeon CPUs! Both compute and storage in the same appliance!
While HP announced the E5000 only recently, Hitachi Data Systems got into the game early with its Unified Compute Platform and its Converged Platform for Microsoft Exchange, built on much the same idea: specialized appliances.
Perhaps the HDS solutions aren't strictly direct, internal storage, but the concept is still the same: a specialized appliance. The HDS Unified Compute Platform (UCP) has these components:
The HDS Converged Platform for MS Exchange provides its specialized “appliance” with Reference Architectures that can support up to 68,000 Microsoft Exchange mailboxes. Here’s an architecture diagram of their “appliance”:
There’s no denying that the networked storage landscape is changing. So are the computing platforms. We are already seeing the compute and storage components being integrated together, tighter than ever. The wave is rising for specialized appliances and it can only get more intense from now on.
No wonder HP’s Converged Infrastructure vision is betting on x86 architecture, simple storage platforms with SAS/SATA disks, and virtualization. Other vendors are doing the same: Cisco, NetApp and VMware with their FlexPod solution, and EMC with its Vblocks of VMware, Cisco and EMC storage.
Hail to the Rise of the Specialized Appliance!
I was at the Red Hat Forum last week when I chanced upon a conversation between an attendee and one of the ECS engineers. The conversation went like this:
Attendee: Is the RHEV running on SAN or NAS?
ECS Engineer: Oh, for this demo, it is running NFS, but in production you should run iSCSI or Fibre Channel. NFS is for labs only, not good for production.
Attendee: I see … (and he went off)
I was standing next to them munching my mini-pizza and thinking, “Oh, come on, NFS is better than that!”
NAS has always played little brother to SAN, usually for the wrong reasons. Perhaps it is the perception that NAS is low-end and not good enough for high-end production systems. That perception is wrong, and the market shows it: NAS has been growing at a faster rate than Fibre Channel, while Fibre Channel growth has been tapering off and is possibly on the wane. I have always said that NAS is the better-suited protocol for unstructured data and files, because NAS protocols are the storage networking currency of Internet storage and the Cloud (this could change very soon with the REST protocol, but that’s another story). Where else can you find a protocol where sharing is key? iSCSI, even though it has been growing faster in production storage, cannot be shared easily because it is block-based.
Now back to NFS. NFS version 3 has been around for more than 15 years and has taken its share of bad raps. I agree that v3 still dominates most NFS installations. But NFS version 4 is changing all that by taking on the better parts of the CIFS protocol, notably an equivalent of opportunistic locks (oplocks), which NFSv4 calls delegations. It has also greatly enhanced security, incorporating Kerberos authentication. As for performance, NFSv4 added the COMPOUND procedure, which aggregates multiple operations into a single request.
Today, most virtualization solutions from VMware and Red Hat work with NFS natively. Note that the Windows CIFS protocol is not supported; only NFS is.
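To make the COMPOUND idea concrete, here is a toy model, not real NFS client code, that only counts network round trips. It assumes a hypothetical sequence of operations for reading one file and compares sending each operation as its own request (the NFSv3 pattern, where every procedure call is a separate RPC) with bundling the whole sequence into one COMPOUND request (the NFSv4 pattern). The operation names are borrowed from NFSv4 and reused for both cases purely for counting; the class names, the file path and the round-trip time are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNfsServer:
    """Counts requests only; implements no real NFS semantics."""
    round_trips: int = 0
    handled_ops: list = field(default_factory=list)

    def send(self, ops):
        # One call to send() models one network round trip,
        # no matter how many operations the request carries.
        self.round_trips += 1
        self.handled_ops.extend(ops)


def one_op_per_request(server, ops):
    # NFSv3-style: every operation travels in its own RPC.
    for op in ops:
        server.send([op])


def compound_request(server, ops):
    # NFSv4-style: the whole sequence travels in a single COMPOUND.
    server.send(ops)


# Hypothetical sequence to read /vm01/disk.vmdk (op names from NFSv4).
ops = ["PUTROOTFH", "LOOKUP vm01", "LOOKUP disk.vmdk", "OPEN", "READ"]

v3_style, v4_style = ToyNfsServer(), ToyNfsServer()
one_op_per_request(v3_style, ops)
compound_request(v4_style, ops)

rtt_ms = 0.5  # assumed network round-trip time, illustrative only
print(v3_style.round_trips, v3_style.round_trips * rtt_ms)  # 5 2.5
print(v4_style.round_trips, v4_style.round_trips * rtt_ms)  # 1 0.5
```

The server ends up doing the same five operations either way; what COMPOUND saves is the four extra trips across the network, which matters most when latency rather than bandwidth is the bottleneck.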
This blog entry is not claiming that NFS is better than iSCSI or FC, but it does give NFS credit where credit is due. NFS is not inferior to these block-based protocols. In fact, there are situations where NFS is better, for instance when expanding an NFS-based datastore on the fly in a VMware implementation. I will use several performance-related examples, since performance is often the yardstick when these protocols are compared.
In an experiment conducted by VMware on vSphere 4.0, with all other things being equal, the series of graphs below compares these 3 protocols (NFS, iSCSI and FC). I focus on NFS versus iSCSI rather than FC because NFS and iSCSI both run over Gigabit Ethernet, whereas FC runs on a different networking platform (hey, if you’ve got the money, go ahead and buy FC!)
For a single virtual machine (VM), the read throughput statistics (higher is better) are:
The red circle shows that NFS is up there with iSCSI in terms of read throughput from 4K blocks to 512K blocks. As for write throughput for 1 VM, the graph is shown below:
Although NFS lags in write throughput at block sizes smaller than 16KB, it outperforms iSCSI in the 16KB to 32KB range and matches it in the 64KB, 128KB and 512KB block tests.
The 2 graphs above are for a single VM. But in most real production environments, a single ESX host runs multiple VMs, and here is the throughput graph for multiple VMs:
Oh, you might say that these are just synthetic tests, with no real applications running in the VMs. So next, I want to share another performance test conducted by VMware, this time in a Microsoft Exchange environment.
The next statistics come from the Exchange Load Generator (popularly known as LoadGen), simulating the load of 16,000 Exchange users running in 8 VMs. With all things being equal again, you may be surprised when you see these graphs.
The graph above shows the average send mail latency of the 3 protocols (lower is better). On average, NFS has lower latency than iSCSI, better than most people might think. Another graph below shows the 95th percentile of send mail latency:
Again, you can see that NFS’s latency is lower than iSCSI’s. Interesting, isn’t it?
What about IOPS, then? In another test with an 8-hour DoubleHeavy LoadGen simulation, the IOPS graphs for all 3 protocols are shown below:
As I have shown, NFS is not inferior to block-based protocols such as iSCSI. In fact, VMware improved all 3 storage protocols significantly in version 4.1, as mentioned in the VMware paper. The paper reports the following for NFS and iSCSI:
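Why report both an average and a 95th percentile? The average can hide a slow tail, while the 95th percentile is the latency that 95% of sends stay at or under. Here is a minimal sketch of both statistics, assuming the nearest-rank percentile definition (LoadGen's own calculation method may differ) and using made-up latency samples, not VMware's data.

```python
import math


def mean(samples):
    return sum(samples) / len(samples)


def percentile_nearest_rank(samples, p):
    """Smallest sample such that at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical send-mail latencies in milliseconds (not VMware's data).
latencies_ms = [90, 95, 100, 105, 110, 115, 120, 130, 150, 400] * 2

print(mean(latencies_ms))                         # 141.5
print(percentile_nearest_rank(latencies_ms, 95))  # 400
```

In this made-up sample, two slow sends out of twenty barely move the average but completely set the 95th percentile, which is why a protocol that wins on both numbers, as NFS does in VMware's graphs, is winning on the tail as well as the typical case.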
- Using storage microbenchmarks, we observe that vSphere 4.1 NFS shows improvements in the range of 12–40% for Reads, and improvements in the range of 32–124% for Writes, over 10GbE.
- Using storage microbenchmarks, we observe that vSphere 4.1 Software iSCSI shows improvements in the range of 6–23% for Reads, and improvements in the range of 8–19% for Writes, over 10GbE.
The performance improvement for NFS over 10GbE is significant. Its write improvement ranges from 32% to 124%! That’s a whopping figure compared to iSCSI’s 8% to 19%. Since both protocols were neck and neck in version 4.0, NFS seems to be taking a bigger lead in version 4.1. With vSphere 5.0 released a few weeks ago, we should soon know how NFS and iSCSI perform there.
To be fair, NFS does incur a higher CPU cost than iSCSI, as the graph below shows:
Therefore, NFS isn’t inferior to iSCSI at all, even in a 10GbE environment. We just need to know the facts instead of brushing NFS off.
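Percentage improvements are easier to compare as speed-up factors: an improvement of x% means the new throughput is (1 + x/100) times the old. This small sketch simply plugs in the ranges quoted from the VMware paper; nothing else is assumed.

```python
def speedup(improvement_pct: float) -> float:
    """Convert a percentage improvement into a throughput multiplier."""
    return 1 + improvement_pct / 100


# Ranges quoted from the vSphere 4.1 paper, over 10GbE, in percent.
quoted = {
    "NFS reads": (12, 40),
    "NFS writes": (32, 124),
    "iSCSI reads": (6, 23),
    "iSCSI writes": (8, 19),
}

for name, (low, high) in quoted.items():
    print(f"{name}: {speedup(low):.2f}x to {speedup(high):.2f}x")
```

This prints 1.12x to 1.40x for NFS reads, 1.32x to 2.24x for NFS writes, 1.06x to 1.23x for iSCSI reads and 1.08x to 1.19x for iSCSI writes. In other words, at the top of its range NFS write throughput more than doubled, while iSCSI writes gained at most about a fifth.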