Category Archives: Cloud

Solid?

The next all-Flash product on my review list is SolidFire. Immediately, the niche that SolidFire is trying to carve out is obvious. It’s not for regular commercial customers; it is meant for Cloud Service Providers, because the features and the technology they have innovated are clearly intended for the cloud.

Are they solid (pun intended)? Well, if they have managed to secure Series B funding of USD$25 million (USD$37 million in total) from VCs such as NEA and Valhalla, as well as angel investors such as Frank Slootman (ex-Data Domain CEO) and Greg Papadopoulos (ex-Sun Microsystems CTO), then obviously there is something more than meets the eye.

The one thing I got while looking up SolidFire is that there is probably a lot of technology and innovation behind their nodes and their Element OS. They hold their cards very, very close to their chest, and I couldn’t get much good technology-related information from their website or from Google. But here’s a look at what SolidFire is like:

SolidFire has only one product model, the 1U SF3010. The SF3010 has 10 x 2.5″ 300GB SSDs, giving it a raw total of 3TB per 1U. The minimum configuration is 3 nodes, and it scales to 100 nodes. The reason for starting with 3 nodes is, of course, redundancy. Each SF3010 node has 8GB NVRAM and 72GB RAM, and sports 2 x 10GbE ports for iSCSI connectivity. That is not surprising, since the core engineering talent came from LeftHand Networks (the LeftHand Networks product is now the HP P4000). There is no Fibre Channel or NAS front end to the applications.

Each node runs 2 x Intel Xeon 2.4GHz 6-core CPUs. The 1U form factor is important to the cloud provider, as the price of floor space is a significant consideration.

Aside from the SF3010 storage nodes, the other important ingredient is their SolidFire Element OS.

Cloud storage needs to be available. SolidFire Helix Self-Healing data protection is a feature capable of handling multiple concurrent failures across all levels of the storage. Data blocks are replicated randomly but intelligently across all storage nodes, so that the failure or disruption of access to a particular data block is circumvented with another copy of the block somewhere else within the cluster. The idea is not new, but it is proven; solutions such as EMC Centera and IBM XIV employ a similar approach for data availability. The ability to self-heal ensures a highly available storage platform where data is always accessible.

To address storage efficiency, having 3TB raw in the SF3010 is definitely not sufficient. Therefore, the Element OS always has thin provisioning, real-time compression and inline deduplication turned on. These features cannot be turned off and operate at a fine-grained 4K block level. Also important are the reclamation of zeroed blocks, no reservations, and no data movement in these innovations, which SolidFire claims means there is no I/O impact.

But the one feature that differentiates SolidFire when targeting storage for Cloud Service Providers is their guaranteed volume-level Quality of Service (QoS). This is important, and SolidFire has positioned their QoS settings as an advantage. As best practice, Cloud Service Providers should always leverage the QoS functionality to improve their storage utilization.

The QoS settings include:

  • Minimum IOPS – a lower minimum means a lower performance priority (makes good sense)
  • Maximum IOPS
  • Burst IOPS – for those moments of performance spikes
  • Maximum and Burst MB/sec

The combination of QoS and storage capacity efficiency gives SolidFire an edge: cloud providers can scale both performance and capacity in a more balanced manner, something that is not so simple with traditional storage vendors that rely on lots of spindles to achieve IOPS, sacrificing capacity in the process. But then again, with SSDs, the IOPS are plenty (for now). SolidFire does not boast performance numbers of millions of IOPS or throughput in the tens of gigabytes like Violin, Virident or Kaminario; what they want is to be recognized as cloud storage as it should be in a cloud service provider environment.

SolidFire calls this Performance Virtualization. Just as we carve storage volumes from a capacity pool, SolidFire allows different performance profiles to be carved out from a performance pool. This gives SolidFire the ability to mix storage capacity and storage performance in a seemingly independent manner, customizing the type of storage bundling required of cloud storage.

In fact, SolidFire only claims 50,000 IOPS per storage node (including the IOPS needed for replicating data blocks). Together with their native multi-tenancy capability, the 50,000 or so IOPS will align well with many virtualized applications, rather than focusing on a 10x performance improvement for a single application. Their approach is about a more balanced and spread-out I/O architecture for cloud service providers and the applications that they service.

Their management is also targeted at the cloud. It has a REST API that integrates easily into OpenStack, Citrix CloudStack and VMware vCloud Director. This seamless and easy integration is all the more relevant because CSPs already have their own management tools, and the SolidFire API is REST-based and integration-ready to do just that.

The power of the SolidFire API is probably overlooked by storage professionals trained in the traditional manner. But what the SolidFire API does is provide the full (I mean FULL) capability of the management and provisioning of the SolidFire storage. Fronting the API with REST means that it is really easy to integrate with an existing CSP management interface.
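To make the API angle concrete, here is a minimal sketch of what provisioning a volume with a QoS profile through a REST-style call might look like. This is purely illustrative: the endpoint path, field names and authentication scheme below are hypothetical, not SolidFire’s published API.

```typescript
// Hypothetical sketch only: the endpoint path, payload fields and auth header
// are illustrative, not SolidFire's published API.
interface QosSettings {
  minIOPS: number;   // guaranteed performance floor
  maxIOPS: number;   // sustained ceiling
  burstIOPS: number; // short-lived spike allowance
}

async function createVolumeWithQos(
  clusterUrl: string,
  token: string,
  name: string,
  sizeGB: number,
  qos: QosSettings,
): Promise<void> {
  // One call carries both the capacity request and the performance profile,
  // which is the essence of carving from a capacity pool and a performance pool.
  const response = await fetch(`${clusterUrl}/api/volumes`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ name, sizeGB, qos }),
  });
  if (!response.ok) {
    throw new Error(`Volume creation failed: HTTP ${response.status}`);
  }
}

// Example: a tenant volume guaranteed 1,000 IOPS, capped at 5,000, bursting to 8,000.
createVolumeWithQos("https://cluster.example.com", "TOKEN", "tenant-vol-01", 500, {
  minIOPS: 1000,
  maxIOPS: 5000,
  burstIOPS: 8000,
}).catch(console.error);
```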

Together with the storage nodes and the Element OS, the whole package is aimed at being a significant storage platform for Cloud Service Providers (CSPs). Storage has always been a tricky component in Cloud Computing (despite what all the storage vendors might claim), but SolidFire touts that their solution focuses on what matters most to CSPs.

CSPs would want to maximize their investment without losing their edge in the cloud offerings to their customers. SolidFire lists their benefits in these 3 areas:

  • Performance
  • Efficiency
  • Management

The edge in cloud storage is definitely solid for SolidFire. Their ability to leverage their position and steer away from the other all-Flash vendors’ battlezone makes sense, as they aim to gain market share in the Cloud Service Provider space. I only wish they would share more about their technology online.

Fortunately, I found a video by SolidFire’s CEO, Dave Wright, which gives great insight into SolidFire’s technology. Have a look (it’s almost 2 hours long):

[2 hours later]: Phew, I just finished the video above and the technology is solid. Just to summarize,

  • No RAID (which is a Godsend for service providers)
  • Aiming for USD5.00 or less per Gigabyte (a good number!)
  • General availability in Q1 2012

Lots of confidence about the superiority of their technology, as portrayed by their CEO, Dave Wright.

Solid? Yes, Solid!

Amazon makes it easy

I like the way Amazon is building their Cloud Computing services. Amazon Web Services (AWS) is certainly on track to become the most powerful Cloud Computing company in the world. In fact, it might already be. But they are certainly not resting on their laurels: they launched 2 new services in as many weeks – Amazon DynamoDB (last week) and Amazon Storage Gateway (this week).

I am particularly interested in the Amazon Storage Gateway, because it addresses one of the biggest fears of Cloud Computing head-on. A lot of large corporations are still adamant about keeping their data on-premise, where it is private and secure, and many remain very skeptical even though Cloud Computing is changing the IT landscape in a massive way. The barrier to entry for large corporations is not trivial, but Amazon is adapting to get more IT divisions and departments to try out Cloud Computing in a less disruptive way.

The new service is really about data storage and data backup for large corporations. This is important because large corporations have plenty of requirements for storing and backing up data. And as we know, a large portion of the data stored does not need to be transactional or accessed frequently. This set of data is usually less frequently used, kept for archiving or regulatory compliance reasons, particularly in the banking and healthcare industries.

In data backup operations, the reason data is backed up is to provide a recovery mechanism when disaster strikes. Large corporations back up tons of data every day, week or month, and this data only has value when a situation requires data relevance, data immediacy or data recovery. Otherwise, it is just plenty of data taking up storage space, be it on disk or on tape.

Both data storage and data backup cost a lot of money, in both CAPEX and OPEX. On the CAPEX side, you are constantly pressured to buy more storage to store the ever-growing data. This leads to greater management and administration overhead, which contributes heavily to OPEX. And I have not included the OPEX of floor space, power and cooling, and people (training, salary, time and so on), which typically add up to 3-5x the operational cost relative to the capital investment. Such a model of storage-related IT operations cannot continue forever, and storage in the Cloud offers an alternative.

These 2 scenarios – data storage and data backup – are exactly the type of market AWS is targeting. In order to simplify things and pacify large corporations, AWS introduced the Amazon Storage Gateway, which eases large corporations into taking some of their IT storage operations to the Cloud in the form of Amazon S3.

The video below shows the Amazon Storage Gateway:

The Amazon Storage Gateway is a software “appliance” that is installed on-premise in the large corporation’s data center. It integrates seamlessly into the LAN and provides an SSL (Secure Sockets Layer) connection to Amazon S3. The data being transferred to S3 is also encrypted with AES (Advanced Encryption Standard) 256-bit. Both SSL and AES-256 can give customers a sense of security, and AWS claims that the implementation meets the data storage and data recovery standards used in the banking and healthcare industries.

The data storage and backup service protects the customer’s data with regular snapshots, giving the customer a rapid recovery platform should they experience on-premise data corruption or disruption. At the same time, the snapshot copies in Amazon S3 can also be loaded into Amazon EBS (Elastic Block Store), where testing or development environments can be spun up and evaluated with Amazon EC2 (Elastic Compute Cloud). The simplicity of sharing and combining different Amazon services will, no doubt, give customers peace of mind and ease their adoption of Cloud Computing with AWS.

This new service starts with a 60-day free trial, moving on to a USD$125.00 (about Malaysian Ringgit $400.00) per gateway per month subscription fee. The data storage (inclusive of the backup service) costs only 14 cents per gigabyte per month. For 1TB of data, that is approximately MYR$450 per month. Therefore, setting aside the initial setup costs, that comes to a total of about MYR$850 per month, or slightly over MYR$10,000 per year.
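To make the arithmetic explicit, here is the back-of-envelope calculation behind those numbers. The exchange rate of roughly MYR3.20 to the US dollar is my own assumption for illustration.

```typescript
// Back-of-envelope cost of the Amazon Storage Gateway for 1TB of data.
// The exchange rate (~MYR3.20 to the US dollar) is an assumption for illustration.
const gatewayPerMonthUSD = 125.0;   // per-gateway monthly subscription
const storagePerGBMonthUSD = 0.14;  // storage (inclusive of backup) per GB per month
const dataGB = 1000;                // roughly 1TB
const usdToMYR = 3.2;

const storagePerMonthUSD = storagePerGBMonthUSD * dataGB;                // ~USD$140
const totalPerMonthMYR = (gatewayPerMonthUSD + storagePerMonthUSD) * usdToMYR;

console.log(`Monthly: ~MYR ${totalPerMonthMYR.toFixed(0)}`);             // ~MYR 848
console.log(`Yearly:  ~MYR ${(totalPerMonthMYR * 12).toFixed(0)}`);      // ~MYR 10176
```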

At this point, I would like to relate an experience I had a year ago when implementing a so-called private cloud for an oil-and-gas customer in KL. They were using HP EVS (Electronic Vaulting Service) to an undisclosed HP data center hosting site in the Klang Valley. HP EVS, which was an OEM of Asigra, was not an easy solution to implement, but what was more perplexing was that the customer had a poor understanding of their objectives and their 5-year plan for keeping the data protected.

When the first 3-4TB of data storage and backup capacity was almost used up, the customer asked for a quotation for an additional 1TB of the EVS solution. The subscription for 1TB was MYR$70,000 per year. That is 7x more than the AWS cost of MYR$10,000 per year! I have to salute the HP sales rep. It must have been a damn good convincing sell!

In the long run, the customer could be better off running their storage and backup on-premise with their HP EVA4400; adding an additional 1TB (and hiring another IT administrator) would have cost a whole lot less.

Amazon Web Services has already been operating in Singapore for the past 2 years, and I am sure they are eyeing Malaysia as their regional market. Unless and until Malaysian companies offering cloud services learn to use economies of scale to capitalize on the Cloud Computing market, AWS is always going to be a big threat to CSP companies in Malaysia and a boon for any company seeking cloud computing services anywhere in the world.

I urge customers in Malaysia to start questioning their so-called Cloud Service Providers on whether they can do what AWS is doing. I have low confidence in what most local “cloud computing” companies can deliver right now. I hope they stop window dressing their service offerings and start giving real cloud computing services to customers. And for customers, you must continue to research and find out which cloud services meet your business objectives. Don’t be dazzled by the fancy jargon or technical idealism thrown at you. Always, always find out more, because your business cost is at stake. Don’t be like the customer who paid MYR$70,000 for 1TB per year.

AWS is always innovating and the Amazon Storage Gateway is just another easy-to-adopt step in their quest for world domination.

Joy(ent) to the World

When someone as important and as prominent as Jason Hoffman reads and follows your blog, you tend to stand up and take notice. When I found out last week that Jason Hoffman, Founder and CTO of Joyent, was doing just that, I was deeply honoured and elated.

My Asian values started kicking in and I felt that I should reciprocate his gracious visits with a piece on Joyent. I have known about Joyent thanks to Bryan Cantrill, their VP of Engineering, because I am bloody impressed with his work on DTrace. And I have followed Joyent’s announcements every now and then, even recommending to my buddy, a couple of months ago, a Service Delivery Manager (Asia Pacific) job that was posted on Joyent’s website. He’s one of the best Solaris engineers I have ever worked with, but the problem with techies is that they tend to wait for everything to fall into place before they do the next thing. Too methodical!

I took some time over the weekend to understand a bit more about Joyent and their solution offerings. They are doing some mighty cool stuff, and if you are a Unix/Linux buff/bigot like me, you would be damn impressed. For those who have experienced Unix, and especially Solaris, there is an unexplained element that describes the fire and the passion of such a techie. I was feeling all the good vibes all over again.

Unfortunately, Joyent is not well known in this part of the world, but I am aware of their partnership, announced in June last year, with a local company called XyBase. XyBase, through its vehicle called Anise Asia, entered into the partnership to resell Joyent’s SmartDataCenter solution. For those who have worked with XyBase in Malaysia, let’s not go there. 😉

Enough chitter-chatter! What’s Joyent about?

Well, Malaysian IT followers are practically drowned in VMware. VMware does a seminar every 1.5 months or so, and they get invited to other vendors’ events ever so frequently as well. My buddy, Mr. Ong Kok Leong, who was an early employee of VMware Malaysia, has been elevated to superstardom, thanks to his presence in everything VMware. It’s a good thing, and kudos to VMware for taking advantage of their first-to-market, super gung-ho approach in the last 3 years or so. They have built a sizable lead in the local market, and competitors like Citrix Xen and Microsoft Hyper-V are being left in the dust. I believe only RedHat’s KVM is making a bit of a dent, but they are primarily confined to their own RedHat space. Furthermore, most of VMware’s competitors do not have a strong portfolio and a complete software stack to challenge VMware and what they have been churning out.

Here’s my take … consider Joyent, because I see Joyent having a very, very strong portfolio to give VMware a run for its money. Publicly listed VMware has deep pockets to continue their marketing blitz, and because of where they are right now, they have gotten very pricey and complicated. This blogger intends to level the playing field a bit by sharing more about Joyent and their solutions.

I see Joyent having 4 very strong technologies that differentiate them from others. These technologies (in no particular order) are:

  • node.js
  • ZFS
  • DTrace
  • KVM

These technologies have been proven in the field because Joyent has been deploying them, stress testing them and improving on them in their own cloud offering, Joyent Cloud, for the last few years. This is true “eating your own dogfood” and putting your money where your mouth is. It is a very important consideration when building a Cloud Computing offering, especially in the public cloud space. You need something that is proven, and Joyent Cloud is a testimonial to Joyent’s technology.

So let’s start with a diagram of the Joyent Cloud Software Stack.

Key to the performance of Joyent Cloud is node.js.

node.js, as described on its website, is “a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” The key is being event-driven and asynchronous, and cloud solutions developed using node.js are able to go faster, scale bigger and respond better. The event-based model follows a programming approach in which the flow of the program is determined by events as they occur.

A simple analogy is when you (in Malaysia) are at McDonald’s. In the past, the McDonald’s staff would take and fulfill your order before serving the next customer, and so on. That was the old flow. Some time last year, McDonald’s decided that their front staff would take your order, send you to a queue and then take the order of the next customer. The back-end support staff would then fulfill your order, putting that burger and drink on your tray. That is why they are able to serve (take your money) faster and get more things done. This is what I understand about event-driven behaviour when it is applied in a programming context.
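For a feel of what that event-driven, non-blocking style looks like in code, here is a minimal node-style sketch written in TypeScript. The 500ms timer simply stands in for a slow back-end call; while that “order” is being prepared, the single event loop keeps accepting new requests instead of blocking.

```typescript
import { createServer } from "node:http";

// A tiny event-driven HTTP server: each request registers a callback and
// returns control to the event loop immediately, so slow work on one request
// does not block the next customer at the counter.
const server = createServer((req, res) => {
  // Simulate non-blocking back-end work (a database, file or API call).
  setTimeout(() => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end(`Order served for ${req.url}\n`);
  }, 500);
});

server.listen(8080, () => {
  console.log("Taking orders on http://localhost:8080 ...");
});
```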

node.js has been touted as the new “Ruby on Rails”, and it is all about low latency and concurrency in applications, especially cloud applications. Here’s a video introducing node.js, by Joyent’s very own Ryan Dahl, the creator of node.js.

Besides performance, you also need a strong and robust file system to ensure security, data integrity and protection of data as it scales. ZFS is a 128-bit, enterprise file system that was developed at Sun more than 10 years ago, and I am a big admirer of the ZFS technology. I have written about ZFS in the past, comparing it with NetApp’s Data ONTAP and also covering ZFS’s self-healing properties in dealing with silent data corruption. In fact, my buddy (him being the more technical one) and I have been developing storage solutions with ZFS.

Cloud Computing is complex, and you have to know what’s happening in the Cloud. The Cloud Service Provider must know the real-time behaviour of the cloud, whether for performance, resource consumption and contention, bottlenecks, application characteristics, or simply finding problems as quickly as possible. Customers, in turn, must have the ability to monitor, understand and report on what they are consuming and using in the Cloud.

The regularly used buzzword is Analytics, and DTrace is the framework behind Joyent’s Cloud Analytics. When it comes to analytics, nothing comes close to what DTrace can do. Most vendors (including VMware) will provide APIs for 3rd-party ISVs to develop cloud analytics, but nothing beats having the creator of the cloud technology give you the tools that they use internally. That is what Joyent is giving the customer: DTrace, a tool that they use themselves. Here’s a screenshot of DTrace in action for Joyent’s SmartDataCenter.

I have always said that you have got to see it to know it. Cloud visibility is crucial for the optimal operational efficiency of the cloud.

Joyent already has Solaris Zones technology in its offering, but the missing piece was a bare-metal hypervisor, and last year Joyent added the final piece. KVM (Kernel-based Virtual Machine) was ported to Joyent’s platform, and KVM is more secure and faster than the traditional approach of VMware, which relies on binary translation. KVM means that the virtualization kernel interacts directly with the native x86 virtualization on processors that support hardware virtualization extensions. There is a whole religious debate about native virtualization, paravirtualization and binary translation on the web. You can read one here, and as I said, KVM is native virtualization.
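A quick way to see whether a Linux host exposes those hardware virtualization extensions is to look for the vmx (Intel VT-x) or svm (AMD-V) CPU flags. A minimal sketch:

```typescript
import { readFileSync } from "node:fs";

// Check /proc/cpuinfo (Linux only) for the CPU flags that hardware-assisted
// virtualization such as KVM depends on: "vmx" for Intel VT-x, "svm" for AMD-V.
const cpuinfo = readFileSync("/proc/cpuinfo", "utf8");

if (/\b(vmx|svm)\b/.test(cpuinfo)) {
  console.log("Hardware virtualization extensions present: KVM can run natively.");
} else {
  console.log("No vmx/svm flags found: hardware-assisted virtualization is unavailable.");
}
```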

There is a lot more to know about Joyent, but you have got to spend some time learning about it. It is not well known (yet) in this part of the world; my intention in this blog entry is to disseminate information so that you, the readers, are not droned into one thing only.

There are choices, and in the virtualization space it is not always about VMware. VMware deserves to be where they are, but when one comes into power (like VMware), one tends to become less friendly to work with. A customer should not be subjected to this new order of oppression, because businesses exist only because there are customers. And as customers, there are always choices, and Joyent is a good one.

Primary Dedupe where are you?

I am a bit surprised that primary storage deduplication has not taken off in a big way, unlike the times when the buzz of deduplication first came into being about 4 years ago.

When the first deduplication solutions came out, they were aimed particularly at the backup data space. Now more popularly known as secondary data deduplication, the technology has reduced the inefficiencies of backup and helped spark the frenzy of adulation around companies like Data Domain, Exagrid, Sepaton and Quantum a few years ago. The software vendors were not left out either: Symantec, Commvault, and everyone else in town had data deduplication for backup and archiving.

It was no surprise that EMC battled NetApp and finally won the rights to acquire Data Domain for USD$2.4 billion in 2009. Today, in my opinion, the landscape of secondary data deduplication has pretty much settled and matured. Practically everyone has some sort of secondary data deduplication technology or solution in place.

But the talk of primary data deduplication hardly causes a ripple compared to a few years ago, especially here in Malaysia. Yeah, the IT crowd is pretty fickle that way, because most tend to follow the trend of the moment. Last year it was Cloud Computing and now the big buzzword is Big Data.

We are here to look at technologies to solve problems, folks, and primary data deduplication technology solutions should be considered in any IT planning. And it is our job as storage networking professionals to continue to advise customers about what is relevant to their business and addressing their pain points.

I get a bit cheesed off that companies like EMC or HDS continue to spend their marketing dollars on hyping the trends of the moment rather than using some of their funds to promote good technologies, such as primary data deduplication, that solve real-life problems. The same goes for most IT magazines, publications and other communication media, which rarely give space to technologies that solve problems on the ground, and just harp on hype, fuzz and buzz. It gets a bit too ordinary (and mundane) when they are trying too hard to be extraordinary, because everyone is basically talking about the same freaking thing at the same time, over and over again. (Hmmm … I think I am speaking off topic now .. I better shut up!)

We are facing an avalanche of data. The other day, the CEO of Nexenta used the word “data tsunami” but whatever terms used do not matter. There is too much data. Secondary data deduplication solved one part of the problem and now it’s time to talk about the other part, which is data in primary storage, hence primary data deduplication.

What is out there? Who’s doing what in terms of primary data deduplication?

NetApp has had their A-SIS (now NetApp Dedupe) for years, and they are good in my books. They talk to customers about the benefits of deduplication on their FAS filers. (Side note: I am seeing more benefits in using data compression in primary storage, but I am not going to go there in this entry.) EMC had primary data deduplication in their Celerra years ago, but they hardly talk much about it. It’s on their VNX as well, but again, nobody in EMC ever speaks about their primary deduplication feature.

I have always loved Ocarina Networks’ ECO technology, and Dell hasn’t given much of a hoot about Ocarina since the acquisition in 2010. The technology surfaced a few months ago in the Dell DX6000G Storage Compression Node for its Object Storage Platform, but then again, all Dell talks about is their Fluid Data Architecture from the Compellent division. Hey Dell, you guys are so one-dimensional! Ocarina is a wonderful gem in their jewel case, and yet all their storage guys talk about are Compellent and EqualLogic.

Moving on … I ought to knock Oracle on the head too. ZFS has great data deduplication technology that is meant for primary data and a couple of years back, Greenbytes took that and made a solution out of it. I don’t follow what Greenbytes is doing nowadays but I do hope that the big wave of primary data deduplication will rise for companies such as Greenbytes to take off in a big way. No thanks to Oracle for ignoring another gem in ZFS and wasting their resources on pre-sales (in Malaysia) and partners (in Malaysia) that hardly know much about the immense power of ZFS.

But an unexpected source coming from Microsoft could help trigger greater interest in primary data deduplication. I have just read that the next version of Windows Server OS will have primary data deduplication integrated into NTFS. The feature will be available in Windows 8 and the architectural view is shown below:

The primary data deduplication in NTFS will be a feature add-on for Windows Server users. It is implemented as a filter driver on a per-volume basis, with each volume a complete, self-describing unit. It is cluster-aware, and fully crash-consistent on all operations.

The technology is Microsoft’s own, built from scratch, and it will work to position Hyper-V as a strong enterprise choice in its battle with VMware for the server virtualization space. Mind you, VMware already has a big, big lead, and this is something Microsoft must do or die playing catch-up with Hyper-V. Otherwise, the gap between Microsoft and VMware in the server virtualization space will grow even greater.

I don’t have the full details of this, but I read that the NTFS primary deduplication chunk sizes will be between 32KB and 128KB, and that it will be post-processing.
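The mechanics of chunk-level, post-process deduplication are easier to see with a toy sketch. To be clear, this is not Microsoft’s implementation; it is a simplified illustration using fixed-size chunks and SHA-256 fingerprints, whereas real implementations use variable-size chunking and far more machinery.

```typescript
import { createHash } from "node:crypto";

// Toy post-process deduplication: take data already written to a "volume",
// split it into fixed-size chunks, fingerprint each chunk, and keep only one
// physical copy per fingerprint while recording a recipe of references.
const CHUNK_SIZE = 32 * 1024; // 32KB, the lower end of the reported NTFS range

function dedupe(volume: Buffer): { chunkStore: Map<string, Buffer>; recipe: string[] } {
  const chunkStore = new Map<string, Buffer>(); // fingerprint -> unique chunk data
  const recipe: string[] = [];                  // ordered fingerprints to rebuild the data

  for (let offset = 0; offset < volume.length; offset += CHUNK_SIZE) {
    const chunk = volume.subarray(offset, offset + CHUNK_SIZE);
    const fingerprint = createHash("sha256").update(chunk).digest("hex");
    if (!chunkStore.has(fingerprint)) {
      chunkStore.set(fingerprint, chunk); // first occurrence: store the chunk
    }
    recipe.push(fingerprint);             // duplicates only add a reference
  }
  return { chunkStore, recipe };
}

// Example: four 32KB chunks where two are identical -> only three are stored.
const data = Buffer.concat([
  Buffer.alloc(CHUNK_SIZE, "a"),
  Buffer.alloc(CHUNK_SIZE, "b"),
  Buffer.alloc(CHUNK_SIZE, "a"),
  Buffer.alloc(CHUNK_SIZE, "c"),
]);
const { chunkStore, recipe } = dedupe(data);
console.log(`Stored ${chunkStore.size} unique chunks for ${recipe.length} logical chunks`);
```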

With Microsoft introducing their technology soon, I hope primary data deduplication will get some deserving accolades because I think most companies are really not doing justice to the great technologies that they have in their jewel cases. And I hope Microsoft, with all its marketing savviness and adeptness, will do some justice to a technology that solves real life’s data problems.

I bid you good luck, Primary Data Deduplication! You deserve better.

The definition of Cloud Computing … really

Happy New Year! I am looking forward to the year of 2012.

Lately, I have been involved in Cloud Computing forums and I have been reading articles on Cloud Computing. I even took up a 5-day course on Cloud Computing in order to prepare myself for the inevitable. Yes, Cloud Computing is here to stay, but we joke about it, don’t we? I think the fun word of Cloud Computing is “cloudy“, which is indeed very true.

As I ingest more and more information about Cloud Computing, the way different people hold different perspectives and opinions on it has never been “cloudier“. It is fuzzy, hazy, and confusing. And in the forums, many were saying that virtualization is Cloud Computing. What do you think?

I found one definition of Cloud Computing that is definitive, yet simple. It comes from the National Institute of Standards and Technology (NIST) of the US Department of Commerce. In its publication 800-145, NIST defines Cloud Computing to have the following 5 essential characteristics (reproduced verbatim):

  • On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
  • Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).
  • Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.
  • Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
  • Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

The 5 essential characteristics are very important in determining whether Virtualization = Cloud Computing, and I know there are a lot of people out there who say that Virtualization equates to Cloud Computing. Let’s see the table below:

Some readers might argue about the “YES” or “NO” in the above comparison, but I do not want to dwell on the matter. Yes, I believe that many of these things are doable in their own right, but with different levels of complexity and cost. The objective is to settle the arguments and confusion around Cloud Computing, accept some definitive terms and move on.

As you can see from the table above, Virtualization does not equate to Cloud Computing. We can say that Virtualization enables Cloud Computing to happen; it is the precursor to Cloud Computing.

In Cloud Computing, there are different Service Models. NIST defines 3 different Service Models. They are:

  • Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
  • Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
  • Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

And NIST went on to define the Deployment Models of Cloud Computing as listed below:

  • Private cloud. The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
  • Community cloud. The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
  • Public cloud. The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
  • Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

There! Cloud Computing by the definition of NIST. It is simple, easily understood and, most importantly, it gives us context in the sea of confusion. Here’s the link to NIST’s PDF.

We can argue till the cows come home but it is best to stick to a simple definition of Cloud Computing and focus on other more important aspects of the cloud.

I hope to share more of my Cloud Computing experience with you and storage will have a big part to play in it.

Is there IOPS for Cloud Storage? – Nasuni style

I was in Singapore last week attending the Cloud Infrastructure Services course.

In the class, one of the foundation components of Cloud Computing is, of course, storage. As the students and the instructor talked about storage, one very interesting argument surfaced. It revolved around storage when it is offered in the cloud. A lot of people assumed that Cloud Storage would be for their databases and their virtual machines, which of course is true when the communication between the applications, virtual machines and databases is within the local area network of the Cloud Service Provider (CSP).

However, if the storage is offered through the cloud to applications that are sitting on-premise in the customer’s server room, then we have to think twice about how we perceive Cloud Storage. In this respect, the Cloud Storage offered by the CSP is Infrastructure-as-a-Service (IaaS), where the key service is storage. We have to recognize that this storage functions as a data container, and is usually not there for I/O performance reasons.

Though this concept will probably be easily understood by storage professionals like us, it can cause a bit of confusion for someone new to the concepts of Cloud Computing and Cloud Storage. This confusion, unfortunately, is caused by many of us who are vendors or solution providers, or even publications and magazines. We are responsible for disseminating correct information to customers, but due to our lack of knowledge and experience in this extremely new market of Cloud Storage, we have created FUD (Fear, Uncertainty and Doubt) and hype.

Therefore, it is the duty of this blogger to clear the vapourware, and hopefully pass on the right information to accelerate the adoption of Cloud Storage in the near future. At this moment, given factors such as network costs, high network latency and the lack of key network technologies similar to the LAN in Cloud Computing, Cloud Storage is, most of the time, for data containership and archiving only. And there are no IOPS or any performance-related statistics associated with Cloud Storage. If any engineer or vendor tells you that they have the fastest Cloud Storage in the industry, do me a favour. Give him/her a knock on the head for me!

Of course, as technologies evolve, this could change in the near future. For now, Cloud Storage is a container, NOT high-performance storage in the cloud. It is usually not meant for transactional data. There are many vendors in the Cloud Storage space, from real CSPs to storage companies offering repackaged storage boxes that are “cloud-ready”. A good example of a CSP offering Cloud Storage is Amazon S3 (Simple Storage Service). And storage vendors such as EMC and HDS are repackaging and rebranding their storage technologies as object storage, ready for the cloud. EMC Atmos is really a repackaged and rebranded Centera, with some slight modifications, while HDS, using their archiving solution, has HCP (previously known as HCAP). There’s nothing wrong with what EMC and HDS have done, but before the overhyping of the world of Cloud Computing, these platforms were meant for immutable data archiving. Just thought you should know.

One particular company that captured my imagination and addresses the storage performance portion is Nasuni. They are quite inventive with their Cloud Storage Gateway approach. Nasuni has come up with a Cloud Storage Gateway filer appliance, which can be either a physical 1U server or a VMware or Hyper-V virtual appliance sitting on-premise at the customer’s site.

The key here is “on-premise”, which allows much faster access to data because it is locally cached in the Nasuni filer appliance itself. This Nasuni filer addresses the Cloud Storage “performance” piece, though Nasuni does not claim any performance statistics for such an implementation. The clever bit is that this serves data or files that are transactional in nature, i.e. NFS or CIFS, “locally”. (I wonder if the Nasuni filer has iSCSI as well. Hmmmm….)

In the Nasuni architecture, they “break up” their “Cloud Storage” into 2 pieces. Piece #1 sits on-premise, at the customer site, and acts as a bridge to Piece #2, which sits in a Cloud Storage service. For a simplified view, have a look at the diagram below:

 

 

Piece #1 is the component that handles some of the transactional traffic related to files. In a more technical diagram below, you can see that the Nasuni filer addresses the file sharing portion, using the local disks on the filer appliance as a local caching mechanism.

Furthermore, older file pieces are whisked away to any Cloud Storage using the Cloud Connector interface, giving the customer a sense that their storage capacity can be limitless if they want it to be (for a fee, of course). At the same time, the Nasuni filer supports thin provisioning and snapshots. How cool is that!

The Cloud Storage piece (Piece #2) is used for data containership and archiving. This component can be hosted at Amazon S3, Microsoft Azure, Rackspace Cloud Files, Nirvanix Storage Delivery Network or the Iron Mountain Archive Services Platform.

The data communication and transfer between the Nasuni filer and the Cloud Storage is secured, encrypted, deduplicated and compressed, giving it the efficiency and security that most customers would be concerned about. The diagram below explains the data communication and data transfer bit.

In this manner, the Nasuni filer can replace traditional NAS platforms and can potentially provide a much lower total cost of ownership (TCO) in the long run, even though Nasuni does not pretend to be a NAS replacement. To me, this concept is very inventive and could potentially change the way we perceive file sharing and file servers, blurring the concept of NAS.

Again, I would like to reiterate that Nasuni does not attempt to say their solution is a NAS or a performance-based Cloud Storage but what they have cleverly packaged seems to be appealing to customers. Their customer base has grown 78% in Q2 of 2011. It’s just too bad they are not here in Malaysia or this part of the world (yet).

IOPS in Cloud Storage? Not yet.

 

Cloud Computing and it’s not iCloud

Steve Jobs was great with what he did, but when it comes to Cloud Computing, Jeff Bezos of Amazon is the one. And I believe Amazon Web Services (AWS) is bigger than Apple’s iCloud, both now and in the future. Why do I say that, knowing that the Apple fan boys could be using me as target practice? Because I believe what Amazon is doing is the future of Cloud Computing. Jeff Bezos is a true visionary.

One thing we have to note is that we play different roles when it comes to Cloud Computing. There are Cloud Service Providers (CSP) and there are enterprise subscribers. On a personal level, there are CSPs that cater for consumer-level type of services and there are subscribers of this kind as well. The diagram below shows the needs from an enterprise perspective, for both providers and subscribers.

Also, we tend to recognize Amazon from a less enterprise perspective; they are probably better known for their engagement at the consumer level. But what Amazon is brewing could already be what Cloud Computing should be, and I don’t think Apple iCloud is quite there yet.

Amazon Web Services cater for the enterprise and the IT crowd, providing both Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) through its delectable offerings of the

  • Elastic Compute Cloud (EC2)
  • SimpleDB
  • Simple Storage Service (S3)
  • Elastic Block Store (EBS)
  • Elastic Beanstalk
  • CloudFormation
  • many more
And AWS has been operational and serving enterprise customers for 5-6 years now. Netflix, Zynga and Farmville are some of AWS’s customers. This is something Apple iCloud does not have: a Cloud Computing ecosystem for enterprise customers. Apple iCloud does not offer PaaS or IaaS. Perhaps it is Apple’s vision not to get into the enterprise, but eventually the world revolves around businesses, and businesses are adopting Cloud Computing. Many readers may disagree with what I say in this paragraph, but I will share with you later that even at the consumer level, Amazon is putting the right moves in place, probably more so than Apple (more about this later).

But the recent announcement of the Kindle Fire, their USD$199 Android-based gadget, was, to me, the final piece of Amazon’s Phase I jigsaw – the move to conquer the Cloud Computing space. I read somewhere that the USD$199 Kindle Fire actually costs about USD$201.XX to manufacture. Apple’s iPad costs USD$499. So Amazon is making a loss on each gadget they sell. So what! It’s no big deal.

Let me share with you this table that will rattle your thinking a little bit. Remember this: Cloud Computing is defined as a “utility”. Cloud Computing is about services, content. 

The table was taken from a recent Wired Magazine article. It featured the interview with Jeff Bezos. Go check out the interview. It’s very refreshing and humbling.

I hope the table is convincing enough to show that the device or the gadget doesn’t matter. Yes, Apple and Amazon have different visions when it comes to Cloud Computing, but if you take some time to analyze the comparison, Amazon does not lock you into buying expensive (but very good) hardware, unlike Apple.

Take, for instance, the last point. Apple promotes downloaded media while Amazon uses streamed media. If you think about it, that is what Cloud Computing should be, because the services and the content are the utility. Amazon is providing services and content as a utility. Apple’s thinking is more old-school, still very much the PC-era type of mentality: you have to download the applications onto your gadget before you can use them.

Even the Amazon Silk browser concept is more revolutionary than Apple’s Safari. The Silk browser offloads some of the processing to the Amazon Cloud, taking advantage of the power of the Amazon Cloud to do the processing for the user. Here’s a little video about the Amazon Silk browser.

The Apple Safari is still very PC-centric, where most of the Web content has to be downloaded onto the browser to be viewed and processed. No doubt Amazon Silk also downloads content, but some of the processing, such as read-ahead and applet-processing functions, has been moved to the Amazon Cloud. That changes our paradigm. That’s Cloud Computing. And iCloud does not have anything like that yet.

Someone once told me that the Cloud is about economics. How incredibly true! It is about having the lowest costs for both providers and consumers. It’s about bringing a motherlode of content that can be delivered to you over the network. Amazon has tons of digital books, music, movies, TV and computing power to sell to you. And they are doing it at a responsible pace, with low margins. With low margins, the barrier to entry is lower, which in turn accelerates Cloud Computing adoption. And Amazon is very good at that. Heck, they are selling their Kindle Fire at a loss.

Jeff Bezos has stressed that what they are doing is long term, much longer term than most. To me, Jeff Bezos is the better visionary of Cloud Computing. I am sorry but the reality is Steve Jobs wants high margins from the gadgets they sell to you. That is Apple’s vision for you.

 Photo courtesy of Wired magazine.

Whitewashing Cloudsh*t

Pardon my French, but I have just about had enough of it!

I was invited to attend the Internet Alliance Association’s event today at OneWorld Hotel. It was aptly titled “Global Trends on Cloud Technology”. I don’t know much about the Internet Alliance, but I was intrigued by the event because I wanted to know what the Malaysian hosting and service providers are doing in the cloud. I have not been in touch with the hosting provider landscape for a few years now, so I was like an eager beaver, raring to learn more.

After registration, I quickly went to the first booth behind the front counter. The guy there said he was a cloud consultant, so I asked what his company does. He said they provide IaaS, PaaS and so on. I asked him if I could purchase IaaS with a credit card and what the turnaround time was to get a normal server with Windows 2008 running.

He obliged with a yes, they accept credit card purchases. But the turnaround to have the virtual server ready is 1 day. It would take 24 hours before I get a virtual server running Windows. So I assumed the entire process was manual, and I told him that. He assured me that the whole process is automatic. At the back of my mind: if this was automatic, would it take 24 hours? Reality set in when I realized I was dealing with a Malaysian company. Ah, I see.

A few more sentences were exchanged. He told me that they are hosted at AIMS, a popular choice. I inquired about their disaster recovery. They don’t have disaster recovery. More perplexity for me. Hmmm …

In the end, I was kinda turned off by his “story” about how great they are, better than Freenet and AIMS and so on. If they are better than AIMS, why host their cloud at AIMS?

I went to another booth which had a sign that read “1-Nimbus”. The number “1” is the usual 1-Malaysia logo with the word “Nimbus” next to it. Here’s that “1” logo below.

It was the word “Nimbus” that captured my attention. I thought, “Wow, is this really Nimbus?” Apparently not. Probably some Malaysian company borrowed that name .. we are smart that way. “1-Nimbus, Cloud Backup”, it read. I asked the chap (another consultant) who gave me the brochure, “How does it work?” “Does it require any agent?”

“Err, actually, I am not really technical. Let me refer you to my colleague”. A bespectacled chap popped over and introduced himself as a technical guy. I asked again, “How does this cloud backup work?”. His reply … “Err, it’s not really our product. Go check out the website”, and gave me another brochure.  Damn!

From then on, there were more excuses as I kept repeating the same question from one booth to another – tell me what you do in the cloud? Afterwards, I decided to do a pie chart of how I assessed the exhibition lobby floor.

I went on. There were about 15 booths. With the exception of FalconStor, only one booth managed to tell me some decent stuff. They were KumoWorks, and the guy spoke well about their Cloud Desktop with Citrix and iGel thin clients. And they are from Singapore. It figures!

I cannot help but feel nauseated by most of the booths at the OneWorld Hotel exhibition lobby. If this is the state of our “Cloud Service Providers”, I think we are in deep sh*t. Whitewashing and overusing the word “Cloud” everywhere is one thing, but these guys don’t even know what they are talking about. It is about time we admit that the Singaporeans are better than us. Even if they might not know their stuff well, at least they know how to package the whole thing and BS me intelligently!

And I learned a new “as-a-Service” today. One cloud consultant introduced me to “Application-as-a-Service”. I was so tempted to call it “Ass“.

NetApp to buy Commvault?

The rumour mill is going again that Commvault is an acquisition target, and this time the suitor is NetApp. The rumour is not new, but somehow Commvault has gotten too big in the past couple of years to be swallowed up. This time, though, it could happen, as NetApp is hungry …. very hungry.

NetApp took a big hit a couple of weeks back when it announced its Q3 numbers. Revenues fell short of analysts’ expectations and the share price took a big hit. While its big rival, EMC, has been gaining momentum on all fronts, it appears that NetApp is getting overwhelmed by the one-stop-shop of EMC. EMC is everything to everyone who wants storage, data protection software, services, data management, scale-out, data security, big data, cloud storage, virtualization and much more. NetApp has been very focused on what they do best, and that is storage. Everything revolves around their crown jewel, Data ONTAP, and they recently added Engenio to their stable of storage solutions.

NetApp does not mix the FAS storage with Engenio, making sure that their story-telling gels, but in the past few years many other vendors have taken the “one-stack-fits-all” approach. Oracle has Exadata, where servers, storage, database and networking are all-in-one. Many others are doing the same, while NetApp prefers more “loosely coupled” partnerships, such as their “Imagine Virtually Anything” concept partnership with VMware and Cisco, in the shape of FlexPod. FlexPod is a flexible infrastructure package comprising presized storage, networking and server components designed to ease the IT transformation journey, from virtualization all the way to cloud computing.

Commvault would be a great buy (a very expensive one, though) for NetApp. Things fit perfectly if NetApp decides to abandon its overly protective shield and start becoming a “one-stop-shop” for its customers, starting with data protection. Commvault is already the market leader in the enterprise disk-based backup and recovery market, as well reflected in Gartner’s January 2011 Magic Quadrant report.

It’s amazing to see how Commvault got to become the leader in this space in just a few short years, and part of its unique approach is providing a common core engine called the Common Technology Engine (CTE). The singular core architecture allows different data management components (Backup, Replication, Archiving, Resource Management, and Classification & Search) to share resources and, more importantly, detailed knowledge for true data management.

In the middle of this year, NetApp signed an OEM deal with Commvault to resell their SnapProtect solution, which integrates with NetApp’s SnapMirror solution. SnapProtect manages NetApp snapshots and SnapMirror replications and also extends the solution with tape-out for SnapMirror. Below is how Commvault SnapProtect fits into NetApp’s snapshot and SnapMirror data protection architecture.

Sources at NetApp’s C-level say that NetApp is still very much focused on their ONTAP strategy and on their “loosely coupled” partnerships with key partners like VMware, Cisco, F5 and Quantum. But at the back of NetApp’s mind, I believe, it is time to do something about it. This “focused” (which could also be interpreted as overly cautious) approach is probably on its last leg, as cloud computing is changing all that. The cost of integrating different, albeit flexible, storage, data protection and data management components is prohibitive to cloud service providers, and NetApp must take a bolder approach to win the hearts of these providers. Having a one-stop-shop isn’t so bad anymore; it is beginning to make sense, and NetApp had better do something quick. Commvault is one of the best out there and NetApp shouldn’t lose that chance.

Note: While the rumours of NetApp and Commvault are swirling, there have also been rumours that Quantum could be another NetApp target.

A cloud economy emerges … somewhat

A few hours ago, Rackspace announced the first “productized” Rackspace Private Cloud solution based on OpenStack. According to Openstack.org,

OpenStack is a global collaboration of developers and cloud computing 
technologists producing the ubiquitous open source cloud computing platform for 
public and private clouds. The project aims to deliver solutions for all types of 
clouds by being simple to implement, massively scalable, and feature rich. 
The technology consists of a series of interrelated projects delivering various 
components for a cloud infrastructure solution.

Founded by Rackspace Hosting and NASA, OpenStack has grown to be a global software 
community of developers collaborating on a standard and massively scalable open 
source cloud operating system. Our mission is to enable any organization to create 
and offer cloud computing services running on standard hardware. 
Corporations, service providers, VARS, SMBs, researchers, and global data centers 
looking to deploy large-scale cloud deployments for private or public clouds 
leveraging the support and resulting technology of a global open source community.
All of the code for OpenStack is freely available under the Apache 2.0 license. 
Anyone can run it, build on it, or submit changes back to the project. We strongly 
believe that an open development model is the only way to foster badly-needed cloud 
standards, remove the fear of proprietary lock-in for cloud customers, and create a 
large ecosystem that spans cloud providers.

And Openstack just turned 1 year old.

So, what’s this Rackspace private cloud about?

In the existing cloud economy, customers subscribe to a cloud service provider. The customer pays a (usually monthly) subscription fee in a pay-as-you-use model. And, in my previous blog, I courageously predicted that the new cloud economy will drive the middle tier (i.e. IT distributors, resellers and system integrators) out of the IT ecosystem. Before I lose the plot, Rackspace is now providing the ability for customers to install an OpenStack-ready, Rackspace-approved private cloud architecture in their own datacenter, not in Rackspace Hosting.

This represents a tectonic shift in the cloud economy, putting the control and power back into the customers’ hands. For too long, there were questions about data integrity, security, control, cloud service provider lock-in and so on but with the new Rackspace offering, customers can build their own private cloud ecosystem or they can get professional service from Rackspace cloud systems integrators. Furthermore, once they have built their private cloud, they can either manage it themselves or get Rackspace to manage it for them.

How does Rackspace do it?

From their vast experience in building Openstack clouds, Rackspace Cloud Builders have created a free reference architecture.  Currently OpenStack focuses on two key components: OpenStack Compute, which offers computing power through virtual machine and network management, and OpenStack Object Storage, which is software for redundant, scalable object storage capacity.

In the Openstack architecture, there are 3 major components – Compute, Storage and Images.
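As a feel for how simple the consumption model is, storing an object in OpenStack Object Storage (Swift) boils down to an authenticated HTTP PUT. The sketch below is illustrative only; the storage URL and token are placeholders, not a real deployment.

```typescript
// Minimal sketch: uploading an object to OpenStack Object Storage (Swift) is a
// plain HTTP PUT against <storage-url>/<container>/<object> with an auth token.
// The storage URL and token below are placeholders for illustration.
async function putObject(
  storageUrl: string, // e.g. https://swift.example.com/v1/AUTH_tenant
  token: string,
  container: string,
  objectName: string,
  payload: Uint8Array,
): Promise<void> {
  const response = await fetch(`${storageUrl}/${container}/${objectName}`, {
    method: "PUT",
    headers: { "X-Auth-Token": token },
    body: payload,
  });
  if (response.status !== 201) {
    throw new Error(`Upload failed: HTTP ${response.status}`);
  }
}

putObject(
  "https://swift.example.com/v1/AUTH_tenant",
  "PLACEHOLDER_TOKEN",
  "backups",
  "monday.tar.gz",
  new TextEncoder().encode("example payload"),
).catch(console.error);
```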

 

More information about the OpenStack architecture is available here. And with 130 partners in the OpenStack alliance (which includes Dell, HP, Cisco, Citrix and EMC), customers have plenty to choose from, which lessens the impact of lock-in.

What does this represent to storage professionals like us?

This Rackspace offering is game-changing and could perhaps spark an economy for partners to work with Cloud Service Providers. It definitely addresses some key customer concerns related to security and the freedom to choose, and even change, service providers. It seems to offer the best of both worlds (for now), but Rackspace is not looking at this for immediate gains. We still do not know how this economic pie will grow and how it will affect the cloud economy. And this does not negate the fact that we storage professionals have to dig deeper and learn more, nor does it change the fact that we have to evolve to compete against the best in the world.

Rackspace has come out beating its chest, predicting that the cloud computing API space will boil down to these 3 players – Rackspace OpenStack, VMware and Amazon Web Services (AWS). Interestingly, RedHat Aeolus (previously known as Deltacloud) was not deemed worthy of mention by Rackspace. Some pooh-pooh going on?