Category Archives: Filesystems

VAAI to go!

Posted by cfheoh

First of all, let me apologize. I am guilty of not updating my blogs as regularly as I did in the past. Things got a bit crazy after Christmas and I had to juggle several things that demand more of my attention but I am confident things will sort itself out soon enough.

Today’s topic is about VMware’s VAAI (vSphere vStorage API for Array Integration). This feature was announced more than 3 years ago but was only introduced in vSphere 4.1 July 2010 and now with newer enhancements in the latest release of vSphere 5.0.

What is this VAAI and what does this mean from a storage perspective?

When VMware came into prominence in version 3.0/3.5 time, the whole world revolved around the ESX hypervisor. It tried to do everything on its own, in its own proprietary nature. Given its nascent existence then, ESX had to do what it had to do and control everything with its hypervisor universe. Yes, it was a good move then and it did what it was supposed to do. This was back when server virtualization was in its infancy, and resources requirements were less demanding.

Hence when VMware wants to initialize VMs, or create VMDK files on the datastore, or creating clones or snapshots, or even executing VMotion and Storage VMotion, it tends to execute it at the hypervisor level. For example, when creating virtual disks with VMFS, most of the commands to initialization of the disks were done at the VMFS level. Zeroing the virtual disks would mean sending zeroing commands to the actual physical disks on the shared storage. And this would go on back and forth, taxing the CPU cycles and memory on the hypervisor layer, and sending wasteful and unnecessary zeroes over the network to the storage array. This was very inefficient, wasteful and degrades the performance tremendously, especially at the hypervisor layer (compute and memory).

There are also other operations such as virtual disks locking that locks up the entire LUN that housed several datastores. Again, not good.

But VMware took off like a rocket, and quickly established itself as a Tier 1, enterprise server virtualization solution addressing the highest demands of the enterprise. It is also defining the future of Cloud Computing, building exorbitant requirements as it pushes forward. And VMware began to realize that if the hypervisor is to scale, it needs to leave the I/O operations to the “experts”, and the “experts” here being the respective storage array itself.

So, in version 4.1, VAAI (vStorage API for Array Integration) was introduced as an API suite, following 3 other earlier APIs – vStorage API for Site Recovery Manager (SRM), vStorage API for Data Protection and vStorage API for Multipathing.

In a nutshell, as I have mentioned before, VAAI offloads I/O and storage related operations to the VAAI-capable storage array (leave it to the experts) as shown in the diagram below:

Of course, the storage vendors themselves has to rework their array OS layer to integrate with the VAAI API. You can say that the VAAI are “hooks” that enhances the storage connectivity and communications with vSphere’s hypervisor. But then again, if you look at it from the other angle, vSphere need the storage vendors more in order for its universe to scale. Good thing VMware has a big, big market share. Imagine if there are no takers for the VAAI APIs. That would be a strange predicament instead.

What is the big deal that we get from VAAI? The significant and noticeable benefit is increase performance. By offloading the I/O functionality and operations to the storage array itself, the hypervisor and the compute and memory resource are not bogged down, resulting in higher performance and better response time to serve its VMs and other VM operations.

I am going off to another meeting and I shall write of VAAI in more details later. Until the next entry, adios and have a great year ahead.

Posted in Filesystems, Virtualization

2 Comments

Tags: hypervisor, I/O performance, performance, VAAI, VM, VMFS, VMware, vStorage API

Is there IOPS for Cloud Storage? – Nasuni style

Dec 21

Posted by cfheoh

I was in Singapore last week attending the Cloud Infrastructure Services course.

In the class, one of the foundation components of Cloud Computing is of course, storage. As the students and the instructor talked about Storage, one very interesting argument surfaced. It revolved around the storage, if it was offered on the cloud. A lot of people assumed that Cloud Storage would be for their databases, and their virtual machines, which of course, is true when the communication between the applications, virtual machines and databases are in the local area network of the Cloud Service Provider (CSP).

However, if the storage is offered through the cloud to applications that are sitting on-premise in the customer’s server room, then we have to think twice of how we perceive Cloud Storage. In this aspect, the Cloud Storage offered by the CSP is a Infrastructure-as-a-Service (IaaS), where the key service is Storage. We have to differentiate that this Storage functions as a data container, and usually not for I/O performance reasons.

Though this concept probably will be easily understood by storage professionals like us, this can cause a bit confusion for someone new to the concept of Cloud Computing and Cloud Storage. This confusion, unfortunately, is caused by many of us who are vendors or solution providers, or even publications and magazines. We are responsible to disseminate correct information to customers, but due to our lack of knowledge and experience in this extremely new market of Cloud Storage, we have created the FUDs (Fear, Uncertainty and Doubt) and hype.

Therefore, it is the duty of this blogger to clear the vapourware, and hopefully pass on the right information to accelerate the adoption of Cloud Storage in the near future. At this moment, given the various factors such as network costs, high network latency and lack of key network technologies similar to LAN in Cloud Computing, Cloud Storage is, most of the time, for data storage containership and archiving only. And there are no IOPS or any performance related statistics related to Cloud Storage. If any engineer or vendor tells you that they have the fastest Cloud Storage in the industry, do me a favour. Give him/her a knock on the head for me!

Of course, as technologies evolve, this could change in the near future. For now, Cloud Storage is a container, NOT a high performance storage in the cloud. It is usually not meant for transactional data. There are many vendors in the Cloud Storage space from real CSPs to storage companies offering re-packaged storage boxes that are “cloud-ready”. A good example of a CSP offering Cloud Storage is Amazon S3 (Simple Storage Service). And storage vendors such as EMC and HDS are repackaging and rebranding their storage technologies as object storage, ready for the cloud. EMC Atmos is really a repackaged and rebranded Centera, with some slight modifications, while HDS , using their Archiving solution, has HCP (aka HCAP). There’s nothing wrong with what EMC and HDS have done, but before the overhyping of the world of Cloud Computing, these platforms were meant for immutable data archiving reasons. Just thought you should know.

One particular company that captured my imagination and addresses the storage performance portion is Nasuni. Of course, they are quite inventive with the Cloud Storage Gateway approach. Nasuni comes up with a Cloud Storage Gateway filer appliance, which can be either a physical 1U server or as a VMware or Hyper-V virtual appliance sitting on-premise at the customer’s site.

The key to this is “on-premise”, which allows access to data much faster because they are locally-cached in the Nasuni filer appliance itself. This Nasuni filer piece addresses the Cloud Storage “performance” piece but Nasuni do not claim any performance statistics with such implementation. The clever bit is that this addresses data or files that are transactional in nature, i.e. NFS or CIFS, to serve data or files “locally”. (I wonder if Nasuni filer has iSCSI as well. Hmmmm….)

In the Nasuni architecture, they “break up” their “Cloud Storage” into 2 pieces. Piece #1 sits on-premise, at the customer site, and acts as a bridge to the Piece #2, that is sitting in a Cloud Storage. From a simplified view, have a look at the diagram below:

Piece #1 is the component that handles some of the transactional traffic related to files. In a more technical diagram below, you can see that the Nasuni filer addresses the file sharing portion, using the local disks on the filer appliance as a local caching mechanism.

Furthermore, older file pieces are whiffed away to the any Cloud Storage using the Cloud Connector interface, hence giving the customer a sense that their storage capacity needs can be limitless if they want to (for a fee, of course). At the same time, the Nasuni filer support thin provisioning and snapshots. How cool is that!

The Cloud Storage piece (Piece #2) is used for the data container and archiving reasons. This component can be sitting and hosted at Amazon S3, Microsoft Azure, Rackspace Cloud Files, Nirvanix Storage Delivery Network and Iron Mountain Archive Services Platform.

The data communication and transfer between the Nasuni filer is secure, encrypted, deduplication and compressed, giving it the efficiency and security that most customers would be concerned about. The diagram below explains the dat communication and data transfer bit.

In this manner, the Nasuni filer can replace traditional NAS platforms and can potentially provide a much lower total cost of ownership (TCO) in the long run. Nasuni does not pretend to be a NAS replacement. To me, this concept is very inventive and could potentially change the way we perceive file sharing and file server, obscuring and blurring concept of NAS.

Again, I would like to reiterate that Nasuni does not attempt to say their solution is a NAS or a performance-based Cloud Storage but what they have cleverly packaged seems to be appealing to customers. Their customer base has grown 78% in Q2 of 2011. It’s just too bad they are not here in Malaysia or this part of the world (yet).

IOPS in Cloud Storage? Not yet.

Posted in Cloud, Data, Filesystems, Nasuni, Performance Caching

4 Comments

Tags: Amazon S3, Cloud Storage, data archiving, Filer, IOPS, Iron Mountain, Microsoft Azure, NAS, Nasuni, performance, Rackspace

A little yellow elephant

Dec 17

Posted by cfheoh

By now, I believe most of you in the storage networking world would have heard of Hadoop. Hadoop was created by Doug Cutting, while he and his team was working on an open source web search engine called Nutch. The easily recognized little yellow elephant, Hadoop, was Doug Cutting’s son toy, which he made as Hadoop’s mascot. Pretty cool!

And today, Hadoop has become THE platform for Big Data applications. Why?

As I have mentioned before, everything that we do or don’t do, generates data, either as a direct product or in-direct product. I am blogging right now and I am creating data. I was in Singapore the whole of this week and everywhere I go in the MRT stations, I am being watched by the video cameras they have at the station. A new friend in class said that Singapore is the second most “watched” city after London, where there are video cameras mounted everywhere, either discreetly or indiscreetly. And that’s just video data. And there’s plenty of other human activities that generate tons and tons of data.

IDC Digital Universe Report for 2011 said that we have generated 1.8ZB (zettabyte) of data this year alone. I mentioned in my previous blog that this is a gold mine and companies are scrambling to tap on massive amount of data. Extracting valuable information to anticipate the next trend or predict that next evolution in human preference is akin to the Gold Rush in the wild, wild west in the late 19th century. Folks, Big Data is going to be this generation’s “Digital Gold Rush”.

Sieving, filtering and processing gazillions of data (more unstructured than structured) will not work in defined, well-formatted relational databases. The data model of relational databases will simply break down. And of course, there are different schools of thoughts of different data models, but the Hadoop model seems to be gaining momentum and mind share of data scientists. That is because of Hadoop’s capability to deal with massive unstructured data, processing it and producing results in a small amount of time.

One way to process the pool of massive data is parallel programming. In parallel programming, multi-threading is commonly deployed to achieve the performance and effects of programming. But implementing multi-threading in parallel programming is difficult. Developers often has to deal with LWP (lightweight processes), semaphores, shared memory, mutex (mutually exclusive) locking and so on. Hence this style of programming works with different states on shared data, often resulting in different results in different states, even when using the same programming expression.

Hadoop belongs to another school of programming known as functional programming, where the different states on shared data concept is removed. With that in mind, the dependency on different states is also removed, resulting in a much easier and simpler parallel programming implementation. Hadoop borrows ideas from the MapReduce software framework made well known by Google and the Google File System.

Before, we get to know Hadoop, we must know MapReduce. MapReduce is a framework which allows very large data sets to be processed with a very large set of computer nodes in a cluster. Typically the computational processing is executed in a distributed fashion, spread across many computer nodes and final results are consolidated from the sub-results of these distributed processing nodes.

According to Wikipedia, the 2 key functions of Map Reduce are map() and reduce(). That’s pretty obvious. The extract below was taken from the Wikipedia definition, and explains both functions very well.

“Map” step: The master node takes the input, partitions it up into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.

The diagram below probably can simplify the concept of MapReduce to the readers.

Hadoop is one of the open-source implementations of MapReduce. It is one of the projects of Apache Foundation, and the project has sparked a brand-new niche of data search, data management and data science. The diagram below will allow our readers to juxtapose MapReduce and Hadoop, and comparing them in the simplest fashion.

Hadoop primary development platform is Java. Hadoop’s architecture consists mainly of 2 components – Hadoop Common and a Hadoop-compatible file system, as shown in the diagram below.

Hadoop MapReduce layer above is the file/object access interface to the Hadoop-compatible file system below. HDFS is Hadoop Distributed File System is just one of a few Hadoop-compatible file systems. Other file systems include:

Amazon S3 File System as part of the Amazon EC2 Infrastructure-as-a-Service (IaaS) cloud platform
CloudStore – a similar Hadoop-like implementation using C++ and also inspired by Google File System
FTP file systems
HTTP and HTTPS read-only file systems
Any file systems accessible with the file:// URL nomenclature

But the main engine of Hadoop is in the MapReduce layer. The 2 core components in this layer is JobTracker and TaskTracker. Both has their own individual roles to play and collectively, they are key cogs in the Hadoop distributed data processing model.

Below are extract I picked up from Wikipedia.

JobTracker submits MapReduce jobs to client applications. The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible. With a rack-aware filesystem, the JobTracker knows which node contains the data, and which other machines are nearby. If the work cannot be hosted on the actual node where the data resides, priority is given to nodes in the same rack. This reduces network traffic on the main backbone network. If a TaskTracker fails or times out, that part of the job is rescheduled. The TaskTracker on each node spawns off a separate Java Virtual Machine process to prevent the TaskTracker itself from failing if the running job crashes the JVM. A heartbeat is sent from the TaskTracker to the JobTracker every few minutes to check its status. The Job Tracker and TaskTracker status and information is exposed by Jetty and can be viewed from a web browser. Jetty is a Java-based HTTP server, among other things

JobTracker records what it is up to in the filesystem. When a JobTracker starts up, it looks for any such data, so that it can restart work from where it left off.

Scheduling

By default Hadoop uses first-in, first-out (FIFO), and optional 5 scheduling priorities to schedule jobs from a work queue. In version 0.19 the job scheduler was refactored out of the JobTracker, and added the ability to use an alternate scheduler (such as the Fair scheduler or the Capacity scheduler).

Fair scheduler

The fair scheduler was developed by Facebook. The goal of the fair scheduler is to provide fast response times for small jobs and QoS (Quality of Service) for production jobs. The fair scheduler has three basic concepts.

Jobs are grouped into Pools.
Each pool is assigned a guaranteed minimum share.
Excess capacity is split between jobs.

By default jobs that are uncategorized go into a default pool. Pools have to specify the minimum number of map slots, reduce slots, and a limit on the number of running jobs.

Capacity scheduler

The capacity scheduler was developed by Yahoo. The capacity scheduler supports several features which are similar to the fair scheduler.

Jobs are submitted into queues.
Queues are allocated a fraction of the total resource capacity.
Free resources are allocated to queues beyond their total capacity.
Within a queue a job with a high level of priority will have access to the queue’s resources.

I took most the extract below from Wikipedia, and I don’t claim to be a knowledgeable person on Hadoop. All the credits go to Wikipedia editors to put Hadoop in layman terms.

Hadoop has certainly won the hearts of the new digital gold rush, Big Data and is slowly becoming a force to be reckoned with among data scientists. Hadoop implementations are powering new frontiers in processing and mining the ever growing data capacity, giving solution providers a simple programming methodology and data model to gain more insights into the vast seas of data and information.

Hadoop has many fans, and slowly becoming the data platform for large companies such as Yahoo!, Facebook, IBM, Amazon, Apple, eBay and many more. Facebook even claims to have the largest Hadoop clusters in the world, growing to 30PB in July of 2011.

This little yellow elephant is going places and one to watch out for.

Posted in Analytics, Big Data, Data, Filesystems, Hadoop

What should be a Cloud Storage?

Dec 8

Posted by cfheoh

For us filesystem guys, NAS is the way to go. We are used to store files into network file systems via NFS and CIFS protocols and treating the NAS storage array like a refrigerator – taking stuff out and putting stuff back it. All that is fine and well as long as the data is what I would term as corporate data.

Corporate data is generated by employees, applications and users of the company and for a long time, the power of data creation lies in the hands of the enterprise. That is why storage solutions are designed to address the needs of the enterprise where the data is structured and well defined. How the data is stored; the data is formatted; and how is being accessed are the “boundary” of how the data is being used. Typically a database is used to “restrict” the data created so that the information can be retrieved and modified quickly. And of course, SAN guys will tell you to put these structured data of the data base into their SAN.

For the unstructured data in the enterprise, NAS file systems hold that responsibility. Files such as documents and presentations have a more loosely defined “boundaries”, and hence filesystems are a better natural fit for unstructured data. Filesystems are like a free-for-all container, and able to store and provide access to any files in the enterprise.

But today, as the Web 2.0 applications are already taking over the enterprise, the power of data creation does not necessary lie in the hands of the enterprise applications and users. In fact, it is estimated that the percentage of enterprise data now has exceeded 50% of the enterprise’s total data capacity. With the proliferation of personal devices such as tablets, Blackberries, smart phones, PDAs and so on, individual contributors are generating plenty of data. This situation has been made more acute with Web 2.0 applications, such as Facebook, blogs, social networking, Twitter, Google Search and so on.

Unfortunately, file systems in the NAS category still pretty much the traditional file systems, while the needs of a new type of file system could not be met by the traditional file systems. The paradigm is definitely shifting. The new unstructured data world needs a new storage concept. I would term this type of storage as “Cloud Storage” because it breaks down the traditional concepts of NAS.

So what basically defined a Cloud Storage? I already mentioned that the type of unstructured data has changed. And the new requirements for unstructured data type are:

The unstructured data type is capable of globally distributed.
There will be billions and billions of unstructured data objects created but each object, be it a Twitter tweet, or a uploaded mobile video, or even the clandestine data collected by CarrierIQ, can be accessed easily via a single namespace
The storage file system foundation for these new unstructured data type is easily provisioned and management. Look at Facebook. It is easy to setup, get going and the user (and probably the data administrator) can easily manage the user interface and the platform
For the service provider of Cloud Storage, the file system must be secure and support multi-tenancy and virtualization of storage resources
There should be some form of policy-driven content management. That is why development platforms such as Joomla!, Drupal, WordPress are slowing become enterprise driven to address these unstructured data types.
Highly searchable and have a high degree of search optimization. A Google search do have a strong degree of intelligence and relevance to the data being search as well as generating tons of by-product data that feeds the need to understand the consumers or the users better. Hail Big Data!

So when I compare traditional NAS storage solutions such as Netapp or EMC VNX or BlueArc, I ask the question of whether their NAS solutions has these capabilities to meet the requirements of these new unstructured data type.

Most of them, no matter how they package it, is still relying on files as the granular object of storage. And today, most files may have some form of metadata such as file name, owner, size etc, DO NOT, possess the capability of content-aware. Here’s an example when I want to show you:

The file properties (part of the file metadata) tell you about the file but little about the content of the file. Today, it requires more than that and the new unstructured data type should look more like this:

If you look at the diagram below, the object on the right (which is the new unstructured data type), display much more information than a typical file in a NAS file system. There additional information becomes the fodder to other applications such as search engines, RSS feeds, robots and spiders and of course, big data analytics.

Here’s another example of what I mean about these extended metadata, and a Cloud Storage storage array is required to work with these new set of parameters and a new set of requirement.

There’s a new unstructured data type in town. Traditional NAS systems may not have the right features to work with this new paradigm.

Don’t be white washed by the fancy talk of storage vendors in town. Learn the facts, and find out what is really a Cloud Storage.

It’s time to think differently. It’s time to think of what should be a Cloud Storage.

Posted in Analytics, Big Data, Filesystems, Object Storage

3 Comments

Tags: big data, Cloud Storage, databases, metadata, structured data, unstructured data

btrfs – a better butter from a better COW?

Nov 21

Posted by cfheoh

There’s better a lot of chatter about the default file system in the recently released Fedora 16. Fedora 16 was released on November 8, 2011. For months, there were rumours that the default file system for Fedora 16 will be btrfs (btree file system, or better known as butter-FS). But after the release, the default file system of Fedora 16 is still ext4, much to the dismay of many. btrfs holds a lot of potential because in the space of less than 4 years, it has moved up the hierarchy to be one of the top file systems in the Linux ecosystem.

File systems in Linux has not seen had a knight in shining armour for the long time. The file systems kings of the Linux world were ext2/3 for the RedHat and Debian flavoured distros and reiserfs for the SuSE flavoured distros. ext4 is now default in most Linux distros but the concept of ext2/3/4 has not changed much since its inception decades ago.

At the same time, reiserfs had a lot of promise as well, but its development and progress have lost it lustre after its principal developer, Han Reiser was convicted of the murder of his wife a few years ago. If you are KPC (Malaysian Chinese colloquialism, meaning busybody), you can read the news here.

btrfs is going to be the new generation of file systems for Linux and even Ted T’so, the CTO of Linux Foundation and principal developer admitted that he believed btrfs is the better direction because “it offers improvements in scalability, reliability, and ease of management”.

For those who has studied computer science, B-Tree is a data structure that is used in databases and file systems. B-Tree is an excellent data structure to store billions and billions of objects/data and is able to provide fast data retrieval in logarithmic time. And the B-Tree implementation is already present in some of the file systems such as JFS, XFS and ReiserFS. However, these file systems are not shadow-paging filesystems (popularly known as copy-on-write file systems).

You see, B-Tree, in its native form of implementation, is very incompatible with COW file systems. In fact, the implementation was thought of impossible, until someone by the name of Ohad Rodeh came along. He presented a paper in Usenix FAST ’07 which described the challenges of combining the B-Tree concept with shadow paging file systems. The solution, as he described it, was to introduce insert/remove key into the tree structure, and removing the dependency of intra-leaf linking.

Chris Mason, one of the developers of reiserfs, took Ohad’s idea and created a shadow-paging file system based on the B-Tree idea.

Traditional file systems tend to follow the idea of the Berkeley Fast File Systems. Different cylinder groups have its own inode, bitmap and disk blocks. The used space in one cylinder group cannot be shared to another cylinder group, resulting in wastage. At the same time, performance can be an issue as the disk read/head frequently have to move to the inodes to find out where the next used or free blocks will be. In a way, it looks like the diagram below.

Chris Mason’s idea of btrfs made the file system looked like this:

Today, a quick check in btrfs wiki page, shows that the main Btrfs features available at the moment include:

Extent based file storage
2^64 byte == 16 EiB maximum file size
Space-efficient packing of small files
Space-efficient indexed directories
Dynamic inode allocation
Writable snapshots, read-only snapshots
Subvolumes (separate internal filesystem roots)
Checksums on data and metadata
Compression (gzip and LZO)
Integrated multiple device support
RAID-0, RAID-1 and RAID-10 implementations
Efficient incremental backup
Background scrub process for finding and fixing errors on files with redundant copies
Online filesystem defragmentation

And Chris Mason and his team have still plenty more to offer for btrfs. It will likely be the default file system in Fedora 17 and at the rate it is going, could be the file system of choice for RedHat, Debian and SuSE distros very soon.

There’s been a lot of comparisons between btrfs and ZFS, since both are part of Oracle now. ZFS is obviously a much more mature file system, with more enterprise features and more robust, (Incidentally, ZFS just celebrated its 10th birthday on Halloween 2011 – see Matt Ahren’s blog) and the btrfs is the rising star in the Linux world. But at this moment, the 2 file systems are set apart in their market positioning and deployment.

ZFS is based on the CDDL (Common Development and Distribution License) while btrfs is based on GNU GPL, open source licensing. There are controversies surrounding CDDL licensing scheme, while is incompatible with GNU GPL scheme.

Oracle can count itself very lucky to have 2 of the most promising and prominent COW file systems around. It will be interesting to see what Oracle will do next. As a proponent of innovation, community and sharing, I sincerely hope that both file systems will continue to thrive in Oracle’s brutal, sales-driven organization. We certainly don’t want to see controversies about dual ownership BS of Oracle and mySQL that could end in 2015.

Posted in Filesystems, Oracle

NetApp SPECSfs record broken in 13 days

Nov 17

Posted by cfheoh

Thanks for my buddy, Chew Boon of HDS who put me on alert about the new leader of the SPECSfs benchmark results. NetApp “world record” has been broken 13 days later by Avere Systems.

Avere has posted the result of 1,564,404 NFS ops/sec with an ORT (overall response time) of 0.99 msec. This benchmark was done by 44 nodes, using 6.808 TB of memory, with 800 HDDs.

Earlier this month, NetApp touted fantastic results and quickly came out with a TR comparing their solution with EMC Isilon. Here’s a table of the comparison

But those numbers are quickly made irrelevant by Avere FXT, and Avere claims to have the world record title with the “smallest footprint ever”. Here’s a comparison in Avere’s blog, with some photos to boot.

For the details of the benchmark, click here. And the news from PR Newswire.

If you have not heard of Avere, they are basically the core team of Spinnaker. NetApp acquired Spinnaker in 2003 to create the clustered file systems from the Spinnaker technology. The development and integration of Spinnaker into NetApp’s Data ONTAP took years and was buggy, and this gave the legroom for competitors like Isilon to take market share in the clustered NAS/scale-out NAS landscape.

Meanwhile, NetApp finally came did come good with the Spinnaker technology and with ONTAP 8.0.1 and 8.1, the codes of both platforms merged into one.

The Spinnaker team went on develop a new technology called the “A-3 Architecture” (shown below) and positioned itself as a NAS Accelerator.

The company has 2 series of funding and now has a high performance systems to compete with the big boys. The name, Avere Systems, is still pretty much unknown in this part of the world and this “world record” will help position them stronger.

But as I have said before, benchmarking are just ways to have bigger bragging rights. It is a game of leapfrogging, and pretty soon this Avere record will be broken. It is nice while it lasts.

Posted in Avere, Filesystems, Performance Benchmark

2 Comments

Tags: Avere, benchmark, FXT, NFS, ORT, SPECSfs, world record

The future is intelligent objects

Nov 16

Posted by cfheoh

We are used to block-based approach and also the file-based approach to data. The 2 diagrams below shows the basics of how we access data in both block-based and file-based data on the storage device.

For block-based , the storage of the blocks is merely in arrays of unrelated contiguous blocks. For file-based, as seen below,

there is another layer of abstraction, and this is called the file system. But if you seen both diagrams above, there are some random numbers in light blue and that is to represent the storage device, the hard disk drive’s export of “containers” to the file system or the application that is accessing the storage device. This is usually the LBA (Logical Block Addressing), which is basically set of schematics that defines the locations on the hard disk drives. LBA tells the location of where the data is stored. For more information about LBA, check out this Wikipedia definition. But the whole idea is LBA is dumb. It is pretty much static and exported to file systems and applications so that these guys can do something with it.

There’s something brewing in the background since 1994 and it is one of the many efforts to make intelligent storage devices. This new object-based interface was part of the research project done by Carnegie-Mellon University (CMU). Initially, it was known as Network Attached Secure Disk (NASD) but eventually made its way to the working group in SNIA, and developing it for ANSI T10 INCITS standard. ANSI T10 is the guardian of all SCSI standards. This is called Object Storage Device (OSD). The SCSI architecture diagram below shows the layer where OSD resides.

The motivation for this simple: To make storage devices of today to do more computational work, in particularly I/O, relieving the hosts and the local systems to concentrate other computational processing work. And the same time, the local systems must have some level of interactivity and management between the storage object and the computational hosts.

In the diagram below which compares both block-based and OSD,

you can see the separation of file system management interface that is at the kernel-space of the local host/system and this is replaced by the OSD Management interface at the storage device.

What does this all mean? This means that using LBA type of addressing that we are familiar with in the block-based and file-based storage is no longer the way to go, because as I mentioned before, LBA is dumb.

OSD, in some way, replaces the LBA with OIDs (Object IDs). The existing local system and/or its file system will interact with the storage devices with OIDs and the OIDs links to its respective objects storage. And the object will carry a lot of metadata, that represents the object, giving it the intelligent and management capability of the object.

The prominence of the metadata in the OSD would mean that we can build much more intelligent systems in the future. The OIDs and the objects can be grouped together in a flat design or can be organized and categorized in a virtual, hierarchical model.

Object storage is an intelligent evolution of disk drives that can store and serve objects rather than simply place data on tracks and sectors. And it can bring the following benefits:

Intelligent space management in the storage layer
Data aware pre-fetching and caching
Robust shared access by multiple clients
Scalable performance using off-loaded data path
Reliability security

Several vendors such as EMC and NetApp are already supporting OSD.

Posted in Filesystems, Object Storage, SCSI

2 Comments

Tags: ANSI T10, block storage, File systems, object storage, OSD, SCSI, SNIA

Stop stroking your …

Nov 11

Posted by cfheoh

A few days after I wrote about the performance benchmark bag of tricks, EMC was the first to fire the first salvo at NetApp’s SPECSfs2008 world records on NFS IOPS.

EMC is obviously using all its ammo to deflate NetApp chest thumping act, with Storagezilla‘s blog. Mark Twomey, who is the alter ego of Storagezilla posted several observations about NetApp apparent use of disk short stroking to artificially boost its performance numbers. This puts NetApp against the wall, with Alex MacDonald (who incidentally is SNIA NFSv4 co-chairman) of the office of the CTO responding hard to Storagezilla’s observation.

The news of this appeared in The Register. Read all about it.

With no letting up, the article also mentioned EMC Isilon’s CTO, Rob Pegler, adding more fuel to the fire.

I spoke about short stroking as some of the tricks used to gain better numbers in benchmark. And I also mentioned that these numbers have little use to the real work and I would like to add that these numbers are just there for marketing reasons. So, for you readers out there, benchmark is really not big of a deal.

Have a great weekend!

Posted in EMC, Filesystems, NetApp, NFS, Performance Benchmark

Ocarina rising

Oct 21

Posted by cfheoh

After more than a year since Dell acquired Ocarina Networks, it has finally surfaced last week in the form of Dell DX Object Storage 6000G SCN (Storage Compression Node).

Ocarina is a content-aware storage optimization engine, and their solution is one of the best I have seen out there. Its unique ECOsystem technology, as described in the diagram below, is impressive.

Unlike most deduplication and compression solutions out there, Ocarina Networks solution takes storage optimization a step further. Ocarina works at the file level and given the rise and crazy, crazy growth of unstructured files in the NAS space, the web and the clouds, storage optimization is one priority that has to be addressed immediately. It takes a 3-step process – Extract, Correlate and Optimize.

Today’s files are no longer a flat structure of a single object but more of a compounded file where many objects are amalgamated from different sources. Microsoft Office is a perfect example of this. An Excel file would consists of objects from Windows Metafile Formats, XML objects, OLE (Object Linking and Embedding) Compound Storage Objects and so on. (Note: That’s just Microsoft way of retaining monopolistic control). Similarly, a web page is a compound of XML, HTML, Flash, ASP, PHP object codes.

In Step 1, the technology takes files and breaks it down to its basic components. It is kind of like breaking apart every part of a car down to its nuts and bolt and layout every bit on the gravel porch. That is the “Extraction” process and it decodes each file to get the fundamental components of the files.

Once the compounded file object is “extracted”, identified and indexed, each fundamental object is Correlated in Step 2. The correlation is executed with the file and across files under the purview of Ocarina. Matching and duplicated objects are flagged and deduplicated. The deduplication is done at the byte-level, unlike most deduplication solutions that operate at the block-level. This deeper and more granular approach further reduces the capacity of the storage required, making Ocarina one of the most efficient storage optimization solutions currently available. That is why Ocarina can efficiently reduce the size of even zipped and highly encoded files.

It takes this storage optimization even further in Step 3. It applies content-aware compactors for each fundamental object type, uniquely compressing each object further. That means that there are specialized compactors for PDF objects, ZIP objects and so on. They even have compactors for Oil & Gas seismic files. At the time I was exposed to Ocarina Networks and evaluating it, it had about 600+ unique compactors.

After Dell bought Ocarina in July 2010, the whole Ocarina went into a stealth mode. Many already predicted that the Ocarina technology would be integrated and embedded into Dell’s primary storage solutions of Compellent and EqualLogic. It is not there yet, but will likely be soon.

Meanwhile, the first glimpse of Ocarina will be integrated as a gateway solution to Dell DX6000 Object Storage. DX Object Storage is a technology which Dell has OEMed from Caringo. DX6000 Object Storage (I did not read in depth) has the concept of the old EMC Centera, but with a much newer, and more approach based on XML and HTTP REST. It has published an open API and Dell is getting ISV partners to develop their applications to interact with the DX6000 including Commvault, EMC, Symantec, StoredIQ are some of the ISV partners working closely with Dell.

(24/10/2011: Editor note: Previously I associated Dell DX6000 Object Storage with Exanet. I was wrong and I would like to thank Jim Dtuton of Caringo for pointing out my mistake)

Ocarina’s first mission is to reduce the big, big capacities in Big Data space of the DX6000 Object Storage, and the Ocarina ECOsystem technology looks a good bet for Dell as a key technology differentiator.

Posted in Dell, Filesystems, Storage Optimization

5 Comments

Tags: Correlate, Dell, DX6000 Object Storage, ECOsystem, Extract, Filesystem, Ocarina Networks, Optimize

Novell Fil(e)r … Files, my way

Oct 19

Posted by cfheoh

I took a bit of time of my busy schedule this week to learn a bit more about the Novell Filr.

Firstly, it is a F-I-L-E-R, spelled “Filr”, something like Tumblr, or Razr. I think it’s pretty inventive but putting marketing aside, I learned about a little of how the idea works behind the concept. Right now, my evaluation is pretty much on the surface because I am working out the time for a real-life demo and hands-on later on.

As I mentioned in my previous blog, the idea behind Novell Filr is to allow the users to access their files anywhere, any device. The importance of this concept is to allow the users to stay in their comfort zone. This simple concept, of having the users being comfortable, is something that we should not overlook, because it brings together the needs of the enterprise and the IT organization and the needs of the individual users in a subtle, yet powerful way. It allows the behavioral patterns of the “lazy” users to be corralled into what IT wants them to do, that is to have the users’ files secured, protected and be in IT’s control. OK, that was my usual blunt way of saying it but I believe this is a huge step forward to address the issues at hand. And I am sorry for saying that the users are “lazy” but that’s what the IT guys would say.

What are the usual issues usually faced when it comes to dealing with user files? Let me count the ways:

Users don’t put the files in backup folders as they were told and they blame IT for not backing up their files
Users keep several copies of the files and email, share through thumb drives etc, to their friends and colleagues. IT gets blames for ever growing storage capacity needs and even worse, breaching the security of the organization as internal files are shared to outsiders.
Users wants to get their files on iPads, iPhones, Android Pads, BlackBerry and other smart devices and saying IT is too archaic. Users said that they are less productive if they can’t get the files anywhere. IT gets the blame again
Users has little discipline to change their habits and to think about file security and ownership of company’s private and confidential data – sharing files happily and IT gets blame

These points, from the IT point-of-view, are exactly the challenges faced daily. That is why users are flocking to Box.Net, DropBox and Windows Live SkyDrive because they want simplicity; they want freedom; they want IT to get off their back. But all these “confrontations” are comprising the integrity of the files and data of the organization.

Novell Filr, is likely to be one of the earliest solutions to address this problem. It attempts to marry both the simplicity and freedom ala-DropBox for the users, but in the IT backend, where the organization’s files will be stored, IT runs a tight ship of the users AAA (authentication, authorization and auditing) and at the same time, includes the Novell File Management Suite. As shown below, Novell File Management Suite consists of 3 main solutions.

I will probably talk more about the File Management Suite in another blog entry, but meanwhile, how does the Novell Filr work?

First of all, it sits between the conversation between the users’ devices (typically, this will be a Windows computer accessing a network drive via CIFS) and the central file storage. You know? The usual file sharing concept, but this traditional approach limits the users to only computers, not smart devices such as smartphones and tablets.

In the spirit of DropBox, I believe a Novell Filr client (computers, smart devices etc) speaks with the Novell Filr “middleware” with standard RESTful API, over HTTP. I still need to ascertain this because I have not had any engagement with Novell yet, nor have I seen the product. In the slides given to me, the explanation at 10,000 feet is shown below.

I will share more details later once I have more information.

At the same time, I cannot help but notice this changing trend of NAS. It seems to me that many of the traditional NAS ideas going the way of the REST protocol, especially in a object-based “file” access. In fact, the definition of a “file” would also be changing into a web object. While the tide has certainly rising on this subject, we shall see how it pans out as SMB 2.0 and NFS version 4.0 start making inroads to replace the NAS protocols of CIFS 1.1 and NFS version 3.0.

As I mentioned previously, this is not disruptive to me and I know of several vendors already have developments similar to this. But the fundamental shift of users behaviors to the Web 3.0 type of data, files and information access might be addressed well with the Novell Filr.

I can’t wait for the hands-on and demo, knowing that much can be addressed in the enterprise file management space by changing the users habits, in a subtle but definitely more effective way.

Posted in Filesystems, Novell

5 Comments

Tags: File System Management, Novell Filr

Search for:
Top Posts & Pages
- Copy-on-Write and SSDs - A better match than other file systems?
- The definition of Cloud Computing ... really
May 2024

M T W T F S S

1 2 3 4 5

6 7 8 9 10 11 12

13 14 15 16 17 18 19

20 21 22 23 24 25 26

27 28 29 30 31

« Mar
Menu
Topics
- 10Gigabit Ethernet
- Acquisition
- Amazon
- Analytics
- Apple
- Appliance
- Atempo
- Avere
- Backup
- Big Data
- Bluecoat
- Brocade
- Cloud
- Commvault
- Data
- Data Corruption
- Deduplication
- Dell
- Dennis Ritchie
- Disks
- Dropbox
- EMC
- Falconstor
- Filesystems
- Fujitsu
- Gartner
- Green
- Hadoop
- HP
- IBM
- IDC
- iSCSI
- Joyent
- Kaminario
- Memory Cloud
- Nasuni
- NetApp
- NFS
- Novell
- Object Storage
- Openstack
- Oracle
- Performance Benchmark
- Performance Caching
- Rackspace
- RAID
- Reliability
- SCSI
- Seagate
- Security
- Snapshots
- SNIA
- Solaris
- Solid State Devices
- Starboard Storage
- Steve Jobs
- Storage Magazine
- Storage Market Share
- Storage Optimization
- Storage Tiering
- Tapes
- Uncategorized
- Unified Storage
- Virtualization
- VMware
Archives
Meta
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 71 other subscribers
Blog Stats
- 294,897 hits

Storage Gaga

Going Ga-ga over storage networking technologies ….

Category Archives: Filesystems

VAAI to go!

A little yellow elephant

Scheduling

Fair scheduler

Capacity scheduler

btrfs – a better butter from a better COW?

NetApp SPECSfs record broken in 13 days

The future is intelligent objects

Stop stroking your …

Ocarina rising

Novell Fil(e)r … Files, my way

Top Posts & Pages

Menu

Topics

Archives

Meta

Follow Blog via Email

Blog Stats