Unless you are working with highly, parallelized access to files in a large scale-out NAS environment, you probably don’t get to work much with Parallel NFS (pNFS). pNFS is part of the NFSv4.1 (RFC 5661) standard, and was introduced in January 2010 to address NFS protocol in the clustered, scale-out NAS environment. It is to provide parallel file access across distributed servers.
pNFS is heavily driven by Panasas, NetApp, EMC, IBM, Sun (now Oracle) among others. And funnily enough, the company that sticks out from the bunch is one that used to tout block storage as the way to go, not files. That’s EMC, the company that more well known for its SAN solutions than its NAS (remember Celerra and IP4700?). And EMC has embraced pNFS in a big, big way. Read EMC’s CTO for Global Marketing, Chuck Hollis’ blog here and here.
However, unknown to many, a lot of the thinking that goes into pNFS are very similar to an EMC product some years ago. That product is EMC Highroad, which in the later years, renamed as Multi-path File System (MPFS).
Note: If you want to know more about the history of HighRoad/MPFS, read this blog.
The cornerstone of EMC MPFS is their File Mapping Protocol or FMP, which is a robust protocol that lines the mapping of files to their addressable blocks in storage. In a nutshell, when I was made responsible for this product during my time at EMC, I used to pitch to companies that MPFS was a file request is through NFS but respond to the requester can be in blocks (iSCSI or Fibre Channel). The beauty of this was, NFSv3 was chatty and heavy but the delivery of data through blocks via iSCSI or Fibre Channel has lesser overhead compared to NFSv3.
Hence the delivery is faster and EMC touted that the performance was 2-4x faster than NFS. Indeed, I have seen some lab tests results from EMC’s work with Schlumberger High Performance Lab in Houston, and the numbers were impressive. I still have them on Powerpoint somewhere.
In circa of 2003-2004, EMC donated the FMP code to IETF and as they say the rest was history.
The picture below basically summarizes what pNFS is all about.
A NFSv4 client will basically check with a Metadata server via the pNFS protocol about the layout of the distributed cluster of server. The file, could be striped across multiple nodes of the cluster, and it is the metadata server that provides a map to the client to access the blocks or files from these nodes. This is exactly what the EMC HighRoad/MPFS File Mapping Protocol (FMP) did, mapping the file requests to its respective corresponding blocks. See diagram below:
And one of the powerful feature of pNFS is that it is not just about NFS. The green arrow you see in the above diagram is the storage-access protocol. That access protocol can be NFSv4.1, CIFS, iSCSI, Fibre Channel, FCoE, Infiniband, and Object Storage Device (OSD).
In order to have pNFS working, the NFS client must be NFSv4.1 ready and that code has been made available in Linux and OpenSolaris. Other Unix vendors, no doubt, will be coming out with their NFSv4.1 implementation soon. Oooooh, there will be a Windows NFSv4.1 client coming as well!
But I want to dispel the notion that EMC is a SAN company. EMC is a very strong NAS company and if you have seen the IDC market share (ok, ok, many of you out there will argue about it), EMC is #1 in NAS. And their contribution to pNFS is immense.
I was up at about 1am several nights ago listening to the SNIA update of NFSv4 and it was worth losing sleep. I was both intrigued and interested to see a full scale NFSv4 adoption in the coming few years.
We all know that NFS has been around for more than 20 years already. Version 2 was released way back in 1989, with version 3 being around since 1995. That’s a long time, and it is beginning to show its age. NFS version 4 is not new as well. Believe or not, it was released in 2000, 11 years ago. But there was a significant update in 2010, with NFS version 4.1 and this came with parallel NFS (pNFS) support to address new requirements such as scale-out NAS and file striping across clusters of nodes.
I am doing my responsible bit to disseminate NFS version 4.x updates and moving the storage networking and filesystem community towards its imminent wide scaled adoption. This is one of the many entries I intend to share about NFS version 4.
So, why this librarian thingy? First of all, for those folks working on NFSv3, you would probably encounter issues about file locking and also high availability.
Don’t you just hate it when the server reboots, and your NFS mount point hangs? Depending if it was a hard mount or soft mount, the NFS retries could take forever or sometimes, the NFS clients just freezes. In additional to that, another frequent complaint is that NFSv3 has lousy file locking.
However, in the beginning years of NFS, the world was a very different place. Such issues about file locking and HA were very well addressed by NFSv2/v3 because the demands of the previous client-server world were lesser. As the world progressed in the 2nd millenium, NFS v2/v3 started to sputter.
In this lesson#1, I would like to share about 2 key features of NFSv4 to address the 2 issues I brought up – which is NFS HA and file locking. The 2 features are
- The NFS client leases a file lock from the NFS server for a certain period of time, eg. N seconds. It renews the lock with after the N seconds period has expired.
- If the NFS client fails, the lock is reclaimed by the server and released by the NFS server to other clients after a grace period
- If the NFS server fails and rebooted, all the files are locked for M seconds for the incumbent NFS clients to reclaim the locks. If they are not reclaimed by the respective client, the file lock is released.
- NFS client request a piece of the filesystem. The NFS server “lends” this piece to the client if it is not already locked, as a lease.
- NFS client works on this piece in its local cache
- Once the NFS client has completed the writes and commits to the piece of filesystem on the local cache, it releases the “borrowing” lease back to the NFS server
- If other NFS clients requests for the same piece of filesystem while it out on loan, the NFS server would say “Sorry, I loaned it out”
- If there is a high order authority requesting for the piece of filesystem, the NFS server would say to the NFS client, “I need this back” and will send an order to recall the filesystem
NFS has been around for almost 20 years and is the de-facto network file sharing protocol for Unix and Linux systems. There has been much talk about NFS, especially in version 2 and 3 and IT guys used to joke that NFS stands for “No F*king Security”.
NFS version 4 was different, borrowing ideas from Windows CIFS and Carnegie Mellon’s Andrew File System. And from its inception 11 years ago in IETF RFC 3010 (revised in 2003 with IETF RFC 3530), the notable new features of NFSv4 are:
Performance enhancement – One key enhancement is the introduction of the COMPOUND RPC procedure which allows the NFS client to group together a bunch of file operations into a single request to the NFS server. This not only reduces the network round-trip latency, but also reduces the small little chatters of the smaller file operations.
Removing the multiple daemons in NFSv3 – NFSv3 uses various daemons/services and various protocols to do the work. There isportmapper (TCP/UDP 111) which provides the port numbers for mounting and NFS services. There’s the mountd (arbitrary port byportmapper) service that does the mounting of NFS exports for the NFS clients. There’s nfsd on TCP/UDP 2049and so on. The command ‘rpcinfo -p’ below shows all the ports and services related to NFS
There are other features as well such as
Firewall friendly – The use ofportmapper dishing out arbitrary ports made it difficult for the firewall. NFSv4 changed that by consolidating most of the TCP/IP services into well-known ports which the security administrator can define in the firewall.
Stateful – NFSv3 is stateless and it does not maintain the state of the NFS clients. NFSv4 is stateful and implements a mandatory locking and delegation mechanisms. Leases for locks from the servers to the clients was introduced. A lease is a time-bounded grant for the control of the state of a file and this is implemented through locks.
Mandated Strong Security Architecture - NFSv4 requires the implementation of a strong security mechanism that is based on crytography. Previously the strongest security flavour was AUTH_SYS, which is level 1 clearance.
"The AUTH_SYS security flavor uses a host-based authentication model where the client asserts the user's authorization identities using small integers as user and group identity representations (this is EXACTLY how NFSv3 authenticates by default). Because of the small integer authorization ID representation, AUTH_SYS can only be used in a name space where all clients and servers share a uidNumber and gidNumber translation service. A shared translation service is required because uidNumbers and gidNumbers are passed in the RPC credential; there is no negotiation of namespace in AUTH_SYS."
NFSv4 security mechanism is based on RPCSEC_GSS, a level 6 clearance. RPCSEC_GSS is an API that is more than an authentication mechanism. It performs integrity checksum and encryption in the entire RPC request and response operations. This is further progressed with the integration of Kerberos v5 for user authentication. This is quite similar to Windows CIFS Kerberos implementation, providing a time-based ticket to gain authentication.
In addition to that, there are many other cool, new features with NFSv4. There was a further extension to NFSv4 last year, in 2010, when NFSv4.1 was added in IETF RFC5661. As quoted in Wikipedia – NFSv4.1 “aims to provide protocol support to take advantage of clustered server deployments including the ability to provide scalable parallel access to files distributed among multiple servers (pNFS extension).”
NFSv4 has much to offer. The future is bright.