btrfs – a better butter from a better COW?
There’s better a lot of chatter about the default file system in the recently released Fedora 16. Fedora 16 was released on November 8, 2011. For months, there were rumours that the default file system for Fedora 16 will be btrfs (btree file system, or better known as butter-FS). But after the release, the default file system of Fedora 16 is still ext4, much to the dismay of many. btrfs holds a lot of potential because in the space of less than 4 years, it has moved up the hierarchy to be one of the top file systems in the Linux ecosystem.
File systems in Linux has not seen had a knight in shining armour for the long time. The file systems kings of the Linux world were ext2/3 for the RedHat and Debian flavoured distros and reiserfs for the SuSE flavoured distros. ext4 is now default in most Linux distros but the concept of ext2/3/4 has not changed much since its inception decades ago.
At the same time, reiserfs had a lot of promise as well, but its development and progress have lost it lustre after its principal developer, Han Reiser was convicted of the murder of his wife a few years ago. If you are KPC (Malaysian Chinese colloquialism, meaning busybody), you can read the news here.
btrfs is going to be the new generation of file systems for Linux and even Ted T’so, the CTO of Linux Foundation and principal developer admitted that he believed btrfs is the better direction because “it offers improvements in scalability, reliability, and ease of management”.
For those who has studied computer science, B-Tree is a data structure that is used in databases and file systems. B-Tree is an excellent data structure to store billions and billions of objects/data and is able to provide fast data retrieval in logarithmic time. And the B-Tree implementation is already present in some of the file systems such as JFS, XFS and ReiserFS. However, these file systems are not shadow-paging filesystems (popularly known as copy-on-write file systems).
You see, B-Tree, in its native form of implementation, is very incompatible with COW file systems. In fact, the implementation was thought of impossible, until someone by the name of Ohad Rodeh came along. He presented a paper in Usenix FAST ’07 which described the challenges of combining the B-Tree concept with shadow paging file systems. The solution, as he described it, was to introduce insert/remove key into the tree structure, and removing the dependency of intra-leaf linking.
Chris Mason, one of the developers of reiserfs, took Ohad’s idea and created a shadow-paging file system based on the B-Tree idea.
Traditional file systems tend to follow the idea of the Berkeley Fast File Systems. Different cylinder groups have its own inode, bitmap and disk blocks. The used space in one cylinder group cannot be shared to another cylinder group, resulting in wastage. At the same time, performance can be an issue as the disk read/head frequently have to move to the inodes to find out where the next used or free blocks will be. In a way, it looks like the diagram below.
Chris Mason’s idea of btrfs made the file system looked like this:
Today, a quick check in btrfs wiki page, shows that the main Btrfs features available at the moment include:
- Extent based file storage
- 2^64 byte == 16 EiB maximum file size
- Space-efficient packing of small files
- Space-efficient indexed directories
- Dynamic inode allocation
- Writable snapshots, read-only snapshots
- Subvolumes (separate internal filesystem roots)
- Checksums on data and metadata
- Compression (gzip and LZO)
- Integrated multiple device support
- RAID-0, RAID-1 and RAID-10 implementations
- Efficient incremental backup
- Background scrub process for finding and fixing errors on files with redundant copies
- Online filesystem defragmentation
And Chris Mason and his team have still plenty more to offer for btrfs. It will likely be the default file system in Fedora 17 and at the rate it is going, could be the file system of choice for RedHat, Debian and SuSE distros very soon.
There’s been a lot of comparisons between btrfs and ZFS, since both are part of Oracle now. ZFS is obviously a much more mature file system, with more enterprise features and more robust, (Incidentally, ZFS just celebrated its 10th birthday on Halloween 2011 – see Matt Ahren’s blog) and the btrfs is the rising star in the Linux world. But at this moment, the 2 file systems are set apart in their market positioning and deployment.
ZFS is based on the CDDL (Common Development and Distribution License) while btrfs is based on GNU GPL, open source licensing. There are controversies surrounding CDDL licensing scheme, while is incompatible with GNU GPL scheme.
Oracle can count itself very lucky to have 2 of the most promising and prominent COW file systems around. It will be interesting to see what Oracle will do next. As a proponent of innovation, community and sharing, I sincerely hope that both file systems will continue to thrive in Oracle’s brutal, sales-driven organization. We certainly don’t want to see controversies about dual ownership BS of Oracle and mySQL that could end in 2015.
Posted on November 21, 2011, in Filesystems, Oracle and tagged B-Tree, btrfs, Copy-on-Write, Debian, ext2, ext3, ext4, Fedora, file system, Linux, reiserfs, Shadow Paging, SuSE, Ubuntu. Bookmark the permalink. Leave a comment.