Playing with NetApp … After Rightsizing
It has been a tough week for me and that’s why I haven’t been writing much this week. So, right now, right after dinner, I am back on keyboard again, continuing where I have left off with NetApp’s usable capacity.
A blog and a half ago, I wrote about the journey of getting NetApp’s usable capacity and stopping up to the point of the disk capacity after rightsizing. We ended with the table below.
|Manufacturer Marketing Capacity||NetApp Rightsized Capacity|
* The size of 34.5GB was for the Fibre Channel Zone Checksum mechanism employed prior to ONTAP version 6.5 of 512 bytes per sector. After ONTAP 6.5, block checksum of 520 bytes per sector was employed for greater data integrity protection and resiliency.
At this stage, the next variable to consider is RAID group sizing. NetApp’s ONTAP employs 2 types of RAID level – RAID-4 and the default RAID-DP (a unique implementation of RAID-6, employing 2 dedicated disks as double parity).
Before all the physical hard disk drives (HDDs) are pooled into a logical construct called an aggregate (which is what ONTAP’s FlexVol is about), the HDDs are grouped into a RAID group. A RAID group is also a logical construct, in which it combines all HDDs into data or parity disks. The RAID group is the building block of the Aggregate.
So why a RAID group? Well, first of all, (although likely possible), it is not prudent to group a large number of HDDs into a single group with only 2 parity drives supporting the RAID. Even though one can maximize the allowable, aggregated capacity from the HDDs, the data reconstruction or data resilvering operation following a HDD failure (disks are supposed to fail once in a while, remember?) would very much slow the RAID operations to a trickle because of the large number of HDDs the operation has to address. Therefore, it is best to spread them out into multiple RAID groups with a recommended fixed number of HDDs per RAID group.
RAID group is important because it is used to balance a few considerations
- Performance in recovery if there is a disk reconstruction or resilvering
- Combined RAID performance and availability through a Mean Time Between Data Loss (MTBDL) formula
Different ONTAP versions (and also different disk types) have different number of HDDs to constitute a RAID group. For ONTAP 8.0.1, the table below are its recommendation.
So, given a large pool of HDDs, the NetApp storage administrator has to figure out the best layout and the optimal number of HDDs to get to the capacity he/she wants. And there is also a best practice to set aside 2 HDDs for a RAID-DP configuration with every 30 or so HDDs. Also, it is best practice to take the default recommended RAID group size most of the time.
I would presume that this is all getting very confusing, so let me show that with an example. Let’s use the common 2TB SATA HDD and let’s assume the customer has just bought a 100 HDDs FAS6000. From the table above, the default (and recommended) RAID group size is 14. The customer wants to have maximum usable capacity as well. In a step-by-step guide,
- Consider the hot sparing best practice. The customer wants to ensure that there will always be enough spares, so using the rule-of-thumb of 2 HDDs per 30 HDDs, 6 disks are set aside as hot spares. That leaves 94 HDDs from the initial 100 HDDs.
- There is a root volume, rootvol, and it is recommended to put this into an aggregate of its own so that it gets maximum performance and availability. To standardize, the storage administrator configures 3 HDDs as 1 RAID group to create the rootvol aggregate, aggr0. Even though the total capacity used by the rootvol is just a few hundred GBs, it is not recommended to place data into rootvol. Of course, this situation cannot be avoided in most of the FAS2000 series, where a smaller HDDs count are sold and implemented. With 3 HDDs used up as rootvol, the customer now has 91 HDDs.
- With 91 HDDs, and using the default RAID group size of 14, for the next aggregate of aggr1, the storage administrator can configure 6 x full RAID group of 14 HDDs (6 x 14 = 84) and 1 x partial RAID group of 7. (91/14 = 6 remainder 7). And 84 + 7 = 91 HDDs.
- RAID-DP requires 2 disks per RAID group to be used as parity disks. Since there are a total of 7 RAID groups from the 91 HDDs, 14 HDDs are parity disks, leaving 77 HDDs as data disks.
This is where the rightsized capacity comes back into play again. 77 x 2TB HDDs is really 77 x 1.69TB = 130.13TB from an initial of 100 x 2TB = 200TB.
If you intend to create more aggregates (in our example here, we have only 2 aggregates – aggr0 and aggr1), there will be more consideration for RAID group sizing and parity disks, further reducing the usable capacity.
This is just part 2 of our “Playing with NetApp Capacity” series. We have not arrived at the final usable capacity yet and I will further share that with you over the weekend.