## The Situation

I’ve seen, over the last few weeks, more than a few posts on a popular IT hangout site proclaiming in a loud, evangelical voice that “RAID-5 is terrible for spinning disks, never use it! If you do, you’re a stupidhead!” and similar statements.

I’m here to tell you that’s not an appropriate answer. In fact, it’s tunnel-vision BS.

I’m also here to remind you that RAID is not a backup – it is *avoidance of downtime*, and it is *reliability of storage.* If you are relying on a RAID array to protect you from data loss, you need to add some extra figures to your budget. **You cannot have your production system also be your backup repository.** If you think that you are safe because all your production data is on a RAID, and you don’t bother with a proper backup, you are going to be in deep kimchee when you have a serious issue with your array.

Now, I suspect there is a kernel of truth inside the concern here – it seems to stem from an article written last year whose theme was “Is this the end of RAID-5” or something similar. That article was quite accurate in its point – that with the escalating size of drives today, and the numbers of them we are using to produce our volumes, it is inevitable that a drive failure will occur – and that during a rebuild, it becomes a mathematical likelihood that a read error will result in a rebuild failure.

All quite true.

But in many of the conversations where I’ve seen the doomsayers trumpeting their end-of-the-world mantras, the volume sizes involved simply do not justify the fear.

Let’s take a realistic look at RAID fails, and figure out the real numbers, so we can all breathe a little calmer, shall we?

My goal for this article is that, by the time we’re done, you’ll be able to calculate the odds of data loss in your own RAID systems.

First off, we have to look at the risks we are mitigating with RAID: drive failures and read failures. Both come down to a small percentage chance of failure, which is best tied to the figures “Annualized Failure Rate” (AFR, the percentage of drives that die in a year) and “Unrecoverable Read Error” (URE, an attempt by an array to read a sector that fails, probably due to a bit error).

Google wrote a paper on drive failures about ten years ago, which showed that drives that don’t die in the first few months of life generally last five years or so before their AFR climbs to about 6%-8%, which is generally considered unacceptable for datacenter or other usage that requires reliability. As it happens, Backblaze (backblaze.com) is a data-center operator that regularly publishes its own empirical hard drive mortality stats, so you can update these figures in your own records using accurate data for the brands of drive you use.

The most current Backblaze chart as of the time of this writing can be found here: https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

So let’s begin, shall we?

During this article, I’m going to spell out several different scenarios, all real-world and all appropriate for both SMBs and personal operations. I have direct, hands-on experience with each of them, and it is my hope you’ll be able to perform the same calculations for the arrays within your own sphere of control.

Array 1: 4 Western Digital Red drives, 4TB each in a RAID-5 array.

Array 2: 4 HGST NAS drives, 8TB each in a RAID-5 array.

Array 3: 8 Western Digital Red drives, 6TB each in a RAID-6 array. (we’ll also run over this in RAID-5 just to be thorough)

Array 4: 12 Seagate Iron Wolf Pro drives, 10TB each in RAID-6 (as with the above, we’ll hit it at RAID-5 too)

Array 5: 12 Seagate Enterprise Capacity drives, 8TB each in RAID-6 (and RAID-5)

Array 6: 12 Seagate 300GB Savvio drives, RAID-5

Array 7: 7 Seagate 600GB Savvio drives, RAID-5

(Note: Enterprise Capacity drives have been re-branded by Seagate and now go by the name “Exos”)

We start by collecting failure rates for those drives: annualized failure rates from the empirical charts at Backblaze, and the average bit error rate. Note that AFR increases with age, high temperature, and power cycles; it decreases with things like using helium as a filler (despite this making all your data sound like it was recorded by Donald Duck). The bit error rate figures are drawn directly from the manufacturers’ sites (often listed as BER, “bit error rate”), so there will be some ‘wiggle room’ in our final derived figures.

| Drive | Annualized Failure Rate | Bit Error Rate |
|---|---|---|
| WD Red 4TB | 2.17% | 1 per $10^{14}$ bits |
| HGST NAS 8TB | 1.2% | 1 per $10^{14}$ bits |
| WD Red 6TB | 4.19% | 1 per $10^{14}$ bits |
| Iron Wolf Pro 10TB | 0.47% | 1 per $10^{15}$ bits |
| Iron EC 8TB | 1.08% | 1 per $10^{15}$ bits |
| Seagate Savvio .3TB | 0.44% | 1 per $10^{16}$ bits |
| Seagate Savvio .6TB | 0.44% | 1 per $10^{16}$ bits |

For reference, the reason why people often follow up the statement “RAID-5 is crap” with “unless you use an SSD” is because SSDs have a BER of around 1 per $10^{17}$ bits, so a read error on an SSD is extremely rare.

With these figures, and with the sizes of the arrays and their types known, we can prepare the variables of the equation we’ll build.

Num: Number of drives in the array

ALoss: Allowed loss – the number of drives we can afford to lose before unrecoverable data loss occurs.

AFR: Annualized Failure Rate (derived from empirical evidence)

URE: Unrecoverable Read Error; this is the same as “Bit Error Rate” above

MTTR: Mean time to repair – this will vary depending on your drive sizes, cage controller(s), memory, processor, etc. I’m going to just plug in “24 hours” here; you can put in whatever you feel is appropriate.

We’re also going to be playing a probability game with these: since we don’t know exactly when something is going to blow out on us, we can only work with statistical probability. To set the stage, let’s play with a few dice (and that’s something I know quite a bit about, having written a book on Craps some decades ago). We want to establish the probability of a particular event.

The probability of something = number of sought outcomes / number of total outcomes

Starting simple, we’ll use a six-sided die. We want to prepare an equation to determine the odds of *rolling a one on any of ten rolls.*

So our sought outcome is 1. Number of total outcomes is 6. That gives us 1/6, or 0.1667.

We’re trying ten times, which complicates matters. It’s not simply additive. It’s *multiplicative.* When we’re collating multiple independent events, we multiply the odds of each event against each other. The probability of two independent events A and B happening together, then, is Prob(A) * Prob(B). If we were asking “what are the odds of rolling a one on each of ten rolls” it would be pretty easy. But that’s not the question we’re asking.

The question we’re asking is *what are the odds of one or more of the rolls being a one?*

We have to invert our approach a bit. We’re going to start with 100% and subtract the chance of *not getting a 1.* If we determine the odds of avoiding a 1 on every single roll, then the chance of getting a 1 on at least one of our rolls is the complement of that. The odds of *not* getting a 1 on a single roll are 5/6, and there are ten tries being made, so we raise 5/6 to the 10th power. Then we simply subtract that from 100% to get our answer.

(5/6) raised to the 10th is (9,765,625 / 60,466,176), which is 0.1615 – I rounded a bit.

1-0.1615= 0.8385, which is our result. The odds of rolling a 1 on any of ten individual rolls is 83.85%.
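For readers who would rather let a computer do the arithmetic, here is a minimal Python sketch of the same complement trick: one minus the chance of missing on every try. It assumes each try is independent, like separate die rolls; the function name `prob_at_least_one` is just an illustrative choice.

```python
def prob_at_least_one(p_single: float, tries: int) -> float:
    """Chance that an event with per-try probability p_single happens at least once.

    Complement rule: 1 - P(it never happens) = 1 - (1 - p)^tries,
    assuming every try is independent.
    """
    return 1.0 - (1.0 - p_single) ** tries


# Rolling a 1 on a six-sided die at least once in ten rolls.
print(round(prob_at_least_one(1 / 6, 10), 4))  # 0.8385
```

The same function shows up again (under other names) for drive failures and read errors, since both boil down to “at least one bad outcome in N independent tries.”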

## RAID Types

A little backgrounder on types of RAID for the uninitiated here – and there’s no shame in not knowing, this stuff is pretty dry for all but the total platterhead. I guess that means I’m a bit of a dork, but what the hell.

RAID means “Redundant Array of Inexpensive Disks” and first became popular commercially in the late ‘80s and early ‘90s, when hard drives were becoming economically a big deal. Previously, a strategy called “SLED” was considered the go-to model for storage, and it represented “Single Large Expensive Disk”. RAID took over, because it was a lot more economical to bond multiple inexpensive units into an array that offered capacity equal to a drive which would cost far more than the combined cost of the RAID.

Different RAID types offer different advantages. Importantly, *all of them are considered for use as volumes,* just like you’d consider a hard drive. These aren’t magic, they’re just volumes. How you use them is up to you. *When you store production data on them, they need to be backed up using smart backup practice.*

Most mentions you’ll see regarding RAID include various numbers, each of which means something:

__RAID 0__ – this form of RAID uses at least two disks and “stripes” data across all of them. This offers fast read and write performance. Usually this RAID limits its use of each physical drive to the size of the smallest in the group (so if you have three 4TB and one 6TB, it will generally only use 4TB of the 6TB drive). It also provides the used capacity in full for storage, so three 4TB drives will make a 12TB RAID 0 volume. This RAID adds vulnerability: if any one of the drives in the array is lost, you lose data.

__RAID 1__ – this is “mirroring”. It uses an even number of disks (usually just two), and makes an exact copy of volume data on each drive. They don’t have to be the same size, but the volume will only be as big as the smallest drive. The benefits are fast reading (no benefit in write speed) and redundant protection – if you lose a drive, you still have its mirror. It is also fast to create, as adding a second drive only requires that the new drive receive a copy of the other. Performance is limited by the speed of the slowest member of the array. This method gives up 50% of the total drive capacity to form the mirror.

__RAID 2__ – it’s unlikely you’ll ever see this in your life. It uses dedicated disks for error-correction (Hamming code) information in case of loss of a data disk. It was capable of super-fast performance, but it depended on synchronizing the spin of all disks with each other.

__RAID 3__ – Also extremely rare, this one is good for superfast sequential reads or writes, so perhaps would be good for surveillance camera recording or reading extended video tracks. This also uses a parity disk similar to RAID 2.

__RAID 4__ – another rare one, suitable for lots of little reads, not so hot for little writes, also uses a dedicated parity disk like 2 & 3.

__RAID 5__ – this is currently the most common form of RAID. It stripes data among all its drives, just like RAID 0, but it also dedicates a portion of the array equal to the capacity of one of its disks to parity information, and stripes that parity information among all disks in the array. This differs from the previous forms of parity, which used a single disk to store all parity info. RAID 5 can withstand the loss of any one disk without data loss from the array’s volumes, but a second drive loss will take data with it. This array type has a write-speed advantage over a single disk, though not as large as RAID 0’s, since it has to calculate and record parity info.

__RAID 6__ – this basically takes the idea of striped parity in RAID 5 and adds redundancy to it: this array stores two independently calculated sets of parity info, enabling it to resist the loss of two drives without data loss.

__RAID 10__ – this is actually “nested” RAID, a combination of 1 (mirroring) and 0 (striping). This requires at least four disks, which are mirrored in pairs and then striped. Usually this is done for performance, and some data protection. It’s a little bit more protected than RAID 5, in that it can withstand the loss of one drive reliably, and if it loses a second, there’s a chance that second drive won’t cause data loss. However, this one gives up 50% of the total drive capacity to the mirror copies.

There are also a series of other nested forms of RAID, but if you need those you’re well past the scope of this article.
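To make the capacity trade-offs concrete, here is a rough Python sketch of usable space for the common levels above. It assumes, as described for RAID 0, that every member is truncated to the smallest drive, treats RAID 1 as a simple mirror, and ignores filesystem and controller overhead; `usable_capacity_tb` is a hypothetical helper, not part of any tool.

```python
def usable_capacity_tb(level: str, drive_sizes_tb: list[float]) -> float:
    """Rough usable capacity for common RAID levels (smallest drive wins)."""
    n = len(drive_sizes_tb)
    smallest = min(drive_sizes_tb)
    # level -> (minimum drive count, drives' worth of usable data capacity)
    rules = {
        "0": (2, n),        # striping only, no redundancy
        "1": (2, 1),        # mirror: capacity of a single member
        "5": (3, n - 1),    # one drive's worth of distributed parity
        "6": (4, n - 2),    # two drives' worth of distributed parity
        "10": (4, n // 2),  # striped mirrors: half the drives hold copies
    }
    if level not in rules:
        raise ValueError(f"unsupported RAID level: {level}")
    min_drives, data_drives = rules[level]
    if n < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == "10" and n % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return data_drives * smallest


print(usable_capacity_tb("0", [4, 4, 4, 6]))  # 16 (the 6TB drive only contributes 4TB)
print(usable_capacity_tb("5", [4] * 4))       # 12 (Array 1)
print(usable_capacity_tb("6", [6] * 8))       # 36 (Array 3)
```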

## Parity

In RAID terminology, “parity” is a value calculated from the combination of bits across the disks in the array (most famously with an XOR calculation, but different vendors can stray from this), and the resulting bit is recorded as the parity bit.

In the image here of a RAID 6 array, the first bit of stripe A’s parity would be generated by taking the first bit of each of A1, A2, and A3 and performing a sequential XOR calculation on them. That result is recorded in Ap, while Aq holds a second, independently calculated parity value (more on that below). Later, if a disk fails – say Disk 0 bites it – then the system can read the bits in A2, A3, and Ap (or Aq) to figure out what belongs where A1 used to be. When a new drive replaces the failed Disk 0, that calculation is run for every bit on the disk, and the new drive is “rebuilt” to where the old one was.

There’s also an important point to be made about the *types* of parity you’re looking at in that image. There are multiple ways to calculate the parity bit that is being used. In RAID 5, the most common is an XOR calculation. In this method “bit number 1” on each data stripe is XOR’ed with the next one, and then the next, etc. until you reach the parity stripe, and the result is then recorded there. Effectively this is a “horizontal” line drawn through each disk, ending in the parity stripe. So when you need to know what was on that data disk (whether rebuilding or just reading), it can be reconstructed by working that XOR equation backward.
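Here is a minimal Python sketch of that XOR idea, working on bytes rather than single bits (XOR is bitwise, so each byte is just eight of those “horizontal lines” at once). The block contents and the helper names `xor_parity` and `rebuild_missing` are made up for illustration; real controllers do this in firmware or hardware across whole stripes.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost block: XOR the surviving blocks with the parity block."""
    return xor_parity(surviving + [parity])


# Hypothetical stripe with three data blocks A1, A2, A3 plus parity Ap.
a1, a2, a3 = b"\x0f\xf0\x55", b"\x33\x33\xaa", b"\xc3\x01\x10"
ap = xor_parity([a1, a2, a3])

# "Disk 0 bites it": rebuild A1 from A2, A3 and Ap.
assert rebuild_missing([a2, a3], ap) == a1
```

Because XOR is its own inverse, the same function that builds the parity also does the rebuild; that is the whole trick behind single-parity RAID.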

And then…the gods rose from R’lyeh to lay down RAID 6 parity.

Most RAID-6 uses an encoding method for its extra parity called “Reed-Solomon” (this method is used in a lot of data-reading applications, like barcode scanners, DVD readers, and low-bandwidth radio data transmission). This method manages to record parity against a second missing piece using other data in the array – RS encoding builds its parity using an algorithm that generates something like a scattergram of source bits, both vertical and horizontal (which makes it resistant to the loss of a second data disk – if it just copied the XOR result of the first disk then a second data disk would corrupt the intent). I’m not going to pretend I understand the Galois Field and other heavy-duty math behind this stuff, I just know it exists, it is commonly used for RAID-6, and Dumbledore or the Old Ones were probably involved somewhere along the way. It costs more CPU- and IO-wise, which is why it isn’t commonly used in RAID-5.

(I say “Most” RAID-6, because other vendors can use different methods – for example, Adaptec has their own proprietary algorithm in their hardware controllers, different from RS, but the functional result to us as users is the same.)
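For the curious, here is a small Python sketch of one common way to build that second parity, assuming the P+Q arrangement described for Linux’s software RAID-6: P is the plain XOR, and Q weights data disk i by 2^i using multiplication in GF(2^8), the 256-element Galois field (reduced here by the polynomial 0x11D). With both P and Q, any two lost data disks can be solved for. Other vendors may use different codes, and every helper name below is hypothetical.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the reducing polynomial assumed here


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): shift-and-add, with XOR as addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # overflowed 8 bits, so reduce by the polynomial
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Nonzero elements form a group of order 255, so a^254 is a's inverse."""
    return gf_pow(a, 254)


def pq_parity(disks: list[list[int]]) -> tuple[list[int], list[int]]:
    """P = XOR of all data bytes; Q = XOR of 2^i * D_i in GF(2^8)."""
    p = [0] * len(disks[0])
    q = [0] * len(disks[0])
    for i, disk in enumerate(disks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(disk):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return p, q


def recover_two(disks, x: int, y: int, p: list[int], q: list[int]):
    """Rebuild lost data disks x and y (x != y) from the survivors plus P and Q."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)  # nonzero because 2^x != 2^y for x != y (< 255 disks)
    dx, dy = [], []
    for k in range(len(p)):
        pxy, qxy = p[k], q[k]
        # Strip out the surviving disks' contributions.
        for i, disk in enumerate(disks):
            if i not in (x, y):
                pxy ^= disk[k]
                qxy ^= gf_mul(gf_pow(2, i), disk[k])
        # Now Dx ^ Dy = pxy and 2^x*Dx ^ 2^y*Dy = qxy; solve for Dx, then Dy.
        bx = gf_mul(qxy ^ gf_mul(gy, pxy), inv)
        dx.append(bx)
        dy.append(pxy ^ bx)
    return dx, dy


data = [[0x10, 0x20, 0x30], [0x0A, 0xFF, 0x00], [0x7E, 0x01, 0x99], [0x42, 0x42, 0x42]]
p, q = pq_parity(data)
# Lose disks 1 and 3 at the same time, then rebuild both.
rebuilt = recover_two([data[0], None, data[2], None], 1, 3, p, q)
assert rebuilt == (data[1], data[3])
```

The extra Galois-field multiplications per byte are a decent illustration of why RAID-6 costs more CPU than plain XOR parity.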

## Data Loss

What is it that takes us into data loss territory? Obviously, dropping the entire cage while powered up and running will get us there fast. Let’s make the assumption that if something along those lines were to occur, you’d have an entirely different set of problems, and you wouldn’t have time to be perusing this article. Instead, we’ll focus on natural wear-and-tear. To get to data loss, there are two steps:

- Initial drive failure(s), and…
- Enough further drive failures, at any point before the data is preserved, to exceed our allowed loss (ALoss)

Some other topics we’ll talk about:

- Possible drive failure during rebuild (I’ll tell you towards the end why you should be cautious before starting that rebuild)

…and/or…

- Read error during rebuild (another reason the rebuild requires caution)

This brings me to a very important point, and one around which this entire discussion revolves: protecting your data. I think the entire “RAID-5 is poopy” argument stems from forgetting that one must never rely on RAID levels as the only protection of your data. RAID serves to make you a nice big volume of capacity, and protects your uptime with some performance benefits.

*It does not magically provide itself with backup*. You have to back it up just like anything else.

So if you’re creating a 3TB array, get something that can back that array up and has the capacity on reliable forms of storage to keep your data safely.

## Drive Failure

### First Failure

The odds of the initial drive failure compound the AFR across the number of drives, and we’ll figure them on an annual basis. This part is pretty simple; let’s go back to our dice equation and substitute drive values:

Drive Loss Rate = *what are the odds at least one drive will die in a year?*

If it’s just one drive, that’s easy – use the AFR.

But it’s multiple drives, so we have to approach it backwards like we did with dice rolls.

So it’s $1 - (1 - AFR)^N$, where N is the number of drives.

For my Array 1, for example: those WDs have an AFR of 0.0217. Plugging this into the equation above yields:

$1 - (1 - 0.0217)^4 = 100\% - 91.60\% = 8.40\%$

So I have about an 8.4% chance of losing a drive in a given year. This will change over time as the drives age, etc.
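The same complement rule for drives, as a short Python sketch. It assumes every drive fails independently at a constant AFR, which real drives only approximate; `first_failure_odds` is an illustrative name, not from any library.

```python
def first_failure_odds(afr: float, drives: int) -> float:
    """Chance that at least one of `drives` fails within a year: 1 - (1 - AFR)^N.

    Assumes every drive fails independently at the same annualized rate.
    """
    return 1.0 - (1.0 - afr) ** drives


print(f"{first_failure_odds(0.0217, 4):.2%}")  # Array 1 (4 x WD Red 4TB): 8.40%
print(f"{first_failure_odds(0.0217, 1):.2%}")  # a single drive is just its AFR: 2.17%
```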

### Drive Failure Before Preservation

So let’s assume I lost a drive. I’m now ticking with no redundancy in my array, and what are my chances of losing another to cause data loss during the window of time I have to secure my data?

This one is also pretty simple – it’s the same calc we just did, but we’re doing it only for the gap-time before we preserve the data and for the remaining drives in the array. Let’s use two examples – 24 hours, and two weeks.

24 hours: $1 - (1 - AFR \times 0.00274)^N$

Where AFR is the AFR of the drive and N is the number of drives remaining. The 0.00274 is the fraction of a year represented by 24 hours (24 / 8,760).

2 weeks: $1 - (1 - AFR \times 0.0384)^N$

0.0384 is the fraction of a year represented by 2 weeks.

If it’s my Array 1, then we’re working with WD Reds, which have a 0.0217 AFR. I lose a drive, and I have three left. Plugging those values in results in:

24 hours: $1 - (1 - 0.0217 \times 0.00274)^3 = 1 - (0.9999405)^3 \approx 0.000178$, or a 0.0178% chance of failure

2 weeks: $1 - (1 - 0.0217 \times 0.0384)^3 = 1 - (0.9991667)^3 \approx 0.002498$, or a 0.2498% chance of failure

We now know what it will take for my Array 1 to have a __data loss__ failure: 8.40% (chance of initial drive failure) times the chance of failure during the gap when I am protecting my data. Assuming I’m a lazy bastard, let’s go with 2 weeks, 0.2498%.

That data loss figure comes out to be 0.021%. A little bit more than two chances in ten thousand.
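Here is a Python sketch of that two-step reasoning for Array 1: yearly odds of a first failure, times the odds of another failure among the survivors during the exposure window. Like the formulas, it scales the AFR linearly to the window, using 0.00274 for one day and 0.0384 for two weeks; the helper names are hypothetical.

```python
def at_least_one(p: float, n: int) -> float:
    """1 - (1 - p)^n: chance that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p) ** n


def window_failure_odds(afr: float, remaining: int, year_fraction: float) -> float:
    """Chance another of the remaining drives dies inside the exposure window.

    Scales the AFR linearly to the window, as in 1 - (1 - AFR * fraction)^N.
    """
    return at_least_one(afr * year_fraction, remaining)


def raid5_loss_odds(afr: float, drives: int, year_fraction: float) -> float:
    """Yearly odds of a first failure times odds of a second inside the window."""
    return at_least_one(afr, drives) * window_failure_odds(afr, drives - 1, year_fraction)


ONE_DAY, TWO_WEEKS = 0.00274, 0.0384  # fractions of a year

print(f"{window_failure_odds(0.0217, 3, ONE_DAY):.4%}")    # 0.0178%
print(f"{window_failure_odds(0.0217, 3, TWO_WEEKS):.4%}")  # 0.2498%
print(f"{raid5_loss_odds(0.0217, 4, TWO_WEEKS):.3%}")      # 0.021%
```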

Based on that, I’m pretty comfy with RAID-5. Especially since I take a backup of that array every night.

## Unrecoverable Read Error

This figure is generally the one that strikes fear into people’s hearts when talking about RAID-5. I want to establish the odds of a read error occurring during the rebuild, so we can really assess what the fearful figure is:

What is Bit Error Rate? In simple terms, BER is calculated as (# of errors / total bits sent or read). Let’s find a way to translate these minuscule numbers into something our brains can grok, like a percentage.

To start, we’re reading some big quantities of data from hard drives, so let’s bring that into the equation too – there are 8 bits in a byte, and 1,000,000,000 bytes in a Gigabyte. Add three more zeroes for a Terabyte.

Be aware that some arrays can see a failure coming, and have the ability to activate a hot-spare to replace the threatened drive – most SAN units have this capacity, for example, and a lot of current NAS vendors do as well. If yours can’t, this is where you should be paying attention to your SMART health reports, so you can see it coming and take action beforehand. Usually that action is to install and activate a hot-spare. If you have a hot-spare and it gets activated, it receives a bit-for-bit copy of what’s on the failing drive, and then is promoted to take over the position of the failing disk. This avoids rebuild errors and is much faster than a rebuild, but it doesn’t protect from BER, so if there’s a bit error during the copy then the incorrect bit will be written to the new drive. This might not be a big issue, as many file formats can withstand an occasional error. Might even be that the error takes place on unused space.

Rebuilds of an array are another case entirely. The time required is much greater, since the array is reading *every single bit* from the remaining stripe data on the good drives, doing an XOR calc using the parity stripe to determine what the missing bit should be, and writing it to the new drive. During a rebuild, that bit error poses a bigger problem: we are unable to read, ergo we can’t do the XOR calc, and that means we have a rebuild failure.

(If we’re in RAID-1, by the way, that’s a block-for-block copy from a good drive to the new drive – bit error will end up copying rather than calculating, so there won’t be a failure, just bad data.)

If we had a hot spare, we’d be out of the woods before having to rebuild. But let’s keep looking at that rebuild.

Translating that BER into how likely a rebuild failure is…the math gets a little sticky.

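UREs, just like drive failures, are a matter of odds. Every bit you read is an independent event, with the odds of failure being the bit-read chance that we collected about the drive. The probability equation comes out looking like this:

$$P(\text{URE during rebuild}) = 1 - \left(1 - \frac{1}{BER}\right)^{\text{bits read}}$$

where BER is the “1 per X bits” figure from the table above.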

Let’s apply the probabilities we started with at the beginning of this article to the drives in my Array 1 now. A reminder: these are WD Red 4TB drives. Western Digital sets a BER value of 1 per $10^{14}$ bits read.

Array 1 blows a drive. I’ve got three left, and a new 4TB I popped into the array. I trigger the rebuild. We’ve already said 24 hours, so we’ll stick with that (technically it’s closer to 10h for a 4TB, but big deal).

Edit 10.10.2018 – I have identified a mistake in my calcs here, courtesy of the Spiceworks forum. Parity data is being read from more drives than I originally laid out. By the time you read this, the information below will have been corrected.

My array now has to perform *three* reads (two data and one parity) to get each value to be written to the new drive. So I’m actually reading three times the volume of the target drive.

4TB is 4,000,000,000,000 bytes. Three times that is 12,000,000,000,000. At 8 bits per byte, that means 96,000,000,000,000 bits. Which is a crap-ton of bits.

However, $10^{14}$ (the BER of our WD drives) is 100,000,000,000,000 bits. That’s an even bigger crap-ton. Not that much bigger, but bigger.

So let’s ask the question, and plug in the numbers. The question:

*During my rebuild, what are the odds of rolling a mis-read on any of my 96,000,000,000,000 reads?*

As before, let’s invert this question and ask instead, *what are the odds of getting no mis-read on any of our reads?* and then subtract that from 1.

The odds of a successful read on each of these reads are 99,999,999,999,999 / 100,000,000,000,000. We’re trying 96,000,000,000,000 times. Most of our PCs can’t raise something to the 96-trillionth power, I’m afraid. Even Excel’s BINOM.DIST will barf on numbers this size. You’re going to need a scientific calculator to get this done.

$$1 - \left(\frac{99{,}999{,}999{,}999{,}999}{100{,}000{,}000{,}000{,}000}\right)^{96{,}000{,}000{,}000{,}000} = 1 - (0.99999999999999)^{96{,}000{,}000{,}000{,}000}$$

(The scientific calculator at https://www.mathsisfun.com/scientific-calculator.html reports 0.38318679500580827 for that power, but that figure is slightly off, most likely due to floating-point rounding. Because the per-bit error chance p is so tiny, $(1-p)^n$ is extremely close to $e^{-np}$, and here $np = 96 \times 10^{12} / 10^{14} = 0.96$.)

$$1 - e^{-0.96} \approx 1 - 0.38289 = 0.61711$$

So the odds of a BER giving my Array 1 a bad case of indigestion are about 61.71%. That’s a pretty scary figure, actually, and I’ll get to the mitigation of it later. It’s this kind of figure that I think generally gives people enough of the willies to make that crazy “RAID-5 is for poopyheads!” proclamation. Very likely because the people who make that claim assume that this is the end of the road.
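If you want to check this without a web calculator, here is a Python sketch. A direct `(1 - 1e-14) ** 9.6e13` in double-precision floating point drifts slightly, because 1 - 1e-14 can’t be stored exactly; `math.log1p` and `math.expm1` avoid that. The sketch assumes decimal terabytes (as drive vendors use) and that every surviving drive is read in full; the helper names are hypothetical.

```python
import math

BITS_PER_BYTE = 8
BYTES_PER_TB = 10**12  # drive vendors use decimal terabytes


def rebuild_bits_read(drive_tb: float, drives_read: int) -> float:
    """Bits read during a rebuild if every surviving member is read end to end."""
    return drive_tb * BYTES_PER_TB * BITS_PER_BYTE * drives_read


def ure_odds(bits_read: float, bits_per_error: float) -> float:
    """Chance of at least one unrecoverable read error across bits_read reads.

    Computes 1 - (1 - 1/BER)^n as -expm1(n * log1p(-1/BER)), which keeps
    full precision when the per-bit error chance is this tiny.
    """
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


bits = rebuild_bits_read(4, 3)  # Array 1: three surviving 4TB drives
print(f"{bits:.3e} bits, URE odds {ure_odds(bits, 1e14):.2%}")
# 9.600e+13 bits, URE odds 61.71%
```

Swapping in `1e15` or `1e16` for `bits_per_error` shows how much a better-rated drive helps: the exponent shrinks tenfold each time.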

Thankfully, we’re looking at odds of *data loss*. Not necessarily *rebuild failure*, though that does factor into the odds of loss.

## The Equation for Data Loss

In order to have loss of data, basically we have to lose a number of drives that our array cannot tolerate, before we can protect or preserve that data.

Let’s say that window of time comes out to two weeks. That’s probably a lot more than we need, so it will inflate the odds to a conservative number. Two weeks is 336 hours, .038 of a year.

So given that, the basic odds of data loss are:

For RAID-5, we need to lose a second drive for data loss. That means odds of Initial loss * odds of another loss during window (remember that these are multiplicative, not additive). If all the arrays I mentioned above were RAID-5, and using the “lazy bastard” two-week window, here’s where we’d be:

| Array and # drives | Drive Type | Annualized Failure Rate | Odds of Initial Loss | Loss during Window | Total Chance |
|---|---|---|---|---|---|
| *Formula* | | | $1-(1-AFR)^N$ | $1-(1-AFR \times 0.0384)^{N-1}$ | Initial × Window Loss |
| Array 1 – 4 drives | WD Red 4TB | 2.17% | $1-(1-0.0217)^4$ = 8.40% | $1-(1-0.0217 \times 0.0384)^3$ = 0.25% | 0.00021, or 0.021% |
| Array 2 – 4 drives | HGST NAS 8TB | 1.2% | $1-(1-0.012)^4$ = 4.71% | $1-(1-0.012 \times 0.0384)^3$ = 0.138% | 0.0000651, or 0.0065% |
| Array 3 – 8 drives | WD Red 6TB | 4.19% | $1-(1-0.0419)^8$ = 29.00% | $1-(1-0.0419 \times 0.0384)^7$ = 1.12% | 0.00325, or 0.325% |
| Array 4 – 12 drives | Iron Wolf Pro 10TB | 0.47% | $1-(1-0.0047)^{12}$ = 5.50% | $1-(1-0.0047 \times 0.0384)^{11}$ = 0.198% | 0.000109, or 0.0109% |
| Array 5 – 12 drives | Iron EC 8TB | 1.08% | $1-(1-0.0108)^{12}$ = 12.217% | $1-(1-0.0108 \times 0.0384)^{11}$ = 0.455% | 0.000556, or 0.0556% |
| Array 6 – 12 drives | Seagate Savvio .3TB | 0.44% | $1-(1-0.0044)^{12}$ = 5.154% | $1-(1-0.0044 \times 0.0384)^{11}$ = 0.1857% | 0.0000957, or 0.00957% |
| Array 7 – 7 drives | Seagate Savvio .6TB | 0.44% | $1-(1-0.0044)^7$ = 3.04% | $1-(1-0.0044 \times 0.0384)^6$ = 0.1013% | 0.0000308, or 0.00308% |

I think the values above show definitively that RAID-5 is a perfectly viable storage mechanism.
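To reproduce the table (or plug in your own arrays), here is a short Python sketch. The drive counts and AFRs are copied from the table; the 0.0384 two-week window and the independence assumption are the same ones used above, and `raid5_loss` is an illustrative name.

```python
# (label, drive count, AFR) copied from the RAID-5 table above.
ARRAYS = [
    ("Array 1", 4, 0.0217),
    ("Array 2", 4, 0.012),
    ("Array 3", 8, 0.0419),
    ("Array 4", 12, 0.0047),
    ("Array 5", 12, 0.0108),
    ("Array 6", 12, 0.0044),
    ("Array 7", 7, 0.0044),
]
TWO_WEEKS = 0.0384  # exposure window as a fraction of a year


def raid5_loss(afr: float, drives: int, window: float = TWO_WEEKS) -> tuple[float, float, float]:
    """Return (initial, window, total) odds for a single-parity array."""
    initial = 1.0 - (1.0 - afr) ** drives
    second = 1.0 - (1.0 - afr * window) ** (drives - 1)
    return initial, second, initial * second


for name, drives, afr in ARRAYS:
    initial, second, total = raid5_loss(afr, drives)
    print(f"{name}: initial {initial:.3%}, window {second:.4%}, total {total:.5%}")
```

Changing `window` to `0.00274` shows how much a one-day preservation window shrinks every figure.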

## RAID-6 Enters the Fray

With RAID-6, we’re now adding a second parity stripe distributed among the disks of the array. In order for this type of array to fail, we have to have a third disk die during the window. I won’t repeat the entire set of equations, because that would be a pain in the ass. Basically, we’re adding a new column, called “Second Loss During Window”, which has the exact same formula as the “Loss During Window” one. The only difference is that the exponent is one less. Once we get the result of that column, we multiply it with the Initial Loss and Loss During Window to get the real figure of data loss.

| Array and # drives | Drive Type | Annualized Failure Rate | Odds of Initial Loss | Loss during Window | 2nd Loss | Total Chance |
|---|---|---|---|---|---|---|
| *Formula* | | | $1-(1-AFR)^N$ | $1-(1-AFR \times 0.0384)^{N-1}$ | $1-(1-AFR \times 0.0384)^{N-2}$ | Initial × Window Loss × 2nd Loss |
| Array 1 – 4 drives | WD Red 4TB | 2.17% | $1-(1-0.0217)^4$ = 8.40% | $1-(1-0.0217 \times 0.0384)^3$ = 0.25% | $1-(1-0.0217 \times 0.0384)^2$ = 0.167% | 0.00000035, or 0.000035% |
| Array 2 – 4 drives | HGST NAS 8TB | 1.2% | $1-(1-0.012)^4$ = 4.71% | $1-(1-0.012 \times 0.0384)^3$ = 0.138% | $1-(1-0.012 \times 0.0384)^2$ = 0.092% | 0.00000006, or 0.000006% |
| Array 3 – 8 drives | WD Red 6TB | 4.19% | $1-(1-0.0419)^8$ = 29.00% | $1-(1-0.0419 \times 0.0384)^7$ = 1.12% | $1-(1-0.0419 \times 0.0384)^6$ = 0.9615% | 0.0000312, or 0.00312% |

As you can see, even if you’re a lazy bastard about your window of vulnerability, RAID-6 makes the odds of data loss vanishingly small.
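Generalizing, each extra parity drive just multiplies in one more window term. The Python sketch below assumes the same independence and linear-AFR-scaling model as the tables; `tolerated` is the number of failures the array survives (1 for RAID-5, 2 for RAID-6), and the function names are hypothetical.

```python
def at_least_one(p: float, n: int) -> float:
    """1 - (1 - p)^n: chance that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p) ** n


def array_loss_odds(afr: float, drives: int, tolerated: int, window: float = 0.0384) -> float:
    """Odds of losing more drives than the array tolerates.

    Multiplies the yearly odds of a first failure by one window term per
    additional failure, drawn from the N-1, N-2, ... drives still running.
    tolerated=1 reproduces the RAID-5 table, tolerated=2 the RAID-6 table.
    """
    odds = at_least_one(afr, drives)
    for already_failed in range(1, tolerated + 1):
        odds *= at_least_one(afr * window, drives - already_failed)
    return odds


for name, drives, afr in [("Array 1", 4, 0.0217), ("Array 2", 4, 0.012), ("Array 3", 8, 0.0419)]:
    raid5 = array_loss_odds(afr, drives, tolerated=1)
    raid6 = array_loss_odds(afr, drives, tolerated=2)
    print(f"{name}: RAID-5 {raid5:.2e}, RAID-6 {raid6:.2e}")
```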

## Failure Mitigation

So you had a drive blow out in your RAID-5 or -6 array, and you’re staring at the column of Loss Window now, wondering what to do.

The most important action you can take right now is this:

**CALM DOWN**.

You haven’t lost data yet. But by hasty action, you might. Stop, breathe. Do NOT touch that array, and do NOT power it down just yet. If one of your disks has checked out of the hotel, when you reboot the cage, there’s a chance it could “unrecognize” that disk and re-initialize the array, blowing your data into never-never land.

Steps to take here:

1. Do __NOT__ stuff a new drive in the array and rebuild. NOT YET.
2. If you haven’t done so already, write down your RAID configuration. Include total capacity, disk types, stripe size, drive order, partitions/volumes and any other details you can get.
3. Can you isolate the array from users? If you can, do it. Get their IO off the array if possible.
4. Check your backups and confirm that you have a backup of the array’s data.
5. Get another volume online that has capacity at least equal to the total used space on the degraded array. One of the easiest methods of doing this is a USB 3.0 drive cradle and a set of SATA drives.
    - a. Copy all your data from the array onto this volume and confirm that it is valid.
6. If you can affirm that 5.a is done and good, proceed:
    - a. Are all the drives in the cage the same age? If so, get replacements for all of them and start a completely new array with the new ones. Retire the old drives.
        - Reason for this is that they have all experienced similar wear-and-tear, and they all probably come from the same batch made at the factory – if there is a defect in one, there’s a good chance that this defect applies to all of them. You’re better off just dropping them all and replacing them.
    - b. If they aren’t the same age, just note the ones that are, and plan to replace them asap.
7. Okay, if 4 is good and 5 is good, NOW you can do a rebuild if you feel you have to. I still recommend reinitializing completely fresh and restoring the copied/backed up data, but I also recognize that convenience is a big draw.

Part of the whole debate about the validity of RAID-5 tends to stem from the probability of failure during a rebuild – which can be unacceptably high with old disks of appreciable size (see my section on UREs above). The argument seems to make the assumption that the array is either not backed up, or is somehow on critical path for general use by users.

Rebuilding an array while live and in production use should be considered a last resort. You can see above that there is a high likelihood of failure even for reasonably modest-size arrays. The fact that current RAID vendors offer live-system rebuilds should be considered a convenience only at this point. When we were using 100GB disks, a live rebuild was a viable option, but that simply doesn’t fit any more.

If your array is in that position – critical path and not backed up – then you have a big problem. You need to get a backup arranged *yesterday*. And if it is critical path, then you should ensure that there is a failover plan in place. Never assume that just because you have your critical data on RAID that you are totally safe. You are *safer* in the case of a drive fail, yes, but you aren’t out of the woods.

Stuff to consider that will help you survive an array failure:

- Buy a USB cradle or a tape drive that can handle the capacity of your RAID array. Use them religiously to preserve your data.
- Test them regularly (monthly is good) to ensure that when a fail does happen, you’re prepared to recover.

- Consider a second array, or a big-ass disk that you can house next to the array, of similar capacity that you can set up on a synchronization system (for example, Synology has a “Cloud Station Server” and “Cloud Synch” apps that can be used to ensure one NAS maintains exactly the same content as the other). That becomes your fail-over.
- Unless you absolutely have to, do not rely on the use of a live rebuild to preserve your data.
- If you have room in your cage, add another drive and convert your RAID-5 to RAID-6 to buy you extra insurance against multiple drive failure.
- Smaller volumes are better than big ones – you can shovel smaller volumes onto a USB drive more easily than trying to subdivide one large one onto multiple removable drives.
- When filling up an array, buy disks of the same brand and capacity, but mix up who you buy them from or buy them over time to protect you from factory batch errors.

## Summary

There’s no “magic panacea” here with RAID systems. They’re great, they’re effective, and there are simply some things that they do not do. I hope that I have helped dispel some of the fear about RAID-5 here, and it is also my hope that I have perhaps called attention to any gaps in your data coverage so that you can fill them now rather than wait for the inevitable to occur. With luck, you can breathe a little easier now, and not be too harsh on RAID-5.

Feel free to write me with any questions, comments, death-threats, or mathematical corrections you might feel necessary. Meanwhile, happy computing.

Edit 13.08.2018: I whipped up the figures into a spreadsheet that you can download and use for your own arrays as well.

Edit 10.10.2018: edited for clarity, and corrected math on UREs. Also corrected spreadsheet which is linked below.
