RAID 6 triple drive failure, can I manually overcome a puncturing bad block? MR 9361-8i

twin_savage

Active Member
Jan 26, 2018
170
127
43
35
There's two things hardware RAID can do. The first is to have block-level checksums, where you format the drive as 520 or 528 bytes instead of 512 (or 4160/4224 instead of 4096) and the extra space is used for checksums. This lets the controller immediately detect a damaged block, and doesn't risk a write hole. However, this requires drives and a controller that support that. Typical patrol reads are only looking for unreadable sectors, and don't protect against bit rot, at least not unless they also have the extra checksum bytes.
The larger logical sector sizes you speak of fall under Data Integrity Field (DIF), which has fallen out of favor over the past decade because of the very robust code rate used with LDPC on newer hard drives. The industry has collectively decided DIF is no longer appealing, and most vendors have dropped it. However, I've seen DIF written into contracts that tie vendors' hands on what they provision; I think this is a holdover from the days when DIF actually did have utility because of the less robust ECC that hard drives used to employ.
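For anyone who hasn't seen what those extra 8 bytes hold, here is a minimal sketch of a 520-byte sector. It assumes the common T10 Type 1 layout (2-byte CRC guard tag, 2-byte application tag, 4-byte reference tag taken from the low 32 bits of the LBA); the function names `protect` and `verify` are made up for illustration and are not any controller's API.

```python
import struct

SECTOR_DATA = 512   # user data bytes per sector
PI_BYTES = 8        # protection information appended per sector (assumed Type 1 layout)

def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 using polynomial 0x8BB7, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def protect(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Return a 520-byte sector: 512 data bytes plus guard, application and reference tags."""
    if len(data) != SECTOR_DATA:
        raise ValueError("expected exactly 512 data bytes")
    guard = crc16_t10dif(data)
    return data + struct.pack(">HHI", guard, app_tag, lba & 0xFFFFFFFF)

def verify(sector: bytes, lba: int) -> bool:
    """Check the guard CRC (damaged data) and the reference tag (misdirected write)."""
    if len(sector) != SECTOR_DATA + PI_BYTES:
        return False
    data, pi = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(data) and ref_tag == (lba & 0xFFFFFFFF)

if __name__ == "__main__":
    sector = protect(bytes(range(256)) * 2, lba=1234)
    print(len(sector), verify(sector, 1234))
```

The point of the reference tag is that a sector which is internally intact but landed at the wrong LBA still fails the check, which a plain per-sector ECC cannot catch.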

Typical patrol reads are only looking for unreadable sectors, and don't protect against bit rot, at least not unless they also have the extra checksum bytes. You can do a full consistency check to verify that the data on each drive matches (or that the parity matches the expected data in RAID 5/6), but unless you have a 3-way mirror or RAID 6, or block-level checksums, that suffers from the issue that there's no authoritative source for which drive is correct.
I am assuming we're talking about parity RAID in my example. Patrol scrubs, volume checks, or consistency checks, as some call them, absolutely will find whether there have been bit flips, and the controller's firmware gives you different ways of reconciling the inconsistency. Here's what the more limited version of those options (RAID 5) looks like through the RAID card's out-of-band management webpage:
[Screenshot: 1777148564177.png, RAID 5 inconsistency-handling options]

With raid 6 you get more options (not all hardware raid controllers implement these expanded options) like:
[Screenshot: 1777148632537.png, expanded RAID 6 inconsistency-handling options]
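To make the RAID 5 versus RAID 6 difference concrete, here is a rough sketch of the textbook P+Q math over GF(2^8) (polynomial 0x11d, generator 2). This is an assumption about how such a check can work in principle, not a description of what any particular controller firmware does, and the names `parity_p`, `parity_q`, and `raid6_locate` are hypothetical. With only P you learn that the stripe is inconsistent; with P and Q the two syndromes point at the single drive that is wrong.

```python
from functools import reduce
from operator import xor

# Build GF(2^8) exponent/log tables for polynomial 0x11d with generator 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def parity_p(blocks):
    """P parity: byte-wise XOR of all data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def parity_q(blocks):
    """Q parity: XOR of g^index * data block, byte by byte."""
    out = bytearray(len(blocks[0]))
    for index, block in enumerate(blocks):
        coef = EXP[index]
        for j, byte in enumerate(block):
            out[j] ^= gf_mul(coef, byte)
    return bytes(out)

def raid5_consistent(blocks, p) -> bool:
    """RAID 5 style check: says whether the stripe matches, not which drive is wrong."""
    return parity_p(blocks) == p

def raid6_locate(blocks, p, q) -> str:
    """Return 'ok', 'P', 'Q', 'D<n>' for one bad data drive, or 'unknown'."""
    sp = [a ^ b for a, b in zip(parity_p(blocks), p)]
    sq = [a ^ b for a, b in zip(parity_q(blocks), q)]
    if not any(sp) and not any(sq):
        return "ok"
    if any(sp) and not any(sq):
        return "P"
    if any(sq) and not any(sp):
        return "Q"
    candidates = set()
    for a, b in zip(sp, sq):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return "unknown"          # more than one drive is off
        candidates.add((LOG[b] - LOG[a]) % 255)
    if len(candidates) == 1:
        index = candidates.pop()
        if index < len(blocks):
            return f"D{index}"
    return "unknown"
```

Once the bad drive is located, its block can be rebuilt from P and the remaining data blocks exactly as in a normal single-drive rebuild.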

Even if the FS or application has some way of detecting the bit rot, there's no way to tell the HW raid controller to try reading from the other drive instead.
This is something I wish the industry would get together on and come up with a standard so the OS could pass information about this to the RAID subsystem. I kind of had hope this would happen at some point with Microsoft's Storage Spaces, particularly in conjunction with ReFS, but I've soured on this after seeing how messy Microsoft's support of ReFS has been.
 

DarkServant

Active Member
Apr 5, 2022
124
99
28
What I see are, in theory, nice reliable concepts that suffer more or less in practical environments.
Those settings in the newer Areca cards look promising.

The use of drives from different vendors is a good start. Remember the 40k-hours bug: if one used the same SSD model everywhere (way lower prices if you buy a pack of 1000 pieces), the whole system failed. But even I didn't follow this rule.

In Klennet's post both the hardware RAID and ZFS software RAID are in question.
With ZFS, the main problem is the recommended hardware base (ECC DRAM and enterprise-grade drives with power-loss protection, e.g. SM883 instead of 860 PRO, plus settings so that DRAM is used only for read caching).
Did you know that RAID used to be an abbreviation for Redundant Array of Inexpensive Disks?
I did not know that LDPC was integrated into HDDs by the time of the switch to 4K sector sizes. But even DDR5 uses on-die ECC, which was only integrated to get the increased susceptibility to errors from the new process under control.
There are some things one can do like using "Zefr" memory from (ex?) SMART Modular Technologies, but that is only one part of the system.

All that is not enough. The code base is written by people, and people make mistakes. If you look at all the firmware revisions on even an enterprise drive, it does not look very solid, and then there is the firmware in the RAID controller... I am constantly amazed how stable all the core systems work.

I heard that critical systems like passenger airplanes have five independent systems; if two of them give different results, the result of the other three is, with high enough probability, the right one and will be chosen.
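The voting idea is simple enough to sketch. This is only an illustration of strict majority voting across redundant channels, not a claim about how real avionics are built; the function name `vote` is hypothetical.

```python
from collections import Counter

def vote(outputs):
    """Return the value reported by a strict majority of channels, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

if __name__ == "__main__":
    # Five channels, two of them disagree with the rest.
    print(vote([42, 42, 42, 17, 99]))
```

With five channels, up to two can be wrong in any way and the remaining three still win; once three disagree, the voter can no longer pick a winner and has to report that instead.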

About the cooling design, there is the capability to get way better cooling even in a single slot. There is space behind or above the card, and copper and heatpipes are nothing really new for moving heat to another place quickly. Sorry, but look at this cheap piece of a heatsink on the 9266-8i, for a dual-core PowerPC plus the whole SAS controller logic. The use of plastic push-pins instead of brass ones is another point. It is all a question of the material costs, the margin of profit and planned obsolescence.
About active coolers, it would be good if the cooler were a standard size and easy to replace when it fails (which is not the case for an RTX Pro 6000, a 10k US$ GPU...).


PS: In the Samsung SSD F/W repository there are some Excel sheets (in some of the directories) that contain change logs; there you can see what kind of bugs exist even in enterprise-grade SSDs.
 
  • Like
Reactions: TRACKER

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
Disk identification is not a ZFS feature nor a filesystem feature at all but can be done with a management GUI and/or a disk controller card or HBA that supports disk identification. Raid-Z expansion (e.g. 6-disk Z2 to 7-disk Z2) is not really an enterprise need but has been in ZFS for over a year.

Hybrid raid pools from hd and flash are a big ZFS advantage. You can define per vdev whether Metadata, small files or all files should be on hd or flash. A zfs rewrite command can move data between hd and flash.

ZFS Draid with hundreds of disks can offer ultrashort rebuild time in case of a disk failure.

The upcoming ZFS AnyRaid is a breakthrough in Raid technology, not only for ZFS. It fully supports the complete disk capacity in a Raid 1 or Raid Zn config from disks of different size with vdev expansion and vdev shrink (add/remove disks on AnyRaid).

ZFS can check/verify data in a Raid while it is online; no offline chkdsk/fs check that can last days is needed.
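As a rough picture of why an online verify can also repair, here is a toy scrub over a two-way mirror where each block's checksum is stored separately from the data. This is only a sketch of the principle under that assumption; it is not ZFS's on-disk format or its actual scrub code, and `scrub` and `checksum` are made-up names.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Checksum stored apart from the data it describes."""
    return hashlib.sha256(block).hexdigest()

def scrub(copies, expected):
    """Verify every block on every mirror side against its stored checksum.

    copies:   list of dicts, one per mirror side, mapping block id -> bytes
    expected: dict mapping block id -> stored checksum
    Returns (repaired, lost): repaired is a list of (block id, side) pairs that were
    rewritten from a good copy, lost is a list of block ids with no good copy left.
    """
    repaired, lost = [], []
    for block_id, want in expected.items():
        good = next((side[block_id] for side in copies
                     if checksum(side[block_id]) == want), None)
        if good is None:
            lost.append(block_id)
            continue
        for index, side in enumerate(copies):
            if checksum(side[block_id]) != want:
                side[block_id] = good      # heal the bad copy from the good one
                repaired.append((block_id, index))
    return repaired, lost

if __name__ == "__main__":
    side_a = {0: b"hello", 1: b"world"}
    side_b = dict(side_a)
    sums = {k: checksum(v) for k, v in side_a.items()}
    side_b[1] = b"worle"                   # silent corruption on one side
    print(scrub([side_a, side_b], sums))
```

Because the checksum is the authority, the scrub knows which copy is correct, which is the piece a plain mirror compare cannot tell you.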

ZFS replication can sync Terabyte Raid highload pools down to a delay of a few seconds even with open files in current state.

Advantages in a hardware raid not available or possible in ZFS software raid are very rare. The main advantage of a hardware raid may be a Windows system with a boot OS NTFS mirror. This is mainly due to the current lack of a stable boot software raid in Windows (modern Windows software raid = Storage Spaces cannot boot), but an upcoming ReFS boot mirror with Copy on Write and checksums may change that too.

The overall package is what counts.
When, if ever, will FreeNAS see all this?
 

Maery Fedorica

New Member
Jan 19, 2021
6
2
3
The heatsinks on any modern LSI/Adaptec etc are "adequate enough", IF they're running in a server environment. You can't have giant heatsinks on them from the factory, because they're meant to occupy a single slot, and that too within HHHL/FHFL/FHHL specs. There simply isn't room on the card for more heatsink without compromising the specs.
I long, long ago added a small and tall fan to my LSI Avago Broadcom MegaRAID cards. When I updated to a different card, the fan came along. I let it spin at 3,000 rpm on a regular basis, and ramp it up somewhat during a manual rebuild just in case.

The modification is relatively quick. They go on at an angle to the heatsink. But if you've seen enough aluminum heatsinks in your time messing with servers, you'd eventually come across one in which the heatsink fins are used as a screw-holding mechanism. With the right screws, they create their own threads in the soft aluminum heatsink.

I recommend a brushless 40mm x 20mm fan, like the Amazon offering B07VXT95GN and screws that I am guessing are a 10-32 size, approximately an inch long. I am attaching a photo of two MegaRAID controllers with my approved fan attached to one, ready to swap to the newer one. While the "battery" pack (supercapacitor) might be the same, I think the connections are different in some way. Left is the 9265CV-8i and right is the 9361-8i before I changed the back plate.

The height of this fan may prevent an adjacent PCIe slot from being used to its full extent. On the old card, I think it helped with the near-constant CRC errors that it was prone to generate. The fan I use can exceed 6,000 rpm.

40mm by 20mm fan for LSI MegaRAID cards B07VXT95GN small1.jpg
 
  • Like
Reactions: is39

kapone

Well-Known Member
May 23, 2015
2,009
1,374
113
Oh I do similar things...This is my main workstation...wall mounted (my desk space is taken up by quad 4K monitors...). Notice the lil fan on the Mellanox NIC (right next to the power supply)? This is an "open air" wall mounted PC, so there's very little airflow per se (but convection works), so needed a bit of airflow on that card.

The CPU is water cooled by that giant ass AIO and the GPU is...well...huge...by itself with enough cooling built in.

IMG_0252 copy.jpg
 
  • Like
Reactions: itronin and Fritz

Whaaat

Active Member
Jan 31, 2020
424
232
43
This VD holds about 6 TB in an array of 3 TB Seagates
Shitgate is the reason. Controller did everything it could. None of the 3TB drives (ES.3, v4, v5) I've seen has lasted to this day, but I still have old 0.75TB models with 15 years of uptime and they are as solid as tanks. Mass hard drive suicides after approximately 2010 are so seagatish...
 

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
In my experience, Seagate isn't any better or worse than the rest of them. You pay your nickel and you take your chance.
 
  • Like
Reactions: nabsltd

Whaaat

Active Member
Jan 31, 2020
424
232
43
In my experience, Seagate isn't any better or worse than the rest of them.
Yep, moral of the story in a nutshell: if your array is made of Seagate drives and one of them fails, the array is doomed, as the others will fail simultaneously during the rebuild attempt.
 

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
These days I use mostly Drivepool. It has some nice features, one of which is that if you have a x2 or above pool and a HD fails, recovering is as simple as replacing the failed drive; Drivepool will then automatically rebuild the lost drive without any other input from you. Even grandma could do it. And even if you have a 1x pool you'll only lose the data on the failed drive; the rest of your data remains intact and readable.
 

kapone

Well-Known Member
May 23, 2015
2,009
1,374
113
But…you lose 50% of your storage…

These days, there’s very little data that shouldn’t be on a resilient volume.
 

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
I have a mini backup server with 4 14TB HDs. I recently made an incredible blunder: I accidentally deleted the partition on one of them while I had it out of the server because I mistook it for another drive. Made me feel sick but I said oh well, poop happens. I then put it back in the backup server and added it back to the pool. Drivepool did the rest automagically. I even forgot it was a x2 pool but I'm glad it was. I'd rather lose half my storage space than all my data, but to each his own.
 

kapone

Well-Known Member
May 23, 2015
2,009
1,374
113
No no, I’m agreeing with you! Except there are more efficient ways to use storage than losing half of it.
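For a sense of scale, here is the back-of-envelope arithmetic for the 4 x 14TB box mentioned above. It is a simplification: x2 duplication is treated as a flat 50% overhead, and filesystem overhead and hot spares are ignored. `usable_tb` is just a helper name made up for this sketch.

```python
def usable_tb(drives: int, size_tb: float, scheme: str) -> float:
    """Approximate usable capacity in TB for a few redundancy schemes."""
    if scheme == "x2":        # every file stored twice
        return drives * size_tb / 2
    if scheme == "raid5":     # one drive's worth of parity
        return (drives - 1) * size_tb
    if scheme == "raid6":     # two drives' worth of parity
        return (drives - 2) * size_tb
    raise ValueError(f"unknown scheme: {scheme}")

if __name__ == "__main__":
    for drives in (4, 8):
        raw = drives * 14
        for scheme in ("x2", "raid5", "raid6"):
            cap = usable_tb(drives, 14, scheme)
            print(f"{drives} x 14TB {scheme:>5}: {cap:5.0f} TB usable ({cap / raw:.0%})")
```

At four drives, x2 duplication and RAID 6 both come out at 28 TB, so the efficiency argument only really starts to bite as the drive count grows (8 x 14TB gives 56 TB duplicated versus 84 TB in RAID 6).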
 
  • Like
Reactions: Fritz

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
No no, I’m agreeing with you! Except there are more efficient ways to use storage than losing half of it.
True, but there are none as simple as DP. Instructions I rarely use are lost in time and often lost in space too. I had a TrueNAS box for years, it was the most user unfriendly OS I have ever dealt with and so was the support forum. While it worked well for me I never had a drive fail but I suspect dealing with one would have been a major ordeal. The only thing it offered that I really wanted was bit rot protection. If, one day, storage prices come back down to earth I'll probably build another but for now, all is good.
 
  • Like
Reactions: kapone

etorix

Active Member
Sep 28, 2021
260
152
43
Yep, moral of the story in a nutshell
3 TB drives? Give Seagate a break: That bad generation was in a very distant past by now.

Given the choice, I'd pick Toshiba MG or Seagate Exos over any colour of WD—as fair retribution for sneaking SMR into NAS lines and doubling down with the concept of "5400-rpm class"…
Given the actual current market, I take what I can find at acceptable prices. With just three manufacturers, you just cannot further restrict your choices.
 
  • Like
Reactions: Fritz

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
At current market prices, I choose none of the above.
 

nabsltd

Well-Known Member
Jan 26, 2022
813
614
93
These days I use mostly Drivepool. It has some nice features, one of which is if you have a x2 or above pool and a HD fails, recovering is as simple as replacing the failed drive then Drivepool will automatically rebuild the lost drive without any other input from you.
That sounds exactly like how every single reasonable storage system works. It's exactly what I do when a drive fails on any of my LSI RAID controllers.

Any storage system where you need to insert the new drive then instruct the system on what to do with that new drive isn't very user friendly.
 
  • Like
Reactions: Fritz

Fritz

Well-Known Member
Apr 6, 2015
3,763
1,699
113
72
The last time I used RAID controllers you had to jump thru too many hoops to replace a failed drive. Apparently it's different now. This is good, but what isn't is the max practical number of HDs per pool. Seems that any number over 8 is asking for trouble, and striped sets or whatever consume too many drives, so you end up back where Drivepool is.
 

kapone

Well-Known Member
May 23, 2015
2,009
1,374
113
I've been running 16-wide RAID-6 sets for...a long time. It's not even RAID-6 per se, but RAID-60, because I run three RAID-6 sets per chassis (48-drive chassis) in a RAID-60 configuration. Each chassis is ~1PB.

So, each 48-drive chassis could potentially have 6 total drives fail, or 2 per RAID-6, and it'll totally survive that. Disk failures and swapping in a new disk work exactly like they should. The RAID card is configured for "Automatic Rebuild" (it's a simple checkbox). Open the chassis cover-->the failed HDD is blinking-->Pop in a new disk in its place and walk away.
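The "6 total, or 2 per RAID-6" distinction is worth spelling out, since which drives fail matters more than how many. A small sketch, assuming the 48 slots are grouped into the three sets contiguously (slots 0-15, 16-31, 32-47), which is a guess for illustration and not necessarily how the chassis is actually laid out:

```python
def raid60_survives(failed, sets: int = 3, width: int = 16, parity: int = 2) -> bool:
    """True if no RAID-6 set has lost more drives than its parity can cover.

    failed: iterable of failed slot numbers in the range 0 .. sets*width - 1.
    Slots are assumed to map to sets contiguously (hypothetical layout).
    """
    per_set = [0] * sets
    for slot in set(failed):
        if not 0 <= slot < sets * width:
            raise ValueError(f"slot {slot} is outside the chassis")
        per_set[slot // width] += 1
    return all(count <= parity for count in per_set)

if __name__ == "__main__":
    print(raid60_survives([0, 1, 16, 17, 32, 33]))  # two failures in each set
    print(raid60_survives([0, 1, 2]))               # three failures in one set
```

So six failures are survivable only when they are spread two per set; three failures landing in the same 16-wide set take out the whole striped volume, which is why the automatic rebuild and the alerting matter.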

Given the amount of data I deal with, I have alerting and monitoring integrated into this as well. When a disk fails I get an alert, when the rebuild starts I get an alert, when it finishes I get an alert. But...plug and play. :)
 
  • Like
Reactions: Fritz