How are very large ZFS pools configured?


PigLover

Moderator
We actually understand your position. The problem is that your position is based on ideas that are not consistent with how SSDs actually work or with any facts or data published elsewhere.

Without facts, data, or empirical evidence to support it, all we are left with is a rather odd, misinformed idea. Everyone is, of course, entitled to whatever ideas they choose to believe. But when they start advising others, who may be novice and naive, they take on an obligation to reasonably defend that advice - and 'I think I remember reading somewhere once' is a pretty weak defense indeed. And 'I'm not in the habit of saving web links' is fairly interpreted as 'I got nothin'.'

Again - you are always welcome to provide any evidence to support this idea. Until then I will continue to call it FUD.

Sent from my SM-G925V using Tapatalk
 

T_Minus

Build. Break. Fix. Repeat
Agreed.


unwind-protect said:
Well, unfortunately I'm not (yet) in the habit of saving all web links.

To understand my position on homogeneous mirrors of SSDs, you first need to understand that in my experience SSDs practically never die because their expected lifetime of writes has been exceeded. I have a stack of SSDs that I killed (some by accident, some deliberately) that all had their controllers (the ones on the SSD) die under specific "non-expected" write patterns. Doing that was pretty easy, and with every new generation everybody claimed that those problems were well known and fixed in the current generation. Then the hyped Intel SSDs that were all the rage behaved the same way for me, and I'd had it. The Samsung 850 was the first one that I couldn't kill within a day, but somebody over at Tom's Hardware did discover a pattern to do that. As I mentioned elsewhere, I have non-fatal problems with the Samsung 850, too.

So, if you believe, as I do, that SSDs can be killed with specific write patterns, then very obviously you don't want to back a single raw device with a mirror of drives that will all die on the same write pattern.

I do remember that both those patterns and the "don't do RAID1 on SSDs" issue were popular enough to be Slashdotted at some point. Today people just say "RAID is no backup, what did you expect" when people lose arrays, so no real discussion happens anymore.

Since you don't have any other data, articles, or proof beyond what you've done yourself... what about the "easy" way that you killed these SSDs?

You mention "specific" write patterns... what exactly are the specific write patterns that are taking down the "hyped Intel" drives and the other "stack" of SSDs that you have sitting there?

What are the "hyped" Intel SSDs? Specific make/model, please. Firmware versions would be nice too.

It seems to me that since you have done it yourself, and do it repeatedly, you can:
- Share how you're killing them exactly
- Narrow down the issue to something in your setup that's killing them

I look forward to the additional information on this topic.
 

EffrafaxOfWug

Radioactive Member
Wow, how did I miss more discussion about this enthralling topic?

It's perfectly safe to run the same discs (either SSDs or platter drives) in RAID arrays as long as you buy one of my patented magic rocks that prevent the transmission of non-expected write patterns through SATA cables*. I'll also give away a litre of snake oil (a $700 value!) if it doesn't also work as a tiger repellent.

* SAS version is twice the price, obviously.
 

cookiesowns

Active Member
I personally believe what Unwind said has some truth to it. But... the likelihood of this happening in the real world is very low. I've run RAID-0 SSDs in production for a while, back in the X25-E / X25-M days.

We needed the IOPS, and at the same time RAID-0 has one very significant benefit: you are halving the writes to each SSD. So if endurance is your concern, RAID-0 might actually extend drive life, because each drive only receives 50% of the writes and its write endurance therefore lasts correspondingly longer - assuming SSDs all wear and die at exactly the same rate. But I'm sure you guys knew that :p

As for ZFS, personally, if it's just for a backup / non-production rig, I wouldn't mind going with 8-12 drive vdevs. For production stuff in 24-bay chassis I've stuck with four 6-drive RAID-Z2 vdevs, with 4TB and 8TB drives.
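For reference, a minimal sketch of that kind of 24-bay layout (four 6-drive raidz2 vdevs in one pool); the pool name and device names are hypothetical placeholders, and in practice stable /dev/disk/by-id names are preferable to sdX names:
Code:
# one pool made of four raidz2 vdevs, six drives each;
# writes stripe across the vdevs and each vdev survives two drive failures
zpool create tank \
  raidz2 sdb sdc sdd sde sdf sdg \
  raidz2 sdh sdi sdj sdk sdl sdm \
  raidz2 sdn sdo sdp sdq sdr sds \
  raidz2 sdt sdu sdv sdw sdx sdy
zpool status tank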
 

EffrafaxOfWug

Radioactive Member
I still think you might be conflating different issues; you seem to be saying that RAID0 works for you to achieve a certain number of IOPS, and that device reliability has been good enough under that configuration that you've not had any significant catastrophic failures. That's all well and good, I've used RAID0 myself before for scratch databases.

(As an aside, in the wild SSD failure from flash write exhaustion seems to be incredibly rare... most SSDs die from failure of the controller in my experience)

unwind-protect seems to be saying that you shouldn't run two of the same SSDs in any sort of RAID array since there are apparently magical Read/Write Patterns Of Doom that will brick your SSD, shave your eyebrows and give your dog Dutch elm disease. This goes from harmless old wives' tale to actively harmful advice IMHO, so it needs to be challenged.
 

cookiesowns

Active Member
Yes,

I should have mentioned I was digressing from the actual topic at hand. Whatever folklore unwind believes regarding magical write patterns is beyond me.

That said, I mentioned the benefits of RAID-0 in specific scenarios because it actually reduces the writes to each drive, unlike a direct mirror, which is 1:1 - but you guys probably all knew that.
 

unwind-protect

Active Member
I have a stack of dead SSDs to prove it.

Send me a consumer SATA SSD and I'll kill it without exceeding its documented total write allowance. It isn't even difficult. Apart from what I do at work, I found that putting ZFS on top of disk encryption in CBC mode will generally do it. Varying the cipher block mode will take care of some more.

In any case, this doesn't have much to do with why homogeneous RAID1s are a bad idea when you are adopting a then-new model of storage device. Individual models of hard drives or SSDs with specific defects in every manufactured unit are not exactly uncommon.
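If the concern is exactly that kind of model-wide defect, the mirror itself doesn't have to be homogeneous; a minimal sketch of a mixed-vendor ZFS mirror, with purely hypothetical pool name and device IDs:
Code:
# mirror two SSDs from different vendors so a single model-specific firmware bug
# or bad production batch is unlikely to take out both sides at once
zpool create ssdpool mirror \
  /dev/disk/by-id/ata-VendorA_SSD_serial1 \
  /dev/disk/by-id/ata-VendorB_SSD_serial2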
 

T_Minus

Build. Break. Fix. Repeat
Do you purposely ignore me and others who ask you specific questions? It's as if you like to repeat yourself over and over without providing any actual information about your process and your specific use cases.

At least now we know you're doing it with drive encryption + ZFS, but you've been very vague again about the "apart from what I do at work" part.

We don't know WHAT you're doing at work.
We don't know HOW you're killing them at work.

Are we going to wait over a month for you to reply with a different cryptic story?

I also noticed you went from "hyped Intel" to simply "consumer SATA SSD"... does this mean you've not accomplished this with an enterprise Intel drive?
 

PigLover

Moderator
I'll second @T_Minus: describe your specific method for destroying the consumer SSD so others can verify it.

Sent from my SM-G925V using Tapatalk
 

EffrafaxOfWug

Radioactive Member
Here's the command I've been using:
Code:
effrafax@wug:~# cat /sys/block/sdk/queue/rotational
0
effrafax@wug:~# cat /sys/block/sdk/armour/invulnerabletohammers
1
effrafax@wug:~# hit -d /dev/sdk --with-tool /dev/massive_fscking_sledgehammer --magical-write-pattern /dev/zero
/dev/sdk now proper dead
 

rubylaser

Active Member
+1 for awesome pseudo code :)
 

unwind-protect

Active Member
I think it would be better overall to postpone this until after we meet in person. Maybe it's time for a forum meet? :) I'm not a bad guy, I just have a stack of dead consumer SATA SSDs (actually one stack at work and one at home). I can't connect my online identity to my work, and in any case you wouldn't be able to verify it unless we meet in person.

FWIW, several of the deaths came from simply running my version of bonnie++ on ZFS on top of either FreeBSD/geli or Linux/dm-crypt.

I think it is important to pick a cipher block mode that does the most unexpected things for the SSD in question.

Now, you can tell from the vagueness of my writing what the big problem is: how do I test whether this is really reproducible? I don't have an endless supply of spare SSDs around. The last SSDs I had at home I stuffed into my gaming-only Windows machine, where they haven't died yet. At work, things have moved on from having to evaluate storage hardware. In any case, nobody there uses SATA SSDs for anything important.

As a result, I also don't have a command-by-command record, and I can't tell you specifically which OS with which block mode killed which drive. On dm-crypt I usually use aes-cbc-essiv:sha256; on geli I use XTS. I believe those are the current defaults anyway.
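A rough sketch of the kind of stack being described (ZFS on dm-crypt with the CBC-ESSIV cipher, exercised with bonnie++); the device, pool, and mapping names are hypothetical, the commands destroy whatever is on the drive, and nothing here is claimed to reproduce any particular failure:
Code:
# dm-crypt container forced to aes-cbc-essiv:sha256 (not the modern default)
cryptsetup luksFormat --cipher aes-cbc-essiv:sha256 --key-size 256 /dev/sdX
cryptsetup open /dev/sdX cryptssd

# single-device ZFS pool on the mapped device, hammered with repeated bonnie++ runs
zpool create testpool /dev/mapper/cryptssd
bonnie++ -d /testpool -u root -x 10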


In addition to all this, I repeatedly had consumer SATA SSDs hang for extended periods, or even indefinitely. Not good for long uptimes.

FWIW, I have never been able to kill magnetic drives with any access method I've ever used. Seagates die like flies in my hands, but I rate that as working as intended.
 

wildchild

Active Member
Actually, purely from a theoretical point of view, I kind of agree with Unwind.

Whether it is write patterns, rotten firmware, or other unknowns - I dunno - or simply updating the SSDs' firmware while keeping your pool online and running: if given the option, I can see why someone would build a pool from two different brands of disks.

I know Dell did it for some time with their ZNAS devices.

I actually did it myself with spinners, back in the days of the dying 3 GB disks.
 

PigLover

Moderator
It's pretty darn hard to 'agree' with this when he isn't willing to disclose any facts or any of his alleged methods. It gets worse when he starts using well-known con-artist techniques like his latest "I won't talk about it until we meet in person."

This is pure and total FUD. Unless and until there are defensible facts and references presented he deserves no more audience here.

Sent from my SM-G925V using Tapatalk
 

unwind-protect

Active Member
The flash gurus at work are saying that the controller lanes to the flash burn out. Quite obviously the firmware has some logic in there that knows how FAT and NTFS typically behave, and the random-looking pattern that you get through the encryption layer isn't expected. Keep in mind that the manufacturers want you to do encryption by buying their own encryption-capable products.

The limited number of controller lanes on consumer SATA SSDs is also why they can become excessively unresponsive when too much is being written and you suddenly want to read a block. You have to wait for the potentially very expensive block operation to finish before a lane is free for your read request.
 

cookiesowns

Active Member
Did you over-provision (OP) the consumer drives, or were they at max capacity? There were a select few drives back in early 2011/2012 that would die a premature death with encryption, as I vaguely recall...

No, I cannot confirm my source, as honestly, it's been a while.

I can provide some S3710s as tribute, as long as you can give me a reproducible set of instructions =)
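"OP" above means over-provisioning: leaving part of the drive unused so the controller has extra spare area to work with. A minimal sketch of doing it by hand, assuming a hypothetical blank 480GB SSD at /dev/sdX:
Code:
# TRIM the whole device so the controller knows every block is free,
# then partition only ~90% of it and leave the rest untouched
blkdiscard /dev/sdX
sgdisk --new=1:0:+430G /dev/sdX   # ~50GB of a 480GB drive stays unallocated as extra spare area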
 

EffrafaxOfWug

Radioactive Member
It was a bug in the Intel 320 series IIRC; it caused a fair stir at the time.
Firmware update now available - Addresses Bad Context 13x Error

wind-up now seems to be going for full-on conspiraloon territory: there are now data streams that cause drives to self-destruct because the manufacturers want you to buy more expensive drives, because the controller expects to see some special FAT/NTFS stuff* and when it doesn't get it, it gets upset and sets fire to itself. Utter tripe and tosh.

* Not quite sure what that has to do with the ZFS topic of the thread, but I think we left Kansas behind some time ago; maybe the wizard will give me a heart. If people seriously think that SSD controllers have a fundamental understanding of filesystem formats, and won't work properly without it, then I think you need to have a long, hard look at your "gurus". Or maybe even a Wikipedia page?

PigLover said:
This is pure and total FUD. Unless and until there are defensible facts and references presented he deserves no more audience here.
Well said.
 