SAS3 HBA Trim Support Without DRAT RZAT Requirement


Ryushin

New Member
Nov 7, 2019
17
3
3
I just upgraded my server to ZoL 0.8.x and filled it with six new Crucial MX500 2TB drives in a raidz2, sitting in a Supermicro 36-drive SAS3 chassis connected to an LSI 9300-8i (SAS3008 chip) HBA. To my complete surprise, trim does not work. Digging into these threads:

Can't TRIM Samsung 850 EVO · Issue #8874 · zfsonlinux/zfs
Topicbox
Broadcom Inc. | Connecting Everything

The LSI HBA in IT mode requires DRAT (deterministic read after trim) and RZAT (read zeros after trim) in order for trim to work.

So the Crucial MX500 SSDs do not support DRAT or RZAT.
According to hdparm -I:
hdparm -I /dev/sdn | grep -i trim
* Data Set Management TRIM supported (limit 8 blocks)
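
For comparison, you can grep the full identify output for the DRAT/RZAT capability lines. A minimal sketch, using a placeholder capture of `hdparm -I` output (the device name and the RZAT line here are hypothetical examples of what a capable drive reports; the MX500 prints only the TRIM line):

```shell
# Placeholder capture of the TRIM-related lines from `sudo hdparm -I /dev/sdX`.
# A DRAT/RZAT-capable drive adds a "Deterministic read ..." line;
# the MX500 reports only the first line.
hdparm_out='*    Data Set Management TRIM supported (limit 8 blocks)
*    Deterministic read ZEROs after TRIM'

# The SAS3008 IT firmware only passes TRIM through when the drive
# advertises deterministic reads after TRIM:
if echo "$hdparm_out" | grep -qi "deterministic read"; then
    echo "drive advertises DRAT/RZAT - TRIM should pass through the HBA"
else
    echo "no DRAT/RZAT - the LSI IT-mode firmware will block TRIM"
fi
```

On a live system the equivalent check is `sudo hdparm -I /dev/sdX | grep -i trim`.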

From what I can see, there is no firmware upgrade for the Crucial MX500 that adds support for DRAT or RZAT, so the only option left is a new SAS3 HBA. Can anyone recommend a SAS3 controller with native Linux kernel support that allows trim to happen without needing DRAT and RZAT? I checked Supermicro's site for HBAs and they all seem to be LSI-based.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
I just upgraded my server to use ZoL 0.8.x and I filled it with six new Crucial MX500 2TB drives in a raidz2... Can anyone recommend a native Linux kernel supported SAS3 controller that allows trim to happen without needing DRAT and RZAT?
Most of us run LSI-based RAID/HBA cards, but we also run enterprise drives, which don't have this TRIM problem. I'm surprised you didn't run into this during your research... everyone says "enterprise SSD, don't use consumer" ;) although it does feel like it's been 2-3 years since this was chanted weekly :D

When I first got going with enterprise gear at home a few years ago, I made the same mistake and bought 8x new-in-box prosumer Intel SSDs. Very disappointing. Then I got enterprise SSDs, and all smiles :D :D :D
 

Ryushin

New Member
Nov 7, 2019
17
3
3
I remember the mantra. From memory, it was about issues like endurance, power-loss protection, garbage collection, and warranty. The MX500 drives finally met the enterprise specs I was looking for, at about half the price of the Samsung 883 DCTs. Considering I needed six 2TB drives to replace 15K platters, the Crucial MX500 met those requirements and still set me back a hefty $1400 for six of them (and I bought another ten 4TB drives to fill the chassis). I did not think there would be issues with trim, though, and I did not even think of researching that.

So now that I'm in this boat, I need to solve it, if possible.
 

Ryushin

New Member
Nov 7, 2019
17
3
3
Well, the Crucial MX500 drives are on their way back to Amazon. Not supporting DRAT and RZAT in this day and age is just not good. Going to do research and see if the new WD Red SSDs support proper trim.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
Manufacturers restrict features on consumer drives so that they cannot, or are not meant to, run in multi-drive setups / arrays / pools for business workloads. No matter what you do with consumer drives, you're on that line of "it just may not work".

The WD Red HDD line is limited; I suspect their SSDs will be too.

Always keep in mind they're not going to cannibalize their enterprise market.


Personally, for dozens and dozens of SSDs and NVMe drives, I've turned to eBay for second-hand enterprise gear.
I think most here would agree.
 

Ryushin

New Member
Nov 7, 2019
17
3
3
I don't mind buying some gear from eBay; I just got a temporary CPU from eBay while waiting for Supermicro to release their new Epyc boards. But SSDs only have so much endurance; wouldn't buying used SSDs from eBay be risky?

Are there any SAS3 HBAs that do not have the DRAT and RZAT requirement?
 

Ryushin

New Member
Nov 7, 2019
17
3
3
That test is interesting. Hopefully the 2TB 860 Evo won't be like that.

My end goal is to fully populate a Supermicro 72-drive JBOD for a very cost-conscious customer:
SC417BE1C-R1K23JBOD | 4U | Chassis | Products | Super Micro Computer, Inc.
with 2TB drives running ZFS in six raidz2 vdevs of 12 drives each, which should give us about 100TB of SSD storage. The JBOD will be delivered as a NAS serving SMB and NFS, for post-production processing of raw HD and 4K video.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
My end goal is to fully populate a Supermicro 72 drive JBOD for a very cost conscious customer... with 2TB drives running ZFS with six raidz2 vdevs of 12 drives each.
Woah, wait... so this isn't just an 8 drive at home system?

You're selling 72 consumer drives to a client to use in a huge pool for video processing? :eek:

This is not going to end well.
 

Ryushin

New Member
Nov 7, 2019
17
3
3
The six drives are for my system to do testing. Then we'll buy enough to build one full vdev and do more testing.

My customer has a ZFS system with 150 spinning platters and 500TB of space in its pool. The spinning platters cannot keep up with what they are doing, so SSDs are the way to go going forward. A good reason they are still in business is that we do things as cost-effectively as possible. Tier-one post video production houses had no problem dropping $250-500K on an SSD storage solution. Now some of them are no longer in business.

These may be consumer drives, but the warranty and TBW will allow them to be used for their intended purpose. They will provide far more IOPS and speed than the spinning platters. I'm not selling the system to them; they buy the chassis and drives. It's going to end far better than what they have now.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Video editing is mostly a sequential workload. While there is no question that SSDs have much better IOPS values, I doubt that you will see an overall faster video filer with desktop SATA SSDs vs 12G SAS mechanical disks in a multi-mirror under load (with a lot of RAM, maybe an Optane L2ARC with read-ahead, and a special vdev mirror). Trim is bad for performance under load, so these are no replacement for datacenter SSDs; write performance will go down after a short time of steady writes anyway.

For a 100TB video filer, I would certainly prefer up to 24x 10/12TB He SAS disks (ex HGST/WD Ultrastar) in a multi-mirror setup, and a solution without an expander (a regular Supermicro case with 24x 3.5" disks and an expanderless backplane). With an expander, I would not even think of putting up to 72 SATA disks behind it.

About trim
As ZFS, and not the HBA, handles trim, I doubt that you will get a different result with a different HBA.
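
One way to see what the kernel actually decided, independent of what ZFS requests, is the block layer's discard limit in sysfs. A hedged sketch (the device name is hypothetical and the value is simulated here; on a real system you would read the sysfs file directly):

```shell
# On a real system:  cat /sys/block/sdn/queue/discard_max_bytes
# A value of 0 means the kernel exposes no discard for that device,
# so `zpool trim` will report trim as unsupported on that vdev member.
# Simulated value for an MX500 behind a 9300-8i in IT mode (assumption):
discard_max=0

if [ "$discard_max" -eq 0 ]; then
    echo "no discard support exposed - trim cannot work regardless of ZFS"
else
    echo "kernel exposes discard (max $discard_max bytes per request)"
fi
```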
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
These may be consumer drives. The warranty and TBW will allow them to be used for their intended purpose... It's going to end far better than what they have now.

The rated IOPS and TBW are just that: ratings for a consumer workload.
They are not ratings that assume the drives will sit in a large pool, in constant steady state, with severely degraded performance, because they are consumer drives not meant for the workload you are trying to accomplish.

I'd go back to reading and researching before you jump on buying more Drives or HBAs.

Again: myself, and everyone I've talked to or seen on forums, regrets going into a large consumer-SSD pool setup.
 

Ryushin

New Member
Nov 7, 2019
17
3
3
Then maybe it's best to let the deal for the 860 Evos go for now and see what these new WD SA500 SSDs are like, and whether they support proper trim behind an LSI card.
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,140
594
113
New York City
www.glaver.org
The LSI HBA in IT mode requires DRAT (deterministic read after trim) and RZAT (read zeros after trim) in order for trim to work.
This puzzles me. TRIM (and its SAS cousin UNMAP) is the OS telling the drive "I don't need this data any more, go prepare these logical blocks for re-writing with new data at some future time". It would be nice to have the blocks be all 0's, or A5, or DEADBEEF, but unless the OS is reading data it specifically told the drive it didn't care about, I don't see the need for either DRAT or RZAT unless there is a data confidentiality issue (which would be better served by full disk encryption, anyway).

Can someone enlighten me?
 

UhClem

just another Bozo on the bus
Jun 26, 2012
433
247
43
NH, USA
This puzzles me. TRIM (and its SAS cousin UNMAP) is the OS telling the drive "I don't need this data any more"... Can someone enlighten me?
[Forgive the time gap, but just found this thread in a search (>ssd trim rzat<), and it is exactly what I was trying to figure out.]

Does anyone have an answer to Terry's (and my) puzzle/question?
Thanks!
 

Sawtaytoes

New Member
Oct 17, 2023
16
1
3
I'm in the same boat. Found this thread talking about something I wanted to know.

I have a bunch of Crucial MX500 drives and never realized their TRIM support doesn't work with LSI controllers until today, after I'd already sunk over $25K into this NAS.

I'm wondering if SATA-to-SAS interposers could help, but I'm not sure. It really sucks that all these drives I purchased simply don't handle TRIM on these controllers.
 

bonox

Member
Feb 23, 2021
87
20
8
This puzzles me.

Can someone enlighten me?
For an individual drive, it's irrelevant. I believe it relates to patrol reads/scrubs on arrays, where the file system will complain of an error if it compares blocks from different drives. If the blocks are free, they should all have the same 'data', right? Once a block is marked free for use (i.e. old data deleted), you have two paths. You could actually write zeros to the disks, and then a scrub would pass unless there's an underlying problem with a disk. Or you could leave the underlying data unchanged and simply mark the block as available, in which case one of two things happens:

1. The data in the free blocks comes back zero on every drive (enterprise DRAT behaviour), which happens to satisfy checksum requirements, in which case it'll pass a scrub.
2. The actual stale data is read off the disk (consumer disk), which doesn't (or may not) satisfy the checksum and fails, despite this not being an actual problem, because there's no real data stored on those 'free' blocks.

The goal of RZAT/DRAT etc. is that, through trim, the drive returns consistent values (e.g. zero) for free-block reads regardless of what is actually written to the block, so everything lines up and a scrub passes. A consumer drive without DRAT will return the actual stale data rather than a consistent value for free blocks, and that, I think, is the source of the issue for scrubs of arrays. It's behaviour based on the fact that in an array there is an expected relationship between data on the same sector of all the disks in the vdev, and ordinary TRIM doesn't promise to return any particular data after being called. RZAT/DRAT are means of ensuring that the expected relationship stays consistent.

I'm crap at explaining these things but hopefully it makes some sense.
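
The mirror case above can be sketched as a toy comparison (pure shell, made-up hex values): a block-level patrol read compares the same freed sector on both halves of a mirror, and only matches when reads after TRIM are deterministic:

```shell
# Without DRAT: each drive may return arbitrary stale flash contents
# for a trimmed sector (made-up values):
no_drat_a="deadbeef"; no_drat_b="cafef00d"
# With RZAT: both sides deterministically read back zeros:
rzat_a="00000000"; rzat_b="00000000"

if [ "$no_drat_a" = "$no_drat_b" ]; then
    echo "non-DRAT mirror halves match"
else
    echo "non-DRAT mirror halves differ - a naive block-level scrub flags an error"
fi
if [ "$rzat_a" = "$rzat_b" ]; then
    echo "RZAT mirror halves match - the patrol read passes"
fi
```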
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,140
594
113
New York City
www.glaver.org
The goal of RZAT/DRAT etc is that the drive, through trim, returns consistent values (eg zero) for free block reads regardless of what is actually written to the block...
Bearing in mind that I haven't written any drive firmware in at least a decade, and never wrote (or even looked at) any SSD firmware...

After a block has been TRIMmed or UNMAPped, a read of that block should return whatever the drive returns for an unallocated block, since the purpose of TRIM/UNMAP is to have the block(s) erased and returned to the free block list. And I don't see why a drive wouldn't just return all zeros, or 0xFEEDFACEDEADBEEF, or some other more boring pattern. :D

My anecdotal experience is that ZFS scrub operations only take into consideration disk blocks that ZFS thinks are still allocated. Otherwise resilvers and scrubs would be constant-time.

As far as mirrors (of varying types, including ZFS and controller-based), if a TRIM/UNMAP made it to less than all mirrors, then that probably should raise a soft inconsistency error of some type. After all, the purpose of a mirror is to have identical data (or identical lack of data) on more than one drive.

For other types of volumes where drives are not expected to be identical, well, they're already not identical in their perfect state. Making them even less identical is like being "more dead" (as opposed to "mostly dead" - obligatory Princess Bride reference).

This still looks like a complicated workaround for some past drive with brain-dead firmware that returned random data on reads of unallocated blocks. That would appear to be a security hole, since that "random" data probably isn't really random.
 

bonox

Member
Feb 23, 2021
87
20
8
Lots of these drives are used in non-ZFS applications (NetApp, or SANs in general) where they are dumb, and most were probably intended to sit behind RAID controllers rather than software control. It may also be much more relevant to block-level storage than file-level storage, but that's my understanding of why RZAT exists. You don't have to agree with me. Here's someone else's opinion, but I doubt anyone here is likely to have been on the design/decision teams behind the various drive/controller firmware.