S2D slow write (no parity & NVMe cache)


Necrotyr

Active Member
Jun 25, 2017
206
52
28
Denmark
Just catching up on this great thread. I currently have, or will soon have, the following three types of SAS SSD drives from HGST:

1) HUSMM8040ASS205 (Ultrastar SSD800MM)
2) HUSMM8040ASS204 (Ultrastar SSD800MM)
3) HUSSL4040ASS600 (Ultrastar SSD400S)
I wish you well; when I tried the SSD400S and SSD400S.B series SSDs for S2D, I wasn't able to add them because they don't have SES.
 

Cipher

Member
Aug 8, 2014
159
15
18
53
Hi Necrotyr,

Isn't the SES capability built into the HBA itself and not the drive? Here's a related discussion on this issue - Storage Spaces Direct (S2D): SCSI Enclosure Services (SES) needed only on HBA?
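For what it's worth, you can ask Windows what it actually enumerates; if the HBA presents SES, Get-StorageEnclosure should list it (just the in-box Storage module, nothing environment-specific):

# SES enclosures the OS can see
Get-StorageEnclosure | Format-Table FriendlyName, HealthStatus, NumberOfSlots
# Disks and the enclosure/slot they map to
Get-PhysicalDisk | Format-Table FriendlyName, BusType, EnclosureNumber, SlotNumber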

Also, it looks like SES is no longer a requirement as of build 17074:
Announcing Windows Server Insider Preview Build 17074 - Windows Experience Blog

Storage Spaces Direct (S2D)
  • In this preview build we continue to expand and simplify the hardware requirements for Storage Spaces Direct. SCSI Enclosure Services (SES) is no longer required for Storage Spaces Direct compatibility; this unlocks a breadth of new hardware which was not previously capable of running S2D. Storage Spaces Direct now also supports Persistent Memory (a.k.a. Storage Class Memory), which unlocks a new class of extremely low-latency storage that is particularly interesting as a caching device. Storage Spaces Direct now also supports SATA devices connected directly to an AHCI controller; again, this expands the hardware ecosystem compatible with S2D and unlocks a new class of low-cost hardware. In this preview the Cluster Shared Volumes (CSV) cache is also enabled by default, which delivers an in-memory write-through cache that can dramatically boost VM performance.
 

Necrotyr

Active Member
Jun 25, 2017
206
52
28
Denmark
It should be in the HBA, but I was using a 9211-8i and it allowed the SAS HDDs and SATA SSDs I ended up using instead.

Didn't they disable S2D in the semi-annual channel releases, or was that only 1709?
 

epon

New Member
Aug 29, 2018
1
0
1
FYI, using direct reads/writes to a CSV is not a good way to test performance. Using something like VM Fleet would make more sense. This is due to how CSVs are optimized: file copies are buffered I/O, whereas VHDX I/O is unbuffered.

If you're lazy and don't want to set up VM Fleet, at least spin up a single VM and run the CrystalDiskMark test from inside it. I would bet you will see better speeds due to the optimized I/O at the VHDX level.
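If you do spin up a test VM, a DiskSpd run inside the guest along these lines gives unbuffered, latency-logged numbers roughly comparable to what gets measured later in this thread (a sketch; the drive letter, file size and 50% write mix are just example parameters):

# 4K random, 50% writes, 4 threads, QD32, 60 s, software caching and hardware write cache disabled, latency captured
DiskSpd.exe -r -w50 -b4K -t4 -o32 -d60 -Sh -L -c4G D:\testfile.dat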
 

AlexJoda

New Member
Aug 26, 2013
3
2
3
If somebody is still reading this old thread: we had the same problem with our test installation of S2D 2019. Writes were very slow compared to the potential of our (consumer) test NVMe drives. We used a 2-node setup with the new nested mirror-accelerated parity feature (mirrored SSDs combined with a RAID5-style HDD tier in a ReFS volume). The components for this lab setup are:

2 nodes, each with:
HP ML350 Gen8 with 128 GB RAM and 2*E5-2630 V0 (12 cores)
2*Supermicro (LSI) HBA AOC-S3008L-L8i (IT mode)
2*RaidSonic IB-2281MSK backplane for 8*2.5" HDD/SSD with 2*MiniSAS connectors (the built-in HP backplanes are not compatible with the 3008 controllers; the RaidSonics fit perfectly in the HP case)
8*HP 146GB 2.5" SAS HDD (10K). (We had tons of them lying around from old servers; they should simulate modern 7.2K 3.5" HGST drives.)
4*Samsung PM883 SSD (we know they are not durable enough (1.3 DWPD), but they should have PLP; it's only a lab setup...)
2*Samsung 970 Pro NVMe (consumer NVMe without PLP, but very fast and without the internal SLC cache of the EVO that makes those unusable for random server access) with 2*PCIe-to-M.2 adapters
Mellanox ConnectX-3 Pro dual 40 GbE RoCE v2 NIC with direct DAC connections
ASMedia dual M.2 SATA RAID1 controller as boot RAID with SATA connector (ASM1092R controller; you can get this from various vendors like Roline, StarTech, Delock... and there is a monitoring tool called 109GUI 2.02 as well). This thing is fully transparent to the node as a single SATA drive and boots Server 2019 without problems. We used 2*Crucial MX500 M.2 SATA drives with it, which also have PLP.

This setup should be good enough to check out and play around with an S2D 2019 setup in the lab and to give some impression of the real-life performance you can get with it. We think this is better than a virtualized "nested" test environment, because you see all the hardware problems too.

After setting this up we saw a frustrating 450 MB/s sequential write rate and about 6,000 IOPS (4K) with IOMeter. Then we found this thread and learned about the PLP problem with consumer NVMe drives and SSDs. The check with Get-PhysicalDisk |? FriendlyName -Like "*" | Get-StorageAdvancedProperty showed that no drive had PLP (IsPowerProtected was False, although the PM883 should have PLP) and that we were not able to retrieve the write cache status (WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1).

The solution was to use the command Set-StoragePool -FriendlyName "Cluster Name" -IsPowerProtected $true to overrule the PLP detection and, even more important, to check the policy "Enable write caching on the device" under Device Manager > Disk drives. The sad thing is that you have to set this policy again every time you reboot the server. That should probably not be the case with drives that properly report PLP.
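To put the check and the workaround from the paragraph above in one place, this is roughly what we ran on each node (the pool name is just an example; use whatever Get-StoragePool shows for your cluster):

# See what Windows detected per drive (PLP and device cache state)
Get-PhysicalDisk | Get-StorageAdvancedProperty

# Overrule the PLP detection at the pool level (only sensible if you trust the drives/UPS)
Set-StoragePool -FriendlyName "S2D on Cluster1" -IsPowerProtected $true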

After that we had over 2.2 GB/s sequential write speed and over 90,000 IOPS. The combined read/write performance (4K, 50/50) is about 130,000 IOPS, and read-only is about 400,000 IOPS!
 

Timmerdanny

New Member
Oct 21, 2019
8
0
1
I'm having the same issue with Samsung PM1725b SSD drives as cache in an S2D cluster. The S2D is set up as a three-way mirror, and in total I have three cluster members with 8x SAS drives and 2x Samsung PM1725b per node. I ran the command Get-PhysicalDisk |? FriendlyName -Like "*" | Get-StorageAdvancedProperty and it returns the warning "Retrieving IsDeviceCacheEnabled failed with ErrorCode 1", and IsPowerProtected is set to false for my SSD drives.

However, when I run Get-StoragePool I see that my storage pool is already set to power protected. In Windows Admin Center my average read latency is around 100 µs and write latency around 1.8 ms. Is this normal for a mixed S2D environment?
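For reference, this is how I compare the pool-level flag with what was detected per disk, using only the cmdlets already mentioned in this thread:

# Pool-level setting
Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, IsPowerProtected

# Per-disk detection
Get-PhysicalDisk | Get-StorageAdvancedProperty | Select-Object FriendlyName, IsPowerProtected, IsDeviceCacheEnabled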
 

Christobisto

New Member
Jan 30, 2020
5
0
1
Timmerdanny, did you ever get to the bottom of the "IsPowerProtected is set to false" and slow cache write issues?

We have just got a cluster up and running using these SSDs and are seeing the same issues. I have a support call open with Samsung, but they are next to useless.
 

Timmerdanny

New Member
Oct 21, 2019
8
0
1
We opened a support ticket with Microsoft to investigate the issue. We switched from a failover NIC team to a SET switch and have RoCE configured properly now. However, the writes are still horribly slow. If you test the drives locally they perform perfectly. I tried setting IsPowerProtected to true, but that doesn't help. The ticket has now been open for weeks at Microsoft, and we are considering expanding our storage array with SAS SSDs so we can give the NVMe SSDs a read-caching-only role.
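For anyone following along, the move from an LBFO team to Switch Embedded Teaming is roughly the following (a sketch; the switch name and adapter names are examples for your own hardware):

# Create a SET switch across both RDMA-capable ports
New-VMSwitch -Name "S2D-SET" -NetAdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Host vNICs for the storage/SMB traffic, with RDMA exposed to them
Add-VMNetworkAdapter -ManagementOS -SwitchName "S2D-SET" -Name "SMB1"
Add-VMNetworkAdapter -ManagementOS -SwitchName "S2D-SET" -Name "SMB2"
Enable-NetAdapterRdma -Name "vEthernet (SMB1)","vEthernet (SMB2)"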
 

Christobisto

New Member
Jan 30, 2020
5
0
1
Yeah, MS are being rather slow with us as well. We are using iWARP rather than RoCEv2, which makes things easier. We went with simplified networking (2x IPs on the same subnet) but currently have cluster and client traffic on the same set of NICs. MS seem to be saying that it's the network and not the SSDs, but I still suspect the drives.

Tempted just to knock a node out of the cluster and see how it performs as a standalone Storage Spaces box.

Please let us know if you have any luck (as will I).
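One cheap sanity check before blaming either the network or the SSDs is to confirm SMB Direct is actually in use between the nodes (in-box SMB cmdlets, nothing vendor-specific):

# RDMA enabled on the NICs, and visible to SMB?
Get-NetAdapterRdma | Format-Table Name, Enabled
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

# Live connections should show RDMA-capable interfaces being selected
Get-SmbMultichannelConnection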
 

gb00s

Well-Known Member
Jul 25, 2018
1,175
586
113
Poland
Does anyone know if these LSI 6208 Nytro WarpDrive NWD-RLP4-1860 cards are PLP drives? I sometimes read "battery-less power fail protection" or "less than 5 second recovery from power failure", but I don't have the knowledge to say whether that amounts to PLP. It would be helpful if you could comment on that. Thank you in advance.
 

Set Iron

New Member
Mar 18, 2020
7
0
1
Timmerdanny, did you ever get the matter resolved? I was experimenting with an all-NVMe flash S2D setup using Samsung PM1725b drives and found myself in the same situation. It seems like a driver issue, but Samsung doesn't provide one.
 

Timmerdanny

New Member
Oct 21, 2019
8
0
1
Not yet. I opened two cases with Microsoft to see if another support engineer might know a solution. The strange thing is that locally, without S2D clustering, the disks work perfectly. I did some testing with DiskSpd on the random read/write performance of the PM1725 locally and the results are very good. I also tested the Intel P3700 SSD, but that device is also not recognized as a PLP device. Microsoft keeps trying to say it is something with the network, but I don't think that is the issue. I will be redesigning my Storage Spaces Direct cluster with 24x Western Digital SS540 1.6TB SAS SSDs and will try the PM1725 as cache in that scenario. I also bought some new network cards from Chelsio to do iWARP instead of RoCE networking.

Here is the random read/write performance at 4 threads, 32 queue depth, 4K:
Command used for testing: DiskSpd.exe -r -c4G -d60 -w50 -t4 -o32 -b4K -L -Sh C:\tmp\testfile.dat
Without S2D: read 1,046 MB/s, write 1,019 MB/s, read latency 476 μs, write latency 490 μs, read 267,957 IOPS, write 260,948 IOPS
With S2D and HDDs: read 781.29 MB/s, write 89.6 MB/s, read latency 640 μs, write latency 5,572 μs, read 200,009 IOPS, write 22,938 IOPS

Here is the random read/write performance at 1 thread, 1 queue depth, 4K:
Command used for testing: DiskSpd.exe -r -c4G -d60 -w50 -t1 -o1 -b4K -L -Sh C:\tmp\testfile.dat
Without S2D: read 40.53 MB/s, write 149.11 MB/s, read latency 26 μs, write latency 26 μs, read 10,376 IOPS, write 38,172 IOPS
With S2D and HDDs: read 38.77 MB/s, write 1.89 MB/s, read latency 100 μs, write latency 2,061 μs, read 9,924 IOPS, write 484 IOPS

As soon as I have redesigned the cluster I will reply here with the details.
 

Christobisto

New Member
Jan 30, 2020
5
0
1
We have also been going round in circles for a while; in desperation I turned off RDMA on the NICs...

With RDMA = 340 MB/s
Without RDMA = 1300 MB/s

Although this isn't a supported configuration, it might mean that the Samsung SSDs are working as intended.
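If anyone wants to repeat that (unsupported) experiment, toggling RDMA per adapter is just the following; the adapter names are examples:

# Take RDMA out of the picture, re-run the benchmark, then switch it back on
Disable-NetAdapterRdma -Name "SMB1","SMB2"
# ...run the DiskSpd/VM Fleet test here...
Enable-NetAdapterRdma -Name "SMB1","SMB2"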

We have just rebuilt our cluster as a 6-node setup (the production workloads are running on the other two nodes as standard Storage Spaces NAS boxes) and I am retesting now.

For reference, the config is as follows:
6 x HPE DL835 Gen10 (2 x EPYC 7251, 128 GB RAM, 12 x 8TB SAS, 4 x 3.2 TB PM1725b, 2 x 25 GbE Cavium-based NICs)
2 x Cisco 9300s
 

Timmerdanny

New Member
Oct 21, 2019
8
0
1
Are you using RoCE RDMA or iWARP RDMA? RoCE requires DCB and QoS to operate successfully. We are switching the network cards from Mellanox ConnectX-3 Pro to Chelsio T580-LP-CR because of the QoS requirements of RoCE.
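For completeness, the host-side DCB/PFC part for RoCE usually boils down to something like this on every node (a sketch following the common S2D guidance; priority 3 and the 50% reservation are typical defaults and must match the switch configuration, and the adapter names are examples):

# Tag SMB Direct (port 445) traffic with priority 3 and enable PFC for that priority only
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Reserve bandwidth for SMB and apply DCB on the physical RDMA ports
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
Enable-NetAdapterQos -Name "SLOT 3 Port 1","SLOT 3 Port 2"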
 

Christobisto

New Member
Jan 30, 2020
5
0
1
Currently using iWARP, as DCB is a cost option on the Cisco switches. The NICs are HPE 622FLR-SFP28, which are Marvell QL41401 based.
 

yara2

New Member
Mar 26, 2020
1
0
1
Are there any new results on Chelsio cards?
 

wimoy

New Member
May 15, 2020
2
0
1
psannz, will using a set of PLP SSDs for a dedicated journal improve performance even if the data SSDs do not support PLP?
 

psannz

Member
Jun 15, 2016
79
19
8
39
A dedicated journal generally does increase performance with parity spaces. I never worked with that exact combination, though, and sadly I no longer have access to the old playground to test it now. Sorry.
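For anyone who wants to try it in a classic (non-S2D) storage space: you can dedicate the PLP SSDs as journal disks before creating the parity virtual disk. A minimal sketch, assuming the pool is called "Pool1" and the journal SSDs can be matched by name:

# Mark the PLP SSDs as dedicated journal disks in the pool
Get-PhysicalDisk |? FriendlyName -Like "*SSD800MM*" | Set-PhysicalDisk -Usage Journal

# Parity virtual disks created afterwards will use them for the write journal
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "ParityVD" -ResiliencySettingName Parity -UseMaximumSize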