S2D slow write (no parity & NVMe cache)


Cutha

Member
Sep 24, 2016
Canada
With S2D I am getting terrible writes no matter what I try. This is not a post about slow writes caused by a parity setup, as I have everything configured as mirror. It seems like the cache devices are never written to. At first I thought I needed to use SOFS, but I am using this in a hyper-converged setup and, going by Deploy Storage Spaces Direct (Step 4), it is not required. I tried SOFS and General File Server as cluster roles with storage attached in various ways, but the results were always pretty poor.

To keep it simple, there is one VM allocated 8 vCPUs and a drive (fixed VHDX) to benchmark against. Currently I have no file server roles set up (SOFS or General) and I am just using C:\ClusterStorage\. I have tried it with SOFS and General, but from what I can tell I shouldn't need either for the setup I am using. At some point I would like to have a share for client workstation VHDXs with the quickest configuration possible; CA is not required, but HA would be nice.

RDMA is tested and working, and the drives perform as expected on their own. I am waiting until I get this sorted out before I deploy, so at the moment I can do whatever configuring and testing is required. I am stumped and disappointed with myself. I have spent two years gathering the hardware to make this work and for the life of me I can't get it to perform. Any help would be appreciated.

I have 4 nodes and each node has:
2xNVMe 250GB (SM961; I originally thought these had PLP, but they do NOT)
3xSSD 480GB (MZ7WD480, has PLP)
4xHDD 4TB (consumer grade SATA)
ConnectX-2 dual port
2xE5-2670v1
96GB RAM
Windows Server 2016

I was a little confused about which performance counters I should be looking at to see if the cache was being used, but from everything I could see, it was not being used during any of my tests. I have read everything I could find online that I thought might help, and I don't know what to try next.
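For reference, the cache-related counters can be listed with something along these lines (a sketch; the counter-set names are from memory and may vary by build):
Code:
# List the S2D-related counter sets on this node
Get-Counter -ListSet "Cluster Storage*" | Select-Object CounterSetName

# Show the individual counters in the hybrid disks set (cache hits/misses, destage, etc.)
(Get-Counter -ListSet "Cluster Storage Hybrid Disks").Counter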

Benchmarks from the client VM, with the client VHDX on different volumes. The volume definitions are below, in the same order.


Code:
New-Volume -FriendlyName "CSV-1" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 100GB

New-Volume -FriendlyName "Test0SSDOnly" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 10GB -ResiliencySettingName Mirror

New-Volume -FriendlyName "Test2MRV" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 50GB, 50GB

New-Volume -FriendlyName "Test3Tiered" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -StorageTierFriendlyNames Performance, CapacityMirror -StorageTierSizes 50GB, 50GB
Individual drives I tested some time ago:

1xNVMe: [benchmark screenshot]

1xSATA SSD: [benchmark screenshot]

Code:
Get-ClusterStorageSpacesDirect    # identical output from its alias Get-ClusterS2D

CacheMetadataReserveBytes : 34359738368
CacheModeHDD              : ReadWrite
CacheModeSSD              : WriteOnly
CachePageSizeKBytes       : 16
CacheState                : Enabled
State                     : Enabled
Code:
Get-StorageTier | FT FriendlyName, ResiliencySettingName, MediaType, PhysicalDiskRedundancy -autosize

FriendlyName         ResiliencySettingName MediaType PhysicalDiskRedundancy
------------         --------------------- --------- ----------------------
CapacityMirror       Mirror                HDD                            2
Capacity             Parity                HDD                            2
Performance          Mirror                SSD                            2
I wanted to use the HDDs in a three-way mirror, so I created the CapacityMirror tier with:
Code:
$hddTierMirror = New-StorageTier -StoragePoolFriendlyName "S2D on theCluster" -MediaType HDD -FriendlyName "CapacityMirror" -ResiliencySettingName Mirror -NumberOfDataCopies 3
I set the CSV cache with:
Code:
(Get-Cluster $ClusterName).BlockCacheSize = $CSVCacheSize
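The value is in MB. As a purely illustrative example, giving each node 2 GB of in-memory CSV read cache would look like:
Code:
# Illustrative: 2 GB of CSV in-memory read cache per node (value is in MB)
(Get-Cluster).BlockCacheSize = 2048

# Confirm the setting
(Get-Cluster).BlockCacheSize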
Code:
Get-StorageSubSystem *cluster* | Get-StorageHealthReport
CPUUsageAverage                 :   2.52 %
CapacityPhysicalPooledAvailable :  45.26 TB
CapacityPhysicalPooledTotal     :  65.32 TB
CapacityPhysicalTotal           :  65.32 TB
CapacityPhysicalUnpooled        :      0 B
CapacityVolumesAvailable        :   5.83 TB
CapacityVolumesTotal            :   5.91 TB
IOLatencyAverage                :      0 ns
IOLatencyRead                   :      0 ns
IOLatencyWrite                  :      0 ns
IOPSRead                        :      0 /S
IOPSTotal                       :      0 /S
IOPSWrite                       :      0 /S
IOThroughputRead                :      0 B/S
IOThroughputTotal               :      0 B/S
IOThroughputWrite               :      0 B/S
MemoryAvailable                 : 377.19 GB
MemoryTotal                     :    416 GB

ExtendedStatus :
ReturnValue    : 0
PSComputerName :
Code:
Get-PhysicalDisk

FriendlyName               SerialNumber         CanPool OperationalStatus HealthStatus Usage            Size
------------               ------------         ------- ----------------- ------------ -----            ----
ATA INTEL SSDSA2BW16       CVPR1290041L160DGN   False   OK                Healthy      Auto-Select 149.05 GB
ATA SAMSUNG MZ7WD480       S1G1NYAF926440       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S16MNYAD911565       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S16MNYAF605357       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S16MNYAD911436       False   OK                Healthy      Auto-Select    447 GB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E84F. False   OK                Healthy      Journal     238.25 GB
ATA ST4000DM005-2DP1       WDH1L3DG             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E88B. False   OK                Healthy      Journal     238.25 GB
ATA ST4000DM005-2DP1       ZDH0L26D             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CB_6100_B550. False   OK                Healthy      Journal     238.25 GB
ATA SAMSUNG MZ7WD480       S16MNEAD740753       False   OK                Healthy      Auto-Select    447 GB
ATA ST4000DM005-2DP1       ZDH0X9QM             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E846. False   OK                Healthy      Journal     238.25 GB
ATA ST4000VN000-1H41       Z301MZSZ             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CB_6100_BFA2. False   OK                Healthy      Journal     238.25 GB
ATA SAMSUNG MZ7WD480       S16MNEAD500108       False   OK                Healthy      Auto-Select    447 GB
ATA ST4000DM005-2DP1       ZDH0BSBV             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000VN000-1H41       Z306C6GF             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E8EF. False   OK                Healthy      Journal     238.25 GB
ATA ST4000VN000-2AH1       WDH0KMY3             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E84A. False   OK                Healthy      Journal     238.25 GB
ATA ST4000VN000-1H41       Z3060E2J             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000DM005-2DP1       WDH1L39R             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000VN000-2AH1       WDH0KN75             False   OK                Healthy      Auto-Select   3.64 TB
ATA SAMSUNG MZ7WD480       S16MNYAF604909       False   OK                Healthy      Auto-Select    447 GB
ATA ST4000VN000-2AH1       WDH0KNEM             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000VN000-1H41       Z301NEKY             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000VN000-1H41       Z304XGDB             False   OK                Healthy      Auto-Select   3.64 TB
ATA ST4000DM005-2DP1       ZDH0X90N             False   OK                Healthy      Auto-Select   3.64 TB
SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E8C7. False   OK                Healthy      Journal     238.25 GB
ATA SAMSUNG MZ7WD480       S1G1NYAF921789       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S1G1NYAF925500       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S16MNYAF114484       False   OK                Healthy      Auto-Select    447 GB
ATA ST4000DM005-2DP1       ZDH0X9SJ             False   OK                Healthy      Auto-Select   3.64 TB
ATA SAMSUNG MZ7WD480       S16MNYAF334083       False   OK                Healthy      Auto-Select    447 GB
ATA SAMSUNG MZ7WD480       S1G1NYAF921332       False   OK                Healthy      Auto-Select    447 GB
ATA ST4000VN000-2AH1       WDH0KNE5             False   OK                Healthy      Auto-Select   3.64 TB
 

psannz

Member
Jun 15, 2016
You do realise that the SM961 does NOT have PLP, right? You might be mixing it up with the SM963, which does have PLP...
That's reason #1 for bad write performance.

The next NVMe point is driver installation:
Check Device Manager to see whether the Samsung NVMe driver is installed or just the generic "Standard NVM Express Controller" driver. The latter has poor performance.

Also: are you running those ConnectX-2 cards in InfiniBand mode?
AFAIK, full RoCE v2 and SMB Direct are only supported from the ConnectX-3 onward, in Ethernet mode.
 

Cutha

Member
Sep 24, 2016
Canada
Thanks for the replies guys, I appreciate it.

You do realise that the SM961 does NOT have PLP, right? You might be mixing it up with the SM963, which does have PLP...
That's reason #1 for bad write performance.
Well, that sucks. I had it in my head that the SM961 had PLP, so when I was going over everything again and saw that as a requirement I didn't give it a second thought. Is there a PowerShell command that would show that they don't have PLP, or some other command that would show they aren't being used as cache devices?

The stupid thing is, I posted about a year ago looking for NVMe drives for this build, and clearly I knew then that they didn't have PLP. Argh.
Help selecting NVMe drives to finish build

Do you know if the SM953s would work OK for S2D? They seem to be available and are the cheapest fix until I can beef things up. I think I recently read somewhere that S2D now only requires one caching device per node; if that is true, the cost to get this up and running will not be too terrible. The Intel P3700s seem to be available, but they are much more costly than the SM953s. I was excited about the Optane drives, but the 900P series does not have PLP.

The next NVMe point is driver installation:
Check Device Manager to see whether the Samsung NVMe driver is installed or just the generic "Standard NVM Express Controller" driver. The latter has poor performance.
The scale of the slow writes is well beyond bad drivers. I have done this setup a few times before, and previously I made sure the Samsung drivers were installed; I even saved the benchmarks from before and after the driver change. I forgot this time. I tried to get the drivers updated today, but Windows is resisting me. As they don't have PLP anyway, I will be pulling them from the servers, so I have stopped trying to update the drivers.


Also: are you running those ConnectX-2 cards in InfiniBand mode?
AFAIK, full RoCE v2 and SMB Direct are only supported from the ConnectX-3 onward, in Ethernet mode.
Thanks for the guides, Chris. I will be saving that one; very nice step-by-step directions.

I am pretty sure RoCE is working because I can transfer files quickly between the cluster nodes and the IB NICs don't show the traffic. When I built out my two-node cluster for proof of concept I went through the steps of getting RoCE configured, but I didn't bother this time because I forgot. I went through the steps in the guide cesmith9999 provided and received an error when trying to apply the SMB QoS policy; when I looked up the error, it turned out the ConnectX-2 cards don't support it. RDMA works, and I will work on upgrading the IB NICs after I get the NVMe drives sorted out.
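For reference, a quick way to sanity-check that SMB is actually seeing the RDMA-capable interfaces (a sketch using the built-in cmdlets, run on a cluster node):
Code:
# Which adapters have RDMA enabled
Get-NetAdapterRdma | Format-Table Name, Enabled

# SMB's view of the interfaces; the storage NICs should show RdmaCapable = True
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

# Live SMB Direct traffic also shows up under the "RDMA Activity" performance counters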

One of the ports on the IB NIC is on a subnet that is used only for cluster storage. If RDMA is working but QoS is not set up, will this be a problem in my four-node setup? I found this TechNet blog, To RDMA, or not to RDMA, which makes me think it is not required.

My IB switch is a Voltaire Grid Director 4036 (VLT-30111) with the firmware updated based on the guide on this forum.

I am going to remove the NVMe drives from the pool and see what happens.
 

psannz

Member
Jun 15, 2016

Cutha

Member
Sep 24, 2016
Canada
Great website, thanks.

I removed the NVMe drives from the pool and tried a couple of quick tests with the SSDs and HDDs, but the results were not great. I then tried changing the 12 SSDs to journal drives, and so far the results are still about the same: ~3500 MB/s read and <100 MB/s write.
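For reference, retagging the SSDs as cache devices can be done with Set-PhysicalDisk, roughly along these lines (a sketch; the media-type filter is only illustrative):
Code:
# Illustrative: mark every pooled SATA SSD as a journal (cache) device
Get-StoragePool -FriendlyName S2D* | Get-PhysicalDisk |
    Where-Object MediaType -eq SSD |
    Set-PhysicalDisk -Usage Journal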
 

Cutha

Member
Sep 24, 2016
Canada
If I use this:
Code:
Set-ClusterS2D -CacheModeSSD ReadOnly
Will it allow the physical cache on the SATA SSDs (MZ7WD480HCGM-00003; 99% sure they have PLP) to be utilized, bypassing the NVMe drives for writes?
 

sth

Active Member
Oct 29, 2015

psannz

Member
Jun 15, 2016

sth

Active Member
Oct 29, 2015
That's an interesting and useful clarification, thank you.

The white paper you linked makes reference to: "For the SM953, the SFF-8639 form factor SSD supports the hot plug function; multiple tantalum capacitors ensure stable data integrity even when the system is in sudden power off recovery (SPOR) state. Currently, however, the M.2 form factor SSD does not support this function."

Can you explain the difference between true enterprise-level end-to-end data protection on devices like the P3700 and this 'data at rest' protection? I'm interested in fully understanding the implications of these differences under ZFS and Storage Spaces.
 

psannz

Member
Jun 15, 2016
Can you explain the difference between true enterprise-level end-to-end data protection on devices like the P3700 and this 'data at rest' protection? I'm interested in fully understanding the implications of these differences under ZFS and Storage Spaces.
I cannot speak for ZFS, but as far as Windows / Storage Spaces go, it comes down to whether Microsoft sees the SSD as fully protected (end-to-end) from any and all power events, or not. If it is, the IsPowerProtected flag is set to $true in the Get-StorageAdvancedProperty query for the disk, which is in turn checked by Storage Spaces.
The catch with IsPowerProtected is that it's a read-only flag (MSFT_PhysicalDisk class (Windows)).
If Windows does not see end-to-end PLP, data integrity is no longer guaranteed, so any write must be fully committed to stable media before it is acknowledged, which costs time and therefore write performance.

The "data at rest" PLP just means that a sudden power outage will not corrupt any data that is already written. That's very nice to have in a client system if e.g. your power company or your building's power distribution is unreliable.
However from an enterprise point of view where data integrity counts..... basically, it's proper PLP or bust.
And in my optinion MS is correct in not allowing the IsPowerProtected physical disk attribute to be changed.
 

psannz

Member
Jun 15, 2016
You can set it in powershell: Set-StoragePool (storage)
Different parameter. You can set the StoragePool's IsPowerProtected to $true, but the disks themselves are still recognized as $false.
While setting the pool to $true will increase write speed, crashes and power failures will most certainly mess up your data.
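For completeness, that pool-level override looks roughly like this (a sketch); as said, it buys write speed at the cost of real data-loss risk on a crash or power failure:
Code:
# Lab-only: tell Storage Spaces to treat the pool as power-protected.
# Writes can then be acknowledged from volatile cache, so a power loss may corrupt data.
Get-StoragePool -FriendlyName "S2D*" | Set-StoragePool -IsPowerProtected $true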
 

Cutha

Member
Sep 24, 2016
Canada
I cannot speak for ZFS, but as far as Windows / Storage Spaces go, it comes down to whether Microsoft sees the SSD as fully protected (end-to-end) from any and all power events, or not. If it is, the IsPowerProtected flag is set to $true in the Get-StorageAdvancedProperty query for the disk, which is in turn checked by Storage Spaces.
Thanks!

Using Get-StorageAdvancedProperty should make it very clear, but it has returned some confusing results.

I have 12 of the same SSDs, but only one of the four nodes shows IsPowerProtected as True for all three of its SSDs, and one other node shows True for just one of its three SSDs.

Node01
Code:
Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
------------         ------------   ---------------- --------------------
ATA SAMSUNG MZ7WD480 S1G1NYAF926440             True                False
ATA SAMSUNG MZ7WD480 S16MNEAD500108            False                False
ATA SAMSUNG MZ7WD480 S16MNYAF334083            False                False
Node02
Code:
Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
------------         ------------   ---------------- --------------------
ATA SAMSUNG MZ7WD480 S16MNYAD911565            False                False
ATA SAMSUNG MZ7WD480 S16MNEAD740753            False                False
ATA SAMSUNG MZ7WD480 S1G1NYAF921332            False                False
Node03
Code:
Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
------------         ------------   ---------------- --------------------
ATA SAMSUNG MZ7WD480 S1G1NYAF921789            False                False
ATA SAMSUNG MZ7WD480 S1G1NYAF925500            False                False
ATA SAMSUNG MZ7WD480 S16MNYAF114484            False                False
Node04
Code:
Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty

FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
------------         ------------   ---------------- --------------------
ATA SAMSUNG MZ7WD480 S16MNYAF605357             True                False
ATA SAMSUNG MZ7WD480 S16MNYAD911436             True                False
ATA SAMSUNG MZ7WD480 S16MNYAF604909             True                False
The Node04 backplane is dead, so I removed it and scabbed in cables to connect the drives until I get a new backplane. Since the SSDs directly attached to the SAS HBA seemed to show up as properly power protected, I thought the backplane might have something to do with it. I have since directly connected all the SSDs to the HBA, but that didn't seem to change anything.

Any ideas on what's going on?
 

Cutha

Member
Sep 24, 2016
Canada
I think I got it. These benchmarks are from within a VM on the cluster, with the SSDs as cache and no NVMe drives.

Mirror: [benchmark screenshot]

Mirror: [benchmark screenshot]

Mirror (QD8): [benchmark screenshot]

Parity: [benchmark screenshot]
 

psannz

Member
Jun 15, 2016
Node04
Code:
Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty

FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
------------         ------------   ---------------- --------------------
ATA SAMSUNG MZ7WD480 S16MNYAF605357             True                False
ATA SAMSUNG MZ7WD480 S16MNYAD911436             True                False
ATA SAMSUNG MZ7WD480 S16MNYAF604909             True                False
The Node04 backplane is dead, so I removed it and scabbed in cables to connect the drives until I get a new backplane. Since the SSDs directly attached to the SAS HBA seemed to show up as properly power protected, I thought the backplane might have something to do with it. I have since directly connected all the SSDs to the HBA, but that didn't seem to change anything.

Any ideas on what's going on?
That's really weird... Does your backplane support SES? If not, that could explain things.
Which Windows Server version are you running? 2016 or 2019 TP?

Also, if you want support for non-PLP SSDs in S2D, there's a request open on the Windows Server UserVoice forums:
Storage Spaces allow option for volatile cache (aka consumer SSDs)
 

Cutha

Member
Sep 24, 2016
Canada
That's really weird... Does your backplane support SES? If not, that could explain things.
Which Windows Server version are you running? 2016 or 2019 TP?

Also, if you want support for non-PLP SSDs in S2D, there's a request open on the Windows Server UserVoice forums:
Storage Spaces allow option for volatile cache (aka consumer SSDs)
The backplanes are Supermicro BPN-SAS2-826EL1. I am not sure if it supports SES, and the PDF user guide from Supermicro doesn't mention SES.

I am using Server 2016.

What is the risk with non-PLP SSDs when using a three-way mirror?
 

psannz

Member
Jun 15, 2016
That Backplane has SES 2.0, so you should be fine. The SES requirement will only be removed with WS2019, at least in terms of

You can use them, sure. Two- or three-way mirroring is fine. With 3 disks and two-way mirroring, you'd get twice the write performance compared to a three-way mirror. Unlike RAID 1, a two-way mirror can spread data over an uneven number of disks. Of course, a two-way mirror can only cover one disk failure, while a three-way mirror can and will cover two failed disks.

Other than slow write speed, there is no risk whatsoever in using non-PLP SSDs. Writes are only acknowledged once they have been fully written. That's why they're so slow in Storage Spaces.
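To illustrate the difference, the two layouts only differ in the redundancy setting when the volume is created (a sketch; names and sizes are made up):
Code:
# Two-way mirror: two data copies, survives one failed disk
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName "Mirror2Way" -FileSystem CSVFS_ReFS -Size 100GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 1

# Three-way mirror: three data copies, survives two failed disks
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName "Mirror3Way" -FileSystem CSVFS_ReFS -Size 100GB -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2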
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
That Backplane has SES 2.0, so you should be fine. The SES requirement will only be removed with WS2019, at least in terms of

You can use them, sure. Two- or three-way mirroring is fine. With 3 disks and two-way mirroring, you'd get twice the write performance compared to a three-way mirror. Unlike RAID 1, a two-way mirror can spread data over an uneven number of disks. Of course, a two-way mirror can only cover one disk failure, while a three-way mirror can and will cover two failed disks.

Other than slow write speed, there is no risk whatsoever in using non-PLP SSDs. Writes are only acknowledged once they have been fully written. That's why they're so slow in Storage Spaces.
psannz, you seem to have quite a good grasp on this subject. Do you know of any lists, or any SATA/NVMe SSD models, that are guaranteed to work with S2D? It's difficult to track down compatible drives, as Samsung listing their drives as PLP-capable confuses the matter, and I can't seem to find any lists of what will actually work properly!

Thanks
 

psannz

Member
Jun 15, 2016
psannz, you seem to have quite a good grasp on this subject. Do you know of any lists, or any SATA/NVMe SSD models, that are guaranteed to work with S2D? It's difficult to track down compatible drives, as Samsung listing their drives as PLP-capable confuses the matter, and I can't seem to find any lists of what will actually work properly!

Thanks
Your primary source should be the Windows Server Catalog: Windows Server Catalog

For disk drives supporting S2D you need the criteria Software-Defined Data Center (SDDC) Standard or Software-Defined Data Center (SDDC) Premium. For your purposes, Standard is enough:
Windows Server Catalog

The difference between Standard and Premium:
http://download.microsoft.com/downl...ws_Server_2016_Software-defined_Solutions.pdf

Personally, I like to check geizhals.de, too. They have an English site as well:
Solid State Drives (SSD) with Special features: Power-Loss Protection Skinflint Price Comparison UK

Some of the newer products may not be certified by Microsoft yet, but they still provide PLP and will show up there.

Hope that helps you.