S2D slow write (no parity & NVMe cache)

Discussion in 'Windows Server, Hyper-V Virtualization' started by Cutha, May 22, 2018.

  1. Cutha

    Cutha Member

    Joined:
    Sep 24, 2016
    Messages:
    52
    Likes Received:
    2
    With S2D I am getting terrible write performance no matter what I try. This is not a post about slow writes caused by a parity setup; everything is configured as mirror. It seems like the cache devices are never written to. At first I thought I needed to use SOFS, but I am using this in a hyper-converged setup, and going by Deploy Storage Spaces Direct (Step 4) it is not required. I tried SOFS and General File Server as cluster roles with storage attached in various ways, but the results were always pretty poor.

    To keep it simple, there is one VM allocated with 8 vCPUs and a drive (fixed VHDX) to benchmark against. Currently I have no file server roles set up (SOFS or General) and I am just using C:\ClusterStorage\. I have tried it with SOFS and General, but from what I can tell I shouldn't need either for the setup I am using. At some point I would like to have a share for client workstation VHDXs with the quickest configuration possible; CA is not required but HA would be nice.

    RDMA is tested and working and drives perform as expected on their own. I am waiting until I get this sorted out before I deploy this so at the moment I can do whatever configuring and testing is required. I am stumped and disappointed with myself. I have spent 2 years gathering the hardware to make this work and for the life of me I can't get it to perform. Any help would be appreciated.

    I have 4 nodes and each node has:
    2xNVMe 250GB (SM961; does NOT have PLP, I was incorrect when I first wrote that it does)
    3xSSD 480GB (MZ7WD480, has PLP)
    4xHDD 4TB (consumer grade SATA)
    ConnectX-2 dual port
    2xE5-2670v1
    96GB RAM
    Windows Server 2016

    I was a little confused as to which performance counters I should be looking at to see if the cache was being used but from everything I could see, it was not being used during any of my tests. I have read everything I could find online that I think might help me and I don't know what to try next.
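
    One way to check whether the cache drives are bound and seeing traffic is sketched below. This is a minimal sketch, not a definitive procedure. It assumes it is run on a cluster node, that S2D marks its cache devices with Usage = Journal (as in the Get-PhysicalDisk listing in this post), and that the cache-related counter sets share the "Cluster Storage" name prefix. Exact counter names are not hard-coded; the sketch only lists what is registered on the node.

```powershell
# List the drives S2D has claimed as cache devices (Usage = Journal).
Get-PhysicalDisk |
    Where-Object Usage -eq 'Journal' |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, SerialNumber, MediaType, Usage, Size -AutoSize

# Enumerate cache-related performance counter sets on this node.
# Assumption: their names start with "Cluster Storage". With the error
# suppressed, nothing is returned if no set matches.
$sets = Get-Counter -ListSet 'Cluster Storage*' -ErrorAction SilentlyContinue
$sets | Select-Object CounterSetName, Description

# Show the individual counter paths so suitable ones can be picked and
# sampled with Get-Counter -Counter <path> while a benchmark is running.
$sets.Paths
```

    If no drive shows up as Journal, or the counters stay flat during a write benchmark, that would point to the cache not being used.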

    Benchmarks on the client with the client VHDX on different volumes. The info on the volumes is below and they are in the same order.
    [IMG]

    Code:
    New-Volume -FriendlyName "CSV-1" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 100GB
    
    New-Volume -FriendlyName "Test0SSDOnly" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 10GB -ResiliencySettingName Mirror
    
    New-Volume -FriendlyName "Test2MRV" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 50GB, 50GB
    
    New-Volume -FriendlyName "Test3Tiered" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -StorageTierFriendlyNames Performance, CapacityMirror -StorageTierSizes 50GB, 50GB
    Individual drives I tested some time ago:

    1xNVMe
    [IMG]

    1xSATA SSD
    [IMG]


    Code:
    Get-ClusterS2D
    
    CacheMetadataReserveBytes : 34359738368
    CacheModeHDD              : ReadWrite
    CacheModeSSD              : WriteOnly
    CachePageSizeKBytes       : 16
    CacheState                : Enabled
    State                     : Enabled
    Code:
    Get-ClusterStorageSpacesDirect
    
    CacheMetadataReserveBytes : 34359738368
    CacheModeHDD              : ReadWrite
    CacheModeSSD              : WriteOnly
    CachePageSizeKBytes       : 16
    CacheState                : Enabled
    State                     : Enabled
    Code:
    Get-StorageTier | FT FriendlyName, ResiliencySettingName, MediaType, PhysicalDiskRedundancy -autosize
    
    FriendlyName         ResiliencySettingName MediaType PhysicalDiskRedundancy
    ------------         --------------------- --------- ----------------------
    CapacityMirror       Mirror                HDD                            2
    Capacity             Parity                HDD                            2
    Performance          Mirror                SSD                            2
    
    I wanted to use the HDDs in a 3-way mirror, so I created the CapacityMirror tier with:
    Code:
    $hddTierMirror = New-StorageTier -StoragePoolFriendlyName "S2D on theCluster" -MediaType HDD -FriendlyName "CapacityMirror" -ResiliencySettingName Mirror -NumberOfDataCopies 3
    I set the CSV cache with:
    Code:
    (Get-Cluster $ClusterName).BlockCacheSize = $CSVCacheSize
    Code:
    Get-StorageSubSystem *cluster* | Get-StorageHealthReport
    CPUUsageAverage                 :   2.52 %
    CapacityPhysicalPooledAvailable :  45.26 TB
    CapacityPhysicalPooledTotal     :  65.32 TB
    CapacityPhysicalTotal           :  65.32 TB
    CapacityPhysicalUnpooled        :      0 B
    CapacityVolumesAvailable        :   5.83 TB
    CapacityVolumesTotal            :   5.91 TB
    IOLatencyAverage                :      0 ns
    IOLatencyRead                   :      0 ns
    IOLatencyWrite                  :      0 ns
    IOPSRead                        :      0 /S
    IOPSTotal                       :      0 /S
    IOPSWrite                       :      0 /S
    IOThroughputRead                :      0 B/S
    IOThroughputTotal               :      0 B/S
    IOThroughputWrite               :      0 B/S
    MemoryAvailable                 : 377.19 GB
    MemoryTotal                     :    416 GB
    
    ExtendedStatus :
    ReturnValue    : 0
    PSComputerName :
    
    Code:
    Get-PhysicalDisk
    
    FriendlyName               SerialNumber         CanPool OperationalStatus HealthStatus Usage            Size
    ------------               ------------         ------- ----------------- ------------ -----            ----
    ATA INTEL SSDSA2BW16       CVPR1290041L160DGN   False   OK                Healthy      Auto-Select 149.05 GB
    ATA SAMSUNG MZ7WD480       S1G1NYAF926440       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S16MNYAD911565       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S16MNYAF605357       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S16MNYAD911436       False   OK                Healthy      Auto-Select    447 GB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E84F. False   OK                Healthy      Journal     238.25 GB
    ATA ST4000DM005-2DP1       WDH1L3DG             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E88B. False   OK                Healthy      Journal     238.25 GB
    ATA ST4000DM005-2DP1       ZDH0L26D             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CB_6100_B550. False   OK                Healthy      Journal     238.25 GB
    ATA SAMSUNG MZ7WD480       S16MNEAD740753       False   OK                Healthy      Auto-Select    447 GB
    ATA ST4000DM005-2DP1       ZDH0X9QM             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E846. False   OK                Healthy      Journal     238.25 GB
    ATA ST4000VN000-1H41       Z301MZSZ             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CB_6100_BFA2. False   OK                Healthy      Journal     238.25 GB
    ATA SAMSUNG MZ7WD480       S16MNEAD500108       False   OK                Healthy      Auto-Select    447 GB
    ATA ST4000DM005-2DP1       ZDH0BSBV             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000VN000-1H41       Z306C6GF             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E8EF. False   OK                Healthy      Journal     238.25 GB
    ATA ST4000VN000-2AH1       WDH0KMY3             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E84A. False   OK                Healthy      Journal     238.25 GB
    ATA ST4000VN000-1H41       Z3060E2J             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000DM005-2DP1       WDH1L39R             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000VN000-2AH1       WDH0KN75             False   OK                Healthy      Auto-Select   3.64 TB
    ATA SAMSUNG MZ7WD480       S16MNYAF604909       False   OK                Healthy      Auto-Select    447 GB
    ATA ST4000VN000-2AH1       WDH0KNEM             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000VN000-1H41       Z301NEKY             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000VN000-1H41       Z304XGDB             False   OK                Healthy      Auto-Select   3.64 TB
    ATA ST4000DM005-2DP1       ZDH0X90N             False   OK                Healthy      Auto-Select   3.64 TB
    SAMSUNG MZVPW256HEGL-00000 0025_38CC_6100_E8C7. False   OK                Healthy      Journal     238.25 GB
    ATA SAMSUNG MZ7WD480       S1G1NYAF921789       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S1G1NYAF925500       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S16MNYAF114484       False   OK                Healthy      Auto-Select    447 GB
    ATA ST4000DM005-2DP1       ZDH0X9SJ             False   OK                Healthy      Auto-Select   3.64 TB
    ATA SAMSUNG MZ7WD480       S16MNYAF334083       False   OK                Healthy      Auto-Select    447 GB
    ATA SAMSUNG MZ7WD480       S1G1NYAF921332       False   OK                Healthy      Auto-Select    447 GB
    ATA ST4000VN000-2AH1       WDH0KNE5             False   OK                Healthy      Auto-Select   3.64 TB
    
     
    #1
    Last edited: Jun 11, 2018
  2. psannz

    psannz Member

    Joined:
    Jun 15, 2016
    Messages:
    38
    Likes Received:
    8
    You do realise that the SM961 does NOT have PLP, right? You might be mixing it up with the SM963, which does have PLP...
    That's reason #1 for bad write performance.

    The next NVMe point is the driver installation:
    Check in Device Manager whether the Samsung NVMe drivers are installed, or just the "Standard NVM Express Controller" drivers. The latter have crappy performance.

    Also: Are you running those ConnectX-2 cards in InfiniBand mode?
    AFAIK, full RoCEv2+ and SMB Direct are only supported from the ConnectX-3 on, in Ethernet mode.
     
    #2
    Last edited: May 22, 2018
  3. cesmith9999

    cesmith9999 Well-Known Member

    Joined:
    Mar 26, 2013
    Messages:
    1,028
    Likes Received:
    313
  4. Cutha

    Cutha Member

    Thanks for the replies guys, I appreciate it.

    Well that sucks. I have had it in my head that the SM961 had PLP, so when I was going over everything again and saw that as a requirement I didn't even give it a second thought. Is there a PowerShell command that would show that they don't have PLP, or some other command that would show that they aren't being used as cache devices?

    Stupid thing is, I posted about a year ago looking for NVMe drives for this build and clearly I knew they didn't have PLP. Arg.
    Help selecting NVMe drives to finish build

    Do you know if the SM953s would work OK for S2D? They seem to be available and would be the cheapest fix until I can beef it up. I think I recently read somewhere that S2D now only requires one caching device per node. If that is true, the cost to get this up and running will not be too terrible. The Intel P3700s seem to be available but they are much more costly than the SM953s. I was excited about the Optane drives, but the 900P series does not have PLP.

    The scale of the slow writes is well beyond what bad drivers would explain. I have done this setup a few times before; previously I made sure to get the Samsung drivers installed, and I even saved the benchmarks from before and after the driver change. I forgot this time. I tried to get the drivers updated today but Windows is resisting me. As the drives don't have PLP anyway, I will be pulling them from the servers, so I have stopped trying to update the drivers.


    Thanks for the guides Chris. I will be saving that one, very nice step by step directions.

    I am pretty sure RoCE is working because I can transfer files quickly between the cluster nodes and the IB NICs don't show the traffic. When I built out my 2-node cluster as a proof of concept I went through the steps of getting RoCE configured, but this time I forgot. I have now gone through the steps in the guide cesmith9999 provided and received an error when trying to apply the SMB QoS; when I looked up the error, it was because the ConnectX-2 cards don't support it. RDMA works, and I will work on upgrading the IB NICs after I get the NVMe drives sorted out.

    One of the ports on the IB NIC is on a subnet that is used only for cluster storage. If RDMA is working but QoS is not set up, will this be a problem in my 4-node setup? I found a TechNet blog post, "To RDMA or not to RDMA", that makes me think it is not required.
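
    For checking that SMB actually sees the adapters as RDMA capable, something like the following could be run on each node. It is only a sketch using the standard NetAdapter and SmbShare cmdlets; it shows whether RDMA is enabled and recognised, and says nothing about whether QoS is needed.

```powershell
# Adapters with RDMA enabled at the NIC level.
Get-NetAdapterRdma |
    Where-Object Enabled |
    Format-Table Name, InterfaceDescription, Enabled -AutoSize

# What the SMB client believes about each local interface.
Get-SmbClientNetworkInterface |
    Format-Table FriendlyName, RdmaCapable, RssCapable, Speed -AutoSize

# Active SMB multichannel connections. Run this while copying a file
# between two nodes so that there is a live connection to inspect.
Get-SmbMultichannelConnection | Format-List *
```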

    My IB switch is a VLT-30111-GRID DIRECTOR 4036 that has the firmware updated based on the guide on this forum.

    I am going to remove the NVMe drives from the pool and see what happens.
     
    #4
  5. psannz

    psannz Member

    #5
  6. Cutha

    Cutha Member

    Great web site, thanks.

    I removed the NVMe drives from the pool and tried a couple of quick tests with the SSDs and HDDs, but the results were not great. I then tried changing the 12 SSDs to Journal drives and so far the results are still about the same: ~3500 MB/s read and < 100 MB/s write.
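
    For reference, retagging drives as Journal could look roughly like the sketch below. The model filter is a hypothetical example based on the SSDs in this thread, and the line that changes anything is left commented out. S2D normally picks cache devices on its own, so treat this as an illustration of the manual step, not a recommended procedure.

```powershell
# Hypothetical model filter; adjust to the drives that should act as cache.
$cacheCandidates = Get-PhysicalDisk |
    Where-Object FriendlyName -like '*MZ7WD480*'

# Review the current role of each candidate before changing anything.
$cacheCandidates |
    Format-Table FriendlyName, SerialNumber, MediaType, Usage -AutoSize

# Uncomment to retag the drives as cache (Journal) devices.
# $cacheCandidates | Set-PhysicalDisk -Usage Journal

# Summarise how the drives are split across roles.
Get-PhysicalDisk | Group-Object Usage | Select-Object Name, Count
```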
     
    #6
  7. Cutha

    Cutha Member

    If I use this:
    Code:
    Set-ClusterS2D -CacheModeSSD ReadOnly
    Will it allow the physical cache on the SATA SSDs (MZ7WD480HCGM-00003, 99% sure they have PLP) to be utilized, bypassing the NVMe drives for writes?
     
    #7
    Last edited: May 26, 2018
  8. sth

    sth Active Member

    Joined:
    Oct 29, 2015
    Messages:
    203
    Likes Received:
    32
    #8
  9. psannz

    psannz Member

    #9
  10. sth

    sth Active Member

    That's an interesting and useful clarification, thank you.

    The white paper you linked says: "For the SM953, the SFF-8639 form factor SSD supports the hot plug function; multiple tantalum capacitors ensure stable data integrity even when the system is in sudden power off recovery (SPOR) state. Currently, however, the M.2 form factor SSD does not support this function."

    Can you explain the difference between true enterprise-level end-to-end data protection, in devices like the P3700, and this 'data at rest' protection? I'm interested in fully understanding the implications of these differences under ZFS and Storage Spaces.
     
    #10
  11. psannz

    psannz Member

    I cannot speak for ZFS, but as far as Windows / Storage Spaces goes, it comes down to whether Microsoft sees the SSD as fully protected (end-to-end) from any and all power events or not. If it is, the flag IsPowerProtected is set to $true in the Get-StorageAdvancedProperty query of the disk, which is in turn checked by Storage Spaces.
    The problem with IsPowerProtected is that it's a read-only flag (MSFT_PhysicalDisk class (Windows)).
    If Windows does not see end-to-end PLP, data integrity is no longer guaranteed, and thus any write must be fully written before it is acknowledged, which costs time and thus write performance.

    The "data at rest" PLP just means that a sudden power outage will not corrupt any data that is already written. That's very nice to have in a client system if e.g. your power company or your building's power distribution is unreliable.
    However, from an enterprise point of view where data integrity counts... basically, it's proper PLP or bust.
    And in my opinion MS is correct in not allowing the IsPowerProtected physical disk attribute to be changed.
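
    To see that flag for every drive across the cluster in one go, a sketch along these lines may help. It assumes PowerShell remoting is enabled between the nodes and that the FailoverClusters module is available where it is run. Drives attached to other nodes may produce warnings or blank values when queried from a node that cannot reach them directly.

```powershell
# Assumption: run from a cluster node with PowerShell remoting enabled.
$nodes = (Get-ClusterNode).Name

$report = Invoke-Command -ComputerName $nodes -ScriptBlock {
    # Each node queries the advanced properties of the drives it can see.
    Get-PhysicalDisk | Get-StorageAdvancedProperty
}

# PSComputerName is added by Invoke-Command and identifies the reporting node.
$report |
    Sort-Object PSComputerName, FriendlyName |
    Format-Table PSComputerName, FriendlyName, SerialNumber,
                 IsPowerProtected, IsDeviceCacheEnabled -AutoSize
```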
     
    #11
  12. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,428
    Likes Received:
    327
    #12
  13. psannz

    psannz Member

    Different Parameter. You can set the StoragePool PowerProtection to $true, but the disks themselves are still recognized as $false.
    While setting the StoragePool to $true will increase the write speed, crashes and power failures will most certainly mess up your data.
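
    A minimal sketch of inspecting the pool-level setting is below. The 'S2D*' wildcard matches the pool name used earlier in this thread; the line that changes the setting is left commented out because of the data-loss risk described above.

```powershell
# Pool-level flag (writable), as opposed to the per-disk read-only property.
Get-StoragePool -FriendlyName 'S2D*' |
    Format-Table FriendlyName, IsPowerProtected, HealthStatus -AutoSize

# Uncomment only on a test system: this tells Storage Spaces to treat the
# whole pool as power protected, so a crash or power failure can corrupt data.
# Get-StoragePool -FriendlyName 'S2D*' | Set-StoragePool -IsPowerProtected $true
```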
     
    #13
  14. Cutha

    Cutha Member

    Thanks!

    Using Get-StorageAdvancedProperty should make it very clear but it has returned some confusing results.

    I have 12 of the same SSDs, but only 1 of the 4 nodes shows IsPowerProtected as True for all 3 of its SSDs, and 1 other node shows it as True for only 1 of its 3 SSDs.

    Node01
    Code:
    Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
    FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
    ------------         ------------   ---------------- --------------------
    ATA SAMSUNG MZ7WD480 S1G1NYAF926440             True                False
    ATA SAMSUNG MZ7WD480 S16MNEAD500108            False                False
    ATA SAMSUNG MZ7WD480 S16MNYAF334083            False                False
    Node02
    Code:
    Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
    FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
    ------------         ------------   ---------------- --------------------
    ATA SAMSUNG MZ7WD480 S16MNYAD911565            False                False
    ATA SAMSUNG MZ7WD480 S16MNEAD740753            False                False
    ATA SAMSUNG MZ7WD480 S1G1NYAF921332            False                False
    Node03
    Code:
    Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
    FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
    ------------         ------------   ---------------- --------------------
    ATA SAMSUNG MZ7WD480 S1G1NYAF921789            False                False
    ATA SAMSUNG MZ7WD480 S1G1NYAF925500            False                False
    ATA SAMSUNG MZ7WD480 S16MNYAF114484            False                False
    Node04
    Code:
    Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
    
    FriendlyName         SerialNumber   IsPowerProtected IsDeviceCacheEnabled
    ------------         ------------   ---------------- --------------------
    ATA SAMSUNG MZ7WD480 S16MNYAF605357             True                False
    ATA SAMSUNG MZ7WD480 S16MNYAD911436             True                False
    ATA SAMSUNG MZ7WD480 S16MNYAF604909             True                False
    Node04's backplane is dead, so I removed it and scabbed in cables to connect the drives until I get a new backplane. As the SSDs directly attached to the SAS HBA seemed to show up as properly power protected, I thought the backplane might have something to do with it. I have since directly connected all the SSDs to the HBA, but that didn't seem to change anything.

    Any ideas on what's going on?
     
    #14
  15. Cutha

    Cutha Member

    I think I got it. These benchmarks are from within a VM on the cluster. SSD's as cache, no NVMe drives.

    Mirror:
    [IMG]

    Mirror:
    [IMG]

    Mirror (QD8):
    [IMG]

    Parity:
    [IMG]
     
    #15
  16. psannz

    psannz Member

    That's really weird... Does your backplane support SES? If not, that could explain things.
    Which Windows Server version are you trying? 2016 or 2019TP?

    Also, if you want support for non-PLP SSDs in S2D, there's a request open on the Windows Server UserVoice forums:
    Storage Spaces allow option for volatile cache (aka consumer SSDs)
     
    #16
  17. Cutha

    Cutha Member

    The backplanes are Supermicro BPN-SAS2-826EL1. I am not sure if they support SES; the PDF user guide from Supermicro doesn't mention SES.

    I am using Server 2016.

    What is the risk with non-PLP SSDs while using 3-way mirroring?
     
    #17
  18. psannz

    psannz Member

    That backplane has SES 2.0, so you should be fine. The SES requirement will only be removed with WS2019, at least in terms of

    You can use them, sure. 2-way or 3-way mirroring is fine. With 3 disks and 2-way mirroring, you'd get twice the write performance compared to a 3-way mirror. Unlike RAID 1, a 2-way mirror can spread data over an odd number of disks. Of course, a 2-way mirror can only cover 1 disk failure, while a 3-way mirror can and will cover 2 failed disks.

    Other than slow write speed, there is no risk whatsoever in using non-PLP SSDs. Writes are only acknowledged once they have been fully written. That's why they're so slow in Storage Spaces.
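
    The capacity side of that trade-off can be worked out with a little arithmetic. The sketch below uses the 16 HDDs of 3.64 TB from the Get-PhysicalDisk listing in the first post and assumes usable space is simply raw capacity divided by the number of data copies; pool reserve and metadata overhead are ignored, so real numbers will be somewhat lower.

```powershell
# Capacity drives from the Get-PhysicalDisk listing: 16 HDDs of 3.64 TB each.
$hddCount  = 16
$hddSizeTB = 3.64
$rawTB     = $hddCount * $hddSizeTB

# A mirror keeps N full copies of the data, so usable space is roughly raw / N
# and the number of tolerated disk failures is N - 1.
foreach ($copies in 2, 3) {
    [pscustomobject]@{
        Layout          = "$copies-way mirror"
        DataCopies      = $copies
        ToleratedFaults = $copies - 1
        UsableTB        = [math]::Round($rawTB / $copies, 2)
    }
}
```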
     
    #18
  19. Jeff Robertson

    Jeff Robertson Active Member

    Joined:
    Oct 18, 2016
    Messages:
    245
    Likes Received:
    51
    psannz, you seem to have quite a good grasp on this subject. Do you know of any lists or any SATA/NVMe SSD models that are guaranteed to work with S2D? It's difficult to track down compatible drives as Samsung listing their drives as PLP capable confuses the matter and I can't seem to find any lists of what will actually work properly!

    Thanks
     
    #19
  20. psannz

    psannz Member

    Your primary source should be the Windows Server Catalog: Windows Server Catalog

    For disk drives supporting S2D you need the criteria Software-Defined Data Center (SDDC) Standard or Software-Defined Data Center (SDDC) Premium. For your purposes Standard is enough:
    Windows Server Catalog

    The difference between Standard and Premium:
    http://download.microsoft.com/downl...ws_Server_2016_Software-defined_Solutions.pdf

    Personally, I like to check geizhals.de, too. They have an English site as well:
    Solid State Drives (SSD) with Special features: Power-Loss Protection Skinflint Price Comparison UK

    Some of the newer products may not be certified by Microsoft yet, but still provide PLP and are listed there that way.

    Hope that helps you.
     
    #20