S2D gurus, performance again.


Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
Hi, I've just rebuilt my 2-node cluster using Server 2019 and have set up S2D. I've been running 2016 since it came out and have always experienced erratic performance; that was with 4x Samsung PM863 SSDs per node. Performance wasn't bad, but I always thought it should be better. Now I have two nodes running 2019, each with 4x Toshiba HK4 SSDs (on the MS SDDC list) and 2x Intel P3700 NVMe SSDs (also on the list). The nodes do tell me the storage pool is power protected, so I think the hardware is working as it should. I've created a couple of volumes using these commands:

New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-01 -FileSystem CSVFS_REFS -StorageTierFriendlyNames MirrorOnSSD, Capacity -StorageTierSizes 180GB, 2000GB
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-02 -FileSystem CSVFS_REFS -StorageTierFriendlyNames MirrorOnSSD, Capacity -StorageTierSizes 180GB, 1500GB
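
For what it's worth, this is roughly how I double-check what got created and the power-protection/cache status afterwards (Get-StorageAdvancedProperty may need a current Storage module, and the pool name is just mine):

# pool health and whether it reports power protection
Get-StoragePool S2D* | Select-Object FriendlyName, HealthStatus, IsPowerProtected
# tier definitions the New-Volume commands reference
Get-StorageTier | Select-Object FriendlyName, MediaType, ResiliencySettingName
# per-disk device cache / power protection
Get-PhysicalDisk | Get-StorageAdvancedProperty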

My only real concern is with the small writes, 4K at QD1. It appears to be no better than the old setup (or not much), and strangely it's noticeably worse than a test I did a few months back using a couple of random desktops with just the 4x Toshiba drives in each node. I have verified that tiering is working; Windows Admin Center shows heavy writes to the P3700s when benchmarking. Here is a quick benchmark that shows what I'm talking about; look at the bottom-right number, the 4K writes are pretty low:

a.JPG

I fully admit I could be off my rocker even complaining, but if anyone has any ideas on how to improve performance I would appreciate it! There are only a couple of test VMs on the cluster, so I can blow it out and start over if need be.

A bit more info about each node:
E5-2699v4
256GB RAM
Supermicro X10 board with built-in LSI 3008 flashed to IT mode (the Toshiba SATA SSDs run on this)
ConnectX-3 Pro NIC for comms; verified RDMA is working properly @ 40Gbps for the storage network
 

PnoT

Active Member
Mar 1, 2015
Texas
I had a 2-node S2D cluster running over IB using RDMA with 4x Samsung PM853T in each. I believe this screenshot is from when it was first built, and while the Seq isn't quite there vs. your cluster, the write certainly is.


upload_2018-11-22_15-9-25.png

What controllers are you using btw and how did you create your pool?
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
I had a 2-node S2D cluster running over IB using RDMA with 4x Samsung PM853T in each. I believe this screenshot is from when it was first built, and while the Seq isn't quite there vs. your cluster, the write certainly is.


View attachment 9664

What controllers are you using btw and how did you create your pool?
I'm beginning to think what I'm experiencing is just a limitation of a 2-node configuration. The Intel P3700s are NVMe drives and are plugged right into the board. The 4x Toshiba HK4 drives are running off an onboard LSI 3008 controller in IT mode.

I created the pool by running Enable-ClusterS2D. I tried it a few different ways with different flags but nothing made a difference. I've created volumes quite a few different ways but settled on this PowerShell command: New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-01 -FileSystem CSVFS_REFS -StorageTierFriendlyNames MirrorOnSSD, Capacity -StorageTierSizes 180GB, 2000GB. I've created tiered volumes as well as non-tiered volumes with no real change; I'm beginning to think the P3700s aren't worth keeping in the systems.
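
One thing I keep checking after each rebuild is whether the P3700s actually got claimed as cache devices; as far as I understand the cache behaviour they should show Usage = Journal, roughly like this:

# cache drives should show Usage = Journal, capacity drives Auto-Select
Get-PhysicalDisk | Sort-Object Usage |
    Format-Table FriendlyName, MediaType, BusType, Usage, Size -AutoSize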

I've disabled all of the P3700s and enabled S2D with just the Toshiba drives: same low performance. I've then disabled the Toshiba drives and run just the P3700s and... exact same performance. I'm pretty much ready to give up trying to figure out this performance thing and just run with what I've got. If you have any thoughts let me know!

Thanks,

Jeff R.
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
What type of networking are you using between the boxes, IB?
I'm running a straight point-to-point Ethernet network with RDMA enabled (verified working; performance is significantly worse with RDMA disabled). Data Center Bridging is installed and configured even though the nodes are directly connected via fiber.
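
For anyone following along, this is roughly how I verified RDMA on each node, alongside watching the RDMA Activity performance counters during a benchmark:

# RDMA enabled on the storage NICs
Get-NetAdapterRdma | Where-Object Enabled
# SMB should see the interfaces as RDMA-capable and actually use them
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable
Get-SmbMultichannelConnection | Format-Table -AutoSize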
 

PnoT

Active Member
Mar 1, 2015
650
162
43
Texas
Have you turned on integrity streams by accident? (See ReFS integrity streams.) What about also making sure that the drive cache is enabled on all your disks?

You could set up a RAM drive on each host, throw a large ISO in it, and run a copy operation just to make sure nothing is off with your 40Gb network; it should max out close to your RAM speeds and rule out that piece of the puzzle. Just throwing some ideas out there...
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
Have you turned on integrity streams by accident? (See ReFS integrity streams.) What about also making sure that the drive cache is enabled on all your disks?

You could set up a RAM drive on each host, throw a large ISO in it, and run a copy operation just to make sure nothing is off with your 40Gb network; it should max out close to your RAM speeds and rule out that piece of the puzzle. Just throwing some ideas out there...
Are integrity streams turned on by default? My understanding is they aren't. BUT I did use Set-FileIntegrity to turn them off, with the test results being... the same. Which drive cache are you referring to? If you know the command I can use to check it I'll do so!
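
For reference, this is roughly what I ran to check and then disable integrity streams; the VHDX path is just an example:

# check whether integrity streams are enabled on a file
Get-FileIntegrity -FileName 'C:\ClusterStorage\CSV-01\VMs\test.vhdx'
# turn them off for that file
Set-FileIntegrity -FileName 'C:\ClusterStorage\CSV-01\VMs\test.vhdx' -Enable $false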

I've verified I can get close to 40Gbps over the link; I set live migrations to use the same link and was pushing almost 30Gbps moving a single VM back and forth, so I don't think there is a bandwidth problem!

Thanks,

Jeff R.
 

zkrr01

Member
Jun 28, 2018
What I don't understand is your low write numbers on the CrystalDiskMark tests. The only things that matter on that test would be the CPU and the SSD. Your CPU compared to mine is like comparing the fastest race car in the world to a snail. And the speed of the network has no effect on the test.
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
What I don't understand is your low write numbers on the CrystalDiskMark tests. The only things that matter on that test would be the CPU and the SSD. Your CPU compared to mine is like comparing the fastest race car in the world to a snail. And the speed of the network has no effect on the test.
This whole upgrade has been extremely frustrating. I've tried everything I can to speed things up, but it just ignores whatever I do and performs exactly the same. I made sure to get parts that are certified for SDDC Premium just to make sure this didn't happen. My only complaint with my 2016 cluster was the write performance, and it seems those issues have followed me to 2019 as well. My gut tells me there is a single switch I need to flip to get the proper performance; finding it has been impossible so far!
 

zkrr01

Member
Jun 28, 2018
Are the SSDs you are using under Server 2019 the same ones you were using under Server 2016? If so, could something have happened to slow them down?
 

cesmith9999

Well-Known Member
Mar 26, 2013
Benchmark all of the disks; S2D is VERY sensitive to disks that are slow.
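
I usually do that kind of per-disk benchmark with DiskSpd; something along these lines, where the #2 target is the physical drive number from Get-Disk, -w0 keeps it read-only, and the CSV path is just an example:

# read-only 4K QD1 test against a single physical drive
diskspd.exe -b4K -d30 -t1 -o1 -r -w0 -Sh -L #2
# 4K QD1 100% write test against a 10GB test file on the CSV
diskspd.exe -c10G -b4K -d60 -t1 -o1 -r -w100 -Sh -L C:\ClusterStorage\CSV-01\diskspd.dat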

Get-StorageReliabilityCounter is your performance friend; make sure that you run Reset-StorageReliabilityCounter to zero out the counters.
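
Roughly like this; sorting on max write latency is just one way to surface a laggard:

# pull the counters for every disk and surface the worst write latencies
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Sort-Object WriteLatencyMax -Descending |
    Format-Table DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal, ReadLatencyMax, WriteLatencyMax
# zero the counters so the next run only reflects new activity
Get-PhysicalDisk | Reset-StorageReliabilityCounter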

Many, many times when I saw slow performance it was due to a disk starting to go bad, and I replaced it before it killed the vdisk.

If I had access to GitHub from work, I would also point you to a diagnostic tool for S2D.

Chris
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
Are the SSDs you are using under Server 2019 the same ones you were using under Server 2016? If so, could something have happened to slow them down?
They are different. I was using 8x Samsung PM863 drives and now I'm using 8x Toshiba HK4 drives plus 4x Intel P3700 drives.
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
Benchmark all of the disks; S2D is VERY sensitive to disks that are slow.

Get-StorageReliabilityCounter is your performance friend; make sure that you run Reset-StorageReliabilityCounter to zero out the counters.

Many, many times when I saw slow performance it was due to a disk starting to go bad, and I replaced it before it killed the vdisk.

If I had access to GitHub from work, I would also point you to a diagnostic tool for S2D.

Chris
I had never heard of the Get-StorageReliabilityCounter command. I've been running it against the cluster (which I rebuilt from scratch) and this is what I get:

a.JPG

I've run it against all of the drives and they all show similar results; it doesn't *appear* any of them are failing. Any other troubleshooting tips would be welcome! I rebuilt the cluster from scratch again with the same results; the only thing I did differently this time was to stick with the built-in drivers instead of updating them first. No change, of course.
 

cesmith9999

Well-Known Member
Mar 26, 2013
Please run the command again and add

| fl *

There are more counters than what is shown; one of them is max latency.
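
i.e. roughly:

Get-PhysicalDisk | Get-StorageReliabilityCounter | Format-List *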

Chris
 

Jeff Robertson

Active Member
Oct 18, 2016
Chico, CA
Please run the command again and add

| fl *

There are more counters than what is shown; one of them is max latency.

Chris
OK, ran it again with | fl * but the results look the same; not sure what that means:

c.JPG

I did do some more digging and it looks like I have an oddball Toshiba drive; it is 512/512 (logical/physical sector size) vs. 4096/4096 like the rest of the drives:

b.JPG

I've heard that can make a difference but I wouldn't expect it to bring performance down to where it's at.
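
In case anyone wants to check theirs, the sector sizes show up with something like this:

# logical/physical sector size per disk; the oddball reports 512/512
Get-PhysicalDisk | Sort-Object FriendlyName |
    Format-Table FriendlyName, SerialNumber, LogicalSectorSize, PhysicalSectorSize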
 

Jeff Robertson

Active Member
Oct 18, 2016
429
115
43
Chico, CA
Did some more experimenting. Rebuilt the cluster again with just the Toshiba drives plugged in, then just the P3700s, then both. Performance is still low, as expected. I may have to just run with it if I can't figure it out soon; I've got a couple dozen VMs that I need to load onto it. I did load up Windows Admin Center to get some graphs and checked each drive individually: the P3700s have an average write latency of about 60µs and the Toshibas about 85µs. All 12 of them seem to be consistent, no weird spikes or glitches, so I think all of the drives are performing up to spec. Here is a graph from one of the drives; they all look similar:
Capture3.JPG

The only concerning thing I found was the volume write latency; it seems to fluctuate between 600µs and 800µs:
Capture2.JPG

Maybe this is normal? It seems high to me and may indicate more of a network issue than a drive issue.

I'm starting to run out of ideas to try. I'm at a spot where I can completely wipe the whole thing, so if you have any ideas, even oddball ones, this is a good time to try!
 

Jeff Robertson

Active Member
Oct 18, 2016
429
115
43
Chico, CA
So I threw some old Emulex 10Gbps cards in, hoping they supported RDMA; they didn't. But I did test again while one node was down and managed more than double the 4K writes, to between 22 and 25.

Capture4.JPG
 

Rand__

Well-Known Member
Mar 6, 2014
While one node was down?
And half the speed when it needs to sync over the network, then?