S2D gurus, performance again.

Jeff Robertson · Nov 21, 2018

Hi, I've just rebuilt my 2 node cluster using Server 2019 and have set up S2d. I've been running 2016 since it came out and have always experienced erratic performance, that was using 4x Samsung SSDs (PM863) per node. Performance wasn't bad but I always thought it should be better. So now I have two nodes running 2019 each with 4x Toshiba HK4 SSDs (on the MS SDDC list) and 2x Intel P3700 NVMe SSDs (also on the list). The nodes do tell me the storage pool is power protected so I think the hardware is working as it should. I've created a couple of volumes using these commands:

New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-01 -FileSystem CSVFS_REFS -StorageTierFriendlyName MirrorOnSSD, Capacity -StorageTierSizes 180GB, 2000GB
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-02 -FileSystem CSVFS_REFS -StorageTierFriendlyName MirrorOnSSD, Capacity -StorageTierSizes 180GB, 1500GB

My only real concern is with the small writes, 4k at qd1. It appears to be no better than the old setup (or not much) and strangely noticeably worse than a test I did a few months back using a couple of random desktops with the 4x Toshiba drives by themselves in each node. I have verified that tiering is working, windows admin center shows heavy writes to the P3700s when benchmarking. Here is a quick benchmark that shows what I'm talking about, look at the bottom right number, 4k writes are pretty low:

I fully admit I could be off my rocker even complaining but if anyone has any ideas on how to improve performance i would appreciate it! There are only a couple of test VMs on the cluster so I can blow it out and start over if need be.

A bit more info about each node:
E5-2699v4
256GB RAM
Supermicro X10 board with built in LSI 3008 flashed to IT mode (Toshiba SATA SSDs are running on this)
Connectx3-Pro Nic for coms, verified RDMA is working properly @ 40Gbps for the storage network

zkrr01 · Nov 21, 2018

This is what I get on Microsoft Server 2019 with a Samsung SSD 970 pro

Jeff Robertson · Nov 21, 2018

zkrr01 said:
This is what I get on Microsoft Server 2019 with a Samsung SSD 970 pro

That is much higher than what I'm able to pull off. If I can even get close to half that I would be happy!

PnoT · Nov 22, 2018

I had a 2 node S2D cluster running over IB using RDMA with 4 Samsung PM853T in each. I believe this was a snap when it was first built and while the Seq isn't quite there vs. your cluster the write certain is.

What controllers are you using btw and how did you create your pool?

Jeff Robertson · Nov 22, 2018

PnoT said:
I had a 2 node S2D cluster running over IB using RDMA with 4 Samsung PM853T in each. I believe this was a snap when it was first built and while the Seq isn't quite there vs. your cluster the write certain is.

View attachment 9664

What controllers are you using btw and how did you create your pool?

I'm beginning to think what I'm experiencing is just a limitation of a 2node configuration. The Intel P3700s are NVMe drives and are plugged right into the board. The 4x Toshiba H4k drives are running off of an onboard LSI 3008 controller in IT mode.

I created the pool by running enable-clusters2d. I tried it a few different ways with different flags but nothing made a difference. I've created volumes quite a few different ways but settled on this power shell command: New-Volume -StoragePoolFriendlyName S2D* -FriendlyName CSV-01 -FileSystem CSVFS_REFS -StorageTierFriendlyName MirrorOnSSD, Capacity -StorageTierSizes 180GB, 2000GB. I've created tiered volumes as well as non tiered volumes, no real change, I'm beginning to think the P3700s aren't worth keeping in the systems.

I've disabled all of the P3700s and enabled s2d with just the Toshiba drives, same low performance. I've then disabled the Toshiba drives and run just the P3700s and... exact same performance. I'm pretty much ready to give up trying to figure out this performance thing and just run with what I've got. If you have any thoughts let me know!

Thanks,

Jeff R.

PnoT · Nov 22, 2018

What type of networking are you using between the boxes IB?

Jeff Robertson · Nov 22, 2018

PnoT said:
What type of networking are you using between the boxes IB?

I'm running a straight point to point Ethernet network with rdma enabled (verified working, performance is significantly worse with rdma disabled). Data center bridging is installed and configurat even though the nodes are directly connected via fiber.

PnoT · Nov 23, 2018

Have you turned on Integrity Streams by accident? ReFS integrity streams What about also looking to make sure that the drive cache is enabled on all your disks?

You could setup a RAM drive on each host and throw a large ISO in it and run a copy operation just to make sure nothing is off with your 40Gb network as it should max out close to your RAM speeds and rule out that piece of the puzzle. Just throwing some ideas out there....

Jeff Robertson · Nov 23, 2018

PnoT said:
Have you turned on Integrity Streams by accident? ReFS integrity streams What about also looking to make sure that the drive cache is enabled on all your disks?

You could setup a RAM drive on each host and throw a large ISO in it and run a copy operation just to make sure nothing is off with your 40Gb network as it should max out close to your RAM speeds and rule out that piece of the puzzle. Just throwing some ideas out there....

Are integrity streams turned on by default? My understanding is they aren't. BUT I did use set-fileintegrity to turn it off with the test results being... the same. Which drive cache are you referring to? If you know the command I can use to check it I'll do so!

I've verified I can get close to 40Gbps over the link, I set live migrations to use the same link and was pushing almost 30Gbps moving a single VM back and forth so I don't think there is a bandwidth problem!

Thanks,

Jeff R.

zkrr01 · Nov 23, 2018

What I don't understand is your low write numbers on the CrystalMark tests. The only things that matter on that test would be the cpu and the ssd. Your cpu compared to mine is like comparing the fastest race car in the world to a snail. And the speed of the network has no affect on the test.

Jeff Robertson · Nov 23, 2018

zkrr01 said:
What I don't understand is your low write numbers on the CrystalMark tests. The only things that matter on that test would be the cpu and the ssd. Your cpu compared to mine is like comparing the fastest race car in the world to a snail. And the speed of the network has no affect on the test.

This whole upgrade has been extremely frustrating. I've tried everything I can to speed things up but it just ignores whatever I do and performs exactly the same. I made sure to get parts that are certified on for SDDC premium just to make sure this doesn't happen. My only complaint with my 2016 cluster was the write performance and it seems those issues have followed me to 2019 as well. My gut tells me there is a single switch I need to flip to get the proper performance, finding it has been impossible so far!

zkrr01 · Nov 23, 2018

Is the ssd's you are using under server 2019 the same as you were using under server 2016? If so, could something happen to slow them down?

cesmith9999 · Nov 23, 2018

benchmark all of the disks, S2D is VERY sensitive to disks that are slow.

get-storagereliabilitycounter is your performance friend, make sure that you run reset-storagereliabilitycounter to zero out the counters.

many many times when I saw slow performance. it was due to a disk starting to be bad. and replaced it before it killed the VDISK.

If I had access to GITHUB from work, I would also point you to a diagnostic tool for S2D.

Chris

Jeff Robertson · Nov 23, 2018

zkrr01 said:
Is the ssd's you are using under server 2019 the same as you were using under server 2016? If so, could something happen to slow them down?

They are different. I was using 8x Samsung PM863 drives and now I'm using 8x Toshiba H4k drives plus 4x Intel P3700 drives.

Jeff Robertson · Nov 23, 2018

cesmith9999 said:
benchmark all of the disks, S2D is VERY sensitive to disks that are slow.

get-storagereliabilitycounter is your performance friend, make sure that you run reset-storagereliabilitycounter to zero out the counters.

many many times when I saw slow performance. it was due to a disk starting to be bad. and replaced it before it killed the VDISK.

If I had access to GITHUB from work, I would also point you to a diagnostic tool for S2D.

Chris

I had never heard of the get-storagereliabilitycounter command. I've been running it against the cluster (which I rebuilt from scratch) and this is what I get:

I've run it against all of the drives and they all show similar results, it doesn't *appear* any of them are failing. Any other troubleshooting tips would be welcome! I rebuilt the cluster from scratch again with the same results, the only thing I did different this time was to stick with the built in drivers instead of updating them first, no change of course.

cesmith9999 · Nov 23, 2018

Please run the command again and add

| fl *

There are more counters than what is shown.

Chris. One counter is max latency

Jeff Robertson · Nov 23, 2018

cesmith9999 said:
Please run the command again and add

| fl *

There are more counters than what is shown.

Chris. One counter is max latency

Ok, ran it again with | fl* but the results look the same, not sure what that means:

I did do some more digging and it looks like I have an oddball Toshiba drive, it is a 512/512 vs 4096/4096 like the rest of the drives:

I've heard that can make a difference but I wouldn't expect it to bring performance down to where it's at.

Jeff Robertson · Nov 26, 2018

Did some more experimenting. Rebuilt the cluster again with just the Toshiba drives plugged in, then just the P3700s, then both. Performance is still low as expected. I may have to just run with it if I can't figure it out soon, I've got a couple dozen VMs that I need to load onto it. I did load up windows admin center to get some graphs, I checked each drive individually, the P3700s have an average write latency of about 60u and the Toshibas about 85u. All 12 of them seem to be consistent, no weird spikes or glitches so I think all of the drives are performing up to spec. Here is a graph from one of the drives, they all look similar:

The only concerning thing I found was the volume write latency, it seems to fluctuate between 600u and 800u:

Maybe this is normal? Seems high to me and may indicate more of a network issue than a drive issue?

I'm starting to run out of ideas to try, I'm at spot where I can completely wipe the whole thing so if you have any ideas, even oddball ones this is a good time to try!

Jeff Robertson · Nov 26, 2018

So I threw some old emulex 10Gbps cards in hoping they supported RDMA, they didn't. But I did test again while one node was down and managed more than double the 4K writes, to between 22-25.

Rand__ · Nov 26, 2018

While one node was down?
And half the speed when it needs to sync over network then?

S2D gurus, performance again.

Active Member

Member

Attachments

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Member

Active Member

Member

Well-Known Member

Active Member

Active Member

Well-Known Member

Active Member

Active Member

Active Member

Well-Known Member