I have a 6-node cluster of servers that are all configured nearly identically. I've installed ScaleIO 2.0, and performance is well below what I'd expect from an all-flash storage pool.
Each node is basically an X9DRD-7LN4F-JBOD board with an LSI 2308 in IT mode, two E5-2670 CPUs, and 64GB of RAM.
The primary circuit for the SAN is a 10-gigabit network on a Quanta LB6M switch. Each node has a dual-port ConnectX-3 EN card. I've configured jumbo frames on both the switch and the NICs and confirmed them with don't-fragment pings to the appropriate IPs.
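For what it's worth, here's the arithmetic behind the jumbo-frame ping check (the header sizes are the standard IPv4/ICMP ones; the exact ping flags shown in the comments are the Windows ones):

```python
# Largest ICMP payload that fits in one unfragmented jumbo frame:
# the 9000-byte MTU must carry the 20-byte IPv4 header and the
# 8-byte ICMP header along with the ping payload itself.
MTU = 9000
IPV4_HEADER = 20
ICMP_HEADER = 8

payload = MTU - IPV4_HEADER - ICMP_HEADER
print(payload)  # 8972

# On Windows the check is:  ping -f -l 8972 <node-ip>
# -f sets don't-fragment; if the reply comes back, jumbo frames work
# end to end. A payload of 8973 should fail with "needs to be fragmented".
```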
With my 6 SSDs together in a storage pool, added to Server 2012 Failover Clustering as a Cluster Shared Volume, my 'test' is to install a VM onto the solid-state pool. I figured that, at a minimum, I could get up to the write speed of one disk during an install, which is roughly 480 MB/s. Unfortunately, when I start installing an OS to a VM, I see at best about 130 MB/s of write bandwidth, with each node contributing around 15-20 MB/s. Pretty weak stuff.
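To put numbers on the gap (the per-node figures are my rough observations, and the 480 MB/s is the approximate sequential-write spec of one of these drives):

```python
# Back-of-envelope comparison of observed vs. expected write bandwidth.
nodes = 6
per_node_mb_s = (15, 20)        # observed contribution per node (rough)
observed_peak_mb_s = 130        # best aggregate seen during an OS install
single_ssd_write_mb_s = 480     # approx. sequential write of one SSD

aggregate_low = nodes * per_node_mb_s[0]    # 90 MB/s
aggregate_high = nodes * per_node_mb_s[1]   # 120 MB/s
print(aggregate_low, aggregate_high)        # 90 120

# Even the best observed total is well under one disk's sequential
# write speed, let alone six disks pooled together.
shortfall = single_ssd_write_mb_s / observed_peak_mb_s
print(round(shortfall, 1))  # 3.7
```

So the pool is delivering less than a third of a single drive's rated write speed, which is why I suspect something systemic (network, driver, or ScaleIO tuning) rather than the disks.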
I've referred to EMC's performance tuning guide, but I don't think they've released an updated version for 2.0 yet, so the 1.32 guide is outdated in a few ways. First, the "num_of_io_buffers" option appears to have been removed from the SCLI, and I can't find it in the GUI, so I assume it now maps to the SDS/MDM/SDC performance setting in the GUI, which I've set to "High" for all nodes/services.
Second, the guide says to configure the SDS cfg file with special parameters when it's running on flash-based storage, but when I did that, the SDSes could no longer connect to the MDMs, so that obviously doesn't work. There's also a recommendation to remove AckDelay from Tcpip in the registry; however, that no longer applies to Server 2012 R2. The last recommendation is to change the values of some DWORDs under the 'scini' registry key. They weren't there for me, so I added them, but I'm not sure it made any difference.
I'm hoping someone out there has a similar setup, or has troubleshot similar performance issues before, and can point me in the right direction. My next step is to remove the Hyper-V virtual network adapters and multiplexor drivers to see whether they're adding latency overhead somewhere, because I'm really not sure what else to set or do. When I remove an SSD from the pool/cluster and test it with something like CrystalDiskMark, I get the manufacturer's advertised speeds. The SSDs in question are 960GB SanDisk CloudSpeed Ascends.
Thanks for any insight.