Expanders, dual ports and bandwidth OH MY

WeatherDave

New Member
May 4, 2017
Build Name: Dave's TestBed.
OS: Server 2016 Enterprise
CPU (per node): Dual 8 core (16 thread) Xeon v1
Motherboard: Reclaimed IBM M4 servers.
Chassis:
Drives: See below
RAM: 128G
Add-in Cards (per node): LSI 9361-8i, M5110 (host OS), Mellanox VPI 40Gb cards.
Other Bits: DAS SAS3 Expanders.

My co-worker and I have been playing around with a 2 node 2016 Hyper-V cluster here at work. We’ve got it running on a couple of older (reclaimed) Lenovo servers (see above). For our limited testing, they work great and let us play and learn (tear apart, rebuild, relearn). Our goal is to eventually use this testing playground as proof-of-concept for a 2-3 node, 20 to 30 Hyper-V cluster, some of which would probably be 20-50 user databases.

We want to see what this thing will run like with Storage Spaces Direct, running an approach similar to a hyperconverged solution. We’d use the 40Gb Mellanox connection for the entirety of node-to-node communication, and each node would have a SAS3 DAS unit. Something small, like the 12 bay, 2U Supermicro JBOD SAS3 units (826BE1C / 826BE2C). Storage would look like this:
* 8 x 6TB Seagate SAS3 drives (ST6000NM0105, 4Kn, 7200 RPM) (storage tier)
* 4 x Micron S630DC (2-3 DWPD) SAS3 960GB SSD (caching tier)
* Total Capacity at build: 18TB.

We’re over-sizing the cache a *bit*, since our understanding is that both reads and writes would be cached with this storage device layout (correct me if I’m wrong). However, I’m out of my depth when it comes to the intricacies of SAS. I’ve read far too many web pages, wiki articles and spec sheets. So if you don’t mind a few questions:

1) Is there any chance we’d actually benefit from that *theoretical* 24Gb/s of bandwidth (via the dual-port SAS3 devices) the vendors are fond of quoting? I guess my major concern is those Supermicro chassis. I've read they use re-branded LSI components, but that doesn't mean I understand all the intricacies. Given the $$$ involved, I'm just trying to make sure that what I get from this setup beats a bunch of NL-SAS drives with standard SATA SSDs.

2) Our LSI cards. Nice cards that have served us well as RAID cards, but are we wasting their potential in an S2D configuration? Would it make more sense to use something far more stripped down, such as an LSI SAS3008 based card (i.e. the Lenovo N2215)? Again, my concern is that a stripped-down controller will lower performance, shrinking the performance delta of the dual-port SAS3 channel versus NL-SAS and SATA SSDs.
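As a rough sanity check on question 1, it helps to compare the drives' aggregate throughput against a single SAS3 x4 wide port. All per-drive and per-lane MB/s figures below are ballpark assumptions, not vendor specs:

```python
# Back-of-envelope check: could the proposed drive mix saturate one
# SAS3 x4 wide port, making the second (dual) port worth anything?
# All MB/s figures are rough assumptions, not measured or vendor numbers.

USABLE_MBS_PER_SAS3_LANE = 1200   # ~usable MB/s per 12 Gb/s lane after encoding overhead
LANES_PER_WIDE_PORT = 4

hdd_count, hdd_mbs = 8, 250       # assumed sequential rate per 7200 RPM 6TB drive
ssd_count, ssd_mbs = 4, 900       # assumed sequential rate per SAS3 SSD

drive_total = hdd_count * hdd_mbs + ssd_count * ssd_mbs      # aggregate drive throughput
port_total = USABLE_MBS_PER_SAS3_LANE * LANES_PER_WIDE_PORT  # one x4 wide port

print(f"aggregate drives: {drive_total} MB/s")   # 5600 MB/s
print(f"one x4 wide port: {port_total} MB/s")    # 4800 MB/s
print("dual-porting could matter" if drive_total > port_total
      else "a single wide port is already enough")
```

Under these assumptions the drives only exceed one wide port in a best-case all-sequential workload; random I/O on the spinners would land far below it.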

Thank you all in advance for your help, suggestions and experience.

Dave
 

K D

Well-Known Member
Dec 24, 2016
a 2-3 node, 20 to 30 Hyper-V cluster, some of which would probably be 20-50 user databases.
Not sure I follow this. Is it 2-3 nodes or 20-30? Based on what you describe, each node will have:
1. A head unit with your mobo and an external HBA
2. A JBOD box, probably an 826 chassis with an expander backplane.

For Storage Spaces, an LSI 3008 based HBA should work well. You can plan for one HBA for the SSDs in your head node and one external HBA for the spinners in your JBOD. For 8 spinners, even a SAS2 HBA should have more than enough bandwidth.
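The arithmetic behind that last point, using assumed ballpark per-drive numbers rather than vendor specs, looks something like:

```python
# Rough check (assumed figures, not vendor specs): 8 spinners against
# one 6 Gb/s SAS2 x4 wide port.

USABLE_MBS_PER_SAS2_LANE = 550   # ~usable MB/s per 6 Gb/s lane after 8b/10b overhead
LANES_PER_WIDE_PORT = 4

spinner_total = 8 * 250                                     # 8 HDDs at ~250 MB/s each
sas2_port = USABLE_MBS_PER_SAS2_LANE * LANES_PER_WIDE_PORT  # one x4 wide port

print(spinner_total, "MB/s from drives vs", sas2_port, "MB/s through the port")
```

Even at full sequential speed, the 8 spinners sit just under what a single SAS2 wide port can carry, so a SAS2 HBA is not the bottleneck for that tier.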
 

WeatherDave

New Member
May 4, 2017
KD,

Thank you for the once-over. You've confirmed what I suspected. And yes, you were correct: this is a 2-3 node cluster, not 20-30 nodes. I should have clarified that a bit.

Dave
 

WeatherDave

New Member
May 4, 2017
Just a follow-up in case anyone else comes across this post (not that it was that great).

Wasn't able to use this solution after all. It turns out these Supermicro JBOD units are not SES (SCSI Enclosure Services) capable. Note, these are the JBOD chassis (826_JBOD), not the normal server chassis, some of which (like the TQ chassis) do support SES. A quick message to support@ confirmed the missing feature.

Almost pulled the trigger on the purchase, but came across a post on another site stating that a July update (KB4025339) changed SES validation behavior. In the case mentioned there, someone with Dell 730XDs saw their cluster pass validation before the update and fail after, and had to replace both HBA and backplane!