EMC KTN-STL3 15-bay chassis


AlistairM

New Member
Nov 2, 2023
9
0
1
Unless downtime actually costs you money, I wouldn't go for multipathing, especially with a copy-on-write filesystem. And if I did have hefty SLAs, I'd go for clustering first, since that covers the server going down in addition to the data path.

Get rid of the HBA-to-top-shelf link as well. If you want more bandwidth, attach the top shelf only to port B of the HBA and chain the remaining two shelves off port A.
Updated image for anyone else who finds this helpful:
1698972586617.png
 

BrassFox

New Member
Apr 23, 2023
19
6
3
Would someone be able to run a sanity check over these items I'm about to purchase to make sure they would work together please?

3x EMC KTN-STL3
45x Seagate Exos X18 Enterprise 16TB SAS
2x LSI 9200-8e 6Gbps
8x SFF-8088 to SFF-8088 cables
3x 100-560-184 rail kits (I have a 29-inch-deep rack; I'm not sure whether these are compatible)


I plan to link the JBODs to my existing NAS like this:

View attachment 32535

Assuming this is all good, if I were to continue extending the setup in the same way, adding more JBODs to the daisy chain, how many would I theoretically be able to add?
Use an LSI 9206-16e HBA and forget the speed penalty/bottlenecks of daisy-chaining anything. Each of those cards can run four disk shelves wired directly, at the full speed the shelves can run at. You'd need SFF-8644 to SFF-8088 cables for those, one per shelf.
 
  • Like
Reactions: AlistairM

AlistairM

New Member
Nov 2, 2023
9
0
1
Use an LSI 9206-16e HBA and forget the speed penalty/bottlenecks of daisy-chaining anything. Each of those cards can run four disk shelves wired directly, at the full speed the shelves can run at. You'd need SFF-8644 to SFF-8088 cables for those, one per shelf.
Brilliant idea, I'll do that, thank you.

As a side note, is there any concern about bottlenecking the hard drives? That's 15 drives per 6 Gb/s port, and each drive peaks at 258 MiB/s * 1.048576 / 1000 * 8 ≈ 2.16 Gb/s.
2.16 * 15 = 32.4 Gb/s > 6 Gb/s

The 32.4 Gb/s is a theoretical maximum, but even so...
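For anyone who wants to rerun that arithmetic, here is the same back-of-the-envelope calculation as a short Python sketch (my own illustration, not from the thread). It uses the raw SAS2 line rates and the quoted 258 MiB/s sustained figure, so it ignores 8b/10b encoding and protocol overhead:

```python
# Back-of-the-envelope check: can 15 Exos X18 drives saturate a single
# 6 Gb/s SAS lane, or a 4-lane SFF-8088 cable?  Figures are raw line
# rates and the drive's quoted sustained rate; real throughput is lower
# (8b/10b encoding, protocol overhead, inner-track speeds).

DRIVE_MIB_S = 258                 # Exos X18 quoted max sustained transfer, MiB/s
DRIVES_PER_SHELF = 15

drive_gbps = DRIVE_MIB_S * 1.048576 / 1000 * 8      # ~2.16 Gb/s per drive
shelf_gbps = drive_gbps * DRIVES_PER_SHELF          # ~32.5 Gb/s for a full shelf

print(f"per drive : {drive_gbps:.2f} Gb/s")
print(f"per shelf : {shelf_gbps:.1f} Gb/s")
print(f"vs one SAS2 lane (6 Gb/s)             : {shelf_gbps / 6:.1f}x oversubscribed")
print(f"vs one 4-lane SFF-8088 cable (24 Gb/s): {shelf_gbps / 24:.1f}x oversubscribed")
```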
 

bonox

Member
Feb 23, 2021
87
20
8
That card has 8 lanes across its two ports, so 48 Gb/s. If you only use one port you're still at 24 Gb/s of aggregate capacity. You can choke it artificially if you create pools that manage to use only one lane, but if you're using everything you'll see much more than 6 Gb/s.

You're likely to choke the PCIe 2.0 x8 bus that card sits on before the card itself runs out of bandwidth to the drives. The PCIe 2.0 x8 16-port cards certainly will.
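To make that concrete, here is a rough sketch of the two ceilings on a 9200-8e class card. The numbers are my own and deliberately generous: raw line rates, PCIe 2.0 assumed at 5 GT/s with 8b/10b encoding, and no protocol overhead counted.

```python
# Why a PCIe 2.0 x8 HBA (e.g. a 9200-8e class card) hits its host-bus
# limit before its SAS-side limit.  Rates are raw/ideal; real numbers
# are lower once protocol overhead is included.

SAS2_LANE_GBPS = 6.0          # raw SAS2 line rate per lane
HBA_SAS_LANES = 8             # 9200-8e: two SFF-8088 ports x 4 lanes

PCIE2_GT_S = 5.0              # PCIe 2.0 transfer rate per lane (GT/s)
PCIE2_ENCODING = 8 / 10       # 8b/10b encoding overhead
PCIE_LANES = 8                # x8 card

sas_side = SAS2_LANE_GBPS * HBA_SAS_LANES                 # 48 Gb/s
pcie_side = PCIE2_GT_S * PCIE2_ENCODING * PCIE_LANES      # 32 Gb/s (~4 GB/s)

print(f"SAS side        : {sas_side:.0f} Gb/s")
print(f"PCIe 2.0 x8 side: {pcie_side:.0f} Gb/s  -> the bus is the bottleneck")
```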
 
Last edited:
  • Like
Reactions: AlistairM

AlistairM

New Member
Nov 2, 2023
9
0
1
That card has 8 lanes across its two ports, so 48 Gb/s. If you only use one port you're still at 24 Gb/s of aggregate capacity. You can choke it artificially if you create pools that manage to use only one lane, but if you're using everything you'll see much more than 6 Gb/s.

You're likely to choke the PCIe 2.0 x8 bus that card sits on before the card itself runs out of bandwidth. The PCIe 2.0 x8 16-port cards certainly will.
"The LSI SAS 9206-16e HBA provides sixteen lanes of 6Gb/s SAS storage connectivity and is matched with eight lanes of PCIe 3.0 8Gb/s host connectivity."

So if I have this right, each physical connection between the JBOD and the card has 4 lanes, and each lane can carry 6 Gb/s, so that's 24 Gb/s per physical connection to a JBOD.
Since I can attach 4 JBODs, this would give me 4 * 24 = 96 Gb/s in aggregate. However, this is bottlenecked by the 8 lanes of PCIe at 8 Gb/s each, so it would be reduced to 8 * 8 = 64 Gb/s in aggregate.
Since I'm using 3 JBODs instead of 4, does the 3 * 24 = 72 Gb/s get spread over all 8 PCIe lanes, or over 2 * 3 = 6 PCIe lanes?
 

bonox

Member
Feb 23, 2021
87
20
8
The SFF-8088 cables you'll be using carry four SAS lanes, meaning 24 Gb/s. If you connect that (singly or via daisy-chaining) to one or more shelves, then that 24 Gb/s is shared by however many drives are hanging off it (45 in your original proposal).

If you have four ports you've got 16 SAS lanes, and if you divide them all equally amongst the shelves you'll see the 24 x 4 = 96 Gb/s total bandwidth from card to shelves, less whatever the PCIe bus caps out at. So for 3 shelves you'd share 24 Gb/s across two shelves (30 drives) and give 24 Gb/s to one shelf (15 drives).

My post above was based on your 8-SAS-lane 9200-8e cards, which have x8 PCIe 2.0 connectors. If the 16e card also has an x8 PCIe connector you'll face the same restriction as the -8e card; if it has an x16 interface you'll get more total capacity. The same goes if you move from the SAS2008 PCIe 2.0 chipset to the 2308 PCIe 3.0 (or any of the newer 3000/4000/5000 series).

That being said, in practice you won't notice any performance hit from any of the proposed configs, unless you're only doing this to write tiny amounts of data to the outside of the disks for benchmark numbers. Once the pool has aged a bit and you're writing small files or dealing with seeks to put new copies of data in 'old' parts of the disks, your disk IOPS will constrain you more than any of the interfaces.
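As an illustration of that lane budget, here is a hypothetical split for three shelves on a four-port (16-lane) SAS2 card with one cable per shelf. The shelf names and the one-cable-per-shelf layout are my own assumptions, not the only way to wire it; the rates are raw line rates, with approximate PCIe 2.0/3.0 x8 ceilings shown for comparison.

```python
# Hypothetical bandwidth budget: a 4-port (16-lane) SAS2 HBA feeding
# three KTN-STL3 shelves, one external cable per shelf plus a spare
# port.  Raw line rates only; ignores protocol overhead and drive limits.

SAS2_LANE_GBPS = 6.0
LANES_PER_CABLE = 4

shelves = {"shelf A": 15, "shelf B": 15, "shelf C": 15}   # drives per shelf (illustrative)

per_shelf_gbps = SAS2_LANE_GBPS * LANES_PER_CABLE         # 24 Gb/s per cable
total_sas_gbps = per_shelf_gbps * len(shelves)            # 72 Gb/s in use (96 possible)

# Host-side ceilings for comparison (approx. usable rate per lane x 8 lanes)
pcie2_x8 = 5.0 * (8 / 10) * 8       # ~32 Gb/s (8b/10b encoding)
pcie3_x8 = 8.0 * (128 / 130) * 8    # ~63 Gb/s (128b/130b encoding)

for name, drives in shelves.items():
    print(f"{name}: {drives} drives share {per_shelf_gbps:.0f} Gb/s "
          f"(~{per_shelf_gbps / drives:.1f} Gb/s each if all stream at once)")

print(f"SAS side in use : {total_sas_gbps:.0f} Gb/s")
print(f"PCIe 2.0 x8 cap : {pcie2_x8:.0f} Gb/s")
print(f"PCIe 3.0 x8 cap : {pcie3_x8:.0f} Gb/s")
```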
 
Last edited:
  • Like
Reactions: AlistairM

AlistairM

New Member
Nov 2, 2023
9
0
1
The SFF-8088 cables you'll be using carry four SAS lanes, meaning 24 Gb/s. If you connect that (singly or via daisy-chaining) to one or more shelves, then that 24 Gb/s is shared by however many drives are hanging off it (45 in your original proposal).

If you have four ports you've got 16 SAS lanes, and if you divide them all equally amongst the shelves you'll see the 24 x 4 = 96 Gb/s total bandwidth from card to shelves, less whatever the PCIe bus caps out at. So for 3 shelves you'd share 24 Gb/s across two shelves (30 drives) and give 24 Gb/s to one shelf (15 drives).

My post above was based on your 8-SAS-lane 9200-8e cards, which have x8 PCIe 2.0 connectors. If the 16e card also has an x8 PCIe connector you'll face the same restriction as the -8e card; if it has an x16 interface you'll get more total capacity. The same goes if you move from the SAS2008 PCIe 2.0 chipset to the 2308 PCIe 3.0 (or any of the newer 3000/4000/5000 series).

That being said, in practice you won't notice any performance hit from any of the proposed configs, unless you're only doing this to write tiny amounts of data to the outside of the disks for benchmark numbers. Once the pool has aged a bit and you're writing small files or dealing with seeks to put new copies of data in 'old' parts of the disks, your disk IOPS will constrain you more than any of the interfaces.
Thanks for the clarification, that makes a lot of sense.

You're likely to choke the PCIe 2.0 x8 bus that card sits on before the card itself runs out of bandwidth to the drives. The PCIe 2.0 x8 16-port cards certainly will.
Ah, so that's why you recommended a PCIe 3.0 card instead of a PCIe 2.0 card, is that right? So the PCIe 3.0 bus would be used instead of the PCIe 2.0 bus?
 

bonox

Member
Feb 23, 2021
87
20
8
I didn't recommend any card, but I'm very happy with my older PCIe 2.0 (i.e. 9200-type) cards.

I personally run 60 3.5" disks and 25 2.5" disks in 5 shelves off one of those old cards, and since I don't run mirrors everywhere, I'm much more constrained by IOPS than by card/bus bandwidth. You should calculate what you're trying to do/store/achieve and base your purchases on that. The older cards are not only cheap, they're supremely reliable and have good driver support. That doesn't mean the newer ones aren't equally good, but people like me aren't made of money and don't enjoy spending it without good reason.

I'd spend money on the newer ones if I had plans to use flash rather than rust, but I'm in the game for capacity rather than speed.
 
Last edited:

AlistairM

New Member
Nov 2, 2023
9
0
1
I didn't recommend any card, but I'm very happy with my older PCIe 2.0 (i.e. 9200-type) cards.

I personally run 60 3.5" disks and 25 2.5" disks in 5 shelves off one of those old cards, and since I don't run mirrors everywhere, I'm much more constrained by IOPS than by card/bus bandwidth. You should calculate what you're trying to do/store/achieve and base your purchases on that. The older cards are not only cheap, they're supremely reliable and have good driver support. That doesn't mean the newer ones aren't equally good, but people like me aren't made of money and don't enjoy spending it without good reason.
Ah sorry, I confused you with BrassFox. Thank you for your help, I'll bear what you said in mind.
 

Fiberton

New Member
Jun 19, 2022
17
1
3
Would someone be able to run a sanity check over these items I'm about to purchase to make sure they would work together please?

3x EMC KTN-STL3
45x Seagate Exos X18 Enterprise 16TB SAS
2x LSI 9200-8e 6Gbps
8x SFF-8088 to SFF-8088 cables
3x 100-560-184 rail kits (I have a 29-inch-deep rack; I'm not sure whether these are compatible)


I plan to link the JBODs to my existing NAS like this:

View attachment 32535

Assuming this is all good, if I were to continue extending the setup in the same way, adding more JBODs to the daisy chain, how many would I theoretically be able to add?
I own 6 of these. I use a 9400-16e, but the picture below is a layout that will 100% work. The inner ports are inputs and the outer ports are outputs for daisy-chaining these. Or you could link each HBA port to one shelf's input, which, with SAS drives, would speed things up. I'm getting 1100 MB/s writes on a 15-drive RAIDZ2 using TrueNAS SCALE.
 

Attachments

Last edited:
  • Like
Reactions: AlistairM

Fiberton

New Member
Jun 19, 2022
17
1
3
unless you've got software that supports multi-pathing, the right side of the connection tree is pointless. The cards will say in their literature something like "Supports up to 254 devices" or some such, which means you can keep adding shelves up to that point, after which that's what you'd use the second card for. BTW, with your proposed setup, apps like TrueNAS will show you have 90 disks installed.
If you do a storcli /c0 show, mine shows 180 drives... and I only have 90 drives in it (each drive shows up twice because of the dual paths). I use TrueNAS Scale :) Do know that on any TrueNAS Scale install after 22.12.1, the fans on the unit connected directly to the HBA might ramp up. I'm assuming an SES management signal from the newer Debian kernel is causing the issue, and I'm trying to figure that out. Running on 22.12.1 now and it's smooth as butter. The disk arrays further down the daisy chain are not affected. TrueNAS Scale is a solid platform to run these units on. I have put in a JIRA ticket with iXsystems to see what the issue might be.
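For anyone puzzled by the doubled count: with both shelf controllers cabled back to the HBA and no multipath layer configured in between, each physical drive shows up once per path. Here is a minimal sketch of one way to sanity-check that on a Linux host such as a TrueNAS SCALE shell, assuming the usual /sys/block layout (the wwid attribute may be missing on some devices or kernels):

```python
# Group sd* block devices by their SCSI WWID and count paths per drive.
# With dual-pathed shelves and no multipath layer, 90 physical drives
# can appear as 180 block devices.
from pathlib import Path

paths_by_wwid = {}
for dev in sorted(Path("/sys/block").glob("sd*")):
    wwid_file = dev / "device" / "wwid"
    if not wwid_file.exists():
        continue  # not a SCSI device, or attribute not exposed here
    wwid = wwid_file.read_text().strip()
    paths_by_wwid.setdefault(wwid, []).append(dev.name)

total_paths = sum(len(devs) for devs in paths_by_wwid.values())
print(f"{total_paths} block devices, {len(paths_by_wwid)} unique drives")

# Show drives reachable via more than one path
for wwid, devs in paths_by_wwid.items():
    if len(devs) > 1:
        print(f"  {wwid}: {', '.join(devs)}")
```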
 

Attachments

Last edited:

BrassFox

New Member
Apr 23, 2023
19
6
3
Ah sorry, I confused you with BrassFox. Thank you for your help, I'll bear what you said in mind.
Yeah, the 9206-16e is PCIe 3.0 x8. It is two 2308 SAS2 controllers on one card, with a third chip that acts like a PCH or chipset traffic cop. They're cheap for what they are; you can find used ones for around $40 and new ones for about $100. I have three of these. They will need cooling fans.
I use lsiutil option 13 to set my cards to negotiate SATA2 speeds (3.0 Gbps) for each disk, since spinners can't make use of SATA3/6.0 anyway. Unless you rig up the MegaRAID Storage Manager program via Java (which is a huge pain) and look, you would never know this: when filled, those shelves will negotiate some of the disks down to SATA1 speed, and that is very noticeably slower overall. They get much quicker after you force the higher link speeds. You can get them all to negotiate at 6.0, but that isn't worth the trouble when they can't sustain 3.0 past their cache anyway.
Also, don't neglect to set your queue depth to 32; most of these cards come set up for SAS drives with higher queue depths, I think 256.

Anyway, your bottleneck is indeed the SAS2 cable, and that HBA gives you four full-speed cables, which I like to run one per shelf for obvious reasons. You could go with a higher-rated card, but the shelf can't use the extra speed.
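If you're on a Linux host (for example a TrueNAS SCALE shell) and don't want to wrestle with MegaRAID Storage Manager just to see what each phy negotiated, the kernel's SAS transport class exposes the link rates in sysfs. A minimal sketch, assuming the standard /sys/class/sas_phy layout; attribute availability can vary by driver and kernel:

```python
# Print the negotiated vs. maximum link rate for every SAS phy the
# kernel knows about, straight from the scsi_transport_sas sysfs tree.
from pathlib import Path

SAS_PHY_DIR = Path("/sys/class/sas_phy")

if not SAS_PHY_DIR.is_dir():
    raise SystemExit("No SAS phys found - is this a Linux host with a SAS HBA?")

for phy in sorted(SAS_PHY_DIR.glob("phy-*")):
    try:
        negotiated = (phy / "negotiated_linkrate").read_text().strip()
        maximum = (phy / "maximum_linkrate").read_text().strip()
    except OSError:
        continue  # phy disappeared or attribute not exposed by this driver
    note = "   <-- below its max" if negotiated != maximum else ""
    print(f"{phy.name}: negotiated {negotiated} (max {maximum}){note}")
```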
 
Last edited:
  • Like
Reactions: AlistairM

BrassFox

New Member
Apr 23, 2023
19
6
3
Yeah, the 9206-16e is PCIe 3.0 x8. It is two 2308 SAS2 controllers on one card, with a third chip that acts like a PCH or chipset traffic cop. They're cheap for what they are; you can find used ones for around $40 and new ones for about $100. I have three of these. They will need cooling fans.
I use lsiutil option 13 to set my cards to negotiate SATA2 speeds (3.0 Gbps) for each disk, since spinners can't make use of SATA3/6.0 anyway. Unless you rig up the MegaRAID Storage Manager program via Java (which is a huge pain) and look, you would never know this: when filled, those shelves will negotiate some of the disks down to SATA1 speed, and that is very noticeably slower overall. They get much quicker after you force the higher link speeds. You can get them all to negotiate at 6.0, but that isn't worth the trouble when they can't sustain 3.0 past their cache anyway.
Also, don't neglect to set your queue depth to 32; most of these cards come set up for SAS drives with higher queue depths, I think 256.

Anyway, your bottleneck is indeed the SAS2 cable, and that HBA gives you four full-speed cables, which I like to run one per shelf for obvious reasons. You could go with a higher-rated card, but the shelf can't use the extra speed.
Something else to note, if more speed is a concern: within the 15 slots are dividers that chop them into three groups of five. The expanders have three channels, one per group of five, and those dividers are not just for show. So if you run 12 disks per shelf, or nine, or any multiple of three less than 15, physically divide them evenly between the three groups; they'll run measurably quicker that way compared to a full shelf. Another trick, if you're storing large media files: format with 64 KB clusters rather than the usual 4 or 8 KB. This dramatically reduces the IOPS needed for large files, which can be another speed bottleneck; there's a rough worked example below. There is a guy here who doesn't believe it, but it works regardless and is a well-known trick.
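To put a rough number on that cluster-size point, here is a toy worst-case count of I/O operations for one large media file at different allocation-unit sizes. This is my own illustration: the 50 GiB file size is arbitrary, and real filesystems coalesce sequential I/O, so treat it as an upper bound on the difference rather than a benchmark.

```python
# Toy illustration of the cluster-size point: worst-case I/O count to
# read or write one large media file, assuming one I/O per cluster.

FILE_SIZE = 50 * 1024**3        # a 50 GiB media file (illustrative)

for cluster in (4 * 1024, 8 * 1024, 64 * 1024):
    ios = FILE_SIZE // cluster
    print(f"{cluster // 1024:>3} KiB clusters: {ios:,} I/Os")

#  4 KiB -> 13,107,200 I/Os
#  8 KiB ->  6,553,600 I/Os
# 64 KiB ->    819,200 I/Os  (16x fewer than 4 KiB)
```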
 

Fiberton

New Member
Jun 19, 2022
17
1
3
In ZFS I use a 1M record size for datasets that will hold large files; the default ZFS record size is 128K. Works like a charm.
 

BrassFox

New Member
Apr 23, 2023
19
6
3
In ZFS I use a 1M record size for datasets that will hold large files; the default ZFS record size is 128K. Works like a charm.
That's probably why my experience varies from some of those here, as I've not made the leap to ZFS yet. I'm not even sure how to do that (and keep all my stuff active through the switch) without buying enough storage drives to hold another copy, or maybe taking my working copy offline and relying on just the backup during the switch. So far I haven't found a compelling reason to go through all the trouble. My non-ZFS setup works well, and I don't try to fix what isn't broken. I may yet make a slow migration to ZFS one day, though.

I still wonder why ZFS wouldn't also benefit from using larger cluster sizes from the start, but I don't have any experience with that file system, so I don't know.

Reducing your physical storage bottlenecks is certainly a speed benefit no matter what file system you're running, so minimizing how many drives talk through each SAS cable wherever you can (by adding more HBA channels and not daisy-chaining, for example) must have real-world speed benefits, and it's an easy thing to do with cheap, readily available HBA cards that double up on SAS processors.
 

Fiberton

New Member
Jun 19, 2022
17
1
3
That's probably why my experience varies from some of those here, as I've not made the leap to ZFS yet. I'm not even sure how to do that (and keep all my stuff active through the switch) without buying enough storage drives to hold another copy, or maybe taking my working copy offline and relying on just the backup during the switch. So far I haven't found a compelling reason to go through all the trouble. My non-ZFS setup works well, and I don't try to fix what isn't broken. I may yet make a slow migration to ZFS one day, though.

I still wonder why ZFS wouldn't also benefit from using larger cluster sizes from the start, but I don't have any experience with that file system, so I don't know.

Reducing your physical storage bottlenecks is certainly a speed benefit no matter what file system you're running, so minimizing how many drives talk through each SAS cable wherever you can (by adding more HBA channels and not daisy-chaining, for example) must have real-world speed benefits, and it's an easy thing to do with cheap, readily available HBA cards that double up on SAS processors.
For sure this does help. It also lowers latency. Using the 9400-16e with 2 daisy chains is much better than one HBA with all 6 daisy-chained. Another thing to do is stagger the ports, because of how the quad cards lay things out: connectors 0 and 1 are on the same SAS core, and connectors 2 and 3 are on the other. So run connectors 0 and 2 to the first enclosure and connectors 1 and 3 to the other; that way you're splitting the two daisy chains across the SAS cores. The thing for me to do would be to get a few more cards, since the two-connector cards have one SAS core while the quads have two. These enclosures are quite fast. I am doing a scrub on one of my enclosures right now; in ZFS terms that means scanning everything and checking that every byte of data is correct: "77.3T issued at 1.90G/s, 122T total, 0B repaired, 63.51% done, 06:39:19 to go". That's scanning/checking at 1.9 G/s.
 

BrassFox

New Member
Apr 23, 2023
19
6
3
For sure this does help. It also lowers latency. Using the 9400-16e with 2 daisy chains is much better than one HBA with all 6 daisy-chained. Another thing to do is stagger the ports, because of how the quad cards lay things out: connectors 0 and 1 are on the same SAS core, and connectors 2 and 3 are on the other. So run connectors 0 and 2 to the first enclosure and connectors 1 and 3 to the other; that way you're splitting the two daisy chains across the SAS cores. The thing for me to do would be to get a few more cards, since the two-connector cards have one SAS core while the quads have two. These enclosures are quite fast. I am doing a scrub on one of my enclosures right now; in ZFS terms that means scanning everything and checking that every byte of data is correct: "77.3T issued at 1.90G/s, 122T total, 0B repaired, 63.51% done, 06:39:19 to go". That's scanning/checking at 1.9 G/s.
The speed you're enjoying probably has more to do with the striping benefits inherent in RAID5 or 10 or whatever you've set up there than with the ZFS file system itself. Not to sound like I'm knocking ZFS, because I do understand its unique merits, but the speed increase comes from the RAID arrangement, and that's not exclusive to ZFS.

How did you get started with it? Or more accurately: what’s the easy path to getting started with ZFS?
 

Fiberton

New Member
Jun 19, 2022
17
1
3
The speed you're enjoying probably has more to do with the striping benefits inherent in RAID5 or 10 or whatever you've set up there than with the ZFS file system itself. Not to sound like I'm knocking ZFS, because I do understand its unique merits, but the speed increase comes from the RAID arrangement, and that's not exclusive to ZFS.

How did you get started with it? Or more accurately: what’s the easy path to getting started with ZFS?
I started using FreeBSD around 1998, and OpenZFS came to BSD about 10 years later. The easiest way in is to use something like TrueNAS Scale (Linux) or Core (FreeBSD).
 

EugenPrusi

New Member
Nov 6, 2023
2
0
1
Thanks for the information! I'm also interested in learning more about the easy path to getting started with ZFS.
Do you have any other recommendations for resources on getting started with ZFS?
 

Fiberton

New Member
Jun 19, 2022
17
1
3
Thanks for the information! I'm also interested in learning more about the easy path to getting started with ZFS.
Do you have any other recommendations for resources on getting started with ZFS?

If you want to know more about ZFS and how it works, meaning the nuts and bolts, Ars Technica wrote an article about three years ago that was quite good; I will link it here: Understanding ZFS storage. That is a big gulp of information, but to keep it simple for starters, you can just use TrueNAS Scale if you want to run containers and virtual machines as well; it's basically appliance software on your NAS. As far as getting started, you can purchase something from iXsystems or build something of your own. One of the simplest ways, to me, is just buying an older Dell PowerEdge, something like the R730 or R730XD with V4 processors (like the 2650 V4, 90-watt 10-core) and an HBA330 controller inside it. Or you can keep it simple and just use an old machine of yours and put some drives in it. You can use something like LabGopher to find older PowerEdge deals; it gathers all the eBay data and presents it in a way you can filter to what you want. Just some ideas. I personally just use a PowerEdge; it keeps things simple.