SAS2 expanders $60 (IBM, LSI chip, Intel alternative)


TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
I tested every single FW from 510A to 634A, and found:
  • 510A was the only one that got dual-link (2 x 6Gb) performance
    • And therefore I believe also the only one that could instead run 20 drives in a single HBA-cable config
  • 510A was the only one with the staggered-power-on problem.
That was the real annoyance: there was no FW that had fixes for both problems. It seems the first update after 510A (602A, I guess?) both fixed the power-on issue and introduced the dual-link problem, and subsequent releases made no further difference.

I studied all the changelogs as you have, but could never find any correlation between the listed fixes and the problems we had. I seem to recall there's nothing in 602A's changelog that suggests it would fix the major SATA power-on detection issue of 510A, and yet it does fix it. It's all quite frustrating really.

Here are links to the Linux .bin files for FW 510A, 602A, 632A, 633A and 634A (which I already uploaded in an earlier thread as well). These are all the FWs I currently have downloaded, from when I experimented with all this back in April/May 2017:
The first file is called version 1.01 - they changed the numbering later. Inside the zip you will see reference to '440A'. However I am almost certain that once installed on our expander, it will be 510A. I found another document that refers to Rev A and Rev B cards, with versions 440A and 510A respectively. So I think this FW results in 510A on our card, which is a Rev B.

As for the staggered power situation, I have done some tests and the expander can be powered from a small 12V PSU. I tried running the expander from a Seagate Expansion drive PSU rated at 1.5A. On power-up the expander peaked at 790mA, so it would be possible to plug in the expander and just leave it powered up. If I read you right, this might be a simpler solution.
Yeah that might well work. To be honest I don't think I tested that. In that scenario, the HBA and the disks are getting turned off, but the expander remains on. So it doesn't re-initialise. Unless the initialisation is also affected by the HBA, which is the only possible problem I could imagine with that plan.

So yeah good idea and definitely worth testing. Although sadly it wouldn't help in cases of power outages, which is the #1 problem I have at the moment: now that my server is all set up and running, the only time my server gear normally goes down is when an RCD trips in my power cupboard, or else there's an area-wide power cut, and in that scenario it doesn't matter how the expander is powered because everything is going down at once.

I have been considering investing in a UPS to avoid such outages, and I think I really should at some point - regardless of whether I also replace the IBM expander with one that doesn't have this annoying problem.

Let us know how you get on with a 12V PSU, if you do try that. I suppose if one had some electronics experience and wanted to be fancy, its power requirements are low enough that you could make a home-made UPS just to keep the expander powered with 12V :) Unlike a normal UPS you wouldn't need a DC-AC conversion, you could just stick together a bunch of Lithium batteries and a 12V regulator and send 12V direct into the Molex connector of a PCIe slot adapter. Or just use a lead-acid 12V battery, like a real UPS does. Might be a bit overkill just to keep an expander running, but might also be a fun project :)

EDIT: or actually, they make power banks that can output 12V (and 19V and other voltages) that would probably do the job. As long as it can be plugged into the mains to keep it topped up, while also providing 12V to the expander.
 
Last edited:
  • Like
Reactions: iamtelephone

TrevInCarlton

New Member
Sep 19, 2018
17
4
3
Nottingham, UK
If I do find performance is an issue I may well need to go back to 510A, but at this point I am hoping I can solve the problems with different combinations of expanders and HBAs. I have 2 of each, so I could use the 2 outputs from the LSI SAS 9207-8E to feed the 2 expanders, each driving 1 FreeNAS array of 8 SATA drives. That still leaves me capacity for 8 drives from my Supermicro X11SSM-F. I may also bring my IBM M1015 into the equation, using it to drive my second 8-drive array and the 9207 for the other array, doing away with the expanders altogether. So there is plenty of testing to do till I find the best options.

I have a lot of electronics experience built up over the years. I had a look on eBay and there are plenty of 12V UPS devices around, without the need to go to a full-blown UPS. I had a huge APC UPS when I lived in Spain due to the poor quality electricity over there and the constant power cuts. You should be able to make up a small 12V UPS quite cheaply; if I needed the staggered power-up I would make one.
 
  • Like
Reactions: TheBloke

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
I've just updated my earlier post with what I am 99% sure is a link to the 510A FW. I already had it downloaded, but didn't upload it earlier because the FW file is referred to as 440A, not 510A. But I've done some more Googling and found a document that indicates that 440A and 510A are likely the same FW version, and it depends on the Revision of the card. Rev A = 440A, Rev B = 510A. And we have a Rev B card.

So I am pretty sure that this '1.01 FW' will be 510A for us. I will test it myself later today on my spare expander to be certain.

If I do find performance is an issue I may well need to go back to 510A, but at this point I am hoping I can solve the problems with different combinations of expanders and HBAs. I have 2 of each, so I could use the 2 outputs from the LSI SAS 9207-8E to feed the 2 expanders, each driving 1 FreeNAS array of 8 SATA drives. That still leaves me capacity for 8 drives from my Supermicro X11SSM-F. I may also bring my IBM M1015 into the equation, using it to drive my second 8-drive array and the 9207 for the other array, doing away with the expanders altogether. So there is plenty of testing to do till I find the best options.
Yeah, if you're only running 8 drives per expander, I don't think performance will be an issue (with HDDs anyway.) I am running 16 x 2TB SATA3 drives from my expander, and 2 x cables provided a worthwhile performance boost versus 1 cable. But if I only had 8 drives, I'm pretty sure a single link would have delivered the max performance of the drives.

So basically you'll end up with the same as I have: 8 drives per HBA cable. Just you'll use two expanders to achieve it instead of one.

Now you mention it, I could have done that as well, as I also ended up with two expanders. Can't now think why I didn't try that, as I would have room in my external enclosure to run two expanders at once.

I'll be interested to hear your benchmark results in the various configs, and I might go back and play with my setup as well.

I have a lot of electronics experience built up over the years. I had a look on eBay and there are plenty of 12V UPS devices around, without the need to go to a full-blown UPS. I had a huge APC UPS when I lived in Spain due to the poor quality electricity over there and the constant power cuts. You should be able to make up a small 12V UPS quite cheaply; if I needed the staggered power-up I would make one.
Ah cool, thanks. I'll have a look at those.
 

TrevInCarlton

New Member
Sep 19, 2018
17
4
3
Nottingham, UK
I've just updated my earlier post with what I am 99% sure is a link to the 510A FW. I already had it downloaded, but didn't upload it earlier because the FW file is referred to as 440A, not 510A. But I've done some more Googling and found a document that indicates that 440A and 510A are likely the same FW version, and it depends on the Revision of the card. Rev A = 440A, Rev B = 510A. And we have a Rev B card.

So I am pretty sure that this '1.01 FW' will be 510A for us. I will test it myself later today on my spare expander to be certain.



Yeah, if you're only running 8 drives per expander, I don't think performance will be an issue (with HDDs anyway.) I am running 16 x 2TB SATA3 drives from my expander, and 2 x cables provided a worthwhile performance boost versus 1 cable. But if I only had 8 drives, I'm pretty sure a single link would have delivered the max performance of the drives.

So basically you'll end up with the same as I have: 8 drives per HBA cable. Just you'll use two expanders to achieve it instead of one.

Now you mention it, I could have done that as well, as I also ended up with two expanders. Can't now think why I didn't try that, as I would have room in my external enclosure to run two expanders at once.

I'll be interested to hear your benchmark results in the various configs, and I might go back and play with my setup as well.



Ah cool, thanks. I'll have a look at those.
My initial task is to move around 25TB from one pool to the other. When I did some preliminary tests with all the drives connected to the one expander I only managed to achieve 148MB/s. This was from within the server and not dependent on any network connections. There is still a lot of testing to do. I have a direct 10Gb connection from Windows and Mac into the server, and have managed an easy 400MB/s file transfer into one of the volumes from an old SSD. Using BlackMagic I am getting 850+MB/s write and just over 700MB/s read, which is way beyond the Synology this setup is replacing - that clocks a really pathetic 95MB/s write and 100MB/s read, which is not even maxing out its 1Gb network. Stay tuned ........
 
  • Like
Reactions: TheBloke

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
My initial task is to move around 25TB from one pool to the other. When I did some preliminary tests with all the drives connected to the one expander I only managed to achieve 148MB/s. This was from within the server and not dependent on any network connections. There is still a lot of testing to do. I have a direct 10Gb connection from Windows and Mac into the server, and have managed an easy 400MB/s file transfer into one of the volumes from an old SSD. Using BlackMagic I am getting 850+MB/s write and just over 700MB/s read, which is way beyond the Synology this setup is replacing - that clocks a really pathetic 95MB/s write and 100MB/s read, which is not even maxing out its 1Gb network. Stay tuned ........
Ah very nice. I just tried BlackMagic from MacOS 10.13 to my server and got 600MB/s writes and 900MB/s reads over a single 10Gbe Intel X520 (patched to enable use of SmallTree drivers; I imagine you use something similar? I'm not aware of many 10Gbe NICs with MacOS support.)

I tend to do my benchmarking using iozone. Peak performance using iozone over the network from my MacOS workstation is about 800MB/s writes and 935MB/s reads. In Windows 10, using 2 x 10Gbe X520 NICs with SMB 3.1.1's MultiChannel, I got up to 2000MB/s writes and 2122MB/s reads. All those tests were connecting to Samba 4.7.4 running on my Solaris server. However in both cases it was the network that was the bottleneck.

When I do disk/filesystem benchmarks, ie comparing expander cabling config, I run iozone directly on the server. I don't have the figures handy for when I did that back in April/May last year, but I believe with both expander lanes working I peaked around 3GB/s writes and about 2.5GB/s reads.

Not that I've since then achieved anything like that in real-life, day-to-day performance, but it gives a top-line for the available performance, and for comparing different configs. When I do a copy within the pool, eg rsync -avhP /data/dataset1/ /data/dataset2/ I tend to average around 450MB/s. zpool scrub averages around 1.5GB/s I seem to recall.
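For anyone wanting to reproduce the comparison, the shape of the iozone run I mean is something like this - just a sketch, with illustrative thread count and sizes (the /data paths are my pool's mountpoint; size the files so the total comfortably exceeds RAM, otherwise ARC caching will flatter the read numbers):

  # throughput mode: sequential write (-i 0) then sequential read (-i 1),
  # 4 threads, 1MB records, 16GB per thread, with fsync included in the timings (-e)
  iozone -e -i 0 -i 1 -r 1m -s 16g -t 4 \
      -F /data/iozone.tmp1 /data/iozone.tmp2 /data/iozone.tmp3 /data/iozone.tmp4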

My ZFS pool is 27 x 2TB SATA3 7200 RPM drives, configured as 3 x 9-drive RAIDZ2 VDEVs. So a total of 21 data drives, 6 parity drives. The drives are: 16 x Toshiba DT01ACA200 and 11 x Seagate ST2000DM001. HBAs are all LSI. 16 x Toshiba drives are in a separate enclosure and connect through the IBM expander to an LSI 9207-8e (2308 chipset). The other 12 drives (including one Toshiba hotspare) are inside the server's 2U case. 8 of them connect direct to an LSI 9201-8i, and the other 4 connect direct to an LSI 9201-4i (both 2008 chipset).

I did moot getting rid of the 9201-4i and running all 12 internal drives from the 9201-8i via another expander, but with the case being 2U I couldn't easily fit another IBM without cutting more holes in the case. I could do it if I got that half-height Intel expander, but I'm not sure it's worth it; the only reason I'm considering it at all is that the 9201-4i is running in a 4x PCIe 1.1 slot which is slow as dirt. But because it only has three active drives on it, I think it just about has sufficient bandwidth to do its job. But I may still splash out £65 and wait a couple of weeks to get an Intel from China, to test both against the IBM and possibly so I can stop using my PCIe 1.1 slot.

Before then, I'm tempted to go back and do some fresh benchmarks with the IBM, and in particular try 2 x IBM expanders for the 16 external drives and see if I can then run on 634A without any downsides. I'm still bemused as to why I didn't try that, it seems so obvious.. o_O
 
Last edited:

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
When I replaced the expander with one that I suspect does have later firmware (not been able to confirm version)
By the way, if you're still having trouble identifying the FW on this card, here's two ways:
  • When booted into a Linux install (eg for FW upgrading) just run lsscsi, which will list all connected SAS/SATA devices, and list the FW version alongside the name of the IBM Expander.
  • In FreeNAS, you can install and use the LSI tool sas2ircu. EDIT: Looks like it's been included in FreeNAS since 9.3. So you should already have it.
    • Run: sas2ircu LIST to list the installed LSI adapters
    • Then: sas2ircu X DISPLAY # where X is the number (as shown by LIST) of the adapter to which the expander is cabled.
    • Look for output like the following:
      Manufacturer : IBM-ESXS
      Model Number : SAS EXP BP
      Firmware Revision : 510A
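To show what the lsscsi route looks like: the expander appears as an enclosure device, with the FW version in the revision column - illustrative output only, the SCSI address will differ on your system:

  lsscsi
  [0:0:16:0]   enclosu IBM-ESXS SAS EXP BP        634A  -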
 
Last edited:
  • Like
Reactions: iamtelephone

TrevInCarlton

New Member
Sep 19, 2018
17
4
3
Nottingham, UK
By the way, if you're still having trouble identifying the FW on this card, here's two ways:
  • When booted into a Linux install (eg for FW upgrading) just run lsscsi, which will list all connected SAS/SATA devices, and list the FW version alongside the name of the IBM Expander.
  • In FreeNAS, you can install and use the LSI tool sas2ircu. EDIT: Looks like it's been included in FreeNAS since 9.3. So you should already have it.
    • Run: sas2ircu LIST to list the installed LSI adapters
    • Then: sas2ircu X DISPLAY # where X is the number (as shown by LIST) of the adapter to which the expander is cabled.
    • Look for output like the following:
      Manufacturer : IBM-ESXS
      Model Number : SAS EXP BP
      Firmware Revision : 510A
Thanks for that. I did identify the FW with Linux but it is nice to know some of this is built into Freenas.
 

TrevInCarlton

New Member
Sep 19, 2018
17
4
3
Nottingham, UK
Ah very nice. I just tried BlackMagic from MacOS 10.13 to my server and got 600MB/s writes and 900MB/s reads over a single 10Gbe Intel X520 (patched to enable use of SmallTree drivers; I imagine you use something similar? I'm not aware of many 10Gbe NICs with MacOS support.)

I tend to do my benchmarking using iozone. Peak performance using iozone over the network from my MacOS workstation is about 800MB/s writes and 935MB/s reads. In Windows 10, using 2 x 10Gbe X520 NICs with SMB 3.1.1's MultiChannel, I got up to 2000MB/s writes and 2122MB/s reads. All those tests were connecting to Samba 4.7.4 running on my Solaris server. However in both cases it was the network that was the bottleneck.

When I do disk/filesystem benchmarks, ie comparing expander cabling config, I run iozone directly on the server. I don't have the figures handy for when I did that back in April/May last year, but I believe with both expander lanes working I peaked around 3GB/s writes and about 2.5GB/s reads.

Not that I've since then achieved anything like that in real-life, day-to-day performance, but it gives a top-line for the available performance, and for comparing different configs. When I do a copy within the pool, eg rsync -avhP /data/dataset1/ /data/dataset2/ I tend to average around 450MB/s. zpool scrub averages around 1.5GB/s I seem to recall.

My ZFS pool is 27 x 2TB SATA3 7200 RPM drives, configured as 3 x 9-drive RAIDZ2 VDEVs. So a total of 21 data drives, 6 parity drives. The drives are: 16 x Toshiba DT01ACA200 and 11 x Seagate ST2000DM001. HBAs are all LSI. 16 x Toshiba drives are in a separate enclosure and connect through the IBM expander to an LSI 9207-8e (2308 chipset). The other 12 drives (including one Toshiba hotspare) are inside the server's 2U case. 8 of them connect direct to an LSI 9201-8i, and the other 4 connect direct to an LSI 9201-4i (both 2008 chipset).

I did moot getting rid of the 9201-4i and running all 12 internal drives from the 9201-8i via another expander, but with the case being 2U I couldn't easily fit another IBM without cutting more holes in the case. I could do it if I got that half-height Intel expander, but I'm not sure it's worth it; the only reason I'm considering it at all is that the 9201-4i is running in a 4x PCIe 1.1 slot which is slow as dirt. But because it only has three active drives on it, I think it just about has sufficient bandwidth to do its job. But I may still splash out £65 and wait a couple of weeks to get an Intel from China, to test both against the IBM and possibly so I can stop using my PCIe 1.1 slot.

Before then, I'm tempted to go back and do some fresh benchmarks with the IBM, and in particular try 2 x IBM expanders for the 16 external drives and see if I can then run on 634A without any downsides. I'm still bemused as to why I didn't try that, it seems so obvious.. o_O
The following tests were carried out copying folders of around 1.5TB. Each folder contains sub-folders, each holding a handful of small files along with 1 large .MKV file of about 8GB. All file copies were carried out using the FreeNAS CLI. All tests are in a real-world situation and relative to my own system, comprising a Supermicro X11SSM-F-O with 36GB ECC RAM and a ZFS pool of 16 x Seagate and WD Green drives configured as 2 x 8-drive RAIDZ1 vdevs. These drives connect through the LSI-9207, IBM-M1015, and 2 x IBM Expanders. I tried virtually every combination to try to establish if any particular arrangement was better or worse than any other, and to identify whether the 6xxx expander firmware is responsible for a drop in performance.
TEST 1: 1 lead from 9207 -> EXPANDER -> all 16 drives (vdev 1 & 2)
TEST 2: 2 leads from 9207 -> EXPANDER -> all 16 drives (vdev 1 & 2)
TEST 3: 1 lead from 9207 -> EXPANDER -> 8 drives (vdev 1), and 1 lead from 9207 -> 2nd EXPANDER -> 8 drives (vdev 2)
TEST 4: 1 lead from 9207 -> EXPANDER -> 8 drives (vdev 1), and IBM-M1015 -> 8 drives (vdev 2)
TEST 5: 2 leads from 9207 -> EXPANDER -> 8 drives (vdev 1), and IBM-M1015 -> 8 drives (vdev 2)
TEST 6: 1 lead from 9207 -> 4 drives + 1 lead from 9207 -> EXPANDER -> 4 drives (vdev 1), and IBM-M1015 -> 8 drives (vdev 2)
TEST 7: IBM-M1015 -> 8 drives (vdev 2), and 8 internal SATA ports (vdev 1)
TEST 8: 2 leads from IBM-M1015 -> all 16 drives (vdev 1 & 2)

The first test (TEST 1) resulted in a total throughput of 8.32Gb/s and the last (TEST 8) in a total throughput of 7.8Gb/s. I noticed a slight drop in performance after each test, which I put down to the pool filling up. TEST 6 should have been a test without any expander, but I am missing an 8088 to 4 x SATA breakout lead; this is on order, so I might re-run this test when the lead turns up from China. TEST 7 was interesting: I did think that this would yield the best results, but the reporting graphs within FreeNAS showed a very up-and-down data transfer that overall was no better than using the expanders. I have finished up using the TEST 5 configuration as this gives me additional drive capacity.

Having found what I feel is the best config internally on the server, I have carried out further tests from my 2012 Mac Mini. I have a Thunderbolt 2 cable which connects to an AKiTiO Thunder2 PCIe enclosure (PCIe 2.0 x4 lane), with a SolarFlare SFN6122F dual 10Gbit SFP+ adapter in the Thunder2 box. This connects directly to a Mellanox ConnectX adapter in the server. In the Mac Mini I have a 500GB Crucial MX500 SSD; for the price I have found this to be a very good SSD, with the added advantage of power-fail protection. I copied a folder containing 4 files totalling 58.24GB to each vdev on the server, and also from each vdev back to the MX500 SSD. While all the internal copy tests on the server yielded very similar results, the same could not be said copying over the 10Gb network link. Copying to vdev 1, connected to the 9207 via the expander, took 2min 18sec, and it was the same copying back to the Mac. Copying to vdev 2, connected directly to the IBM-M1015, took 1min 54sec, 24 seconds quicker than the route through the expander. Copying from vdev 2 to the Mac took 2min 13sec, just 5 seconds quicker. BlackMagic speed tests were identical through both routes to the drives, yielding 575MB/s write and 840MB/s read, so no degraded performance through the expander in this test.
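Converting those copy times into rough throughput makes the gap easier to see - a quick back-of-the-envelope check, treating the 58.24GB as decimal gigabytes:

  echo "58.24 * 1000 / 138" | bc    # 2min 18sec via the expander      -> ~422 MB/s
  echo "58.24 * 1000 / 114" | bc    # 1min 54sec direct to the M1015   -> ~510 MB/s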

It is worth noting that the IBM-M1015 is older than the LSI-9207: the IBM is based on PCIe 2.0 x8, whereas the 9207 is based on PCIe 3.0 x8.

Based on all my tests I would have to conclude that, in my particular setup, the expander FW 634A does not result in any drop in performance. I hope my tests help anyone trying to work out which way to go.
 

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
Thanks for the results, @TrevInCarlton

My first question is: if you only have 16 drives, and you have 2 x HBAs with 2 x SFF-8087 ports each.. why do you need an expander at all? Are you planning to add more drives in the near future?

If you're not, I'd have to question even using the expander. Just connect the 16 drives to the two HBAs. I wouldn't add an extra component and two extra cables unless it was going to be required imminently.

Based on all my tests I would have to conclude that, in my particular setup, the expander FW 634A does not result in any drop in performance. I hope my tests help anyone trying to work out which way to go.
Yeah definitely. What I think this shows is that in this test you were bottlenecked by drive performance well before the expander or the HBAs were a factor. In fact, strangely, you got higher performance from Test 1, where all 16 drives are connected via a single cable to the 9207, than in the later tests where you spread the load across three or four ports on two HBAs. So I suspect the tests weren't entirely comparable.

I'm a bit confused about your pool layout. The first few times I read your post I got the impression you had two pools, each pool containing 1 VDEV of 8 drives? Rather than a single pool with 2 VDEVs and 16 drives? And that your test was copying data from one of the pools to the other? Hence why the destination pool was filling up over the course of the tests?

If that's not the case - if you do only have one pool - then you must have been copying data from the pool, to another dataset on the same pool, and then not deleting that new dataset afterwards? Is that right?

Either way, it seems that in your tests you had drives both reading and writing at the same time. If two pools, then one set of drives was always writing, the other reading. If one pool, then all drives were reading and writing at the same time. In the latter case, this will be significantly slower than a read-only or write-only test, because HDDs are not very good at reading and writing at the same time (limited IOPS).

I did my benchmarking using a synthetic benchmark which first wrote, then read, then wrote, then read, etc. Therefore I got figures for maximum possible write-only performance and maximum possible read-only performance, and my benchmarks were most indicative of a situation where either multiple clients are all writing data to the NAS, or all reading from it, rather than simultaneously reading and writing, or copying data around within the pool itself.

In general I would strongly suggest starting with synthetic benchmarks when comparing configurations, because they're much easier to repeat exactly. I did my tests using iozone, ensuring the pool was always identical before and after the test because iozone deletes all its test data at the end of the test. So this rules out what you said about the pool getting increasingly more full as you progressed; which as you say, can affect performance. It also rules out variances in the data being copied, if you're copying different sets of data in each test.

Once I had those comparative figures, I'd then try real-life tests. This can then give a clue as to bottlenecks, eg if the real-life test performs significantly worse in certain operations, it can point to problems or optimisation opportunities. Ultimately what we're looking for is "is setup X faster than setup Y"; I achieve that first by seeing if setup X is faster in principle, ie in a synthetic test ruling out as many other factors as possible. If it isn't, it can be immediately discarded. If it is, then it can either just be implemented (if there's no downsides), or else checked again in a real-life test and then a decision made based on the pros and cons.

There's a few ways in which your tests might not give you the whole story. If you were reading-from and writing-to the same pool at the same time, then this is much slower than a read-only or write-only test. You may have got a fraction of the performance you would have seen in a test that purely wrote data, then purely read that data. Therefore maybe a read-only test (representative of sending data over the network to one or more clients) would show that actually the HBA configuration (cabling etc) is a bottleneck when the drives only have to read data. And ditto a write-only test (representative of eg one or more network clients backing up to the NAS.)

Now maybe this sort of operation - copying from one place in the pool to another - is representative of what you will do day-to-day. Fair enough. Nonetheless it does mean you don't know for certain if you'll have any bottleneck from a particular HBA/cabling configuration. Maybe you don't ever expect that to be a real-life factor - eg if you won't have multiple clients reading from the NAS over 10Gbe, it probably won't show up. But even without multiple high-speed LAN clients, there could be some other cases where it might show: a ZFS scrub, for example, is mostly reads, and will likely perform quite differently to your read-and-write tests. I'd certainly be amazed if a ZFS scrub done in the Test 1 config performed the same as in any of the others.

A single HBA cable for 16 drives should definitely be a bottleneck unless the drives are really slow. One 6Gb/s port = 768MB/s maximum bandwidth; 768MB/s bandwidth divided by 16 drives = only 48MB/s per drive.

Anyway, with only 16 drives total and two HBAs, you don't even need to use an expander, so you're definitely fine with FW 634A for now. Even if you do end up with 16 drives on the expander to the 9207 over two cables (which I found showed lower max performance in 634A than 510A), if you're confident you'll primarily be copying within the pool, or only to a single 10Gbe client, again you'll likely be fine.

As a side note, I must say I was surprised by your figure of 8Gb/s from a 1-cable test (Test 1), as this is greater than the 6Gb/s maximum of a single port. If you have compression enabled on your pool, this could account for it. 8Gb/s of actual data could compress down to <6Gb/s of read/written data. However the big MKV files you mention are definitely incompressible, and you implied that these files made up the bulk of the data? So that doesn't seem to explain it either. Nor do you have enough RAM for caching to be a significant factor when copying 1.5TB of data. I'm a bit confused by that.
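If compression is a suspect, ZFS will tell you directly what it is doing - something like the below, where the pool name is just a placeholder for whatever yours is called:

  # 'compression' shows whether it's enabled; 'compressratio' shows what it's actually achieving
  zfs get compression,compressratio tank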

It is worth noting that the IBM-M1015 is older than the LSI-9207: the IBM is based on PCIe 2.0 x8, whereas the 9207 is based on PCIe 3.0 x8.
Yes, and I see your motherboard does have PCIe 3.0 slots, unlike mine. However I can't see that making any difference in your setup. PCIe 2.0 provides around 500MB/s of bandwidth per lane in each direction, so an 8-lane card has roughly 4,000MB/s available each way; PCIe 3.0 roughly doubles that, giving a 3.0 x8 card close to 8,000MB/s per direction. But your setup won't get close to saturating even a PCIe 2.0 connection. Even if you eventually had 16 drives on a single card, that still works out at around 250MB/s of slot bandwidth per drive, which is more than these drives can sustain. If you ever do upgrade to 24 drives, put 16 on the 9207 and 8 on the IBM M1015 and that'll be fine.

I don't even have any 3.0 slots, but again it doesn't affect me much. If I wanted to run all 28 of my drives from a single card, I would definitely need 3.0. But my setup of 12 internal and 16 external drives doesn't lend itself to that. So I'm using multiple HBAs. I do have 16 drives on one card and may be approaching the limits of PCIe 2.0 bandwidth, but I don't believe I'm being bottlenecked by it. I think last year I did benchmark adding another HBA (temporarily removing my 10GBe NIC), and found it didn't increase overall pool performance. I'm certainly going to benchmark that again soon to be sure, not that there's much I can do about it without huge cost (getting PCIe 3.0 would require replacing the server, or at least its motherboard + CPUs.)
 
Last edited:

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
I plan to re-do all my benchmarks in the next day or two. Particularly as I inevitably crumbled to temptation and bought that Intel expander :) I found someone in the UK selling it a bit cheaper, and they took an offer for another £5 off. It arrived today, so I plan to try it out soon.

I'll try both synthetic and real-life data tests, comparing variations on:
  1. IBM expander with 510A FW
  2. IBM expander with 634A FW
  3. Two IBM expanders with 634A FW
  4. Intel expander
  5. No expanders
In particular to confirm I can reproduce the bottleneck I saw in the 634A FW before, confirm it's not a problem if two expanders are used, and confirm that the Intel expander is a suitable substitute.
 
  • Like
Reactions: epicurean

Stefan75

Member
Jan 22, 2018
96
10
8
48
Switzerland
Wanted to try sg_write_buffer for Windows, but sg_scan is not detecting the device, or not detecting it properly:
"sg_scan -s"
SCSI5:1,254,0 claimed=0 pdt=1fh dubious RAID DummyDevice 0001
Just tried sg_scan, it can see my 2 expanders...
SCSI0:0,51,0 claimed=0 pdt=dh IBM-ESXS SAS EXP BP 634A
SCSI0:0,52,0 claimed=0 pdt=dh IBM-ESXS SAS EXP BP 634A
 

TrevInCarlton

New Member
Sep 19, 2018
17
4
3
Nottingham, UK
Thanks for the results, @TrevInCarlton

My first question is: if you only have 16 drives, and you have 2 x HBAs with 2 x SFF-8087 ports each.. why do you need an expander at all? Are you planning to add more drives in the near future?

If you're not, I'd have to question even using the expander. Just connect the 16 drives to the two HBAs. I wouldn't add an extra component and two extra cables unless it was going to be required imminently.

I'm a bit confused about your pool layout. The first few times I read your post I got the impression you had two pools, each pool containing 1 VDEV of 8 drives? Rather than a single pool with 2 VDEVs and 16 drives? And that your test was copying data from one of the pools to the other? Hence why the destination pool was filling up over the course of the tests?

If that's not the case - if you do only have one pool - then you must have been copying data from the pool, to another dataset on the same pool, and then not deleting that new dataset afterwards? Is that right?

A single HBA cable for 16 drives should definitely be a bottleneck unless the drives are really slow. One 6Gb/s port = 768MB/s maximum bandwidth; 768MB/s bandwidth divided by 16 drives = only 48MB/s per drive.

Anyway, with only 16 drives total and two HBAs, you don't even need to use an expander, so you're definitely fine with FW 634A for now. Even if you do end up with 16 drives on the expander to the 9207 over two cables (which I found showed lower max performance in 634A than 510A), if you're confident you'll primarily be copying within the pool, or only to a single 10Gbe client, again you'll likely be fine.

As a side note, I must say I was surprised by your figure of 8Gb/s from a 1-cable test (Test 1), as this is greater than the 6Gb/s maximum of a single port. If you have compression enabled on your pool, this could account for it. 8Gb/s of actual data could compress down to <6Gb/s of read/written data. However the big MKV files you mention are definitely incompressible, and you implied that these files made up the bulk of the data? So that doesn't seem to explain it either. Nor do you have enough RAM for caching to be a significant factor when copying 1.5TB of data. I'm a bit confused by that.
Thanks for all your valued input. My setup consists of 2 x 8-bay Silverstone cases plus the main case, a Fractal Node 804. This gives me a total capacity of 26 x 3.5" and 10 x 2.5" drives. I am using 16 x 4TB and 6 x 8TB (yet to be installed), plus 1 SSD for FreeNAS jails. I chose to go with 2 separate volumes because I am only using RAIDZ1, and that is by far safer in the event of vdev failure. I do plan to change this to RAIDZ2 once my system is finished, with all the files in place and enough backups to do it; that puts my drive count up to 25. I could get another HBA or SATA adapter, but I want to leave one spare slot in the motherboard. Having gone to great lengths to get the expanders from China - dealing with the ones that did not work, then buying another which also did not work, and finally updating the firmware to get it working (510A just does not work on my setup) - I feel that I want to keep one expander connected. That gives me a total drive capacity of 36 drives, or 32 using 2 leads to the expander, and I may want to plug in a few drives to allow for local backups. If I did do away with the expander it would mean the 9-drive vdev would have to be split across the HBA and the motherboard, and I am not sure that that in itself would not impact performance.

I will run more "clinical" benchmark tests when I have the time, but at the moment I am trying to copy 200k+ small files, which is a slow process unless using SSDs (future expansion).
 

gregsachs

Active Member
Aug 14, 2018
589
204
43
If you don't need SAS2/SAS3, look for the Quantum JBOD boxes I linked elsewhere - $99 shipped, complete, on eBay right now. 12 x 3.5", dual PSU, in a standard Supermicro case.
 

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
Hi all

I've been spending the last couple of days doing tests and benchmarks. I'm going to write it all up when I have completed all the tests I want to do. But I thought I'd post immediately regarding an interesting thing I just discovered:

The drive-detection problem with 510A is definitely related to the drives it connects to!

Up until now I had only used the IBM Expander with my external enclosure, containing 16 x Toshiba DT01ACA200 (2TB SATA3). Or to be precise, a mixture of those and Hitachi HDS72302, which are basically the same drive re-branded.

With that setup I consistently see only two drives detected unless I do the staggered power-on, or re-connect the cables after the expander has initialised. The two drives it does see are always connected to expander port BP 1.

Today for the first time I tried using the 510A with the 12 drives that are internal to my server chassis. This is one more Toshiba DT01ACA200, and 11 x Seagate ST2000DM001 (also 2TB SATA3).

In this setup, booting normally showed that all 11 Seagate drives are always detected, but the Toshiba is never detected unless connected to lane 1 or 2 of expander port BP1 (confirmed by trying the Toshiba in each of the four ports of the backplane connected to port BP1). This matches the finding with my external enclosure, where only two drives detect, and always on expander port BP1.

So this problem with 510A seemingly depends on something specific to different drive models. I can't tell if the backplane is also a factor, as I use the same backplanes in both internal and external enclosures (Chenbro 80H10321516A1, 1 x SFF-8087 to 4 x SAS/SATA). All my drives are 2TB, but @TrevInCarlton sees the same problem with 4 and 6TB drives.

EDIT: I've modified the following based on the next post from @BLinux . I've removed my theories about the connection between this issue and the problem he has, as they're now clearly unrelated.

I no longer really have any idea what might be behind this issue, besides it being related to drive model. My original theory of it being timing-dependent now has no real evidence, as it really appears to depend only on the combination of certain drives with certain lanes: my Seagates work in any lane, my Toshibas only work in lanes 1 & 2, and never in any other. But I suppose theories don't really matter anyway: the user just needs to test it with their drives and hope for the best. Or just upgrade to 634A, which in most cases will provide enough performance.

This means that I could now use the IBM with FW 510A without staggered power-on, so long as it's for my 12 internal drives, and as long as I never want more than two Toshibas amongst those 12 drives. I do plan to start using an expander for these drives, as it means I can drop my 9201-4i, which sits in a PCIe 1.1 x4 slot with only 25% of the bandwidth of my PCIe 2.0 x8 slots and which my tests have shown can be a bottleneck even with only three active drives on it.

I will post again in the next day or two with benchmark results (510A vs 634A), some details on the Intel Expander, and further thoughts on the IBM in real-life usage.
 
Last edited:
  • Like
Reactions: iamtelephone

BLinux

cat lover server enthusiast
Jul 7, 2016
2,682
1,089
113
artofserver.com
@TheBloke Hate to throw a wrench into your thinking, but like you, I spent some time this weekend with this IBM expander - I now have 3 of them - and got some more interesting news. First, hardware setup:

Dell H200 flashed to IT mode P20 -> 2 x SFF-8087 -> IBM SAS expander inputs -> 4 x SFF-8087-to-4x-SATA -> Supermicro BPN-SAS-836TQ backplane.

HDD models are HUS726040AL5210 (HGST 4TB SAS). Note: these are NOT SATA.

I have 3 IBM expanders, and I had each with different firmwares: 510A, 602A, 634A.

With 510A and 602A, I would randomly not see 1 out of 4 SAS lanes on the BP2 port. It DID NOT matter whether I connected the drives during boot-up or after the IBM expander was powered on, so this is significantly different behavior from what @TheBloke was seeing. Every reboot was a gamble, and during the testing over the last 2 days there were actually more failures than successes. This resulted in only 15 out of 16 HDDs being detected during most boot-ups. It didn't matter whether I left the backplane disconnected and connected it to the expander after everything powered on or not; sometimes it would work, but most of the time only 3 out of 4 SAS lanes on the BP2 port worked. Also of interest is that it was always either the FIRST or FOURTH SAS lane in the BP2 port that had problems, NEVER the second or third.

With 634A firmware, the above problem went away completely (thanks to @TheBloke for the detailed instructions), so there does seem to have been some fix in the updated firmware for my particular issue. When I updated the 2 other expanders from 510A/602A to 634A, they both started working properly too. So, my conclusion is that 634A is definitely the more "functional" firmware.

Now, on the question about performance degradation with 634A: in my setup (CentOS 7.5 with mdadm RAID-0 or ZFS raidz2), I did not see any significant performance difference between (A) 2x LSI SAS2008 controllers and (B) 1x LSI SAS2008 + IBM Expander w/ 634A firmware:

16xHDD RAID-0:
- seq. reads was (A=2139MB/s) vs (B=2059MB/s)
- seq. writes was (A=1676MB/s) vs (B=1676MB/s)
2x8xHDD ZFS 2xRAIDZ2:
- seq. reads was (A=1604MB/s) vs (B=1670MB/s)
- seq. writes was (A=712MB/s) vs (B=704MB/s)

So, I didn't compare 510A vs 634A, but comparing 634A against 2x LSI SAS2008 HBAs shows no significant performance loss, so I can't imagine that 510A would be faster than 2x LSI SAS2008 HBAs. Testing was done with the same exact iozone test for both A and B. In the end, I'm actually very happy right now because the 634A firmware fixes my problem and shows no performance loss. I have not tested using one of the INPUT ports as a BP port to HDDs/backplanes, so I'm not sure about that yet.
 
  • Like
Reactions: iamtelephone

TheBloke

Active Member
Feb 23, 2017
200
40
28
44
Brighton, UK
Thanks for the details, @BLinux .

My first thought when I read your post was "ahhh, you don't have SATA drives!" - meaning I'd almost expect you not to see the issues that others have, and instead to have different ones; based on my experience of expanders in general, SAS and SATA issues often seem to be quite separate.

But I just went back to your earlier post when you first mentioned your 1-dropped-lane problem, and there it says you had tried both SAS and SATA drives, and got the same issue in both cases?

So now I don't know. But yes, it's clearly a different problem than the one that I and @TrevInCarlton report, and my earlier theories about timing, and maybe your drives being 'right on the edge', are likely all bunk. If your problem is SAS-specific then that makes the most sense. But regardless, if it happens with SATA drives too, it's obviously a different issue. Anyway, I'm glad 634A resolves it for you.

I'll edit my post before this one to remove the references to your dropped lane as any evidence of what I'm seeing.

Thanks too for the benchmarks. This touches on something I'm going to say when I post my full benchmarks: that although I can definitely see a difference between 510A and 634A - confirmed in max iozone throughput and clearly visible in per-drive iostat output - it's at the upper end of the performance range, meaning many people won't even notice the difference. Including myself, now that my pool is 50% full, compared to 10% when I tested last year. I now see it in some tests but not others, where last year it was consistent across all tests. It's quite likely I wouldn't be able to notice it in normal day-to-day usage.

So if anyone is holding off buying the IBM for fear of being stuck between the staggered power-on issue and lower performance, I think it's safe to say that it's quite possible you will be affected by neither. The staggered power-on may not even affect you, depending on your drives, and even if it - or other issues - do affect you, it's quite likely that running 16 x HDDs from the IBM won't show a noticeable day-to-day performance drop, so you can just upgrade to 634A regardless.

However if some SSDs are thrown into the mix, I would expect it to be much more noticeable. Which gives me an idea for another test I need to do. I was planning to add another SSD to my server anyway, to be used as a dedicated L2ARC (I previously carved off a bit of my 1.2TB Warp Drive for that, but plan to have it be separate in future.) I have a 500GB SanDisk SSD going spare. It's only a mid-level consumer drive, but it'll turn out 450+MB/s in sequential tests. I'm going to add that to the server using my one free lane (free if my hot spare is removed anyway) and do benchmarks comparing 510A, 634A and the Intel expander with that in place. Most likely I'll make a new pool out of it, then concurrently test read/writes to both my normal 27 x HDD pool, and the 1 x SSD pool, with the SSD being on one lane of the expander.

I'll also see if I can do any tests with another OS, so I can 100% rule out issues I see as being Solaris specific. Unfortunately my pool can't be imported on any other OS as Solaris ZFS and OpenZFS have diverged, so I could only do read-only tests. But that might be enough to show the difference, and especially with an SSD involved. I'll try and concoct something that uses dd to read from all the raw devices at once.
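Something along these lines is what I have in mind for the dd test - only a sketch, and the device paths are placeholders (on Solaris they'd be /dev/rdsk/cXtYdZs0 style names):

  # start a big sequential read from each raw device in parallel,
  # then watch per-drive throughput with iostat from another shell
  for dev in /dev/rdsk/c2t0d0s0 /dev/rdsk/c2t1d0s0 /dev/rdsk/c2t2d0s0; do
    dd if=$dev of=/dev/null bs=1024k count=10240 &
  done
  wait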

I'll have concrete figures and thoughts on them in the next 2 days or so.
 
Last edited:

BLinux

cat lover server enthusiast
Jul 7, 2016
2,682
1,089
113
artofserver.com
Thanks for the details, @BLinux .

My first thought when I read your post was "ahhh, you don't have SATA drives!" - meaning I'd almost expect you not to see the issues that others have, and instead to have different ones; based on my experience of expanders in general, SAS and SATA issues often seem to be quite separate.

But I just went back to your earlier post when you first mentioned your 1-dropped-lane problem, and there it says you had tried both SAS and SATA drives, and got the same issue in both cases?
Just to add clarification, your recollection is true. When I got my 1st IBM expander, the lab machine where I've been doing all this testing did have a set of 500GB SATA drives. By the time I got my 2nd and 3rd IBM expander, I had switched to a set of 4TB SAS drives because I was troubleshooting my bht script to handle SAS drives and so the testing from the last couple of days has been with SAS drives only. It might do me some good to swap back to the SATA drives and see if there's a difference.
 

gerome

New Member
Dec 29, 2018
4
2
3
I have bought the IBM ServeRAID SAS Expander Adapter 46M0997 for 16€ from a Chinese shop. Very cheap now.

Motherboard: Asus P9D-E4L
HDD: 8 x HGST Deskstar NAS 4TB, SATA 6Gb/s (Kit Model H3IKNAS40003272S, Drive Model: HDN724040AL)
Controller A: Fujitsu D2616, SAS2108 (similar to LSI MegaRAID 9260-8i)
Controller B: Avago MegaRAID SAS 9361-8i

When I connected the disks via the expander, I had the detection issue: only two drives were shown. Firmware was 510A. Tested with both controllers.

Flashed as explained by TheBloke (many thanks for all your work). Works perfectly now with the 634A Firmware.

I used an existing Ubuntu 18.04.5 install, with lsscsi and sg3-utils as provided by the default Ubuntu repos.

With the Fujitsu controller I had no luck identifying the expander. lsscsi only showed "disk" entries, no "enclosure" entries - the expander was not listed. lsscsi -H listed the controller as a SCSI host.

With the 9361-8i it worked. The expander was shown and I could identify the /dev/sgX device.
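For anyone else hunting for the right /dev/sgX, lsscsi -g prints the sg node next to each entry, so the expander shows up something like this (illustrative output - the address and node number will differ):

  lsscsi -g
  [0:0:51:0]   enclosu IBM-ESXS SAS EXP BP        634A  -          /dev/sg3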