Slow PCI-E performance on Supermicro X8DAH


azev

Well-Known Member
Jan 18, 2013
Hi Guys,

Spent all weekend working on this new build: a Supermicro X8DAH motherboard with the Intel 5520 chipset and dual IOH-36D PCI-E hubs. The board has the latest BIOS and IPMI firmware, dual X5650 CPUs, 192GB of RAM (16x16GB DIMMs), multiple Emulex 10Gb adapters, and an Adaptec RAID adapter.
Anyway, to cut to the chase: I am getting really bad network performance on this system, which I put down to a PCI-E bottleneck. I have multiple similar setups on X8DT3 motherboards that easily get line-rate performance out of the 10Gb adapters.
The main difference between this new system and the existing ones is that Intel Turbo Boost does not kick in to max CPU speed like it does on the X8DT3 systems (power settings are set to max and the BIOS setup is similar).
The most I can saturate a dual-port card to is about 5Gbps, while on the X8DT3 systems I was able to get close to 18Gbps when saturating both links on the same card.
Since I have multiple dual-port 10Gb NICs in this system, I tried running several parallel iperf tests, with a very disturbing result: it would seem that the whole PCI-E bus is limited to about 5Gbps of throughput.
I've also tried testing with different brands of 10Gb NIC (Emulex, Intel, QLogic) with negligible difference in the results.
Just curious if anyone here has a system with dual Intel IOH-36D PCI-E hubs and could do some testing for comparison. All testing was done on Windows 2012R2 updated to the latest patches, just like the other systems I am running iperf from.
Let me know if you guys have any suggestions on what other tests I should run. I wonder if having the dual IOH-36D PCI-E hubs is truly what is causing this performance issue.
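For reference, the tests are just plain iperf runs, roughly along these lines - the addresses below are placeholders for whatever each port/subnet is actually configured with, and the exact stream count and duration aren't critical:

  On the X8DAH box, one listener bound to each 10Gb port:
    iperf -s -B 10.0.1.1
    iperf -s -B 10.0.2.1
  From the client machines, individually or in parallel:
    iperf -c 10.0.1.1 -P 4 -t 30
    iperf -c 10.0.2.1 -P 4 -t 30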

Thanks
 

PigLover

Moderator
Jan 26, 2011
You may be struggling against classic NUMA issues.

That particular motherboard has some of the PCIe slots connected to one 5520 IOH, which hangs off one of the CPUs. The other slots are connected to the other 5520 IOH on the other CPU. They are all tied together (and to system memory) via the QPI interconnect. This can - in degenerate cases - result in memory transfers with LOTS of extra hops and latency.

Go to SM's site and pull the manual for the X8DAH. Find the block diagram of how everything is tied together. IIRC, PCIe slots 1-3 are on one IOH/CPU and PCIe slots 4-8 are on the other IOH/CPU.
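Windows can usually tell you the same thing without the manual - if the drivers report it, this PowerShell cmdlet (built in to 2012R2) should list each adapter along with the NUMA node it sits on, plus the negotiated PCIe link speed and width, which are worth a sanity check anyway:

  Get-NetAdapterHardwareInfo

Check that the NumaNode column matches where you think each card is, and that the link speed/width columns show what the slot is supposed to give you.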

Now, even with the X5650, QPI speeds should allow transfers of >25GB/s across the CPUs, so the speeds you are seeing are still too low to explain easily. But try this: put all of your NICs on one CPU's PCIe slots and see if anything changes. Even better, put them all on CPU0 and disable CPU1 - guaranteeing that no NUMA issues can exist - and then see what happens.
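You can also take the scheduler out of it without touching hardware by pinning the iperf listeners to the first CPU's logical processors. From cmd, something like this (FFF is a hex mask covering logical processors 0-11, which on most dual 6-core/HT boxes is the first package - the exact numbering can vary):

  start /affinity FFF iperf.exe -s -B 10.0.1.1

That only controls where the process runs, not which IOH the card hangs off, but it rules out cross-node scheduling as the cause.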
 

azev

Well-Known Member
Jan 18, 2013
Good idea - all this time I've been trying to distribute the load evenly between all the different PCI-E cards installed in the system.
Do you mind explaining how to disable CPU1 without removing the CPU from the socket?

Thanks
 

PigLover

Moderator
Jan 26, 2011
I'm actually not sure you can. I was thinking of pulling it when I wrote that. It's possible you may be able to disable it in the BIOS, but I don't know for sure.
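If the BIOS doesn't offer it, a rough software-side approximation (untested on this board) is to cap how many logical processors Windows uses at boot:

  bcdedit /set numproc 12

On most systems the first 12 logical processors all land on the first package, but that's not guaranteed, and it doesn't power down the second IOH or its slots - so it's an approximation, not a substitute for pulling the CPU or moving the cards. Undo it with bcdedit /deletevalue numproc.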
 

azev

Well-Known Member
Jan 18, 2013
So I did a lot more testing once I got home today, and did exactly what you suggested.
I basically have 2 dual-port 10Gb NICs in slots 1 & 3 and the Adaptec RAID card in slot 2.

The result is somewhat promising - I am getting much better performance out of the network cards. However, the PCI-E saturation is even more apparent now. The NICs' 4 ports are configured on 4 different subnets, and during iperf testing I can get 10Gbps of throughput on each port when tested individually. However, if I run the test on 2 ports simultaneously the result is absolute crap - less than 5Gbps total.
The more simultaneous tests I run, the worse the total combined throughput gets. With all 4 ports running iperf, I get a total of only 2Gbps.
Another interesting thing: run iperf on one of the ports to get 10Gbps, then kick off some kind of disk benchmark like CrystalDiskMark or ATTO - you can watch the iperf throughput drop to 2Gbps while the disk test runs. When the disk benchmark finishes, the throughput goes back up to 10Gbps.

This is super weird, not sure what's going on. On other servers with a single IOH I am getting a total throughput of about 25Gbps.
I am wondering why this system has so little PCI-E bandwidth - it doesn't make sense. What is the point of having 2 IOH chips if you can't utilize the extra PCI-E lanes??
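Just to put rough numbers on it (back-of-the-envelope Gen2 math, ignoring protocol overhead):

  8 \times 5\,\text{GT/s} \times \tfrac{8}{10} = 32\,\text{Gbps per direction for a single x8 Gen2 slot}
  2 \times 36\,\text{lanes} \times 4\,\text{Gbps} = 288\,\text{Gbps of raw lane bandwidth across both IOH-36Ds}

So a 5Gbps (or 2Gbps) ceiling across multiple slots is nowhere near any architectural limit - something else has to be throttling the bus.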