QPI vs PCIe lanes in a multi-processor system


AllenAsm

Member
Jul 30, 2018
I have some older servers that I've added some NVMe PCIe cards to. The systems are X9DR3(i)-LN4F+ boards updated to the latest 3.4 BIOS/firmware. Currently I have an x8 PCIe 3.0 card with two 1TB NVMe drives running at full speed (I benched them at the same time and got 3.5 GB/s from each simultaneously, about 7 GB/s total). I want to use these servers in an S2D cluster, so I need to add another identical card. My question is that the other available PCIe x8 slot is wired to CPU2, whereas the first is wired to CPU1. Is this going to create a massive amount of traffic on the QPI bus during heavy reads/writes? Should I move other things around so both cards are married to the CPU1 PCIe lanes?
 

abq

Active Member
May 23, 2015
@AllenAsm, did you get any input or advice on keeping all traffic on CPU1 versus going across the Intel QPI bus between the two CPUs? My impression is that it's usually not a concern unless you have really high data rate cards (Optane, 100Gb Ethernet, etc.) and high aggregate bandwidth.

I'm no expert and still learning, but I've read that you 'only' get about 7.2-8.0 GT/s of QPI on E5 v2 generation CPUs, so the fastest 8.0 GT/s QPI works out to roughly 16 GB/s per direction. For comparison, PCIe 3.0 is also 8.0 GT/s, which is about 1 GB/s per lane, or roughly 16 GB/s for an x16 slot (PCIe 3.0 uses 128b/130b encoding, much more efficient than the old 8b/10b). Quad-channel memory on these boards is several times higher again, on the order of 50-60 GB/s for quad-channel DDR3. All numbers are approximate. Hope that helps a little for now, and hope someone knowledgeable can explain when to worry, and when not to worry, about this ;).
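
For anyone who wants to sanity-check that arithmetic, here is a minimal Python sketch of the same back-of-the-envelope math (theoretical peaks only; real-world throughput is lower):

```python
# Back-of-the-envelope link bandwidth math for the numbers above.
# All figures are theoretical peaks; real throughput is lower.

def qpi_gb_per_s(gt_per_s=8.0, bytes_per_transfer=2):
    # QPI carries 2 data bytes per transfer per direction,
    # so 8.0 GT/s is roughly 16 GB/s each way per link.
    return gt_per_s * bytes_per_transfer

def pcie3_lane_gb_per_s(gt_per_s=8.0):
    # PCIe 3.0: 8.0 GT/s per lane with 128b/130b encoding,
    # roughly 0.985 GB/s usable per lane.
    return gt_per_s * (128 / 130) / 8  # bits -> bytes

print(f"QPI 8.0 GT/s      : ~{qpi_gb_per_s():.1f} GB/s per direction")
print(f"PCIe 3.0 x1       : ~{pcie3_lane_gb_per_s():.2f} GB/s")
print(f"PCIe 3.0 x8 slot  : ~{8 * pcie3_lane_gb_per_s():.1f} GB/s")
print(f"PCIe 3.0 x16 slot : ~{16 * pcie3_lane_gb_per_s():.1f} GB/s")
```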
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
Short of not installing a second CPU (or never assigning tasks to the CPU that isn't attached to the NVMe card's PCIe slot), there's no good way to avoid using the QPI bus for data transfer. Using both slots will be fine.

For deep learning with multiple GPUs, it's best to have them all on the same PCIe root complex and avoid QPI transfers. For other use cases I don't see a strong motivation to optimize around avoiding the QPI bus.
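
As an aside, on a Linux host you can see which NUMA node (and therefore which CPU's root complex) a PCIe device sits behind straight from sysfs. This is only a sketch assuming the usual sysfs layout; the OP's Windows/S2D hosts would need a different tool:

```python
# List each PCIe device with the NUMA node it is attached to, so you
# can tell whether traffic from it will have to cross QPI.
from pathlib import Path

def pci_numa_map():
    base = Path("/sys/bus/pci/devices")
    mapping = {}
    if not base.exists():  # not a Linux host (or sysfs not mounted)
        return mapping
    for dev in base.iterdir():
        node_file = dev / "numa_node"
        if node_file.exists():
            # -1 means the kernel could not determine the node
            mapping[dev.name] = int(node_file.read_text().strip())
    return mapping

if __name__ == "__main__":
    for addr, node in sorted(pci_numa_map().items()):
        print(f"{addr}: NUMA node {node}")
```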
 

Dreece

Active Member
Jan 22, 2019
On those older Xeons, latency will really tank; do some comparison benchmarks and you'll see what I mean by 'tank'.

However, for general use it shouldn't be an issue. The bandwidth will still be there, but then again the whole point of NVMe is fast, low-latency storage... it really depends on your use case and whether your workload can tolerate the extra latency.
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
Dreece said:
On those older Xeons, latency will really tank... it really depends on your use case and whether your workload can tolerate the extra latency.
High latency is relative. Are we talking milliseconds, microseconds, nanoseconds?

For a NIC or NVMe you probably won't notice a real-world performance difference if the added QPI latency is under 0.1 ms. Lower is better of course, but some use cases are more or less sensitive than others.
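
To put a rough number on that, queue-depth-1 IOPS is just one divided by per-IO latency, so you can compare a baseline drive latency with various amounts of added latency. The ~90 µs flash NVMe read latency below is an assumption, not a measurement from these boxes, and the cross-socket hop itself is typically well under a microsecond:

```python
# Queue-depth-1 IOPS vs. added per-IO latency.
BASE_US = 90.0  # assumed typical flash NVMe read latency, microseconds

for extra_us in (0.2, 10.0, 100.0):  # ~cross-socket hop, noticeable, 0.1 ms
    total = BASE_US + extra_us
    print(f"+{extra_us:>6.1f} us -> {1e6 / total:>7.0f} IOPS at QD1 "
          f"(baseline {1e6 / BASE_US:.0f})")
```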
 

Dreece

Active Member
Jan 22, 2019
As stated, it's relative to the workload.

However, it wouldn't be prudent of me to presume that, just because this is older tech, the workload never goes above and beyond basic requirements and therefore can't be an issue.

Old Sandy Bridge and Ivy Bridge were a nightmare in the QPI department compared to Haswell and later. I once had detailed files on this; it was specific to large in-memory datasets.

Don't worry, OP, you should be fine.
 

AllenAsm

Member
Jul 30, 2018
I'm putting 40Gb QSFP+ cards in all of these servers, and they also have the NVMe cards, so bus saturation/speed is starting to become a concern.
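
For a rough sense of the headroom, here is a hedged back-of-the-envelope sum of those devices against a single QPI link. The worst-case placement below is hypothetical, and dual-socket E5 boards typically wire two QPI links between the sockets, so real headroom can be higher:

```python
# Worst case: assume the NVMe card and the 40GbE NIC both sit behind the
# other socket from where the workload runs, so everything crosses QPI.
nvme_card_gb_per_s = 7.0       # two 1TB NVMe drives at ~3.5 GB/s each
nic_40gbe_gb_per_s = 40 / 8    # 40 Gb/s line rate, roughly 5 GB/s
qpi_link_gb_per_s = 16.0       # 8.0 GT/s QPI, per link, per direction

cross_socket = nvme_card_gb_per_s + nic_40gbe_gb_per_s
print(f"Potential cross-socket traffic: ~{cross_socket:.1f} GB/s")
print(f"One QPI link, one direction:    ~{qpi_link_gb_per_s:.1f} GB/s")
```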