Tesla P100 x 8 Linpack testing on SYS-4028GR-TXRT


Lukas Goe

New Member
Jan 15, 2018

I found your result here during my research and I am very interested in how exactly you achieved it. I have similar hardware, but my Gflops don't even get close to yours. I would be very grateful if you could give me some additional information or maybe even post your config files (HPL.dat) here.

Which CUDA-Linpack version are you using? The only one I found and use seems rather old: hpl-2.0_FERMI_v15.

I work on a cluster with 7 GPU nodes; each node has the following hardware:

2x Intel Xeon E5-2640 v4
8x DDR4-2400 8 GB Memory
Intel X10DGQ Board
4x Tesla P100 16GB HBM2

CUDA 9.0
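
For context on the HPL.dat question, here is a rough sketch of the usual rule of thumb for picking the problem size N: fill about 80% of the available memory with the N x N matrix of 8-byte doubles, then round N down to a multiple of the block size NB. The 80% fraction, NB = 256, and sizing against host memory (rather than GPU memory) are assumptions for illustration, not values taken from any working config, and the 8 GB modules are treated as 8 GiB.

```python
import math

def hpl_problem_size(total_mem_bytes, fraction=0.8, nb=256):
    """Largest N (multiple of nb) whose N x N double matrix fits in
    `fraction` of the given memory. Rule of thumb only."""
    n = int(math.sqrt(fraction * total_mem_bytes / 8))  # 8 bytes per double
    return n - n % nb                                   # round down to a multiple of NB

NODES = 7
MEM_PER_NODE = 8 * 8 * 1024**3      # 8 DIMMs x 8 GiB (assumed GiB)
TOTAL_MEM = NODES * MEM_PER_NODE

n = hpl_problem_size(TOTAL_MEM)
print(f"Suggested N for {NODES} nodes: {n}")
print(f"Matrix size: {n * n * 8 / 1024**3:.1f} GiB of {TOTAL_MEM / 1024**3:.0f} GiB")
```

If N in HPL.dat is far below a value like this, the run may simply be too small to keep the GPUs busy, although that alone would not explain 0% utilization.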

The best I have achieved so far is roughly 3500 Gflops with all 28 GPUs. I think the benchmark isn't using the GPUs at all, because nvidia-smi shows barely any usage (~45 W/300 W, 0% GPU-Util, ~2400 MiB memory), and as far as I know those 14 Xeons should be able to get close to 3500 Gflops on their own. There are no warnings or errors, and every run ends with PASSED.
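
To put the 3500 Gflops in perspective, here is a back-of-the-envelope estimate of the theoretical FP64 peak. It is only a sketch under stated assumptions: 10 cores at 2.4 GHz base clock and 16 double-precision FLOPs per cycle per core (AVX2 with two FMA units) for each E5-2640 v4, and about 4.7 Tflops per GPU, which is the spec figure for the PCIe P100 (the SXM2 variant is listed at about 5.3 Tflops). Real AVX clocks and HPL efficiency will be lower.

```python
# Rough theoretical FP64 peak for the cluster described above.
# All per-device figures are spec-sheet assumptions, not measurements.
CPU_CORES = 10            # Xeon E5-2640 v4
CPU_BASE_GHZ = 2.4        # base clock; AVX clocks are lower in practice
FLOPS_PER_CYCLE = 16      # AVX2: 2 FMA units x 4 doubles x 2 ops
NUM_CPUS = 14             # 7 nodes x 2 sockets

GPU_PEAK_GFLOPS = 4700.0  # assumed Tesla P100 PCIe FP64 peak
NUM_GPUS = 28             # 7 nodes x 4 GPUs

MEASURED_GFLOPS = 3500.0  # best HPL result reported in this post

def cpu_peak_gflops(cores, ghz, flops_per_cycle):
    """Peak Gflops of one CPU: cores x GHz x FLOPs per cycle."""
    return cores * ghz * flops_per_cycle

cpu_total = NUM_CPUS * cpu_peak_gflops(CPU_CORES, CPU_BASE_GHZ, FLOPS_PER_CYCLE)
gpu_total = NUM_GPUS * GPU_PEAK_GFLOPS

print(f"CPU peak (all sockets): {cpu_total:.0f} Gflops")
print(f"GPU peak (all GPUs):    {gpu_total:.0f} Gflops")
print(f"Measured / CPU peak:    {MEASURED_GFLOPS / cpu_total:.1%}")
print(f"Measured / GPU peak:    {MEASURED_GFLOPS / gpu_total:.1%}")
```

Under these assumptions the measured result is about 65% of the CPU-only peak and under 3% of the GPU peak, which is consistent with the suspicion that the GPUs are idle.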

I would appreciate any advice.


Staff member
Dec 21, 2010
Hey @dhenzjhen, on all of the SXM2 servers, the V100 needs a different tray because NVLink is 300 GB/s instead of 80 GB/s on the P100 variants, correct?

Can the P100 GPUs be used in the V100 300 GB/s tray?