CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 /usr/share/doc/nvidia-cuda-toolkit/examples/bin/x86_64/linux/release/p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA RTX PRO 6000 Blackwell Workstation Edition, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 5090, pciBusID: 11, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 5090, pciBusID: 61, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 5090, pciBusID: 71, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA GeForce RTX 5090, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 5, NVIDIA GeForce RTX 5090, pciBusID: 91, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=0 CAN Access Peer Device=4
Device=0 CAN Access Peer Device=5
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=1 CAN Access Peer Device=4
Device=1 CAN Access Peer Device=5
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CAN Access Peer Device=4
Device=2 CAN Access Peer Device=5
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CAN Access Peer Device=4
Device=3 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=0
Device=4 CAN Access Peer Device=1
Device=4 CAN Access Peer Device=2
Device=4 CAN Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=5 CAN Access Peer Device=0
Device=5 CAN Access Peer Device=1
Device=5 CAN Access Peer Device=2
Device=5 CAN Access Peer Device=3
Device=5 CAN Access Peer Device=4
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1 2 3 4 5
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
4 1 1 1 1 1 1
5 1 1 1 1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1 2 3 4 5
0 1496.69 42.63 42.68 42.81 43.21 43.07
1 42.63 1550.15 42.68 42.66 43.14 43.06
2 42.69 42.57 1553.23 42.70 43.10 43.13
3 42.75 42.72 42.66 1553.18 43.00 42.93
4 42.97 42.85 42.89 42.89 1553.23 43.43
5 43.01 42.89 42.91 42.95 43.73 1553.23
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1 2 3 4 5
0 1493.83 56.57 56.55 56.55 55.85 55.86
1 56.54 1537.89 56.55 56.57 55.71 55.63
2 56.58 56.58 1534.87 56.56 55.56 55.85
3 56.55 56.55 56.54 1543.97 55.83 55.82
4 55.54 55.59 55.50 55.49 1537.89 56.55
5 55.60 55.62 55.63 55.63 56.58 1543.97
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1 2 3 4 5
0 1483.79 56.50 56.59 56.77 56.92 57.14
1 56.21 1538.60 56.55 56.54 56.82 56.67
2 56.27 56.47 1539.36 56.72 56.89 57.12
3 56.40 56.58 56.21 1540.12 56.99 56.81
4 56.75 56.81 56.73 56.89 1540.88 56.85
5 56.71 56.85 57.05 56.87 56.77 1539.36
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1 2 3 4 5
0 1483.81 111.33 111.39 111.39 110.88 110.88
1 111.38 1534.80 111.38 111.38 55.36 110.01
2 111.38 111.34 1534.07 111.39 110.76 110.90
3 111.38 111.38 111.34 1538.60 110.80 110.80
4 110.73 110.86 110.89 110.91 1537.85 111.39
5 110.92 110.83 110.93 110.91 111.39 1537.07
P2P=Disabled Latency Matrix (us)
GPU 0 1 2 3 4 5
0 2.07 14.34 14.30 14.30 14.29 14.29
1 14.30 2.07 14.32 14.32 14.32 14.32
2 14.32 14.31 2.07 14.32 14.32 14.32
3 14.32 14.32 14.34 2.07 14.33 14.33
4 14.32 14.34 14.31 14.23 2.07 14.33
5 14.30 14.32 14.30 14.22 14.32 2.07
CPU 0 1 2 3 4 5
0 2.35 6.88 6.77 6.41 5.68 5.93
1 6.65 2.39 7.07 6.95 6.09 6.15
2 6.70 6.86 2.40 6.62 5.87 6.13
3 6.43 6.71 6.74 2.29 5.69 5.92
4 5.90 6.23 6.18 5.89 2.03 5.46
5 6.12 6.42 6.44 6.15 5.43 2.16
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1 2 3 4 5
0 2.07 0.37 0.36 0.43 0.36 0.36
1 0.46 2.07 0.45 0.38 0.38 0.38
2 0.39 0.37 2.07 0.37 0.38 0.37
3 0.37 0.38 0.36 2.07 0.37 0.37
4 0.38 0.43 0.44 0.37 2.07 0.38
5 0.38 0.37 0.37 0.44 0.37 2.07
CPU 0 1 2 3 4 5
0 2.36 1.69 1.64 1.64 1.65 1.75
1 1.79 2.45 1.75 1.87 1.89 1.88
2 1.80 1.73 2.49 1.78 1.78 1.82
3 1.70 1.65 1.66 2.30 1.67 1.71
4 1.47 1.50 1.46 1.45 2.07 1.46
5 1.59 1.54 1.54 1.52 1.53 2.15
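
The matrices above are easier to compare when reduced to a mean off-diagonal value plus a list of cells that deviate from it. The following is a minimal sketch of that reduction, assuming the rows are pasted in exactly as printed (device index first, then one value per peer). The helper names `parse_matrix`, `off_diagonal`, and `summarize`, and the 25% `tolerance` default, are hypothetical choices for illustration and are not part of the CUDA samples. The data is copied from the "Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)" block.

```python
from statistics import mean


def parse_matrix(rows):
    """Turn rows like '0 1483.81 111.33 ...' into a list of float lists.

    The first token on each row is the device index and is dropped.
    """
    return [[float(tok) for tok in line.split()[1:]] for line in rows]


def off_diagonal(matrix):
    """Return (row, column, value) for every cell that is not on the diagonal."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    ]


def summarize(matrix, tolerance=0.25):
    """Return the mean off-diagonal value and the cells far from that mean.

    A cell counts as an outlier when it differs from the mean by more than
    `tolerance` (a fraction of the mean). The 25% default is an arbitrary
    illustrative threshold, not a value taken from the benchmark.
    """
    cells = off_diagonal(matrix)
    avg = mean(value for _, _, value in cells)
    outliers = [
        (i, j, value) for i, j, value in cells if abs(value - avg) > tolerance * avg
    ]
    return avg, outliers


# Rows copied from "Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)" above.
BIDIR_P2P_ENABLED = """\
0 1483.81 111.33 111.39 111.39 110.88 110.88
1 111.38 1534.80 111.38 111.38 55.36 110.01
2 111.38 111.34 1534.07 111.39 110.76 110.90
3 111.38 111.38 111.34 1538.60 110.80 110.80
4 110.73 110.86 110.89 110.91 1537.85 111.39
5 110.92 110.83 110.93 110.91 111.39 1537.07
""".splitlines()


if __name__ == "__main__":
    avg, outliers = summarize(parse_matrix(BIDIR_P2P_ENABLED))
    print(f"mean off-diagonal bandwidth: {avg:.2f} GB/s")
    for row, col, value in outliers:
        print(f"outlier at row {row}, column {col}: {value:.2f} GB/s")
```

On this data the only cell flagged is row 1, column 4 (55.36 GB/s), roughly half of every other peer pair in the same matrix. A single low cell in one run may be measurement noise, so rerunning the benchmark before drawing conclusions about that link seems prudent.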