pancho@fedora:~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 4090, pciBusID: e, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 4090, pciBusID: 11, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 5090, pciBusID: 5, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 5090, pciBusID: 18, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA A40, pciBusID: d, pciDeviceID: 0, pciDomainID:0
Device: 5, NVIDIA RTX A6000, pciBusID: 12, pciDeviceID: 0, pciDomainID:0
Device: 6, NVIDIA GeForce RTX 3090, pciBusID: a, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CANNOT Access Peer Device=2
Device=0 CANNOT Access Peer Device=3
Device=0 CANNOT Access Peer Device=4
Device=0 CANNOT Access Peer Device=5
Device=0 CANNOT Access Peer Device=6
Device=1 CAN Access Peer Device=0
Device=1 CANNOT Access Peer Device=2
Device=1 CANNOT Access Peer Device=3
Device=1 CANNOT Access Peer Device=4
Device=1 CANNOT Access Peer Device=5
Device=1 CANNOT Access Peer Device=6
Device=2 CANNOT Access Peer Device=0
Device=2 CANNOT Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CANNOT Access Peer Device=4
Device=2 CANNOT Access Peer Device=5
Device=2 CANNOT Access Peer Device=6
Device=3 CANNOT Access Peer Device=0
Device=3 CANNOT Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CANNOT Access Peer Device=4
Device=3 CANNOT Access Peer Device=5
Device=3 CANNOT Access Peer Device=6
Device=4 CANNOT Access Peer Device=0
Device=4 CANNOT Access Peer Device=1
Device=4 CANNOT Access Peer Device=2
Device=4 CANNOT Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=6
Device=5 CANNOT Access Peer Device=0
Device=5 CANNOT Access Peer Device=1
Device=5 CANNOT Access Peer Device=2
Device=5 CANNOT Access Peer Device=3
Device=5 CAN Access Peer Device=4
Device=5 CAN Access Peer Device=6
Device=6 CANNOT Access Peer Device=0
Device=6 CANNOT Access Peer Device=1
Device=6 CANNOT Access Peer Device=2
Device=6 CANNOT Access Peer Device=3
Device=6 CAN Access Peer Device=4
Device=6 CAN Access Peer Device=5
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D   0   1   2   3   4   5   6
  0   1   1   0   0   0   0   0
  1   1   1   0   0   0   0   0
  2   0   0   1   1   0   0   0
  3   0   0   1   1   0   0   0
  4   0   0   0   0   1   1   1
  5   0   0   0   0   1   1   1
  6   0   0   0   0   1   1   1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D        0        1        2        3        4        5        6
  0  1036.83    16.32    24.58    24.58    16.28    16.28    10.68
  1    16.33   999.68    24.58    24.58    16.28    16.28    10.67
  2    23.32    23.32  1783.68    33.13    23.17    23.17    14.15
  3    23.33    23.33    33.01  1775.57    23.16    23.17    14.14
  4    16.32    16.33    24.35    24.37   643.80    16.29    10.69
  5    16.32    16.32    24.39    24.39    16.27   765.93    10.71
  6    10.66    10.94    14.85    15.02    10.64    10.60   903.70
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D        0        1        2        3        4        5        6
  0  1039.59    26.36    24.59    24.60    16.28    16.28    10.65
  1    26.36  1017.25    24.57    24.58    16.28    16.28    10.68
  2    23.25    23.33  1763.54    57.28    23.16    23.20    14.16
  3    23.26    23.33    57.25  1763.61    23.18    23.20    14.06
  4    16.30    16.33    24.37    24.36   644.86    26.36    26.36
  5    16.29    16.32    24.39    24.39    26.36   766.68    26.36
  6    10.98    10.79    14.70    15.00    26.37    26.36   904.75
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D        0        1        2        3        4        5        6
  0  1047.25    18.94    29.60    29.62    18.76    18.95    11.90
  1    18.94  1002.25    29.55    29.66    18.68    18.92    11.88
  2    27.33    27.36  1763.45    34.63    27.23    27.21    19.40
  3    27.36    27.40    34.45  1777.52    27.27    27.27    19.38
  4    18.84    18.89    29.51    29.48   647.53    18.95    11.81
  5    18.78    18.91    29.49    29.56    18.82   770.84    11.78
  6    11.97    11.87    19.84    19.67    11.82    11.74   910.28
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D        0        1        2        3        4        5        6
  0  1046.55    52.17    29.51    29.60    18.95    18.96    11.88
  1    52.18   995.22    29.56    29.62    18.87    18.83    11.87
  2    27.31    27.41  1761.46   110.85    27.23    27.20    19.49
  3    27.28    27.37   110.85  1753.56    27.24    27.21    19.41
  4    18.73    18.84    29.45    29.57   647.53    52.18    52.18
  5    18.83    18.92    29.49    29.56    52.17   770.65    52.19
  6    11.93    11.92    19.77    19.62    52.19    52.16   909.75
P2P=Disabled Latency Matrix (us)
GPU      0      1      2      3      4      5      6
  0   1.42  16.46  14.35  14.35  16.65  15.06  15.14
  1  14.52   1.36  14.43  14.43  15.82  14.46  15.18
  2  14.34  14.35   2.07  14.37  14.36  14.35  14.44
  3  14.41  14.41  14.35   2.07  14.35  14.35  14.37
  4  14.71  14.97  14.34  14.38   1.77  16.56  14.26
  5  14.25  14.36  14.49  14.39  14.25   1.79  15.17
  6  15.45  17.45  14.34  14.62  14.26  15.48   1.67
CPU      0      1      2      3      4      5      6
  0   1.42   4.25   4.16   4.14   3.97   4.15   4.14
  1   4.21   1.37   4.13   4.12   3.93   4.12   4.14
  2   4.23   4.14   1.55   4.12   3.92   4.13   4.16
  3   4.18   4.11   4.11   1.57   3.93   4.14   4.14
  4   4.04   4.01   4.01   4.00   1.30   4.01   4.01
  5   4.13   4.12   4.10   4.11   3.91   1.37   4.11
  6   4.10   4.11   4.10   4.11   3.89   4.12   1.35
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU      0      1      2      3      4      5      6
  0   1.41   1.42  14.38  14.56  15.09  14.26  14.34
  1   1.42   1.42  14.72  14.42  17.54  14.25  14.33
  2  14.34  14.34   2.07   0.36  14.35  14.36  14.36
  3  14.34  14.33   0.36   2.07  14.35  14.35  14.37
  4  15.66  15.73  14.36  14.36   1.74   1.60   1.64
  5  15.26  14.44  14.39  14.49   1.59   1.72   1.59
  6  15.18  14.24  14.38  14.38   1.54   1.53   1.64
CPU      0      1      2      3      4      5      6
  0   1.41   1.11   4.17   4.13   3.94   4.13   4.13
  1   1.18   1.38   4.16   4.12   3.92   4.11   4.12
  2   4.19   4.15   1.58   1.09   3.93   4.08   4.11
  3   4.17   4.13   1.11   1.58   3.94   4.12   4.14
  4   4.03   3.99   3.99   4.03   1.31   1.02   1.02
  5   4.20   4.14   4.15   4.15   1.11   1.37   1.09
  6   4.12   4.10   4.11   4.12   1.08   1.09   1.38
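
As a reading aid for the connectivity matrix in the log, here is a minimal Python sketch that groups the devices into sets that can reach each other over P2P. It is not part of the CUDA sample: the names `CONNECTIVITY` and `peer_groups` are hypothetical, and the matrix values are copied by hand from the "P2P Connectivity Matrix" output.

```python
# Values copied by hand from the "P2P Connectivity Matrix" in the log.
# Row i, column j is 1 when device i reports peer access to device j.
CONNECTIVITY = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
]


def peer_groups(matrix):
    """Return the connected components of the peer-access graph."""
    seen = set()
    groups = []
    for start in range(len(matrix)):
        if start in seen:
            continue
        # Depth-first walk over every device reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            dev = stack.pop()
            group.append(dev)
            for peer, reachable in enumerate(matrix[dev]):
                if reachable and peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    for group in peer_groups(CONNECTIVITY):
        print(group)
```

For this log the groups come out as devices 0 and 1 (the two RTX 4090s), devices 2 and 3 (the two RTX 5090s), and devices 4, 5 and 6 (A40, RTX A6000, RTX 3090). These are the only pairs whose bandwidth and latency figures change between the P2P=Disabled and P2P=Enabled matrices.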