Has anyone tested this?
Intel® Memory Latency Checker v3.1a | Intel® Software
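Seems really cool. The results also show how much overhead a NUMA setup adds: on the dual-socket server, idle latency to the remote node is about 132 ns versus about 89 ns to the local node.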
Result server: Dual Xeon L5630 6x4GB PC3-10600R
Result workstation: 4670K @ 3.8 GHz 2x4GB PC3-10600
Code (server, Dual Xeon L5630):
Intel(R) Memory Latency Checker - v3.1a
Measuring idle latencies (in ns)...
            Numa node
Numa node   0        1
0           89.4     132.4
1           131.9    88.4
Measuring Peak Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 25363.8
3:1 Reads-Writes : 25131.9
2:1 Reads-Writes : 25691.8
1:1 Reads-Writes : 26879.7
Stream-triad like: 27770.1
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
            Numa node
Numa node   0          1
0           14952.5    9618.8
1           9602.5     14959.1
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency  Bandwidth
Delay   (ns)     MB/sec
==========================
00000   108.46   24999.7
00002   108.40   25011.0
00008   108.55   25055.5
00015   108.61   25074.7
00050   108.57   25059.3
00100   103.01   20258.9
00200   94.35    10744.6
00300   92.59    7317.7
00400   91.58    5675.6
00500   91.04    4687.5
00700   90.42    3553.8
01000   90.01    2703.8
01300   89.81    2248.1
01700   89.63    1887.0
02500   89.46    1512.8
03500   89.36    1285.7
05000   89.30    1115.4
09000   89.23    938.9
20000   89.16    817.5
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 36.1
Local Socket L2->L2 HITM latency 41.1
Remote Socket LLC->LLC HITM latency (data address homed in writer socket)
                   Reader Numa Node
Writer Numa Node   0        1
0                  -        136.5
1                  136.2    -
Remote Socket LLC->LLC HITM latency (data address homed in reader socket)
                   Reader Numa Node
Writer Numa Node   0        1
0                  -        104.8
1                  104.8    -
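To put a number on the NUMA overhead in the server run above, here is a small Python sketch that averages the local (diagonal) and remote (off-diagonal) entries of the idle latency and node-to-node bandwidth matrices. The values are copied from the output; the function and variable names are made up for illustration and are not part of MLC.

```python
# Values copied from the Dual Xeon L5630 output above.
idle_latency_ns = [
    [89.4, 132.4],
    [131.9, 88.4],
]
read_bandwidth_mb_s = [
    [14952.5, 9618.8],
    [9602.5, 14959.1],
]


def local_remote_means(matrix):
    """Return (mean of diagonal entries, mean of off-diagonal entries)."""
    n = len(matrix)
    local = [matrix[i][i] for i in range(n)]
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(local) / len(local), sum(remote) / len(remote)


lat_local, lat_remote = local_remote_means(idle_latency_ns)
bw_local, bw_remote = local_remote_means(read_bandwidth_mb_s)

# Extra latency and lost bandwidth when memory sits on the other socket.
latency_penalty_pct = (lat_remote / lat_local - 1) * 100
bandwidth_loss_pct = (1 - bw_remote / bw_local) * 100

print(f"idle latency: local {lat_local:.1f} ns, remote {lat_remote:.1f} ns "
      f"(+{latency_penalty_pct:.1f}%)")
print(f"read bandwidth: local {bw_local:.1f} MB/s, remote {bw_remote:.1f} MB/s "
      f"(-{bandwidth_loss_pct:.1f}%)")
```

On these numbers, remote access costs roughly 49% more idle latency and about 36% less read bandwidth than local access.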
Code (workstation, 4670K):
Intel(R) Memory Latency Checker - v3.1a
Measuring idle latencies (in ns)...
         Memory node
Socket   0
0        57.5
Measuring Peak Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 19785.9
3:1 Reads-Writes : 19022.1
2:1 Reads-Writes : 18747.3
1:1 Reads-Writes : 18501.3
Stream-triad like: 18768.9
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
         Memory node
Socket   0
0        19679.4
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency  Bandwidth
Delay   (ns)     MB/sec
==========================
00000   174.56   19525.6
00002   177.05   19538.7
00008   163.46   19516.8
00015   158.24   19420.0
00050   135.28   19040.6
00100   114.88   18011.3
00200   86.43    12093.3
00300   80.91    8866.8
00400   75.58    7052.5
00500   71.22    5764.8
00700   73.08    4488.5
01000   67.16    3512.8
01300   68.03    2912.0
01700   67.68    2492.1
02500   65.76    1996.4
03500   71.90    1606.3
05000   70.80    1378.7
09000   65.97    1260.1
20000   66.85    1088.3
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 19.8
Local Socket L2->L2 HITM latency 23.1
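For context on the workstation bandwidth figure, here is a rough Python sketch comparing the measured "ALL Reads" number with the theoretical peak of the memory. It assumes the two PC3-10600 DIMMs run in dual-channel mode at their rated DDR3-1333 speed; the post does not state the actual memory configuration, so treat the result as an estimate. The variable names are made up for illustration.

```python
# PC3-10600 is DDR3-1333: about 1333.33 million transfers per second
# over a 64-bit (8-byte) channel.
TRANSFERS_PER_S_MILLIONS = 4000 / 3
BYTES_PER_TRANSFER = 8
CHANNELS = 2  # assumption: 2 DIMMs in dual-channel mode (not stated in the post)

# MLC reports MB/sec with 1 MB = 1,000,000 bytes, so the units line up.
peak_mb_s = TRANSFERS_PER_S_MILLIONS * BYTES_PER_TRANSFER * CHANNELS

measured_all_reads_mb_s = 19785.9  # "ALL Reads" from the 4670K output above
efficiency = measured_all_reads_mb_s / peak_mb_s

print(f"theoretical peak: {peak_mb_s:.1f} MB/s")
print(f"measured ALL Reads: {measured_all_reads_mb_s:.1f} MB/s "
      f"({efficiency:.1%} of peak)")
```

Under those assumptions, the workstation reaches a bit under 93% of its theoretical peak on reads.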