I ran the analysis on all cores, which should load the system to 100%, but when I log in to each compute node, top gives me strange information.
For example, the maximum CPU usage on 'compute-node-01' is 361%, whereas on 'compute-node-02' it is 702%. I expected 2400% on each node, because each node has two Xeon Gold 6136 CPUs with 12 cores each.
Without knowing how your software behaves, it's difficult for us to make a call on this. As MBastian says, it'd be a good idea to verify CPU performance with software that everyone can compare to. Is the hardware on all nodes the same? Same CPU and motherboard topology, SMT on in both systems, both BIOS and OS set to the same power-saving settings?
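As a rough sketch of that hardware comparison (not a definitive check): top's default per-process %CPU counts 100% per logical CPU, so the ceiling depends on whether SMT is on. The helper name expected_cpu_pct is made up for illustration, and the lscpu field names assume a reasonably recent util-linux.

```shell
# Hypothetical helper: ceiling of top's per-process %CPU in its default
# (Irix) mode, i.e. 100% for every logical CPU the kernel exposes.
expected_cpu_pct() {
    # $1 = number of logical CPUs
    echo $(( $1 * 100 ))
}

# Online logical CPUs as seen by the OS (includes SMT siblings).
ncpu=$(getconf _NPROCESSORS_ONLN)
echo "logical CPUs: $ncpu, top ceiling: $(expected_cpu_pct "$ncpu")%"

# Topology lines worth diffing between compute-node-01 and compute-node-02.
# Skipped quietly if lscpu is not installed.
command -v lscpu >/dev/null 2>&1 &&
    lscpu | grep -E '^ *(Model name|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)' || true
```

With two 12-core sockets this gives 2400% with SMT off and 4800% with SMT on, so a mismatch between the two nodes here would already explain different ceilings.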
Outside of that, there's a wealth of synthetic and real-world benchmarks you can try. I usually use ffmpeg for testing myself, but if you've got pxz installed, here's a fairly easy method I use to max out my 16 threads by compressing data from /dev/urandom:
Code:
cat /dev/urandom|pxz -T 16 -cv - > /dev/null
Load looks like this in htop once it's amassed enough data to occupy all 16 threads:
Code:
0 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 8 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
1 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 9 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
2 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 10 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
3 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 11 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
4 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 12 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
5 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 13 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
6 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 14 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
7 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 15 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
Mem[||||||||||||||||||||||||||||||||||||||||| 3.79G/62.8G] Tasks: 121, 107 thr, 282 kthr; 16 running
Swp[| 3.25M/2.00G] Load average: 8.27 2.54 0.88
Uptime: 201 days(!), 01:59:42
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
22282 effrafax 20 0 2522M 1872M 1904 R 1600 2.9 11:51.53 pxz -T 16 -cv -
22287 effrafax 20 0 2522M 1872M 1904 R 104. 2.9 0:44.46 pxz -T 16 -cv -
22295 effrafax 20 0 2522M 1872M 1904 R 102. 2.9 0:44.47 pxz -T 16 -cv -
22297 effrafax 20 0 2522M 1872M 1904 R 102. 2.9 0:44.33 pxz -T 16 -cv -
22289 effrafax 20 0 2522M 1872M 1904 R 102. 2.9 0:44.43 pxz -T 16 -cv -
22293 effrafax 20 0 2522M 1872M 1904 R 102. 2.9 0:44.49 pxz -T 16 -cv -
22286 effrafax 20 0 2522M 1872M 1904 R 102. 2.9 0:44.30 pxz -T 16 -cv -
22290 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.50 pxz -T 16 -cv -
22292 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.20 pxz -T 16 -cv -
22298 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.51 pxz -T 16 -cv -
22284 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.37 pxz -T 16 -cv -
22285 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.47 pxz -T 16 -cv -
22296 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.36 pxz -T 16 -cv -
22288 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.52 pxz -T 16 -cv -
22291 effrafax 20 0 2522M 1872M 1904 R 100. 2.9 0:44.30 pxz -T 16 -cv -
22294 effrafax 20 0 2522M 1872M 1904 R 98.4 2.9 0:44.15 pxz -T 16 -cv -
22283 root 20 0 8844 4748 3416 R 2.0 0.0 0:01.10 htop
Unlike stress, this will show you the parent process; as you can see from the above, it's able to hit 1600% CPU without issue and htop gives you a nice visual indication of what each CPU is up to, so it's relatively easy to spot whether your system's hitting the limits or not.
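If htop isn't installed on the compute nodes, the per-CPU picture can be approximated from two samples of /proc/stat. This is only a minimal sketch, assuming a Linux /proc layout; busy_pct is a name invented here, and it treats idle plus iowait as "not busy".

```shell
# busy_pct PREV CUR: each argument is one "cpuN ..." line from /proc/stat.
# Fields 2..9 are user nice system idle iowait irq softirq steal; the guest
# fields that may follow are already counted in user/nice, so they are skipped.
busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9 && i <= NF; i++) total += $i
          idle = $5 + $6 }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 { dt = total - t0; di = idle - i0
                  if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
                  else print "0.0" }'
}

# Take two snapshots one second apart and print one line per logical CPU.
if [ -r /proc/stat ]; then
    before=$(grep '^cpu[0-9]' /proc/stat)
    sleep 1
    after=$(grep '^cpu[0-9]' /proc/stat)
    echo "$before" | while read -r line; do
        cpu=${line%% *}
        # Trailing space keeps "cpu1" from also matching "cpu10".
        cur=$(echo "$after" | grep "^$cpu ")
        echo "$cpu $(busy_pct "$line" "$cur")%"
    done
fi
```

Run it on each node while the analysis is going; CPUs that stay near 0% while others sit near 100% point at the job not spreading across all cores rather than at top misreporting.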
Code:
stress --cpu $(cat /proc/cpuinfo | grep ^processor| wc -l)
Useless use of cat award!
Code:
grep ^processor /proc/cpuinfo|wc -l
...is a cleaner way of writing the command substitution. Since it's just grabbing the total number of CPUs, though, you might want to consider using: