Why is only 1 cpu used?

wildpig1234

Well-Known Member
Aug 22, 2016
2,197
443
83
49
I don't understand why in some multithreaded apps, only 1 cpu appears to be used? This is from cpu-z bench which we know is multithreaded...

when I run cinebench, all the threads are used.

I wonder if this has something to do with the fact that I am only using single channel memory for each cpu right now on my asus z10pe-d16... I have tried 2 different cpu types and same thing. they are all same stepping, but are both es/qs...

wondering if it's MB or is it because it's ES/QS cpu or is it because I am using only single channel ram for each cpu....

Anyone seen this before?

1cpu.jpg
 

ruffy91

Member
Oct 6, 2012
71
11
8
Switzerland
It seems CPU-Z is not NUMA aware. Each CPU has its own directly attached RAM; the other CPU can only reach that RAM over the link between the sockets, which is slower. As a consequence the OS schedules the task only on the cores which have direct access to the RAM where the data of the process lies.
You can set NUMA node interleaving to on in the BIOS; this changes the behaviour so that all process memory is spread across the RAM of all CPUs. This can hurt performance noticeably but should allow non-NUMA-aware software to run on all CPUs.
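
One way to see how many NUMA nodes the OS actually reports, before and after toggling the interleaving option, is a small script. This is only a sketch: it assumes Python 3 is available, the function name `numa_node_count` is made up for illustration, and it uses the Win32 call `GetNumaHighestNodeNumber` on Windows with a sysfs fallback on Linux.

```python
import ctypes
import glob
import sys


def numa_node_count():
    """Return the number of NUMA nodes the OS reports, or None if unknown."""
    if sys.platform == "win32":
        highest = ctypes.c_ulong(0)
        # GetNumaHighestNodeNumber fills in the highest node index (0-based).
        ok = ctypes.windll.kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest))
        return highest.value + 1 if ok else None
    # Linux fallback: each NUMA node shows up as /sys/devices/system/node/nodeN.
    nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
    return len(nodes) if nodes else None


if __name__ == "__main__":
    print("NUMA nodes reported:", numa_node_count())
```

On a dual-socket board I would expect this to report 2 nodes with interleaving off and typically 1 with interleaving on, though that depends on the BIOS.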
 

wildpig1234

so would this problem be resolved if I fill out all the memory channels on the MB? right now, I have 1x 16GB dimm for each cpu. would it help to have 4 dimms, i.e. quad channel mem access, for each cpu?

I tried single channel on asus z9pe-d16 with dual 2696 v2 and this problem of only 1 cpu being used didn't seem to happen, even when I had only 1 dimm per cpu. I don't know if there's any difference with regard to numa, etc. between sandy/ivy vs haswell/broadwell...

I notice this in cpuz and also userbenchmark. cinebench doesn't seem bothered. y-cruncher still uses all cpus and threads, except it changed the type of multithreaded workload spawning scheme....

I can't wait for the rest of my dimms to get here to see if using quad channel would make any difference....
 

William

Well-Known Member
May 7, 2015
789
252
63
66
You are correct about CPUz, it is not NUMA aware.
This is a known issue from when that bench first came out.
 

wildpig1234

but why did this not happen on asus z9pe-d16 with dual 2696 v2? I ran 1 dimm per cpu on that as well as 4 dimms per cpu on that and it was using all cores fine.

only noticed this on the z10pe-d16... so you dont think even if i use 4 dimm per cpu it would help on the z10?

Is there something different about numa between sandy/ivy compared to haswell/broadwell?
 

William

I just ran it on both Z10PE-D16 WS and SM X11DPi-N boards and it appears to use all cores/threads.
If I remember correctly, when they first came out with this bench it only worked for single socket systems. It's been a long time since I ran it; it appears they have fixed it.

I wouldn't worry about it, it's not a benchmark anyone really uses TBH so it's not updated very often.

Do other programs use all cores/threads ?
 

wildpig1234

did you use single channel or quad channel memory? what dual cpu did you use? ES/QS or production?

i used the latest cpuz 1.81.

It is still a concern because it also affects a few other benchmark programs like userbenchmark and y-cruncher...
 

William

I used the latest also. My systems are fully loaded with RAM, all slots used.

What motherboard are you using ?
My E5-2699v3's are ES QS from Intel. The Gold 3134's are also ES.
I highly doubt single channel or quad channel will make a difference to how many cores are used in this bench.

If you have this issue on other benchmarks I suspect something is wrong with your MB, BIOS or CPUs.
Use HWiNFO to get specs from your CPUs and check to make sure you have the latest BIOS.
 

wildpig1234

Latest z10 bios.. tried with dual 2686v3 qs as well as early step dual qhuz v4 es...

Any possibility it could be my mb being an early ver? I heard the early ver didn't support v4 cpus, but the dual es v4 do run in my mb so I don't think this is the cause.... I can't think of any particular bios setting.
 

William

My Z10PE-D16 WS was one of the first version boards, hot off the first reviews. With BIOS updates it supported V4's just fine. I had the 22 core chips running in it at one time.

BTW it's still running strong and I'm using it right now.

Run HWiNFO and let's see what stepping those chips are.
 

William

Yeah ok, I see now... ES2 is an L0 stepping. Production stepping is M0.
You do not have QS steppings, just another strange way Intel does these things. Could have been for an OEM or something.

I don't know man, are you running 2 of these or just one?
Either way it wouldn't make a difference to which cores get used for that benchmark.

I suspect something is wrong with these CPUs.
Do you have another board to try it on?
 

wildpig1234

Look also at my 2686 v3... they are SR1XD, not early step and getting the same thing of 1 cpu being used in some benches...

i haven't got any other 2011 v3 mb to try. i suspect it's not the cpu.
 

wildpig1234

I changed to dual 2696 v3 and am also using a quad channel memory setup now, and it's still the same: only 1 cpu being used on cpuz and userbenchmark... ;(
 

wildpig1234

Sounds like a board issue then... very strange.
I am wondering if it could possibly be a bios setting?

how do I find out what's the version of MB?

I don't understand why this is happening... why does cinebench run all threads/cores, but some other programs only run exactly half the threads/cores.... if the mb is broken then shouldn't it just crash or affect all programs?
 

wildpig1234

what is the cpuz multithread bench score you are getting on your dual 2699 v3 system?

btw, I just had a talk with alex yee, y cruncher programmer. He told me this thing about windows processor groups:


My question: reason I ask is b/c when I run cpu z and userbenchmark on my dual 2696 v2 it uses all cores/threads, but when I run those two benches on dual 2696 v3, it uses exactly the threads on one cpu and no more...

His answer:

"Windows has a concept called "Processor Groups". Each group has a limit of 64 logical cores. So on systems with 64 logical cores or fewer, there is only one group. On the dual 2696v3, there are 72. So there are two groups of 36 logical cores each, each with the cores from one NUMA node.

By default, programs will be able to use only 1 group. In order to use multiple groups, the program needs to be specially programmed for it. Based on your description, Cinebench does, but UserBenchmark doesn't.

Both y-cruncher's Push Pool and Cilk Plus frameworks are able to use multiple processor groups. But Cilk Plus is better optimized for it. So it defaults to Cilk Plus if there are multiple processor groups.

y-cruncher also has other frameworks, but those aren't processor group aware. So they will only be able to use one processor on your 2696v3 system."
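
To check whether Windows has actually split a machine into more than one processor group, something like the following could be used. It is a minimal sketch, assuming Python 3; `logical_processors_per_group` is a hypothetical helper name, and the only Win32 calls used are `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`. On non-Windows systems it just reports everything as a single group as a stand-in.

```python
import ctypes
import os
import sys


def logical_processors_per_group():
    """Return a list with the active logical processor count of each group."""
    if sys.platform != "win32":
        # No processor groups outside Windows; report everything as one group.
        return [os.cpu_count() or 1]
    kernel32 = ctypes.windll.kernel32
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    group_count = kernel32.GetActiveProcessorGroupCount()
    # Ask for the active logical processor count of each group in turn.
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


if __name__ == "__main__":
    print("Logical processors per group:", logical_processors_per_group())
```

If the list has more than one entry, a program that is not written for processor groups would be expected to stay inside just one of them.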

Interesting stuff... That would explain why cpuz and userbenchmark didn't have any problems with using all threads on a dual 2696 v2 (24c, 48 threads), but when I run them on dual 2696 v3 (36c, 72 threads) and qhuz v4 (40c, 80 threads), only the threads in 1 cpu are used.
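
The arithmetic behind that can be sketched in a few lines. This is a simplified model, not how Windows actually assigns groups: it assumes one group when all logical processors fit within 64, and otherwise one group per socket. The function names are made up for illustration.

```python
MAX_GROUP_SIZE = 64  # Windows limit on logical processors per group


def model_groups(sockets, threads_per_socket):
    """Simplified model: one group if everything fits in 64, else one per socket."""
    total = sockets * threads_per_socket
    if total <= MAX_GROUP_SIZE:
        return [total]
    if threads_per_socket > MAX_GROUP_SIZE:
        raise ValueError("model does not cover sockets with more than 64 threads")
    return [threads_per_socket] * sockets


def threads_visible_without_group_support(sockets, threads_per_socket):
    """Threads a program sees if it stays inside a single processor group."""
    return max(model_groups(sockets, threads_per_socket))


# Thread counts per socket taken from the systems mentioned in this thread.
for name, per_socket in [("2696 v2", 24), ("2696 v3", 36), ("QHUZ v4", 40)]:
    groups = model_groups(2, per_socket)
    usable = threads_visible_without_group_support(2, per_socket)
    print(f"dual {name}: groups={groups}, usable without group support={usable}")
```

Under those assumptions the dual 2696 v2 ends up as one group of 48, while the dual 2696 v3 and dual QHUZ v4 split into two groups of 36 and 40, so a program that stays in one group sees exactly half the threads.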
 

William

Well this is strange. I swear I ran this before and it used all cores... but it was an older CPUz.
I just downloaded the newest version and I get this...

CPUz Bench.jpg