E5v4 Xeon vs EPYC AI acceleration? (CPU utilization differences between platforms)


gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
another data point. it might not be AMD vs intel directly.

I had another user try it out on a newer Intel System: 2x Xeon Gold 5218, 64GB, 2x V100, so Cascade Lake.

this system exhibited the same high CPU use as the AMD systems. the Intel Broadwell/Haswell/Ivy Bridge stuff all shows low CPU utilization, yet the overall task takes about the same time (usually ~11-12 hrs on my systems, running 4-5 concurrently).
 

Keith Myers

Active Member
Oct 10, 2020
146
30
28
Could it be something as simple as those Intel Broadwell/Haswell/Ivy Bridge CPUs all having L4 caches, while your AMD CPUs and the more recent Intel Xeon Gold CPUs do not?
 

gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
not all Broadwell chips have the 128MB L4 eDRAM. Most do not. That’s only on a few Iris Pro models.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,159
1,549
113
but the Intel Broadwell/Haswell/Ivy Bridge stuff all uses low CPU utilization
you can check something. it looks to me like the Broadwell/Haswell/Ivy Bridge stuff can still do other work besides your AI stuff.
run e.g. the CPU-Z CPU stress test and see what happens. i bet the low utilization reading is faulty information.
why not utilize 100% and do things faster?
what are they doing if they are not at 100%? something must be limiting them to low utilization.
 

gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
It’s not wrong information unless top/htop doesn’t report process CPU use properly on Xeon v4 only? But I’ve never heard of that. Other loads will show full 100% load on the Broadwell CPU.

and the Broadwell system actually processed the work faster, despite lower utilization and being a much older CPU. Same OS, same application, same GPU, just different motherboard/CPU/mem
 

gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
why lower ? should be 100% until the work is done !!!
Well that’s kind of the point of this thread: to try to find out why the application acts so differently on this system. The application/script is doing reinforcement learning with PyTorch. It won’t use all of the CPU or the GPU, as it goes back and forth between the CPU and GPU during training of the 32 agents. But I’m trying to find out what is making these newer systems use so much more CPU, which in effect limits how many tasks I can run in parallel.

a single Xeon E5-2697Av4 has no problem servicing 5 tasks at once with about 40-50% total CPU use. But even an EPYC 7V12 with 4x the number of cores is pegged with only 10 tasks. It doesn’t make any sense.
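
A rough way to put numbers on that comparison is to convert total CPU use into busy logical CPUs per task. This is only a back-of-envelope sketch: the function name is made up for illustration, 45% is taken as the midpoint of the 40-50% range, and the 128 logical CPUs for the EPYC 7V12 assume SMT is enabled.

```python
def busy_threads_per_task(logical_cpus: int, utilization: float, tasks: int) -> float:
    """Average number of fully busy logical CPUs attributable to each task."""
    return logical_cpus * utilization / tasks


# Xeon E5-2697A v4: 16c/32t, roughly 45% total CPU use with 5 tasks.
xeon = busy_threads_per_task(logical_cpus=32, utilization=0.45, tasks=5)

# EPYC 7V12: 4x the cores (64); 128 logical CPUs is an assumption (SMT on).
# "Pegged" is treated as 100% utilization with 10 tasks.
epyc = busy_threads_per_task(logical_cpus=128, utilization=1.0, tasks=10)

print(f"Xeon: {xeon:.1f} busy logical CPUs per task")
print(f"EPYC: {epyc:.1f} busy logical CPUs per task")
print(f"Ratio: {epyc / xeon:.1f}x")
```

Under those assumptions each task on the EPYC keeps several times as many logical CPUs busy as the same task on the Xeon, which is the gap the thread is trying to explain.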
 

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
i probably wouldn't feel comfortable posting it publicly since it's someone else's scientific research, but I can PM it to you.

or since you're familiar with BOINC, if you have a Linux/Nvidia system, you can attach to GPUGRID and download a task. you'll get sent the python code (~500 lines) and a python environment archive (2.7GB) with the task.
I wouldn't worry about it, it's an open platform. But if you feel like it, please PM.
Only have a docker boinc setup with linux + nv; and sadly gpugrid doesn't like docker env.


From what you described, this does indeed have something to do with how they open the thread pool array on bigger systems.
( Even if i manage to help/change the script, you will need to give it to gpugrid staff for review so they can apply it to their repo)
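
To illustrate the kind of scaling problem meant here: a sketch of how a pool sized from the core count oversubscribes a machine once several tasks run side by side. The function name and the fixed cap of 4 workers are hypothetical, and it is not confirmed that the GPUGRID script sizes its pool this way.

```python
from typing import Optional


def oversubscription(tasks: int, logical_cpus: int,
                     workers_per_task: Optional[int] = None) -> float:
    """Workers per logical CPU when `tasks` run side by side.

    workers_per_task=None mimics a pool that defaults to one worker per
    logical CPU, a common default for thread and process pools.
    """
    if workers_per_task is None:
        workers_per_task = logical_cpus
    return tasks * workers_per_task / logical_cpus


for cpus in (32, 128):
    default_sized = oversubscription(tasks=10, logical_cpus=cpus)
    capped = oversubscription(tasks=10, logical_cpus=cpus, workers_per_task=4)
    print(f"{cpus} logical CPUs: default={default_sized}, capped at 4={capped}")
```

With a core-count default, the worker count grows with the machine, so a bigger CPU does not leave any more headroom per task; with a fixed cap, the extra cores actually become available for more tasks.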
 

gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
( Even if i manage to help/change the script, you will need to give it to gpugrid staff for review so they can apply it to their repo)
yes for sure! they have seemed open to changes in the past

sending you a PM

I'm not sure if there is any limit in system size where the issue becomes present. I did compare this on a Ryzen 5950X also, which has the same 16c/32t as my Xeon. The 5950X had the same high CPU use issue as my other AMD EPYC systems which is why I thought it was an AMD vs Intel thing until another user observed high CPU use on a Cascade Lake Xeon also. That Cascade Lake system was running in docker though (a rented instance from vast.ai) but not sure if that matters.
 

gsrcrxsi

Active Member
Dec 12, 2018
293
96
28
just an update for everyone.

@CyklonDX generously helped look over the code and determined it's a software issue in the code causing unchecked CPU use. we were able to modify the code to put hard limits on the number of cores used, which both reduced the amount of total CPU used, and sped up task execution by almost double.

I think the fact that each process uses a little more CPU% on the newer CPUs isn't a problem, and is just the better IPC allowing more work per cycle. the problem was the unchecked spawning of processes to fill the whole CPU when it wasn't necessary.
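
For anyone wanting to try something similar, here is a minimal sketch of one common way to cap CPU use in a PyTorch script. It is not the actual change made to the GPUGRID code; the helper names are made up, and it assumes the extra CPU use comes from the OpenMP/MKL/OpenBLAS thread pools that PyTorch and NumPy rely on.

```python
import os


def thread_cap_env(max_threads: int) -> dict:
    """Build environment variables that cap common numeric thread pools."""
    if max_threads < 1:
        raise ValueError("max_threads must be >= 1")
    value = str(max_threads)
    names = (
        "OMP_NUM_THREADS",       # OpenMP runtimes
        "MKL_NUM_THREADS",       # Intel MKL
        "OPENBLAS_NUM_THREADS",  # OpenBLAS
        "NUMEXPR_NUM_THREADS",   # numexpr
    )
    return {name: value for name in names}


def apply_thread_cap(max_threads: int) -> None:
    """Apply the cap to this process; call before importing numpy or torch."""
    # The libraries read these variables when they initialise their pools,
    # so they need to be set early in the script.
    os.environ.update(thread_cap_env(max_threads))
    try:
        import torch  # optional: only used if PyTorch is installed
    except ImportError:
        return
    # Caps PyTorch's intra-op CPU parallelism for this process.
    torch.set_num_threads(max_threads)
```

Calling `apply_thread_cap(4)` at the very top of the training script, before the other imports, would limit each task to roughly 4 compute threads. The right number depends on the workload and on how many tasks run concurrently.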

I originally thought this was some kind of Intel vs AMD thing which is why I started the topic in the CPU hardware forum, but it's actually an "old CPU vs new CPU" thing with a software bug being a major contributing factor.

you can read more at the gpugrid project forums if you're interested
https://gpugrid.net/forum_thread.php?id=5233&nowrap=true#59693