E5v4 Xeon vs EPYC AI acceleration? (CPU utilization differences between platforms)

gsrcrxsi · Jan 6, 2023

another data point. it might not be AMD vs intel directly.

I had another user try it out on a newer Intel System: 2x Xeon Gold 5218, 64GB, 2x V100, so Cascade Lake.

this system exhibited the same high CPU use as the AMD systems, but the Intel Broadwell/Haswell/Ivy Bridge stuff all uses low CPU utilization, even though the overall task runs in about the same time (usually ~11-12hrs for my systems, running 4-5 concurrently).

Keith Myers · Jan 6, 2023

Could it be something as simple as those Intel Broadwell/Haswell/Ivy Bridge cpus all have L4 caches and your AMD cpus and the more recent Intel Xeon Gold cpus do not?

RolloZ170 · Jan 6, 2023

Keith Myers said:
all have L4 caches

what ? only two Broadwell have L4

gsrcrxsi · Jan 6, 2023

Not all Broadwell chips have the 128MB L4 eDRAM. Most do not. That’s only on a few Iris Pro models.

RolloZ170 · Jan 6, 2023

gsrcrxsi said:
but the Intel Broadwell/Haswell/Ivy Bridge stuff all uses low CPU utilization

you can check something. it looks to me like the Broadwell/Haswell/Ivy Bridge stuff can do other work besides your AI stuff.
run e.g. the CPU-Z CPU stress test and see what happens. i bet the low utilization reading is faulty information.
why not utilize 100% and get things done faster?
what are they doing if they are not at 100%? something must be limiting them to low utilization.

gsrcrxsi · Jan 6, 2023

It’s not wrong information unless top/htop doesn’t report process CPU use properly on Xeon v4 only? But I’ve never heard of that. Other loads will show full 100% load on the Broadwell CPU.

and the Broadwell system actually processed the work faster, despite lower utilization and being a much older CPU. Same OS, same application, same GPU, just different motherboard/CPU/mem

RolloZ170 · Jan 6, 2023

gsrcrxsi said:
and the Broadwell system actually processed the work faster, despite lower utilization

why lower ? should be 100% until the work is done !!!

RolloZ170 · Jan 6, 2023

gsrcrxsi said:
Other loads will show full 100% load on the Broadwell CPU.

as it should be. something is scheduling your work to lower utilization.

gsrcrxsi · Jan 6, 2023

RolloZ170 said:
why lower ? should be 100% until the work is done !!!

Well that’s kind of the point of this thread: to try to find out why the application acts so differently on this system. The application/script is doing reinforcement learning with PyTorch. It won’t use all of the CPU or the GPU, since it goes back and forth between the CPU and GPU during training of the 32x agents. But I’m trying to find out what is making these newer systems use so much more CPU, which in effect limits how many tasks I can run in parallel.

a single Xeon E5-2697Av4 has no problem servicing 5 tasks at once with about 40-50% total CPU use. But even an EPYC 7V12 with 4x the number of cores is pegged with only 10 tasks. It doesn’t make any sense.
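
To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes the Xeon exposes 32 hardware threads (16c/32t, as mentioned elsewhere in this thread), the EPYC 7V12 exposes 4x that (128), "pegged" means 100%, and 45% is used as the midpoint of the 40-50% range. The function name is made up for illustration.

```python
def busy_threads_per_task(logical_cpus: int, utilization: float, tasks: int) -> float:
    """Average number of fully busy hardware threads consumed per task."""
    return logical_cpus * utilization / tasks

# Assumed figures: 32 threads at ~45% for 5 tasks vs 128 threads at 100% for 10 tasks.
xeon = busy_threads_per_task(32, 0.45, 5)
epyc = busy_threads_per_task(128, 1.00, 10)

print(f"Xeon: ~{xeon:.1f} busy threads per task")
print(f"EPYC: ~{epyc:.1f} busy threads per task")
print(f"ratio: ~{epyc / xeon:.1f}x")
```

Under those assumptions each task keeps roughly 2.9 threads busy on the Xeon and 12.8 on the EPYC, so about 4.4x more CPU per task for the same work.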

CyklonDX · Jan 6, 2023

gsrcrxsi said:
i probably wouldn't feel comfortable posting it publicly since it's someone else's scientific research, but I can PM it to you.

or since you're familiar with BOINC, if you have a Linux/Nvidia system, you can attach to GPUGRID and download a task. you'll get sent the python code (~500 lines) and a python environment archive (2.7GB) with the task.

I wouldn't worry about it, it's an open platform. But if you feel like it, please PM.
I only have a Docker BOINC setup with Linux + NV, and sadly GPUGRID doesn't like a Docker env.

From what you described, this does indeed seem to have something to do with how they open the thread pool array on bigger systems.
( Even if i manage to help/change the script, you will need to give it to gpugrid staff for review so they can apply it to their repo)

gsrcrxsi · Jan 7, 2023

CyklonDX said:
( Even if i manage to help/change the script, you will need to give it to gpugrid staff for review so they can apply it to their repo)

yes for sure! they have seemed open to changes in the past

sending you a PM

I'm not sure if there is any limit in system size where the issue becomes present. I did compare this on a Ryzen 5950X also, which has the same 16c/32t as my Xeon. The 5950X had the same high CPU use issue as my other AMD EPYC systems which is why I thought it was an AMD vs Intel thing until another user observed high CPU use on a Cascade Lake Xeon also. That Cascade Lake system was running in docker though (a rented instance from vast.ai) but not sure if that matters.
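
One quick way to see whether a container changes what the process is allowed to use is to compare the host's logical CPU count with the set of CPUs the process may run on. This is only a sketch using the Python standard library; `usable_cpus` is a made-up helper name, and `os.sched_getaffinity` only exists on Linux.

```python
import os

def usable_cpus() -> int:
    """Logical CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback on other platforms

total = os.cpu_count() or 1   # logical CPUs the OS reports
allowed = usable_cpus()       # CPUs in this process's affinity mask
print(f"logical CPUs reported: {total}, usable by this process: {allowed}")
```

Note this only reflects CPU pinning (e.g. docker's `--cpuset-cpus`). A time quota such as docker's `--cpus` does not change the affinity mask, so code that sizes itself from the CPU count would still see every core.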

gsrcrxsi · Jan 9, 2023

just an update for everyone.

@CyklonDX generously helped look over the code and determined it's a software issue in the code causing unchecked CPU use. we were able to modify the code to put hard limits on the number of cores used, which both reduced the amount of total CPU used, and sped up task execution by almost double.

I think the fact that each process uses a little more CPU% on the newer CPUs isn't a problem and is just the better IPC allowing more work per cycle. the problem was the unchecked spawning of processes to fill the whole CPU when it wasn't necessary.
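
The actual change is discussed in the GPUGRID thread linked below and isn't reproduced here. As a generic sketch of the idea (not the project's code), this is one common way to put a hard cap on CPU use in a Python task: clamp the worker and thread counts instead of letting them default to the full core count. The helper names and the example values of 4 and 32 are assumptions for illustration; the environment variables are the usual ones read by OpenMP, MKL and OpenBLAS.

```python
import os

# Read by common threaded math backends when they are first loaded.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")

def clamp(requested: int, ceiling: int) -> int:
    """Keep a requested count within 1..ceiling."""
    return max(1, min(requested, ceiling))

def cap_threads(per_task: int) -> int:
    """Clamp the per-task thread budget and export it to the environment."""
    limit = clamp(per_task, os.cpu_count() or 1)
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(limit)
    return limit

limit = cap_threads(4)        # 4 is an arbitrary example budget
workers = clamp(32, limit)    # e.g. 32 requested worker processes, capped to the budget
# This has to run before importing torch/numpy. With PyTorch one could
# additionally call torch.set_num_threads(limit) after the import.
print(f"threads per task: {limit}, worker processes: {workers}")
```

The point of the clamp is that the cap stays the same no matter how many cores the host has, so a 64-core box doesn't end up with far more workers per task than a 16-core one.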

I originally thought this was some kind of Intel vs AMD thing which is why I started the topic in the CPU hardware forum, but it's actually an "old CPU vs new CPU" thing with a software bug being a major contributing factor.

can read more at the gpugrid project forums if you're interested
https://gpugrid.net/forum_thread.php?id=5233&nowrap=true#59693
