Will CPUs ever come to parity with GPUs for ML/AI/compute?

thrasher · Jul 12, 2021

I've been stuck in this quagmire for quite some time.

Right now, the whole demand distribution for GPUs is out of whack:

Gamers want the new hotness with 12 GB of GDDR6 to play Cyberpunk
Cryptominers want any and every GPU at any price to mine useless entropy (so they can buy more GPUs, of course)
Everyone and their uncle wants consumer GPUs to train neural network slaves
Amazon wants billions of datacenter GPUs because every other serverless AI company is eager to get locked into an AWS CUDA contract for 10 years
Everyday PC users want basic GPUs just to use Excel
Teenage internet arbitrageurs with infinite credit use bidding bots and eBay to wipe every OEM card priced below 200% of MSRP off the market

And of course the supply chain for GPUs is also going nuts:

All the auto manufacturers invoked "force majeure" to break their contracts last year, so the generic chip suppliers went into emergency shutdown and are still recovering from being offline
All the high-tech foundries were offline for a brief period last year, but are now so busy with custom chips (e.g. Tesla, Google TPUs, Apple M1) that they had to raise prices
Dies for memory and compute are getting absolutely enormous, so for a fixed number of 300 mm wafers there are fewer dies to go around
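To put a number on the "fewer dies to go around" point, here is a minimal sketch using a common gross-die approximation: wafer area divided by die area, minus the partial dies lost around the circular edge. It assumes a 300 mm wafer (the size leading-edge foundries use) and ignores yield, scribe lines and edge exclusion, so real counts are lower. The function name `dies_per_wafer` is hypothetical, and the die areas are illustrative, not specific products.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross dies per wafer (common approximation, no yield model).

    First term: wafer area / die area.
    Second term: partial dies lost along the circular wafer edge.
    Ignores defects, scribe lines and edge exclusion.
    """
    d, s = wafer_diameter_mm, die_area_mm2
    gross = math.pi * d**2 / (4 * s) - math.pi * d / math.sqrt(2 * s)
    return max(0, math.floor(gross))


if __name__ == "__main__":
    # Illustrative die sizes on a 300 mm wafer.
    for area in (100, 300, 600):
        print(f"{area} mm^2 die: {dies_per_wafer(300, area)} gross dies")
```

Under these assumptions a 100 mm² die gives about 640 gross candidates and a 600 mm² die about 90: six times the area, roughly seven times fewer dies, before yield loss (which also gets worse as die area grows).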

Meanwhile, modern CPUs with huge core counts and caches in the hundreds of MB are on eBay for $10-20 a core (e.g. Epyc). Ok, maybe some of them are broken. Used DDR4 ECC RAM from all the server companies that AWS ate is going for $3.50 / GB.

Do we eventually reach a point where it costs the same to deliver a given unit of neural network / simulation / AI compute in CPU or GPU form?
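One rough way to frame "the same cost per unit of compute" is dollars per peak GFLOP/s. The sketch below prices only CPU and RAM using the eBay figures quoted above (midpoint $15/core, $3.50/GB); the 64-core, 2.0 GHz, 256 GB configuration and the 32 FP32 FLOPs per cycle per core (two 256-bit FMA pipes, Zen 2 style) are assumptions, and the function names are hypothetical. Board, chassis, PSU and electricity are left out.

```python
def used_server_cost(cores: int, usd_per_core: float, ram_gb: int, usd_per_gb: float) -> float:
    """CPU + RAM only; motherboard, chassis, PSU and electricity are left out."""
    return cores * usd_per_core + ram_gb * usd_per_gb


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires its maximum FLOPs on every cycle."""
    return cores * clock_ghz * flops_per_cycle


# Prices from the post; the hardware figures are assumptions.
# 32 FP32 FLOPs/cycle/core assumes two 256-bit FMA pipes:
# 8 FP32 lanes x 2 FLOPs per FMA x 2 pipes.
cost = used_server_cost(cores=64, usd_per_core=15.0, ram_gb=256, usd_per_gb=3.50)
peak = peak_gflops(cores=64, clock_ghz=2.0, flops_per_cycle=32)
usd_per_gflops = cost / peak

if __name__ == "__main__":
    print(f"${cost:.0f} for {peak:.0f} peak FP32 GFLOP/s -> ${usd_per_gflops:.3f} per GFLOP/s")
```

With these inputs it works out to $1,856 for 4,096 peak FP32 GFLOP/s, about $0.45 per GFLOP/s. Feeding a GPU card's street price and its vendor-quoted peak into the same ratio gives the other side of the comparison; keep in mind that peak figures overstate what memory-bound ML workloads actually sustain on either device.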

As a part-time developer, I say this partly out of hatred for the whole GPU co-processor architecture and dev/deployment process. If I want to use a GPU for machine learning or data science, I have two options:

Use "Somebody Else's Library," e.g. CUDA, cuDNN, PyTorch, Keras, you name it, and write everything in Python with very few knobs for optimization, and get vendor-locked on NVIDIA cards
Write a machine learning library of my own with $INSERT_GPU_VENDOR's tooling (e.g. proprietary CUDA, or a vendor's Vulkan or OpenCL stack), pasting string-literal blocks of shader code into variables in my code and hoping they execute correctly

I don't want to write non-verifiable shaders. I have no interest in entertaining NVIDIA (or Apple, or even AMD) and their shareholders while they dump money into more and more elaborate closed-source GPU architectures and pump out shipping containers of $1k nonrepairable bricks of plastic, PCB and silicon that get 20% bigger in every conceivable dimension yearly.

I have no interest in playing lame, grimy open-world games that burn 600 watts; I already have one that burns 0 watts. It's called "walking through the average American alleyway."

Nor do I want some dumb FPGA co-processor. FPGAs have their place: hardware development and places where an ASIC is too expensive.

I'd like to write (and currently am writing!) ML code, of a sort, in conventional strongly typed languages compiled to CPU machine code. It is incredibly debuggable and verifiable. It scales to any CPU configuration or platform supported by GCC or Clang, which means anything from a toaster to a microwave to a supercomputer can run it. There are almost 50 years of compilable math and physics libraries available for CPUs. I can expand the RAM on my server without changing any other parameters; I can replace the thermal paste on its CPU every couple of years without voiding the warranty or risking some delicate power-supply chip; I can upgrade the CPU, and only the CPU, while generating minimal e-waste; and I can run any number of other tasks on the CPU at the same time, within the limits of the Linux task scheduler.

Other than memory bus width (1024 bits vs 64 bits), is there any specific reason why a high-core-count CPU (say, more than 64 cores) or a CPU cluster can't compete with a pile of GPUs on ML/AI tasks? Or even with a single GPU?
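The 1024-vs-64 figure compares one HBM2 stack with one DDR4 channel, and both kinds of device gang several of those together. A minimal sketch of the arithmetic, assuming one socket with 8 channels of DDR4-3200 versus 4 HBM2 stacks at 2.0 Gb/s per pin (illustrative configurations, not a specific product comparison; function names are hypothetical). For batch-size-1 inference, a matrix-vector product reads each FP32 weight once for one multiply-add, so bandwidth alone caps the achievable FLOP rate.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Theoretical peak GB/s: bytes per transfer x million transfers/s / 1000."""
    return bus_width_bits / 8 * data_rate_mtps / 1000


def bandwidth_bound_gflops(bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Ceiling on GFLOP/s when every operand has to be streamed from memory."""
    return bandwidth_gbs * flops_per_byte


# FP32 matrix-vector product (batch size 1): each 4-byte weight is read once
# and used in one multiply-add, i.e. 2 FLOPs per 4 bytes.
GEMV_FP32_FLOPS_PER_BYTE = 2 / 4

# Assumed configurations, for illustration only:
cpu_bw = 8 * peak_bandwidth_gbs(64, 3200)    # one socket, 8 channels of DDR4-3200
hbm_bw = 4 * peak_bandwidth_gbs(1024, 2000)  # 4 stacks of HBM2 at 2.0 Gb/s per pin

if __name__ == "__main__":
    for name, bw in (("8x DDR4-3200", cpu_bw), ("4x HBM2", hbm_bw)):
        ceiling = bandwidth_bound_gflops(bw, GEMV_FP32_FLOPS_PER_BYTE)
        print(f"{name}: {bw:.1f} GB/s -> at most {ceiling:.1f} GFLOP/s on FP32 GEMV")
```

Under these assumptions the HBM2 configuration has five times the bandwidth (1024 vs 204.8 GB/s), and for memory-bound work that ratio, not core count, sets the ceiling. Batching and large caches raise FLOPs per byte, which is where a many-core CPU can close some of the gap.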

Bert · Jul 12, 2021

Overhead of CISC instruction set?

thrasher · Jul 16, 2021

Bert said:
Overhead of CISC instruction set?

I guess maybe. Maybe it's just about parallelism: GPUs can process datasets that are just plain fatter.

RTM · Jul 16, 2021

I doubt it, well... maybe... it depends...

As I see it, CPUs and their capabilities are the product of the features everyone (more or less) needs.

I doubt we will see everyone needing to do training (so I do not expect it to be implemented in CPUs), but everyone needing inference seems quite likely (so it likely will be implemented). Of course, I believe I have read somewhere (probably in one of the articles on the main site) that inference support is already being built into newer designs.
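One example of inference support in shipping CPUs is Intel's AVX-512 VNNI (marketed as DL Boost), which multiplies 8-bit integers and accumulates into 32-bit integers. The sketch below shows only the arithmetic such instructions speed up, in plain Python rather than SIMD: symmetric per-tensor int8 quantization followed by an integer dot product and a rescale. The function names are hypothetical and the scheme is one simple choice among several used in practice.

```python
def quantize_int8(xs):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    peak = max((abs(x) for x in xs), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0  # avoid dividing by zero for all-zero input
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale


def int8_dot(a, b):
    """Approximate dot(a, b) using int8 operands and an integer accumulator."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    # Integer multiply-accumulate (an int32 accumulator in hardware).
    acc = sum(x * y for x, y in zip(qa, qb))
    # Undo both scales to get back to the original units.
    return acc * sa * sb


if __name__ == "__main__":
    a = [0.5, -1.0, 0.25, 2.0]
    b = [1.0, 0.5, -2.0, 0.75]
    exact = sum(x * y for x, y in zip(a, b))
    print(f"float: {exact:.4f}  int8: {int8_dot(a, b):.4f}")
```

The trade-off is a small accuracy loss in exchange for operands a quarter the size of FP32, which also cuts the memory traffic per weight.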

i386 · Jul 18, 2021

CPUs are for general use; other "processors" or accelerators are for special purposes. They have always existed alongside CPUs and "outperformed" them.

thrasher · Jul 18, 2021

I guess that's it. CPUs are jacks of all trades, but "masters of none."
