AMD EPYC 2 Rome What We Know Will Change the Game

  • Thread starter Patrick Kennedy

Edu

Member
Aug 8, 2017
It seems the new architecture increases off-die bandwidth and reduces off-die latency. This will benefit non-NUMA-aware applications, like databases. However, since all DRAM access has to happen off-die, the latency for NUMA-aware applications will certainly increase. Maybe they are using something like TSMC's InFO to connect the dies, which would allow them more bandwidth.
 

zir_blazer

Active Member
Dec 5, 2016
Edu said:
However, since all DRAM access has to happen off-die, the latency for NUMA-aware applications will certainly increase. Maybe they are using something like TSMC's InFO to connect the dies, which would allow them more bandwidth.
Was it confirmed that the Memory Controller is on the I/O chip? Somehow it seems horrible from a latency standpoint, but the TR-W series showed that pure compute cores could scale "well enough".
Also, this would make a single Rome fully UMA. Latency to memory should be uniform across all the different dies, as every CPU die has to do one hop to the I/O die to get to the RAM. If anything, you still have thread-to-CPU allocation concerns due to SMT, cache sharing, inter-CCX latency, etc., but the RAM should be uniform.
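The UMA point can be illustrated with a toy hop counter comparing the two layouts. This is just a sketch under my own assumptions: the function names and hop costs are illustrative, not AMD's numbers.

```python
def naples_hops(src_die, mem_die, same_socket=True):
    # Naples-style: each CPU die has its own memory controller.
    # Local memory costs 0 die-to-die hops; another die's memory
    # costs 1 hop over the fabric; a remote socket adds one more.
    hops = 0 if src_die == mem_die else 1
    if not same_socket:
        hops += 1
    return hops

def rome_hops(same_socket=True):
    # Rome as described above: all DRAM hangs off the central I/O
    # die, so every CPU die is exactly one hop from every channel
    # on its socket -- uniform, but never zero.
    return 1 if same_socket else 2

# Naples: hop count varies per die pair (0 or 1 within a socket).
# Rome: always 1 within a socket, hence the "fully UMA" behavior.
```

The trade-off the thread is debating falls out directly: Rome trades Naples' best case (0 hops to local memory) for uniformity (always 1 hop).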
 

Edu

Member
Aug 8, 2017
zir_blazer said:
Was it confirmed that the Memory Controller is on the I/O chip? Somehow it seems horrible from a latency standpoint, but the TR-W series showed that pure compute cores could scale "well enough".
Also, this would make a single Rome fully UMA. Latency to memory should be uniform across all the different dies, as every CPU die has to do one hop to the I/O die to get to the RAM. If anything, you still have thread-to-CPU allocation concerns due to SMT, cache sharing, inter-CCX latency, etc., but the RAM should be uniform.
Yes, it was confirmed. Indeed, RAM access will be more uniform, but that hardly matters. Really, you just want access to be as fast as possible.
 

zir_blazer

Active Member
Dec 5, 2016
Edu said:
Yes, it was confirmed. Indeed, RAM access will be more uniform, but that hardly matters. Really, you just want access to be as fast as possible.
Technically, that dramatically simplifies the EPYC platform. The memory channels and inter-socket I/O are now single, extremely wide links instead of a multitude of narrow paths where each die had its own I/O controllers.

CPU Die -> I/O Die -> Memory Bank/CPU Die, or CPU Die -> I/O Die -> I/O Die -> Memory Bank/CPU Die in dual-EPYC setups. Doesn't seem that bad. Reminds me of early Pentium 2/3-era Xeons, where you had 4 processors wired in parallel on the same FSB along with one or two Northbridges, each with its own Memory Controller and PCI Host Bridge.
Still, latency should be higher overall. There should be a few scenarios where you have a performance regression due to that...
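One way to check how the OS actually sees a given topology is the ACPI SLIT distance matrix that Linux exposes under sysfs. A minimal sketch for reading it; the sysfs path is standard Linux, but the helper names are mine:

```python
import glob
import os

def parse_distances(text):
    # A SLIT row is space-separated relative latencies; by
    # convention 10 means local-node access.
    return [int(tok) for tok in text.split()]

def numa_distance_matrix(base="/sys/devices/system/node"):
    # One row per NUMA node. A single-row matrix of [10] means the
    # OS sees the package as UMA (one node, uniform latency); a
    # Naples-style 4-die socket shows a 4x4 matrix instead.
    matrix = []
    for path in sorted(glob.glob(os.path.join(base, "node*", "distance"))):
        with open(path) as f:
            matrix.append(parse_distances(f.read()))
    return matrix
```

Comparing the matrix (or just benchmark numbers per node with `numactl --membind`) before and after a platform change would show exactly the regression scenarios mentioned above.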