AMD EPYC 2 Rome What We Know Will Change the Game

  • Thread starter Patrick Kennedy

Edu

Member
Aug 8, 2017
It seems the new architecture increases off-die bandwidth and reduces off-die latency. This will benefit non-NUMA-aware applications, like databases. However, since all DRAM access has to happen off-die, the latency for NUMA-aware applications will certainly increase. Maybe they are using something like TSMC's InFO to connect the dies, which would allow them more bandwidth.
 

zir_blazer

Active Member
Dec 5, 2016
Edu said:
However, since all DRAM access has to happen off-die, the latency for NUMA-aware applications will certainly increase. Maybe they are using something like TSMC's InFO to connect the dies, which would allow them more bandwidth.
Was it confirmed that the Memory Controller is on the I/O chip? Somehow it seems horrible from a latency standpoint, but the TR-W series showed that pure compute cores could scale "well enough".
Also, this would make a single Rome fully UMA. Latency to memory should be uniform across all the different dies, as every CPU die has to do one hop to the I/O die to get to the RAM. If anything, you still have thread-to-CPU allocation concerns due to SMT, cache sharing, inter-CCX latency, etc., but the RAM should be uniform.
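The UMA point can be illustrated with a toy hop counter comparing the two layouts. This is just a sketch under my own assumptions: the function names and hop costs are illustrative, not AMD's numbers.

```python
def naples_hops(src_die, mem_die, same_socket=True):
    # Naples-style: each CPU die has its own memory controller.
    # Local memory costs 0 die-to-die hops; another die's memory
    # costs 1 hop over the fabric; a remote socket adds one more.
    hops = 0 if src_die == mem_die else 1
    if not same_socket:
        hops += 1
    return hops

def rome_hops(same_socket=True):
    # Rome as described above: all DRAM hangs off the central I/O
    # die, so every CPU die is exactly one hop from every channel
    # on its socket -- uniform, but never zero.
    return 1 if same_socket else 2

# Naples: hop count varies per die pair (0 or 1 within a socket).
# Rome: always 1 within a socket, hence the "fully UMA" behavior.
```

The trade-off the thread is debating falls out directly: Rome trades Naples' best case (0 hops to local memory) for uniformity (always 1 hop).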
 

Edu

Member
Aug 8, 2017
zir_blazer said:
Was it confirmed that the Memory Controller is on the I/O chip? Somehow it seems horrible from a latency standpoint, but the TR-W series showed that pure compute cores could scale "well enough".
Also, this would make a single Rome fully UMA. Latency to memory should be uniform across all the different dies, as every CPU die has to do one hop to the I/O die to get to the RAM. If anything, you still have thread-to-CPU allocation concerns due to SMT, cache sharing, inter-CCX latency, etc., but the RAM should be uniform.
Yes, it was confirmed. Indeed, RAM access will be more uniform, but that hardly matters. Really, you just want access to be as fast as possible.
 

zir_blazer

Active Member
Dec 5, 2016
Edu said:
Yes, it was confirmed. Indeed, RAM access will be more uniform, but that hardly matters. Really, you just want access to be as fast as possible.
Technically, that dramatically simplifies the EPYC platform. The memory channels and inter-socket I/O are now single, extremely wide links instead of a multitude of narrow paths where each die had its own I/O controllers.

CPU Die -> I/O Die -> Memory Bank/CPU Die, or CPU Die -> I/O Die -> I/O Die -> Memory Bank/CPU Die in dual-EPYC setups. Doesn't seem that bad. Reminds me of early Pentium 2/3-era Xeons, where you had 4 processors wired in parallel on the same FSB along with one or two Northbridges, each with its own Memory Controller and PCI Host Bridge.
Still, latency should be higher overall. There should be a few scenarios where you have a performance regression due to that...
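One way to check how the OS actually sees a given topology is the ACPI SLIT distance matrix that Linux exposes under sysfs. A minimal sketch for reading it; the sysfs path is standard Linux, but the helper names are mine:

```python
import glob
import os

def parse_distances(text):
    # A SLIT row is space-separated relative latencies; by
    # convention 10 means local-node access.
    return [int(tok) for tok in text.split()]

def numa_distance_matrix(base="/sys/devices/system/node"):
    # One row per NUMA node. A single-row matrix of [10] means the
    # OS sees the package as UMA (one node, uniform latency); a
    # Naples-style 4-die socket shows a 4x4 matrix instead.
    matrix = []
    for path in sorted(glob.glob(os.path.join(base, "node*", "distance"))):
        with open(path) as f:
            matrix.append(parse_distances(f.read()))
    return matrix
```

Comparing the matrix (or just benchmark numbers per node with `numactl --membind`) before and after a platform change would show exactly the regression scenarios mentioned above.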