By the power of Cunningham's law, I will attempt to partially answer my own questions.
From the slides, it appears that the bus width of a GMI3 link is still 32 bytes per clock for reads (16 bytes for writes), just as in previous generations.
And the frequency is stated as "1.8GHz max". No idea what to make of "up-to-36Gbps, 20:1". Can anyone enlighten me on that?
If we roll with a 32-byte bus width at 1.8GHz, that's 57.6 GB/s of read bandwidth for a single GMI3 link.
230.4 GB/s for 4 links
460.8 GB/s for 8 links
691.2 GB/s for 12 links
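The arithmetic above is just bytes per clock times clock rate times link count. Here is a quick sketch, assuming the 32 bytes/clock read width and 1.8GHz clock read off the slides (neither is a confirmed spec):

```python
# Back-of-envelope GMI3 read bandwidth.
# ASSUMPTIONS (from the slides, not confirmed): 32 bytes per clock for reads,
# 1.8 GHz link clock.
READ_BYTES_PER_CLOCK = 32
CLOCK_GHZ = 1.8


def gmi3_read_bandwidth_gbs(links: int) -> float:
    """Aggregate read bandwidth in GB/s (decimal) for `links` GMI3 links."""
    # bytes/clock * 10^9 clocks/s = 10^9 bytes/s, i.e. GB/s
    return READ_BYTES_PER_CLOCK * CLOCK_GHZ * links


for n in (1, 4, 8, 12):
    print(f"{n:2d} link(s): {gmi3_read_bandwidth_gbs(n):.1f} GB/s")
```

This prints 57.6, 230.4, 460.8 and 691.2 GB/s, matching the figures above.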
The theoretical memory bandwidth of 12xDDR5-4800 is 460.8GB/s. What a coincidence.
By my math, 8 GMI3 links would be just enough to match the full memory bandwidth, at least for reads.
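To check the "8 links" figure, here is a small sketch that compares memory bandwidth to GMI3 read bandwidth. It assumes a 64-bit (8-byte) DDR5 channel, i.e. two 32-bit subchannels without ECC bits, and the 32 bytes/clock at 1.8GHz GMI3 read figures from the slides. Everything is kept in integer MB/s so the ceiling division is exact:

```python
# ASSUMPTIONS: each DDR5 channel moves 8 bytes per transfer (ECC ignored);
# GMI3 reads move 32 bytes per clock at 1800 MHz (figures from the slides).
CHANNEL_BYTES = 8
GMI3_READ_MBS_PER_LINK = 32 * 1800  # 57600 MB/s = 57.6 GB/s


def ddr5_bandwidth_mbs(channels: int, mts: int) -> int:
    """Theoretical DDR5 bandwidth in MB/s for `channels` channels at `mts` MT/s."""
    return channels * mts * CHANNEL_BYTES


def links_to_match(memory_mbs: int) -> int:
    """Smallest number of GMI3 links whose combined read bandwidth >= memory_mbs."""
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-memory_mbs // GMI3_READ_MBS_PER_LINK)


mem = ddr5_bandwidth_mbs(12, 4800)
print(mem / 1000, "GB/s")
print(links_to_match(mem), "links")
```

For 12x DDR5-4800 this gives 460.8 GB/s and exactly 8 links, with no headroom to spare.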
The first question remains, though. The article states: "The 4x CCD variants (up to 32 cores) have an interesting trick where they can get 2x the links to the IO die per CCD"
I am stumbling over "can" here, which also appears in the AMD slides. Are there Epyc 9004 CPUs with 4 CCDs, but only one GMI3 link per CCD?
I'm gonna take a wild guess here: the CPUs with only 64MB of L3 cache total have one GMI3 link per CCD.