All the LGA4677 Sapphire Rapids pictures I have

Syr

New Member
Sep 10, 2017
20
13
3
That core count doesn't sound right, since the top-end SKU is supposed to be 56 cores, with a maximum total of 64 cores including the dark silicon. Each chiplet/tile has 16 possible cores on it. Maybe they meant 28 threads on one tile (as tiles show up to most software as 28 cores), or it's actually 2 tiles active with 14 cores each, for a total of 28 physical cores and 56 threads?
The tiles themselves are so big because:
1. Intel uses a less dense transistor topology for much of their CPU core logic on Intel 10nm than AMD does on TSMC 7nm. This lets them aggressively push clock speed on some SKUs. AMD, meanwhile, much prefers a dense, efficiency-optimized topology for the significant majority of their core CPU logic, trading off the ability to hit high-clocking bins as easily. Thus, even with similar IPC, a Sapphire Rapids Golden Cove core takes up far more die area than a Zen 3 core.
2. There's a lot of redundant dark silicon, not just for CPU cores but also for IO. This is technically yield-motivated, but the reason isn't to rush it out for investor relations' sake (making a fully yielding 16-core 10nm tile is entirely within Intel's ability at this point), but rather to be able to supply adequate volume of high-core-count chips to hyperscaler customers. There was some industry drama about inadequate volume for Skylake's 'early shipment' SKUs.
3. Sapphire Rapids has a lot more IO than Ice Lake (which has more IO than Skylake/Cascade Lake/Cooper Lake). As of 2020H2, I had heard that the plan was for the amount of IO of different types you get to depend on the SKU you buy. There was talk of bringing back single-socket chips (think the E5-16xx series) with near-Epyc levels of PCIe IO, and supposedly dual-socket configurations might have very slightly more IO than comparable dual-socket Epyc configurations, though individually those dual-socket SKUs have less IO than the single-socket SKUs. So when you factor in all that redundant IO on top of what's actually enabled, that's a huge amount of die area for the uncore silicon. Supposedly the reason is that Intel evolved their inter-socket interconnect to make use of toggled lanes like Epyc does, but it seems they use silicon fuses to set the behavior, rather than switching it dynamically at boot like Epyc can.
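Quick sanity check on the core-count arithmetic above. The tile count and per-tile core count here are the rumored figures discussed in this thread, not confirmed specs:

```python
# Rumored figures from this thread, not official Intel specs.
CORES_PER_TILE = 16      # physical cores laid out per tile
TILES = 4                # assumed tile count for the top package

max_cores = CORES_PER_TILE * TILES      # including dark silicon
top_sku_cores = 56                      # rumored top-end SKU
disabled = max_cores - top_sku_cores    # spare/dark cores kept for yield

# the "28 cores" reading: 2 tiles with 14 active cores each
two_tile_cores = 2 * 14
two_tile_threads = two_tile_cores * 2   # with Hyper-Threading

print(max_cores, disabled, two_tile_cores, two_tile_threads)
# 64 dark-silicon maximum, 8 spare cores, 28 cores / 56 threads
```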
 

111alan

Member
Mar 11, 2019
98
29
18
Harbin Institute of Technology
Syr said: (full post quoted above)
For the size, I think 28 cores per die is completely possible. Two of those dies combined seem to be slightly larger than the single Skylake XCC die, and 10nm is more than twice the density of 14nm++. Even if there is a total of 128 PCIe lanes, it shouldn't take much area, as Skylake-XCC already has 64 per die (16 of them are dedicated to OPA, so only 48 are exposed). At the same ratio they could get you up to 256 PCIe lanes' worth of IO, though of course the socket and interconnect bandwidth won't allow it.
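The lane arithmetic in that paragraph works out like this. Numbers are as quoted in this thread (64 laid-out lanes per Skylake-XCC die, 16 reserved for OPA), and the 4-tile package is an assumption:

```python
# Figures as quoted in this thread, not official documentation.
skl_lanes_per_die = 64                        # lanes laid out per Skylake-XCC die
opa_reserved = 16                             # dedicated to Omni-Path
exposed = skl_lanes_per_die - opa_reserved    # usable PCIe lanes per die

# same lanes-per-die ratio scaled to an assumed 4-tile package
tiles = 4
lane_budget = skl_lanes_per_die * tiles       # die area "worth" of lanes

print(exposed, lane_budget)  # 48 exposed per die, 256 lanes' worth of IO area
```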

And for the cores, look at the die sizes of Cypress Cove and Ice Lake-SP; they're not that large at all. Ice Lake-XCC gave you half again more cores and much more cache and IO while shrinking 50mm² in size, and it still wastes a lot of empty space. I think for the actual cores, the size difference between SKL and ICL isn't as large as some people want you to believe.

And since there's a mass-produced variant with 42 cores, and they're still making a lot of core count + iGPU variants, I think their yield rate should be close to 14nm at this point, or they'll just make one design and fuse off the bad parts if they have a lot of waste to deal with.

For the IPC, as far as I can tell from the 14-core 2GHz sample we have, Ice Lake is already far superior to Zen 3 overall (vs the 5800X). I don't think they'll change the architecture much in SPR, but they could add more cache, including HBM. Actually, the "AMD has better IPC" claim is pretty much a misconception at this point. AMD does well in cache-bound benchmarks like CPU-Z and Cinebench because they have a huge, ultra-aggressive cache and a cache structure like Haswell's, and some people hold those so dear they keep bringing them up like they're an industry standard. From a measured-IPC standpoint, Zen 2 is far inferior to Skylake, and Zen 3 is only about 10% faster than Skylake per core per GHz in rendering, while Ice Lake is about 25-30% faster. The reason is that IPC is an end result: it includes front-end and memory-bound stalls, which is where AMD is not as strong right now.
[Attached image: IPC.png]
Measured in VTune and uProf. 7-Zip compresses the Red Alert 3 game files, and 3ds Max renders an actual gym design scene with V-Ray. No 5800X for now because AMD still hasn't updated uProf to support it.
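For anyone wanting to redo this kind of comparison: "per core per GHz" just means dividing the throughput score by active cores and clock. A minimal sketch, with entirely hypothetical scores and clocks (not my measured data):

```python
# Hypothetical placeholder numbers, not the measurements from the chart above.
def perf_per_core_per_ghz(score: float, cores: int, ghz: float) -> float:
    """Normalize a throughput score by core count and sustained clock."""
    return score / (cores * ghz)

# e.g. a 14-core part at 2.0 GHz scoring 2800 points
# vs an 8-core part at 3.8 GHz scoring 2128 points
a = perf_per_core_per_ghz(2800, 14, 2.0)   # 100.0 per core per GHz
b = perf_per_core_per_ghz(2128, 8, 3.8)    # 70.0 per core per GHz
print(a / b)  # normalized advantage of the first part
```

Note this only removes core count and frequency; it doesn't separate front-end, memory-bound, or cache effects, which is exactly why the top-down counters in VTune/uProf are still needed.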
 