All the LGA4677 Sapphire Rapids pictures I have


111alan

Active Member
Mar 11, 2019
290
107
43
Harbin Institute of Technology
All from my friend, who bought and delidded one. We also heard that this unit currently only has 1 of 4 dies active, with 28 cores. He's trying to find an eval board for this thing so he can test it himself.

There seem to be 4 HBM pads on the corners that aren't soldered, though.
 

Attachments

Syr

Member
Sep 10, 2017
55
20
8
That core count doesn't sound right, since the top-end SKU is supposed to be 56 cores, with a maximum total of 64 cores including the dark silicon. Each chiplet/tile can have up to 16 cores on it. Maybe they meant 28 threads on one tile (threads show up to most software as 28 "cores"), or it's actually 2 tiles active with 14 cores each, for a total of 28 physical cores and 56 threads?
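As an aside, on Linux you can separate physical cores from hardware threads directly instead of trusting what software reports as "cores"; a minimal sketch, assuming the usual /proc/cpuinfo layout:
[CODE=python]
# Count physical cores vs logical CPUs (threads) on Linux.
# Each unique (physical id, core id) pair is one physical core;
# each "processor" entry is one logical CPU.
def topology():
    cores, logical = set(), 0
    pkg = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("processor"):
                logical += 1
            elif line.startswith("physical id"):
                pkg = line.split(":")[1].strip()
            elif line.startswith("core id"):
                cores.add((pkg, line.split(":")[1].strip()))
    return len(cores), logical

physical, logical = topology()
# A 28-core part with Hyper-Threading would report 28 physical, 56 logical.
print(f"{physical} physical cores, {logical} hardware threads")
[/CODE]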
The tiles themselves are so big because:
1. Intel uses a less dense transistor topology for much of their CPU core logic on Intel 10nm than what AMD uses on TSMC 7nm. This lets them push clock speed aggressively on some SKUs. AMD, meanwhile, much prefers a dense, efficiency-optimized topology for the significant majority of their core CPU logic, trading off the ability to hit high-clocking bins as easily. Thus, even with similar IPC, a Sapphire Rapids Golden Cove core takes up far more die area than a Zen 3 core.
2. There's a lot of redundant dark silicon, not just for CPU cores but also for IO. This is technically yield-motivated, but the reason isn't to rush the product out for investor-relations' sake (making a fully yielding 16-core 10nm tile is entirely within Intel's ability at this point); rather, it's to be able to supply adequate volume of high-core-count chips to hyperscaler customers. There was some industry drama about inadequate volume for Skylake's 'early shipment' SKUs.
3. Sapphire Rapids has a lot more IO than Ice Lake (which has more IO than Skylake/Cascade Lake/Cooper Lake). As of 2020H2, I had heard the plan was that the amount of IO of different types you get will depend on the SKU you buy. There was talk of bringing back single-socket chips (think the E5-16xx series) with near-Epyc levels of PCIe IO, and supposedly dual-socket configurations might have very slightly more IO than comparable dual-socket Epyc configurations, though individually those dual-socket SKUs have less IO than the single-socket SKUs. So when you factor in all that redundant IO on top of what's actually enabled, that's a huge amount of die area for the uncore silicon. Supposedly the reason is that Intel evolved their inter-socket interconnect to make use of toggled lanes like Epyc does, but it seems they are using silicon fuses to set the behavior, rather than doing it dynamically at boot like Epyc does.
 

111alan

Active Member
Mar 11, 2019
290
107
43
Harbin Institute of Technology
That core count doesn't sound right, since the top-end SKU is supposed to be 56 cores, with a maximum total of 64 cores including the dark silicon. [...] The tiles themselves are so big because: [...]
For the size, I think 28 cores per die is completely possible. Two of those dies combined seem to be slightly larger than the single Skylake XCC die, and 10nm is more than twice the density of 14nm++. Even if there is a total of 128 PCIe lanes, it shouldn't take much area, as Skylake XCC already has 64 per die (16 of them dedicated to OPA, so only 48 exposed). At the same ratio they could get you up to 256 PCIe lanes' worth of IO, though of course the socket and interconnect bandwidth won't allow it.
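A quick back-of-envelope check of that, with rough numbers I'm assuming rather than confirmed specs (~700mm² for Skylake XCC, ~2x density gain):
[CODE=python]
# Back-of-envelope: is 28 cores per SPR tile plausible?
# All numbers below are rough assumptions, not confirmed specs.
skl_area = 700.0        # mm^2, Skylake-SP XCC die (approximate)
skl_cores = 28
density_gain = 2.0      # Intel 10nm vs 14nm++, conservative

# "Two of those dies combined" is slightly larger than one SKL XCC die:
spr_tile_area = skl_area * 1.1 / 2          # ~385 mm^2 per tile

cores_per_mm2 = skl_cores / skl_area        # at 14nm++ density
possible = spr_tile_area * cores_per_mm2 * density_gain
print(f"~{possible:.0f} cores' worth of 10nm silicon per tile")   # ~31
[/CODE]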

And for the cores, look at the die sizes of Cypress Cove and Ice Lake-SP; they're not that large at all. Ice Lake XCC gave you 50% more cores and much more cache and IO while shrinking ~50mm² in size and still wasting a lot of empty space. I think for the actual cores, the size difference between SKL and ICL isn't as large as some people want you to believe.

And since there's a mass-produced variant with 42 cores, and they're still making a lot of core-count+iGPU variants, I think their yield rate should be close to 14nm at this point; otherwise they would just make one design and fuse off the bad parts if they had a lot of waste to deal with.

As for IPC, from what I can tell from the 14-core 2GHz sample we have, Ice Lake is already far superior to Zen 3 overall (vs the 5800X). I don't think they'll change the architecture much in SPR, but they could add more cache, including the addition of HBM. Actually, the "AMD has better IPC" claim is pretty much a misconception at this point. AMD does well in cache-bound benchmarks like CPU-Z and Cinebench because they have a huge, ultra-aggressive cache and a cache structure like Haswell's, and some people hold those benchmarks so dear that they keep bringing them up as if they were an industry standard. From the measured-IPC standpoint, Zen 2 is far inferior to Skylake, and Zen 3 is only about 10% faster than Skylake per core per GHz in rendering, while Ice Lake is about 25-30% faster. The reason is that IPC is an end result: of course it includes front-end and memory-bound effects, which is where AMD is not as good right now.
IPC.png
Measured in VTune and uProf. 7-Zip compresses a Red Alert 3 game file, and 3ds Max renders an actual gym design scene with V-Ray. No 5800X for now because AMD still hasn't updated uProf to support it.
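For reference, "per core per GHz" here is just the score normalized by core count and average clock; a sketch with made-up placeholder numbers, not the measured data above:
[CODE=python]
# Normalize a throughput score to per-core, per-GHz so different CPUs
# can be compared architecturally. Placeholder numbers, not the chart data.
def per_core_per_ghz(score, cores, avg_ghz):
    return score / (cores * avg_ghz)

skl = per_core_per_ghz(score=1000, cores=28, avg_ghz=2.7)   # hypothetical
icl = per_core_per_ghz(score=1035, cores=28, avg_ghz=2.2)   # hypothetical
print(f"ICL vs SKL, per core per GHz: {icl / skl:.2f}x")    # ~1.27x
[/CODE]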
 

Bert

Well-Known Member
Mar 31, 2018
822
383
63
45
Hey, checking in here: will Intel release any chiplet-based Sapphire Rapids? I am still confused about Intel's roadmap.

Btw, dumb question, but what is the benefit of chiplets (single package for multiple CPUs) vs the traditional multi-socket approach? Isn't the latter better for cooling, power delivery, I/O, etc.? We are now getting brick-sized CPUs thanks to AMD, which are hard to handle and cool.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,154
1,547
113
Hey, checking in here: will Intel release any chiplet-based Sapphire Rapids? I am still confused about Intel's roadmap.
SC21 Intel press deck FINAL   -page-011.jpg
Btw, dumb question, but what is the benefit of chiplets
They can raise the yield rate because each chip is smaller, and they can use different process technologies for different types of chips (i.e., core die, IO die).
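To put a rough number on the yield point, here is the classic Poisson defect model Y = e^(-D*A), with an assumed defect density:
[CODE=python]
# Poisson yield model: Y = exp(-D * A). D is an assumed defect density,
# not a published figure for any real process.
from math import exp

D = 0.001                     # defects per mm^2 (assumed, = 0.1/cm^2)
mono = exp(-D * 600)          # one big 600 mm^2 monolithic die
chiplet = exp(-D * 80)        # one small 80 mm^2 chiplet

print(f"600mm2 die yield: {mono:.1%}")      # ~54.9%
print(f"80mm2 chiplet:    {chiplet:.1%}")   # ~92.3%
# The win isn't the combined yield of 8 chiplets (similar total area,
# similar product of yields); it's that a defect scraps only 80 mm^2
# of silicon instead of the whole 600 mm^2 die.
[/CODE]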
 

Wasmachineman_NL

Wittgenstein the Supercomputer FTW!
Aug 7, 2019
1,872
617
113
Btw, dumb question, but what is the benefit of chiplets (single package for multiple CPUs) vs the traditional multi-socket approach? [...]
Chiplets increase yields significantly and, in the case of RDNA3 or Zen 4, allow mixed nodes: IOD on 6nm and CCDs on 5nm.
 

Bert

Well-Known Member
Mar 31, 2018
822
383
63
45
OK, so this is all about reducing costs through smaller dies and mixed lithography. Last time I checked, Threadripper CPUs were not much cheaper, but I get it now and it makes sense.

Why do they need to package them into a single one? Is there a performance benefit? I'd rather have two 32-core CPUs, where power delivery and cooling are simpler, than a single 64-core CPU, if the pricing is linear.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,154
1,547
113
Last time I checked, Threadripper CPUs were not much cheaper, but I get it now and it makes sense.
Cheaper for the manufacturer, mostly not passed on to the buyer/customer. Nice for the stockholders...
Why do they need to package them into a single one?
If one core needs to access the memory of another socket, the bus interconnect across sockets is slooooooow.
 

Bert

Well-Known Member
Mar 31, 2018
822
383
63
45
If one core needs to access the memory of another socket, the bus interconnect across sockets is slooooooow.
But I've read about the same problem in chiplet designs. Is the memory latency smaller with chiplets?
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,154
1,547
113
But I've read about the same problem in chiplet designs. Is the memory latency smaller with chiplets?
Shorter signal paths, so much faster, and more lanes are possible.
The other main issue with big dies is thermal expansion (the die changes size with temperature), and they are soldered onto the PCB (BGA).
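You can actually see the socket-hop penalty yourself with a pointer-chase microbenchmark pinned via numactl; a crude sketch (Python adds interpreter overhead, but the local-vs-remote difference still shows):
[CODE=python]
# Pointer-chase latency probe. Run it pinned to one node with local and
# then remote memory and compare, e.g.:
#   numactl --cpunodebind=0 --membind=0 python3 chase.py   # local
#   numactl --cpunodebind=0 --membind=1 python3 chase.py   # remote socket
import random, time

N = 1 << 22                        # ~4M entries, far bigger than L3
perm = list(range(N))
random.shuffle(perm)               # random cycle defeats the prefetchers
nxt = [0] * N
for i in range(N - 1):
    nxt[perm[i]] = perm[i + 1]
nxt[perm[-1]] = perm[0]

i, hops = 0, 5_000_000
t0 = time.perf_counter()
for _ in range(hops):
    i = nxt[i]                     # each hop is a dependent load
dt = time.perf_counter() - t0
print(f"{dt / hops * 1e9:.0f} ns per hop (incl. interpreter overhead)")
[/CODE]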
 

Bert

Well-Known Member
Mar 31, 2018
822
383
63
45

RolloZ170

Well-Known Member
Apr 24, 2016
5,154
1,547
113
OK now it all makes sense, thank you very much. Intel really messed this up; funny how they reacted first:
AMD did the same to Intel years ago over the Core 2 Quad processors (two dies on a single package)... tit for tat.
 

111alan

Active Member
Mar 11, 2019
290
107
43
Harbin Institute of Technology
For completeness:
these processors do not have HBM. The package with HBM looks like this:
View attachment 25254
One of my friends is already testing it. No newer stepping though.

It seems Intel botched this generation; the L3 latency is abysmal, dragging the memory latency up with it. I think they are still fixing this. There is an E5-stepping sample which my friend still hasn't gotten his hands on.
 

Attachments


111alan

Active Member
Mar 11, 2019
290
107
43
Harbin Institute of Technology
AMD did the same to Intel years ago over the Core 2 Quad processors (two dies on a single package)... tit for tat.
The times are different. Back then the throughput of the cores was far smaller, so a CPU could get away with slower memory. Also, most applications couldn't use more than two cores, so the extra cores were mostly decoration anyway.

Now CPU core complexes, with far more cores and much wider architectures and FP units, are hundreds of times more powerful than a Core 2 Duo, but DDR4 has less than 4 times the bandwidth of DDR2, and the latency improvements are even more minimal. That is the problem. Actually, I think Intel has the right to badmouth AMD's MCM solution: it's very primitive, with no advanced packaging, and the use of a SerDes design means the loaded latency can skyrocket. The end result, for example in the real-world rendering scenario below, is that a 64-core 7B13 does not have any real advantage over a 32-core 8375C. And this is the very app AMD advertised in their presentations; some other apps, like modeling and sculpting, are even worse, considered unusable by some designers here. In recent months the prices of 3rd-gen Xeon models went up a lot here, and some models like the 8375C became hard to come by. I think that is the reason.
8375_vs_7B13.JPG
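The bandwidth side of that claim is easy to sanity-check with peak per-channel numbers (using common top speeds of each era):
[CODE=python]
# Peak theoretical bandwidth per channel = transfers/s * 8 bytes.
ddr2 = 800e6 * 8 / 1e9      # DDR2-800:   6.4 GB/s per channel
ddr4 = 3200e6 * 8 / 1e9     # DDR4-3200: 25.6 GB/s per channel

print(f"DDR4-3200 vs DDR2-800, per channel: {ddr4 / ddr2:.1f}x")   # 4.0x
# Channel counts per socket grew too (roughly 2-4 then vs 6-8 now),
# but core counts grew from ~4 to 64, so bandwidth per core still fell.
[/CODE]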

BTW, on the PTS link: I'm still working on the EPYC 3 results; it refuses to run on the older Ubuntu version I used to test the other CPUs. There is an overall score you can select to show.

 

Bert

Well-Known Member
Mar 31, 2018
822
383
63
45
The times are different. [...] Actually, I think Intel has the right to badmouth AMD's MCM solution: it's very primitive, with no advanced packaging, and the use of a SerDes design means the loaded latency can skyrocket. [...]

I always thought real-world applications on AMD would be negatively impacted by slower memory access. IIUC, AMD works around that problem by bringing in lots of L3 cache. It seems like AMD's strategy is:
- Simpler dies that are fused together to achieve high core counts without increasing costs
- Lots of cache to take advantage of the die space that comes with shrinkage

Intel's strategy:
- A high-end CPU design, to have the best scalable CPUs with monolithic dies
- Custom logic to accelerate specific workloads, such as AVX-512 and accelerators for networking and database-specific operations

The two strategies align with each company's strengths.

- Intel wants to bring innovative chip designs that can be leveraged by highly customized software, locking in more and more software, and hence future sales with high-margin profits. Intel is massive and can execute on designing complex chips.
- AMD wants to bring pure power based on a simple design and offer it at acceptable margins to proliferate their offerings. AMD cannot, or at least was not able to, design complex chips, being a smaller company.

AMD's strategy can be a winning one, but it can easily be imitated by Intel. Eventually Intel will make chiplets work, perhaps they are almost there, and AMD's only competitive advantage will be being nimble. If there is a race to the bottom, Intel now cannot win, as AMD secured its finances over the last 2 years.
If Intel cannot convince hyperscalers to stay on their platform and optimize their software for it (and we know that Meta moved to AMD, and several of them have AMD offerings), Intel will be in trouble. On the other hand, I can totally see Wintel coming back, with Azure taking advantage of Intel's highly customized CPUs to offer cloud services at a competitive price. Then again, I don't see such offerings from any of the cloud providers yet.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,154
1,547
113
Eventually Intel will make chiplets work, perhaps they are almost there, and AMD's only competitive advantage will be being nimble.
Intel made chiplets before AMD. AMD makes chiplets because it's easier to change one die than all of them, and AMD does not have its own chip fab.
Using the same core die for many processors is an advantage at first, but many disadvantages come with it:
the PCB substrate becomes more complex and expensive, and the core-die <> IO-die interconnect is a bottleneck.
AMD could have made AM5 CPUs good, but instead made them cooler-compatible with AM4 and worsened the temps with an IHS thicker than required.
AMD moves too fast and makes BIG mistakes.