Scalable Xeons: AMD sure scared Intel into pulling out the big guns


gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
Competition sure is awesome. I am still reading up on all the great content that STH has put out, and I'm still going through the Anandtech article with the EPYC benchmarks vs. the new Skylake Xeons, but a blurb in that article hit home at least for my work use case: EPYC is best suited for high-density VMs but not for data synchronization tasks as one would see in a database context, which sucks, since as a DBA it would be cool to have a reason to get some EPYC systems in house.

Re the title of this thread: the Omni-Path 100Gbps networking per socket (so four sockets means 400Gbps of connectivity) just struck me as part of a multi-punch combo from Intel, whereas AMD just tweaked Zen cores and put four of them on one package. Still, both sides have compelling hardware.
 

littleredwagen

New Member
Dec 8, 2016
11
2
3
42
That's the same argument I saw from Intel (including their ridiculous claim that all AMD did was glue dies together, when Intel did exactly that in the past with the C2Q, for example), but as was posted on the internet and on Reddit, I will quote:



Intel Says AMD EPYC Processors "Glued-together" in Official Slide Deck
Intel isn't wrong. When it "glued" dies together, all communication, if I remember correctly, had to go through either the memory or the chipset. AMD's approach is why memory speed is so critical with Zen-based CPUs, since the Infinity Fabric runs off the system RAM clock (simplified, I know). NUMA-aware software isn't necessarily the answer either, since the OS sees one or two sockets while there can be eight CCXs underneath. Basically, that is what they did: they took Zen-based CCX dies and stuck more of them on one package. Delid a Threadripper or an EPYC and you will see.

(Image: EPYC with IHS)

 

markarr

Active Member
Oct 31, 2013
421
122
43
No one was really disputing that Intel was "wrong," just pointing out that it is kind of hypocritical since they have done the same thing in the past. There are going to be workloads that do not perform as well on this type of architecture, and OLTP is most likely one of them, but there are also workloads that will excel with the extra cores and threads. NUMA-aware software is the key, though, as it is the smarts that lets threads run wide without hopping between CCXs as much; Ryzen has already shown that Linux sees each CCX as a NUMA node.
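For what it's worth, here is a quick way to check how the kernel actually exposes that topology (a minimal sketch reading Linux's sysfs NUMA interface; what it prints obviously depends on the box and BIOS settings):

```python
# Minimal sketch: list the NUMA nodes Linux exposes and the CPUs/RAM in each.
# On a Zen-based system each die (or CCX group, depending on BIOS settings)
# can show up as its own node.
import glob
import os

node_dirs = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"),
                   key=lambda p: int(p.rsplit("node", 1)[-1]))
for node_dir in node_dirs:
    node = os.path.basename(node_dir)
    with open(os.path.join(node_dir, "cpulist")) as f:
        cpus = f.read().strip()
    mem_kb = 0
    with open(os.path.join(node_dir, "meminfo")) as f:
        for line in f:
            if "MemTotal" in line:
                mem_kb = int(line.split()[-2])
    print(f"{node}: cpus={cpus} mem={mem_kb // 1024} MiB")
```

`numactl --hardware` reports the same thing plus inter-node distances, which is the quickest way to see whether a box presents one node per socket or one per die.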
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
Just as a heads-up, I was present when Intel presented the deck in question. By the time they presented it, we already had Platinum 8180s and EPYC 7601s, so I had a different experience seeing that deck. In fact, we had EPYC before Intel did (at least officially).

The deck is not necessarily wrong. I did not think it was fair to share it.

There are a few considerations where it actually misses what could have been the most damning parts of the EPYC performance story.

Intel is still building a very high-end platform on a single piece of silicon. AMD realized they cannot compete going that route at this juncture, so they are instead using an easier manufacturing process.
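As a rough illustration of why several small dies are easier to manufacture than one big one, the textbook defect-density yield model looks like this (the defect density and die areas below are made-up numbers for illustration, not real fab data):

```python
# Textbook Poisson yield model: the fraction of dies with zero random defects.
# The defect density and die areas are illustrative assumptions, not real fab data.
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    return math.exp(-die_area_mm2 * defects_per_mm2)

D0 = 0.002            # hypothetical defect density (defects per mm^2)
small_die = 213       # roughly a Zeppelin-sized die, mm^2
big_die = 4 * 213     # a hypothetical monolithic die of the same total area

print(f"four small dies: {poisson_yield(small_die, D0):.0%} yield each")
print(f"one big die:     {poisson_yield(big_die, D0):.0%} yield")
```

How close the real fab numbers are to a toy model like this is, of course, exactly what gets argued about further down the thread.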

If you have an application like web hosting where you simply need lots of RAM capacity and decently fast cores, EPYC is a good choice.

The absolute killer here is the single socket configuration with all of the PCIe lanes. For storage servers, that is going to be the endgame in 12 months. In our next round of EPYC coverage next week, we are going to go into why it is good, but not necessarily a slam dunk.
 

PigLover

Moderator
Jan 26, 2011
3,184
1,545
113
The only thing my team is considering EPYC for is single-socket, high-PCIe configurations - not necessarily storage servers, but really high-throughput networking apps.

Sent from my SM-G950U using Tapatalk
 
  • Like
Reactions: gigatexal

ATS

Member
Mar 9, 2015
96
32
18
48
Patrick, it's not about not being able to compete; it's more about a focus on a modular design. The way IF works and the way the CCXs are designed for Ryzen, they can scale up and down accordingly. It also means that while Intel has monolithic processor dies and has to design different dies for different segments, that comes at the drawback of yields. With AMD, yields are over 80%, and as far as I remember the BOM means a good margin can be extracted on TR and EPYC.
People keep posting that ass-pull number. No one who can talk has any idea what the yields are, and those who know can't talk. Also, the idea that large dies have poor yields is intrinsically false, FYI.
 

Davewolfs

Active Member
Aug 6, 2015
339
32
28
I think AMD will do well with TR. Good price, fast speeds, and ECC support. I don't see Intel having anything similar.
 

zir_blazer

Active Member
Dec 5, 2016
355
128
43
I think that the "glued" argument is a desperation attack. Remember back when AMD did the same with Barcelona, praising their monolithic quad-core design while the C2Q Kentsfield still kicked it where it hurts?
Moreover, AMD has quite an over-engineered fabric there. Kentsfield was an MCM that had two independent buses to the chipset, as if it were a dual Xeon of the era, so if one core wanted to talk to the other, it had to go to the chipset and back. AMD instead has on-package inter-die communication. That looks like a rather smartly designed MCM to me, more so than the Magny-Cours Opterons that did the same with HyperTransport instead of Infinity Fabric.
The thing that leaves something to be desired is that, as far as I know, a NUMA node includes CPUs and RAM, but the concept should scale to include the integrated PCIe controllers, since 32 lanes are local to a single die only.
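The kernel does at least report that locality; a minimal sketch that reads each PCIe device's NUMA node from sysfs (a value of -1 means the platform did not report one):

```python
# Minimal sketch: report which NUMA node each PCIe device is attached to,
# via the numa_node attribute the kernel exposes in sysfs (-1 = not reported).
import glob
import os

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    try:
        with open(os.path.join(dev, "numa_node")) as f:
            node = f.read().strip()
    except OSError:
        continue
    print(f"{os.path.basename(dev)}: numa_node={node}")
```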

Besides, this generation Intel screwed up badly with their Xeon lineup. So far, it seems there are some tasks where Skylake-SP can mop the floor with both EPYC and its own predecessors (those that can use AVX-512), but for a whole bunch of things this is the first time in more than half a decade that Intel has serious competition. And instead of significantly increasing price-performance like they did with the HEDT segment in response to Ryzen, Intel decided to step up their market segmentation strategy and milk more money out of their Xeon customers. If this had been done a year ago, people would have had to suck it up, but now you have a solid alternative...
 
  • Like
Reactions: Stux

Davewolfs

Active Member
Aug 6, 2015
339
32
28
I think that the "glued" argument is a desperation attack. Remember back when AMD did the same with Barcelona, praising their monolithic quad-core design while the C2Q Kentsfield still kicked it where it hurts?
Moreover, AMD has quite an over-engineered fabric there. Kentsfield was an MCM that had two independent buses to the chipset, as if it were a dual Xeon of the era, so if one core wanted to talk to the other, it had to go to the chipset and back. AMD instead has on-package inter-die communication. That looks like a rather smartly designed MCM to me, more so than the Magny-Cours Opterons that did the same with HyperTransport instead of Infinity Fabric.
The thing that leaves something to be desired is that, as far as I know, a NUMA node includes CPUs and RAM, but the concept should scale to include the integrated PCIe controllers, since 32 lanes are local to a single die only.

Besides, this generation Intel screwed up badly with their Xeon lineup. So far, it seems there are some tasks where Skylake-SP can mop the floor with both EPYC and its own predecessors (those that can use AVX-512), but for a whole bunch of things this is the first time in more than half a decade that Intel has serious competition. And instead of significantly increasing price-performance like they did with the HEDT segment in response to Ryzen, Intel decided to step up their market segmentation strategy and milk more money out of their Xeon customers. If this had been done a year ago, people would have had to suck it up, but now you have a solid alternative...
I'm curious to see how AVX-512 compares to CUDA on something like TensorFlow.
 

mbello

New Member
Nov 14, 2016
17
8
3
42
EPYC is best suited for high-density VMs but not for data synchronization tasks as one would see in a database context, which sucks, since as a DBA it would be cool to have a reason to get some EPYC systems in house.
The DB benchmarks Anandtech ran are being heavily criticized; they may well have favoured Intel's bigger caches. I do not think you can draw any conclusions about EPYC vs. Xeon for database applications based on Anandtech's work.
Hopefully STH will come up with a better comparison soon.
 
  • Like
Reactions: eva2000

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
Depends a lot on the DB, its usage, and the licensing. AMD supports massive memory, which will be great for in-memory DBs.
If you're buying per-core licenses for Oracle then you normally can't go past the frequency-optimized parts from Intel, but of course it remains to be seen how this plays out in the next 6 months or so, I guess. We will be doing our internal testing soon, mostly Oracle, MongoDB, and SQL Server. SAP HANA lives on IBM POWER and makes the most sense there, on POWER8 now and POWER9 in the future, so we won't be looking at that.
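To put a number on the per-core licensing point, a back-of-the-envelope sketch (the list price and the 0.5 x86 core factor are assumptions for illustration; actual Oracle pricing and discounts vary):

```python
# Rough sketch of why per-core licensing pushes you toward fewer, faster cores.
# The $47,500 list price and the 0.5 x86 core factor are illustrative assumptions.
def oracle_license_cost(sockets: int, cores_per_socket: int,
                        core_factor: float = 0.5,
                        price_per_license: int = 47_500) -> int:
    licenses = sockets * cores_per_socket * core_factor
    return int(licenses * price_per_license)

# A frequency-optimized 2x 8-core box vs. a 2x 32-core EPYC box:
print(f"2x 8-core:  ${oracle_license_cost(2, 8):,}")    # 8 licenses
print(f"2x 32-core: ${oracle_license_cost(2, 32):,}")   # 32 licenses
```

Next to a license bill like that, the hardware delta is noise, which is why the frequency-optimized low-core-count SKUs matter so much here.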
 

Datamax

New Member
Jul 25, 2017
6
2
3
39
Any DB benchmark, unless it is super carefully constructed with insane variety, is not going to be relevant to any real-world application at all. There are just too many different DBs, with different architectures, different levels of optimization, different data structures, etc. A good optimization can make or kill a database, and the two cases rely on completely different subsets of performance.

At the end of the day, what matters to a sane business is capability vs. TCO over the whole lifetime. By a sane business I mean one that actually needs to compete in the market, instead of having some kind of monopoly or infinite pockets (governments, banks).

In many cases it's the I/O these days. Just for kicks I did some quick back-of-the-napkin math, and a single dual-socket EPYC system might be able to replace two racks of servers for us, if you could somehow magically use all those PCIe lanes to connect to storage. AFAIK there is not yet a way to do that, though. In our case it is hundreds of HDDs, with a few dozen SSDs thrown in. Those servers are essentially high-capacity file servers / content distribution servers. Compute is almost irrelevant in that use; it's all about that I/O.

Hell, you could even argue our system is just one big database as well.
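For anyone who wants to redo that napkin math, a rough sketch (the lanes-per-HBA, drives-per-expander, and legacy-server figures are all made-up assumptions, not our actual numbers):

```python
# Back-of-the-napkin: how many drives could hang off one EPYC box's PCIe lanes?
# Every number below is an illustrative assumption, not our actual setup.
total_lanes = 128      # what the EPYC platform exposes
lanes_reserved = 16    # assume some lanes kept for NICs and boot devices
lanes_per_hba = 8      # typical SAS HBA
drives_per_hba = 24    # HDDs per HBA via SAS expanders

hbas = (total_lanes - lanes_reserved) // lanes_per_hba
drives = hbas * drives_per_hba
print(f"{hbas} HBAs -> roughly {drives} drives behind one system")
print(f"that is about {drives // 12} legacy 12-bay 2U servers' worth of bays")
```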
 
  • Like
Reactions: gigatexal

Patriot

Moderator
Apr 18, 2011
1,450
789
113
Any DB benchmark, unless it is super carefully constructed with insane variety, is not going to be relevant to any real-world application at all. There are just too many different DBs, with different architectures, different levels of optimization, different data structures, etc. A good optimization can make or kill a database, and the two cases rely on completely different subsets of performance.

At the end of the day, what matters to a sane business is capability vs. TCO over the whole lifetime. By a sane business I mean one that actually needs to compete in the market, instead of having some kind of monopoly or infinite pockets (governments, banks).

In many cases it's the I/O these days. Just for kicks I did some quick back-of-the-napkin math, and a single dual-socket EPYC system might be able to replace two racks of servers for us, if you could somehow magically use all those PCIe lanes to connect to storage. AFAIK there is not yet a way to do that, though. In our case it is hundreds of HDDs, with a few dozen SSDs thrown in. Those servers are essentially high-capacity file servers / content distribution servers. Compute is almost irrelevant in that use; it's all about that I/O.

Hell, you could even argue our system is just one big database as well.
That is what makes the $400 8-core EPYC chip so crazy... still 128 PCIe lanes (112 usable) and still 2TB of memory support.

If you don't need compute... you still get the I/O and RAM support.
 
  • Like
Reactions: Stux and gigatexal

mbello

New Member
Nov 14, 2016
17
8
3
42
Depends a lot on the DB, its usage, and the licensing. AMD supports massive memory, which will be great for in-memory DBs.
If you're buying per-core licenses for Oracle then you normally can't go past the frequency-optimized parts from Intel, but of course it remains to be seen how this plays out in the next 6 months or so, I guess. We will be doing our internal testing soon, mostly Oracle, MongoDB, and SQL Server. SAP HANA lives on IBM POWER and makes the most sense there, on POWER8 now and POWER9 in the future, so we won't be looking at that.
For proprietary databases the benchmark will be interesting out of curiosity but of little practical value, because no one believes AMD has a chance against the frequency-optimized parts from Intel.

I think it is more interesting to see how the open source databases (MySQL and Postgres) and NoSQL databases perform on each processor.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
That is what makes the $400 8-core EPYC chip so crazy... still 128 PCIe lanes (112 usable) and still 2TB of memory support.

If you don't need compute... you still get the I/O and RAM support.
I am still a bit intellectually torn on this one. Sure, you have a $400 CPU, but 8x low-capacity 16GB DIMMs will cost twice what the CPU does. The server barebones are probably $1,600-2,000 and up. Storage is at least $400 and can go way up. If you have enough devices to use 96 or 112 PCIe lanes, the difference between $400 and $800 is likely a mid-to-low single-digit percentage of the build. At $800 you move into dual Xeon Silver territory, with more cores, more memory channels, and so on.
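A rough sketch of that arithmetic, using the ballpark figures above (all of them illustrative, not quotes):

```python
# Ballpark build cost for a single-socket EPYC box, using the rough figures
# from the post above; every number here is an illustrative assumption.
barebones = 1_800   # midpoint of the $1,600-2,000 range
memory    = 800     # 8x 16GB DIMMs, roughly 2x the $400 CPU
storage   = 3_000   # "at least $400 and can go way up"
nics_hbas = 1_500   # enough devices to actually use ~100 PCIe lanes

base = barebones + memory + storage + nics_hbas   # everything except the CPU
for cpu in (400, 800):
    print(f"${cpu} CPU -> total ${base + cpu:,}")
print(f"Stepping up from $400 to $800 adds {400 / (base + 400):.1%} to the build")
```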

If you do not need that much expansion capability, then it is likely easier to go to a single CPU and one NUMA node.

The 16 core parts make some sense to me. The higher-end parts certainly make sense.

At the very low end (think 2U 12x SATA and 10GbE), the Xeon Bronze is cheaper.
 

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
I am sure everybody will have a use case that's different.
With E5 v4 today we essentially use 3 configs:
- 4-core x 2, 768GB RAM (Oracle)
- 14-core x 2, 512GB RAM (ESX, MongoDB, Hadoop, VDI, etc.)
- 8-core x 2, 512GB RAM (SQL Server)


The 4- and 8-core systems are frequency optimized and optimized for our licensing.
I don't actually see how EPYC really fits for us yet; so far we have never maxed out PCIe lanes.
We will see soon when we evaluate the next-generation systems.
 

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
I am still a bit intellectually torn on this one. Sure, you have a $400 CPU, but 8x low-capacity 16GB DIMMs will cost twice what the CPU does. The server barebones are probably $1,600-2,000 and up. Storage is at least $400 and can go way up. If you have enough devices to use 96 or 112 PCIe lanes, the difference between $400 and $800 is likely a mid-to-low single-digit percentage of the build. At $800 you move into dual Xeon Silver territory, with more cores, more memory channels, and so on.

If you do not need that much expansion capability, then it is likely easier to go to a single CPU and one NUMA node.

The 16 core parts make some sense to me. The higher-end parts certainly make sense.

At the very low end (think 2U 12x SATA and 10GbE), the Xeon Bronze is cheaper.
A highly clocked Threadripper with ECC then, problem solved.
 

browned

New Member
Oct 5, 2016
8
0
1
47
We can make use of EPYC in our small VM environments and hyper-converged setups. Getting enough NICs and storage controllers into a 1U server has been proving difficult recently, but I suspect EPYC systems will have the connectivity we need without having to add a second CPU. I am estimating more cores and some savings, but we will have to wait and see what the vendors bring to market before we get too excited about it all.