Scalable Xeons: AMD sure scared Intel into pulling out the big guns

Discussion in 'Processors and Motherboards' started by gigatexal, Jul 11, 2017.

  1. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,683
    Likes Received:
    498
    Competition sure is awesome. I am still reading through all the great content STH has put out, and I'm still working through the AnandTech article with the EPYC benchmarks vs. the new Skylake Xeons, but one blurb in that article hit home for my work use case: EPYC is best suited for high-density VMs, not the data-synchronization-heavy tasks you see in a database context. That's a shame, because as a DBA it would be cool to have a reason to get some EPYC systems in house.

    Re the title of this thread: the Omni-Path 100Gbps networking per socket (so four sockets means 400Gbps of connectivity) just struck me as part of a multi-punch combo from Intel, whereas AMD just tweaked its Zen cores and put four of those dies on one package. Still, both sides have compelling hardware.
     
    #1
    Son of Homer and eva2000 like this.
  2. Nanotech

    Nanotech Active Member

    Joined:
    Aug 1, 2016
    Messages:
    595
    Likes Received:
    99
    That's the same argument I saw from Intel (including their ridiculous claim that all AMD did was glue dies together, when Intel itself did exactly that in the past with the C2Q, for example), but as was posted around the internet and on reddit, I will quote:

    Intel Says AMD EPYC Processors "Glued-together" in Official Slide Deck
     
    #2
    Son of Homer likes this.
  3. littleredwagen

    littleredwagen New Member

    Joined:
    Dec 8, 2016
    Messages:
    11
    Likes Received:
    2
    Intel isn't wrong. When it "glued" them together, all communication, if I remember correctly, had to go through either memory or the chipset. AMD's approach is why memory speed is so critical with Zen-based CPUs, since the Infinity Fabric clock is tied to the memory clock (simplified, I know). NUMA-aware software isn't the whole answer, since there are only one or two sockets but there can be eight CCXs. Basically that is what they did: they took Zen-based CCX dies and stuck more of them on one package. Delid a Threadripper or an EPYC and you will see - and you can also just ask the OS how many nodes it exposes (see the sketch below the image link).

    Epyc with IHS (Imgur link)
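    Not from the original post, but here is a minimal sketch (assuming a Linux host with sysfs) of how to see how many NUMA nodes the OS actually exposes on one of these multi-die packages:

    Code:
    # Enumerate NUMA nodes and the CPUs/memory attached to each one.
    # Assumes Linux with sysfs mounted at /sys; on EPYC or Threadripper
    # you should see more nodes than physical sockets.
    import glob
    import os

    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = os.path.basename(node_dir)
        with open(os.path.join(node_dir, "cpulist")) as f:
            cpus = f.read().strip()
        with open(os.path.join(node_dir, "meminfo")) as f:
            mem_total = f.readline().strip()  # first line is the node's MemTotal
        print(f"{node}: CPUs {cpus} | {mem_total}")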
     
    #3
  4. Nanotech

    Nanotech Active Member

    Joined:
    Aug 1, 2016
    Messages:
    595
    Likes Received:
    99
    I'm aware of what it looks like delidded and what the die shots look like. I criticized Intel because they are claiming it is inferior, both in marketing and in their comparison (which is what I pointed out, and I also provided a link). If you recall, Intel used a similar approach in the past with the C2Q processors, for example. That doesn't mean AMD's approach is inferior or makeshift, particularly since the architecture is successful. You can refer to this:

    Intel Says AMD EPYC Processors "Glued-together" in Official Slide Deck
     
    #4
    Stux likes this.
  5. markarr

    markarr Active Member

    Joined:
    Oct 31, 2013
    Messages:
    391
    Likes Received:
    101
    No one was really disputing that Intel was "wrong," just that it's kind of hypocritical since they have done the same thing in the past. There are going to be workloads that do not perform as well on this type of architecture (OLTP is most likely one of them), but there are also going to be workloads that excel with the extra cores and threads. NUMA-aware software is the key, though, as it is the smarts that lets multithreaded work run without hopping between CCXs as much; as Ryzen has shown, Linux already sees each CCX as a NUMA node (a minimal pinning sketch follows).
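    Not from the thread, just a minimal sketch of the kind of pinning that NUMA-aware software does; it assumes Linux and only sets CPU affinity (binding memory as well would need numactl or libnuma on top):

    Code:
    # Keep this process on the CPUs of a single NUMA node so its threads
    # stay on one die/CCX group instead of bouncing across the fabric.
    # Linux-only: CPU affinity via sched_setaffinity; memory policy is not set here.
    import os

    def cpus_of_node(node):
        """Parse a sysfs cpulist such as '0-7,64-71' into a set of CPU ids."""
        cpus = set()
        with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
            for part in f.read().strip().split(","):
                lo, _, hi = part.partition("-")
                cpus.update(range(int(lo), int(hi or lo) + 1))
        return cpus

    os.sched_setaffinity(0, cpus_of_node(0))  # 0 = the current process
    print("Now restricted to CPUs:", sorted(os.sched_getaffinity(0)))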
     
    #5
  6. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,543
    Likes Received:
    4,467
    Just as a heads-up, I was present when Intel presented the deck in question. When they did present, we already had Platinum 8180's and EPYC 7601's so I had a different experience seeing that deck. In fact, we had EPYC before Intel (at least officially.)

    The deck is not necessarily wrong. I did not think it was fair to share it.

    There are a few places where it actually misses what could have been the most damning aspects of EPYC performance.

    Intel is still building a very high-end platform on a single piece of silicon. AMD realized it cannot compete going that route at this juncture, so it is instead using an easier manufacturing process.

    If you have an application like web hosting where you simply need lots of RAM capacity and decently fast cores, EPYC is a good choice.

    The absolute killer here is the single socket configuration with all of the PCIe lanes. For storage servers, that is going to be the endgame in 12 months. In our next round of EPYC coverage next week, we are going to go into why it is good, but not necessarily a slam dunk.
     
    #6
    Stux, eva2000 and Pepe like this.
  7. PigLover

    PigLover Moderator

    Joined:
    Jan 26, 2011
    Messages:
    2,771
    Likes Received:
    1,114
    The only thing my team is considering EPYC for is single-socket, high-PCIe configurations - though not necessarily storage servers, but really high-throughput networking apps.

    Sent from my SM-G950U using Tapatalk
     
    #7
    gigatexal likes this.
  8. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,683
    Likes Received:
    498
    In that case, if Threadripper supports ECC, that might open up opportunities, since it will be higher clocked.
     
    #8
  9. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,543
    Likes Received:
    4,467
    Threadripper supports 128GB LRDIMMs actually. 1TB/socket if you wanted to spend $35K for RAM.
     
    #9
    Stux, eva2000, Evan and 1 other person like this.
  10. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,841
    Likes Received:
    423
    64GB/128GB LRDIMMs are still a massive premium over 32GB RDIMMs, unfortunately.
     
    #10
    gigatexal, eva2000 and Patrick like this.
  11. Nanotech

    Nanotech Active Member

    Joined:
    Aug 1, 2016
    Messages:
    595
    Likes Received:
    99
    Patrick, it's not about not being able to compete; it's more of a focus on a modular design. The way IF works and the way the CCXs are designed for Ryzen, they can scale up and down accordingly. It also means that while Intel has monolithic processor dies and has to design several different dies, that comes with the drawback of yields. With AMD, the yields being reported are over 80%, and as far as I remember the BOM means a good margin can be extracted on TR and EPYC.
     
    #11
  12. ATS

    ATS Member

    Joined:
    Mar 9, 2015
    Messages:
    96
    Likes Received:
    32
    People keep posting that asspull number. No one who can talk has any idea what the yields are, and those who know can't talk. Also, the idea that large dies have poor yields is intrinsically false, FYI.
     
    #11
  13. Davewolfs

    Davewolfs Active Member

    Joined:
    Aug 6, 2015
    Messages:
    329
    Likes Received:
    30
    I think AMD will do well with TR. Good price, fast speeds, and ECC support. I don't see Intel having anything similar.
     
    #12
  14. zir_blazer

    zir_blazer Active Member

    Joined:
    Dec 5, 2016
    Messages:
    216
    Likes Received:
    64
    I think the "glued" argument is a desperation attack. Remember back when AMD did the same with Barcelona, praising its monolithic quad-core design while the C2Q Kentsfield still kicked it where it hurts?
    Moreover, AMD has quite an over-engineered fabric there. Kentsfield was an MCM with two independent buses to the chipset, as if it were a dual Xeon of the era, so if one core wanted to talk to the other it had to go out to the chipset and back. AMD instead has on-package inter-die communication. Looks like a rather smartly designed MCM to me, more so than the Magny-Cours Opterons that did the same with HyperTransport instead of Infinity Fabric.
    The thing that leaves something to be desired is that, as far as I know, NUMA nodes include CPUs and RAM, but the concept should scale to include the integrated PCIe controllers, since each die's 32 lanes are local to that die only (see the sketch below).
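    That per-die locality is already visible on Linux today; here is a minimal sketch of my own (not from the post) that maps each PCIe device to the NUMA node it hangs off:

    Code:
    # Show which NUMA node each PCIe device is attached to. On EPYC each die
    # owns its own lanes, so devices report different node numbers; a value
    # of -1 means the platform did not expose locality information.
    import glob
    import os

    for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
        try:
            with open(os.path.join(dev, "numa_node")) as f:
                node = f.read().strip()
        except OSError:
            continue
        print(f"{os.path.basename(dev)} -> NUMA node {node}")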

    Besides, this generation Intel screwed up badly with its Xeon lineup. So far it seems there are some tasks where Skylake-SP can mop the floor with both EPYC and its own predecessors (those that can use AVX-512), but for a whole bunch of other things this is the first time in more than half a decade that Intel has faced serious competition. And instead of significantly increasing price/performance, as it did in the HEDT segment in response to Ryzen, Intel decided to double down on its market segmentation strategy and milk more money out of its Xeon customers. If this had been done a year ago, people would have had to suck it up, but now there is a solid alternative...
     
    #13
    Stux likes this.
  15. Davewolfs

    Davewolfs Active Member

    Joined:
    Aug 6, 2015
    Messages:
    329
    Likes Received:
    30
    I'm curious to see how AVX-512 compares to CUDA on something like TensorFlow.
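    Not a real benchmark, but a minimal sketch of the kind of quick comparison you could run, assuming a TensorFlow 2.x install (and AVX-512 only matters if the CPU build actually enables it):

    Code:
    # Rough matmul timing on CPU vs. GPU; numbers are only indicative.
    import time
    import tensorflow as tf

    def time_matmul(device, n=4096, reps=10):
        with tf.device(device):
            a = tf.random.uniform((n, n))
            b = tf.random.uniform((n, n))
            _ = tf.matmul(a, b)  # warm-up so setup cost is not timed
            start = time.perf_counter()
            for _ in range(reps):
                c = tf.matmul(a, b)
            _ = c.numpy()  # force execution to finish before stopping the clock
            return (time.perf_counter() - start) / reps

    print("CPU seconds/matmul:", time_matmul("/CPU:0"))
    if tf.config.list_physical_devices("GPU"):
        print("GPU seconds/matmul:", time_matmul("/GPU:0"))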
     
    #14
  16. mbello

    mbello New Member

    Joined:
    Nov 14, 2016
    Messages:
    17
    Likes Received:
    7
    The DB benchmarks AnandTech ran are being heavily criticized; they could have favoured Intel's bigger caches. I do not think you can draw any conclusion about EPYC vs. Xeon for database applications based on AnandTech's work.
    Hopefully STH will come up with a better comparison soon.
     
    #15
    eva2000 likes this.
  17. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,841
    Likes Received:
    423
    Depends a lot on the DB, its usage, and its licensing. AMD supports massive memory, which will be great for in-memory DBs.
    If you're buying per-core licenses for Oracle, then you normally can't go past the frequency-optimized parts from Intel, but of course it remains to be seen how this plays out over the next 6 months or so, I guess. We will be doing our internal testing soon - mostly Oracle, MongoDB, and SQL Server. SAP HANA lives on IBM POWER and makes the most sense there, on POWER8 now and POWER9 in the future, so we won't be looking at that.
     
    #16
  18. Datamax

    Datamax New Member

    Joined:
    Jul 25, 2017
    Messages:
    6
    Likes Received:
    2
    Any DB benchmark, unless it is super carefully constructed with insane variety, is not going to be a relevant benchmark for any real-world application - at all. There are just too many different DBs, with different architectures, different levels of optimization, different data structures, etc. A good optimization can make or kill a database - and in each case relies on a completely different subset of performance.

    At the end of the day, what matters in sane businesses is capability vs. TCO over the whole lifetime. By sane businesses I mean those that actually need to compete in the market, instead of having some kind of monopoly or infinite pockets (governments, banks).

    In many cases it's the I/O that matters these days. Just for kicks I did some quick back-of-the-napkin maths, and a single dual-socket EPYC system might be able to replace two racks of servers for us, if you could somehow magically utilize all those PCIe lanes to connect to storage. AFAIK there is not yet a means to do that, though. In our case it is hundreds of HDDs, with a few dozen SSDs thrown in. Those servers are essentially high-capacity file servers / content-distribution servers. Compute is almost irrelevant in that use; it's all about the I/O. (A rough version of that napkin math is sketched below.)
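    For what it's worth, here is that napkin math as a sketch; every number in it (usable lanes, lanes per HBA, drives per HBA, reserved lanes) is an illustrative assumption, not a figure from the post or a vendor spec:

    Code:
    # Back-of-the-napkin: how many drives could one EPYC box front if the
    # PCIe lanes were mostly spent on HBAs? All values are assumptions.
    usable_lanes = 112        # assumed usable lanes on a 2P EPYC platform
    lanes_per_hba = 8         # a typical x8 SAS HBA
    drives_per_hba = 24       # assumed fan-out through SAS expanders
    reserved_lanes = 16       # keep some lanes for NVMe SSDs and NICs

    hba_count = (usable_lanes - reserved_lanes) // lanes_per_hba
    print(f"{hba_count} HBAs -> roughly {hba_count * drives_per_hba} HDDs behind one box")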

    Hell, you could even argue our system is just a big big database as well.
     
    #17
    gigatexal likes this.
  19. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,278
    Likes Received:
    658
    That is what makes the $400 8-core EPYC chip so, so crazy... still 128 PCIe lanes (112 usable) and still 2TB of memory support.

    If you don't need compute, you still get the I/O and RAM support.
     
    #18
    Stux and gigatexal like this.
  20. mbello

    mbello New Member

    Joined:
    Nov 14, 2016
    Messages:
    17
    Likes Received:
    7
    For proprietary databases the benchmark will be interesting out of curiosity, but of little practical value, because no one believes AMD will have a chance vs. the frequency-optimized parts from Intel.

    I think it is more interesting to see how the open source databases (MySQL and Postgres) and NoSQL databases perform on each processor.
     
    #19
  21. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,543
    Likes Received:
    4,467
    I am still a bit intellectually torn on this one. Sure, you have a $400 CPU, but 8x low-capacity 16GB DIMMs will cost 2x what the CPU costs. The server barebones are probably $1,600-2,000 and up. Storage is at least $400 and can go way up. If you have enough devices to fill 96 or 112 PCIe lanes, the difference between $400 and $800 is likely a low-to-mid single-digit percentage of the build. At $800 you move into dual Xeon Silver territory with more cores, more memory channels, etc. (rough numbers below).
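    To put rough numbers on that - the CPU, barebones, and storage figures are the ballpark numbers from the post, while the DIMM price is an assumption for illustration:

    Code:
    # Back-of-the-envelope build cost; DIMM price is assumed (~$100 per 16GB
    # RDIMM, so 8 of them come to roughly 2x a $400 CPU, as in the post).
    def build_cost(cpu, dimms=8, dimm_price=100, barebones=1800, storage=400):
        return cpu + dimms * dimm_price + barebones + storage

    low = build_cost(cpu=400)   # $400 8-core EPYC
    high = build_cost(cpu=800)  # $800 16-core EPYC
    delta_pct = 100 * (high - low) / low
    print(f"${low} vs ${high}: {delta_pct:.1f}% more before any PCIe devices,")
    print("and the gap shrinks to single digits once NVMe/NICs fill the lanes.")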

    If you do not need that much expansion capability, then it is likely easier to go to a single CPU and one NUMA node.

    The 16 core parts make some sense to me. The higher-end parts certainly make sense.

    At the very low end (think 2U 12x SATA and 10GbE), the Xeon Bronze is cheaper.
     
    #20