Overclocking Xeon E7 8894 v4 (x8) inside Lenovo X3950 X6 - "all core boost"?


MichalPL

Active Member
Feb 10, 2019
189
25
28
For a while I have been using an HP DL580 G9 (DDR4) and a G8(+) (DDR3), both with 8894 v4 CPUs (upgraded from E7 8895 v2), but recently I found a nice "old" (~3 years old) server, the eight-CPU X3950 X6, and bought it.

And, like always, I am thinking about overclocking it :)

I have done some testing, mostly using only 4 CPUs instead of 8 (Passmark crashes with 8 CPUs and 192 cores). The results are very similar to the HP DL580 G8/G9 (no difference between DDR3 and DDR4), maybe slightly better than the HP.

Is there any method to overclock the E7 8894 v4 on the Lenovo X3950/X3850 X6 platform?
-like overclocking the 1660 v3 (to 5.1GHz with AIO) ;)
-or "all core boost" known from locked E5 when all cores can go as fast as single core boost?
-or overclocking 3200MHz SMI2 memory bus even slightly?

Single core here is 3.4GHz, and I am guessing more is possible
All core is boosting to 2.88GHz, all core SSE 2.69GHz
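
To put numbers on what is at stake, here is a minimal sketch that computes the relative clock gain if all cores could hold the single core turbo. It uses only the three clock figures quoted above; the function name is made up for illustration, and clock gain is only an upper bound on any real performance gain.

```python
# Clock figures quoted in the post, in GHz.
SINGLE_CORE_TURBO = 3.4
ALL_CORE_TURBO = 2.88
ALL_CORE_SSE = 2.69


def uplift_percent(target_ghz, current_ghz):
    """Relative clock gain, in percent, when going from current_ghz to target_ghz."""
    return (target_ghz / current_ghz - 1.0) * 100.0


if __name__ == "__main__":
    print(f"vs all-core turbo: {uplift_percent(SINGLE_CORE_TURBO, ALL_CORE_TURBO):.1f}%")
    print(f"vs all-core SSE:   {uplift_percent(SINGLE_CORE_TURBO, ALL_CORE_SSE):.1f}%")
```

So the best case is roughly 18% over the all-core turbo and roughly 26% over the all-core SSE clock.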


Any ideas ?

1664904601444.png

1664904701874.png
 

Jelle458

New Member
Oct 4, 2022
26
15
3
I've worked with the HP and the Lenovo that you mention here too. HP has the option to run full performance as a power profile, but your Lenovo doesn't really have that option. The highest-end CPUs I've run on the Lenovo X6 are 8x E7-8880 V3 with 3TB of DDR4 memory. I wanted to do exactly what you are doing, but Lenovo doesn't offer any option to run full performance.
I guess these OEM brands won't go too far outside Intel's own spec.

But maybe you can trick Windows into activating all core boost? I think the program ThrottleStop might help you there, but it is designed for mobile.
I've tried it once on a desktop and it worked pretty well, but I'm not sure how it works with so many cores.
But maybe it can handle any number of CPU cores, and just make Windows ask for full single core boost on all cores? Could be worth a try.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
I am doing strange things around C2/C3 and C6 states and P-states that I don't fully understand, and I don't want to read the full doc/PDF on how to drive E7-8894v4 chips ;) (but it's still a lot of fun, doing it in my free time).


Both of them (HP and Lenovo) have a kind of Performance mode, each under its own name, but they are not good :/
The Performance mode is not good because it runs all cores at the maximum speed of the all core boost (in this case showing 2.89GHz)
and it does not boost to 3.4GHz. What is funny is that it also does not drop to 2.69GHz when using SSE (as the Intel PDF says it should).

The second mode, the one that favors performance, is not bad because cores boost to 3.4GHz, but when all cores are used it drops to 2.69GHz (it should be 2.88/2.89GHz).

What I am trying to do is boost all cores to 3.4GHz at idle (at ~15W extra per CPU) and set up the C2/C3/C6/P-state settings so that clocks do not drop lower, or drop only after many, many seconds and only when all cores are active ;) This is partly working, and for sure much faster than the Performance or favor-Performance modes.

But maybe there is another trick, 1 bit in memory somewhere at Windows boot, to drive all of them at 3.4GHz all the time.

1665067345718.png

I just installed all the "books" and equipped every CPU with 8x DDR4 to enable "octa channel", so now it is 8x 8x 32GB (4RDx4 LRDIMM). I don't know why CL is 12; it should be 11 according to the table.

The only benchmark that works now is CPU-Z :) (Passmark crashes, and R23 works but doesn't support so many threads):
this is the result with 3.4GHz on all cores, but not all the time ;)
1665065498831.png


Btw, I bought the server equipped with 8x 8880 v3 (I have replaced them with 8894v4). They are nice CPUs, very similar in speed to the 8895v2; the only thing is that single core performance is not great, but they are not designed for it.
 

bayleyw

Active Member
Jan 8, 2014
270
94
28
CPU-Z has to be the best scaling I've ever seen out of an octal-socket system - a reference 2699v4 is 9100 so you are basically getting perfect scaling.
 

Jelle458

New Member
Oct 4, 2022
26
15
3
This is some interesting data for sure. Useless to some degree but still very interesting. I never dug this deep into it as I only had a few hours to play with the x3950 X6 I had. It was not my private unit but a production one that I'd built from a CTO (Configure-to-order). That meant the BIOS had to stay stock, but I did play with the settings. Still, I never got it running at single core boost frequency on all cores.

The HP one, however, the DL580, ran pretty nice boosts for me, up to what I think is called PL1. So a short Cinebench run would be at full power, but a longer Prime95 run would go down in clock speeds.

I also see different BIOS versions behaving differently. For Lenovo you can use BoMC to get the latest, but older versions might be hard to find.

For HP I have some Service Pack for ProLiant, each pack has a different BIOS.

You could always try to upgrade the BIOS, if it isn't the newest, and see if you can get some better frequencies. Although in my experience the older BIOS versions are usually the fastest ones. If you can find a launch BIOS, some of them just run full turbo all the time out of the box. Of course I didn't try that on the x3950 X6, but when the x3650 M5 came out this was the case. The SR650/630 also ran full turbo with the launch BIOS.

It's crazy why they won't just unlock the performance, we all know it's there, and Intel won't care if they go over spec, clearly since some launch BIOS just goes full performance.

It just shows what I always say at work; there should be one BIG RED button that says "I KNOW WHAT I AM DOING" and stuff should just... work.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
But maybe you can trick Windows into activating all core boost? I think the program ThrottleStop might help you there, but it is designed for mobile.
Some of the features work, but I am not able to set all cores to 3.4GHz; performance is still the same (single core a bit worse).

1665129827925.png
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
OK, summary: I have done the maximum I was able to do. The results are not bad, but it's not 3.4GHz on all cores all the time.


For comparison, AMD 5995WX:

CPU Test Suite Average Results for AMD Ryzen Threadripper PRO 5995WX
Integer Math: 624,537 MOps/Sec
Floating Point Math: 339,043 MOps/Sec
Find Prime Numbers: 727 Million Primes/Sec
Random String Sorting: 193 Thousand Strings/Sec
Data Encryption: 130,324 MBytes/Sec
Data Compression: 1,934 MBytes/Sec
Physics: 6,705 Frames/Sec
Extended Instructions: 121,881 Million Matrices/Sec
Single Thread: 3,265 MOps/Sec

And fastest tested dual Epyc:
CPU Test Suite Average Results for [Dual CPU] AMD EPYC 7763
Integer Math: 1,067,291 MOps/Sec
Floating Point Math: 599,643 MOps/Sec
Find Prime Numbers: 1,409 Million Primes/Sec
Random String Sorting: 392 Thousand Strings/Sec
Data Encryption: 252,888 MBytes/Sec
Data Compression: 3,389 MBytes/Sec
Physics: 15,874 Frames/Sec
Extended Instructions: 199,438 Million Matrices/Sec
Single Thread: 2,501 MOps/Sec


1666467544005.png


not bad - in important tests faster than dual Epyc :) (unfortunately not in single core - the most important one :/)

Platform:
X3950 X6 ~1700EUR
8x 8894v4 CPU ~900EUR
2TB DDR4 (64 x 32GB) ~3500EUR
NVMe: not decided yet, probably 10..12x Samsung 970 Evo Plus (PCIe 3.0) to achieve ~30GB/s R/W on SW raid.
Graphics: probably RTX 3090 24GB or 3060 12GB


BIOS setup:
1666468459345.png

1666468695650.png

1666468729619.png

Mem power management can be disabled - system will be slightly (maybe 0.5%) faster but power consumption will go up ~500W
 

chrgrose

Active Member
Jul 18, 2018
104
50
28
That's an amazing machine! A little odd that even Cinebench R23 can't handle 384 threads (that may mean that it wont be able to be run on Genoa 96 core! Unless it's a 8 socket issue...). I would probably buy one myself to play with if I had somewhere to 'put it'. How is power consumption on that thing at idle vs. full bore? I'd be afraid to blow a breaker in my old house. Does it sound like a jet engine per usual for servers of this class?

Also, where did you get 8x 8894 v4 for only 900 euro? Right now even the 8890v4's are about $160 each.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Yes, it's really good: similar to the HP DL580 G8/G9, but better.

Cinebench R23 can't handle 384
It can't. It still works without crashing, and does some tasks between rendering on all threads, but if I remember correctly the results were:
on 96 cores (4 CPUs installed): around 62 000 points
on 192 cores (8 CPUs): around 57 000 points (it works on only half of them, with a strange core assignment that uses HT improperly)

(that may mean that it wont be able to be run on Genoa 96 core! Unless it's a 8 socket issue...)
I think single CPU Genoa 96 will run properly, dual not.
The new passmark will run properly.

I would probably buy one myself to play with if I had somewhere to 'put it'.
It is big and super heavy.
Power consumption at idle depends strongly on the memory config. I haven't measured it yet, but it should be around 550W at idle (with 2TB RAM) and around 1800W at full load.

I don't have plans to install things like this at home ;) It is for company use, done in my "free time" as a "hobby project" to save some money (and have fun) that you can spend later on 30x 13900KF / 7950X ;) or 25x RTX 4080 or 3090.

At home (no time to finish it): a super old NAS server based on 2x E5 v2 at 4GHz + 100GbE, 25 OM4 optical fibers and a Celestica DX010 :)
I bought a 10kWp "solar roof" half a year ago (to not pay a single euro more to putin for natural gas), so I can spend some of the power budget on servers, but not on 192 cores @ 14nm and terabytes of DDR4 memory ;)

Does it sound like a jet engine per usual for servers of this class?
No.

Actually you can use it as a workstation the size of a Xerox Alto ;)

Similar noise to the Celestica DX010 (after boot).
A lot quieter than the HP DL580 G8/G9 in quiet mode (I don't know, ~8 times).
Quieter than the Cisco 3064PQ or Cisco 3164Q.

Similar to a poor RTX 3090 with a single blower ("turbine") fan at full load.

Also, where did you get 8x 8894 v4 for only 900 euro?
Server here:

Processors here:

It's a local company (in my country) selling used servers and server parts (they don't even have an English version of the webpage, so you need Google Translate). It was around half a year ago, and I bought 12 of them at once (to upgrade the HP DL580 G8/G9). Back then they had plenty of them (unwanted CPUs that nobody needs ;) ), so I messaged them, they offered an amazing price, and I bought them. Now I see they have just 8; maybe $140...$160 will be possible, but they list them at around $305 net = not cheap anymore :/

RAM came from random places selling used server parts.
 

chrgrose

Active Member
Jul 18, 2018
104
50
28
Big thanks for the info! Everything about it sounds impressive except for the idle power draw, which I doubt will be decreased much by changing the memory configuration. How many modules do you have installed? The Lenovo manual says that the idle power is 290 Watts at 'minimum 8-socket configuration', but perhaps that is with some low power chips, 1 dimm/socket and basically no storage.

Do you know if there is a significant benefit to the x3950 compared to linking two x3850 together? I've owned R920's in the past and was looking into R930's when I stumbled across the Lenovo machines and was super intrigued by the ability to make an 8 socket Xeon v4 configuration without breaking the bank. A relatively quiet machine is also a big plus!

Is there an option to put the machine to 'sleep' for super lower power draw, and wake it up without it going through a 15 minute boot cycle? The manual seems to say no.
 

chrgrose

Active Member
Jul 18, 2018
104
50
28
Actually I think I made a mistake: you can't just link x3850 x6 units together! This was a capability of the x3850 x5 units. The 8 socket configuration is done through the midplane in the x6 models and so you need to get the full 8U x3950 unit. oh well.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
I still haven't had enough time to measure how much power it consumes from the wall, but I logged in to the stats and got quite precise info:

1669573660844.png

This is the lower (main) node.
The whole server has 2 nodes in 1 chassis, upper and lower, each with 4 CPUs (they are almost identical, and each node looks like an X3850 X6).

So the second node has almost the same power usage.

The screenshot was taken with Windows Server 2022 running and no tasks.

So the power consumption at idle is like this:
CPU: 113W *2 = 226W
Memory: 186W *2 = 372W

My CPUs are at idle but "overclocked" to 3.4GHz (with the UEFI settings that I pasted a few posts up), and they run at 3.4GHz on all cores almost all the time. Without that, each CPU consumes ~15W less.

Memory is in the mode:
"Memory Power Management: Automatic"
When any power saving on the memory is disabled, the system is about 0.5% faster but consumes ~400W more.


Let's do the math:
CPU at idle: 226 - 8*15 = ~105W. I think this is the minimum when using 8x 8894v4.

Memory:
372W / 64 DIMMs = 5.8W per single DDR4 DIMM

So yes, you can use fewer of them, or smaller ones (this is the 32GB variant) with fewer chips.

So in theory:

CPUs 105W + say... 150W for memory + 60W for fans and other things = ~315W would be a reasonable minimum for the whole system.
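
The arithmetic above can be rechecked with a small sketch. The readings are the ones quoted in this post. The 150W (reduced memory) and 60W (fans and other things) defaults are the post's own guesses, not measurements, and the function and key names are made up for illustration. The exact subtraction gives 106W for the CPU floor, which the post rounds to ~105W.

```python
# Readings quoted in the post: idle power per node, two nodes in the chassis.
CPU_W_PER_NODE = 113
MEM_W_PER_NODE = 186
NODES = 2
CPU_COUNT = 8
DIMM_COUNT = 64
BOOST_OVERHEAD_W_PER_CPU = 15  # extra idle draw per CPU from the 3.4GHz tweak


def idle_breakdown(reduced_mem_w=150, misc_w=60):
    """Return the idle-power figures derived in the post, in watts.

    reduced_mem_w and misc_w are the post's guesses for a smaller memory
    configuration and for fans plus everything else.
    """
    cpu_total = CPU_W_PER_NODE * NODES
    mem_total = MEM_W_PER_NODE * NODES
    cpu_floor = cpu_total - CPU_COUNT * BOOST_OVERHEAD_W_PER_CPU
    return {
        "cpu_total_w": cpu_total,
        "mem_total_w": mem_total,
        "cpu_floor_w": cpu_floor,
        "w_per_dimm": mem_total / DIMM_COUNT,
        "system_floor_w": cpu_floor + reduced_mem_w + misc_w,
    }


if __name__ == "__main__":
    for name, value in idle_breakdown().items():
        print(f"{name}: {value}")
```

Changing `reduced_mem_w` shows how strongly the floor depends on how many DIMMs stay installed.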
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
The Lenovo manual says that the idle power is 290 Watts at 'minimum 8-socket configuration', but perhaps that is with some low power chips, 1 dimm/socket and basically no storage.
Sounds possible with 8894v4.
The thing is, the more channels the better: 8 per CPU is perfect, 4 is still good, 2 wastes the potential.

compared to linking two x3850 together?
I don't think it's possible.

Is there an option to put the machine to 'sleep' for super lower power draw, and wake it up without it going through a 15 minute boot cycle?
I don't see "sleep" available under Windows by default, but I didn't dig deeper.

Less RAM means a faster boot.
If I remember correctly, with 256GB it was about 4-5 minutes.

EDIT: my first tweaks were done with 4 CPUs instead of 8 and a minimal amount of RAM, so the whole cycle was about 3 min (if I remember correctly). Removing "books" (cassettes with CPU and memory) is so easy that you can remove 4 of them in 40 seconds.
 

chrgrose

Active Member
Jul 18, 2018
104
50
28
Did you ever configure the system with M.2 NVME SSD's? If so, how did you go about it? Any issues putting the OS on it? I recently purchased one to build out (so far only the x3950 x6 system, DDR4 compute books and RAM in the mail, still deciding on CPU's) and am finding that SAS 2.5" HDD's/SSD's just don't compete very well in GB/$.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Did you ever configure the system with M.2 NVME SSD's? If so, how did you go about it? Any issues putting the OS on it? I recently purchased one to build out (so far only the x3950 x6 system, DDR4 compute books and RAM in the mail, still deciding on CPU's) and am finding that SAS 2.5" HDD's/SSD's just don't compete very well in GB/$.
Yes, it's on NVMe: 1 for the system (970 Evo +) and 12 for data (older 970 Evo, not Plus), plus one Mellanox 2x100GbE and one GeForce 1050.
It's almost like a standard PC.

1673594370974.png

I have 2 "books" for SATA 2.5" drives (I am not using them, but I tested them and they seem quite fast: ~7GB/s to cache per "book", 8x SSD can probably do 4.4GB/s, SAS I don't know but maybe 7GB/s; still, the Samsung 970 Evo + is better and faster ;) ). So instead of replacing the "SATA books" I just installed the NVMe drives via $3 riser cards (NVMe to PCIe, 4 lanes), and read the manual to see how the QPI links connect to the CPUs so I could balance the drives between them (otherwise you can lose ~8GB/s).

In the previous server (HP DL580 G9) I was using a riser card (~$80) with a PCIe 3.0 switch chip (x16 to x4x4) because HP and Lenovo don't support PCIe bifurcation, but this card was too long (by 3mm, 0.1in...) to fit the Lenovo, so I decided to put the 13 drives in separate risers without any logic.

So yes, NVMe is supported, exactly like on X299/AM4/AM5/TR boards, but without PCIe bifurcation. Every NVMe drive connected to PCIe is visible from the UEFI, and visible in the boot section once an OS is installed. (Yes, you can also install the OS from USB, and the NVMe drives will be visible.)

The best NVMe drives for PCIe 3.0 boards are the 970 Evo +; unfortunately they only go up to 2TB.
980 - slower
980 Pro - slower than the 970 Evo and 970 Evo + when connected to PCIe 3.0 (of course much faster on 4.0, ~6.8/5.5)
990 Pro - not tested on PCIe 3.0 (on 4.0/5.0 it's ~7/6.5).

Winners:
Samsung 970 Evo + (best)
Samsung PM981a (cheaper)
Samsung 970 Evo (slower and cheaper)
Samsung PM981 (cheapest)
Other brands I don't know ;)

Risers:

This one ($3) always works fine (PCIe 3.0 and 4.0). Remember that when inserting it into an x8/x16 slot you are only using x4 speed, wasting the full potential of the port (but the Lenovo has a lot of them ;) )
1673594945572.png

This one ($20) works well when your motherboard supports PCIe bifurcation (almost any board for AMD AM4/AM5 and Threadripper, the better boards for LGA2066, the better boards for 2011/2011-3). It is not good for HP and Lenovo (the system will see just the 1st NVMe drive).
1673595085632.png

(~$80) PLX8747 PCIe 3.0, x16 to x4x4: works, but is too long for the Lenovo (other models probably fit), and it adds minimal latency too:

With this one you can use any x16 port as a full x16 port and run all drives at full speed all the time, so ~14.5GB/s per x16 port.
1673595250284.png
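
As a rough cross-check on the x4 and x16 figures above, here is a minimal sketch of the theoretical PCIe 3.0 link bandwidth (8 GT/s per lane, 128b/130b encoding), before any protocol overhead. The helper names are made up for illustration, and the 3.5 GB/s per-drive figure in the usage lines is an assumed sequential read rate, not a number from this thread.

```python
import math

# PCIe 3.0 signalling: 8 GT/s per lane with 128b/130b line encoding.
PCIE3_GT_PER_S = 8.0
PCIE3_ENCODING = 128 / 130


def pcie3_gbytes_per_s(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s, before protocol overhead."""
    return PCIE3_GT_PER_S * PCIE3_ENCODING / 8 * lanes


def drives_needed(target_gbytes_per_s, per_drive_gbytes_per_s):
    """Smallest drive count whose summed throughput reaches the target."""
    return math.ceil(target_gbytes_per_s / per_drive_gbytes_per_s)


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"x{lanes}: {pcie3_gbytes_per_s(lanes):.2f} GB/s")
    # 3.5 GB/s per drive is an assumption for illustration only.
    print("drives for 30 GB/s:", drives_needed(30.0, 3.5))
```

The x16 ceiling comes out at about 15.75 GB/s, so ~14.5GB/s through a switch card is close to what the link can carry, and each x4 riser tops out just under 4 GB/s per drive.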
 

chrgrose

Active Member
Jul 18, 2018
104
50
28
Thanks for the info. I think I will try using one or two riser cards for 1xM.2, as I don't need anything too crazy. Did you figure out the best slot to use for them? I'll also probably populate the front with some 12Gbs SAS HDD's.

I was thinking about something else lately also. You mentioned, I think, that you first got the system with DDR3 compute books and upgraded to DDR4 books for the v4 Xeons. Did you ever try using the v4's in the DDR3 books? v4 CPU's appear to not be officially supported in the Lenovo manual, but I noticed that the official specifications for the CPU's themselves DO support DDR3:


It might be really amazing if you could just slot v4's in the DDR3 books, since both the books and DDR3 ram are so much cheaper.
 