Dual XEON 2696 v4 Workstation Build Log & Benchmarks

traderjay

Active Member
Mar 24, 2017
195
41
28
35
UPDATE: ALL PROBLEMS RESOLVED WOOOOHOOOOO!

Almost gave up and decided to plead for help online, and an ex-lieutenant from the US Navy nuclear forces on AnandTech saved my sorry butt. On a late Friday evening, he looked over every screenshot of my BIOS settings and told me to change two obscure Xeon-specific power-saving features, and it magically cleared up ALL my problems.

The following two are now disabled in the BIOS:

- C6 State Reporting
- Spread Spectrum


My i7-3970X has served me well for the past 5 years, but with the advent of 4K video and better-optimized multi-core applications, I've finally made the decision to upgrade my system. My new build consists of the following:

CPU - Dual Xeon E5-2696 v4
RAM - 64 GB Crucial DDR4 PC-2666 (8x8GB)
Storage - 2x 800GB HGST SAS SSD in RAID 0 for application/scratch disk
OS drive - Samsung 1TB 960 Pro NVMe
GPU - ASUS Strix GTX 1080 Ti
Case - Lian-Li PC-V2120X
Motherboard - ASUS Z10PE-D16 WS
PSU - Seasonic Titanium 1000W

This system will also be used to benchmark various machine vision applications such as 3D point cloud processing used in machine vision inspection systems for quality control. (ScanXtream)

My other workstation is a single E5 2696V3. Vegas 15 and Adobe Media Encoder can take advantage of all its physical cores when encoding video, and I can't wait to see the additional improvement from my new workstation. If you guys have good benchmark ideas, let me know and I'll keep this thread up to date.





Handbrake Benchmarks:
Benchmark your computer with Handbrake 1.01 and x265

Workstation 1:
CPU: Dual XEON 2696 v4
RAM: 64 GB Crucial DDR4 ECC PC2133 (8x8GB)
OS: Win10 Pro with Fall update
SSD: Samsung 850 pro 512GB

4K footage to 1080P H.265
encoded 1497 frames in 87.29s (17.15 fps), 4036.02 kb/s, Avg QP:26.24

4K to 4K Roku 2160p30 4K H.265
encoded 1497 frames in 169.78s (8.82 fps), 6897.08 kb/s, Avg QP:27.59


Workstation 1 (PC2400 RAM SPEED):
CPU: Dual XEON 2696 v4
RAM: 64 GB Crucial DDR4 ECC PC2400 (8x8GB)
OS: Win10 Pro with Fall update
SSD: Samsung 850 pro 512GB

4K footage to 1080P H.265
encoded 1497 frames in 86.65s (17.28 fps), 4036.02 kb/s, Avg QP:26.24

4K to 4K Roku 2160p30 4K H.265
encoded 1497 frames in 167.95s (8.91 fps), 6897.08 kb/s, Avg QP:27.59



Workstation 2:
CPU: XEON E5-2696 V3
RAM: 32GB DDR4 ECC PC2133 Crucial CL19
OS: Win 10 Pro with fall Creators Update

4K footage to 1080P H.265
encoded 1497 frames in 101.27s (14.78 fps), 4036.02 kb/s, Avg QP:26.24

4K to 4K Roku 2160p30 4K H.265
encoded 1497 frames in 211.79s (7.07 fps), 6897.08 kb/s, Avg QP:27.59
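
To compare the runs above, here is a minimal Python sketch (not part of the original benchmark) that parses the x265 summary lines and works out the speedup over the single 2696 V3. The function name and the run labels are made up for illustration, and the regular expression only assumes the summary format quoted in this post.

```python
import re

# Pattern for the x265 summary lines quoted in this post (assumed format).
SUMMARY = re.compile(r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\)")

def parse_summary(line):
    """Return (frames, seconds, fps) from one x265 summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not an x265 summary line: {line!r}")
    return int(m.group(1)), float(m.group(2)), float(m.group(3))

# 4K -> 1080p H.265 results copied from the post above.
runs = {
    "dual 2696 v4 (PC2133)": "encoded 1497 frames in 87.29s (17.15 fps), 4036.02 kb/s, Avg QP:26.24",
    "dual 2696 v4 (PC2400)": "encoded 1497 frames in 86.65s (17.28 fps), 4036.02 kb/s, Avg QP:26.24",
    "single 2696 v3": "encoded 1497 frames in 101.27s (14.78 fps), 4036.02 kb/s, Avg QP:26.24",
}

# Use the single-socket workstation as the baseline.
baseline_seconds = parse_summary(runs["single 2696 v3"])[1]
speedups = {}
for name, line in runs.items():
    frames, seconds, fps = parse_summary(line)
    speedups[name] = baseline_seconds / seconds
    print(f"{name}: {frames / seconds:.2f} fps, {speedups[name]:.2f}x vs single 2696 v3")
```

On these numbers the dual-socket box is roughly 1.16x faster on the 4K to 1080p encode, and going from PC2133 to PC2400 changes the encode time by under 1%.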

CINEBENCH & Adobe Media Encoder CC 2018:

 

Kneelbeforezod

Active Member
Sep 4, 2015
528
121
43
42
How does Handbrake handle all the cores? Once all the parts arrive I'll be setting up dual 2696V3s. My main usage is Handbrake.
 

ServerSemi

Member
Jan 12, 2017
110
21
18
37
Do you have Adobe Premiere Pro CC 2017? If you do, please export a 1080p movie of about 5-10 minutes using the highest quality bitrate in H.264 to see how fast these two monster CPUs perform. Thanks.
 

manubit

New Member
Sep 14, 2017
27
6
3
Handbrake can't use all the cores unfortunately. I tested it with the 2696-v3 (18 core).
x264 or x265? x265 has much better core utilization from my experience on dual 10 cores. Of course it doesn't help if you have to use x264.
 

traderjay

x264 or x265? x265 has much better core utilization from my experience on dual 10 cores. Of course it doesn't help if you have to use x264.
Will try x265 - thanks for the tip!

Do you have Adobe Premiere Pro CC 2017? If you do, please export a 1080p movie of about 5-10 minutes using the highest quality bitrate in H.264 to see how fast these two monster CPUs perform. Thanks.
Yep, and will certainly try that - the CPUs are stuck in customs.
 

Kneelbeforezod

Handbrake can't use all the cores unfortunately. I tested it with the 2696-v3 (18 core).
Thanks for the info. I use Premiere only sparingly now, just because Handbrake is so much more convenient for cutting and getting news clips up fast. But I'm sure dual 2696s will be faster than my Westmeres in Handbrake.
 

Nanotech

Active Member
Aug 1, 2016
595
99
28
40
Not sure why you couldn't just overclock the 3970X and get more performance out of it. Going to a completely new platform involves overhauling the memory, motherboard and the processor. A good amount of applications are still highly single threaded even if they can take advantage of multiple cores and threads.
 

lni

Member
Aug 20, 2017
34
10
8
39
Not sure why you couldn't just overclock the 3970X and get more performance out of it. Going to a completely new platform involves overhauling the memory, motherboard and the processor. A good amount of applications are still highly single threaded even if they can take advantage of multiple cores and threads.
3970X is 6C/12T, dual 2696v4 gives you 44C/88T, not sure how you are going to overclock the 3970X to get anything remotely comparable to that.

I have a 2696v4 system at home, couldn't be happier about it.
 

Nanotech

3970X is 6C/12T, dual 2696v4 gives you 44C/88T, not sure how you are going to overclock the 3970X to get anything remotely comparable to that.

I have a 2696v4 system at home, couldn't be happier about it.
I'm well aware of the specifics of both processors. I never said that overclocking a 3970X will match the multi-threaded performance of a dual-socket 2696 V4 setup. That being said, the 3970X is a Sandy Bridge-E processor, which is known to have excellent overclocking capability (especially the higher binned parts). Overclocking gives a performance boost that is essentially free (provided sufficient cooling) compared to the cost of a complete overhaul. Besides, the 2696 V4's single-threaded performance, even with its IPC improvements, is quite low at 2.2GHz compared to other Xeons, and the 3970X still beats it in single-threaded performance. It also needs to be pointed out that ES/QS and OEM/production Xeons aren't as attractive nowadays as they once were. With enough competition and variety, Xeons are losing their appeal (especially because the dual- and quad-socket versions are locked).
 

wildpig1234

Well-Known Member
Aug 22, 2016
1,824
273
83
45
Threadripper is digging into the 2nd-hand 2011 platform ... but you can still put together a 2011 that's equal to a 1950X for cheaper
 

Aluminum

Active Member
Sep 7, 2012
431
45
28
2696v4 (Broadwell) single threads at 3.7GHz, not really all that far behind a Sandy-E in the low 4s. Then for real MT code you get an extra 16 cores to work with beyond the 6 Sandys.

Mine smokes my 8 core ivy at just about everything meaningful but crusty old programs and games at low resolutions I don't care about. (At 1440 and especially 4k almost always GPU limited)
 

Nanotech

2696v4 (Broadwell) single threads at 3.7GHz, not really all that far behind a Sandy-E in the low 4s.
Except 3.7GHz is the maximum you can run a 2696 V4 at, and only on one or two cores. A 3970X can easily surpass the low 4GHz range on an overclock, and the average according to reviews was at least 4.5GHz+ for a 3960X. That's a significant gap that the 2696 V4 cannot compensate for, even with a BCLK overclock (which would be very small). Sure, the 2696 V4 has 22c/44t, but if an application cannot take advantage of more cores beyond a certain number, it will be single-threaded performance that determines the outcome. An IPC advantage cannot compensate for a gap of at least 800MHz in clock speed between an overclocked 3970X and a 2696 V4 locked to 3.7GHz.
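
The trade-off between clock speed and core count can be sketched with an Amdahl's-law style model. This is only a rough illustration, not a benchmark: the clocks are assumptions based on figures mentioned in this thread (4.5GHz for the overclocked 3970X, 3.7GHz single-core turbo and 2.2GHz with all cores busy for one 2696 V4), IPC differences are ignored, and `run_time` is a hypothetical helper.

```python
def run_time(parallel_fraction, cores, st_ghz, mt_ghz):
    """Relative run time: the serial part runs at the single-thread clock,
    the parallel part is spread over all cores at the all-core clock."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction / st_ghz + parallel_fraction / (cores * mt_ghz)

# Assumed clocks in GHz, IPC differences ignored:
#   3970X overclocked to 4.5 on all 6 cores
#   single 2696 V4: 3.7 single-core turbo, 2.2 with all 22 cores busy
results = {}
for p in (0.0, 0.5, 0.9, 0.99):
    t_3970x = run_time(p, cores=6, st_ghz=4.5, mt_ghz=4.5)
    t_2696v4 = run_time(p, cores=22, st_ghz=3.7, mt_ghz=2.2)
    results[p] = "3970X" if t_3970x < t_2696v4 else "2696 V4"
    print(f"parallel fraction {p:.2f}: faster chip = {results[p]}")
```

Under these assumptions the overclocked 3970X wins while the work is mostly serial, and the 2696 V4 pulls ahead once most of the work scales across cores.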
 

Aluminum

"single thread" or whatever main thread is bound in a weakly-MT-aware program. (games)

Running on lots of cores, then we're back to 22 * 2 * 2.2+ >> 6 * 4.whatever.

I suppose it is better in the bizarro made-up scenario of "I need to run 5-6 instances (but not 1-4, the bins are not a cliff) of a very ST-bound program, but running a dozen or more is mysteriously no help". In the real world, where I have both, the 22-core Broadwell smokes the 8-core Ivy @ 4.5 in literally anything written for MT.
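
The back-of-envelope comparison above (22 * 2 * 2.2 vs 6 * 4.whatever) can be written out as a small sketch. It assumes 4.5GHz for the "4.whatever" overclock and treats cores times clock as a rough upper bound for perfectly parallel code, ignoring IPC and turbo behaviour. `core_ghz` is a hypothetical helper, not a real benchmark metric.

```python
def core_ghz(sockets, cores_per_socket, ghz):
    """Aggregate core-GHz: a rough upper bound for perfectly parallel code."""
    return sockets * cores_per_socket * ghz

dual_2696v4 = core_ghz(sockets=2, cores_per_socket=22, ghz=2.2)
# 4.5 GHz is an assumed value standing in for "4.whatever".
oc_3970x = core_ghz(sockets=1, cores_per_socket=6, ghz=4.5)
ratio = dual_2696v4 / oc_3970x

print(f"dual 2696v4: {dual_2696v4:.1f} core-GHz")
print(f"3970X @ 4.5: {oc_3970x:.1f} core-GHz")
print(f"ratio: {ratio:.1f}x")
```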
 

lni

Except 3.7GHz is the maximum you can run a 2696 V4 at, and only on one or two cores. A 3970X can easily surpass the low 4GHz range on an overclock, and the average according to reviews was at least 4.5GHz+ for a 3960X. That's a significant gap that the 2696 V4 cannot compensate for, even with a BCLK overclock (which would be very small). Sure, the 2696 V4 has 22c/44t, but if an application cannot take advantage of more cores beyond a certain number, it will be single-threaded performance that determines the outcome. An IPC advantage cannot compensate for a gap of at least 800MHz in clock speed between an overclocked 3970X and a 2696 V4 locked to 3.7GHz.
first of all, in case you really care about single threaded performance, you shouldn't be using those 6 core processors, there are 2/4 core chips with higher IPC/frequency.

there are mostly two types of programs that are still singled threaded in 2017:
1. games
2. legacy code

it is totally fine for people to run/enjoy such programs, I do that on a daily basis as I have quite a few legacy programs that I need. however, a bold however, using such software as an excuse to question the usefulness of those highly parallel machines is not that cool.

it is also a bit strange to consider 3970 vs 2696v4 as a black or white issue. why can't people have both? I mean a 3970x + a reliable mb is $350 on taobao.com or ebay.com, there is nothing stopping you from having a dual 2696v4 as the main workstation and then another 3970x sitting there to run that legacy code whenever you want.
 

lni

Threadripper is digging into the 2nd-hand 2011 platform ... but you can still put together a 2011 that's equal to a 1950X for cheaper
show me a threadripper setup that can -

1. score 5500 in cinebench R15 on stock/air
2. 16 DIMM slots, not picky on RAM
3. $170 brand new mb from first tier brand, e.g. supermicro

2696v4 chips being mentioned here are not 2nd hand, the vast majority of them were slightly damaged on the lid and considered as junk by the manufacturer. they are processors manufactured by Intel in 2017. In case you do want a brand new one without any wear & tear, there are dozens of vendors online that can sell you that for just $40 extra on top of their regular 2696v4 price.

the truth is pretty simple here - a typical threadripper setup gives you half of the performance (cb ~3000) for the same price as a dual 2696v4.

one may further argue that the X399 platform is new and AMD can release a more powerful threadripper, maybe with 32 cores, to make it really a high performance workstation (cb 5000 or over). well, for single socket X399, AMD would need to break quite a few physical limits to deliver that. you also need to consider the price - how much is AMD going to charge for a 24/36 core threadripper? my dual 2696v4 + X10DAI with 128G ECC RAM cost me a total of $3k; you'd probably have to pay $2k just to get 128G of regular RAM with lots of LED light pollution for threadripper, which is known to be quite picky on RAM.

;)
 