New 2011 v3 system with ASUS Z10PE-D16 WS and 2x E5-2686 v3


wildpig1234

Well-Known Member
Aug 22, 2016
[Photo attached: DSCN1809.JPG]
So I am in the process of putting together a new 2011 v3 system with 2x E5-2686 v3. Will post some more pics later. Everything is still out of the case at this time. I only have 1 CPU and 32GB for now, and I'm only using the onboard video while testing everything to make sure it's all good.

These components will replace the parts in one of my 2011 v1 systems.

So far, using 1 CPU, it's about 4x faster than my i7-2600K at calculating pi with y-cruncher, so with 2 CPUs it should be about 8x faster. That's about what I expected, since my old dual E5-2670 was about 4-5x faster than the i7-2600K.
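If it helps to see the arithmetic spelled out, here is a trivial sketch of that ideal-scaling expectation; the 4x figure is the measurement quoted above, and perfect two-socket scaling is an assumption rather than a guarantee.

Code:
# Back-of-envelope only: assumes the workload scales perfectly across sockets,
# which memory bandwidth, NUMA effects and sync overhead usually prevent.
single_socket_speedup = 4.0   # measured vs the i7-2600K with one E5-2686 v3
sockets = 2
ideal_dual_speedup = single_socket_speedup * sockets
print(f"ideal dual-socket speedup vs i7-2600K: ~{ideal_dual_speedup:.0f}x")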

I will also upgrade my other 2011 system with 2x E5-2696 v2 when the CPUs get here. It's going to be a busy month... lol.
 

Marsh

Moderator
May 12, 2013
Which case are you planning to use?

I am also putting together a dual-CPU ASUS Z10PE-D16/4L mining rig this week.
 

wildpig1234

Well-Known Member
Aug 22, 2016
If anyone else has an E5-2686 v3 setup, can you verify the core turbo bins? I am not getting the bin multipliers stated here on Wikipedia:

List of Intel Xeon microprocessors - Wikipedia

I only see up to 3.2 GHz, and only 2.28 GHz with all 16 cores working... wondering if it's because my CPU is a lower-stepping ES/QS chip. Thanks.
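For whatever it's worth, one quick way to sanity-check the clocks you actually get under load is to poll the OS-reported frequency while y-cruncher (or any all-core load) is running. This is only a rough check, since what the OS reports may be an average rather than true per-core turbo; monitoring tools with per-core readouts are more trustworthy. Minimal sketch, assuming the psutil package is installed:

Code:
# Poll OS-reported CPU frequency and load once a second for ~10 seconds.
# Start an all-core load (e.g. y-cruncher) in another window first.
import psutil

for _ in range(10):
    load = psutil.cpu_percent(interval=1)   # 1-second average utilisation, %
    freq = psutil.cpu_freq()                # MHz; may be an average, not per-core turbo
    if freq is None:                        # some platforms don't expose frequency
        print(f"load {load:5.1f}%  (frequency not reported)")
    else:
        print(f"load {load:5.1f}%  current {freq.current:.0f} MHz  max {freq.max:.0f} MHz")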
 

Marsh

Moderator
May 12, 2013
I have the following dual E5 v3 CPUs on hand:

E5-2683 v3 120w
E5-2666 v3 135w
E5-4650 v3 135w
E5-4610 v3

Most likely, I will use the 135W CPUs with the ASUS Z10PE-D16.
 

wildpig1234

Well-Known Member
Aug 22, 2016
Did you need to drill any holes for motherboard standoff support?
There was one hole that was somehow not already drilled in my Phanteks case. I believe it was located midway along the far edge of the motherboard, parallel to the edge with the USB and NIC ports. I just inverted the standoff at that hole, since that spot only needs support underneath the edge rather than also being screwed down.

All the other holes are predrilled correctly, so the motherboard is held in very securely even though one hole is not screwed down.
 

wildpig1234

Well-Known Member
Aug 22, 2016
[Screenshot: Cinebench R15, dual E5-2686 v3 at stock]
Just added the 2nd CPU.

The Cinebench score for dual 2686 v3 is now 3715... it was 1988 for one CPU.

The only problem I am running into is that y-cruncher doesn't scale right. 5 billion digits of pi with 1 CPU took around 462 sec; with 2 CPUs it's actually 467 s. Disabling NUMA only improves the dual-CPU time to 461 s. I am wondering if the fact that I am running single-channel memory, with one 16GB DIMM per CPU, is the cause? Then again, Cinebench scales correctly, so this might be something else... any thoughts?
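For a rough sense of why one DIMM per socket could matter: large y-cruncher runs are generally considered very memory-bandwidth heavy, and a single DDR4 channel offers roughly a quarter of what a fully populated quad-channel socket can stream. A back-of-envelope sketch, assuming DDR4-2133 DIMMs:

Code:
# Theoretical peak DRAM bandwidth per socket, assuming DDR4-2133 RDIMMs.
mt_per_s = 2133e6            # transfers per second
bytes_per_transfer = 8       # 64-bit channel
per_channel_gbs = mt_per_s * bytes_per_transfer / 1e9

for channels in (1, 4):
    print(f"{channels} channel(s) populated: ~{channels * per_channel_gbs:.0f} GB/s per CPU")
# ~17 GB/s with one DIMM vs ~68 GB/s with four per socket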
 

wildpig1234

Well-Known Member
Aug 22, 2016
It seems like once you get over 32-36 threads there's a real problem with scaling in some programs... I am wondering if it's because they are not written/optimized for more than 36 threads, or does it also have to do with the fact that I am currently running single-channel with just 1 DIMM per CPU?

Cinebench scales correctly from 1 CPU to 2. The y-cruncher time to 5 billion digits with 1 CPU and 2 DIMMs is actually almost 10 sec faster than 2 CPUs with 1 DIMM per CPU.
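One illustrative way to think about the flattening past 32-36 threads is Amdahl's law, treating whatever is shared (memory bandwidth, serial bookkeeping) as an effectively serial fraction of the work. The 5% figure below is made up for illustration, not a measurement of y-cruncher:

Code:
# Amdahl's law: speedup = 1 / (serial + parallel / threads).
def amdahl_speedup(parallel_fraction, threads):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)

for t in (8, 16, 32, 36, 64):
    print(f"{t:3d} threads -> {amdahl_speedup(0.95, t):4.1f}x")
# the curve flattens well before 64 threads once ~5% of the work is effectively serial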
 

wildpig1234

Well-Known Member
Aug 22, 2016
What's the memory configuration that allows for the fastest memory speed in my dual-CPU system? 8 DIMMs for the two CPUs, since it's quad channel? I'm running just 1 DIMM per CPU right now, and the guy that made y-cruncher thinks that's why I am not getting a much better benchmark...
 

TLN

Active Member
Feb 26, 2016
8 RDIMMs is the fastest way to go: 2 CPUs x quad-channel memory.
Theoretically you can get away with single-rank memory (16 sticks), but AFAIK this will give about the same performance.
Otherwise, 16 RDIMMs will knock the speed one step down (to 1866).
16 LRDIMMs will work just fine, but LRDIMMs are usually slower.

Not 100% true, so take it with a grain of salt.
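To put a number on the 2133 vs 1866 step-down mentioned above (whether two DIMMs per channel actually drops to 1866 depends on the board, CPU and DIMM ranks), a quick sketch:

Code:
# Theoretical quad-channel bandwidth per socket at the two DDR4 speeds.
def quad_channel_gbs(mt_per_s):
    return 4 * mt_per_s * 8 / 1e9   # 4 channels x 8 bytes per transfer

print(f"8 DIMMs (1 per channel) at 2133: ~{quad_channel_gbs(2133e6):.0f} GB/s per CPU")
print(f"16 DIMMs (2 per channel) at 1866: ~{quad_channel_gbs(1866e6):.0f} GB/s per CPU")
# roughly 68 vs 60 GB/s, i.e. a far smaller penalty than running a single channel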
 

wildpig1234

Well-Known Member
Aug 22, 2016
My plan was to load it up with 8x 16GB RDIMMs; it has 2x 16GB right now.

It was very disappointing that with dual CPUs my 5-billion-digit pi calculation actually ran 5-10 sec slower than with a single CPU. I don't know why, since Cinebench scaled up accordingly going from single to dual CPU. So I am wondering if the single-memory-channel bottleneck is slowing it down: to calculate pi to 5 billion digits I have to use nearly all 32GB of RAM, so the memory traffic gets spread over a single channel per socket and across QPI?
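If you want to see the effect directly rather than infer it, a crude copy test gives a ballpark for what the current one-DIMM-per-socket setup can actually stream. Proper tools such as STREAM or Intel Memory Latency Checker are far more accurate, and this little NumPy sketch only measures whichever node the process happens to land on:

Code:
# Crude memory-bandwidth probe: time repeated large array copies.
import time
import numpy as np

n = 512 * 1024 * 1024 // 8       # ~512 MB of float64 per buffer
src = np.ones(n)
dst = np.empty_like(src)

reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    np.copyto(dst, src)          # each pass reads src and writes dst (~1 GB moved)
elapsed = time.perf_counter() - t0

moved = 2 * src.nbytes * reps    # bytes read + bytes written
print(f"approx copy bandwidth: {moved / elapsed / 1e9:.1f} GB/s")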
 

TLN

Active Member
Feb 26, 2016
My plan was to load it up with 8x 16GB RDIMMs; it has 2x 16GB right now.

It was very disappointing that with dual CPUs my 5-billion-digit pi calculation actually ran 5-10 sec slower than with a single CPU. I don't know why, since Cinebench scaled up accordingly going from single to dual CPU. So I am wondering if the single-memory-channel bottleneck is slowing it down: to calculate pi to 5 billion digits I have to use nearly all 32GB of RAM, so the memory traffic gets spread over a single channel per socket and across QPI?
Well, I have 8x 16GB RDIMMs. What are you using for pi? SuperPi?
 

wildpig1234

Well-Known Member
Aug 22, 2016
Well, I have 8x 16GB RDIMMs. What are you using for pi? SuperPi?
I'm using y-cruncher. It's supposed to be the best program for calculating a very large number of digits. The current record for digits of pi computed on any computer is 22 trillion digits, set using y-cruncher.

What CPUs are you using? If anyone has a dual-CPU setup with more than 32 threads, see if you can reproduce the scaling from one CPU to two.
 

TLN

Active Member
Feb 26, 2016
What CPUs are you using? If anyone has a dual-CPU setup with more than 32 threads, see if you can reproduce the scaling from one CPU to two.
2x E5-2683 v3 QS: 28 cores, 56 threads total.
I'm running ESXi and my desktop is virtualized, but I want to try that app anyway.
 

wildpig1234

Well-Known Member
Aug 22, 2016
2x E5-2683 v3 QS: 28 cores, 56 threads total.
I'm running ESXi and my desktop is virtualized, but I want to try that app anyway.
Let me know how fast it goes for you. What is your memory configuration? Quad channel for each CPU?

With dual 2683 v3, if it scaled correctly, your time to 5 billion digits should be around 300 sec.
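For what it's worth, one way to land in that ballpark is to scale the measured 462 s single-socket 2686 v3 run by core count and socket count, ignoring clocks and memory entirely. The 18-core figure below is the retail spec for the 2686 v3, so an ES/QS sample that only exposes 16 cores would shift the estimate somewhat:

Code:
# Very rough scaling estimate, assuming perfect scaling and equal per-core throughput.
measured_single_socket_s = 462        # 5e9 digits on one E5-2686 v3 (from the earlier post)
cores_2686_v3 = 18                    # retail spec; an ES/QS part may expose fewer
cores_2683_v3 = 14
sockets = 2

estimate = measured_single_socket_s * (cores_2686_v3 / cores_2683_v3) / sockets
print(f"ideal-scaling estimate for 2x E5-2683 v3: ~{estimate:.0f} s")   # ~297 s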
 

TLN

Active Member
Feb 26, 2016
Let me know how fast it goes for you. What is your memory configuration? Quad channel for each CPU?

With dual 2683 v3, if it scaled correctly, your time to 5 billion digits should be around 300 sec.
Yeah, as I've described above: two CPUs, 8 sticks of memory, quad channel on each CPU. All of that runs under the ESXi hypervisor (with average load from the other VMs). The benchmark was running in a Win7 VM with 56 cores (all cores available) and 64GB of memory (I have 128GB total). I can expect better results on Win10 and on bare metal.

[Result screenshots attached]

Obviously I can upload the results to HWBot if you want validated results. I can get better results on bare metal for sure.

P.S. Which result should I look at: the Pi time, the total computation time, or the start-to-end wall time? Because if it's the Pi time, I'm at #6 in the world rankings already: y-cruncher - Benchmarks - 5b
 
