New 2011 v3 system with ASUS Z10PE-D16 WS and 2x e5-2686 v3


TLN

Active Member
Feb 26, 2016
523
84
28
34
What is the result I should be looking at? PI Calculation? I.e. my result is 220?
 

Marsh

Moderator
May 12, 2013
2,644
1,496
113
Wow, :cool:
I looked up the price: each retail E5-4669 v4 is $5K. My down payment for my first house was $2.5K.
 

Patriot

Moderator
Apr 18, 2011
1,450
789
113
Wow, :cool:
I looked up the price: each retail E5-4669 v4 is $5K. My down payment for my first house was $2.5K.
The eBay'd 4669s are often cheaper than 2699s, and they are 135 W chips instead of 145 W.
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
What is the result I should be looking at? PI Calculation? I.e. my result is 220?
You should submit the result so you can be on that page! Your benchmark fits right in with the other two 2683 v3 results on there! Looks like it narrowly beat the dual EPYC 7601!


ooou oou can I play?
:) Sure. Download y-cruncher and crunch away... You should get some impressive times with that 44C system!
 

TLN

Active Member
Feb 26, 2016
523
84
28
34
You should submit the result so you can be on that page! Your benchmark fits right in with the other two 2683 v3 results on there! Looks like it narrowly beat the dual EPYC 7601!
I'll be rebuilding my rig soon and will try that test on bare metal. Today I was running with my desktop VM in the background (and 10 other VMs). I'd estimate a 10-15% performance increase on bare metal.
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
TLN, if you get a chance and don't mind helping me out, I'd much appreciate it if you could test the dual CPU setup with only 32GB of RAM, or only 1 DIMM per CPU, and see what result you get for 5 billion digits. I am wondering if it's because I am running single channel that my time is limited. Thanks.
 

TLN

Active Member
Feb 26, 2016
523
84
28
34
TLN, if you get a chance and don't mind helping me out, I'd much appreciate it if you could test the dual CPU setup with only 32GB of RAM, or only 1 DIMM per CPU, and see what result you get for 5 billion digits. I am wondering if it's because I am running single channel that my time is limited. Thanks.
If you can wait a little bit, I can do that for sure. I'm getting a new case tomorrow (according to UPS) and will start moving things into the new enclosure later this week. During that time I wanna do a clean install of Windows and test some equipment that I have here. I'll be able to remove memory and run that test for you.
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
Thanks a lot. I won't get back until the end of the week to run this test on my dual E5-2670, but I figure it would be more realistic to run it on a v3 system. I'm still really bothered by how slow my time to 5 billion digits is with the dual CPU.
 

biorpg

New Member
May 21, 2017
7
3
3
38
Thanks for the info on y-cruncher, I had not used it before. Here are the results I got on my similar system:

Z10PE-D16 WS
2x Xeon E5-2689 v4
128GB (8x16GB) DDR4 2133MHz
2x GeForce GTX 1070

These benchmarks were performed with a BCLK of 103.5, NUMA mode memory (4 nodes), Cluster-on-Die QPI, and SLI disabled.

Single-core benchmarks are included to highlight the E5-2689 v4's outstanding 3.7GHz all-core turbo (3.83GHz with 103.5 BCLK). The Pi-5b benchmark is the multi-core run, and the Pi-500m benchmark is with a single core.
The 3694 Cinebench score shown above the current (bright orange) score was achieved with SLI enabled. (This is the only differing factor that I can think of.)


 

biorpg

New Member
May 21, 2017
7
3
3
38
My plan was to load it up with 8x 16GB RDIMMs; it has 2x 16GB right now.

It was very disappointing to me that with dual CPUs my 5 billion digit pi calculation actually ran 5-10 sec slower than with a single CPU. I don't know why, since Cinebench scaled up accordingly going from single to dual CPU. So I am wondering if the single memory channel bottleneck is slowing it down: to calculate pi to 5 billion digits I have to use nearly all 32GB of RAM, so the memory use is spread over the single channel and QPI?
Fujitsu has some excellent documentation for their Primergy servers that is generally applicable to all systems using the same chipset and CPU architecture.
The following table shows the relative performance of using less than 1 DPC (DIMM per channel), which is the 'minimum' 4-way configuration. You are using 0.25 DPC (1-way), and as you can see, the performance impact scales almost linearly:

(https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-broadwell-ep-memory-performance-ww-en.pdf)

In essence, it's amazing you are even able to boot :D
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
So my first y-cruncher run was with 1 CPU and 2 DIMMs for that one CPU, and my second run was with 2 CPUs but only 1 DIMM per CPU. I guess not having all 4 channels populated is really limiting the performance?
 

biorpg

New Member
May 21, 2017
7
3
3
38
Sorry, a couple of slight mistakes. That table and document are for Broadwell-EP (E5-2600 v4). And the table shows the performance impact of setting the memory interleaving to 1-way within the BIOS while actually populating 1 DPC (8 DIMMs).
Here is the information for Haswell-EP (E5-2600 v3) along with more of the text explaining the table, and an excerpt from later on in the document that actually mentions your configuration.



Independent Mode configurations
This covers all the configurations that are neither in Performance Mode nor are redundant. There are no
restrictions apart from the "Don't mix" ruling for RDIMMs and LRDIMMs as well as for x4 and x8 RDIMMs.
Special attention is also given to configurations with less than four DIMMs per processor, i.e. less than the
minimum number that is required for Performance Mode configurations. The reason for such configurations
can be energy-saving considerations as well as a low amount of required memory capacity. Savings also
result from a minimization of the number of DIMMs. The quantitative assessment that follows below of how a
configuration of less than four memory channels impacts on system performance suggests the following
recommendations:

With regard to the LCC processor class (low-core count), operation with only one DIMM per
processor (minimum configuration) is not recommended. Operation with two or three DIMMs per
processor can on the other hand lead to balanced results as regards performance and energy
consumption.

In the HCC (high-core count) and MCC (medium-core count) processor classes, operation with one
or three DIMMs per processor is not recommended. Operation with two DIMMs per processor can on
the other hand lead to balanced results as regards performance and energy consumption.

The non-recommended configurations mean entire (1 DIMM per processor) or partial (3 DIMMs per
processor for the HCC and MCC processors) 1-way interleaving via the memory channels with the clear
performance disadvantage of up to 30%, as shown below, for the commercial application performance. The
special feature regarding three DIMMs with HCC and MCC processors results from the configuration with two
memory controllers over which three DIMMs cannot be equally distributed.
(https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-haswell-ep-memory-performance-ww-en.pdf)
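As a rough sketch, the recommendations in the excerpt above can be written down as a small lookup. The names `NOT_RECOMMENDED` and `dimm_advice` are made up for illustration, and the sketch only covers 1 to 4 DIMMs per processor, since that is all the quoted text addresses:

```python
# DIMM counts per processor that the quoted Fujitsu text calls
# "not recommended", keyed by processor class (hypothetical names).
NOT_RECOMMENDED = {
    "LCC": {1},      # low-core count: avoid 1 DIMM per processor
    "MCC": {1, 3},   # medium-core count: avoid 1 or 3 DIMMs per processor
    "HCC": {1, 3},   # high-core count: avoid 1 or 3 DIMMs per processor
}


def dimm_advice(cpu_class, dimms_per_cpu):
    """Summarize what the quoted excerpt says about a DIMM count."""
    if not 1 <= dimms_per_cpu <= 4:
        raise ValueError("the excerpt only covers 1 to 4 DIMMs per processor")
    if dimms_per_cpu == 4:
        # Four DIMMs is the minimum for a Performance Mode configuration.
        return "all four channels populated"
    if dimms_per_cpu in NOT_RECOMMENDED[cpu_class]:
        return "not recommended"
    return "balanced"


for cpu_class in ("LCC", "MCC", "HCC"):
    for dimms in (1, 2, 3, 4):
        print(cpu_class, dimms, dimm_advice(cpu_class, dimms))
```

For a dual-CPU board with one DIMM per processor, every class comes back as "not recommended".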
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
With regard to memory bandwidth, number of channels, and the performance hit, I am assuming the above would also apply to 2011 v1? I still have to save up for the six 16GB DDR4 DIMMs, which are not cheap at all ;(. But I am wondering if I can simulate this effect on my old 2011-v1 system with dual CPUs and 16x 8GB DIMMs by changing the installed RAM configuration accordingly.
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
So I tested various numbers of memory channels on my E5-2670 system:
2.5 billion digits of pi using y-cruncher, which uses 13GB of RAM

1 CPU, 2x 8GB (dual channel): 481 sec
2 CPU, 2x 8GB (single channel, 1 DIMM/CPU): 478 sec
2 CPU, 4x 8GB (dual channel, 2 DIMM/CPU): 377 sec
2 CPU, 8x 8GB (quad channel, 4 DIMM/CPU): 288 sec

So it seems the number of memory channels can have a very big impact.
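To put rough numbers on that, here is a quick sketch that turns the dual-CPU times above into speedups relative to the single-channel run. The times are copied from the list; the dictionary labels and the `speedup` helper are just illustrative names:

```python
# Times copied from the runs above (seconds, 2.5 billion digits, dual E5-2670).
times_sec = {
    "2 CPU, 1 DIMM/CPU (single channel)": 478,
    "2 CPU, 2 DIMM/CPU (dual channel)": 377,
    "2 CPU, 4 DIMM/CPU (quad channel)": 288,
}

# Use the single-channel dual-CPU run as the reference point.
baseline = times_sec["2 CPU, 1 DIMM/CPU (single channel)"]


def speedup(seconds, reference=baseline):
    """How many times faster a run is than the reference run."""
    return reference / seconds


for config, seconds in times_sec.items():
    print(f"{config}: {seconds} s, {speedup(seconds):.2f}x")
```

Going from one to four channels per CPU comes out at roughly 1.66x, so the gain is large but well short of the 4x increase in channel count.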
 

TLN

Active Member
Feb 26, 2016
523
84
28
34
So it seems the number of memory channels can have a very big impact.
It's not necessarily the channels, but memory bandwidth. In that specific application the computation itself is easy, so you need as much memory bandwidth as possible.

Also, desktop memory runs at higher speeds: for example, DDR4-4800 will give you more throughput than server DDR4-2133, even when comparing 1 CPU to 2 CPUs. Note that you cannot use an i7 in a dual-CPU configuration.
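A back-of-the-envelope sketch of theoretical peak bandwidth, assuming a standard 64-bit (8-byte) DDR4 channel, a dual-channel desktop platform and a quad-channel server socket. These are per-socket upper bounds, real throughput will be lower, and `peak_bandwidth_gbs` is just an illustrative name:

```python
BYTES_PER_TRANSFER = 8  # assumes a 64-bit wide DDR4 channel


def peak_bandwidth_gbs(mt_per_sec, channels):
    """Theoretical peak bandwidth in GB/s for one socket.

    mt_per_sec: transfer rate in MT/s (e.g. 2133 for DDR4-2133)
    channels:   number of populated memory channels
    """
    # MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return mt_per_sec * BYTES_PER_TRANSFER * channels / 1000


configs = [
    ("DDR4-2133, 1 channel", 2133, 1),
    ("DDR4-2133, 4 channels", 2133, 4),
    ("DDR4-4800, 2 channels", 4800, 2),
]

for label, rate, channels in configs:
    print(f"{label}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

On these assumptions a single DDR4-2133 channel tops out around 17.1 GB/s, four channels around 68.3 GB/s, and dual-channel DDR4-4800 around 76.8 GB/s. A dual-socket system has that per-socket figure on each CPU, but only for memory local to that CPU.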
 

wildpig1234

Well-Known Member
Aug 22, 2016
2,198
443
83
49
It seems like y-cruncher uses memory bandwidth differently than Cinebench. Cinebench scaled correctly even though I was only using a single-channel memory config with the dual 2686 v3, but y-cruncher did not scale correctly with single channel... y-cruncher needs lots of memory bandwidth.