IvyBridge Xeon E5-2697v2 experiences


Andreas

Member
Aug 21, 2012
127
1
18
Hadn't been here for a while.

A week ago I received my two E5-2697v2 CPUs (12c, 24t, 2.7 GHz, 130 W).
I thought there might be some interest in my experience with these.
When comparing to the SB Xeon, I am referring to the E5-2687W I got a year ago.

Packaging:
The 12-core IB Xeons are a bit larger than the E5-2687W chip. No issues in the Asus m7B; the socket is spacious enough to cope with this. I am not sure whether the narrow LGA-2011 sockets might have issues with these. I recommend checking this if you are going this route.

Power:
At idle, the IB Xeon consumes more power than the SB Xeon (about 20 W more, probably due to the larger core count).
Under higher load (Stanford folding), the newer chip consumes about 40 W less than my SB Xeons.

Performance:
On average, about 30-35% better than my SB Xeons. Of course the ratio varies with the instruction mix.

The best case I have observed was 70%. There are most likely two possible root causes for this large deviation:
1) Heavy dependence on single instructions which have been significantly improved
2) Superlinear acceleration through the larger L3 cache, which moves a bigger chunk of the working dataset up the memory hierarchy.
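
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes spec-sheet base clocks (8 cores at 3.1 GHz for the E5-2687W, which is not stated in this post) and perfect scaling across cores, and it ignores turbo and per-clock improvements, so it is only an estimate. The function name aggregate_ratio is made up for illustration.

```shell
#!/bin/bash
# Spec-sheet figures are assumptions, not measurements from this thread:
#   E5-2687W:  8 cores at 3.1 GHz base
#   E5-2697v2: 12 cores at 2.7 GHz base
aggregate_ratio() {
    # args: old_cores old_ghz new_cores new_ghz
    # prints (new_cores * new_ghz) / (old_cores * old_ghz) with two decimals
    awk -v oc="$1" -v og="$2" -v nc="$3" -v ng="$4" \
        'BEGIN { printf "%.2f\n", (nc * ng) / (oc * og) }'
}

aggregate_ratio 8 3.1 12 2.7
```

Cores times clock alone lands close to the 30-35% average, so a 70% gain would need something extra, such as the two causes listed above.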

Compatibility:
Works well with the Asus Z9PE-D16 (BIOS 5105).
Does not start with (my) Asus Z9PE-D8 (BIOS 5103), although the CPU is listed as compatible with this BIOS version (support call is open).

cheers,
Andy
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
Andy - great information! Any interest in running Ubuntu benchmarks? :) Would love to get those for the test set to compare with the other processors we have looked at.
 

Andreas

Patrick,
if they are ready to run, I am happy to do it. If I need to become a specialist first, my time budget might be a barrier.

Let me know,
Andy
 

Patrick

Andy - fairly easy:
1. Install Ubuntu 64-bit server with OpenSSH
2. Here are the commands to use:
Update/ Install required packages

sudo apt-get update && sudo apt-get upgrade -y && sudo apt-get install build-essential libx11-dev libglu-dev hardinfo crafty phoronix-test-suite -y

hardinfo
hardinfo | less | tee hardinfo.txt

Ubuntu UnixBench 5.1.3

wget https://byte-unixbench.googlecode.com/files/UnixBench5.1.3.tgz && tar -zxvf UnixBench5.1.3.tgz && cd UnixBench && time make && wget http://forums.servethehome.com/pjk/fix-limitation.patch && patch Run fix-limitation.patch && ./Run dhry2reg whetstone-double syscall pipe context1 spawn execl shell1 shell8 shell16 && cd ..

c-ray 1.1
wget http://www.futuretech.blinkenlights.nl/depot/c-ray-1.1.tar.gz && tar -zxvf c-ray-1.1.tar.gz && cd c-ray-1.1 && make && cat scene | ./c-ray-mt -t 32 -s 7500x3500 2>&1 > foo.ppm | tee c-ray1.txt && cat sphfract | ./c-ray-mt -t 32 -s 1920x1200 -r 8 2>&1 > foo.ppm | tee c-ray2.txt && cd ..

crafty
crafty bench

(you need to Ctrl+X or Ctrl+C out)
PTS Tests
phoronix-test-suite benchmark pts/stream pts/compress-7zip pts/openssl pts/pybench

Menu key sequence: y n n 5 n

Then it is pretty much just capturing the data points

I do need to get this fully scripted, but the above is basically a few commands to copy/paste into PuTTY.

Argh, the forums are parsing the links. If you want to e-mail me, I can send you the commands and a spreadsheet template to copy/paste numbers into. Basically it takes the copy/paste of 6 commands, so you do not need to be an expert.
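
Until a proper script exists, a small wrapper could make "capturing the data points" less manual. The following is only a sketch under assumptions: run_logged and timings.txt are hypothetical names, not part of the script discussed in this thread. It saves the combined stdout/stderr of a command in a log file named after the benchmark and appends the wall-clock time to timings.txt.

```shell
#!/bin/bash
# Hypothetical helper: run one command, keep its output in <name>.log,
# and append "<name> <seconds>s exit=<status>" to timings.txt.
run_logged() {
    rl_name="$1"; shift
    rl_start=$(date +%s)
    "$@" > "${rl_name}.log" 2>&1      # capture stdout and stderr together
    rl_status=$?
    rl_end=$(date +%s)
    cat "${rl_name}.log"              # show the captured output afterwards
    echo "${rl_name} $((rl_end - rl_start))s exit=${rl_status}" >> timings.txt
    return "$rl_status"
}

# Usage sketch (commented out so pasting this block starts no benchmark):
# run_logged unixbench ./Run dhry2reg whetstone-double
# run_logged pts phoronix-test-suite benchmark pts/stream pts/compress-7zip pts/openssl pts/pybench
```

The output only appears after each command finishes, which keeps the helper simple; interactive steps such as answering the PTS menu would still need to be done by hand.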
 

Patrick

BTW Andy - someone is scripting this over the weekend for us. By early next week it should be as easy as running a wget for the script and then executing it.
 

Patrick

Script delivered. Installing Ubuntu in a VM to give it a try.
 

Patrick

I'd love to see some cache latency benchmarks on these Ivy Bridge-EP chips (esp. on the 10 and 12 core dies) so we can understand the ring bus structure better. For example, if you have the time, LMBENCH (available on SourceForge.net) is a good test. Sandra and AIDA also have latency benchmarks for Windows (SiSoftware Zone).
Have a set of specific commands to download and run lmbench how you want? If so, let me know and I can add to the benchmarking script.
 

phoenix

New Member
Oct 2, 2013
3
0
0
Below is a script (I don't know how to attach files) that will write the results of lat_mem_rd to ./lmbench-3.0-a9/lat_mem_rd_core.txt
It works on my i7-4800MQ laptop with Fedora 19 and gcc 4.8.1, but I'm not sure what the core numbers will be on the 2697 system (especially with HT on). If interested, one could look at the efficiency of the memory controller with NUMA on and off from various cores, as in http://software.intel.com/en-us/forums/topic/333444 (e.g. numactl --cpunodebind=0 --membind=1), which also has comparisons for Westmere and Sandy Bridge-EP. The structure of the triple ring bus and 2 memory controllers might make for more weirdness than we've seen on past Intel CPUs.





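#!/bin/bash

# Download lmbench; the URL is quoted so the shell does not treat '&' as a background operator
wget -O lmbench-3.0-a9.tgz "http://downloads.sourceforge.net/pr....0-a9/&ts=1380742845&use_mirror=softlayer-dal"
sleep 10
tar -xvf lmbench-3.0-a9.tgz
cd lmbench-3.0-a9
cd src
make lmbench
sleep 10
cd ..
# Pin lat_mem_rd to one core at a time; its results arrive on stderr, hence 2>>
for core in 0 3 4 7 8 11
do
printf 'core %s:\n\n' "$core" >> lat_mem_rd_core.txt
taskset -c "$core" bin/x86_64-linux-gnu/lat_mem_rd -t 1024 2>> lat_mem_rd_core.txt
done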
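
For the NUMA comparison mentioned above, one possible approach is to generate a numactl command for every CPU-node/memory-node pair. This is just a sketch: numa_commands is a hypothetical helper, the node count of 2 is an assumption for a dual-socket board, and the function only prints the commands rather than running them.

```shell
#!/bin/bash
# Print (not run) one lat_mem_rd command per CPU-node/memory-node pair.
# $1 = number of NUMA nodes (assumed to be 2 on a dual-socket system).
numa_commands() {
    nc_nodes="$1"
    nc_cpu=0
    while [ "$nc_cpu" -lt "$nc_nodes" ]; do
        nc_mem=0
        while [ "$nc_mem" -lt "$nc_nodes" ]; do
            echo "numactl --cpunodebind=${nc_cpu} --membind=${nc_mem} bin/x86_64-linux-gnu/lat_mem_rd -t 1024"
            nc_mem=$((nc_mem + 1))
        done
        nc_cpu=$((nc_cpu + 1))
    done
}

numa_commands 2
```

Comparing the local pairs (0/0, 1/1) against the remote pairs (0/1, 1/0) should show the cost of crossing sockets. The printed lines can be pasted into a shell from inside the lmbench-3.0-a9 directory.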
 

Andreas

Patrick,
thanks for packaging it up.
I was travelling in the last few days (US, Portugal, Switzerland, Denmark) but am now back.
I'll report as soon as I have something to report. Please gimme some time, the inbox is quite full :)
Andy
 

Andreas

Patrick,
I started the script on 2 machines:

1) Dual E5-2697v2
2) Quad E5-4650

How long will this run approx? minutes or hours?

BTW, one quick observation.
I have a power meter connected to the Quad Xeon machine:
The computational density of the benchmark is very low (at least the part I watched). The board has an idle consumption of 135 W (at the wall).
With the multi-CPU version of the benchmarks (dhrystone, whetstone), the board consumes 150-155 W.
Pipe throughput = 160-170 W

For comparison: with FAH cores, the board is usually in the 610 W region. With Linpack = 650 W
 

Patrick

Disk tests are turned off which helps a lot.

My guess is under an hour on that hardware. On a Raspberry Pi it does take many hours though. There are some variables such as the amount of time it takes to download packages which can add a bit of time.

phoenix - will give that one a try. The core thing is a bit of a quandary since this is meant for 1 - 64+ core systems.
 

Andreas

As it churns along:
16x concurrent shell scripts = 300 W, 87% idle
64x dhrystone 2 using register = 512 W, 100% load
64x DP whetstone = 490 W, 100% load
64x system call = 470 W
64x pipe throughput = 570 W, 93% sys
64x pipe based context switches = 580 W, 93% sys
64x process creation = 400 W, 42% sys, 57% idle
64x execl throughput = 340 W, 22% sys, 76% idle
64x shell scripts (1 concurrent) = 390 W, 6% user, 29% sys, 65% idle
64x shell scripts (8 concurrent) = 390 W, 7% user, 30% sys, 63% idle
64x shell scripts (16 concurrent) = 390 W, 7% user, 31% sys, 62% idle
c-ray = 13183 milliseconds
(unable to open book file, book is disabled)
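
To put the wall-power readings in perspective, a quick sketch that subtracts the 135 idle figure reported earlier in the thread. It assumes both sets of readings come from the same Quad Xeon machine, and above_idle is a hypothetical helper name. Only three of the readings are used as examples.

```shell
#!/bin/bash
# Wall power above idle. Assumption: the 135 idle reading and the load
# readings were taken on the same machine with the same power meter.
idle=135

above_idle() {
    # $1 = wall power under load, in watts
    echo $(( $1 - idle ))
}

while read -r watts label; do
    echo "${label}: $(above_idle "$watts") above idle"
done <<'EOF'
512 dhrystone
490 whetstone
580 context-switch
EOF
```

The difference is a rough measure of how hard each test drives the CPUs, which fits the observation that the syscall-heavy tests leave much of the machine idle.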
 

Patrick

Not too bad at all! Love the pics of the old 4P. UnixBench is by FAR the longest component of the suite.
 

phoenix

Patrick - the core ID argument to taskset could be derived from the cores the system detects, for instance via /sys/devices/system/cpu/online.
If it would be easiest to just run it over all detected cores, the '-t' argument to lat_mem_rd could be lowered to 256 to reduce the runtime.
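
A minimal sketch of that idea: the kernel file holds a list such as 0-7,16-23, so it has to be expanded into individual core IDs first. expand_cpu_list is a hypothetical helper name, and the loop only prints the taskset commands instead of running them.

```shell
#!/bin/bash
# Expand a kernel CPU list such as "0-3,8,10-11" into "0 1 2 3 8 10 11".
expand_cpu_list() {
    printf '%s\n' "$1" | awk -F, '{
        out = ""
        for (k = 1; k <= NF; k++) {
            n = split($k, r, "-")            # "4-7" -> r[1]=4, r[2]=7
            last = (n > 1) ? r[2] : r[1]     # a single id is its own range
            for (i = r[1] + 0; i <= last + 0; i++)
                out = out (out == "" ? "" : " ") i
        }
        print out
    }'
}

online=/sys/devices/system/cpu/online
if [ -r "$online" ]; then
    # Print one pinned command per detected core
    for core in $(expand_cpu_list "$(cat "$online")"); do
        echo "taskset -c ${core} bin/x86_64-linux-gnu/lat_mem_rd -t 256"
    done
fi
```

With HT on, this covers every logical CPU, so on a large system it may be worth sampling only a few of the IDs.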

Thanks for including it
 