Extremely low RAM performance with dual E5-2620


istav555

Member
Apr 14, 2013
Hello everybody,

I'm getting very poor results with RAM-intensive workloads on a dual E5-2620 system I recently installed. Up to now I was using E3s and was very much satisfied. Here's what I get with MaxxMEM on the E5:

Memory-copy: 5352 MByte/s
Memory-Read: 9428 MByte/s
Memory-Write: 4085 MByte/s
Memory-Latency: 135.9 ns

Compare that to the E3:

Memory-copy: 17801 MByte/s
Memory-Read: 14450 MByte/s
Memory-Write: 14959 MByte/s
Memory-Latency: 91.4 ns

I also found another guy having a similar problem with the same hardware: NUMA on Xeon E5-2620 | Intel® Developer Zone

He did not find any solution for it whatsoever...

Any ideas? What should/could I try to fix this?!?
 

Patrick

Administrator
Staff member
Dec 21, 2010
Hi what is the rest of the configuration? Mind also running stream?
 

istav555

Member
Apr 14, 2013
Hi what is the rest of the configuration? Mind also running stream?
Never used stream before, tried it but it's Linux only I think? I get a message that I need Cygwin to run it...

The server board is a Supermicro X9DRW-iF and I have 96GB of RAM. The OS is Windows Server 2012. It might be the RAM config (8*8GB and 8*4GB modules), but I have used mixed modules with E3s too and never had such a problem... However, I'll try using 16*8 modules... It's just that I need remote hands for this and it will take a while...
 

istav555

Member
Apr 14, 2013
Is it a Windows machine? If so: http://www.cs.virginia.edu/stream/FTP/Contrib/StreamWin-32-64_distro.zip should be what you need.
Thanks, this worked! :)

Interesting, performance is greatly affected depending on number of threads used... The E5 performs better with 8 threads and more:

E5:

Code:
Single thread:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6013.8883       0.0283       0.0266       0.0347
Scale:       6012.9626       0.0283       0.0266       0.0351
Add:         7294.0095       0.0350       0.0329       0.0424
Triad:       7396.7923       0.0344       0.0324       0.0419
-------------------------------------------------------------

2 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6140.2067       0.0291       0.0261       0.0387
Scale:       6172.9551       0.0294       0.0259       0.0450
Add:         7078.1355       0.0364       0.0339       0.0430
Triad:       7042.7224       0.0375       0.0341       0.0466
-------------------------------------------------------------

4 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       12218.9513       0.0142       0.0131       0.0175
Scale:      12332.2684       0.0139       0.0130       0.0178
Add:        14130.8791       0.0182       0.0170       0.0229
Triad:      14017.6292       0.0185       0.0171       0.0288
-------------------------------------------------------------

8 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       22836.8664       0.0078       0.0070       0.0108
Scale:      23072.9238       0.0084       0.0069       0.0236
Add:        25714.4948       0.0103       0.0093       0.0117
Triad:      25632.9370       0.0102       0.0094       0.0118
-------------------------------------------------------------

16 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       32303.0470       0.0151       0.0050       0.0325
Scale:      32912.0256       0.0148       0.0049       0.0312
Add:        34950.0091       0.0183       0.0069       0.0409
Triad:      35067.6681       0.0194       0.0068       0.0317
-------------------------------------------------------------

32 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5565.1466       0.0570       0.0288       0.0954
Scale:       5330.1213       0.0641       0.0300       0.1830
Add:         7680.0118       0.0716       0.0312       0.2024
Triad:       9882.7670       0.0628       0.0243       0.1954
-------------------------------------------------------------

E3:

Code:
Single thread:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17462.3571       0.0094       0.0092       0.0107
Scale:      17583.8385       0.0093       0.0091       0.0104
Add:        17591.4371       0.0140       0.0136       0.0161
Triad:      17766.8141       0.0139       0.0135       0.0159
-------------------------------------------------------------

2 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17247.4070       0.0094       0.0093       0.0100
Scale:      17286.7264       0.0094       0.0093       0.0101
Add:        16709.7180       0.0145       0.0144       0.0154
Triad:      16685.5658       0.0145       0.0144       0.0152
-------------------------------------------------------------

4 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17416.9390       0.0095       0.0092       0.0110
Scale:      17389.3324       0.0094       0.0092       0.0110
Add:        17047.8238       0.0145       0.0141       0.0176
Triad:      17044.4427       0.0145       0.0141       0.0171
-------------------------------------------------------------

8 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16198.4319       0.0127       0.0099       0.0290
Scale:      16199.9586       0.0114       0.0099       0.0217
Add:        16190.2940       0.0170       0.0148       0.0312
Triad:      16198.0925       0.0175       0.0148       0.0312
-------------------------------------------------------------

16 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5160.0000       0.0648       0.0310       0.0937
Scale:       3424.1689       0.0709       0.0467       0.0938
Add:         7642.1105       0.0732       0.0314       0.1248
Triad:       5124.6147       0.0718       0.0468       0.1248
-------------------------------------------------------------

32 threads:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2568.0372       0.0961       0.0623       0.1343
Scale:       2563.8362       0.0948       0.0624       0.1278
Add:         3848.8546       0.0971       0.0624       0.1277
Triad:       4643.6282       0.0991       0.0517       0.1537
-------------------------------------------------------------

Are these results justified by the GHz difference between the two processors? (2GHz for the E5 compared to 3.3GHz for the E3)
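
One way to read the STREAM numbers above is to normalize each Triad rate to the single-thread result. The sketch below is only illustrative: the dictionaries hold the Triad rates copied from the runs above, and the helper names (`scaling`, `best`) are made up for this example.

```python
# Triad rates in MB/s, copied from the STREAM runs posted above,
# keyed by the number of threads used for the run.
E5_TRIAD = {1: 7396.7923, 2: 7042.7224, 4: 14017.6292,
            8: 25632.9370, 16: 35067.6681, 32: 9882.7670}
E3_TRIAD = {1: 17766.8141, 2: 16685.5658, 4: 17044.4427,
            8: 16198.0925, 16: 5124.6147, 32: 4643.6282}


def scaling(rates):
    """Return each rate divided by the single-thread rate."""
    base = rates[1]
    return {threads: rate / base for threads, rate in rates.items()}


def best(rates):
    """Return (thread_count, rate) for the highest measured rate."""
    threads = max(rates, key=rates.get)
    return threads, rates[threads]


for name, rates in (("E5", E5_TRIAD), ("E3", E3_TRIAD)):
    threads, rate = best(rates)
    print(f"{name}: best Triad {rate:.0f} MB/s at {threads} threads")
    for count, factor in scaling(rates).items():
        print(f"  {count:>2} threads: {factor:.2f}x single-thread")

# Single-thread E5 rate relative to the E3, next to the clock ratio.
print(f"E5/E3 single-thread Triad: {E5_TRIAD[1] / E3_TRIAD[1]:.2f}")
print(f"E5/E3 clock ratio:         {2.0 / 3.3:.2f}")
```

On these figures the E5 peaks at 16 threads, at roughly 4.7x its single-thread Triad rate, while the E3 is already at its best with one thread. The single-thread E5/E3 ratio comes out around 0.42, lower than the 2.0/3.3 clock ratio of about 0.61, so clock speed alone may not account for the gap.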
 

Patrick

Administrator
Staff member
Dec 21, 2010
The E5 gets quad channel memory but the E5-2620 is also limited to 1333MHz RAM. The E3-1200 V2 series can use 1600MHz DDR3. The 2x memory bandwidth performance you are seeing on the E5 still seems a bit low but you are at least 2x now.
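
As a rough sanity check on the channel and speed figures above, theoretical peak DDR3 bandwidth can be estimated as channels x transfer rate x 8 bytes per transfer (each channel is 64 bits wide). This is a minimal sketch, assuming nominal rates of 1333 and 1600 MT/s and ignoring every real-world overhead; the function name is hypothetical and the measured values are the best Triad rates from the STREAM runs earlier in the thread.

```python
def peak_mb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth of one socket in MB/s.

    Assumes a 64-bit (8-byte) data bus per channel and the nominal
    DDR3 transfer rate; real hardware will not reach this figure.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer


e5_socket = peak_mb_s(4, 1333)   # quad channel, DDR3-1333
e5_dual = 2 * e5_socket          # two sockets, each with its own memory
e3_1333 = peak_mb_s(2, 1333)     # dual channel, DDR3-1333 (as installed)
e3_1600 = peak_mb_s(2, 1600)     # dual channel, DDR3-1600 (platform maximum)

# Best Triad rates reported earlier in the thread (MB/s).
e5_best_triad = 35067.6681       # dual E5-2620, 16 threads
e3_best_triad = 17766.8141       # E3-1230 V2, 1 thread

print(f"E5 per socket: {e5_socket} MB/s, both sockets: {e5_dual} MB/s")
print(f"E3 at 1333:    {e3_1333} MB/s, at 1600: {e3_1600} MB/s")
print(f"E5 measured / theoretical: {e5_best_triad / e5_dual:.0%}")
print(f"E3 measured / theoretical: {e3_best_triad / e3_1333:.0%}")
```

Under these assumptions the E3 reaches a much larger share of its theoretical peak than the dual E5 does, which is consistent with the E5 result looking low even at its best thread count.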

Do you have full configs for both machines?
 

istav555

Member
Apr 14, 2013
Do you have full configs for both machines?
Yes of course:

Dual E5-2620
Supermicro X9DRW-iF
8x 8GB DDR3 SDRAM ECC Unbuffered 1333
8x 4GB DDR3 SDRAM ECC Unbuffered 1333

E3-1230 V2
Supermicro X9SCM-F
4x 8GB DDR3 SDRAM ECC Unbuffered 1333

About disks, at the moment I'm trying many things, pure-SSD, CacheCade, Intel 710, Samsung 840 Pro, Intel 520...

The E5 gets quad channel memory but the E5-2620 is also limited to 1333MHz RAM. The E3-1200 V2 series can use 1600MHz DDR3. The 2x memory bandwidth performance you are seeing on the E5 still seems a bit low but you are at least 2x now.
Same RAM for both servers (1333) so no 1600MHz for the E3 which makes the results even poorer for the E5 I guess? (I also verified with CPU-Z that both systems run at the same DRAM frequency)

I didn't quite get what you mean by "but you are at least 2x now"...

Something else, CPU-Z cannot read the RAM SPD data on the E5. Any idea about anything else I could use for that?
 

dba

Moderator
Feb 20, 2012
San Francisco Bay Area, California, USA
My work depends on maximizing memory bandwidth, so I've spent quite a bit of time on this topic.

To get maximum performance from your particular setup, use either eight or sixteen DIMMs (one or two per memory channel), spread them out over the right slots (see the manual), and use DIMMs with the same capacity, speed, number of banks, and timings. It's easiest to just use identical DIMMs. Each of these three rules makes a huge difference.

To test your maximum memory bandwidth, you can use STREAM or SiSoft Sandra. STREAM is something of a "standard", but unfortunately the Windows version is no good for CPUs with six or twelve cores, since the UI won't let you set the number of threads equal to the number of cores, which is required to get the best results. With your dual six-core CPUs, you'd test with 12 threads, which you cannot do using the Windows GUI, so Sandra is the better choice.

With eight identical DIMMs spread out over the right slots and tested with Sandra, you should see MUCH higher memory bandwidth. If not, which is very unlikely, then there really is a deeper issue.
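
The population rules above can be written down as a small check. This is only a sketch: the channel names, the `Dimm` type, and the way the 8GB and 4GB modules are paired are all assumptions for illustration (the actual slot assignment on the board is not known from this thread), and it compares only capacity and speed, not banks or timings.

```python
from typing import NamedTuple


class Dimm(NamedTuple):
    capacity_gb: int
    speed_mt_s: int


def population_issues(channels):
    """Return a list of problems with a channel -> [Dimm, ...] layout."""
    issues = []
    # Rule 1: every channel holds the same number of DIMMs, one or two.
    per_channel = {len(dimms) for dimms in channels.values()}
    if per_channel not in ({1}, {2}):
        issues.append("channels are not uniformly populated with one or two DIMMs")
    # Rule 2: every DIMM is the same kind (capacity and speed only here).
    kinds = {dimm for dimms in channels.values() for dimm in dimms}
    if len(kinds) > 1:
        listing = ", ".join(f"{d.capacity_gb}GB@{d.speed_mt_s}" for d in sorted(kinds))
        issues.append(f"mixed DIMM types: {listing}")
    return issues


# Hypothetical channel names: two sockets with four channels each.
CHANNELS = [f"CPU{cpu}-{ch}" for cpu in (1, 2) for ch in "ABCD"]

# Assumed layout for the 8x 8GB + 8x 4GB case: one of each per channel.
mixed = {name: [Dimm(8, 1333), Dimm(4, 1333)] for name in CHANNELS}
# Eight identical DIMMs, one per channel.
uniform = {name: [Dimm(8, 1333)] for name in CHANNELS}

print(population_issues(mixed))
print(population_issues(uniform))
```

The mixed layout is flagged for its two DIMM types, while the eight-identical-DIMM layout passes both checks.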

Try SIV (system information viewer) to read RAM details.
 

Lost-Benji

Member
Jan 21, 2013
The arse end of the planet
Retest with only the 4GB DIMMs and then repeat with only the 8GB DIMMs. Post the results. Also, give more detail on your hardware and its configuration; so far, you're being rather vague.
Also ensure that you have configured the BIOS correctly. That is the likely issue, and/or poorly placed DIMMs (not running quad-channel).

You're also comparing a dual-channel IMCH to a quad-channel IMCH.
ARK | Intel® Xeon® Processor E5-2620 (15M Cache, 2.00 GHz, 7.20 GT/s Intel® QPI)
ARK | Intel® Xeon® Processor E3-1230 v2 (8M Cache, 3.30 GHz)

Having quad-channel doesn't mean double the speed either; it's a known fact that the triple-channel IMCH was always faster than the quads.
DBA is on the money also.
 

istav555

Member
Apr 14, 2013
So much useful information here, thank you guys! :)

You are both right, I have to use identical DIMMs for this. However, I don't really have any options about which DIMM slots to use since it's a 16-slot M/B. I asked my provider to upgrade to 128GB of RAM using new, identical DIMMs. It will take a few days for this to be delivered, so I will update the thread once I have everything ready.

A couple of notes:

1. I'll use Sandra to measure RAM performance.
2. Thanks for the heads up for SIV! Amazing tool, thanks!
3. What else would you like to know about the H/W config? CPU, RAM, M/B are already available, do you need more details about RAM timings or something like that?
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Curious if you resolved your problem, and if so, what did you find? Was it the mismatched RAM? Was it the low CPU frequency?

I was literally typing up my thread when this popped up as a suggested read, spot on I might add.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
I was going to post about E5-2620 performance and ask whether a higher-frequency E5, like a 1650, would change RAM performance at all. If it does, I wonder how my slow RAM would compare to the 1866 stuff. Sounds like I may have some testing to do :D
 

istav555

Member
Apr 14, 2013
Well I never really got to going all-in with this one...

However, I did populate all RAM slots with the same 8GB DIMMs back then and still had the same poor results from the E5. At that point I stopped trying to find out what was wrong and just settled for lower RAM performance than I expected. Other priorities came up, so I just had no more time for this one...

But, I now have a dual E5 2620 with 256GB of RAM. I'll run some more tests!! :)
 