
GIGABYTE MS03-CE0 + Intel Xeon Emerald Rapids EMR-SP


RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany
Although for some reason the single core clock is the same as for the Sapphire Rapids Gold CPU and the multicore is like 20% more despite having twice the cores number. I thought we would see some IPC improvements on the Emerald Rapids.
An IPC improvement does not apply to all instructions. Cinebench R23 uses operations that were already well optimized.
It is a TDP thing: more cores mean lower all-core clocks. Core count beats clocks, but not by much.
Cinebench R23 is not memory sensitive; one, four, or eight DIMMs makes no difference.
 

JosefHrib

Active Member
Jul 25, 2023
126
114
43
39
Added results without the VMX hypervisor: Cinebench R23, Cinebench 2024, CPU-Z, and the AIDA64 Memory Benchmark, with 8 memory modules.
 

custom90gt

Active Member
Nov 17, 2016
325
114
43
41
I'll be doing this same upgrade when my Q2SR gets here. I'm super excited for this build. What is the bios mod that you needed to do?
 

JosefHrib

Active Member
Jul 25, 2023
126
114
43
39
I'll be doing this same upgrade when my Q2SR gets here. I'm super excited for this build. What is the bios mod that you needed to do?
@RolloZ170 helped me. Disable ACM. A second variant exists with ACM enabled, but it must be an older version.
EMR is a nice upgrade from SPR. Too bad it can't be run on ASUS W790 SAGE and ACE motherboards. But if you don't have a problem with C741 boards, this is an opportunity to get a very good CPU for little money.
 

mkgai

New Member
Apr 26, 2025
1
1
3
@RolloZ170 helped me. Disable ACM. A second variant exists with ACM enabled, but it must be an older version.
EMR is a nice upgrade from SPR. Too bad it can't be run on ASUS W790 SAGE and ACE motherboards. But if you don't have a problem with C741 boards, this is an opportunity to get a very good CPU for little money.
Do you happen to have a copy of those instructions to disable acm?

I have a Gigabyte MS03-CE0 and a Q2SR A0. I saw your contributions to this community and realized I am in the right place - thanks for the great info.
 

Andrix

New Member
Mar 15, 2025
10
9
3
What is the sustained all-core AVX512 frequency for Q2SR?

I am experimenting with QYFS + MS03-CE0, and it is running at 1.8 GHz in LINPACK benchmarks. The "Turbo Ratio Limits - AVX-512" from hwinfo screenshots is 23 for 56 cores, but apparently it is just the hard limit and not what is sustained. Anyways, I am trying to figure out how large the gain in performance would be if I replace QYFS by Q2SR. Has anyone had a look at the core clocks during an AVX512 workload?
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany
What is the sustained all-core AVX512 frequency for Q2SR?
2400 MHz
Code:
Turbo Ratio Limits - IA/SSE, Fused:    40x (1-32c), 34x (33-48c), 27x (49-58c), 26x (59-64c)
Turbo Ratio Limits - IA/SSE, Resolved:    40x (1-32c), 34x (33-48c), 27x (49-58c), 26x (59-64c)
Turbo Ratio Limits - AVX2, Fused:    38x (1-32c), 32x (33-48c), 26x (49-54c), 25x (55-64c)
Turbo Ratio Limits - AVX2, Resolved:    38x (1-32c), 32x (33-48c), 26x (49-54c), 25x (55-64c)
Turbo Ratio Limits - AVX-512, Fused:    35x (1-32c), 29x (33-48c), 25x (49-54c), 24x (55-64c)
Turbo Ratio Limits - AVX-512, Resolved:    35x (1-32c), 29x (33-48c), 25x (49-54c), 24x (55-64c)
Turbo Ratio Limits - TMUL, Fused:    35x (1-32c), 29x (33-48c), 23x (49-54c), 22x (55-64c)
Turbo Ratio Limits - TMUL, Resolved:    35x (1-32c), 29x (33-48c), 23x (49-54c), 22x (55-64c)
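
For anyone who wants to turn a HWiNFO line like the ones above into a clock figure, here is a minimal sketch. It assumes a 100 MHz base clock (BCLK), so a ratio of 24x corresponds to 2400 MHz; the helper names `parse_turbo_limits` and `turbo_limit_mhz` are made up for this example. The result is only the turbo ceiling for a given number of active cores, not a measured sustained clock.

```python
import re

BCLK_MHZ = 100  # assumption: 100 MHz base clock, so ratio * 100 = MHz


def parse_turbo_limits(line):
    """Parse 'NNx (A-Bc)' groups into (ratio, first_core, last_core) tuples."""
    return [(int(r), int(a), int(b))
            for r, a, b in re.findall(r"(\d+)x \((\d+)-(\d+)c\)", line)]


def turbo_limit_mhz(line, active_cores):
    """Return the turbo ceiling in MHz for the given number of active cores."""
    for ratio, first, last in parse_turbo_limits(line):
        if first <= active_cores <= last:
            return ratio * BCLK_MHZ
    raise ValueError(f"no turbo bin covers {active_cores} active cores")


# AVX-512 line for Q2SR, copied from the HWiNFO output above
q2sr_avx512 = ("Turbo Ratio Limits - AVX-512, Resolved:    "
               "35x (1-32c), 29x (33-48c), 25x (49-54c), 24x (55-64c)")

print(turbo_limit_mhz(q2sr_avx512, 64))  # all 64 cores active
```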
 

Andrix

New Member
Mar 15, 2025
10
9
3
2400 MHz
Code:
Turbo Ratio Limits - AVX-512, Fused:    35x (1-32c), 29x (33-48c), 25x (49-54c), 24x (55-64c)
Turbo Ratio Limits - AVX-512, Resolved:    35x (1-32c), 29x (33-48c), 25x (49-54c), 24x (55-64c)
Thanks, but have you actually verified that the core clock reaches 2400mhz in an AVX512 workload? It doesn't work this way for QYFS, at least not with the standard power limit settings. A hwinfo screenshot shows
Code:
Turbo Ratio Limits - AVX-512, Fused:    35x (1-28c), 29x (29-42c), 24x (43-50c), 23x (51-56c)
Turbo Ratio Limits - AVX-512, Resolved:    35x (1-28c), 29x (29-42c), 24x (43-50c), 23x (51-56c)
The turbo limit should be 2.3 GHz, whereas I see the following during a linpack benchmark run:
Code:
$ grep MHz /proc/cpuinfo | sort -n -k 4
cpu MHz        : 1739.738
cpu MHz        : 1797.958
cpu MHz        : 1797.960
...
cpu MHz        : 1798.842
cpu MHz        : 1799.059
cpu MHz        : 1799.105
cpu MHz        : 1800.179
cpu MHz        : 2573.705
cpu MHz        : 2800.000
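
A small sketch for summarizing such a snapshot instead of eyeballing the sorted list. It only parses the `cpu MHz` lines; the function names are hypothetical, and the sample below contains just the values visible in the listing above (the elided lines are left out). The median is less sensitive to the few cores that are not running the benchmark.

```python
import statistics


def cpu_mhz_values(cpuinfo_text):
    """Extract the per-core 'cpu MHz' readings from /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        if line.startswith("cpu MHz"):
            values.append(float(line.split(":", 1)[1]))
    return values


def summarize(values):
    """Return min, median, and max of the readings."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Only the readings shown in the post; the lines hidden behind '...' are omitted.
sample = """\
cpu MHz        : 1739.738
cpu MHz        : 1797.958
cpu MHz        : 1797.960
cpu MHz        : 1798.842
cpu MHz        : 1799.059
cpu MHz        : 1799.105
cpu MHz        : 1800.179
cpu MHz        : 2573.705
cpu MHz        : 2800.000
"""

print(summarize(cpu_mhz_values(sample)))

# On a live Linux system, the same functions can be fed the real file:
# with open("/proc/cpuinfo") as f:
#     print(summarize(cpu_mhz_values(f.read())))
```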
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany
The turbo limit should be 2.3 GHz, whereas I see the following during a linpack benchmark run:
The limit applies only if all cores run AVX512, which rarely happens. During a heavy AVX512 workload there is no headroom left for querying the core clock, though.
If I run AIDA benchmarks with heavy AVX512 I can't do anything else; HWiNFO doesn't respond. Querying the core clock goes through a mailbox, and if the code doesn't wait for completion (usually it doesn't) you get wrong values.
 

Andrix

New Member
Mar 15, 2025
10
9
3
As a follow-up to the above conversation, I got my hands on a Q2SR (many thanks to @RZSN for arranging access to his hardware for me) to run Intel-optimized benchmarks. I used the linpack and mp_linpack benchmarks, which are the shared-memory and distributed Intel implementations of the High-Performance LINPACK (HPL) benchmark. For those unfamiliar with HPL, it gives you an idea of the FP64 peak performance in compute-bound tasks.

Anyways, here is a summary for all-cores runs.
CPU           linpack (GFLOPS)   mp_linpack (GFLOPS)   AVX512 freq. (GHz)   Base freq. (GHz)
Q2SR (64c)    3800               4000                  1.9-2.2              1.7
QYFS (56c)    2740               3070                  1.8                  1.9
8480+ (56c)   2920               3120                  1.9                  2.0

My main takeaway is that Q2SR offers 30+% more peak performance than QYFS. I had been trying to guess the gain by comparing the turbo frequencies (too little information) and Cinebench data (abundant but not that informative for me, and suggesting only a 13% gain).
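
For reference, the relative gains behind the "30+%" figure can be worked out from the summary numbers with a few lines of Python. This is just arithmetic on the GFLOPS values reported in this post; the dictionary layout and the `gain_percent` helper are made up for the example.

```python
# GFLOPS values from the all-core summary in this post
results = {
    "Q2SR":  {"linpack": 3800, "mp_linpack": 4000},
    "QYFS":  {"linpack": 2740, "mp_linpack": 3070},
    "8480+": {"linpack": 2920, "mp_linpack": 3120},
}


def gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for bench in ("linpack", "mp_linpack"):
    gain = gain_percent(results["Q2SR"][bench], results["QYFS"][bench])
    print(f"{bench}: Q2SR vs QYFS = {gain:+.1f}%")
```

This gives roughly +38.7% for linpack and +30.3% for mp_linpack.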

Getting back to the question about all-core AVX512 frequencies, Q2SR runs at 2.1 GHz (occasionally 2.2 GHz) at the beginning of a longer workload and drops to 2.0 GHz (occasionally 1.9 GHz) under continuous load. The linpack numbers in the table were recorded towards the end of a 20-minute run. A linpack benchmark started after a short idle period would peak at 3940 GFLOPS. The performance degradation in longer runs could be related to power-limit settings (PL1 time set to 128 s as discussed in this thread) and temperatures, but I don't entirely understand how the power limits work. The CPU power consumption (the turbostat reading) initially reaches 380 W, stays there for a while, and then drops to 350 W. If anyone can explain this, your comment would be very welcome.

I can post more technical details (possibly in a separate thread?) if anyone is interested.
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany
The CPU power consumption (the turbostat reading) initially reaches 380W staying like this for a while and then reduces to 350W. If anyone can explain this, your comment would be very welcome.
The power limit for Xeons is strictly TDP. TDP can be exceeded for a limited time (max. 448 s).
With PL1 Time Window = 1 s you will have 380 W for one second.
 

Andrix

New Member
Mar 15, 2025
10
9
3
Well, here is what I saw:
[Attached image: linpack-pl1.png]
The workload consists of 4 smaller runs about 40s each. The PL1 time window is set to 128s, and I sketched what I believe was the first one. The average power was 350W (TDP). I assume the next window should start right after those 128s. But then those 380W should have lasted a bit longer in the second window. Does the temperature enter the conversation here or is it completely irrelevant?
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany
The workload consists of 4 smaller runs about 40s each. The PL1 time window is set to 128s, and I sketched what I believe was the first one. The average power was 350W (TDP). I assume the next window should start right after those 128s. But then those 380W should have lasted a bit longer in the second window. Does the temperature enter the conversation here or is it completely irrelevant?
The PL1 time window (e.g. 128 s) is not a one-shot budget. After some recovery (tracked by an internal calculation) the time can start again, fully or in part.
If temperature were the dominant factor, you could run at 380 W forever with very good cooling, but that is not the case.
Edit: the timer starts once the TDP limit is reached.
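
To illustrate the "recovery" idea, here is a toy model, not Intel's actual algorithm (which is not documented in this thread). It assumes the package tracks an exponentially weighted running average of power with the PL1 time window as time constant, allows 380 W while that average is below 350 W, and clamps to 350 W otherwise. The 380 W and 350 W figures are the turbostat readings reported above; the idle power of 60 W and all function names are made up for the example.

```python
import math


def simulate(load, pl1=350.0, pl2=380.0, tau=128.0, idle_w=60.0, dt=1.0):
    """Return package power per time step for a list of booleans (True = loaded).

    Toy model: an exponentially weighted running average of power is compared
    against PL1; the burst level PL2 is allowed only while the average is below PL1.
    """
    alpha = 1.0 - math.exp(-dt / tau)  # smoothing factor for one time step
    avg = idle_w                       # running average starts at idle power
    trace = []
    for busy in load:
        if not busy:
            power = idle_w
        elif avg < pl1:
            power = pl2                # budget available: burst above TDP
        else:
            power = pl1                # budget used up: clamp to TDP
        avg += alpha * (power - avg)
        trace.append(power)
    return trace


def burst_length(trace, pl2=380.0):
    """Count the leading time steps spent at the burst power level."""
    n = 0
    for power in trace:
        if power != pl2:
            break
        n += 1
    return n


# 400 s of load, 60 s of idle, then load again
trace = simulate([True] * 400 + [False] * 60 + [True] * 250)
first = burst_length(trace[:400])
second = burst_length(trace[460:])
print(first, second)
```

In this model the second burst is shorter than the first because a short idle period only partially lowers the running average, which matches the idea that only a fraction of the time budget comes back.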
 

sam55todd

Active Member
May 11, 2023
212
65
28
limit is only if all cores run AVX512 which is rarely happen
I'm not sure if it even has AVX512 for all x64 cores, as far as I know normally Intel CPUs (most Xeons) have two AVX512 execution units per CPU (hardly per tile/chiplet)

edit: specification for 8592+ clearly states:
# of AVX-512 FMA Units = 2
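
One way to sanity-check how that figure should be read is to compare a theoretical FP64 peak against the LINPACK results posted earlier in this thread. The sketch below assumes each 512-bit FMA unit processes 8 doubles per cycle and that one FMA counts as 2 FLOPs; the function name and parameters are made up for the example, and the 64 cores, 2.0 GHz, and 3800 GFLOPS are the Q2SR numbers reported above.

```python
def peak_fp64_gflops(cores, freq_ghz, fma_units_per_core=2, lanes=8):
    """Theoretical FP64 peak: cores * GHz * FMA units * lanes * 2 FLOPs per FMA."""
    return cores * freq_ghz * fma_units_per_core * lanes * 2


measured = 3800.0  # Q2SR linpack result (GFLOPS) reported earlier in the thread

# Reading 1: two FMA units in every one of the 64 cores
per_core = peak_fp64_gflops(cores=64, freq_ghz=2.0)

# Reading 2: two FMA units for the whole CPU (modelled as a single "core")
per_cpu = peak_fp64_gflops(cores=1, freq_ghz=2.0)

print(f"2 units per core: {per_core:.0f} GFLOPS peak, measured/peak = {measured / per_core:.2f}")
print(f"2 units per CPU:  {per_cpu:.0f} GFLOPS peak, measured/peak = {measured / per_cpu:.2f}")
```

Under these assumptions the measured result fits below the peak only with the per-core reading, which suggests the specification counts FMA units per core.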
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,864
2,823
113
germany