
ASUS Pro WS W790E-SAGE SE + Intel Xeon Sapphire Rapids SPR-SP


NO_ob

New Member
Oct 29, 2024
18
0
1
Does anyone here have all 8 memory channels filled? If so, would you be able to run llama-bench? My quad-channel results are below. I only got about 12-13 t/s with dual channel, so I think the number of memory channels affects performance a lot.



32 threads with half cores disabled

'/Documents/llama.cpp/build/bin/llama-bench' --numa distribute -t 32 -m '/Downloads/llama-2-7b.Q4_0.gguf' -p 0 -n 128,256,512

| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg128 | 21.92 ± 0.83 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg256 | 22.97 ± 0.13 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg512 | 22.47 ± 0.02 |

104 threads with all cores enabled

'/Documents/llama.cpp/build/bin/llama-bench' --numa distribute -t 104 -m '/Downloads/llama-2-7b.Q4_0.gguf' -p 0 -n 128,256,512

| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg128 | 24.22 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg256 | 23.95 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg512 | 23.33 ± 0.01 |
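
The two runs above differ only in the thread count, so they could probably be folded into a single sweep. Below is a minimal sketch that just builds the command line. It assumes llama-bench accepts comma-separated values for `-t` the same way it does for `-n` (worth confirming with `llama-bench --help` on your build), and the binary and model paths are shortened placeholders, not the ones from the post.

```python
import shlex

def bench_command(binary, model, threads, gen_lengths, numa="distribute"):
    """Build a llama-bench argv that sweeps several thread counts in one run.

    Assumption: -t takes a comma-separated list, like -n does in the post.
    """
    return [
        binary,
        "--numa", numa,
        "-t", ",".join(str(t) for t in threads),
        "-m", model,
        "-p", "0",  # skip prompt processing, as in the original commands
        "-n", ",".join(str(n) for n in gen_lengths),
    ]

# Placeholder paths; substitute your own build and model locations.
cmd = bench_command("./llama-bench", "llama-2-7b.Q4_0.gguf", [32, 104], [128, 256, 512])
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`.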
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,032
2,514
113
germany
The motherboard itself is new, the processor - QYK8 from Ebay, No
We have had defective new motherboards; rare, but it happens.
How do Chinese sellers (often) test a CPU?
They put it in a socket, turn on the motherboard, and see whether it gets into the BIOS = pass.
Try another CPU, e.g. QYFP/QYFQ (and not QYK8), to sort things out.
 

Max1024

New Member
Jun 23, 2024
13
1
3
I do not know how it was tested, but I will try to buy something cheaper. I have some options: 8461V ES QYFU at $103.42, 8468 ES QYFQ at $85.50, or 8468 ES QYFP at $81.09. Maybe it is better to choose the QYFU at $103.42?
 

sam55todd

Active Member
May 11, 2023
206
62
28
...I think either the processor QYK8 or the board...
90% CPU issues. Engineering Samples are expected to be full of bugs and aren't even officially supported by the MB/BIOS. ES in general is quite a pain and masochism. I prefer to avoid such a gamble whenever possible.

I have spare QYFQ which doesn't work on my SM MB and ebay seller never requested it to be sent back - just refunded order more than a year ago (I think it was for around $400 back then).
Can send it via ebay as "for parts" item to whoever is in UK for £10 to cover postage.
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,032
2,514
113
germany
90% CPU issues. Engineering Samples are expected to be full of bugs and aren't even officially supported by the MB/BIOS. ES in general is quite a pain and masochism. I prefer to avoid such a gamble whenever possible.
Very unhelpful comments. SPR D0 are working fine for many people here. A bad CPU can happen with any type.
e.g.
I had a Platinum 8176 (production unit, retail, no ES) that went fine into BIOS but gave a BSOD during OS load.
Defective core(s): enabling only 10 cores got me into the OS.
(edit: no core disable bitmap because Supermicro motherboard)
 

tms

New Member
Sep 25, 2024
10
9
3
Gigabyte MS33-AR0 + EMR Q2SR + 8 x DDR5 64 GB, 4800 MHz (Kingston Server Premier KSM48R40BD4TMM-64HMR).
BIOS R07 (tweak RolloZ170): Standard Performance, Virtual NUMA = Enable (4 nodes).

https_://github.com/ggerganov/llama.cpp/releases/download/b4663/llama-b4663-bin-win-cuda-cu12.4-x64.zip
build: c026ba3c (4663)
https_://huggingface.co/TheBloke/Llama-2-7B-GGUF

D:\llama_cpp>llama-bench.exe --numa distribute -t 32 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg128 | 29.15 ± 0.04 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg256 | 28.61 ± 0.08 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg512 | 27.66 ± 0.21 |

D:\llama_cpp>llama-bench.exe --numa distribute -t 104 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 104 | tg128 | 19.91 ± 0.38 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 104 | tg256 | 20.28 ± 0.07 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 104 | tg512 | 19.79 ± 0.04 |

D:\llama_cpp>llama-bench.exe --numa distribute -t 128 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 128 | tg128 | 18.42 ± 0.28 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 128 | tg256 | 18.47 ± 0.17 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 128 | tg512 | 17.82 ± 0.06 |

Clear Linux OS :
./llama-bench --numa distribute -t 32 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg128 | 33.71 ± 2.58 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg256 | 37.20 ± 0.85 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg512 | 37.35 ± 0.52 |

./llama-bench --numa distribute -t 104 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 104 | tg128 | 29.59 ± 2.32 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 104 | tg256 | 32.03 ± 0.57 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 104 | tg512 | 31.69 ± 0.08 |

./llama-bench --numa distribute -t 128 -m llama-2-7b.Q4_0.gguf -p 0 -n 128,256,512
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 128 | tg128 | 28.14 ± 2.36 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 128 | tg256 | 28.82 ± 1.15 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 128 | tg512 | 27.26 ± 0.47 |
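
For context on why channel count matters here: token generation is often treated as memory-bandwidth bound, so a rough ceiling is peak DRAM bandwidth divided by the model size. The sketch below works that out for the DDR5-4800 setup listed above. The 8-byte channel width and the idea that each generated token reads all weights from RAM exactly once are simplifying assumptions of mine, not something measured in this thread, and real results land well below the ceiling.

```python
# Rough upper bound on token generation speed from memory bandwidth alone.
# Assumptions (not from the thread): 64-bit (8-byte) data path per channel,
# and every generated token streams the full set of weights from RAM once.

GIB = 1024 ** 3

def peak_bandwidth_bytes(channels, mega_transfers, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in bytes per second."""
    return channels * mega_transfers * 1e6 * bytes_per_transfer

def tg_upper_bound(channels, mega_transfers, model_gib):
    """Tokens/s ceiling if generation were limited only by reading the weights."""
    return peak_bandwidth_bytes(channels, mega_transfers) / (model_gib * GIB)

if __name__ == "__main__":
    # 3.56 GiB is the model size reported by llama-bench in the tables above.
    for ch in (2, 4, 8):
        print(f"{ch} channels: <= {tg_upper_bound(ch, 4800, 3.56):.1f} t/s")
```

Under these assumptions the ceiling scales linearly with populated channels, which is at least consistent with the dual-to-quad jump reported earlier in the thread.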
 

NO_ob

New Member
Oct 29, 2024
18
0
1
So some improvement, but not as big as going from 2 to 4 channels. Thanks. It's also kind of interesting that beyond about 32 threads the speed doesn't increase even though all threads are used; I saw all of mine going to 100%.

Oh, I just saw the Clear Linux result. Also crazy that it almost doubles the performance with Linux. Maybe that's more in line with my 2-to-4-channel increase, so maybe memory channels affect Linux performance more than they do on Windows. My benchmarks are from Arch Linux.
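
To put numbers on those comparisons, here is a small sketch that pulls the t/s column out of the 32-thread tables posted in this thread and compares the averages. The regex and helper names are my own, and keep in mind the quad-channel and 8-channel results come from different CPUs, so the ratio is not a pure memory-channel comparison.

```python
import re
from statistics import mean

# Rows copied from the 32-thread runs posted in this thread.
ARCH_QUAD_32 = """
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg128 | 21.92 ± 0.83 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg256 | 22.97 ± 0.13 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg512 | 22.47 ± 0.02 |
"""
WINDOWS_8CH_32 = """
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg128 | 29.15 ± 0.04 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg256 | 28.61 ± 0.08 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CUDA,RPC | 99 | 32 | tg512 | 27.66 ± 0.21 |
"""
CLEAR_LINUX_8CH_32 = """
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg128 | 33.71 ± 2.58 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg256 | 37.20 ± 0.85 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | RPC | 99 | 32 | tg512 | 37.35 ± 0.52 |
"""

# Matches the last two cells of a row: "| tg128 | 29.15 ± 0.04 |".
ROW = re.compile(r"\|\s*((?:tg|pp)\d+)\s*\|\s*([\d.]+)\s*±\s*([\d.]+)\s*\|")

def parse_tps(table):
    """Map test name (e.g. 'tg128') to its mean tokens/s."""
    return {m.group(1): float(m.group(2)) for m in ROW.finditer(table)}

def mean_tps(table):
    """Average t/s across all tests in one table."""
    return mean(parse_tps(table).values())

linux_vs_windows = mean_tps(CLEAR_LINUX_8CH_32) / mean_tps(WINDOWS_8CH_32)
linux_8ch_vs_4ch = mean_tps(CLEAR_LINUX_8CH_32) / mean_tps(ARCH_QUAD_32)
print(f"Clear Linux vs Windows (same box, 32 threads): {linux_vs_windows:.2f}x")
print(f"Clear Linux 8ch vs Arch 4ch (different CPUs): {linux_8ch_vs_4ch:.2f}x")
```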
 
Mar 12, 2020
75
30
18
Okay, quick question. It never dawned on me to ask: what Windows version is everyone running? I am on Windows 10 22H2 without debloat (could probably do this but haven't gotten around to it). I thought I had found a reason to go to Windows 11 24H2, and that was the Android subsystem, but I see that is being removed March 5. I found Google Play for PC, and that works on 10.

Here's an update for those using a Xeon Platinum 8480+ QYFS paired with a Radeon RX 6900 XT and 128GB DDR5:
  • Have to restart once a week, sometimes once every two weeks. But other than that, the computer stays on 24/7.
  • Firefox has an issue with delayed playback of YouTube videos. Chrome plays YouTube videos just fine. Could just be a Google-hates-Firefox thing.
  • Some games are not smooth with high-motion movement. Example: in Genshin Impact, moving the mouse left to right creates artifacts. On a Ryzen 5950X movement is butter smooth.
Those are my only issues. Using a 56-core processor is overkill, but who is complaining? You can get the QYFS processor on eBay for less than $300. Has anyone moved to the W9-3475X or W9-3495X (socket 4677), or any other unlocked processor this motherboard can take? These two processors are on eBay for more than $3000, but on Taobao they can be had for $1000. Thanks in advance.
 
Mar 12, 2020
75
30
18
JosefHrib, even though your response hasn't posted yet (got an email), thanks for continuing us down the rabbit hole. May have to do some extra gig work to move on up. I found your newest thread. Nice!!!
 

JosefHrib

Active Member
Jul 25, 2023
124
108
43
39
JosefHrib, even though your response hasn't posted yet (got an email), thanks for continuing us down the rabbit hole. May have to do some extra gig work to move on up. I found your newest thread. Nice!!!
For example, the EMR Q2SR (64 cores, 320MB L3, 350W) costs roughly the same as the SPR QYFS (56 cores, 105MB L3, 350W). The only downside is that the user must swap the board from W790 to C741, or wait for GNR. I now live with a Q2SR and it is a good CPU.
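
As a quick illustration of that comparison, here is the L3-per-core arithmetic using only the core counts and cache sizes quoted above. The dictionary layout and function name are just for this sketch.

```python
# Core counts and L3 sizes as quoted in the post above.
cpus = {
    "EMR Q2SR": {"cores": 64, "l3_mb": 320},
    "SPR QYFS": {"cores": 56, "l3_mb": 105},
}

def l3_per_core(spec):
    """L3 cache in MB available per core."""
    return spec["l3_mb"] / spec["cores"]

for name, spec in cpus.items():
    print(f"{name}: {l3_per_core(spec):.3f} MB L3 per core")
```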
 
Mar 12, 2020
75
30
18
For example, the EMR Q2SR (64 cores, 320MB L3, 350W) costs roughly the same as the SPR QYFS (56 cores, 105MB L3, 350W). The only downside is that the user must swap the board from W790 to C741, or wait for GNR. I now live with a Q2SR and it is a good CPU.
Going to the Xeon 8592+, do I really have to go away from the W790?
 

SDletmk

Member
Dec 30, 2023
61
4
8
I just got a W790E-Sage SE, but I can't get it past Code 00. I've tried with both a W3-2423 CPU and a QYFS. I don't know the current BIOS version. Is there anything I might be doing wrong? I've checked power connectors, etc., and all the parts are known to work on another system.
 

RolloZ170

Well-Known Member
Apr 24, 2016
8,032
2,514
113
germany
I just got a W790E-Sage SE, but I can't get it past Code 00. I've tried with both a W3-2423 CPU and a QYFS. I don't know the current BIOS version
Doesn't matter; all versions support the W3-2423.
Maybe you have pictures for us?
Wrong 24-pin used (there are two), forgotten EPS 8-pin cables; test outside the case to rule out misplaced motherboard standoffs, etc.
I just got a W790E-Sage SE, but I can't get it past Code 00. I've tried with both a W3-2423 CPU and a QYFS. I don't know the current BIOS version. Is there anything I might be doing wrong? I've checked power connectors, etc., and all the parts are known to work on another system.
Is the CPU installed correctly? Tighten the 4 screws crosswise (only one turn per pass) until you feel the end of the thread; you have to be brave.
 

SDletmk

Member
Dec 30, 2023
61
4
8
Doesn't matter; all versions support the W3-2423.
Maybe you have pictures for us?
Wrong 24-pin used (there are two), forgotten EPS 8-pin cables; test outside the case to rule out misplaced motherboard standoffs, etc.

Is the CPU installed correctly? Tighten the 4 screws crosswise (only one turn per pass) until you feel the end of the thread; you have to be brave.
The CPU was installed crosswise as you've said; I took it out and put it back in. I took the EPS power in and out: the power LED lit up when it was out, and turned off when it went back in. There does appear to be some minor damage on the motherboard's corner near the CPU power.

I'm following the manual exactly. Motherboard is placed on one of those external case-plates with standoffs. Also, the room it's being tested in is somewhat cold (60F/16C), so I tested the LN2 jumper and it had no change with the jumper in its normal or LN2 positions. The cooler I'm using is a Noctua NH-U14S DX-3647 with the 4677 adapter kit, in case there are any known issues with how that cooler mounts to these boards.

Edit: None of the Q-LEDs are lit up and the BMC LED is blinking, yet the heat sink is cold (no power to CPU?) and the error code is still 00.
Edit 2: Attempted to flash earliest BIOS in case the system BIOS was corrupted. Bios seemed to flash successfully, but the system showed no change in response.
Edit 3: Attempted a different power supply (1500W, previous was 850W). Unfortunately, I don't have two PSUs with the same wattage for a dual PSU test.
 

tms

New Member
Sep 25, 2024
10
9
3
Xeon Platinum 8480+ QYFS - What is the xmrig hashrate shown by the processor if all channels on the motherboard are involved?
Intel Xeon Platinum 8480+ ES QYFR 56 Cores + 8 x DDR5 (4800 MHz).
29612.08 H/s / single thread: 528.79 H/s (Power policy = Standard performance, Virtual NUMA = Enable (4 nodes)).
32591.34 H/s / single thread: 614.93 H/s (Power policy = Best performance, Virtual NUMA = Enable (4 nodes)).
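
For anyone wondering how much the power policy is worth, this little sketch computes the percentage gain from the two hashrate lines above. The variable names are mine; the numbers are copied straight from the post.

```python
# Hashrates copied from the post above (H/s).
standard = {"total": 29612.08, "single": 528.79}  # Standard performance policy
best = {"total": 32591.34, "single": 614.93}      # Best performance policy

def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0

for key in ("total", "single"):
    print(f"{key}: +{gain_percent(standard[key], best[key]):.1f}%")
```

The single-thread figure moves more than the all-core total, which might be the all-core run hitting a power or memory limit first, though nothing in the post confirms that.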
 

Max1024

New Member
Jun 23, 2024
13
1
3
SDletmk
The BIOS version is on the sticker to the right of the socket.
Add 1 RAM stick according to the manual. Plug in the 24-pin cable + 2x 8-pin + the additional 8-pin (near the 24-pin). Start it and wait 4-5 minutes; it starts slowly the first time.