ES Xeon Discussion


Cythisia

Member
Jul 25, 2024
37
7
8
Does disabling cores slow down the QYFP's cache access, or reduce the total cache? Say a 56-core part is cut down to 28 cores: does it still have access to the full 105 MB of cache? And is there a BIOS option to stop accessing cache from another socket? Or is the cache only accessed on the socket whose cores are in use?
 

RolloZ170

Well-Known Member
Apr 24, 2016
7,184
2,240
113
And is there a BIOS option to stop accessing cache from another socket? Or is the cache only accessed on the socket whose cores are in use?
Why would you not want the data from the much faster L3?
If you change data in real RAM, any copy of that location held in some L3 cache must be declared invalid;
otherwise all other cores using that L3 cache would work with wrong (stale) data.
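
The invalidation rule described above can be illustrated with a toy model. This is only a sketch: the class `ToyCoherentSystem`, the one-cache-per-socket layout, and the write-through policy are simplifying assumptions for illustration, not a description of how Intel's actual coherence protocol is implemented.

```python
class ToyCoherentSystem:
    """Toy write-invalidate model: one RAM, one cache per socket (hypothetical)."""

    def __init__(self, sockets):
        self.ram = {}
        self.caches = {s: {} for s in range(sockets)}

    def read(self, socket, addr):
        cache = self.caches[socket]
        if addr not in cache:
            # Cache miss: fetch the current value from RAM.
            cache[addr] = self.ram.get(addr, 0)
        return cache[addr]

    def write(self, socket, addr, value):
        # Declare every other cached copy of this location invalid.
        for other, cache in self.caches.items():
            if other != socket:
                cache.pop(addr, None)
        self.caches[socket][addr] = value
        # Write-through to RAM, purely to keep the model simple.
        self.ram[addr] = value


system = ToyCoherentSystem(sockets=2)
system.write(0, 0x1000, 1)
first = system.read(1, 0x1000)   # socket 1 now caches the value 1
system.write(0, 0x1000, 2)       # socket 1's copy is invalidated
second = system.read(1, 0x1000)  # miss, re-fetched, so socket 1 sees 2
```

Without the invalidation loop in `write`, the second read on socket 1 would return the stale value 1, which is the "wrong (old) data" case.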
 

tms

New Member
Sep 25, 2024
7
6
3
Gigabyte MS33-AR0 + EMR Q2SR + 8 x DDR5 64 GB, 4800 MHz (Kingston Server Premier KSM48R40BD4TMM-64HMR).
BIOS R07 (tweaked by RolloZ170): Standard Performance, Virtual NUMA = Disable (2 nodes).

AlmaLinux 9.5, ClickHouse server version 25.1.1.3390 (official build).

echo "SELECT * FROM system.numbers LIMIT 100_000_000 OFFSET 100_000_000 SETTINGS max_threads = 128, max_insert_threads = 128" | ./clickhouse-benchmark --host=localhost --port=9000 -i 10

Attachment: ch_bench_2025_01_19.JPG
Number of processed queries.
QPS: how many queries the server performed per second.
RPS: how many rows the server reads per second.
MiB/s: how many mebibytes the server reads per second.
result RPS: how many rows the server placed into the query result per second.
result MiB/s: how many mebibytes the server placed into the query result per second.

Percentiles of query execution time.
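
As a rough illustration of how the throughput figures above relate to each other, here is a minimal sketch. The helper `throughput` and all input numbers are hypothetical (they are not taken from the attached screenshot); the only factual assumption is that `system.numbers` exposes a single UInt64 column, i.e. 8 bytes per row.

```python
def throughput(queries, elapsed_s, rows_per_query, bytes_per_row):
    """Derive QPS, RPS and MiB/s from raw counts (illustrative only)."""
    qps = queries / elapsed_s                      # queries per second
    rps = qps * rows_per_query                     # rows per second
    mib_s = rps * bytes_per_row / (1024 * 1024)    # mebibytes per second
    return qps, rps, mib_s


# Made-up run: 10 queries in 5 s, each handling 100_000_000 rows of 8 bytes.
qps, rps, mib_s = throughput(10, 5.0, 100_000_000, 8)
print(f"QPS={qps:.2f} RPS={rps:.0f} MiB/s={mib_s:.2f}")
```

The same arithmetic applies to the "result" variants; only the row and byte counts change from what the server reads to what it places into the result.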
 

sam55todd

Active Member
May 11, 2023
186
55
28
Impressive, but at the same time 100 million+ rows per second might not be anything groundbreaking.
I know this may not be strictly on topic for ES, but while it's hot:
1) Do you have any benchmark from a real database, ideally a highly granular single table (no joins) with billions of rows that fits in memory (so probably 400 GB+), preferably with numeric-only columns (something like key-key-[bigint]value, no strings), to check how it works with a ColumnStore index while avoiding storage bottlenecks? Just asking in case; something like this may actually be "too much" anyway.
2) Does the Q2SR have IAA accelerators (how many units does the CPU have?), and is ClickHouse configured to use them (e.g. via QPL) and AVX-512?
I'm not familiar with CH specifically (I'm mostly MS-focused), but AFAIK some DB back ends can "optimize" the execution plan when results don't need to be fetched back to the client, leading to crazy high scores (although your note says "placed by the server to the result", suggesting the benchmark takes care of this).
3) What's the "effective clock speed" of the CPU during the benchmark?
P.S. Pretty sure mainstream DBs aren't programmed yet to take advantage of AMX?
Thanks.
 

RolloZ170

2) Does the Q2SR have IAA accelerators (how many units does the CPU have?)
Q2SR has 4 IAX units.
PCI Address (Bus Device:Function) Number: 231:2:0
PCI Address (Bus Device:Function) Number: 236:2:0
PCI Address (Bus Device:Function) Number: 241:2:0
PCI Address (Bus Device:Function) Number: 246:2:0
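
To look these devices up with tools that print hexadecimal addresses (such as `lspci`), the bus numbers need converting. The sketch below assumes the values listed above are decimal and that the PCI domain is 0; both are assumptions, and the helper name `to_lspci` is made up for illustration.

```python
def to_lspci(addr, domain=0):
    """Convert a decimal 'bus:device:function' string to domain:bus:dev.func in hex."""
    bus, dev, func = (int(part) for part in addr.split(":"))
    return f"{domain:04x}:{bus:02x}:{dev:02x}.{func:x}"


addresses = ["231:2:0", "236:2:0", "241:2:0", "246:2:0"]
converted = [to_lspci(a) for a in addresses]
print(converted)
# ['0000:e7:02.0', '0000:ec:02.0', '0000:f1:02.0', '0000:f6:02.0']
```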
 

RolloZ170

RolloZ170, does an EMR-SP QS processor show up as "Genuine Intel 0000" in tests, or like a production CPU?
A QS is a production CPU but with the ES bit set. You see the full brand string, not "0000".
But some sellers state they sell QS, yet deliver "0000" ES ones.
Sellers of QS show production units in their listings; strange new worlds.
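
A delivered chip can be checked against the brand-string distinction described above. The following is a minimal sketch, assuming a Linux-style `/proc/cpuinfo` text with `model name` lines; the helper names and both sample brand strings are illustrative assumptions, not output captured from the CPUs discussed here. Note that, per the post above, a QS reports a full brand string, so this heuristic cannot tell a QS from a retail part.

```python
def model_names(cpuinfo_text):
    """Collect the distinct 'model name' values from /proc/cpuinfo-style text."""
    names = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            names.add(value.strip())
    return names


def looks_like_0000_es(brand):
    """Heuristic only: early ES parts report '0000' instead of a model number."""
    return "0000" in brand


# Illustrative samples; on a real Linux host, read open("/proc/cpuinfo").read().
ES_SAMPLE = "processor\t: 0\nmodel name\t: Genuine Intel(R) CPU 0000%@\n"
FULL_SAMPLE = "processor\t: 0\nmodel name\t: INTEL(R) XEON(R) PLATINUM 8592+\n"

es_flags = [looks_like_0000_es(b) for b in model_names(ES_SAMPLE)]
full_flags = [looks_like_0000_es(b) for b in model_names(FULL_SAMPLE)]
```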
 

Aleksei P

New Member
Jun 7, 2022
8
3
3
Thanks a lot! You predicted my problem from my first question. :)