How?: Single-threaded -> CPU1 High clock speed. Multi-threaded -> CPU2 high core count


fatherboard

Member
Jun 15, 2025
This could be a big ask; I don't expect something already exists, but I'm trying:

Is it possible to somehow have 2 CPUs:
- CPU 1: high clock speed, exclusively used for single-threaded operations, e.g. ≥ 4.7 GHz, or even overclocked to 5.4 (I know, that's TR5)
- CPU 2: high core count, exclusively used for multi-threaded operations, e.g. 192 cores (I know, that's SP5)

AND to have the single-threaded operations automatically (somehow) picked up and processed by the high-clock-speed CPU 1, and the multi-threaded operations automatically (somehow) picked up by the high-core-count CPU 2?

For clear-cut applications where all processing is single-threaded or multi-threaded, I can use separate machines; I already have that separation.
But some applications mix single- and multi-threaded operations, and it's not known which is which. For these applications it's just frustrating to have to slow everything down to 1 thread at 2.25 GHz, and not even know about it. In a month of unattended processing, this split could save significant time.

If you have come across anything like this, I'd appreciate it, including the automated part that assigns/reroutes work to the right CPU.

P.S. please consider using the ignore button on my profile.
 

i386

Well-Known Member
Mar 18, 2016
Germany
Automatically? No.
Manually? Yes. In Windows, for example, fire up Task Manager, go to Details, choose Set affinity, and pick the cores the software should run on.

fatherboard said: "and it's not known which is single and which is multi threaded"

This is what you think about and analyze before even considering buying hardware...
 

CyklonDX

Well-Known Member
Nov 8, 2022
For Windows, an automatic solution would be Process Lasso; you can pre-define CPU/core/thread affinities for processes (it will remember those settings). It also does a really good job out of the box managing workflow on CPUs that have more than 8 cores. *I think it's a must for Windows systems today.
*It can also help you identify which cores are best, i.e. provide the highest performance, cache locality, frequency, etc.

On Linux it's a bit harder, but also doable. *If you need that, let me know and I can point you to a how-to (at least on RHEL or Slackware).
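For the Linux side, a minimal sketch of what such a how-to boils down to (the application path is a placeholder; taskset ships with util-linux, numactl is a separate package):

```shell
#!/bin/bash
# Launch a program pinned to cores 0-3 (and, with numactl, memory on node 0).
# /opt/myapp/run is a hypothetical path.
#
#   taskset -c 0-3 /opt/myapp/run                            # CPU pinning only
#   numactl --physcpubind=0-3 --membind=0 /opt/myapp/run     # CPU + memory locality

# Re-pin an already-running process by PID:
pin_running() {
    local pid=$1 cpus=$2
    taskset -pc "$cpus" "$pid"
}
```

Note that re-pinning a running process moves only its CPU affinity; memory pages it has already allocated stay on their original NUMA node.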
 

twin_savage

Active Member
Jan 26, 2018
CyklonDX said: "For Windows, an automatic solution would be Process Lasso; you can pre-define CPU/core/thread affinities for processes (it will remember those settings). It also does a really good job out of the box managing workflow on CPUs that have more than 8 cores. *I think it's a must for Windows systems today.
*It can also help you identify which cores are best, i.e. provide the highest performance, cache locality, frequency, etc.

On Linux it's a bit harder, but also doable. *If you need that, let me know and I can point you to a how-to (at least on RHEL or Slackware)."
This might be slightly OT, but I tried using Process Lasso to control which processor and memory were used for a workload on some of the high-bandwidth-memory Xeons, and it was not able to control process memory locality, which completely defeats the purpose of being able to set core affinity.
Using Linux was the only option to actually control exactly which CPU cores and which memory were used for execution, via numactl. This is more of a problem for NUMA systems; however, more and more systems are becoming NUMA as time goes on.
This also means Process Lasso won't be able to deal with CXL Type 3 devices.
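As a side note, you can inspect which cores and memory belong to each NUMA node straight from sysfs, even without numactl installed (a sketch; `numactl --hardware` shows the same information more nicely):

```shell
#!/bin/bash
# Print each NUMA node with its CPU list and total memory, read from sysfs.
list_numa_nodes() {
    local node cpus mem
    for node in /sys/devices/system/node/node[0-9]*; do
        [ -d "$node" ] || continue
        cpus=$(cat "$node/cpulist")
        mem=$(awk '/MemTotal/ {print $4, $5}' "$node/meminfo")
        echo "$(basename "$node"): cpus $cpus, mem $mem"
    done
}
list_numa_nodes
```

Even a single-socket (UMA) machine reports a node0 here, so the script is safe to run anywhere.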
 

CyklonDX

Well-Known Member
Nov 8, 2022
twin_savage said: "This might be slightly OT, but I tried using Process Lasso to control which processor and memory were used for a workload on some of the high-bandwidth-memory Xeons, and it was not able to control process memory locality, which completely defeats the purpose of being able to set core affinity.
Using Linux was the only option to actually control exactly which CPU cores and which memory were used for execution, via numactl. This is more of a problem for NUMA systems; however, more and more systems are becoming NUMA as time goes on.
This also means Process Lasso won't be able to deal with CXL Type 3 devices."
The application needs to start with NUMA affinity predefined so it gets assigned to the desired memory channels. (The same is true on Linux.)
~> Start it with "start /affinity X application.exe", then you can let Process Lasso take over for thread pinning.
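The X in `start /affinity X` is a hex bitmask of allowed cores, which is easy to get wrong by hand. A small helper to compute it from a core list (shown in bash for brevity; the same arithmetic applies when building the mask for the Windows command):

```shell
#!/bin/bash
# Build the hex affinity mask expected by `start /affinity` from core numbers.
mask_from_cores() {
    local mask=0 c
    for c in "$@"; do
        mask=$(( mask | (1 << c) ))   # set one bit per core
    done
    printf '%X\n' "$mask"
}

# cores 0-3 -> F, cores 4-7 -> F0, so for example:
#   start /affinity F0 application.exe
```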
 

MBastian

Active Member
Jul 17, 2016
Germany
Mixing Epyc CPUs on a dual board is a thing? Like one 9575F for speed and one 9965 for cores? Without digging further into it, I'd think it's probably only worth it if the workloads can be divided per application, not per thread inside an application. Personally, I'm happy not to deal with inter-CPU / NUMA communication anymore and stay with single-socket systems, VM nodes being an exception.
 

fatherboard

Member
Jun 15, 2025
i386 said: "This is what you think about and analyze before even consider buying hardware..."
Obviously, that analysis was done a loooooong time ago, long before the idea of starting anything at all. A list of applications was made, both business-critical and non-critical, and the decision was made long ago to have 3 types of machines:
  1. CPU single-threaded: there is a reason a high-clock-speed 5.4 GHz Threadripper Pro joined the party: to handle the single-threaded applications.
  2. CPU multi-threaded: the other machines will be purely high-core-count machines for multi-threaded applications, including the main app.
  3. GPU: some apps fancy the GPU; these will get a separate machine.
This question now attempts to go a step further: is there a 4th type of machine or setup to address mostly one app for which it is impossible to find out which part is single-threaded and which is multi-threaded without opening the code? It's a known unknown, impossible to know. So impossible that even the app owners themselves can't tell, because it's been developed over an extremely long time. Worse, there are contradicting statements: some say mostly single, others say mostly multi. And to push it further into the dark, third-party code is added which is also unknown. This is where my question comes in: instead of launching the hundreds of processes one by one manually and watching the CPU, I wondered if there were ways to split the workload automatically inside the same app.
 

fatherboard

Member
Jun 15, 2025
MBastian said: "Mixing Epyc CPUs on a dual board is a thing? Like one 9575F for speed one 9965 for cores? Without digging further into it I'd think it's probably only worth it if the workloads can be divided per application, not per thread inside an application."
For me it's only worth it if I can split processes inside an app, because per-app I've already solved it with different machines.
If there is any way whatsoever, I'm happy to test it.
I'm slowly starting to realise this might be impossible, since I see the whole app as one process in the performance monitor. I can set affinity, but that limits the whole app to the chosen CPUs, not just the single-threaded tasks.
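One partial escape from the whole-app limitation: on Linux, affinity is actually per thread, and each thread shows up as a TID under /proc/<pid>/task. A sketch that pins every thread of a process, which you could refine to pin only the busiest TIDs (the core list is a placeholder):

```shell
#!/bin/bash
# Pin every thread (TID) of a process to the given CPU list.
pin_threads() {
    local pid=$1 cpus=$2 tid
    for tid in /proc/"$pid"/task/*; do
        taskset -pc "$cpus" "$(basename "$tid")" > /dev/null
    done
}

# Example: pin all threads of the current shell to core 0.
# pin_threads $$ 0
```

Deciding which TIDs are the single-threaded hot path still requires observing per-thread CPU time (e.g. with `top -H`), which is the part no tool can guess for you.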
 

CyklonDX

Well-Known Member
Nov 8, 2022
First, the easiest solution is to have your application(s) run in Docker.

You can use taskset to change the pinning of an existing process, but that won't move/enforce memory locality if the process was initially started on a different NUMA node; to overcome that, you need to change the start parameters of your application using numactl, with --physcpubind=X --membind=X /application-start. You can change your service files to use that command instead of the default one. *You can also add CPUAffinity to the service.
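For the service-file route, a sketch of what a systemd drop-in might look like (unit name, paths, and node numbers are hypothetical; `CPUAffinity=` is the native systemd setting, while `numactl` adds the memory binding):

```ini
# /etc/systemd/system/myapp.service.d/override.conf (hypothetical unit)
[Service]
CPUAffinity=0-3
ExecStart=
ExecStart=/usr/bin/numactl --physcpubind=0-3 --membind=0 /opt/myapp/run
```

The empty `ExecStart=` line clears the original command before the override replaces it, which is the standard drop-in idiom.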

Or you can write your own balancing tool that periodically scrapes the system and pins processes to the cores you want. *(This is just an example:)

Code:
#!/bin/bash

# Configuration
CHECK_INTERVAL=60        # Interval in seconds between checks
CPU_THRESHOLD=80         # CPU usage threshold percentage
BIND_CPUS="1-3"          # CPUs to bind processes to (e.g., "1-3")
PROCESSES_TO_MONITOR=("your_process_1" "your_process_2")  # List of processes to monitor

# Function to get CPU usage
# Function to get current overall CPU usage (100 minus the idle percentage)
get_cpu_usage() {
    top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}'
}

# Function to bind a process to specific CPUs
bind_process() {
    local pid=$1
    taskset -p -c "$BIND_CPUS" "$pid"
    echo "Bound process $pid to CPUs $BIND_CPUS"
}

# Main loop
while true; do
    # Get current CPU usage
    cpu_usage=$(get_cpu_usage)
    cpu_usage=${cpu_usage%.*}  # Remove decimal part

    echo "Current CPU usage: $cpu_usage%"

    if [ "$cpu_usage" -ge "$CPU_THRESHOLD" ]; then
        echo "CPU usage threshold exceeded. Binding processes to CPUs $BIND_CPUS."

        for process in "${PROCESSES_TO_MONITOR[@]}"; do
            pids=$(pgrep -f "$process")
            if [ -n "$pids" ]; then
                # pgrep may return several PIDs; bind each one
                for pid in $pids; do
                    bind_process "$pid"
                done
            else
                echo "Process $process not found."
            fi
        done
    else
        echo "CPU usage is within limits. No action needed."
    fi

    # Wait for the next check interval
    sleep $CHECK_INTERVAL
done