Remap CPU to RAM buses for increased performance


fatherboard

Member
Jun 15, 2025
173
10
18
Have you ever actually remapped the CPU to RAM buses in order to increase performance? How is it done, and how do you do it in a way that really increases performance? In other words, how do I avoid messing up a CPU-to-RAM bus remap that is supposed to make things faster?

Search results mention:

Reasons for Remapping:
Performance: Relocating frequently accessed data or code to faster memory areas can improve performance.
Overcoming Limitations: Remapping can allow systems to access more memory than would otherwise be addressable, especially in 32-bit systems.
I understand it can either be done with Memory Management Unit (MMU) for modern CPUs, and some systems use external logic circuitry.
How do I even know which areas of the RAM are faster? How do I relocate frequently used data to those areas?

Why am I asking? Because I'm clueless about how to do it, and I want to avoid what the search results also mention:
However, excessive remapping can also introduce overhead if not implemented carefully, potentially negating any performance gains.
I'm not (yet) interested in the security purpose of remapping (obscuring RAM addresses to hide them from malicious code); for now I'm only interested in how to increase performance.
 

RolloZ170

Well-Known Member
Apr 24, 2016
9,922
3,188
113
germany
I understand it can either be done with Memory Management Unit (MMU) for modern CPUs, and some systems use external logic circuitry.
I think you are mixing up the MMU with the IMC (integrated memory controller).
MMU = separation between process memory and main memory, plus memory protection.

Most IMCs can drive memory with channel interleaving (use a search engine). This can make sequential memory bandwidth faster but can also result in higher latency, so it depends on whether latency or sequential bandwidth matters more for your workload.
Don't forget the L3 (LLC).
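
To make the channel interleaving idea concrete, here is a minimal Python sketch. It is a toy model only: the four channels, the 256-byte granule, and the function names are assumptions for illustration, and real memory controllers usually hash several address bits rather than using a plain modulo.

```python
# Toy model of memory-channel interleaving (illustrative assumptions only:
# 4 channels, 256-byte granule; real controllers hash address bits).

def channel_for(addr: int, n_channels: int = 4, granule: int = 256) -> int:
    """Return the channel serving a physical address under simple interleaving."""
    return (addr // granule) % n_channels


def channels_touched(start: int, length: int,
                     n_channels: int = 4, granule: int = 256) -> set:
    """Return the set of channels hit by a sequential read of `length` bytes."""
    first = start // granule
    last = (start + length - 1) // granule
    return {g % n_channels for g in range(first, last + 1)}


if __name__ == "__main__":
    # A 4 KiB sequential read is spread over every channel when interleaved...
    print(channels_touched(0, 4096))
    # ...but lands on a single channel when there is no interleaving.
    print(channels_touched(0, 4096, n_channels=1))
```

In this model a long sequential read pulls from all channels at once, which is where the bandwidth gain comes from, while a single small access is still served by exactly one channel.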
 

RolloZ170

Well-Known Member
Apr 24, 2016
9,922
3,188
113
germany
E.g. if the cores in tile A have code and/or data in tile D,
I guess performance can be improved if the data and code are located in tile A too.
This should be organized by the OS, but it only works with apps that do not use the whole memory.
[Attached image: SPR mem ctrl x.jpg]
 

fatherboard

Member
Jun 15, 2025
173
10
18
I'm aware that at some point I may have asked, or will ask, a stupid question, but this is definitely not one of them. I don't come up with these things myself, I'm too unsavvy for that. I heard a guy (who I believe wasn't supposed to say it) say that their CTO remapped CPU to RAM buses, and that this CTO got it as a recommendation from a top-tier software company in their industry. Sure enough, after doing it they delivered astonishing results remarkably fast, so good that it got them nominated for an industry prize.

So that got me thinking and searching, and I got the Google AI search summary below, which seems to know exactly what I'm talking about. It just doesn't say how:
REMAPPING CPU TO RAM BUSES

Memory remapping, in the context of CPUs and RAM, refers to the process of changing how the CPU accesses physical memory locations. This can involve relocating the memory address space or altering how the CPU interprets addresses to access different memory regions, potentially improving performance or security.

Here's a breakdown:

1. What is Memory Remapping?
  • Address Mapping: CPUs use a system of addresses to locate and access data stored in RAM. Memory remapping changes this mapping, so the CPU might access a specific physical memory location using a different address than it originally would.
  • Reasons for Remapping:
    • Performance: Relocating frequently accessed data or code to faster memory areas can improve performance.
    • Security: Remapping can be used to obscure memory addresses, making it harder for malicious software to find and exploit sensitive data.
    • Boot Process: In some systems, remapping is used during the boot process to relocate the initial boot code and stack from ROM to RAM.
    • Overcoming Limitations: Remapping can allow systems to access more memory than would otherwise be addressable, especially in 32-bit systems.
  • How it Works:
    • Memory Management Unit (MMU): Modern CPUs use an MMU to handle memory remapping. The MMU translates logical addresses (used by the CPU) to physical addresses (used by the RAM).
    • External Logic: In some systems, external logic circuitry can also be used to remap memory addresses.
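
As a rough illustration of the MMU bullet above, the following Python sketch models address translation with a single-level page table. It is a simplification under stated assumptions: the dictionary-based table, the `translate` helper, and the frame numbers are hypothetical, and real MMUs walk multi-level tables in hardware and cache the results in a TLB.

```python
PAGE_SIZE = 4096  # 4 KiB, the common base page size on x86-64


def translate(vaddr: int, page_table: dict) -> int:
    """Translate a virtual address via a {virtual page: physical frame} map."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    # Hypothetical mapping: virtual page 0 -> frame 7, virtual page 1 -> frame 2.
    table = {0: 7, 1: 2}
    print(hex(translate(0x0010, table)))  # 0x7010

    # "Remapping": the same virtual address now points at a different frame.
    table[0] = 3
    print(hex(translate(0x0010, table)))  # 0x3010
```

The remap changes which physical frame backs a virtual page. It does not by itself make any frame faster, which is why the physical placement of that frame is the part that matters for performance.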

2. Examples of Remapping:
  • Boot Process: At power-up, a system might load the initial boot code from ROM located at address 0. After the CPU initializes, the boot code and stack can be remapped to RAM, which is often faster, allowing for faster execution of the operating system or application.
  • Address Space Expansion: In some cases, memory remapping can allow a 32-bit system to access more than 4GB of RAM by remapping the address space.
  • Security Schemes: Some security schemes involve shuffling memory blocks within the RAM to make it harder for attackers to track data accesses.

3. Key Concepts:
  • System Bus: The main pathway for communication between the CPU and memory (RAM).
  • Data Bus: The part of the system bus that carries the actual data being transferred.
  • Address Bus: The part of the system bus that carries the memory addresses.
  • Front Side Bus (FSB): A specific type of system bus that connects the CPU to the main memory (RAM).
  • Memory Controller: A component that manages the flow of data between the CPU and RAM.

4. Remapping and Performance:
  • By strategically mapping frequently accessed data to faster memory locations, remapping can reduce the time it takes for the CPU to retrieve data, improving overall system performance.
  • However, excessive remapping can also introduce overhead if not implemented carefully, potentially negating any performance gains.
And because Google is spying on me, after I posted this question it added this to the AI search results:
In the context of your question: You're asking about remapping the CPU's access to RAM to improve performance, specifically how to do it without negatively impacting performance. You also mentioned that you're not interested in the security aspects.

To achieve performance gains through memory remapping, you need to:

  1. Identify frequently accessed data/code:
    Determine which parts of your application or system utilize the most processing time.
  2. Locate faster memory:
    Identify areas within your RAM that offer faster access speeds. This could be based on the memory controller's architecture or other factors.
  3. Remap using the appropriate method:
    Depending on your system, use the MMU or external logic to remap the identified data/code to the faster memory regions.
  4. Test and fine-tune:
    Thoroughly test your system after remapping to ensure performance improvements and identify any potential issues or overhead.
I would really like to know how.
 

mattventura

Well-Known Member
Nov 9, 2022
767
427
63
Maybe I'm misunderstanding, but isn't this effectively what NUMA-per-socket (NPS) does? Internally, a server CPU with multiple memory controllers like the one above works more or less like multiple CPU sockets, but with lower latency between them. So you can tell the motherboard to present the CPU as multiple NUMA nodes, which can accelerate some applications. The flip side is that it can make other workloads slower, because many workloads prefer memory bandwidth (i.e. spreading the data across all of the IMCs) over the reduced latency of keeping it all within one NPS node.
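
For the "which RAM is faster" question, a Linux system exposes the NUMA layout through sysfs, and a small Python sketch can read it. This assumes a Linux host with `/sys/devices/system/node` present; the helper names `parse_cpulist` and `numa_topology` are made up for this example.

```python
import glob
import os
import re


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:  # memory-only nodes have an empty cpulist
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict:
    """Return {node id: (cpus, distances)}; empty if sysfs has no NUMA info."""
    topo = {}
    for path in sorted(glob.glob(os.path.join(root, "node[0-9]*"))):
        node = int(re.search(r"(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as f:
            cpus = parse_cpulist(f.read())
        with open(os.path.join(path, "distance")) as f:
            distances = [int(x) for x in f.read().split()]
        topo[node] = (cpus, distances)
    return topo


if __name__ == "__main__":
    for node, (cpus, distances) in numa_topology().items():
        print(f"node {node}: cpus={sorted(cpus)} distances={distances}")
```

Each `distance` list gives the relative cost of reaching every node's memory from that node, with the smallest value on the node itself. With NPS or sub-NUMA clustering enabled in the BIOS, one socket shows up here as several nodes.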
 

twin_savage

Active Member
Jan 26, 2018
167
127
43
35
Sub-NUMA clustering (or virtual NUMA if the application is not NUMA-aware) can be used to inform cores where best to access memory. This, in conjunction with numactl, gives fairly good control over where cores access their memory from, reducing latency and increasing bandwidth.
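
As a sketch of what that control looks like from inside a program, the Python below pins the current process to the CPUs of one NUMA node. It is Linux-only, the helper names are hypothetical, and it is not a full replacement for `numactl --cpunodebind=0 --membind=0 ./app`: it sets CPU affinity only and relies on Linux's default local-allocation policy to place newly touched pages on the same node.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_of_node(node: int, root: str = "/sys/devices/system/node") -> set:
    """Read the CPUs that belong to a NUMA node from sysfs."""
    with open(os.path.join(root, f"node{node}", "cpulist")) as f:
        return parse_cpulist(f.read())


def pin_to_cpus(cpus) -> set:
    """Restrict this process to `cpus` and return the resulting affinity."""
    usable = set(cpus) & os.sched_getaffinity(0)  # respect cgroup/cpuset limits
    if not usable:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, usable)  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpus(cpus_of_node(0))` at program start, before the large allocations are first touched, keeps both the threads and their freshly faulted pages on node 0 under the default policy. Strict memory binding still needs numactl or libnuma.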

Even with all these optimizations I struggle to achieve good memory performance in my simulations because of memory concurrency bottlenecks. We need better/smarter process scheduling to really take advantage of the memory subsystems modern HPC processors give us.
In this vein, Tim Chen's new cache-aware load balancing kernel patches might be the silver bullet we're looking for. Optimizing system memory performance is only a minor piece of the problem; optimizing LLC performance is probably a bigger issue, at least on the HPC problems I work on.
 