Disk Buffer/Cache control on Ubuntu 22.04


Bert

Well-Known Member
Mar 31, 2018
Ubuntu has a built-in file cache which is mostly automatic and greatly helps with performance. I am running an application whose performance is highly sensitive to memory access speed, on a two-socket system. The application uses the GPU and memory, and finally writes its output to disk.

If memory is not allocated on the correct socket, performance degrades by half, so controlling memory placement is very important. The application makes multiple passes over the same buffers, processing them iteratively. I am using numactl to pin the application to the correct socket and to set the preferred memory allocation node.
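For context, the numactl pinning described here typically looks like the sketch below (the node ids and `./myapp` binary name are placeholders, not from the original post):

```shell
# Check the NUMA layout this assumes: one node per socket
ls /sys/devices/system/node/ | grep '^node'

# Pin the app to socket 0's CPUs and prefer its local memory
# (node ids and ./myapp are placeholders for the real setup):
#   numactl --cpunodebind=0 --preferred=0 ./myapp
#
# --membind=0 is stricter than --preferred=0: allocations fail
# instead of spilling to the remote node when node 0 runs out.
```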

The file cache is giving me a hard time here:

1. The cache allocates memory on the target socket, forcing the app onto the wrong socket. I work around this by force-freeing the cache ( /proc/sys/vm/drop_caches ) and restarting the app, but that's cumbersome and costs me time in manual intervention.
2. The cache allocates memory for drives where it does not help, and starves the disks where it would, since we want to offload the memory quickly. I tried to work around this by disabling the disk's write cache, hoping that would also turn off the memory buffer for that disk, but it doesn't work that way.
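For reference, the drop_caches workaround mentioned above is usually done like this (the root-only write is shown commented; `1` vs `3` semantics are from the kernel docs):

```shell
# How much memory the page cache currently holds (values in kB)
grep -E '^(Cached|Buffers|Dirty):' /proc/meminfo

# As root: flush dirty pages first, then drop the clean page cache.
# '1' drops pagecache only; '3' also drops dentries and inodes.
sync
# echo 1 > /proc/sys/vm/drop_caches
```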

In short, I need to be able to turn off the memory buffer/cache, or set its size, for specific disks, so that memory stays available where it is needed. That would solve all my problems, but I couldn't find any such setting. Is there one?
 

CyklonDX

Well-Known Member
Nov 8, 2022
I would advise killing the issue by enabling memory mirroring mode on the system. This ensures each NUMA node has the same memory, so it no longer matters which socket wants what, or where the memory physically sits relative to the NUMA node.
The con is that it cuts your usable memory in half. (Though your read performance goes up too; write performance, not so much.)

One really good way to isolate caches per application with the smallest footprint is to dockerize it / chroot it.
A container, when done right, ignores your system-wide settings and runs with the ones you have specified (customized).

You can limit a Docker container much more easily than a bare process.
(some details here)
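A hedged sketch of the kind of per-container limits meant here (the image name, core list, and sizes are placeholders; `--cpuset-mems` restricts which NUMA node's memory, including page cache charged to the container, can be used):

```shell
# Confine a container to NUMA node 0's CPUs and memory; the --memory
# cap also bounds how much page cache the container's cgroup can hold.
# Image name, core list, and sizes below are placeholders.
docker run --rm \
  --cpuset-cpus=0-15 \
  --cpuset-mems=0 \
  --memory=32g \
  myapp:latest
```

This relies on cgroup memory accounting: page cache generated by the container's I/O is charged to its cgroup and reclaimed against its own limit, rather than competing freely with the rest of the system.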

By mapping a different filesystem per container/disk, you can also ensure each one is limited to its own disk file caches on the local system, and that those are kept on the same NUMA node, so there is no cross-talk. That works as long as a disk/SAS controller is also attached to the second CPU (so the CPUs do not need to cross-talk to reach each other's devices). In a perfect world you would have one SAS controller per CPU, and one network card per CPU. You can enable MPIO and connect two different SAS controllers to the same backplane; that way you always stay local to your NUMA node, so no cross-talk happens, while keeping the same visibility from both sides (and without having to rename anything).

Next would be the network. Here you should probably do something else, as teaming/bonding the NICs would not be for the best. Best to configure each container to use the CPU local to it for best performance.
 

Bert

Well-Known Member
Mar 31, 2018
Well, that would work, but the reason the app is memory-latency sensitive is that it uses all the memory. I need memory from both NUMA nodes: the app itself uses 256 GB at runtime, and the rest of the memory is used by the system as disk cache, which it needs to push buffers out of the app quickly. Yes, ideally I could switch to 64 GB modules and then contain the whole execution on a single CPU.

The main issue is that Ubuntu/Linux is wasting memory on I/O devices that will not benefit from the buffer pool, and starving the I/O devices where I do need it.

I found that with the sync mount option I can bypass the memory pool, but that pretty much kills the performance of any I/O operation, because every I/O now effectively does an fsync.

What I need is the ability to configure the cache size per device, not disable it completely:
use 20 GB of memory for these devices, 100 GB of memory for this one.

I don't think I am the first one asking for this functionality.
 

CyklonDX

Well-Known Member
Nov 8, 2022
The main issue is that Ubuntu/Linux is wasting memory on I/O devices that will not benefit from the buffer pool, and starving the I/O devices where I do need it.
If Docker is too much work, you can use
hdparm -W 0 /dev/sdX
to turn off the drive's write cache. If you have hdparm installed, you can also edit /etc/hdparm.conf and set write_cache = off globally.


I found that with the sync mount option I can bypass the memory pool, but that pretty much kills the performance of any I/O operation, because every I/O now effectively does an fsync.

What I need is the ability to configure the cache size per device, not disable it completely:

You can configure how often the kernel tries to flush dirty data to disk, and how large the dirty portion of the cache can get.
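The knobs being referred to are presumably the kernel's dirty-writeback sysctls. A sketch of a sysctl.d fragment with purely illustrative values (not recommendations):

```shell
# /etc/sysctl.d/99-writeback.conf -- illustrative values only
# Cap the amount of dirty (not-yet-written) page cache:
vm.dirty_background_bytes = 1073741824   # start background flushing at 1 GiB
vm.dirty_bytes = 4294967296              # block writers above 4 GiB dirty
# How long dirty data may sit in memory before it must be flushed:
vm.dirty_expire_centisecs = 1500         # 15 seconds
vm.dirty_writeback_centisecs = 500       # flusher wakes every 5 seconds
```

Apply with `sysctl -p /etc/sysctl.d/99-writeback.conf` (as root). Note these limits are global: they bound how much dirty data accumulates and how fast it drains, but they do not give the per-device cache sizing Bert is asking for.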