Write amplification. It's been a hot minute since I really dove into the specifics, but as I recall few (if any) modern SSDs are actually 512b under the hood, and writing 512b still requires a larger free area (and erase blocks are even larger, but that's a whole other discussion). IIRC even many "4k native" drives need 8k or 16k free blocks to write. If you tell the OS/file system the drive is 512b, it will sometimes issue multiple 512b writes that should have been contiguous as part of (or all of) a 4k block. This is not always automatically coalesced, it's an "it depends". When it isn't, what should have been a single 4k write can turn into 8 separate 4k writes at the flash level. In order to actually "save space" the file system would ALSO need to use 512b sectors/clusters/blocks. A 1TB drive with 4k sectors (drive and file system) can store close to 250 million files, and if you have anywhere near 100 million files on a single drive, in most file systems you are having what is known as "a bad time". NTFS's minimum cluster size is 4K, while EXT3/4 is "clever" and can already store more than one inode per physical disk sector (depending on format options).
512b mode is for legacy compatibility, and generally comes with downsides to IOPS/performance (including increased CPU overhead).
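If you want to check what a drive actually reports and supports, something along these lines works (nvme0n1 / sda are just placeholder device names):

cat /sys/block/nvme0n1/queue/logical_block_size    # what the drive exposes to the OS
cat /sys/block/nvme0n1/queue/physical_block_size   # what it claims to use internally
smartctl -i /dev/sda                               # SATA/SAS: prints logical and physical sector sizes
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"     # NVMe: supported LBA formats, the one in use is marked
# switching an NVMe drive to a 4k LBA format is a destructive low-level format:
# nvme format /dev/nvme0n1 --lbaf=<index from id-ns>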
IMHO Write Amplification is mostly an Issue with ZFS when used for a VM Disk, possibly made worse if you use LUKS encrypted Devices and feed those to ZFS instead of RAW Disks / Partitions. I do, for one ...
I'm not sure how much of a real Issue it is with enterprise Drives fed directly to an Application (e.g. PostgreSQL).
Then of course it depends on how "deep" and complex your "Stack" is.
Mine could look like the following on most Proxmox VE Hosts:
RAW Disk <-> GPT Partition <-> LUKS encrypted Device <-> ZFS Pool <-> VM Disk <-> EXT4 FS
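For Reference, a minimal Sketch of how such a Stack might be assembled by Hand (Device, Mapping and Pool Names are made up; Proxmox VE would normally create the zvol for the VM Disk itself):

sgdisk --new=1:0:0 /dev/sdX                                       # GPT partition, aligned to 1MiB by default
cryptsetup luksFormat --type luks2 --sector-size 4096 /dev/sdX1   # LUKS2 with 4k sectors
cryptsetup open /dev/sdX1 luks-zfs0
zpool create -o ashift=12 -O compression=lz4 tank /dev/mapper/luks-zfs0   # 4k-aligned pool
zfs create -V 32G -o volblocksize=16k tank/vm-100-disk-0          # zvol backing the VM disk
# inside the guest: mkfs.ext4 on the virtual disk completes the chain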
It can definitely be worse if you do ZFS in the Guest on top of a ZFS Host Pool: then you have to coordinate Snapshots (do them only on one Side), otherwise due to COW you'll run out of Space very quickly. Furthermore, TRIM also needs to be done in the Guest periodically (either manually or automatically), as sketched below.
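The TRIM Part is usually just the following (assuming the virtual Disk is attached with discard enabled; VM ID / Storage Names are Examples):

qm set 100 --scsi0 local-zfs:vm-100-disk-0,discard=on   # Proxmox side: guest TRIM needs discard=on to reach the pool
systemctl enable --now fstrim.timer                     # Guest side: periodic TRIM (or fstrim -av as a one-off)
zpool set autotrim=on tank                              # Host side: trim the pool automatically ...
zpool trim tank                                         # ... or manually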
Although to be fair, besides Write Amplification, another Issue (when using the old default ZFS volblocksize of 8k) is that it will occupy TWICE as much space on any RAIDZ2 Backup Server; 16k (the new default) is going to be approx. 1:1 in size.
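You can see that directly by comparing the logical Size of a zvol with what it actually occupies on the Pool (Dataset Name is an Example):

zfs get volblocksize tank/vm-100-disk-0
zfs list -o name,volsize,logicalused,used,referenced tank/vm-100-disk-0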
I could also see this space Overhead (but for a different Reason) directly when using a Hyper-V Guest on Windows 11: you need to format with mkfs.ext4 -G 4096, otherwise a 32GB Virtual Disk might grow to 100GB on the Windows Host. Not sure about Write Amplification there.
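For Context: -G sets the flex_bg Group Size, i.e. how many Block Groups get their Metadata packed together, so (as I understand it) a mostly-empty Filesystem doesn't scatter its Metadata, and thus Allocations, all over a sparse Virtual Disk. Roughly (Device Name is an Example):

mkfs.ext4 -G 4096 /dev/sdb1                         # pack the metadata of 4096 block groups together
dumpe2fs -h /dev/sdb1 | grep -i "flex block group"  # verify the flex_bg size that was used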
I was trying to play around a bit with Write Amplification Benchmarks half a Year ago with this:
https://github.com/luckylinux/proxmox-tools (Various Tools for Proxmox VE Systems)
Don't ask me for exact Values, I cannot remember.
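The basic Idea behind those Benchmarks (not necessarily exactly what the Scripts do) is to compare what the Workload writes with what the Drive itself reports as written, e.g.:

nvme smart-log /dev/nvme0n1 | grep -i "data units written"   # NVMe, in units of 512,000 bytes
smartctl -A /dev/sda | grep -i lba                           # SATA SSDs often expose Total_LBAs_Written
# take a reading, run the workload for a fixed amount of time, take another reading,
# then divide the delta by how much data the application itself wrote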
But I think the main Takeaways were:
- if you have consumer SSDs, there's not much you can do to reduce Write Amplification significantly, to the Point where it won't be an Issue
- if you have enterprise SSDs, there's not much Point in reducing Write Amplification, because you have so much Endurance anyway
Sure, you can gain quite a bit by reducing Logging (that is A LOT of TBW saved each Year) and by using the default cached Writes in Proxmox VE (although slightly risky; I believe sync is going to be MUCH worse), but apart from that I'm not really sure.
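The Logging Part can be as simple as keeping the Journal in RAM (Logs are lost on Reboot), and the Cache Mode is set per VM Disk (VM ID / Disk Spec are Examples):

# /etc/systemd/journald.conf -> Storage=volatile, then:
systemctl restart systemd-journald
# Proxmox VE: cached writes on a VM disk instead of cache=none
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback   # or whichever cache mode you settle on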
I tried mkfs.ext4 -G 4096 but I don't think it made such a big difference on KVM Virtual Machines in terms of Write Amplification.
Similarly for ZFS volblocksize higher than 16k.
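Keep in mind volblocksize is fixed at zvol Creation Time; in Proxmox VE it's a per-Storage Setting, so only newly created / moved Disks pick up a Change. Roughly (Storage / Pool Names are Examples):

# /etc/pve/storage.cfg (excerpt)
zfspool: local-zfs
        pool tank
        blocksize 16k
        content images,rootdir
        sparse 1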
Possibly PostgreSQL WAL Configuration could improve something (I'm not a PostgreSQL Expert), but I'm not sure there's much else you can do.
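On the PostgreSQL Side, the usual Knobs that trade WAL Volume against Durability / Recovery Time are roughly these (not a Recommendation, just where I would look):

# postgresql.conf (excerpt)
wal_compression = on          # compress full-page images in the WAL
checkpoint_timeout = 15min    # fewer checkpoints -> fewer full-page writes
max_wal_size = 4GB
synchronous_commit = off      # riskier: the most recent commits can be lost on a crash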
If the Disk Partition is aligned to 1MiB (which is what I typically do), the ZFS volblocksize is set to 16k, the Proxmox VE Write Cache is enabled (otherwise I think your Write Amplification will go through the Roof), the EXT4 Block Size is the default 4KiB and the LUKS2 Sector Size is the default 4KiB (blkid reports 4096 for my LUKS Devices), then I don't think there's much more you can do, unless you can tune Caching and your specific Application.
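For what it's worth, each Layer can be checked individually (Device / Pool Names are Examples):

sgdisk -i 1 /dev/sdX                              # partition start sector (times the logical sector size should be a multiple of 1MiB)
cryptsetup luksDump /dev/sdX1 | grep -i sector    # LUKS2 sector size
zpool get ashift tank                             # 12 = 4k
tune2fs -l /dev/sda1 | grep "Block size"          # EXT4 block size inside the guest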