Comparison: 512e versus 4kn

Tinkerer

Member
Sep 5, 2020
I've heard quite often that it doesn't matter, and even that 4kn is faster or that 512e comes with "a performance hit". However, I have yet to see any direct comparisons with tools and parameters I trust (such as fio), so I decided to run my own. I'll try to keep this concise :).

Obligatory necessities:
Hardware: Asus Z11PA-U12-10G-2S with Xeon 4210R, 192GB DDR4 ECC.
Controller: onboard SATA
OS: Red Hat 8.4, kernel 4.18.0-305.19.1.el8_4.x86_64
NO GUI (not installed): boot to multi-user.target (runlevel 3 if you like)
Test tool: fio 3.19
Disks: 2 x WD Gold 8TB Enterprise Class Hard Disk Drive - 7200 RPM Class SATA 6 Gb/s 256MB Cache 3.5 Inch - WD8003FRYZ

Only user and optional background processes were stopped; nothing else was changed, tuned or tweaked.
The server runs Red Hat Virtualization (oVirt, if you will); all VMs were stopped, the host was put into maintenance, and its RHV processes were stopped.

Partitions were aligned at sector 2048 with gdisk using all defaults (n, enter, enter, enter, enter, w).
Partitions were formatted with ext4 with no special parameters.
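For anyone who wants to replicate those two steps non-interactively, a rough equivalent might look like this (sgdisk instead of interactive gdisk; /dev/sdb is just the example device):
Bash:
# Create one partition with all defaults (starts at sector 2048, spans the whole disk)
sgdisk --new=1:0:0 /dev/sdb
# Plain ext4, no special parameters
mkfs.ext4 /dev/sdb1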

Checked and confirmed the disks' logical and physical block sizes:
Bash:
# cat /sys/block/sdb/queue/logical_block_size
512
# cat /sys/block/sdb/queue/physical_block_size
4096
# cat /sys/block/sdc/queue/logical_block_size
4096
# cat /sys/block/sdc/queue/physical_block_size
4096
TL;DR:
512e is ~16% faster for random mixed workloads of 70% read / 30% write with a 16k block size and 4 threads. This increases to ~19% with 8 threads, a higher queue depth and a 4k block size. For sequential reads with a 1M block size, 512e is ~8% slower than 4kn, and for sequential 1M writes ~3% slower.

I did 4 runs with fio. Actually, I did A LOT more runs, but I'll spare you those; the numbers below are the averages:
Bash:
RUN 1    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=16k --ioengine=libaio --iodepth=16 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 2    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 3    fio --filename=fiotest --size=40GB --rw=read --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 4    fio --filename=fiotest --size=40GB --rw=write --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
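If you want to reproduce the averaging yourself, here is a sketch of one way to script repeated runs and pull the IOPS out of fio's JSON output (the loop count and file names are just placeholders, and it needs jq):
Bash:
# Repeat the RUN 1 job a few times and print read/write IOPS per run
for i in 1 2 3 4 5; do
    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 \
        --bs=16k --ioengine=libaio --iodepth=16 --runtime=120 --numjobs=4 \
        --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1 \
        --output-format=json --output=run$i.json
    jq '.jobs[0].read.iops, .jobs[0].write.iops' run$i.json
done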
Obligatory disclaimer:
These are my own results, YMMV. Take it as you please.

And I'll say this beforehand: I've tested and confirmed that using direct=1 and end_fsync=1 on ext4 and xfs filesystems negates caching and buffering. In other words, there is no need to use a test size of twice the amount of memory. This is different on ZFS, for example, where the ARC works differently (also tested and confirmed).
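A quick way to see that caching behaviour for yourself is to watch buff/cache around a direct=1 job; it should stay roughly flat, whereas a buffered run makes it balloon. A minimal sketch (the job parameters here are just for the check, not one of the runs above):
Bash:
# Snapshot page cache usage before and after a direct I/O job
free -m | awk '/Mem:/ {print "buff/cache before:", $6}'
fio --filename=fiotest --size=4GB --rw=randrw --bs=16k --ioengine=libaio \
    --iodepth=16 --runtime=60 --numjobs=1 --time_based --name=cache-check \
    --direct=1 --end_fsync=1
free -m | awk '/Mem:/ {print "buff/cache after:", $6}'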

Now for the numbers:
       | 512e read IOPS | 512e read MiB/s | 512e write IOPS | 512e write MiB/s | 4kn read IOPS | 4kn read MiB/s | 4kn write IOPS | 4kn write MiB/s
RUN 1  | 359            | 5,62            | 156             | 2,45             | 298           | 4,67           | 130            | 2,04
RUN 2  | 374            | 1,46            | 162             | 0,63             | 305           | 1,19           | 132            | 0,52
RUN 3  | 175            | 176,00          | -               | -                | 189           | 190,00         | -              | -
RUN 4  | -              | -               | 195             | 196,00           | -             | -              | 201            | 201,00
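As a quick sanity check on the table: the MiB/s column is simply IOPS times block size, the small differences are just rounding in the averages.
Code:
RUN 1, 512e reads: 359 IOPS x 16 KiB = 5744 KiB/s ~ 5,6 MiB/s  (table: 5,62)
RUN 3, 512e reads: 175 IOPS x 1 MiB  = 175 MiB/s               (table: 176,00)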

mdadm RAID0 results
So you thought I was done? Ha! While I was at it, I decided to test a RAID0 with different chunk sizes on 512e disks. I reformatted the 4kn drive to 512e with HUGO, then checked and confirmed:

Bash:
# cat /sys/block/sdc/queue/logical_block_size
512
# cat /sys/block/sdc/queue/physical_block_size
4096
All arrays were created the same way, except for chunk size:
Bash:
# mdadm --create --verbose /dev/md/mdraid0_backup --run --chunk=16K --metadata=1.2 --raid-devices=2 --level=0 /dev/sdb /dev/sdc
All filesystems were created the same way as above, with the exception of the xfs test (details below).
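If you want to double-check the geometry after creating an array, mdadm itself will tell you (just a verification step, not part of the benchmark):
Bash:
# Confirm RAID level, chunk size and member devices
mdadm --detail /dev/md/mdraid0_backup | grep -Ei 'level|chunk|/dev/sd'
cat /proc/mdstat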

The numbers:
chunk=16K | read IOPS | read MiB/s | write IOPS | write MiB/s
RUN 1     | 733       | 11,50      | 317        | 4,97
RUN 2     | 767       | 3          | 328        | 1,28
RUN 3     | 398       | 398        | -          | -
RUN 4     | -         | -          | 405        | 406

chunk=64K | read IOPS | read MiB/s | write IOPS | write MiB/s
RUN 1     | 741       | 11,60      | 321        | 5,02
RUN 2     | 774       | 3,03       | 331        | 1,30
RUN 3     | 404       | 405        | -          | -
RUN 4     | -         | -          | 407        | 408

chunk=128K | read IOPS | read MiB/s | write IOPS | write MiB/s
RUN 1      | 746       | 11,70      | 322        | 5,05
RUN 2      | 767       | 3          | 328        | 1,28
RUN 3      | 404       | 404        | -          | -
RUN 4      | -         | -          | 414        | 414

chunk=1M | read IOPS | read MiB/s | write IOPS | write MiB/s
RUN 1    | 739       | 11,50      | 321        | 5,02
RUN 2    | 774       | 3,03       | 331        | 1,30
RUN 3    | 404       | 405        | -          | -
RUN 4    | -         | -          | 407        | 408

For good measure, one more on xfs with 64K chunks:
This time, I did use a few parameters to format the partition as specified here:
Code:
# mkfs.xfs -d su=128K -d sw=2 /dev/md/mdraid0_backup -f
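To verify the stripe geometry xfs actually picked up, xfs_info on the mounted filesystem shows sunit/swidth in filesystem blocks; with su=128K, sw=2 and a 4K block size I'd expect sunit=32 and swidth=64 (the mount point below is just an example):
Bash:
mount /dev/md/mdraid0_backup /mnt/fiotest
xfs_info /mnt/fiotest | grep -E 'sunit|swidth'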
chunk=64K (XFS) | read IOPS | read MiB/s | write IOPS | write MiB/s
RUN 1           | 772       | 12,10      | 334        | 5,22
RUN 2           | 813       | 3,18       | 347        | 1,36
RUN 3           | 425       | 425        | -          | -
RUN 4           | -         | -          | 427        | 427

My own conclusion after all of this is that I will run 512e disks in an mdadm RAID with 64K chunks, formatted with xfs as shown here.

Today I will reformat all my 8TB Ultrastars to 512e and recreate a RAID10 array. I'll do a few more fio runs after that just to make sure it's in line with these tests on the WD drives.
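For what it's worth, the recreate will look roughly like this (assuming four disks as an example; device names, array name and mount point are placeholders, and su/sw simply follow the usual rule of matching the chunk size and the number of data-bearing members, which is 2 for a 4-disk RAID10):
Bash:
# 4-disk RAID10 with 64K chunks, then xfs with matching stripe geometry
mdadm --create --verbose /dev/md/mdraid10_vmstore --run --chunk=64K --metadata=1.2 \
      --raid-devices=4 --level=10 /dev/sd[b-e]
mkfs.xfs -d su=64k -d sw=2 /dev/md/mdraid10_vmstore -f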

Hope you appreciate my efforts!
 

Spearfoot

Active Member
Apr 22, 2015
Thanks for all the hard work!

Your results contradict my preconceived notion that 4kn would outperform 512e... learn something new every day!
 