Comparison: 512e versus 4kn

Tinkerer

Member
Sep 5, 2020
I heard quite often that it doesn't matter, or even that 4kn is faster, or that 512e comes with "a performance hit". However, I have yet to see any direct comparison made with tools and parameters I trust (such as fio), so I decided to run my own. I will try to keep this concise :).

Obligatory necessities:
Hardware: Asus Z11PA-U12-10G-2S with Xeon 4210R, 192GB DDR4 ECC.
Controller: onboard SATA
OS: Red Hat 8.4, kernel 4.18.0-305.19.1.el8_4.x86_64
NO GUI (not installed): boot to multi-user.target (runlevel 3 if you like)
Test tool: fio 3.19
Disks: 2 x WD Gold 8TB Enterprise Class Hard Disk Drive - 7200 RPM Class SATA 6 Gb/s 256MB Cache 3.5 Inch - WD8003FRYZ

Only user and optional background processes were stopped. Nothing else was changed, tuned or tweaked.
The server is running Red Hat Virtualization (oVirt, if you will); all VMs were stopped, the server was put in maintenance mode and its processes were stopped.

Partitions were aligned at 2048 with gdisk with all defaults (n, enter, enter, enter, enter, w).
Partitions were formatted with ext4, with no special parameters.

Checked and confirmed the logical and physical block size of both disks (sdb is the 512e drive, sdc the 4kn drive):
Bash:
# cat /sys/block/sdb/queue/logical_block_size
512
# cat /sys/block/sdb/queue/physical_block_size
4096
# cat /sys/block/sdc/queue/logical_block_size
4096
# cat /sys/block/sdc/queue/physical_block_size
4096
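The two sysfs values are enough to tell the formats apart. A minimal sketch, assuming only the three common combinations matter; `classify_sector_format` and `report_sector_formats` are made-up helper names, not part of any tool:

```bash
# Classify a drive's sector format from its logical and physical block sizes.
# Hypothetical helper: anything outside the three common combinations is "unknown".
classify_sector_format() {
    local logical=$1 physical=$2
    if [ "$logical" -eq 512 ] && [ "$physical" -eq 512 ]; then
        echo "512n"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then
        echo "512e"
    elif [ "$logical" -eq 4096 ] && [ "$physical" -eq 4096 ]; then
        echo "4Kn"
    else
        echo "unknown"
    fi
}

# Print the format of every sdX disk found in sysfs.
# Prints nothing if no matching device exists.
report_sector_formats() {
    local dev q
    for dev in /sys/block/sd*; do
        q="$dev/queue"
        [ -r "$q/logical_block_size" ] || continue
        echo "${dev##*/}: $(classify_sector_format "$(cat "$q/logical_block_size")" "$(cat "$q/physical_block_size")")"
    done
}
```

Calling `report_sector_formats` on the test box should list sdb as 512e and sdc as 4Kn, matching the values above.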
TL;DR:
512e is ~16% faster on a random mixed workload (70% read/30% write) with 4 threads and 16k block size. The gap grows to ~19% with 8 threads, a higher queue depth and 4k block size. Sequential reads with 1M block size are ~8% slower on 512e than on 4kn, while sequential writes with 1M block size are ~3% slower.

I did 4 runs with fio. Actually, I did A LOT more runs, but I'll spare you all of those; the numbers further down are the averages. These are the four jobs:
Bash:
RUN 1    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=16k --ioengine=libaio --iodepth=16 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 2    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 3    fio --filename=fiotest --size=40GB --rw=read --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 4    fio --filename=fiotest --size=40GB --rw=write --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
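Since every I/O in these jobs moves exactly one block, the MiB/s figure of a run should follow from its IOPS and its --bs value. A quick sanity check, with `iops_to_mibs` as a hypothetical helper name and the block size passed in KiB:

```bash
# Throughput in MiB/s implied by an IOPS figure and a block size in KiB.
# LC_ALL=C keeps the decimal separator a dot regardless of the shell locale.
iops_to_mibs() {
    LC_ALL=C awk -v iops="$1" -v bs_kib="$2" \
        'BEGIN { printf "%.2f\n", iops * bs_kib / 1024 }'
}
```

For example, `iops_to_mibs 359 16` prints 5.61, which is within rounding of the averaged read throughput reported for RUN 1 on the 512e drive.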
Obligatory disclaimer:
These are my own results, YMMV. Take it as you please.

And I'll say this beforehand: I've tested and confirmed that using direct=1 and end_fsync on ext4 and xfs filesystems negates caching and buffering. In other words, there is no need to use a test size of twice the size of memory. This is different on ZFS, for example, where ARC works differently (also tested and confirmed).

Now for the numbers:
Run     512e read IOPS   512e read MiB/s   512e write IOPS   512e write MiB/s   4kn read IOPS   4kn read MiB/s   4kn write IOPS   4kn write MiB/s
RUN 1   359              5.62              156               2.45               298             4.67             130              2.04
RUN 2   374              1.46              162               0.63               305             1.19             132              0.52
RUN 3   175              176.00            -                 -                  189             190.00           -                -
RUN 4   -                -                 195               196.00             -               -                201              201.00
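A percentage gap between the two drives depends on which one is taken as the baseline. A small sketch of the arithmetic, with `pct_faster` as a hypothetical helper that reports how far the first value is above (or below) the second, relative to the second:

```bash
# Percentage by which value a exceeds value b, relative to b.
# A negative result means a is lower than b.
pct_faster() {
    LC_ALL=C awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}
```

With the RUN 1 read IOPS, `pct_faster 359 298` prints 20.5 (512e relative to 4kn), while `pct_faster 298 359` prints -17.0 (4kn relative to 512e). Both describe the same gap, so it is worth stating the baseline whenever a single percentage is quoted.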

MDADM Raid0 results
So you thought I was done? Ha! While I was at it, I decided to test a RAID0 array with different chunk sizes on 512e disks. I reformatted the 4kn drive to 512e with HUGO, then checked and confirmed:

Bash:
# cat /sys/block/sdc/queue/logical_block_size
512
# cat /sys/block/sdc/queue/physical_block_size
4096
All arrays were created the same way, except for chunk size:
# mdadm --create --verbose /dev/md/mdraid0_backup --run --chunk=16K --metadata=1.2 --raid-devices=2 --level=0 /dev/sdb /dev/sdc
All filesystems were created the same way as above, with the exception of the xfs test (details below).

The numbers:
chunk=16K
Run     read IOPS   read MiB/s   write IOPS   write MiB/s
RUN 1   733         11.50        317          4.97
RUN 2   767         3            328          1.28
RUN 3   398         398          -            -
RUN 4   -           -            405          406

chunk=64K
Run     read IOPS   read MiB/s   write IOPS   write MiB/s
RUN 1   741         11.60        321          5.02
RUN 2   774         3.03         331          1.30
RUN 3   404         405          -            -
RUN 4   -           -            407          408

chunk=128K
Run     read IOPS   read MiB/s   write IOPS   write MiB/s
RUN 1   746         11.70        322          5.05
RUN 2   767         3            328          1.28
RUN 3   404         404          -            -
RUN 4   -           -            414          414

chunk=1M
Run     read IOPS   read MiB/s   write IOPS   write MiB/s
RUN 1   739         11.50        321          5.02
RUN 2   774         3.03         331          1.30
RUN 3   404         405          -            -
RUN 4   -           -            407          408

For good measure, one more on xfs with 64K chunks:
This time, I did use a few parameters to format the partition as specified here:
Code:
# mkfs.xfs -d su=128K -d sw=2 /dev/md/mdraid0_backup -f
chunk=64K (XFS)
Run     read IOPS   read MiB/s   write IOPS   write MiB/s
RUN 1   772         12.10        334          5.22
RUN 2   813         3.18         347          1.36
RUN 3   425         425          -            -
RUN 4   -           -            427          427
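On the mkfs.xfs geometry flags: as I read the mkfs.xfs man page, su is the stripe unit (the md chunk size on a RAID device) and sw is a multiplier of it, usually the number of data disks. A sketch of deriving the flags under that reading, with `xfs_stripe_opts` as a hypothetical helper; note that it yields su=64k for a 64K chunk, which is not the su=128K used in the command above, so take it as the man-page convention and not as a reproduction of this test:

```bash
# Build mkfs.xfs data-section geometry options from the md chunk size (in KiB)
# and the number of data disks.
# Assumption: su = chunk size and sw = data disks, per the mkfs.xfs man page.
xfs_stripe_opts() {
    local chunk_kib=$1 data_disks=$2
    echo "-d su=${chunk_kib}k,sw=${data_disks}"
}
```

For a two-disk RAID0 with 64K chunks, `xfs_stripe_opts 64 2` prints `-d su=64k,sw=2`.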

My own conclusion after all of this is that I will run 512e disks on an mdadm raid with 64k chunks formatted with xfs as shown here.

Today I will reformat all my 8TB Ultrastars to 512e and recreate a RAID-10 array. I'll do a few more fio runs after that just to make sure it's in line with these tests on the WDs.

Hope you appreciate my efforts!
 

Spearfoot

Active Member
Apr 22, 2015
Thanks for all the hard work!

Your results contradict my pre-conceived notion that 4kn would out-perform 512e... learn something new every day!
 

Tinkerer

On SSDs I've seen it the other way round:

Will test my DC P3605 this weekend.
Thanks for sharing.

It could very well be that the results would be reversed if you had tested under Linux with ext4 and fio. That does not invalidate your results just because they were taken on Windows with NTFS and different tooling (on the contrary), but it does make them incomparable to the tests I did. In the same manner, my results could turn out the other way around if I had tested on Windows with that tool.

If you can spare the time and are willing to put up with it, I would suggest you run your first tests multiple times in a row (up to 8 or 10 times), throw away absurdly high or absurdly low results (meaning so high or so low it's unreal), and take the average of the tests that remain. This way you establish a baseline and reduce the margin of error. Comparative test runs can be run fewer times; 3 or 4 runs should be fine for taking an average, as long as you can filter out any outliers (high or low). If you run only once, you risk getting skewed results.
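One way to sketch that averaging step is shown here. It is a simplification: `trimmed_mean` is a hypothetical helper that drops exactly one lowest and one highest value instead of judging what counts as "absurd", and the sample numbers in the usage note are made up.

```bash
# Average the values given as arguments after dropping the single lowest
# and the single highest one. Needs at least 3 values.
trimmed_mean() {
    printf '%s\n' "$@" | LC_ALL=C sort -n | LC_ALL=C awk '
        { v[NR] = $1 }
        END {
            if (NR < 3) { print "need at least 3 values" > "/dev/stderr"; exit 1 }
            for (i = 2; i < NR; i++) sum += v[i]
            printf "%.1f\n", sum / (NR - 2)
        }'
}
```

For instance, `trimmed_mean 356 120 358 900 360 362` ignores 120 and 900 and prints 359.0.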
 

efschu3

Member
Mar 11, 2019
Oh, I'm not the author of the linked test - I was just googling around to see if there's an SSD comparison. From what I can say right now, the DC P3605 is REALLY consistent performance-wise.
 

efschu3

I've run your tests on my Intel DC P3605, but on the whole raw disk without a filesystem. Here are the results:

Code:
RUN1
    512:
        read: IOPS=192k, BW=3004MiB/s
        write: IOPS=82.3k, BW=1287MiB/s
    4k:
        read: IOPS=192k, BW=3003MiB/s
        write: IOPS=82.3k, BW=1286MiB/s
RUN2
    512:
        read: IOPS=432k, BW=1688MiB/s
        write: IOPS=185k, BW=723MiB/s
    4k:
        read: IOPS=432k, BW=1687MiB/s
        write: IOPS=185k, BW=723MiB/s
RUN3
    512:
        read: IOPS=2976, BW=2976MiB/s
    4k:
        read: IOPS=2968, BW=2968MiB/s
RUN4
    512:
        write: IOPS=1504, BW=1505MiB/s
    4k:
        write: IOPS=1558, BW=1559MiB/s
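To compare these figures numerically with the HDD numbers earlier in the thread, the suffixed IOPS values need expanding first. A rough sketch, assuming fio's "k" means x1000 and "M" means x1000000 in its summary lines; `fio_iops_to_number` is a hypothetical helper name:

```bash
# Expand an fio-style IOPS value such as "82.3k" into a plain number.
# Assumption: "k" = x1000 and "M" = x1000000; values without suffix pass through.
fio_iops_to_number() {
    LC_ALL=C awk -v v="$1" 'BEGIN {
        n = v + 0                      # numeric prefix of the string
        if (v ~ /k$/) n *= 1000
        else if (v ~ /M$/) n *= 1000000
        printf "%.0f\n", n
    }'
}
```

For example, `fio_iops_to_number 82.3k` prints 82300 and `fio_iops_to_number 2976` prints 2976.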
 

UhClem

Active Member
Jun 26, 2012
NH, USA
I heard quite often that it doesn't matter ...
You really should do your (4Kn vs 512e) test on (at least) one of your Ultrastar 8TB's. There is a major difference between the 8TB Ultrastar and the WD8003FRYZ that is very likely pertinent in this context. A careful comparative review of their datasheets (Product Briefs) will enlighten.

Aside: To convey a higher degree of confidence in your test results above, you should have affirmed that the two drives had the same firmware version (and listed it), and you should have "cross-sampled" the two drives by testing each drive with both formats: 1) DrvA-4Kn, 2) DrvB-4Kn, 3) DrvA-512e, 4) DrvB-512e. If the results are consistent, that would effectively eliminate the possibility that one of the drives was, itself, an outlier (performance-wise); this "rigor" is just SOP in academic/professional circles. Also, a randrw test on an HDD should probably not be limited to a 4GB span (unless that really does represent your use case).

(My first paragraph assumes that both your WD drives do have the same firmware, and that "cross-sampling" would show consistency. I.e., your test results are "correct"... but your conclusion is flawed/misguided.)
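To put the 4GB span point in numbers, here is a rough sketch of how much of the drive such a test file covers. It assumes the fio size works out to 4 GiB and that an 8TB drive holds 8e12 bytes; `span_percent` is a hypothetical helper taking both sizes in bytes:

```bash
# Share of a disk (in percent) covered by a test span, both given in bytes.
span_percent() {
    LC_ALL=C awk -v span="$1" -v disk="$2" \
        'BEGIN { printf "%.3f\n", span / disk * 100 }'
}
```

`span_percent 4294967296 8000000000000` prints 0.054, so under these assumptions the random I/O only ever seeks across roughly a twentieth of one percent of the platters.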