I've heard quite often that it doesn't matter, and even that 4kn is faster or that 512e comes with "a performance hit". However, I have yet to see any direct comparisons with tools and parameters I trust (such as fio), so I decided to run my own. I'll try to keep this concise.
Obligatory necessities:
Hardware: Asus Z11PA-U12-10G-2S with Xeon 4210R, 192GB DDR4 ECC.
Controller: onboard SATA
OS: Red Hat 8.4, kernel 4.18.0-305.19.1.el8_4.x86_64
NO GUI (not installed): boot to multi-user.target (runlevel 3 if you like)
Test tool: fio 3.19
Disks: 2 x WD Gold 8TB Enterprise Class Hard Disk Drive - 7200 RPM Class SATA 6 Gb/s 256MB Cache 3.5 Inch - WD8003FRYZ
Only user and optional background processes were stopped. Nothing else was changed, tuned or tweaked.
The server is running Red Hat Virtualization (oVirt, if you will); all VMs were stopped, the server was put into maintenance mode, and its processes were stopped.
Partitions were aligned at sector 2048 with gdisk using all defaults (n, enter, enter, enter, enter, w); a non-interactive equivalent is sketched after the block-size check below.
Partitions were formatted with ext4, no special parameters.
Disks checked and confirmed physical and logical blocksize:
Bash:
# cat /sys/block/sdb/queue/logical_block_size
512
# cat /sys/block/sdb/queue/physical_block_size
4096
# cat /sys/block/sdc/queue/logical_block_size
4096
# cat /sys/block/sdc/queue/physical_block_size
4096
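For anyone who wants to reproduce the prep steps non-interactively, something along these lines should be equivalent. This is only a sketch: sgdisk stands in for the interactive gdisk session, and /dev/sdX is a placeholder for the disk being prepared.
Bash:
# Sketch of the partition/format steps above (placeholder device; destroys data on it)
sgdisk --new=1:0:0 /dev/sdX   # one partition spanning the disk, default 2048-sector (1 MiB) alignment
mkfs.ext4 /dev/sdX1           # ext4 with no special parameters, as used for the tests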
TL;DR:
512e is ~16% faster with 4 threads of a random mixed workload of 70% read / 30% write at 16k blocksize. This increases to ~19% with 8 threads and a higher queue depth at 4k blocksize. Sequential 1M reads are 8% slower on 512e than on 4kn, and sequential 1M writes are 3% slower.
I did 4 runs with fio. Actually, I did a LOT more runs, but I'll spare you those; these are the averages:
Bash:
# RUN 1
fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=16k --ioengine=libaio --iodepth=16 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
# RUN 2
fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
# RUN 3
fio --filename=fiotest --size=40GB --rw=read --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
# RUN 4
fio --filename=fiotest --size=40GB --rw=write --bs=1M --ioengine=libaio --iodepth=1 --runtime=240 --numjobs=1 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
Obligatory disclaimer:
These are my own results, YMMV. Take it as you please.
And I'll say this beforehand: I've tested and confirmed that using direct=1 and end_fsync=1 on ext4 and xfs filesystems negates caching and buffering. In other words, there is no need to use a test size twice the size of memory. This is different on ZFS, for example, where the ARC works differently (also tested and confirmed).
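As an aside, for anyone who prefers job files to one-liners, RUN 1 translates roughly to the ini-style file below. It just restates the same parameters already shown above, nothing new.
Code:
; run1.fio - job-file form of RUN 1 (same parameters as the command line above)
[iops-test-job]
filename=fiotest
size=4GB
rw=randrw
rwmixread=70
rwmixwrite=30
bs=16k
ioengine=libaio
iodepth=16
numjobs=4
runtime=120
time_based
group_reporting
direct=1
end_fsync=1
Run it with fio run1.fio.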
Now for the numbers:
| | 512e read IOPS | 512e read MiB/s | 512e write IOPS | 512e write MiB/s | 4kn read IOPS | 4kn read MiB/s | 4kn write IOPS | 4kn write MiB/s |
|---|---|---|---|---|---|---|---|---|
| RUN 1 | 359 | 5.62 | 156 | 2.45 | 298 | 4.67 | 130 | 2.04 |
| RUN 2 | 374 | 1.46 | 162 | 0.63 | 305 | 1.19 | 132 | 0.52 |
| RUN 3 | 175 | 176.00 | - | - | 189 | 190.00 | - | - |
| RUN 4 | - | - | 195 | 196.00 | - | - | 201 | 201.00 |
mdadm RAID 0 results
So you thought I was done? Ha! While I was at it, I decided to test a RAID 0 with different chunk sizes on 512e disks. I reformatted the 4kn drive to 512e with HUGO, then checked and confirmed:
Bash:
# cat /sys/block/sdc/queue/logical_block_size
512
# cat /sys/block/sdc/queue/physical_block_size
4096
All arrays were created the same way, except for chunk size:
Bash:
# mdadm --create --verbose /dev/md/mdraid0_backup --run --chunk=16K --metadata=1.2 --raid-devices=2 --level=0 /dev/sdb /dev/sdc
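If you want to repeat the whole series, the stop/recreate/format cycle per chunk size can be scripted. A rough sketch follows; the device names and mount point are placeholders, and it destroys everything on the member disks.
Bash:
# Rough sketch: recreate the RAID 0 with each chunk size, reformat, then rerun the fio jobs
for chunk in 16K 64K 128K 1M; do
    mdadm --stop /dev/md/mdraid0_backup 2>/dev/null        # stop the previous array, if any
    mdadm --create --verbose /dev/md/mdraid0_backup --run --chunk=$chunk \
          --metadata=1.2 --raid-devices=2 --level=0 /dev/sdb /dev/sdc
    mkfs.ext4 -F /dev/md/mdraid0_backup                    # same ext4 as above, -F to overwrite in a script
    mount /dev/md/mdraid0_backup /mnt/test
    # ... run the four fio jobs from above against /mnt/test ...
    umount /mnt/test
done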
All filesystems were created the same way as above, with the exception of the xfs test (details below).
The numbers:
| chunk=16K | read IOPS | read MiB/s | write IOPS | write MiB/s |
|---|---|---|---|---|
| RUN 1 | 733 | 11.50 | 317 | 4.97 |
| RUN 2 | 767 | 3 | 328 | 1.28 |
| RUN 3 | 398 | 398 | - | - |
| RUN 4 | - | - | 405 | 406 |

| chunk=64K | read IOPS | read MiB/s | write IOPS | write MiB/s |
|---|---|---|---|---|
| RUN 1 | 741 | 11.60 | 321 | 5.02 |
| RUN 2 | 774 | 3.03 | 331 | 1.30 |
| RUN 3 | 404 | 405 | - | - |
| RUN 4 | - | - | 407 | 408 |

| chunk=128K | read IOPS | read MiB/s | write IOPS | write MiB/s |
|---|---|---|---|---|
| RUN 1 | 746 | 11.70 | 322 | 5.05 |
| RUN 2 | 767 | 3 | 328 | 1.28 |
| RUN 3 | 404 | 404 | - | - |
| RUN 4 | - | - | 414 | 414 |

| chunk=1M | read IOPS | read MiB/s | write IOPS | write MiB/s |
|---|---|---|---|---|
| RUN 1 | 739 | 11.50 | 321 | 5.02 |
| RUN 2 | 774 | 3.03 | 331 | 1.30 |
| RUN 3 | 404 | 405 | - | - |
| RUN 4 | - | - | 407 | 408 |
For good measure, one more run on XFS with 64K chunks:
This time I did use a few parameters to format the filesystem, as shown here:
Code:
# mkfs.xfs -d su=128K -d sw=2 /dev/md/mdraid0_backup -f
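To double-check that the stripe geometry took, xfs_info on the mounted filesystem reports sunit/swidth in 4 KiB filesystem blocks, so su=128K, sw=2 should show up as sunit=32, swidth=64. The mount point below is just an example.
Bash:
# Verify stripe alignment after mounting (mount point is only an example)
mount /dev/md/mdraid0_backup /mnt/test
xfs_info /mnt/test | grep -E 'sunit|swidth'   # expect sunit=32 swidth=64 blks for su=128K, sw=2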
| chunk=64K (XFS) | read IOPS | read MiB/s | write IOPS | write MiB/s |
|---|---|---|---|---|
| RUN 1 | 772 | 12.10 | 334 | 5.22 |
| RUN 2 | 813 | 3.18 | 347 | 1.36 |
| RUN 3 | 425 | 425 | - | - |
| RUN 4 | - | - | 427 | 427 |
My own conclusion after all of this is that I will run 512e disks in an mdadm RAID with 64k chunks, formatted with XFS as shown here.
Today I will reformat all my 8TB Ultrastars to 512e and recreate a RAID 10 array. I'll do a few more fio runs after that just to make sure it's in line with these tests on the WDs.
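For the record, the planned rebuild would look something like this. The device and array names are placeholders, four disks are assumed, and sw=2 reflects the default near-2 layout (two data stripes) with su matched to the 64K chunk.
Bash:
# Sketch of the planned RAID 10 rebuild (placeholder devices, 4 disks assumed)
mdadm --create --verbose /dev/md/mdraid10_vmstore --run --chunk=64K \
      --metadata=1.2 --raid-devices=4 --level=10 /dev/sd[b-e]
mkfs.xfs -f -d su=64K -d sw=2 /dev/md/mdraid10_vmstore   # sw=2: two data stripes in a 4-disk RAID 10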
Hope you appreciate my efforts!