Is bigger always better? What size of hard drives for your archives?


Bert

Well-Known Member
Mar 31, 2018
866
412
63
45
I've noticed that as drive sizes get bigger and bigger, it becomes harder and harder to fill up disks. Perhaps it is better to build large disk arrays and keep the disk count high for higher I/O bandwidth.

I am not interested in random I/O, but even sequential I/O bandwidth seems to be limited. Are there SAS3 HDDs capable of going over 250MB/sec peak sequential write speeds? AFAICS, they pretty much top out around that, even for nearline HDDs with 20TB and larger capacities.



Why do sequential read/write speeds not scale with capacity?
 
Last edited:

nexox

Well-Known Member
May 3, 2023
730
313
63
Are there SAS3 HDDs capable of going over 200MB/sec peak sequential write speeds?
It's been a long time since I ran a sequential benchmark with no other load, but my 6TB Seagate SAS drives would do better than 200MB/s; they're on an ancient SAS1 expander, and four disks could saturate the four-lane uplink at ~1100MB/s back when I first set it up. I assume newer disks aren't slower; increased density usually leads to increased sequential IO at the same platter RPM.

That said, I'm not the person to ask about fewer larger vs. more smaller drives, because my array of (now) 8x6TB drives is fine, and when I finally get done fiddling with my new fileserver and install those things on a SAS3 expander I'm going to get a huge performance improvement. (Not least because my current fileserver CPU can't even keep up with the SAS1 expander when doing RAID6 parity computations on writes.)

I do keep looking at bigger drives because they're just so cheap, but they're still not cheap enough to win on power savings by cutting four or five drives. In any case, I expect that when it's time to upgrade I can do something like bcachefs, so random smaller SSDs can handle the sequential write bursts and a few big spinners in a parity RAID can make up the bulk/cold storage for that Office Space DVD rip I downloaded over dialup but will realistically never watch, because I have it in like 3 higher resolutions.
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,122
1,538
113
They have been? The datasheet you linked even says 298MB/s. Speeds have been slowly increasing with platter density.

You're not going to see a linear increase outside of the maximum possible capacity because smaller drives generally wind up using fewer platters and are usually a couple generations behind in terms of density.
 
  • Like
Reactions: mach3.2 and nexox

Tech Junky

Active Member
Oct 26, 2023
393
129
43
If you want something faster, look at the Seagate Mach.2, which can hit SATA SSD speeds with its dual-actuator setup. Otherwise, look at the U.x drives for true SSD speeds over NVMe. My Kioxia CD8 benches at 6.5GB/s.
 

CyklonDX

Well-Known Member
Nov 8, 2022
872
293
63
Why do sequential read/write speeds not scale with capacity?
It's a good question. In theory the speed should increase, since the data is written in a smaller format and the platters still spin at the same speed.

Here's my theory on why:
The problem is the format in which the data is written and read. While the data occupies a smaller physical space on the platters, it also goes through a more compressed encoding process before it is written to or read from the platters. Reading or writing the data is therefore a bigger struggle that takes more time; that is likely why there is greatly increased cache on bigger disks, as the drive may need to read a much larger portion of the disk to make sense of the data it has. I can imagine that if they wrote in the older format it would be possible to get great performance gains, but it would also hit the capacity hard.

Different technologies are obviously slowly overtaking HDD storage, and prices are coming down too (though not as fast as 2-3 years ago before COVID, when prices halved every year and a new size arrived every 1.5 years; we've been stuck around $150 for a decent 2TB SSD for a while now).
 
  • Like
Reactions: Bert

nexox

Well-Known Member
May 3, 2023
730
313
63
If you want something faster, look at the Seagate Mach.2, which can hit SATA SSD speeds with its dual-actuator setup.
Just out of curiosity, have you used these drives? Almost everything I can find about them is just a regurgitation of the same press release. I entirely forgot they existed until you mentioned them, but Server Parts Deals has some of the 2x14TB SAS drives for a not-unreasonable price...
 

twin_savage

Member
Jan 26, 2018
69
37
18
33
Fun fact: on normal single-actuator drives only one head is ever reading or writing at a time, even though some flagship-level drives can have as many as 20 heads.


I entirely forgot they existed until you mentioned them, but Server Parts Deals has some of the 2x14TB SAS drives for a not-unreasonable price...
If you do go with Mach.2 drives I'd stay away from the SATA variety and use SAS instead; they have fundamentally different ways of exposing themselves to the OS, and the SATA Mach.2 drives have some annoying limitations because they use sector boundaries instead of LUNs for the two actuators.



Here's my theory on why:
The problem is the format in which the data is written and read. While the data occupies a smaller physical space on the platters, it also goes through a more compressed encoding process before it is written to or read from the platters. Reading or writing the data is therefore a bigger struggle that takes more time; that is likely why there is greatly increased cache on bigger disks, as the drive may need to read a much larger portion of the disk to make sense of the data it has. I can imagine that if they wrote in the older format it would be possible to get great performance gains, but it would also hit the capacity hard.
We've been using the exact same encoding scheme for HDDs for roughly the past 15 years, called Advanced Format. To my knowledge there isn't even another standard in the pipeline because of how good AF is. Sometimes AF drives are called 4K or 512e drives.
 
  • Like
Reactions: nexox

nexox

Well-Known Member
May 3, 2023
730
313
63
If you do go with Mach.2 drives I'd stay away from the SATA variety and use SAS instead; they have fundamentally different ways of exposing themselves to the OS, and the SATA Mach.2 drives have some annoying limitations because they use sector boundaries instead of LUNs for the two actuators.
I was wondering how SATA would handle that and assumed it would be suboptimal; glad the ones I just maybe ordered are SAS.
 

CyklonDX

Well-Known Member
Nov 8, 2022
872
293
63
We've been using the exact same encoding scheme for HDDs for roughly the past 15 years, called Advanced Format. To my knowledge there isn't even another standard in the pipeline because of how good AF is. Sometimes AF drives are called 4K or 512e drives.
That's not what I meant. I meant the analog data bits, not logical sectors and such (there are a lot of tricks for interpreting sequences of bits as much more complex data forms, like QAM). The different recording modes like PMR, SMR, HAMR, and CMR are also reflective of such methods.
 

Tech Junky

Active Member
Oct 26, 2023
393
129
43
@nexox

I was eyeing them, but they never went on sale before I changed my mind and went in a different direction with chips instead of spinners. When they did hit the market, the price was a bit extreme for a spinner, even considering the capacity and speed.
 
  • Like
Reactions: nexox

i386

Well-Known Member
Mar 18, 2016
4,267
1,562
113
34
Germany
Why do sequential read/write speeds not scale with capacity?
rotational speed (7200rpm / 120 rotations per second for most drives)
number of platters (inside the hdd casing)

higher rotational speed would increase heat and noise levels (15k rpm SAS hdds are screamers and produce a lot of heat)
more platters would add weight and require more powerful motors (for spin-up and operation) -> more heat
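Those two factors combine into a quick back-of-the-envelope estimate. The per-track capacity below is an assumed, illustrative number, not taken from any datasheet:

```python
# Back-of-the-envelope: peak sequential throughput of a single head.
# The track size is a made-up illustrative figure, not from a datasheet.

RPM = 7200
rotations_per_sec = RPM / 60          # 120 rotations per second

track_bytes = 2.5e6                   # assume ~2.5MB of user data on an outer track

# One head reads at most one track per rotation:
throughput_mb_s = track_bytes * rotations_per_sec / 1e6
print(f"{throughput_mb_s:.0f} MB/s")  # -> 300 MB/s
```

Which lands right in the ballpark of real nearline drives; spinning faster or packing more bytes per track are the only ways to raise it.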
 

Bert

Well-Known Member
Mar 31, 2018
866
412
63
45
It's a good question. In theory the speed should increase, since the data is written in a smaller format and the platters still spin at the same speed.

Here's my theory on why:
The problem is the format in which the data is written and read. While the data occupies a smaller physical space on the platters, it also goes through a more compressed encoding process before it is written to or read from the platters. Reading or writing the data is therefore a bigger struggle that takes more time; that is likely why there is greatly increased cache on bigger disks, as the drive may need to read a much larger portion of the disk to make sense of the data it has. I can imagine that if they wrote in the older format it would be possible to get great performance gains, but it would also hit the capacity hard.

Different technologies are obviously slowly overtaking HDD storage, and prices are coming down too (though not as fast as 2-3 years ago before COVID, when prices halved every year and a new size arrived every 1.5 years; we've been stuck around $150 for a decent 2TB SSD for a while now).
It seems like this is the only viable explanation; perhaps we need a hard drive expert to confirm.

With higher density, more data is being read, but it seems like the circuitry that processes that data is not scaling at the same rate, which limits the sequential I/O.

I am going to stick with smaller disks around 4TB and 6TB for my archival storage, hoping that I can get the maximum theoretical bandwidth with RAID 0.
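For what it's worth, the RAID 0 reasoning sketched out with assumed per-disk speeds (placeholder numbers, not benchmarks):

```python
# Ideal RAID 0 sequential bandwidth: reads/writes are striped across all
# members, so the ceiling is simply the sum of the per-disk speeds.
# Per-disk figures are assumptions for illustration, not measurements.

def raid0_bandwidth(disks: int, per_disk_mb_s: float) -> float:
    """Best-case striped sequential bandwidth in MB/s."""
    return disks * per_disk_mb_s

small_array = raid0_bandwidth(disks=6, per_disk_mb_s=180.0)  # six 4TB-class drives
big_single = raid0_bandwidth(disks=1, per_disk_mb_s=290.0)   # one 24TB-class drive

print(small_array, big_single)  # 1080.0 290.0
```

That's only the ideal ceiling; controller and filesystem overhead eat into it, and RAID 0 multiplies the odds of losing the whole array, so keep backups of anything archival.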
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,122
1,538
113
There's no theory; it's purely a function of platter density. Nothing to do with encoding, cache, etc. My first post sums it up pretty well.
 

nexox

Well-Known Member
May 3, 2023
730
313
63
One issue is that density is a combination of two things that don't necessarily increase at the same rate: the number of sectors per track and the number of tracks per platter. If a drive manufacturer manages to add more tracks with the same number of sectors, then the sequential IO per platter won't change (assuming the same RPM), because the bits will move past the head at the same rate.
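To put numbers on that (all geometry invented for illustration): doubling tracks per platter doubles capacity but leaves sequential speed alone, while doubling sectors per track doubles both.

```python
# Areal density = (sectors per track) x (tracks per platter), but only the
# first term moves bits past the head faster. Geometry here is made up.

SECTOR_BYTES = 4096
ROTATIONS_PER_SEC = 7200 / 60   # 7200 rpm drive

def seq_mb_s(sectors_per_track: int) -> float:
    """One head reads at most one track per rotation."""
    return sectors_per_track * SECTOR_BYTES * ROTATIONS_PER_SEC / 1e6

base = seq_mb_s(500)            # baseline drive
more_tracks = seq_mb_s(500)     # double the tracks: 2x capacity, same speed
denser_tracks = seq_mb_s(1000)  # double sectors per track: 2x capacity AND speed

print(base, more_tracks, denser_tracks)  # 245.76 245.76 491.52
```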
 
  • Like
Reactions: Bert

twin_savage

Member
Jan 26, 2018
69
37
18
33
That's not what I meant. I meant the analog data bits, not logical sectors and such (there are a lot of tricks for interpreting sequences of bits as much more complex data forms, like QAM). The different recording modes like PMR, SMR, HAMR, and CMR are also reflective of such methods.
AF specifies how the actual data bits on the platters are encoded, not just the logical sector composition (which explains why there are no Advanced Format SSDs).

PMR, SMR, HAMR, CMR, and even TDMR all encode data onto the platters of the drive in exactly the same way; if you were to take a GME and scan the entire surface of an hdd platter, they'd all look effectively the same across the different technologies, with the only variance being that SMR disks would have their tracks spaced more closely together and would have some non-fixed blank "buffer tracks".

There isn't any extra abstraction (if we discount some of the clever positional skewing and servo tracking tricks) happening when writing data out to the surface of the platter; there are schemes some manufacturers use when reading data, like sampling heuristics after a failed LDPC decode, or two-dimensional magnetic recording's ability to better follow tracks, but these are just tricks to resolve the existing data encoding better.

Multi-layer hard drives might change this in the far future, but I wouldn't expect that for at least a decade.
 

Bert

Well-Known Member
Mar 31, 2018
866
412
63
45
There's no theory, it's purely a function of platter density. Nothing to do with encoding, cache, etc. My first post sums it up pretty well.
I don't see how this explains it. A 1TB drive does 180MB/sec while a 24TB drive does 290MB/sec. If sequential I/O were purely a function of density, the 24TB drive should have hit about 4000MB/sec.

It seems like there is some other factor.
 

Bert

Well-Known Member
Mar 31, 2018
866
412
63
45
One issue is that density is a combination of two things that don't necessarily increase at the same rate - number of sectors per track and number of tracks per platter. If a drive manufacturer manages to add more tracks of the same number of sectors then the sequential IO per platter won't change (assuming the same RPM,) because the bits will move past the head at the same rate.
Ah, this is a good one. The density increase can come from having more tracks, not from having more bits in the same length of track. This explains it very well.

Is there any other limitation related to the platters? For example, I would expect a 10-platter drive to have 10x the throughput of a 1-platter drive, but that doesn't seem to be the case either. For example:


 

twin_savage

Member
Jan 26, 2018
69
37
18
33
I don't see how this explains it. A 1TB drive does 180MB/sec while a 24TB drive does 290MB/sec. If sequential I/O were purely a function of density, the 24TB drive should have hit about 4000MB/sec.
The bulk of the speed difference, or lack thereof considering the size difference, is the result of platter count in this hypothetical instance.
The HDD only reads from or writes to one side of one platter at a time, so only a single platter's density feeds into the speed of the drive.
There are some minority factors, like the servo's ability to quickly track without error, that also slow drives down (and explain why speed doesn't scale linearly with platter density).

I couldn't explain all the intricacies of the servo error affecting read/write speeds, but this article gives a decent idea of the issues involved:
 
  • Like
Reactions: Bert

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,122
1,538
113
I don't see how this explains it. A 1TB drive does 180MB/sec while a 24TB drive does 290MB/sec. If sequential I/O were purely a function of density, the 24TB drive should have hit about 4000MB/sec.

It seems like there is some other factor.
You're making the assumption that capacity is linearly related to platter density. It isn't. As I stated in my original post, lower-capacity drives will also have fewer platters. Many datasheets specify density, including the HC580 one you provided. Let's look at some examples:

24TB HC580: 1210Gbits/sq in and 298MB/s
10TB HC510: 816Gbits/sq in and 249MB/s

Approximately 48% higher density, but since we're not working in one dimension, take the square root and you're left with ~22%. That's pretty close to the difference between the two drives (~20%). Of course the math isn't perfect, because tracks and sectors are not square, nor are both axes shrinking at the same rate, but I think it fits.

You can do the math with 1TB drives and you'll see it adds up as well: https://documents.westerndigital.co...a200-series/data-sheet-ultrastar-dc-ha210.pdf
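Reproducing that square-root estimate as a quick sanity check, using the density and speed figures quoted above:

```python
import math

# If linear bit spacing shrinks about equally along-track and across-track,
# sequential speed should scale with the square root of areal density.
# Density (Gbit/sq in) and speed (MB/s) figures are from the datasheets above.

density_hc580, speed_hc580 = 1210, 298   # 24TB HC580
density_hc510, speed_hc510 = 816, 249    # 10TB HC510

predicted = math.sqrt(density_hc580 / density_hc510)
actual = speed_hc580 / speed_hc510

print(f"predicted {predicted:.2f}x vs actual {actual:.2f}x")  # predicted 1.22x vs actual 1.20x
```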
 
  • Like
Reactions: Bert and nexox

Bert

Well-Known Member
Mar 31, 2018
866
412
63
45
The bulk of the speed difference, or lack thereof considering the size difference, is the result of platter count in this hypothetical instance.
The HDD only reads from or writes to one side of one platter at a time, so only a single platter's density feeds into the speed of the drive.
There are some minority factors, like the servo's ability to quickly track without error, that also slow drives down (and explain why speed doesn't scale linearly with platter density).

I couldn't explain all the intricacies of the servo error affecting read/write speeds, but this article gives a decent idea of the issues involved:
This is very strange; I expected all platters to be used. Physically it should be possible, but the article says the tracks cannot be aligned, and I thought reads and writes were done along the lines of a cylinder.

I don't quite understand why they are not investing in using multiple platters at once.

Now it is clear where the discrepancy comes from. This also explains why they are putting in multiple actuators; I thought that was for random I/O. It looks like it will possibly double sequential write and read speeds.
 
Last edited:
  • Like
Reactions: nexox