64 MB vs 128 MB buffer? Advanced Format, Any benefits?


Boddy

Active Member
Oct 25, 2014
Hi all,

I'm looking to purchase some 6 or 8 TB HGST helium drives for backing up a database of about 20 to 30 TB.
(Probably a RAID 6 or RAIDZ2 setup.)

Older HGST drives have a 64 MB buffer, whereas the newer drives have 128 MB.
There is about a $60, or 25%, price difference between the older and newer drives.
For a backup scenario, is there any benefit to the 128 MB buffer that justifies the price increase?

The newer drives have 'Advanced Format'. "AF" is said to have enhanced ECC capabilities.
Similarly, in a RAID setup with a database of 20-30 TB, is there sufficient benefit to justify a $60 or 25% price difference?

In a RAID setup, is it better to have 8 x 6 TB drives or 6 x 8 TB drives?
Many thanks in advance. Cheers :)
 

Boddy

Hi @j_h_o, thanks for your reply.

It would be nice to minimise power, but the availability/reliability of the main database is my main concern.
If the server were onsite, I would power it on only when I needed to do a backup, but it is highly likely that the server will be offsite, as I don't have fibre or sufficient bandwidth to host the site.
I have several RAID controllers I could use, likely to be Intel RS25DB080.
Alternatively, I have an RS25SB008 and an RMS25KB080 available.

What do you suggest?
For backup purposes, does the extra 64 MB of buffer or an 'Advanced Format' hard drive make enough of a noticeable difference to justify spending an extra $60 per drive?
 

j_h_o

Active Member
Apr 21, 2015
California, US
For backup, I'd be worried about:
  • what kind of network connection you have between the production workload and the backup set, which will dominate the performance more than the drive buffer
  • what happens when one of the drives fails in the backup array (are you onsite to swap a drive?)
  • what kind of SLA you have on your dataset/how quickly you need to be able to recover; are you live replicating the DB elsewhere or is this your primary backup? Or is this a lagged/offsite copy that will be infrequently used?
  • idle power usage of your drives
If you have multiple 10Gbps links between the servers, then the number of spindles and/or buffer might actually start playing into things.

What OS are you running though? Is it in a VM/virtualization? For ECC/bitrot, using ZFS or ReFS (if using hardware RAID) might be good options. I'd personally use ZFS here, but it depends what your DB is running on, and if you're using NFS, SMB, etc. between your servers. Is that what you're doing, by the way? Transferring data across the network to another server in the same room?

I don't think the buffer or AF plays into this at all. I think the question really is size of drive/number of spindles in your case. And this is generally a question/balance of power, performance, and recoverability when a drive fails. And possibly the number of drive bays in your enclosure. If you're using hardware RAID 5 I wouldn't go more than 6 drives, for example. If you're in an enclosure, I try not to fill all the bays, and give myself some headroom for new/replacement drives when this set needs to be decommissioned/needs drives replaced.

If you're going hardware RAID, then I'd try not to exceed 6 drives per set. Buy whatever size drives needed.
If you're doing ZFS RAIDZ2, then 6 drives is where you start. But then I question why you're doing RAIDZ2 for a backup.
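To put rough numbers on the 8 x 6 TB vs 6 x 8 TB question, here's a back-of-the-envelope sketch (plain Python; assumes RAIDZ2, which gives up roughly two drives' worth of space to parity, and ignores ZFS metadata/padding overhead, so the figures are approximate):

```python
def raidz2_usable_tb(drives: int, size_tb: float) -> float:
    # RAIDZ2 dedicates roughly two drives' worth of space to parity;
    # real usable space is a bit lower due to ZFS metadata/padding.
    return (drives - 2) * size_tb

print(raidz2_usable_tb(8, 6.0))  # 8 x 6 TB -> 36.0 TB usable
print(raidz2_usable_tb(6, 8.0))  # 6 x 8 TB -> 32.0 TB usable
```

More, smaller drives buy you more usable space and spindles for the same raw capacity, at the cost of more bays and more idle power.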

Personally, I'd do more copies of your data (i.e. 2 or 3 separate snapshots of the data, taken at different times, etc.) rather than building in more redundancy for your single backup snapshot.
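The "more copies, taken at different times" approach is essentially a snapshot retention policy. A minimal sketch of the pruning logic (the dataset/snapshot names are made up for illustration; in practice `zfs destroy` would be run on each returned name):

```python
def snapshots_to_prune(snapshots: list[str], keep: int = 3) -> list[str]:
    # Given snapshot names ordered oldest -> newest, return the ones
    # to destroy so that only the `keep` most recent remain.
    return snapshots[:-keep] if len(snapshots) > keep else []

snaps = ["tank/db@2016-01-01", "tank/db@2016-01-08",
         "tank/db@2016-01-15", "tank/db@2016-01-22"]
print(snapshots_to_prune(snaps))  # -> ['tank/db@2016-01-01']
```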

How often do you think you'll need this backup? What do you do most frequently with the backup? Rollback transaction logs?

If you want more guidance, then you should reframe your questions/provide more information:
  • What OS are you running on the production workload? Is it in a VM that you can replicate/snapshot?
  • What enclosure would these drives sit in?
  • What network connection will be running between the servers?
  • In case of failure, how quickly do you need to recover the data? How quickly do you need backups to complete? How often will you be backing up?
  • What's your ideal recovery timeline/timeframe/scenario? How often do you expect to do that?
  • Is this your primary/only onsite copy of the database? Or how many copies will there be? Do you need multiple in-time snapshots/lagged copies?
  • Would this be running on the same server as production (ugh) or a separate box?
  • How much growth of data are you anticipating and how much headroom do you need on your backup array?
 

Boddy

Thanks so much @j_h_o for your extensive and detailed information. Very much appreciated! Great information on backups :)
All good points, and I certainly must give more consideration to the tiers closer to 'production'.

I'm planning on SSDs (or a combination of HDDs and an SSD cache) for the live site, some 400 GB 10K SAS drives for my first-tier backup, and these 6 TB helium drives for 2nd- or 3rd-tier backups. I'm even considering using a 200 GB Microsoft cloud account (as you get with some external HDDs) as a remote backup option.

I thought with RAIDZ I'd have some redundancy in my backups. Perhaps this is unnecessary. If you have any suggestions, I'd welcome your feedback. Regards :)
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
New York City
www.glaver.org
I thought with RAIDz I'd have some redundancy in my backups. Perhaps this is unnecessary. If you had any suggestions I'd welcome your feedback.
I'm using HUH728080AL4200 drives in my RAIDzilla 2.5 systems. That's the SAS version with 4Kn sectors (the host controller does transfers in 4KB chunks). That is probably not in the price range you're looking at - the SATA 512e drives are less than half the price, or even less if you shuck the drives out of external cases.

As far as backup, I'm doing a combination of replication and tape backup. Details here (including shell scripts and other examples).
 

Boddy

Hey @Terry Kennedy, thanks for that! I had a quick read of your article. I like it. I was considering the SAS version of the HGST 8 TB as they are about the same price per TB as other SATA HGST drives. I've seen the 8 TB HUH728080AL5204 going for US$424. Would you recommend this drive? As a secondary backup, do you think SAS is important? I'll have to check the RAID cards mentioned above to see if they transfer in 4Kn sectors. Thanks for the feedback, as it helps me clarify which drives I will get. Cheers
 

Boddy

I understand the HGST HUH728080AL5204 is a 512-emulating drive, but according to HGST literature all the newer HGST drives with 'Advanced Format' natively write in 4K block sizes. https://www.hgst.com/sites/default/files/resources/AFtechbrief.pdf

The He8 HUH728080AL4200 is native 4K and is about 50% dearer, at $660.

Is it necessary to purchase the He8 HUH728080AL4200 when the HUH728080AL5204 above appears able to write in 4K, or have I misunderstood something?
 

Terry Kennedy

I was considering the SAS version of the HGST 8 TB as they are about the same price per TB as other SATA HGST drives. I've seen the 8 TB HUH728080AL5204 going for US$424. Would you recommend this drive?
I don't have a large enough population of HGST He8 drives (drives * years) to have a good opinion yet. However, none of them have failed so far. They seem to be well-respected drives from what I've read (and who I know is using them in OEM systems).
As a secondary back up, do you think SAS is important?
The SAS versions of many drives seem to be more reliable than the SATA versions. Again, I have no experience with the He8 SATA drives. If you're going to put drives behind a SAS expander, you get higher transfer rates with SAS and don't have to deal with SAS/SATA translation in the expander (which can be problematic).
I'll have to check the RAID cards mentioned above to see if they transfer in 4Kn sectors.
That is often hard information to discover on older cards. The 9201-16i card I'm using (with 20.00.07.00 firmware) definitely supports them. You'll likely have better luck with host adapters (IT) than with RAID cards (IR).
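One quick way to check what a given drive actually reports is to read the sector sizes from Linux sysfs (a sketch; the sysfs layout is Linux-specific and the device name is hypothetical - `smartctl -i` shows the same information):

```python
from pathlib import Path

def sector_sizes(dev: str) -> tuple:
    # Read the logical/physical sector sizes a drive reports to the
    # host from Linux sysfs, e.g. sector_sizes("sda").
    q = Path("/sys/block") / dev / "queue"
    return (int((q / "logical_block_size").read_text()),
            int((q / "physical_block_size").read_text()))

def drive_format(logical: int, physical: int) -> str:
    # Classify a drive by what it presents at the connector.
    if logical == 4096:
        return "4Kn"
    if logical == 512 and physical == 4096:
        return "512e"
    return "512n"

print(drive_format(512, 4096))   # -> 512e
print(drive_format(4096, 4096))  # -> 4Kn
```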
He8 HUH728080AL4200 is native 4K and is about 50% dearer at $660.
I can tell you I don't pay anything near that price (for new, 5-year-warranty drives).
Is it necessary to purchase a He8 HUH728080AL4200 when the above HUH728080AL5204 appears to be able to write in 4K or have I misunderstood something?
Almost all drives these days are 4K on the "inside". The difference is how they act at the connector. A 512e drive internally blocks / deblocks to 4K. This can cause performance issues if accesses are not aligned (causing 8 read-modify-write cycles per 4K written). Most modern operating systems will automatically detect 512e drives and have at least some optimization for them (most SSDs are 512e).

4Kn drives also have 4K sectors at the connector. The controller and operating system both need to be able to deal with a sector that's 8x bigger than traditional sectors. On the controller, this cuts down on per-sector overhead, and the operating system may also perform better with larger chunks of data.
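To make the alignment point concrete, here's a small sketch of how a 4 KiB write lands on a 512e drive's physical sectors: an aligned write covers exactly one physical sector, while a write shifted by one 512-byte logical sector straddles two, each of which then needs a read-modify-write:

```python
PHYS = 4096  # physical sector size on a 512e drive
LOG = 512    # logical sector size presented at the connector

def physical_sectors_touched(start_lba: int, num_lbas: int) -> int:
    # Count how many 4K physical sectors a write of `num_lbas`
    # 512-byte logical sectors spans, starting at logical `start_lba`.
    first = (start_lba * LOG) // PHYS
    last = (start_lba * LOG + num_lbas * LOG - 1) // PHYS
    return last - first + 1

print(physical_sectors_touched(0, 8))  # aligned 4 KiB write   -> 1
print(physical_sectors_touched(1, 8))  # unaligned by 512 B    -> 2
```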