announcing my "bulk hard drive testing" script for Linux on STH


BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
in various "Great Deal" threads, I've shared how i run badblocks to test my HDDs and during some of those great deals, I use my 24-bay Supermicro to test as many as I can in parallel. in the most recent of such discussions here:

https://forums.servethehome.com/ind...b-sata-for-85-after-coupon.21440/#post-199937

I decided to write a script to automate such testing. So, yesterday I threw together this script I call 'bht' and wanted to share it with other STHers. It can be found on GitHub:

ezonakiusagi/bht

I've done some rudimentary testing, but please use it with caution, as the badblocks test is data destructive (don't accidentally run it on your boot drive!). Instructions are provided in the README, but feel free to ask questions here.
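For context, all the script really does is kick off the destructive badblocks write-mode test on each drive you point it at; run by hand against a single drive, the underlying test looks roughly like this (the device name is a placeholder, and the exact options bht passes may differ):

Code:
# DESTRUCTIVE: writes the patterns 0xaa, 0x55, 0xff, 0x00 and reads each one back
# -w = write-mode test, -s = show progress, -v = verbose
# -b 4096 avoids the block-count limit that trips up badblocks on drives over ~4TB
badblocks -wsv -b 4096 /dev/sdX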

I welcome any feedback. It's just a first draft, really...

UPDATE1: fixed an issue when testing SAS drives and added some protections against accidental data destruction.

UPDATE2: made a video demonstrating this tool:

 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
For anyone who tried using my bht script: another user let me know that it had problems with SAS drives. My test rig was full of SATA drives at the time I put this together, and I falsely assumed the SMART output would be similar for both SAS and SATA drives. Anyway, I've fixed it up and tested with some SAS drives in my test rig. I also added some protection checks so that the user doesn't accidentally start running a test on a drive that is mounted, part of an LVM2 volume group, or part of a ZFS pool. I'm not sure how to check whether a drive is part of an mdraid software RAID set, though... any ideas?
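(If anyone wants to experiment, one check I might try, though it's not in the script yet, is to look for an md superblock on the drive and scan /proc/mdstat, roughly like this:)

Code:
# prints superblock info (and exits zero) only if the drive is an md member
mdadm --examine /dev/sdX
# rough check for the disk or one of its partitions in a currently assembled array
grep sdX /proc/mdstat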

Anyway, it's new and improved: SAS support + better protection against accidental data destruction.
 

tommybackeast

Active Member
Jun 10, 2018
286
105
43
in various "Great Deal" threads, I've shared how i run badblocks to test my HDDs and during some of those great deals, I use my 24-bay Supermicro to test as many as I can in parallel. in the most recent of such discussions here:

https://forums.servethehome.com/ind...b-sata-for-85-after-coupon.21440/#post-199937

I decided to write a script to automate such testing. So, yesterday I threw together this script I call 'bht' and wanted to share with other STHers. It can be found on github:

ezonakiusagi/bht

I've done some rudimentary testing, but please use it with some caution as the badblocks testing is data destructive (don't accidentally run the test on your boot drive!). Instructions are provided in the README, but feel free to ask questions here.

welcome any feedback. it's just a first draft really...

UPDATE1: fixed issue when testing SAS drives and added some protections against accidental data destruction.

UPDATE2: made a video demonstrating this tool:

Since you wrote this, you obviously have knowledge of the complex workings of HDDs ...

In the past, I always tested HDDs with HDSentinel on a Windows box and was very happy with that process.

I am a total Linux noob, so I'm curious how testing via Linux differs from testing via a Windows program like HDSentinel. Thanks!
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
Since you wrote this, you obviously have knowledge of the complex workings of HDDs ...
That would be a massive overstatement. My script just automates the tedious task of running the badblocks program on each drive. Neither the script nor badblocks is particularly sophisticated.
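Conceptually it is little more than a loop like the one below, plus the status reporting, SMART output, and safety checks; this is a simplified sketch, not the actual script:

Code:
# simplified idea only: launch a destructive badblocks test on each listed drive in parallel
for dev in /dev/sda /dev/sdb /dev/sdc; do
    badblocks -wsv -b 4096 -o "badblocks_${dev##*/}.log" "$dev" &
done
wait   # block until all background tests have finished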

In the past, I always tested HDDs with HDSentinel on a Windows box and was very happy with that process.

I am a total Linux noob, so I'm curious how testing via Linux differs from testing via a Windows program like HDSentinel. Thanks!
I'm the opposite: I haven't used Windows in a meaningful way for over a decade. I find it difficult to use now, and a pain (mostly due to my lack of knowledge). So I couldn't tell you the difference between that tool and what I do with badblocks in my script.

The video posted in the OP explains what badblocks does. If you can find an explanation of what HDSentinel does, you can compare the two and judge for yourself which is better or worse.
 

Octopuss

Active Member
Jun 30, 2019
459
73
28
Czech republic
I know basically nothing about Linux, but I do have a server running TrueNAS.
I found out about this script recently, and it seems it's exactly what I'm looking for to test five new disks.
But I have no idea how to run it.
I copied the text from GitHub into a new file called bht, but when I try to run it, this happens:

octopuss@Skladiste:~ $ bht
-sh: bht: not found

Can someone give me step-by-step idiot instructions on what to do, please?
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
I know basically nothing about Linux, but I do have a server running TrueNAS.
I found out about this script recently, and it seems it's exactly what I'm looking for to test five new disks.
But I have no idea how to run it.
I copied the text from GitHub into a new file called bht, but when I try to run it, this happens:

octopuss@Skladiste:~ $ bht
-sh: bht: not found

Can someone give me step-by-step idiot instructions on what to do, please?
I don't think my bht will run correctly on TrueNAS, as I wrote it mostly for a Linux environment. The error you show means that "bht" is not in a directory listed in your PATH environment variable, so your shell doesn't know where to find the program.

If bht is in your current working directory, you can run it like this instead:

# ./bht --help

If that works, then you can run it like that... but again, I don't think it's going to work on TrueNAS.
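For reference, on a regular Linux box the rough sequence would be something like this (assuming git is installed and that the script file in the repo is named bht):

Code:
git clone https://github.com/ezonakiusagi/bht.git
cd bht
chmod +x bht     # make sure the script is executable
./bht --help     # the "./" tells the shell to run it from the current directory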
 
  • Like
Reactions: T_Minus

Breezy2428

Member
Jul 30, 2023
45
17
8
I ran iostat -d near the beginning of bht: ~250 MB/s x 9 drives = ~2.2 GB/s in total?! It tapered down to about 170 MB/s towards the tail of the first write pass.

The drives are 14TB SAS, but the backplane, HBA, motherboard, etc. are a decade old. I am impressed by the writes, even if this is a best-case scenario.

badblocks[Testing with pattern 0xaa 6.81% done, 1:01:18 elapsed. (0/0/0 errors)]


Device          kB_wrtn/s
sda 256128.00
sdb 254976.00
sdd 235776.00
sde 244224.00
sdf 241152.00
sdg 257152.00
sdh 266816.00
sdi 229568.00
sdj 234176.00

It has now moved on to the read portion, but iostat shows a lot of writing still happening and not much reading? Is this normal?

badblocks[Reading and comparing 4.20% done, 19:19:21 elapsed. (0/0/0 errors)]



Device tps kB_read/s kB_wrtn/s
sda 1439.14 13626.84 170582.78
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
I ran iostat -d near the beginning of bht: ~250 MB/s x 9 drives = ~2.2 GB/s in total?! It tapered down to about 170 MB/s towards the tail of the first write pass.

The drives are 14TB SAS, but the backplane, HBA, motherboard, etc. are a decade old. I am impressed by the writes, even if this is a best-case scenario.

badblocks[Testing with pattern 0xaa 6.81% done, 1:01:18 elapsed. (0/0/0 errors)]


Device          kB_wrtn/s
sda 256128.00
sdb 254976.00
sdd 235776.00
sde 244224.00
sdf 241152.00
sdg 257152.00
sdh 266816.00
sdi 229568.00
sdj 234176.00

It has now moved on to the read portion, but iostat shows a lot of writing still happening and not much reading? Is this normal?

badblocks[Reading and comparing 4.20% done, 19:19:21 elapsed. (0/0/0 errors)]



Device tps kB_read/s kB_wrtn/s
sda 1439.14 13626.84 170582.78
Have all of the drives started the read/compare test? I find that some drives are slightly faster than others; there's always one that finishes first, and the others can be a few minutes behind, so they're still writing the 0xaa pattern.
 

Breezy2428

Member
Jul 30, 2023
45
17
8
I want to thank you for your channel; it has been an invaluable resource, along with STH, for someone like me starting out with server gear.

Yes, they all started the read/compare. I did not catch it right at the transition, so they were all running writes in one --status and all reading in the next. This was the last write status; there are minor differences between drives, but nothing worrying.



WUH721414AL4204_Y6GGRS0D:
badblocks[Testing with pattern 0xaa 76.80% done, 12:52:12 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV7TD:
badblocks[Testing with pattern 0xaa 76.71% done, 12:52:12 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGU1UD:
badblocks[Testing with pattern 0xaa 76.38% done, 12:52:12 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGULND:
badblocks[Testing with pattern 0xaa 76.64% done, 12:52:13 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV6ED:
badblocks[Testing with pattern 0xaa 76.29% done, 12:52:14 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGVJAD:
badblocks[Testing with pattern 0xaa 76.60% done, 12:52:14 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGSK0D:
badblocks[Testing with pattern 0xaa 76.53% done, 12:52:14 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:50]
WUH721414AL4204_Y6GGNZRD:
badblocks[Testing with pattern 0xaa 76.17% done, 12:52:15 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGUVUD:
badblocks[Testing with pattern 0xaa 76.68% done, 12:52:15 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]


And the current status:

WUH721414AL4204_Y6GGRS0D:
badblocks[Reading and comparing 13.20% done, 20:39:40 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV7TD:
badblocks[Reading and comparing 13.26% done, 20:39:41 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGU1UD:
badblocks[Reading and comparing 12.12% done, 20:39:42 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGULND:
badblocks[Reading and comparing 12.89% done, 20:39:42 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV6ED:
badblocks[Reading and comparing 12.32% done, 20:39:44 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGVJAD:
badblocks[Reading and comparing 12.94% done, 20:39:45 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGSK0D:
badblocks[Reading and comparing 12.67% done, 20:39:45 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:50]
WUH721414AL4204_Y6GGNZRD:
badblocks[Reading and comparing 11.91% done, 20:39:46 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGUVUD:
badblocks[Reading and comparing 13.17% done, 20:39:47 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,734
1,123
113
artofserver.com
WUH721414AL4204_Y6GGRS0D:
badblocks[Reading and comparing 13.20% done, 20:39:40 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV7TD:
badblocks[Reading and comparing 13.26% done, 20:39:41 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGU1UD:
badblocks[Reading and comparing 12.12% done, 20:39:42 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGULND:
badblocks[Reading and comparing 12.89% done, 20:39:42 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV6ED:
badblocks[Reading and comparing 12.32% done, 20:39:44 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGVJAD:
badblocks[Reading and comparing 12.94% done, 20:39:45 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGSK0D:
badblocks[Reading and comparing 12.67% done, 20:39:45 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:50]
WUH721414AL4204_Y6GGNZRD:
badblocks[Reading and comparing 11.91% done, 20:39:46 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGUVUD:
badblocks[Reading and comparing 13.17% done, 20:39:47 elapsed. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=4:18]
Glad to hear my channel has been helpful. Yes, STH is one of my favorite tech forums so I'm sure you'll find a lot of great stuff and people here. :)

So, iostat is still showing a lot of writes right now, even though all the drives above are doing the read/compare test?
 

Breezy2428

Member
Jul 30, 2023
45
17
8
Very helpful :cool:

Yes, according to bht --status we are currently in the read-and-compare phase of the first run.

Code:
~/Desktop/Test_Data$ sudo ~/Desktop/Test_Data/bht --status
[sudo] password for [user]:
WUH721414AL4204_Y6GGRS0D:
    badblocks[Reading and comparing  21.06% done, 21:51:04 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV7TD:
    badblocks[Reading and comparing  21.12% done, 21:51:05 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGU1UD:
    badblocks[Reading and comparing  19.94% done, 21:51:06 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGULND:
    badblocks[Reading and comparing  20.75% done, 21:51:07 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGV6ED:
    badblocks[Reading and comparing  20.17% done, 21:51:08 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGVJAD:
    badblocks[Reading and comparing  20.79% done, 21:51:09 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGSK0D:
    badblocks[Reading and comparing  20.52% done, 21:51:10 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:50]
WUH721414AL4204_Y6GGNZRD:
    badblocks[Reading and comparing  19.75% done, 21:51:10 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]
WUH721414AL4204_Y6GGUVUD:
    badblocks[Reading and comparing  21.01% done, 21:51:11 elapsed. (0/0/0 errors)]
    HDD Type:[SAS]
    SMART:[power_on_time(hours:minutes)=4:18]

But according to iostat we are still doing a lot more writing than reading, although as time moves on we appear to be doing less writing and more reading. Is there an undisclosed taper period? A multi-hour lag in iostat? (Ignore sdc; it's the OS SSD and where the logs are being written.)

Code:
$ iostat -d
Linux 6.1.0-11-amd64 (Heavy)     08/22/2023     _x86_64_    (48 CPU)

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda            1482.95     33292.56    156525.02         0.00 2908088244 13672382464          0
sdb            1481.39     33093.28    156525.02         0.00 2890681780 13672382464          0
sdc               1.59        14.55        29.19         0.00    1271353    2549908          0
sdd            1478.40     32710.44    156525.02         0.00 2857240648 13672382464          0
sde            1466.14     31140.81    156525.02         0.00 2720134600 13672382464          0
sdf            1478.85     32767.07    156525.02         0.00 2862187060 13672382464          0
sdg            1471.19     31786.91    156525.02         0.00 2776571060 13672382464          0
sdh            1482.21     33197.36    156525.02         0.00 2899772616 13672382464          0
sdi            1475.45     32333.07    156525.02         0.00 2824277172 13672382464          0
sdj            1468.48     31440.51    156525.02         0.00 2746312648 13672382464          0
 

Breezy2428

Member
Jul 30, 2023
45
17
8
Found an explanation for the iostat behavior: it is in fact a multi-hour lag after all; in this case a 141-hour average, about 6 days.


"when you call iostat, the first output is a reading that averages the stats for all devices since the first boot. This one won’t change visibly very often unless the system was JUST booted, and almost certainly isn’t what you want."

I tried using just the -y option he references; no change.

But using his whole command did in fact show no writing during the read-back check. We are near the end of the 0x00 pattern read-back (almost done). The wide output wrapped in my terminal, but it's there.

watch -n 1 iostat -xy --human 1 1

Code:
Every 9.0s: iostat -xy --human 1 1                                                                           Heavy: Sun Aug 27 23:30:14 2023

Linux 6.1.0-11-amd64 (Heavy)    08/27/2023      _x86_64_        (48 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.3%    0.0%    5.7%   13.5%    0.0%   78.5%

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda           1325.00    165.6M     0.00   0.0%   43.75   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   57.97  98.4%
sdb           1245.00    155.6M     0.00   0.0%   46.34   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   57.69  99.2%
sdc              0.00      0.0k     0.00   0.0%    0.00     0.0k    3.00     20.0k     2.00  40.0%    1.67     6.7k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00    0.01   0.4%
sdd           1434.00    179.2M     0.00   0.0%   39.54   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   56.70  99.2%
sde           1373.00    171.6M     0.00   0.0%   40.79   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   56.00  98.0%
sdf           1345.00    168.1M     0.00   0.0%   43.08   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   57.94  99.6%
sdg           1315.00    164.4M     0.00   0.0%   44.90   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   59.05  99.2%
sdh           1143.00    142.9M     0.00   0.0%   50.64   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   57.88  98.4%
sdi           1351.00    168.9M     0.00   0.0%   43.05   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   58.16  99.2%
sdj           1408.00    176.0M     0.00   0.0%   40.32   128.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00      0.0k     0.00   0.0%    0.00     0.0k    0.00    0.00   56.77  98.4%
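For a quicker check without the since-boot averages, giving iostat an interval and a count and suppressing the first report with -y also works (the 5-second interval is just an example):

Code:
# -d = device report, -y = skip the since-boot summary,
# then print two samples taken 5 seconds apart
iostat -dy 5 2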
 

Damo

Active Member
Sep 7, 2022
131
37
28
How long does it take to run on a 14TB DC HC530 SAS drive?

I presume I can cancel and see results live too, rather than waiting until the end.
 

Breezy2428

Member
Jul 30, 2023
45
17
8
Mine are writing really fast; I guess reading will take significantly longer. They did 1% in under 10 minutes (write).
There are 4 write passes and 4 read passes; the percentage you are seeing is for one of them.
10 min per 1% x 100 = 1,000 minutes per pass; x 8 passes = 8,000 minutes; / 60 = ~133 hours; / 24 = ~5.5 days.


The outer bands of the disks are faster; it slows a bit as you head toward the inner cylinders. I saw peaks of 250 MB/s and lows in the 160 MB/s range near the end of a pass.
 
  • Like
Reactions: Damo

Damo

Active Member
Sep 7, 2022
131
37
28
There are 4 write passes and 4 read passes; the percentage you are seeing is for one of them.
10 min per 1% x 100 = 1,000 minutes per pass; x 8 passes = 8,000 minutes; / 60 = ~133 hours; / 24 = ~5.5 days.


The outer bands of the disks are faster; it slows a bit as you head toward the inner cylinders. I saw peaks of 250 MB/s and lows in the 160 MB/s range near the end of a pass.
I don't think I want to run it for 6 days. Is this test like Memtester, where the first pass is generally good enough?
 

Breezy2428

Member
Jul 30, 2023
45
17
8
I don't think I want to run it for 6 days. Is this test like Memtester, where the first pass is generally good enough?

If there is an existing hard fault in a particular sector, the first pass should find it; but if there is a problem that will only show up as the drive wears in, possibly not.
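(If a single write+read pass is enough for you, plain badblocks can be limited to one pattern with -t; this bypasses bht, and the device name below is only a placeholder:)

Code:
# one pattern only: write 0xaa across the whole disk, then read it back once
badblocks -wsv -b 4096 -t 0xaa /dev/sdX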

I was after strong confidence in the drives and the old surplus server hardware before I loaded my data onto this setup (which also takes days), so I was patient and waited. That end goal was successful: after a torture test, I have no paranoia that the whole thing could die at any moment. In reality it still could, but in my mind at least I have faith in it, and that is worth something.

This is a replacement for a working but much smaller NAS setup, so I had the luxury of waiting.
 