SAS Disk SMART

tcpluess · Apr 10, 2024

Good day,
So finally I got my SAS controller and cables, and I can now test a couple of disks I bought.
I bought a few used SAS SSDs cheaply, for example HGST S842 and some HUSSL4040. Very nice SSDs. I also have some HGST Ultrastar 14 TB hard disks, which will be handy when I build my media zpool. I got those used as well, for very little money.

So I connected the drives and checked the SMART status. I had already learned that SAS SMART is completely different from SATA SMART, i.e. these SAS disks report much less information than a SATA disk does.

The SSDs I have report wear between 0% and 2%, which I consider good. One SSD has 30% wear.
One of the hard disks has 5 "Elements in grown defect list", and all disks have nonzero numbers for "ECC recovered errors". However, since some petabytes have already been read from those disks, it seems unlikely that the ECC recovered errors would be zero, so I imagine this is normal. The "Uncorrectable Errors" count is zero for all disks, which is good, I think.

So I ask the question the other way round:

If the ECC recovered error count is nonzero, should I worry, or am I good to go with these SAS SSDs and disks?
I have no idea whether this very sparse SMART info is normal or whether I just happen to have a "bad" controller (I got a Supermicro 3008-8i SAS controller card, 12 Gb/s). I am a bit worried about the 5 entries in the grown defect list; on the other hand, Proxmox reports the disks as "healthy". And I will use ZFS anyway.

I am also not too worried about the SSD with 30% wear, as these SSDs can withstand multiple petabytes of writes. The workload in my server will not even come close to that, so I can probably still use them for the ZFS special device in a mirror config, 3-way mirror or similar.

nabsltd · Apr 10, 2024

For spinning rust, the most important SMART value is "reallocated sector count". The SAS equivalent is the "grown defect list". These are sectors that had enough of an issue that the drive "retired" them by remapping the data to a spare sector. To me, any non-zero value is worrisome, but another way to consider it is relative to the number of spare sectors the drive has available and to the total sector reads and writes on the drive. (The SAS "primary defect list", by contrast, holds the defects that were mapped out at the factory.)

SAS shows ECC recovered counts in its version of SMART, and (as you guessed), as long as there are no uncorrectable errors, this is considered normal operation and has no bearing on the health of the drive. The only thing to watch out for is an ECC recovered count that is high compared to the total reads and writes. More than about 10% would be something to worry about.

Like ECC on spinning disks, reallocated sectors on SSDs are considered a completely normal part of operation, so as long as the wear value is low (like yours), the disks are fine.
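
As a rough illustration of the rules of thumb in this post, here is a minimal Python sketch. The function name `assess_sas_drive`, its parameters, and the warning texts are made up for this example, and the 10% limit is just the ballpark figure mentioned above, not a vendor specification. You would have to copy the counters by hand from your own SMART output.

```python
def assess_sas_drive(ecc_recovered, uncorrected, grown_defects,
                     total_commands, ecc_ratio_limit=0.10):
    """Return a list of warnings based on the rules of thumb above.

    All names here are hypothetical; the numbers come from whatever
    tool you use to read the drive's counters.
    """
    warnings = []
    # Any uncorrectable error is a real problem.
    if uncorrected > 0:
        warnings.append(f"{uncorrected} uncorrectable errors")
    # A non-zero grown defect list means sectors have been retired.
    if grown_defects > 0:
        warnings.append(f"{grown_defects} grown defects")
    # ECC recoveries are normal unless they are a large share of all I/O.
    if total_commands > 0 and ecc_recovered / total_commands > ecc_ratio_limit:
        warnings.append("ECC recovered count above limit")
    return warnings


# Example with made-up error counts and a command total in the tens of millions.
print(assess_sas_drive(ecc_recovered=1000, uncorrected=0,
                       grown_defects=5, total_commands=31_347_873))
```

An empty list means none of the three checks fired. It does not replace watching how the numbers change over time.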

TRACKER · Apr 10, 2024

SAS disks actually have much more statistical data if you use the right command.

Try the sg_logs command (Linux, FreeBSD).
E.g. sg_logs -a /dev/da0

nabsltd · Apr 10, 2024

TRACKER said:
SAS disks actually have much more statistical data if you use the right command

As far as I know, smartctl and HDSentinel both get the same data as sg_logs. HDSentinel, for example, lists the results of every self-test ever performed on the drive. You might have to use the -x parameter to smartctl to get everything.
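
If you want to compare the two tools side by side on your own drive, something like the following Python sketch could collect both reports. The helper names `report_commands` and `collect_reports` are invented for this example; the only command lines used are `smartctl -x <device>` and `sg_logs -a <device>` as mentioned in this thread. The device path is whatever your system uses, and both tools normally need root.

```python
import shutil
import subprocess


def report_commands(device):
    """Build the two command lines discussed in this thread."""
    return {
        "smartctl": ["smartctl", "-x", device],
        "sg_logs": ["sg_logs", "-a", device],
    }


def collect_reports(device):
    """Run each tool that is installed and return its text output."""
    reports = {}
    for tool, argv in report_commands(device).items():
        if shutil.which(tool) is None:
            reports[tool] = ""  # tool not installed on this machine
            continue
        # check=False: smartctl uses non-zero exit codes as status bits.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        reports[tool] = result.stdout
    return reports


if __name__ == "__main__":
    for tool, text in collect_reports("/dev/da0").items():
        print(f"{tool}: {len(text.splitlines())} lines of output")
```

Saving both outputs to files and diffing them by eye is probably the quickest way to see which counters appear in only one of the reports.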

TRACKER · Apr 10, 2024

nabsltd · Apr 10, 2024

TRACKER said:
ok

I'm not sure what you are using the output to demonstrate. You can see that, overall, the data is the same in the two reports, but sg_logs is more verbose with descriptions while smartctl uses a table with abbreviations.

These days, smartctl is just a wrapper around a bunch of different methods (based on drive type), and a formatter that makes the output look more or less the same despite the vast differences in the underlying drives. The fact that smartctl now incorporates the same queries that storcli (a closed-source LSI program) uses to see drives hidden behind LSI hardware RAID shows just how generic it now is.

TRACKER · Apr 10, 2024

It is clearly visible that the sg_logs command gives more details compared to smartctl, even with the -x option.
Everyone can draw their own conclusions.
For me, smartctl is for SATA drives and sg_logs is for SAS.

tcpluess · Apr 10, 2024

Thanks, I did not know the sg_logs command! Is it from the sg3-utils package?

TRACKER · Apr 10, 2024

Hi tcpluess,

Yes, correct: the sg3_utils package.

tcpluess · Apr 10, 2024

Fantastic. I will try it. Thanks!

nabsltd · Apr 11, 2024

TRACKER said:
it is clearly visible sg_logs command gives more details compared to smartctl, even with -x option.

Here's all the data in sg_logs output that isn't in smartctl. There's not a lot here that helps with determining the health of the drive.

Code:

Supported log pages  [0x0]:
    0x00        Supported log pages [sp]
    0x02        Write error [we]
    0x03        Read error [re]
    0x05        Verify error [ve]
    0x06        Non medium [nm]
    0x08        Format status [fs]
    0x0d        Temperature [temp]
    0x0e        Start-stop cycle counter [sscc]
    0x0f        Application client [ac]
    0x10        Self test results [str]
    0x15        Background scan results [bsr]
    0x18        Protocol specific port [psp]
    0x19        General Statistics and Performance [gsp]
    0x1a        Power condition transitions [pct]
    0x2f        Informational exceptions [ie]
    0x30        Performance counters (Hitachi) [pc_hi]
    0x37        Cache (seagate) [c_se]

Start-stop cycle counter page  [0xe]
  Accounting date, year: 2018, week: 46

Application client page  [0xf]
 00     0f 00 40 00 00 00 03 fc  00 00 00 00 00 00 00 00
 10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 .....  [truncated after 64 of 16388 bytes (use '-H' to see the rest)]

General Statistics and Performance  [0x19]
Statistics and performance log parameter
  number of read commands = 9503430
  number of write commands = 21844443
  number of logical blocks received = 4556390602
  number of logical blocks transmitted = 1978348076
  read command processing intervals = 0
  write command processing intervals = 0
  weighted number of read commands plus write commands = 0
  weighted read command processing plus write command processing = 0
Idle time log parameter
  idle time intervals = 69983044
Time interval log parameter for general stats
  time interval negative exponent = 2
  time interval integer = 5

Power condition transitions page  [0x1a]
  Accumulated transitions to active = 0
  Accumulated transitions to idle_a = 0
  Accumulated transitions to idle_b = 0
  Accumulated transitions to idle_c = 0
  Accumulated transitions to standby_z = 0
  Accumulated transitions to standby_y = 0

Informational Exceptions page  [0x2f]
  IE asc = 0x0, ascq = 0x0
  parameter code = 0x1, contents in hex:
 00     00 01 03 03 64 64 19

HGST/WDC performance counters page [0x30]
  Zero Seeks = 25582
  Seeks >= 2/3 = 1243
  Seeks >= 1/3 and < 2/3 = 2800
  Seeks >= 1/6 and < 1/3 = 1468
  Seeks >= 1/12 and < 1/6 = 1212
  Seeks > 0 and < 1/12 = 6528
  Overrun Counter = 0
  Underrun Counter = 103
  Device Cache Full Read Hits = 551
  Device Cache Partial Read Hits = 2957
  Device Cache Write Hits = 2098
  Device Cache Fast Writes = 555447
  Device Cache Read Misses = 18781

HGST/WDC miscellaneous page [0x37, 0x0]
  GList Size = 0
  Number of Information Exceptions = 0
  MED EXC = 0
  HDW EXC = 0
  Total Read Commands = 9503430
  Total Write Commands = 21844443
  Flash Correction Count = 0
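
If you want to track these counters over time, a small parser may be handy. The following Python sketch assumes the output layout shown above (a page title followed by a bracketed page code, then indented "name = value" lines); other sg_logs versions or drives may format things differently, and `parse_sg_logs` is a name made up for this example. Lines with non-decimal values, such as the hex dumps, are simply skipped.

```python
import re

# Page header, e.g. "General Statistics and Performance  [0x19]"
PAGE_HEADER = re.compile(
    r"^(?P<title>\S.*?)\s+\[0x[0-9a-fA-F]+(?:,\s*0x[0-9a-fA-F]+)?\]:?$"
)
# Indented counter line, e.g. "  number of read commands = 9503430"
COUNTER = re.compile(r"^\s+(?P<name>.+?) = (?P<value>\d+)$")


def parse_sg_logs(text):
    """Group decimal 'name = value' counters by the page they appear under."""
    pages = {}
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        header = PAGE_HEADER.match(line)
        if header:
            current = pages.setdefault(header["title"], {})
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            current[counter["name"]] = int(counter["value"])
    return pages


SAMPLE = """\
General Statistics and Performance  [0x19]
Statistics and performance log parameter
  number of read commands = 9503430
  number of write commands = 21844443

HGST/WDC miscellaneous page [0x37, 0x0]
  GList Size = 0
  Flash Correction Count = 0
"""

pages = parse_sg_logs(SAMPLE)
stats = pages["General Statistics and Performance"]
print(stats["number of read commands"] + stats["number of write commands"])
```

The sum of read and write commands is the kind of total you could use as the denominator when judging whether an error counter is large or small.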
