Help me interpret the SMART data of these SAS disks


tcpluess

Member
Jan 22, 2024
Good day,
I purchased two used SAS drives off eBay. I also got a third drive from someone who had accidentally ordered the wrong drive in a clearance sale and could not return it, so this particular drive is even brand new. I have now installed the drives in my server and see the numbers below. I am curious what others make of them: are the drives good, or did I buy rubbish?

First, the two eBay drives. Drive #1:

Code:
=== START OF INFORMATION SECTION ===
Vendor:               TOSHIBA
Product:              MG07SCA14TE
Revision:             0102
Compliance:           SPC-4
User Capacity:        14,000,519,643,136 bytes [14.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Oct 29 13:53:54 2024 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 39068:34
Manufactured in week 16 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  24
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  24
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0    8204573.602           0
write:         0       50         0         0          0      44336.119           0
verify:        0      383         0         0          0    3192191.502           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   38983                 - [-   -    -]
# 2  Background short  Completed                   -   38815                 - [-   -    -]
# 3  Background long   Completed                   -   38792                 - [-   -    -]
# 4  Background short  Completed                   -   38650                 - [-   -    -]
# 5  Background long   Completed                   -   38622                 - [-   -    -]
# 6  Background short  Completed                   -   38592                 - [-   -    -]

Long (extended) Self-test duration: 88200 seconds [24.5 hours]
And eBay drive #2:

Code:
=== START OF INFORMATION SECTION ===
Vendor:               TOSHIBA
Product:              MG07SCA14TE
Revision:             0102
Compliance:           SPC-4
User Capacity:        14,000,519,643,136 bytes [14.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Oct 29 13:54:29 2024 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 38963:44
Manufactured in week 09 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  23
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  23
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        1         0         0          0    8230010.946           0
write:         0        1         0         0          0      47405.032           0
verify:        0        8         0         0          0    3206171.364           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   38878                 - [-   -    -]
# 2  Background short  Completed                   -   38710                 - [-   -    -]
# 3  Background long   Completed                   -   38688                 - [-   -    -]
# 4  Background short  Completed                   -   38658                 - [-   -    -]

Long (extended) Self-test duration: 88200 seconds [24.5 hours]
Both eBay drives have read a really large amount of data, in the petabyte range (about 8.2 PB each), yet show surprisingly few ECC corrections. The number of power cycles is also very low. Given how much data has been read from these drives, I would say the error counters look good, wouldn't you?
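To put the low number of power cycles and the amount of data read into perspective, a few lifetime averages can be derived from the numbers above. This is a minimal Python sketch; the helper names (`parse_hhmm`, `lifetime_summary`) are made up for illustration, and the results are plain averages, not vendor-defined metrics.

```python
# Rough lifetime averages for the two used MG07SCA14TE drives.
# All inputs are copied from the smartctl output above.

HOURS_PER_YEAR = 365.25 * 24  # average year, including leap days


def parse_hhmm(value: str) -> float:
    """Convert smartctl's 'hours:minutes' power-on string to fractional hours."""
    hours, minutes = value.split(":")
    return int(hours) + int(minutes) / 60


def lifetime_summary(power_on: str, start_stop_cycles: int, read_gb: float) -> dict:
    """Simple averages over the drive's lifetime (hypothetical helper)."""
    hours = parse_hhmm(power_on)
    return {
        "power_on_years": hours / HOURS_PER_YEAR,
        "hours_per_start_stop": hours / start_stop_cycles,
        # smartctl's 'Gigabytes processed' is in units of 10^9 bytes
        "avg_read_mb_per_s": read_gb * 1e9 / (hours * 3600) / 1e6,
    }


drives = {
    "ebay #1": ("39068:34", 24, 8204573.602),
    "ebay #2": ("38963:44", 23, 8230010.946),
}

for name, args in drives.items():
    s = lifetime_summary(*args)
    print(f"{name}: {s['power_on_years']:.1f} years powered on, "
          f"{s['hours_per_start_stop']:.0f} h per start-stop cycle, "
          f"~{s['avg_read_mb_per_s']:.0f} MB/s average read rate")
```

Roughly 1,600 to 1,700 power-on hours per start-stop cycle suggests the drives ran more or less continuously in their previous life.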

Now the third drive, which I got from another auction platform:

Code:
=== START OF INFORMATION SECTION ===
Vendor:               TOSHIBA
Product:              MG08SCA16TE
Revision:             0105
Compliance:           SPC-4
User Capacity:        16,000,900,661,248 bytes [16.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Oct 29 13:56:25 2024 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     39 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 19:14
Manufactured in week 09 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  1
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        3         0         0          0       1492.807           0
write:         0        0         0         0          0      12660.666           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -      17                 - [-   -    -]

Long (extended) Self-test duration: 91860 seconds [25.5 hours]
I am quite sure it really is new: the drive was delivered in an original Toshiba box with all seals intact, since the person who bought it got it in a clearance sale and therefore could not return it. When I plugged the drive into my server yesterday, it showed zero minutes of power-on time and zero for both the read and write data counters, so I would say it is a truly virgin drive.

What puzzles me a bit, however, is that the resilver of my ZFS pool alone had the drive read 1.5 TB, and its corrected-error counter has already gone up to 3. There are still no uncorrected errors so far, but this looks a bit surprising compared to the other two drives, doesn't it?
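One way to make this comparison concrete is to normalize the delayed-ECC read corrections by the amount of data read. The following Python sketch (hypothetical helper `per_petabyte`) uses the `read:` rows above. With only 0 to 3 events per drive, such a ratio is extremely noisy and should not be taken as a reliability figure.

```python
# Delayed-ECC read corrections per petabyte read, from the 'read:' rows above.


def per_petabyte(events: int, gigabytes: float) -> float:
    """Events per 10^15 bytes, given smartctl's 10^9-byte 'Gigabytes processed'."""
    petabytes = gigabytes / 1e6
    return events / petabytes


read_rows = {
    # name: (ECC corrected 'delayed', gigabytes processed)
    "MG07 ebay #1": (0, 8204573.602),
    "MG07 ebay #2": (1, 8230010.946),
    "MG08 new": (3, 1492.807),
}

for name, (delayed, gb) in read_rows.items():
    print(f"{name}: {delayed} delayed corrections in {gb / 1e3:,.1f} TB read "
          f"-> {per_petabyte(delayed, gb):,.2f} per PB")
```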

Would you say something about these drives is fishy, or is this pure coincidence and the drives are totally fine?

I am also curious how it is possible that 3 errors are shown as corrected by ECC while the total number of errors corrected is 0.
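As the matching sg_logs output later in the thread suggests, the smartctl columns appear to be separate counters reported by the drive (for example "Errors corrected with possible delays" and "Total errors corrected"), not values smartctl derives from one another. A small parser that keeps them apart could look like this; the field names are illustrative labels for the smartctl header columns, and the sketch assumes the seven-value row layout shown above.

```python
# Split a smartctl SAS 'Error counter log' row into named counters.

COLUMNS = (
    "ecc_fast",               # Errors corrected by ECC, fast
    "ecc_delayed",            # Errors corrected by ECC, delayed
    "rereads_rewrites",       # Errors corrected by rereads/rewrites
    "total_corrected",        # Total errors corrected
    "algorithm_invocations",  # Correction algorithm invocations
    "gigabytes_processed",    # Gigabytes processed [10^9 bytes]
    "total_uncorrected",      # Total uncorrected errors
)


def parse_counter_row(line: str) -> tuple[str, dict]:
    """Return the operation ('read', 'write', 'verify') and its named counters."""
    op, rest = line.split(":", 1)
    values = rest.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    fields = {}
    for name, raw in zip(COLUMNS, values):
        # Only the data volume is fractional; all other columns are counts
        fields[name] = float(raw) if name == "gigabytes_processed" else int(raw)
    return op.strip(), fields


op, row = parse_counter_row(
    "read:          0        3         0         0          0       1492.807           0"
)
print(op, row["ecc_delayed"], row["total_corrected"])  # read 3 0
```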

For comparison, this is what one of the drives I am currently using looks like:

Code:
=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WUH721414AL5204
Revision:             C400
Compliance:           SPC-4
User Capacity:        14,000,519,643,136 bytes [14.0 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Oct 29 14:02:33 2024 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification = 0
Total blocks reassigned during format = 0
Total new blocks reassigned = 0
Power on minutes since format = 89274
Current Drive Temperature:     43 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 40165:53
Manufactured in week 21 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  16
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1843
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        1         0         1     383816      77746.099           0
write:         0       22         0        22       1106      15849.924           0
verify:        0        0         0         0          1          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   40080                 - [-   -    -]
# 2  Background short  Completed                   -   39912                 - [-   -    -]
# 3  Background short  Completed                   -   39747                 - [-   -    -]
# 4  Background long   Completed                   -   39600                 - [-   -    -]
# 5  Background short  Completed                   -   39411                 - [-   -    -]
# 6  Background short  Completed                   -   39242                 - [-   -    -]
# 7  Background short  Completed                   -   39075                 - [-   -    -]
# 8  Background long   Completed                   -   38935                 - [-   -    -]
# 9  Background short  Completed                   -   38742                 - [-   -    -]
#10  Background short  Completed                   -   38721                 - [-   -    -]
#11  Background short  Completed                   -   38574                 - [-   -    -]
#12  Background short  Completed                   -   38435                 - [-   -    -]
#13  Background short  Completed                   -   38267                 - [-   -    -]
#14  Background long   Completed                   -   38120                 - [-   -    -]
#15  Background short  Completed                   -   37931                 - [-   -    -]
#16  Background short  Completed                   -   37763                 - [-   -    -]
#17  Background short  Completed                   -   37595                 - [-   -    -]
#18  Background short  Completed                   -   37500                 - [-   -    -]
#19  Background short  Completed                   -   37484                 - [-   -    -]
#20  Background short  Completed                   -   37476                 - [-   -    -]

Long (extended) Self-test duration: 91620 seconds [25.4 hours]
I am actually going to replace this particular drive with the brand-new one, but now I am a bit unsure whether that is a smart move :D
By the way, in case you wonder about the large number of SMART tests: I run a short SMART test once a week and a long test on the first Sunday of every month, which is why the self-test logs of my drives are always fairly densely populated.
 

TRACKER

Active Member
Jan 14, 2019
For SAS drives, you are better off using the "sg_logs" command (from sg3_utils) under Linux or FreeBSD.
It gives you much more detailed information about issues with the drive.
 

tcpluess

Member
Jan 22, 2024
TRACKER said:
For SAS drive better use "sg_logs" command under linux or freebsd.
It would give you much more detailed information about issues with the drive.

Sure, I did that too. The error counters, at least, seem to match.

Code:
# sg_logs -a /dev/sda
    TOSHIBA   MG07SCA14TE       0102

Supported log pages  [0x0]:
    0x00        Supported log pages [sp]
    0x01        Buffer over-run/under-run [bou]
    0x02        Write error [we]
    0x03        Read error [re]
    0x05        Verify error [ve]
    0x06        Non medium [nm]
    0x0d        Temperature [temp]
    0x0e        Start-stop cycle counter [sscc]
    0x0f        Application client [ac]
    0x10        Self test results [str]
    0x15        Background scan results [bsr]
    0x18        Protocol specific port [psp]
    0x1a        Power condition transitions [pct]
    0x2f        Informational exceptions [ie]
    0x38       

Buffer over-run/under-run page  [0x1]
  under-run = 0
  over-run = 0

Write error counter page  [0x2]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 50
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Read error counter page  [0x3]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Verify error counter page  [0x5]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 383
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Non-medium error page  [0x6]
  Non-medium error count = 0

Temperature page  [0xd]
  Current temperature = 33 C
  Reference temperature = 65 C

Start-stop cycle counter page  [0xe]
  Date of manufacture, year: 2019, week: 16
  Accounting date, year:     , week:   
  Specified cycle count over device lifetime = 50000
  Accumulated start-stop cycles = 24
  Specified load-unload count over device lifetime = 600000
  Accumulated load-unload cycles = 24

Application client page  [0xf]
 00     0f 00 40 00 00 00 83 fc  00 00 00 00 00 00 00 00
 10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 .....  [truncated after 64 of 16388 bytes (use '-H' to see the rest)]

Self-test results page  [0x10]
  Parameter code = 1, accumulated power-on hours = 38983
    self-test code: background short [1]
    self-test result: completed without error [0]
  Parameter code = 2, accumulated power-on hours = 38815
    self-test code: background short [1]
    self-test result: completed without error [0]
  Parameter code = 3, accumulated power-on hours = 38792
    self-test code: background extended [2]
    self-test result: completed without error [0]
  Parameter code = 4, accumulated power-on hours = 38650
    self-test code: background short [1]
    self-test result: completed without error [0]
  Parameter code = 5, accumulated power-on hours = 38622
    self-test code: background extended [2]
    self-test result: completed without error [0]
  Parameter code = 6, accumulated power-on hours = 38592
    self-test code: background short [1]
    self-test result: completed without error [0]

Background scan results page  [0x15]
  Status parameters:
    Accumulated power on minutes: 2344144 [h:m  39069:4]
    Status: no background scans active
    Number of background scans performed: 119
    Background medium scan progress: 0.04 %
    Number of background medium scans performed: 119

Protocol Specific port page for SAS SSP  (sas-2) [0x18]
relative target port id = 1
  generation code = 3
  number of phys = 1
  phy identifier = 0
    attached SAS device type: SAS or SATA device
    attached reason: power on
    reason: loss of dword synchronization
    negotiated logical link rate: 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = <redacted>
    attached phy identifier = 1
    Invalid DWORD count = 32
    Running disparity error count = 32
    Loss of DWORD synchronization count = 7
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 32
     Running disparity error count: 32
     Loss of dword synchronization count: 7
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0
relative target port id = 2
  generation code = 3
  number of phys = 1
  phy identifier = 1
    attached SAS device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown rate
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0

Power condition transitions page  [0x1a]
  Accumulated transitions to active = 1613232
  Accumulated transitions to idle_a = 1613213
  Accumulated transitions to idle_b = 0
  Accumulated transitions to idle_c = 0
  Accumulated transitions to standby_z = 0
  Accumulated transitions to standby_y = 0

Informational Exceptions page  [0x2f]
  IE asc = 0x0, ascq = 0x0
    Current temperature = 33 C
    Threshold temperature = 0 C  [common extension]

Unable to decode page = 0x38, here is hex:
 00     38 00 02 68 00 01 00 08  00 29 32 e0 00 23 c4 d0
 10     00 02 00 a4 01 2c 00 0f  42 40 00 05 a8 ca 00 00
 20     00 00 00 32 00 01 86 a0  00 00 92 f9 00 00 00 00
 30     00 01 3b 96 00 00 00 00  00 00 93 4a 00 00 00 00
 .....  [truncated after 64 of 620 bytes (use '-H' to see the rest)]
Code:
# sg_logs -a /dev/sdb
    TOSHIBA   MG07SCA14TE       0102

Supported log pages  [0x0]:
    0x00        Supported log pages [sp]
    0x01        Buffer over-run/under-run [bou]
    0x02        Write error [we]
    0x03        Read error [re]
    0x05        Verify error [ve]
    0x06        Non medium [nm]
    0x0d        Temperature [temp]
    0x0e        Start-stop cycle counter [sscc]
    0x0f        Application client [ac]
    0x10        Self test results [str]
    0x15        Background scan results [bsr]
    0x18        Protocol specific port [psp]
    0x1a        Power condition transitions [pct]
    0x2f        Informational exceptions [ie]
    0x38       

Buffer over-run/under-run page  [0x1]
  under-run = 0
  over-run = 0

Write error counter page  [0x2]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 1
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Read error counter page  [0x3]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 1
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Verify error counter page  [0x5]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 8
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Non-medium error page  [0x6]
  Non-medium error count = 0

Temperature page  [0xd]
  Current temperature = 37 C
  Reference temperature = 65 C

Start-stop cycle counter page  [0xe]
  Date of manufacture, year: 2019, week: 09
  Accounting date, year:     , week:   
  Specified cycle count over device lifetime = 50000
  Accumulated start-stop cycles = 23
  Specified load-unload count over device lifetime = 600000
  Accumulated load-unload cycles = 23

Application client page  [0xf]
 00     0f 00 40 00 00 00 83 fc  00 00 00 00 00 00 00 00
 10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 .....  [truncated after 64 of 16388 bytes (use '-H' to see the rest)]

Self-test results page  [0x10]
  Parameter code = 1, accumulated power-on hours = 38878
    self-test code: background short [1]
    self-test result: completed without error [0]
  Parameter code = 2, accumulated power-on hours = 38710
    self-test code: background short [1]
    self-test result: completed without error [0]
  Parameter code = 3, accumulated power-on hours = 38688
    self-test code: background extended [2]
    self-test result: completed without error [0]
  Parameter code = 4, accumulated power-on hours = 38658
    self-test code: background short [1]
    self-test result: completed without error [0]

Background scan results page  [0x15]
  Status parameters:
    Accumulated power on minutes: 2337856 [h:m  38964:16]
    Status: no background scans active
    Number of background scans performed: 119
    Background medium scan progress: 0.08 %
    Number of background medium scans performed: 119

Protocol Specific port page for SAS SSP  (sas-2) [0x18]
relative target port id = 1
  generation code = 5
  number of phys = 1
  phy identifier = 0
    attached SAS device type: SAS or SATA device
    attached reason: unknown
    reason: loss of dword synchronization
    negotiated logical link rate: 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = <redacted>
    attached phy identifier = 3
    Invalid DWORD count = 24
    Running disparity error count = 24
    Loss of DWORD synchronization count = 5
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 24
     Running disparity error count: 24
     Loss of dword synchronization count: 5
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0
relative target port id = 2
  generation code = 5
  number of phys = 1
  phy identifier = 1
    attached SAS device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown rate
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0

Power condition transitions page  [0x1a]
  Accumulated transitions to active = 4030708
  Accumulated transitions to idle_a = 4030688
  Accumulated transitions to idle_b = 0
  Accumulated transitions to idle_c = 0
  Accumulated transitions to standby_z = 0
  Accumulated transitions to standby_y = 0

Informational Exceptions page  [0x2f]
  IE asc = 0x0, ascq = 0x0
    Current temperature = 37 C
    Threshold temperature = 0 C  [common extension]

Unable to decode page = 0x38, here is hex:
 00     38 00 02 68 00 01 00 08  00 29 32 e0 00 23 ac 40
 10     00 02 00 a4 01 2c 00 0f  42 40 00 0b ca 96 00 00
 20     00 00 00 32 00 01 86 a0  00 01 57 f1 00 00 00 00
 30     00 00 03 c0 00 00 00 00  00 00 d5 25 00 00 00 00
 .....  [truncated after 64 of 620 bytes (use '-H' to see the rest)]

Code:
# sg_logs -a /dev/sdi
    TOSHIBA   MG08SCA16TE       0105

Supported log pages  [0x0]:
    0x00        Supported log pages [sp]
    0x01        Buffer over-run/under-run [bou]
    0x02        Write error [we]
    0x03        Read error [re]
    0x05        Verify error [ve]
    0x06        Non medium [nm]
    0x0d        Temperature [temp]
    0x0e        Start-stop cycle counter [sscc]
    0x0f        Application client [ac]
    0x10        Self test results [str]
    0x15        Background scan results [bsr]
    0x18        Protocol specific port [psp]
    0x1a        Power condition transitions [pct]
    0x2f        Informational exceptions [ie]
    0x38       

Buffer over-run/under-run page  [0x1]
  under-run = 0
  over-run = 0

Write error counter page  [0x2]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Read error counter page  [0x3]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 3
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Verify error counter page  [0x5]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

Non-medium error page  [0x6]
  Non-medium error count = 0

Temperature page  [0xd]
  Current temperature = 38 C
  Reference temperature = 65 C

Start-stop cycle counter page  [0xe]
  Date of manufacture, year: 2023, week: 09
  Accounting date, year:     , week:   
  Specified cycle count over device lifetime = 50000
  Accumulated start-stop cycles = 1
  Specified load-unload count over device lifetime = 600000
  Accumulated load-unload cycles = 1

Application client page  [0xf]
 00     0f 00 40 00 00 00 83 fc  00 00 00 00 00 00 00 00
 10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 30     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 .....  [truncated after 64 of 16388 bytes (use '-H' to see the rest)]

Self-test results page  [0x10]
  Parameter code = 1, accumulated power-on hours = 17
    self-test code: background short [1]
    self-test result: completed without error [0]

Background scan results page  [0x15]
  Status parameters:
    Accumulated power on minutes: 1186 [h:m  19:46]
    Status: no background scans active
    Number of background scans performed: 1
    Background medium scan progress: 0.22 %
    Number of background medium scans performed: 1

Protocol Specific port page for SAS SSP  (sas-2) [0x18]
relative target port id = 1
  generation code = 2
  number of phys = 1
  phy identifier = 0
    attached SAS device type: SAS or SATA device
    attached reason: unknown
    reason: loss of dword synchronization
    negotiated logical link rate: 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = <redacted>
    attached phy identifier = 7
    Invalid DWORD count = 4
    Running disparity error count = 4
    Loss of DWORD synchronization count = 1
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 4
     Running disparity error count: 4
     Loss of dword synchronization count: 1
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0
relative target port id = 2
  generation code = 2
  number of phys = 1
  phy identifier = 1
    attached SAS device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown rate
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = <redacted>
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
     Elasticity buffer overflow count: 0
     Received abandon-class OPEN_REJECT count: 0
     Transmitted BREAK count: 0
     Received BREAK count: 0
     Transmitted SSP frame error count: 0
     Received SSP frame error count: 0

Power condition transitions page  [0x1a]
  Accumulated transitions to active = 1
  Accumulated transitions to idle_a = 0
  Accumulated transitions to idle_b = 0
  Accumulated transitions to idle_c = 0
  Accumulated transitions to standby_z = 0
  Accumulated transitions to standby_y = 0

Informational Exceptions page  [0x2f]
  IE asc = 0x0, ascq = 0x0
    Current temperature = 38 C
    Threshold temperature = 0 C  [common extension]

Unable to decode page = 0x38, here is hex:
 00     38 00 02 68 00 01 00 08  00 29 32 e0 00 00 04 a2
 10     00 02 00 a4 01 2c 00 0f  42 40 00 09 cc 8e 00 00
 20     00 00 00 32 00 01 86 a0  00 01 6c 58 00 00 00 00
 30     00 01 69 5a 00 00 00 00  00 00 06 fd 00 00 00 00
 .....  [truncated after 64 of 620 bytes (use '-H' to see the rest)]
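To compare the sg_logs counters with smartctl's error counter log without eyeballing, the relevant pages can be extracted from the `sg_logs -a` text. A minimal Python sketch, assuming the text layout shown above (a page title line, then indented `name = number` lines, with a blank line between pages); `parse_error_pages` is a hypothetical helper.

```python
import re

# Page titles as printed by 'sg_logs -a' in the output above
PAGE_RE = re.compile(r"^(Write|Read|Verify) error counter page")
# Indented 'name = number' lines inside a page
FIELD_RE = re.compile(r"^\s+(.+?) = (\d+)\s*$")


def parse_error_pages(text: str) -> dict[str, dict[str, int]]:
    """Map 'read'/'write'/'verify' to their counter names and values."""
    pages: dict[str, dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        page = PAGE_RE.match(line)
        if page:
            current = page.group(1).lower()
            pages[current] = {}
            continue
        if not line.strip():
            current = None  # a blank line ends the current page section
            continue
        field = FIELD_RE.match(line)
        if current and field:
            pages[current][field.group(1)] = int(field.group(2))
    return pages


sample = """Read error counter page  [0x3]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 3
  Total errors corrected = 0

Non-medium error page  [0x6]
  Non-medium error count = 0
"""
print(parse_error_pages(sample)["read"]["Errors corrected with possible delays"])  # 3
```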
 

tcpluess

Member
Jan 22, 2024
Yeah, what puzzles me is that the first two disks have petabytes read and still so few corrected errors, whereas the brand-new disk has read only about 1.5 TB (and written about 12.7 TB) and already shows 3 corrected errors. Or am I misinterpreting something?
 

TRACKER

Active Member
Jan 14, 2019
Well, you have two disks manufactured in 2019 with more than 8 PB of data read, and one manufactured in 2023 with roughly 1.5 TB :) so I guess it is pure luck that the heavily used ones have very few errors.
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
Disk manufacturers use the various SMART fields in different ways to report things. The disks look good to me, but before I would be even remotely worried, I would ask the manufacturer for a list of the SMART attributes they kept standard, the ones they use for their own purposes, and what each entry means :)

With modern disks, I would only worry about unrecoverable errors accumulating and about unusual mechanical noises ;)
 

tcpluess

Member
Jan 22, 2024
Good, I agree, the disks look healthy. The two used ones in particular perform well and did not report any new errors.
However, the error counter on the brand-new disk keeps increasing. It goes up from time to time, even when I am not reading or writing especially large amounts of data. Is this cause for worry, or can I just ignore it since the uncorrected errors stay at zero?

Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               TOSHIBA
Product:              MG08SCA16TE
Revision:             0105
Compliance:           SPC-4
User Capacity:        16,000,900,661,248 bytes [16.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Oct 31 10:12:05 2024 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 63:30
Manufactured in week 09 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  1
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       18         0         0          0      22536.519           0
write:         0        0         0         0          0      12686.089           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -      60                 - [-   -    -]
# 2  Background short  Completed                   -      51                 - [-   -    -]
# 3  Background long   Completed                   -      50                 - [-   -    -]
# 4  Background short  Completed                   -      17                 - [-   -    -]
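Watching the counter creep up by eye is tedious, so it may help to log it periodically. The minimal parsing sketch below assumes the text comes from `smartctl -a` on a SAS drive, as above, and splits each `read:`/`write:`/`verify:` row into its seven columns. The column names are my own labels for the header shown above. On this drive, the "Total errors corrected" column stays at 0 while the delayed-ECC column reads 18, so it is safer to keep every column than to rely on the total.

```python
FIELDS = (
    "ecc_fast",               # Errors corrected by ECC, fast
    "ecc_delayed",            # Errors corrected by ECC, delayed
    "rereads_rewrites",       # Errors corrected by rereads/rewrites
    "total_corrected",        # Total errors corrected
    "algorithm_invocations",  # Correction algorithm invocations
    "gigabytes_processed",    # Gigabytes processed [10^9 bytes]
    "total_uncorrected",      # Total uncorrected errors
)


def parse_error_counter_log(text):
    """Return {'read': {...}, 'write': {...}} from smartctl's error counter log."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 8 and parts[0] in ("read:", "write:", "verify:"):
            values = [float(p) if "." in p else int(p) for p in parts[1:]]
            counters[parts[0].rstrip(":")] = dict(zip(FIELDS, values))
    return counters


sample = """\
read:          0       18         0         0          0      22536.519           0
write:         0        0         0         0          0      12686.089           0
"""
log = parse_error_counter_log(sample)
print(log["read"]["ecc_delayed"], log["read"]["total_uncorrected"])
```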
 

pricklypunter

Well-Known Member
Nov 10, 2015
1,748
545
113
Canada
These kinds of ECC errors are pretty common with spinning disks in general. They get reported once some threshold, set by the manufacturer in the firmware, is crossed. The exact nature of the error could be pretty much anything that caused the controller to misinterpret the data and then fix it, and let's not forget that the SMART reporting algorithm itself is far from perfect too.

Most manufacturers stick to a subset of common reporting parameters for SMART data, as laid out in the standards, and then use the remaining fields for their own internal analyses. Without intimate knowledge of what's what, you are left in the dark. Provided the disk is not making any unusual noises or sounding like it has trouble seeking, and it is not accumulating unrecoverable errors, I would call it good until something happens that changes my mind :)

Sure, you could run badblocks cycles on it, benchmark its transfer speed, and calculate and compare approximate completion times, etc., but really, who has 50 hours to spare watching that take place? No disk is forever though; that's what backups are for ;):D
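To get a feel for why a badblocks run eats days, the arithmetic is simple: time ≈ capacity × number of full-disk passes ÷ average throughput. By default, a destructive `badblocks -w` writes and then reads back four patterns, which means eight full passes. The 250 MB/s average throughput below is an assumption, not a measured figure for this drive.

```python
def full_disk_hours(capacity_bytes, avg_mb_per_s, passes=1):
    """Hours needed to stream over the whole disk `passes` times.

    avg_mb_per_s is in 10^6 bytes per second; real throughput drops
    toward the inner tracks, so treat the result as a rough estimate.
    """
    seconds = capacity_bytes * passes / (avg_mb_per_s * 1e6)
    return seconds / 3600


capacity = 16_000_900_661_248   # MG08SCA16TE "User Capacity"
assumed_speed = 250             # MB/s, an assumed average
print(f"one read pass:    {full_disk_hours(capacity, assumed_speed):.1f} h")
print(f"badblocks -w (8): {full_disk_hours(capacity, assumed_speed, passes=8):.1f} h")
```

Real figures depend heavily on the drive, the controller, and where on the platter the data sits.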
 

Navvie

Member
Nov 21, 2020
Do a short SMART test. Run badblocks. Do another short SMART test. If the number of errors doesn't increase dramatically, use the drive. Should the worst happen, go to your backups and file a warranty claim. It's a new drive, so it is covered by the manufacturer's warranty, right?

PS. UK law treats sale or clearance goods exactly the same as normal goods bought at MRRP. That doesn't mean an item that's not broken or defective can always be returned for a refund, but it usually can, especially if it was made clear that the item was bought as a gift when the original transaction took place. Unfortunately, a lot of people in the UK don't know their rights.
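Navvie's sequence (short test, badblocks, short test, compare) can be written out as a dry-run plan. This is a sketch, not a tested script. `/dev/sdX` is a placeholder, `badblocks -w` destroys everything on the disk, and the `max_new` threshold for "increased dramatically" is a hypothetical number you would pick yourself. `smartctl -t short` only starts the test, so you have to wait for it to finish (a couple of minutes) before reading the results. `-b 4096` is included because, as far as I know, badblocks uses 32-bit block counts, which the default 1024-byte block size overflows on a 16 TB disk.

```python
import shlex


def burn_in_plan(device):
    """Return the commands for short test -> badblocks -> short test (dry run)."""
    return [
        ["smartctl", "-t", "short", device],   # starts the test in the background
        ["smartctl", "-a", device],            # after it finishes: record counters
        ["badblocks", "-wsv", "-b", "4096", device],  # DESTRUCTIVE write test
        ["smartctl", "-t", "short", device],
        ["smartctl", "-a", device],            # compare counters with the first run
    ]


def increased_dramatically(before, after, max_new):
    """True if a counter grew by more than max_new (a threshold you choose)."""
    return after - before > max_new


for cmd in burn_in_plan("/dev/sdX"):
    print(shlex.join(cmd))
```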
 

tcpluess

Member
Jan 22, 2024
Yeah, I am using the drive now, with ZFS. In fact, I made a single-drive pool with copies=2 and use it to store my backups. ZFS has never complained about any errors so far; I now have 10 TB of data on the drive and it passes every scrub. So this behaviour is probably normal. I know that sg_logs can clear the error counters, but I will not do this, as I want to keep track of what is going on. Thanks to multiple backup runs and scrubs, I have already reached 40 TB of data read, and the error counter seems to have stabilized at 59, with zero defect blocks. I think the drive is fine.
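A setup like the one described (single-disk pool, `copies=2`, regular scrubs) could look like the dry-run sketch below. The pool name and disk path are hypothetical placeholders. `copies=2` is passed at creation time with `-O` because the property only affects data written after it is set. It protects against bad sectors but not against losing the whole disk, and it doubles the space each block occupies.

```python
import shlex


def single_disk_backup_pool(pool, disk):
    """Dry-run commands for a one-disk ZFS pool that stores two copies of data."""
    return [
        ["zpool", "create", "-O", "copies=2", pool, disk],
        ["zpool", "scrub", pool],          # run periodically, e.g. from cron
        ["zpool", "status", "-v", pool],   # shows scrub result and any errors
    ]


# Hypothetical names: replace with your pool name and /dev/disk/by-id path.
for cmd in single_disk_backup_pool("backup", "/dev/disk/by-id/EXAMPLE-DISK"):
    print(shlex.join(cmd))
```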
 

BackupProphet

Well-Known Member
Jul 2, 2014
Stavanger, Norway
intellistream.ai
These Toshiba drives should also support background scan, where the firmware scans the drive and fixes issues. I remember having a drive whose read error count increased constantly; then I ran a background scan, the count stopped increasing, and the disk was fine. However, it reported a few bad blocks afterwards.
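To check what the background media scan has been doing, smartmontools can print the drive's Background scan results log, and sdparm can show or change the Background Control mode page. The dry-run sketch below comes with hedges. `/dev/sdX` is a placeholder, and the `bc` page name and `EN_BMS` field are assumptions based on sdparm's documented field names. Confirm them with `sdparm --enumerate --page=bc` on your system before changing anything.

```python
import shlex


def background_scan_commands(device, enable=None):
    """Dry-run commands to inspect (and optionally toggle) background media scan.

    enable=None only inspects; True/False also sets the EN_BMS bit and saves it.
    """
    cmds = [
        ["smartctl", "-l", "background", device],  # Background scan results log
        ["sdparm", "--page=bc", device],           # Background Control mode page
    ]
    if enable is not None:
        value = 1 if enable else 0
        cmds.append(["sdparm", f"--set=EN_BMS={value}", "--save", device])
    return cmds


for cmd in background_scan_commands("/dev/sdX", enable=True):
    print(shlex.join(cmd))
```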
 

tcpluess

Member
Jan 22, 2024
Mmh, yes, background scan was enabled by default. I disabled it because I thought a ZFS scrub would do it all.
Should I enable the background scan?