READ Errors: does anyone understand why?

voodooFX

Active Member
Jan 26, 2014
This looks like a very complicated one, but it's worth a try; maybe someone can help or give me a hint ;)

I have some Seagate Enterprise Capacity 3.5 HDD V.4 (6TB) drives that are behaving very strangely.
They have no issues writing, but they have many while reading.

This is what I get in /var/log/messages while reading from the drive(s):

Code:
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Error 5, Retries exhausted
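
For anyone decoding the log: the CDB `28 00 00 87 08 00 00 01 00 00` is a READ(10) command, with the starting LBA in bytes 2-5 and the transfer length in bytes 7-8. A minimal Python sketch that pulls those fields out and compares them with the `Info:` value, assuming (as is usual for sense data on a failed read) that `Info:` holds the LBA of the block that failed the check:

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode the fields of a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "rdprotect": cdb[1] >> 5,                   # byte 1, bits 7-5
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8, transfer length
    }


req = decode_read10("28 00 00 87 08 00 00 01 00 00")

# Assumption: the "Info:" field is the LBA of the block that failed the guard check.
failing_lba = 0x870870
offset = failing_lba - req["lba"]

print(f"start LBA {req['lba']:#x}, {req['blocks']} blocks, RDPROTECT={req['rdprotect']}")
print(f"failing block is {offset} blocks into the transfer")
```

This prints a start LBA of 0x870800, a 256-block transfer (128 KiB at 512 bytes per block) with RDPROTECT=0, and a failing block 112 blocks in. Every retry in the log reports the same `Info:` value, which suggests the error is tied to that block's contents rather than to a flaky link.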

And this is the S.M.A.R.T. error counter log:

Code:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   506297837        0         0  506297837          0     273577.685           0
write:         0        0         0         0          0      82590.648           0
verify:      134        0         0       134          0          0.000           0
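
One way to read the table above is to parse the three data rows and look at the last column. A small Python sketch, where the column names are our own labels for the seven numeric columns in this smartctl-style output:

```python
LOG = """\
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   506297837        0         0  506297837          0     273577.685           0
write:         0        0         0         0          0      82590.648           0
verify:      134        0         0       134          0          0.000           0
"""

# Our own labels for the seven numeric columns, left to right.
COLUMNS = ("ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
           "algorithm_invocations", "gigabytes_processed", "total_uncorrected")


def parse_error_counter_log(text: str) -> dict:
    """Return {"read": {...}, "write": {...}, "verify": {...}} from the table text."""
    rows = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if not sep or name not in ("read", "write", "verify"):
            continue  # skip the title and header lines
        values = [float(v) if "." in v else int(v) for v in rest.split()]
        rows[name] = dict(zip(COLUMNS, values))
    return rows


counters = parse_error_counter_log(LOG)
for op, row in counters.items():
    print(f"{op:>6}: {row['total_uncorrected']} uncorrected, "
          f"{row['gigabytes_processed']} GB processed")
```

Every row reports zero uncorrected errors, so the drive itself is not logging unrecoverable media errors. That fits the protection-information explanation raised in the replies, though the table alone cannot prove it.
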
I have 10 drives and this happens on all of them...
They are connected to two different HBAs, using three different cables...

Any idea what could cause this issue?
It is so severe that it can crash and restart the server within 10 seconds if I start reading a lot of data from the pool.

BTW, the OS is FreeNAS 11.2-U5.
 

ajs

Active Member
Mar 27, 2018
Minnesota
What are the drives formatted to? It sounds like you have PI (protection information) enabled in the format or something. Reformat to 512-byte sectors.
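
For context on "PI": with T10 protection information enabled, each 512-byte logical block is stored with an extra 8-byte tuple: a 2-byte guard (a CRC-16 over the block's data), a 2-byte application tag and a 4-byte reference tag (for Type 1, normally the low 32 bits of the LBA). "Logical block guard check failed" means the stored guard did not match a CRC recomputed over the data. The Python sketch below only illustrates that check; it is not how drive or HBA firmware implements it, and the data and tag values are hypothetical.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build an 8-byte PI tuple: guard (2 bytes), application tag (2), reference tag (4)."""
    if len(block) != 512:
        raise ValueError("expected a 512-byte logical block")
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


def guard_check(block: bytes, pi: bytes) -> bool:
    """Recompute the CRC over the data and compare it with the stored guard."""
    (stored_guard,) = struct.unpack(">H", pi[:2])
    return stored_guard == crc16_t10dif(block)


lba = 0x870870                    # LBA from the log, used here only as an example
block = bytes(range(256)) * 2     # hypothetical 512 bytes of user data
pi = make_pi(block, app_tag=0, ref_tag=lba & 0xFFFFFFFF)

print(guard_check(block, pi))     # True: stored guard matches the data
corrupted = b"\xff" + block[1:]   # change the first byte
print(guard_check(corrupted, pi)) # False: guard no longer matches
```

The bitwise loop is chosen for readability; real implementations typically use lookup tables or hardware CRC instructions.
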
 

voodooFX

Active Member
Jan 26, 2014
The drives are already formatted to 512-byte sectors, but to be sure I formatted one again, and it did not help.
RAM is fine (I have tested other HDD/SSD models in the server).
 

mmk

Member
Oct 15, 2016
Czech Republic
Check whether you have EEDP (end-to-end data protection) enabled on the drives; at least in the past, FreeBSD did not support it. Run something like:

sg_readcap -l /dev/da1

If it shows something like 'type 2 protection', then you'll have to reformat the drive, for example:

sg_format --format --early /dev/da1

This will of course cause data loss, but by the sound of it, that's probably not an issue.
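
The `sg_readcap -l` check works because the long form issues READ CAPACITY(16), whose response carries protection fields next to the block length. The reported logical block length stays 512 bytes whether or not PI is enabled, so "formatted to 512 byte" by itself does not rule PI out. A minimal Python decoder for the start of that 32-byte response, following the SBC-3 layout (byte 12: P_TYPE in bits 3-1, PROT_EN in bit 0); the sample response is hypothetical, not captured from these drives:

```python
import struct


def decode_read_capacity16(resp: bytes) -> dict:
    """Decode the first 14 bytes of READ CAPACITY(16) parameter data."""
    if len(resp) < 14:
        raise ValueError("response too short")
    last_lba, block_length = struct.unpack(">QI", resp[:12])  # bytes 0-7, 8-11
    prot_en = bool(resp[12] & 0x01)                            # byte 12, bit 0
    p_type = (resp[12] >> 1) & 0x07                            # byte 12, bits 3-1
    return {
        "logical_blocks": last_lba + 1,
        "block_length": block_length,
        "prot_en": prot_en,
        # P_TYPE 0 means Type 1, 1 means Type 2, 2 means Type 3
        "protection_type": p_type + 1 if prot_en else 0,
        "p_i_exponent": resp[13] >> 4,                         # byte 13, bits 7-4
    }


# Hypothetical response: 512-byte logical blocks with PROT_EN=1 and P_TYPE=1.
sample = struct.pack(">QIBB", 11_721_045_167, 512, 0b0000_0011, 0x00) + bytes(18)
info = decode_read_capacity16(sample)
print(info)
```

Here `block_length` is 512 while `protection_type` is 2, which is the combination behind the 'type 2 protection' message. With byte 12 set to 0, the same decoder reports `protection_type` 0, which is what you would hope to see after reformatting without PI.
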
 

voodooFX

Active Member
Jan 26, 2014
Hi mmk,

I tested them on CentOS 7 (latest) and they have the same problem.
I also "fully" formatted one with sg_format (it took 24+ hours...), but the result is the same: read errors.