READ Errors: does anyone understand why?

voodooFX · Aug 21, 2019

This looks like a very complicated one, but it's worth a try; maybe someone can help or give me a hint.

I have some Seagate Enterprise Capacity 3.5 HDD V.4 (6 TB) drives which are behaving very strangely.
They have no issues writing, but they have many when reading.

This is what I get in /var/log/messages while reading from the drive(s)

Code:

Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:15 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Retrying command (per sense data)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 87 08 00 00 01 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI status: Check Condition
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:10,1 (Logical block guard check failed)
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Info: 0x870870
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Field Replaceable Unit: 0
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Descriptor 0x80: 00 00 00 00 02 27 00 00 00 00 00 00 00 00
Aug 21 14:24:16 serv01 (da1:mps0:0:9:0): Error 5, Retries exhausted
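For readers following the log, here is a minimal Python sketch that decodes the READ(10) CDB shown above and checks where the reported Info LBA falls within the request. The field offsets follow the standard READ(10) layout (opcode 0x28, RDPROTECT in the top three bits of byte 1, LBA in bytes 2-5, transfer length in bytes 7-8); the function name `decode_read10` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode the fields of a SCSI READ(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "rdprotect": cdb[1] >> 5,                      # bits 7-5 of byte 1
        "lba": int.from_bytes(cdb[2:6], "big"),        # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),     # transfer length in blocks
    }


cmd = decode_read10("28 00 00 87 08 00 00 01 00 00")
failing_lba = 0x870870  # the "Info:" value from the sense data
offset = failing_lba - cmd["lba"]

print(f"start LBA  : {cmd['lba']:#x}")
print(f"blocks     : {cmd['blocks']}")
print(f"RDPROTECT  : {cmd['rdprotect']}")
print(f"failing LBA: {failing_lba:#x} (block {offset} of the request)")
```

Under these assumptions, the request covers 256 blocks starting at LBA 0x870800, and the guard check fails at block 112 of that range on every retry.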

And this is the S.M.A.R.T. error counter log:

Code:

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   506297837        0         0  506297837          0     273577.685           0
write:         0        0         0         0          0      82590.648           0
verify:      134        0         0       134          0          0.000           0

I have 10 drives and this happens on all of them...
They are connected to two different HBAs, with three different cables...

Any idea what can cause this issue?
It is so severe that the server can crash and restart within 10 seconds if I start reading a lot of data from the pool.

BTW the OS is FreeNAS 11.2-U5

ajs · Aug 21, 2019

What are the drives formatted to? Sounds like you have PI (protection information) enabled on the format or something. Reformat to 512-byte sectors.

pricklypunter · Aug 21, 2019

RAM is ok?

voodooFX · Aug 23, 2019

The drives are already formatted to 512-byte sectors, but to be sure I formatted one again and it did not help.
RAM is fine (I have tested other HDD/SSD models in the server)

mmk · Aug 26, 2019

Check if you have EEDP enabled on the drives. That at least used to not be supported by FreeBSD. Run something like:

sg_readcap -l /dev/da1

If it shows something like 'type 2 protection' then you'll have to format the drive, for example:

sg_format --format --early /dev/da1

This will of course cause data loss, but by the sound of it that's probably not an issue.
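As a rough illustration of what the "type N protection" wording corresponds to, the sketch below decodes the protection fields from byte 12 of the READ CAPACITY(16) parameter data, assuming the usual SBC layout (PROT_EN in bit 0, P_TYPE in bits 3-1). The helper name `protection_type` is hypothetical, and this is not how sg_readcap itself is implemented; it only shows the bit arithmetic.

```python
def protection_type(byte12: int) -> int:
    """Return the protection type (0 = none, 1-3 = type 1-3).

    Assumes byte 12 of the READ CAPACITY(16) response:
    bit 0 is PROT_EN, bits 3-1 are P_TYPE.
    """
    prot_en = byte12 & 0x01
    p_type = (byte12 >> 1) & 0x07
    # P_TYPE is only meaningful when PROT_EN is set; type = P_TYPE + 1.
    return p_type + 1 if prot_en else 0


for raw in (0x00, 0x01, 0x03, 0x05):
    print(f"byte 12 = {raw:#04x} -> protection type {protection_type(raw)}")
```

A result of 0 means the drive is formatted without protection information; anything else would suggest the format still carries PI.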

voodooFX · Aug 27, 2019

Hi mmk,

I tested them on CentOS 7 (latest) and they have the same problem.
I also "fully" formatted one with sg_format (it took 24+ hrs...) but got the same result: read errors.
