AIO ESXi 5.x x9scl+-F 3xM1015 (sas2008-IT mode) major CRC Errors on scrubs


Davie

New Member
Oct 15, 2012
7
0
0
Dubai, UAE

Hi all,

First post - sorry to be asking for help right off the bat.

I've got a strange problem with OpenIndiana (versions 151a5 up to 151a7) where any heavy load on my 10x Seagate 3TB drives (or any drives, for that matter) causes thousands of CRC errors under ESXi 5 and 5.1.

The drives are in a Norco 4224, and when the problem manifests itself I see the activity LEDs of one, two or three drives light up and stay lit.

If I boot the OpenIndiana Live CD and run a scrub, everything checks out fine and there are no errors.

I have updated the X9SCL BIOS to 2.0a (I can't remember the previous version, but it had the same issues). I'm also using the latest IT mode firmware on the IBM M1015 HBAs.

Any ideas on what could be causing this?

TIA

Davie
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,516
5,828
113
Davie,

I heard that there is an X9SCx BIOS v2.0b floating around out there. It might be worth trying as a starting point.

Which IT mode firmware is on the IBM M1015? Did you flash it to an LSI firmware first?
 

Davie

I followed mobilenvidia's instructions to convert the LSI9240 (IBM M1015) to an LSI9211 in IT mode, and they flashed without problems on the board I had in my previous server.

I'll take a look for that BIOS, but I only updated last week so it might be what I already have.

I thought at first that it might be my 8087 SAS cables, as they're 1 m long, but since everything is fine without ESXi I guess they're OK. I've searched loads of forums, and so far I've only found one post (which I've since lost) saying that IT mode is unreliable and that I should use IR mode. I seem to recall it involved the same motherboard and/or HBAs.

I'm really keen to get this fixed, as my current config is total overkill for a dedicated OI/Solaris media server (I've got 6 SSDs to add to the equation when I get this sorted out).

Thanks

D.

Edit: I used your Live CD, Patrick, so I'll try that new 2.0b BIOS.
 

Davie

Tried the new BIOS...

Doing a copy from one pool to another gives the following in dmesg:


Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:55 openindiana mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:55 openindiana mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 21:57:55 openindiana mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 21:57:55 openindiana mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
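
The hex values in the log above can be split into fields before searching for them. The sketch below is only an aid for reading the log: it assumes the bit layout commonly documented for LSI Fusion-MPT log-info words (bus type in the top 4 bits, originator in the next 4, then an 8-bit code and a 16-bit subcode) and that bit 15 of the IOC status flags "log info available". The function names are made up for illustration, and the meaning of each code still has to be looked up in LSI's own headers.

```python
def decode_loginfo(value: int) -> dict:
    """Split a 32-bit log-info word into its (assumed) bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,    # assumption: 3 denotes SAS
        "originator": (value >> 24) & 0xF,  # assumption: 0 = IOP, 1 = PL, 2 = IR
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def decode_ioc_status(value: int) -> dict:
    """Separate the (assumed) log-info flag from the status code."""
    return {
        "log_info_available": bool(value & 0x8000),
        "status": value & 0x7FFF,
    }


if __name__ == "__main__":
    # Values copied from the dmesg excerpts in this thread.
    for raw in (0x31110D00, 0x31080000):
        fields = decode_loginfo(raw)
        print(hex(raw), {name: hex(v) for name, v in fields.items()})
    print(decode_ioc_status(0x804B))
```

Under those assumptions both log-info values in this thread come from the same bus type and originator and differ only in code and subcode.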

 

Davie

A quick zpool status (before a scrub) gives:

zpool status repository
  pool: repository
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: none requested
config:

        NAME                       STATE    READ WRITE CKSUM
        repository                 ONLINE      0     0     0
          raidz1-0                 ONLINE      0     0     0
            c8t5000C5004DFD988Ed0  ONLINE      0     0     0
            c8t5000C5004E369D82d0  ONLINE      0     0     0
            c8t5000C5004E370D67d0  ONLINE      0     0     0
          raidz1-1                 ONLINE      0     0     0
            c8t5000C5004E4258A0d0  ONLINE      0     0     0
            c8t5000C5004E4472E4d0  ONLINE      0     0     0
            c8t5000C5004E449528d0  ONLINE      0     0     0
          raidz1-2                 ONLINE      0     0     0
            c8t5000C5004E449966d0  ONLINE      0     0     0
            c8t5000C5004E4499A7d0  ONLINE      0     0     0
            c8t5000C5004E449A98d0  ONLINE      0     0     0

errors: No known data errors


Then a scrub gives me the following after a few seconds:


zpool status repository
  pool: repository
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Mon Oct 15 22:03:21 2012
        222M scanned out of 5.93G at 8.21M/s, 0h11m to go
        8.26M repaired, 3.65% done
config:

        NAME                       STATE    READ WRITE CKSUM
        repository                 DEGRADED    0     0     0
          raidz1-0                 ONLINE      0     0     0
            c8t5000C5004DFD988Ed0  ONLINE      0     0     0
            c8t5000C5004E369D82d0  ONLINE      0     0     0
            c8t5000C5004E370D67d0  ONLINE      0     0     0
          raidz1-1                 ONLINE      0     0     0
            c8t5000C5004E4258A0d0  ONLINE      0     0     0
            c8t5000C5004E4472E4d0  ONLINE      0     0     0
            c8t5000C5004E449528d0  ONLINE      0     0     0
          raidz1-2                 DEGRADED    0     0     0
            c8t5000C5004E449966d0  ONLINE      0     0     0
            c8t5000C5004E4499A7d0  ONLINE      0     0     0
            c8t5000C5004E449A98d0  DEGRADED    0     0   135  too many errors (repairing)




And more issues in dmesg:


Oct 15 22:04:52 openindiana vmxnet3s: [ID 654879 kern.notice] vmxnet3s:0: getcapab(0x200000) -> no
Oct 15 22:04:52 openindiana last message repeated 1 time
Oct 15 22:04:52 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 22:04:52 openindiana Log info 0x31080000 received for target 13.
Oct 15 22:04:52 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:55 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 22:04:55 openindiana Log info 0x31080000 received for target 13.
Oct 15 22:04:55 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:05:01 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:05:01 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:05:01 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:05:01 openindiana scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
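
To see whether the errors cluster on one HBA or one target, the log-info messages above can be tallied per controller instance and target. This is a rough sketch, not a tool from the thread: it assumes each "(mpt_sasN):" header line is followed by its "Log info ... received for target N" line, as in the excerpts here, and the function name `tally_log_info` is hypothetical.

```python
import re
from collections import Counter

# Header lines end with the driver instance, e.g. "(mpt_sas13):".
HEADER = re.compile(r"\((mpt_sas\d+)\):\s*$")
# Detail lines carry the log-info word and the target number.
DETAIL = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def tally_log_info(lines):
    """Count log-info events keyed by (controller, target, log-info value)."""
    counts = Counter()
    controller = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            controller = header.group(1)
            continue
        detail = DETAIL.search(line)
        if detail and controller is not None:
            counts[(controller, int(detail.group(2)), detail.group(1))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the dmesg output in this thread.
    sample = [
        "Oct 15 22:04:52 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):",
        "Oct 15 22:04:52 openindiana Log info 0x31080000 received for target 13.",
        "Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):",
        "Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.",
    ]
    for key, count in sorted(tally_log_info(sample).items()):
        print(key, count)
```

Feeding it the whole log rather than a short excerpt should show whether the events spread across several HBAs or stay on one.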

 

Davie

And when the scrub finishes I'm left with:


zpool status repository
  pool: repository
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 575M in 0h7m with 0 errors on Mon Oct 15 22:10:58 2012
config:

        NAME                       STATE    READ WRITE CKSUM
        repository                 DEGRADED    0     0     0
          raidz1-0                 ONLINE      0     0     0
            c8t5000C5004DFD988Ed0  ONLINE      0     0     0
            c8t5000C5004E369D82d0  ONLINE      0     0     0
            c8t5000C5004E370D67d0  ONLINE      0     0     0
          raidz1-1                 DEGRADED    0     0     0
            c8t5000C5004E4258A0d0  ONLINE      0     0     0
            c8t5000C5004E4472E4d0  ONLINE      0     0     0
            c8t5000C5004E449528d0  DEGRADED    0     0 4.86K  too many errors
          raidz1-2                 DEGRADED    0     0     0
            c8t5000C5004E449966d0  DEGRADED    0     0    43  too many errors
            c8t5000C5004E4499A7d0  ONLINE      0     0     0
            c8t5000C5004E449A98d0  DEGRADED    0     0 4.15K  too many errors

errors: No known data errors
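
When comparing runs (under ESXi versus the Live CD, say), it can help to pull just the devices with non-zero checksum counts out of the status output above. The following is a minimal sketch under stated assumptions: device rows are "NAME STATE READ WRITE CKSUM" separated by whitespace, and a "K" suffix is taken to mean a rounded multiple of 1024, so the expanded numbers are approximate. The helper names are invented for this example.

```python
# Assumption: zpool abbreviates large counts with binary (1024-based) suffixes.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_count(text: str) -> int:
    """Turn '43' or '4.86K' into an (approximate) integer."""
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[suffix])
    return int(text)


def devices_with_cksum_errors(status_text: str) -> dict:
    """Map each pool, vdev or disk name to its non-zero CKSUM count."""
    found = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Only rows whose second column is a known state are device rows.
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        try:
            cksum = parse_count(fields[4])
        except ValueError:
            continue
        if cksum > 0:
            found[fields[0]] = cksum
    return found


if __name__ == "__main__":
    # Rows copied from the post-scrub status in this thread.
    sample = """
    NAME STATE READ WRITE CKSUM
    repository DEGRADED 0 0 0
    raidz1-1 DEGRADED 0 0 0
    c8t5000C5004E4472E4d0 ONLINE 0 0 0
    c8t5000C5004E449528d0 DEGRADED 0 0 4.86K too many errors
    c8t5000C5004E449966d0 DEGRADED 0 0 43 too many errors
    c8t5000C5004E449A98d0 DEGRADED 0 0 4.15K too many errors
    """
    for name, count in devices_with_cksum_errors(sample).items():
        print(name, count)
```

Because it splits on whitespace, the same function works on both the flattened and the indented form of the output.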


If I boot the OI Live CD and import the pool, everything is fine after a scrub.