I am having trouble getting my Supermicro X8SI6-F motherboard to use SATA disks connected to its LSI SAS2008 SAS/SATA (RAID) controller in Just a Bunch Of Disks (JBOD) mode, also known as IT (Initiator Target) mode.
I am using Linux (Fedora 14) with the latest kernel (2.6.35.14-95.fc14.x86_64).
I have downloaded and installed the latest LSI SAS2008 firmware and BIOS from Supermicro (ftp://ftp.supermicro.com/driver/SAS/LSI/2008/IT/Firmware/PH10.zip).
I have also downloaded, compiled and installed the latest mpt2sas driver (both from Supermicro and LSI - they are the same).
I connected two Western Digital RE4-GP 2TB drives (http://www.wdc.com/en/products/products.aspx?id=40).
I have also tried some old Maxtor 160GB drives, with similar results.
I checked that the LSI BIOS reported them as directly attached.
I successfully formatted both drives from the BIOS.
The kernel appears to detect them at boot time (as sdf and sdg):
> Sep 16 20:39:41 server kernel: [ 6.037674] mpt2sas version 10.00.00.00 loaded
> Sep 16 20:39:41 server kernel: [ 6.038140] scsi6 : Fusion MPT SAS Host
> Sep 16 20:39:41 server kernel: [ 6.038715] mpt2sas 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> Sep 16 20:39:41 server kernel: [ 6.039028] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8187144 kB)
> Sep 16 20:39:41 server kernel: [ 6.039638] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 51
> Sep 16 20:39:41 server kernel: [ 6.039902] mpt2sas0: iomem(0x00000000fb43c000), mapped(0xffffc90011a40000), size(16384)
> Sep 16 20:39:41 server kernel: [ 6.040393] mpt2sas0: ioport(0x000000000000c000), size(256)
> Sep 16 20:39:41 server kernel: [ 6.114914] mpt2sas0: sending message unit reset !!
> Sep 16 20:39:41 server kernel: [ 6.117934] mpt2sas0: message unit reset: SUCCESS
> Sep 16 20:39:41 server kernel: [ 6.167613] mpt2sas0: Allocated physical memory: size(8051 kB)
> Sep 16 20:39:41 server kernel: [ 6.167884] mpt2sas0: Current Controller Queue Depth(3577), Max Controller Queue Depth(3712)
> Sep 16 20:39:41 server kernel: [ 6.168356] mpt2sas0: Scatter Gather Elements per IO(128)
> Sep 16 20:39:41 server kernel: [ 6.228980] mpt2sas0: LSISAS2008: FWVersion(10.00.02.00), ChipRevision(0x02), BiosVersion(07.19.00.00)
> Sep 16 20:39:41 server kernel: [ 6.229459] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
> Sep 16 20:39:41 server kernel: [ 6.230700] mpt2sas0: sending port enable !!
> Sep 16 20:39:41 server kernel: [ 6.233564] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5003048000771f30), phys(8)
> Sep 16 20:39:41 server kernel: [ 6.241981] mpt2sas0: port enable: SUCCESS
> Sep 16 20:39:41 server kernel: [ 6.249474] scsi 6:0:0:0: Direct-Access ATA WDC WD2002FYPS-0 1G01 PQ: 0 ANSI: 6
> Sep 16 20:39:41 server kernel: [ 6.250033] scsi 6:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221100000000), phy(0), device_name(0x50014ee25b264649)
> Sep 16 20:39:41 server kernel: [ 6.250505] scsi 6:0:0:0: SATA: enclosure_logical_id(0x5003048000771f30), slot(0)
> Sep 16 20:39:41 server kernel: [ 6.251204] scsi 6:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
> Sep 16 20:39:41 server kernel: [ 6.254996] scsi 6:0:0:0: serial_number( WD-WCAVY6988280)
> Sep 16 20:39:41 server kernel: [ 6.255255] scsi 6:0:0:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
> Sep 16 20:39:41 server kernel: [ 6.262965] scsi 6:0:1:0: Direct-Access ATA WDC WD2002FYPS-0 1G01 PQ: 0 ANSI: 6
> Sep 16 20:39:41 server kernel: [ 6.263445] scsi 6:0:1:0: SATA: handle(0x000a), sas_addr(0x4433221101000000), phy(1), device_name(0x50014ee205d20340)
> Sep 16 20:39:41 server kernel: [ 6.263926] scsi 6:0:1:0: SATA: enclosure_logical_id(0x5003048000771f30), slot(1)
> Sep 16 20:39:41 server kernel: [ 6.264528] scsi 6:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
> Sep 16 20:39:41 server kernel: [ 6.268400] scsi 6:0:1:0: serial_number( WD-WCAVY6979293)
> Sep 16 20:39:41 server kernel: [ 6.268670] scsi 6:0:1:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
> Sep 16 20:39:41 server kernel: [ 6.269570] sd 6:0:0:0: Attached scsi generic sg5 type 0
> Sep 16 20:39:41 server kernel: [ 6.270045] sd 6:0:1:0: Attached scsi generic sg6 type 0
> Sep 16 20:39:41 server kernel: [ 6.273153] sd 6:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
> Sep 16 20:39:41 server kernel: [ 6.273726] sd 6:0:1:0: [sdg] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
> Sep 16 20:39:41 server kernel: [ 6.292464] sd 6:0:0:0: [sdf] Write Protect is off
> Sep 16 20:39:41 server kernel: [ 6.292931] sd 6:0:1:0: [sdg] Write Protect is off
> Sep 16 20:39:41 server kernel: [ 6.299462] sd 6:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 16 20:39:41 server kernel: [ 6.299963] sd 6:0:1:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 16 20:39:41 server kernel: [ 6.329423] sdf:
> Sep 16 20:39:41 server kernel: [ 6.329597] sdg: unknown partition table
> Sep 16 20:39:41 server kernel: [ 6.330713] unknown partition table
> Sep 16 20:39:41 server kernel: [ 6.393289] sd 6:0:1:0: [sdg] Attached SCSI disk
> Sep 16 20:39:41 server kernel: [ 6.396883] sd 6:0:0:0: [sdf] Attached SCSI disk
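As a sanity check, the size the kernel reports is consistent with a full 2TB drive, so the capacity itself appears to be detected correctly. A minimal sketch of the arithmetic in Python, using only the block count and block size from the log above; the `capacity` helper is a hypothetical name, and truncating to two decimals is an assumption chosen because it reproduces the figures in the log:

```python
import math

# Values copied from the "[sdf] 3907029168 512-byte logical blocks" log line.
LOGICAL_BLOCKS = 3907029168
BLOCK_SIZE = 512  # bytes per logical block


def truncate2(value):
    """Truncate (not round) to two decimal places."""
    return math.floor(value * 100) / 100


def capacity(blocks, block_size):
    """Return (total bytes, decimal terabytes, binary tebibytes)."""
    total = blocks * block_size
    return total, truncate2(total / 10**12), truncate2(total / 2**40)


total, tb, tib = capacity(LOGICAL_BLOCKS, BLOCK_SIZE)
print(f"{total} bytes = {tb:.2f} TB = {tib:.2f} TiB")
```

This gives 2000398934016 bytes, which matches the "(2.00 TB/1.81 TiB)" in the log.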
However, when I write to one of them with
> dd if=/dev/zero bs=4M of=/dev/sdf
things go wrong.
I get the following messages over and over:
> Sep 16 20:41:19 server kernel: [ 115.875188] sd 6:0:0:0: attempting task abort! scmd(ffff88021ee01400)
> Sep 16 20:41:19 server kernel: [ 115.875194] sd 6:0:0:0: [sdf] CDB: Write(10): 2a 00 00 00 0c 00 00 00 08 00
> Sep 16 20:41:19 server kernel: [ 115.875211] scsi target6:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
> Sep 16 20:41:19 server kernel: [ 115.875216] scsi target6:0:0: enclosure_logical_id(0x5003048000771f30), slot(0)
> Sep 16 20:41:19 server kernel: [ 115.875226] mpt2sas0: sending diag reset !!
> Sep 16 20:41:19 server kernel: [ 116.026998] mpt2sas0: diag reset: FAILED
> Sep 16 20:41:19 server kernel: [ 116.027005] sd 6:0:0:0: task abort: FAILED scmd(ffff88021ee01400)
until it finally gives up and says, over and over:
> Sep 16 20:41:24 server kernel: [ 121.190971] sd 6:0:0:0: Device offlined - not ready after error recovery
then
> Sep 16 20:41:24 server kernel: [ 121.191104] sd 6:0:0:0: [sdf] Unhandled error code
> Sep 16 20:41:24 server kernel: [ 121.191109] sd 6:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Sep 16 20:41:24 server kernel: [ 121.191117] sd 6:0:0:0: [sdf] CDB: Write(10): 2a 00 00 00 0c 00 00 00 08 00
> Sep 16 20:41:24 server kernel: [ 121.191137] end_request: I/O error, dev sdf, sector 3072
> Sep 16 20:41:24 server kernel: [ 121.191144] Buffer I/O error on device sdf, logical block 384
> Sep 16 20:41:24 server kernel: [ 121.191149] lost page write due to I/O error on sdf
and over and over
> Sep 16 20:41:24 server kernel: [ 121.191167] sd 6:0:0:0: rejecting I/O to offline device
and
> Sep 16 20:41:24 server kernel: [ 121.206643] sd 6:0:0:0: [sdf] Unhandled error code
> Sep 16 20:41:24 server kernel: [ 121.206646] sd 6:0:0:0: [sdf] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Sep 16 20:41:24 server kernel: [ 121.206651] sd 6:0:0:0: [sdf] CDB: Write(10): 2a 00 00 00 64 10 00 00 08 00
> Sep 16 20:41:24 server kernel: [ 121.206662] end_request: I/O error, dev sdf, sector 25616
> Sep 16 20:41:24 server kernel: [ 121.206666] Buffer I/O error on device sdf, logical block 3202
> Sep 16 20:41:24 server kernel: [ 121.206668] lost page write due to I/O error on sdf
> Sep 16 20:41:24 server kernel: [ 121.206692] sd 6:0:0:0: [sdf] Unhandled error code
> Sep 16 20:41:24 server kernel: [ 121.206695] sd 6:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Sep 16 20:41:24 server kernel: [ 121.206699] sd 6:0:0:0: [sdf] CDB:
> Sep 16 20:41:24 server kernel: [ 121.206701] sd 6:0:0:0: rejecting I/O to offline device
> Sep 16 20:41:24 server kernel: [ 121.206704] Write(10): 2a 00 00 00
> Sep 16 20:41:24 server kernel: [ 121.206709] sd 6:0:0:0: rejecting I/O to offline device
> Sep 16 20:41:24 server kernel: [ 121.206711] 08 00 00 04 00 00
> Sep 16 20:41:24 server kernel: [ 121.206717] end_request: I/O error, dev sdf, sector 2048
> Sep 16 20:41:24 server kernel: [ 121.206720] Buffer I/O error on device sdf, logical block 256
> Sep 16 20:41:24 server kernel: [ 121.206722] lost page write due to I/O error on sdf
> Sep 16 20:41:24 server kernel: [ 121.206726] Buffer I/O error on device sdf, logical block 257
> ...
> Sep 16 20:41:24 server kernel: [ 121.206898] sd 6:0:0:0: [sdf] Unhandled error code
> Sep 16 20:41:24 server kernel: [ 121.206901] sd 6:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Sep 16 20:41:24 server kernel: [ 121.206905] sd 6:0:0:0: [sdf] CDB: Write(10): 2a 00 00 00 04 00
> Sep 16 20:41:24 server kernel: [ 121.206914] sd 6:0:0:0: rejecting I/O to offline device
and on and on.
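The CDB bytes in the errors above can be decoded to check which sectors the failing writes were aimed at. A minimal sketch in Python, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, big-endian logical block address in bytes 2-5, transfer length in blocks in bytes 7-8); `decode_write10` is a hypothetical helper name, not part of any existing tool:

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a complete WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, length


# CDBs copied from the log; the third one is reassembled from the two
# fragments that were interleaved with "rejecting I/O" messages.
for cdb in (
    "2a 00 00 00 0c 00 00 00 08 00",
    "2a 00 00 00 64 10 00 00 08 00",
    "2a 00 00 00 08 00 00 04 00 00",
):
    lba, length = decode_write10(cdb)
    print(f"LBA {lba}, {length} blocks ({length * 512} bytes)")
```

The decoded addresses (3072, 25616 and 2048) agree with the sector numbers in the end_request lines, and dividing them by 8 gives the "logical block" numbers in the Buffer I/O error lines (384, 3202 and 256). The last CDB in the log is cut off after six bytes, so it cannot be decoded this way.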