I'm cross-posting this with the FreeNAS forums.
Hoping someone here might know anything about this.
Today I noticed one of my FreeNAS servers was in a degraded state. It seems I found out a bit late because my email client moved the alert messages to "clutter" (sigh). Anyway, I'm just trying to determine if anyone might see something other than a disk issue.
When I logged into the web GUI (and initially in the zpool status shown below), the disk showed a few hundred write errors. (Mind you, I've seen something like this when a disk falls out of a ZFS RAID while data is being written; it retries for a while before it realizes the disk isn't available.)
The volume initially showed the disk as unavailable (see the zpool status output below).
I rebooted the server, but no change. So I had someone on site remove the disk for me, first to check the S/N and second to see if I could online it and have it rebuild itself. After removal, the status of the disk changed to "removed". Subsequent reboots of the server have made the volume show as "resilvering", but the disk never came online, even after trying to force it online with the zpool online command. The disk initially shows as "unavailable" after reboot and during resilvering, but it is now back to showing "removed".
The errors at the top of the dmesg output below look a lot like the ones in THIS thread.
In dmesg it shows:
Code:
(da5:mps0:0:13:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 555 command timeout cm 0xffffff8000b02718 ccb 0xfffffe004101f000
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xffffff8000b02718
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 93 50 00 00 40 00 length 32768 SMID 337 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 93 10 00 00 40 00 length 32768 SMID 363 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 92 d0 00 00 40 00 length 32768 SMID 841 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 92 90 00 00 40 00 length 32768 SMID 220 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 92 50 00 00 40 00 length 32768 SMID 748 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 92 10 00 00 40 00 length 32768 SMID 321 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 91 d0 00 00 40 00 length 32768 SMID 515 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 91 90 00 00 40 00 length 32768 SMID 745 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 91 50 00 00 40 00 length 32768 SMID 868 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 8e a0 00 00 40 00 length 32768 SMID 632 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 466 terminated ioc 804b scsi 0 state c xfer 0
mps0: IOCStatus = 0x4b while resetting device 0xf
(da5:mps0:0:13:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da5:mps0:0:13:0): CAM status: Command timeout
(da5:mps0:0:13:0): Retrying command
da5 at mps0 bus 0 scbus0 target 13 lun 0
da5: <ATA TOSHIBA MG03ACA3 FL1A> s/n 53K7K7JPF detached
(da5:mps0:0:13:0): Periph destroyed
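As an aside for anyone triaging a similar wall of mps(4)/CAM messages: the terminated commands can be tallied per device with a short script. This is just a sketch; the embedded sample is an abridged copy of the output above, and the regex is my own, not anything FreeNAS ships.

```python
import re
from collections import Counter

# Abridged copy of the dmesg lines above.
DMESG = """\
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 93 50 00 00 40 00 length 32768 SMID 337 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 10 74 93 10 00 00 40 00 length 32768 SMID 363 terminated ioc 804b scsi 0 state c xfer 0
(da5:mps0:0:13:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 466 terminated ioc 804b scsi 0 state c xfer 0
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xffffff8000b02718
"""

# Match "(da5:mps0:0:13:0): WRITE(10). ... terminated" style lines and
# tally terminated commands per (peripheral, operation) pair.
pattern = re.compile(
    r"^\((?P<periph>\w+):mps\d+:.*?\): (?P<op>[A-Z ()0-9]+)\..*terminated"
)

counts = Counter()
for line in DMESG.splitlines():
    m = pattern.match(line)
    if m:
        counts[(m.group("periph"), m.group("op").strip())] += 1

for (periph, op), n in counts.items():
    print(f"{periph}: {n} x {op} terminated")
```

If one peripheral dominates the tally, as da5 does here, that points at a single-disk (or single-slot) problem rather than a controller-wide one.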
Code:
[root@freenas] ~# zpool status -v store
  pool: store
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h26m with 0 errors on Sun Jul 19 00:26:39 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        store                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/1c383e96-d315-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/90b50eaf-d315-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/284a6fc3-d316-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/c66e0391-d317-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            gptid/14a02475-d318-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
            559548462891584750                          UNAVAIL      3   246     0  was /dev/gptid/5178ef38-d319-11e4-98c7-0cc47a335ac4
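For what it's worth, the config section of zpool status is easy to scan programmatically for members that aren't ONLINE. A minimal sketch (the embedded sample is abridged from the output above; the helper name is mine):

```python
# Flag vdevs in a `zpool status` config section that are not ONLINE,
# along with their READ/WRITE/CKSUM error counts.
STATUS_CONFIG = """\
NAME                                            STATE     READ WRITE CKSUM
store                                           DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    gptid/1c383e96-d315-11e4-98c7-0cc47a335ac4  ONLINE       0     0     0
    559548462891584750                          UNAVAIL      3   246     0  was /dev/gptid/5178ef38-d319-11e4-98c7-0cc47a335ac4
"""

STATES = {"ONLINE", "DEGRADED", "UNAVAIL", "OFFLINE", "REMOVED", "FAULTED"}

def unhealthy(config: str):
    """Return (name, state, read, write, cksum) for rows not ONLINE."""
    bad = []
    for line in config.splitlines()[1:]:          # skip the header row
        parts = line.split()
        if len(parts) >= 5 and parts[1] in STATES and parts[1] != "ONLINE":
            name, state, rd, wr, ck = parts[:5]
            bad.append((name, state, int(rd), int(wr), int(ck)))
    return bad

for row in unhealthy(STATUS_CONFIG):
    print(row)
```

Here the nonzero WRITE count sits on the single UNAVAIL member while the rest of the raidz2 vdev is clean, which is consistent with one disk dropping out mid-write rather than a pool-wide fault.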
Furthermore, I can't even see the disk in smartctl. It seems the device is being detached, per the dmesg shown above: "(da5:mps0:0:13:0): Periph destroyed". I had hoped to check the SMART readings, but can't since the disk isn't showing up at all.
My gut says the disk went bad. I filed an RMA for it and will go down to check it tomorrow. But perhaps someone might have an idea.
Some more information that I'm sure people will be looking for:
The disks are all the same make/model.
The server was put together in late March/early April.
The server was stress tested using the scripts jgreco posted in the FreeNAS forums somewhere; it had no issues.
This is the first real issue I've had with it.
Specifications:
Code:
Case: SuperMicro CSE-826E16-R1200LPB
Backplane: BPN-SAS2-826EL1
Motherboard: SUPERMICRO MBD-X10SL7-F-O
HBA: onboard LSI 2308 (firmware P16, as recommended by FreeNAS)
CPU: Intel Xeon E3-1231V3
RAM: Crucial CT2KIT102472BD160B (2 x 8GB)
HDD: 6 x Toshiba MG03ACA300 3TB Enterprise SATA
Cable: Norco SFF8087 reverse breakout cable