Constellation ES3 drives retired from Solaris

Eduard · Jun 16, 2014

Hello everyone,
Really happy that the site and forum is back online!
I run into this issue a week ago and STH was not working anymore...so I felt a little lost

Anyways, I just added 6 Constellation ES3 SAS 4TB to my ZFS Server.
The drives are not NEW but they seems in a perfect condition and the warranty expires in 2018.
For some reasons just 2 drives are working...RED LED on the other 4.
I checked the drives in windows and seems everything ok, but no way in Solaris.
When I boot the system the onboard LSI controller recognize all the connected drives but once OmniOS is booted I see a "One or more I/O devices have been retired" message.
So I checked with fmadm faulty if something was wrong and that's the result:

Code:

--------------- ------------------------------------ -------------- ---------

TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 06 23:40:00 e50e15b5-19d4-49d2-f1df-a7c715df3106 DISK-8000-12 Major
Host : server
Platform : X9SRH-7F-7TF Chassis_id : 0123456789
Product_sn :
Fault class : fault.io.disk.over-temperature
Affects : dev:///:devid=id1,sd@n5000c50057b034b3//pci@0,0/pci8086,e08@3/pci15d9,691@0/iport@f0/disk@w5000c50057b034b2,0
faulted and taken out of service
FRU : "Slot 09" (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=5003048000b4aa3f:serial=Z1Z2G3X10000C416CR4X:part=SEAGATE-ST4000NM0023:revision=0003/ses-enclosure=0/bay=8/disk=0)
faulty
Description : A disk's temperature exceeded the limits established by its manufacturer.
Refer to DISK-8000-12 for more information.
Response : None.
Impact : Performance degradation is likely and continued disk operation beyond the temperature threshold can result in disk damage and potential data loss.
Action : Ensure that the system is properly cooled, that all fans are functional, and that there are no obstructions of airflow to the affected disk.

It's weird since I get this message also if the drives are not warm or totally cold, so seems not related to temp...
Another thing I noticed is that the two working drives comes with an "A001" firmware and the others with a "0003" firmware. It's just a coincidence?
I sent a mail to the Seagate support, they don't know the differences between the two firmwares...! I really don't know what to do, Solaris doesn't like my enterprise drives...What do you guys think? I should RMA the not working drives? Thanks in advance.

PS: I'm also having hard time to read the smart info of the retired drives.

Diavuno · Jun 17, 2014

plug them into a windows box, run a diskpart clean. run seatools for windows.

that should let you know where the drives are at including smart.

clean:

open cmd as admin.
diskpart "enter"
list disk "enter"
select disk (number that represents your ES3 [example: "select disk 2"]) "enter"
clean "enter"

When successful it should wipe any MBR/GPT/RAID card sectors.

Eduard · Jun 17, 2014

Thanks a lot for the quick reply.
Actually before moving the drives to Solaris I cleaned them using the LSI drive erase utility.
Btw today I connected them to an HBA on a win machine and finally with SeaTools i'm able to check the smartinfo, the drives seems in a very good condition.
I just cleaned them using diskpart as you suggested but still the same temperature related error.

I have a pool in raid10 with 6 simple constellation CS 3TB, since the day I built the ZFS Server I never had a single problem with them...never imagined to have all these issues with "proper" enterprise drives...

starshooter10 said:
plug them into a windows box, run a diskpart clean. run seatools for windows.

that should let you know where the drives are at including smart.

clean:

open cmd as admin.
diskpart "enter"
list disk "enter"
select disk (number that represents your ES3 [example: "select disk 2"]) "enter"
clean "enter"

When successful it should wipe any MBR/GPT/RAID card sectors.

Diavuno · Jun 17, 2014

update your controller card firmware?

Eduard · Jun 18, 2014

No way...

also after flashing to the latest IT firmware (19.00.00.00 - Apr, 2014) still the same issue, the 2 drives with A001 works but the others with 0003 firmware are always retired...
I tried to install solaris again, just because i'm desperate but I ended up always with the same result.
Could be the expander maybe? But why 2 drives are working?

UPDATE: Connecting the drives direcly to a LSI 9207 HBA, everything works fine...I'm still trying to figure out whats going on...

Seymour · Jun 19, 2014

power issue?

tl;dr

One way to test the power hypothesis is to split the power in your server while changing as little else as possible. Power your expander/backplane using the same PS from the successful drive test.

Do make sure there's a solid direct ground between the two power supplies ( don't trust the IEC or ac outlet connection ). You can use a paperclip to turn on your second power supply w/o a motherboard ( i.e How do I manually turn on an ATX power supply? | techPowerUp and many others )

"When you have eliminated the impossible, whatever remains, however improbable, must be the truth"

mwm2000 · Jun 19, 2014

We had this problem with the Seagate ES3 disks. I would guess they are on firmware version 0003. This has a nasty bug the 'Reference temperature" is set to 40C.

Seagate has released 0004 firmware which corrects this to 60C which is their highest operating temperature as published.

I would suggest you download this firmware as our disks have been fine since we updated the firmware.

Hope this helps.

Eduard · Jun 19, 2014

mwm2000 said:
We had this problem with the Seagate ES3 disks. I would guess they are on firmware version 0003. This has a nasty bug the 'Reference temperature" is set to 40C.

Seagate has released 0004 firmware which corrects this to 60C which is their highest operating temperature as published.

I would suggest you download this firmware as our disks have been fine since we updated the firmware.

Hope this helps.

Thanks a lot man!! After days speaking with the Supermicro and Seagate customer support I finally solved the issue.
I had to reinstall Omnios though. Solaris was retiring the drives also with the new 0004!

Not sure why Seagate Support told me:

the drive you referenced (ES3 4TB with 0003 firmware) does not need a firmware update all is fine and has the most up to date firmware.
You will need to consult with ZFS Server (Solaris). Why you are having issues there is nothing wrong or updates needed for firmware.

I asked them also about the A001 firmware , since 2 drives with this firmware were working fine and they replied:

Unfortunately, we do not receive any notes on firmwares. Below is the only link we have regarding firmwares.

I'm a bit disappointed, I was expecting a much better support from Seagate...especially for enterprise stuff...

Thanks again

Eduard · Jun 20, 2014

After few hours I noticed that the ES3 in the bays of my SC846 chassis were kinda hot..like 60-62 degrees.
Now 3 drives have been retired, but this time because they were really damn hot.
I managed to leave empty the bays over each hdd but of course this is just temporary.
Now the temp is 55-56 degrees but seems I can't fill up all the bays with this kind of drive...
I have 6 stacked Constellation CS 3TB since i built the server and the temp never exceed 45-47 degrees.

Search

Constellation ES3 drives retired from Solaris

Eduard

New Member

Diavuno

Active Member

Eduard

New Member

Diavuno

Active Member

Eduard

New Member

Seymour

New Member

mwm2000

New Member

Eduard

New Member

Eduard

New Member