OOPS - PERC H740 - All drives (16) Unconfigured Bad (Unsupported) after botched sg_format

IonutZ

New Member
Jan 23, 2017
14
7
3
98
Long story short: I SSHed into the machine with the PERC, started sg_format on 16 drives at once in the background, and lost the session at around 25%, lol. Now all the drives show up as Unconfigured Bad (Unsupported) in the BIOS. I thought formats backgrounded with & would be fine, and that I had learned from the botched format the day before (one drive, aborted 1.5% in), which somehow self-healed 10 hours later.

All the drives look like this in BIOS:
[Attached image: 1687349004653.png]

I tried clearing configuration, resetting a few times, no luck. These are all 6TB drives and I was getting a DIF error when I tried to add them to a ZFS pool.

I'm sure someone else has screwed up before (maybe not to the same order of magnitude) - but what did you do in that case? What would you do right now? :)

Cheers!

(Also can no longer see them at the OS level, PERC H740 is in Enhanced HBA mode currently)
 

sko

Active Member
Jun 11, 2021
246
129
43
"sg_formatted 16 drives at once in the background, and lost the session around 25%"
That doesn't matter. sg_format issues a SCSI command; everything after that is handled by the drive firmware. sg_format only polls the drive and reports the status. If you'd read the sg_format manpage:
Code:
NOTES
[...]
When the --format, --preset=ID or --tape=FM option is given without the
--wait option then the corresponding SCSI command is issued with the
IMMED bit set which causes the SCSI command to return after it has
started the format operation. The --early option will cause sg_format
to exit at that point. Otherwise the DEVICE is polled every 60 seconds
or every 10 seconds if FFMT is non-zero. The poll is with TEST UNIT
READY or REQUEST SENSE commands until one reports an "all clear" (i.e.
the format operation has completed). Normally these polling commands
will result in a progress indicator (expressed as a percentage) being
output to the screen. If the user gets bored watching the progress
report then sg_format process can be terminated (e.g. with control-C)
without affecting the format operation which continues. However a
target or device reset (or a power cycle) will probably cause the
format to cease and the DEVICE to become "format corrupt".
So what probably killed the formatting process was you rebooting the host.
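
Since the format keeps running inside the drive, you can check on it again after losing the session instead of starting over. A minimal sketch, assuming sg3_utils is installed and that your version's sg_turs reports the progress indication with --progress; /dev/sg2 and /dev/sg3 are placeholder device names, check_progress is a made-up helper, and nothing is sent to a drive unless you set DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: poll format progress on drives whose format is already running.
# Assumption: sg3_utils is installed and the drives are visible as /dev/sgN.
# DRY_RUN=1 (the default) only prints the commands instead of running them.

DRY_RUN="${DRY_RUN:-1}"

# check_progress is an illustrative helper, not part of sg3_utils.
check_progress() {
    for dev in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "sg_turs --progress $dev"
        else
            # TEST UNIT READY with progress reporting; does not restart the format.
            sg_turs --progress "$dev"
        fi
    done
}

# Placeholder device names; substitute your own.
check_progress /dev/sg2 /dev/sg3
```

This only helps while the OS can still see the drives, which is not the case behind the PERC right now.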

But if you'd read the manpage for sg_format:
Code:
[...] This may leave the disk in a "format corrupt"
state requiring another format to remedy the situation.
And in case "MODE SENSE/SELECT" aren't working any more on those drives: again, RTFM:
Code:
       -F, --format
[...]
When used three times (or more) the preliminary MODE SENSE and
SELECT commands are bypassed, leaving only the initial INQUIRY
and FORMAT UNIT commands. This is for emergency use (e.g. when
the MODE SENSE/SELECT commands are not working) and cannot
change the logical block size.
[...]
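
Put together, the two excerpts suggest the recovery path: issue another format, and fall back to the triple --format form if MODE SENSE/SELECT fail. A rough sketch that only builds the command lines and never runs them; build_format_cmd and its mode names are made up for illustration, /dev/sgX is a placeholder, and --early is the option from the NOTES excerpt above that makes sg_format exit once the format has started:

```shell
#!/bin/sh
# Sketch only: prints sg_format command lines, never executes them.
# build_format_cmd and its "normal"/"emergency" modes are illustrative names.

build_format_cmd() {
    mode="$1"
    dev="$2"
    case "$mode" in
        normal)
            # Regular re-format; --early exits once the drive has started.
            echo "sg_format --format --early $dev"
            ;;
        emergency)
            # --format given three times bypasses MODE SENSE/SELECT (per manpage).
            echo "sg_format --format --format --format $dev"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Placeholder device name; substitute your own.
build_format_cmd normal /dev/sgX
build_format_cmd emergency /dev/sgX
```

Review the printed line and run it by hand against the real device. Formatting destroys all data on the drive, and as quoted above, the emergency form cannot change the logical block size.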

Also: those PERC controllers are a PITA. For low-level operations as well as ZFS just use a standard HBA that doesn't interfere by using proprietary on-disk formats/drive headers and doesn't hide drives just because it can't detect their format.
 

IonutZ

Amazing. Thanks for the response.

"For low-level operations as well as ZFS just use a standard HBA that doesn't interfere by using proprietary..."
What is an example of the kind of standard HBA you mean?

@zac1 lives out of his garage.
 

i386

Well-Known Member
Mar 18, 2016
4,245
1,546
113
34
Germany

IonutZ


Whaaat

Active Member
Jan 31, 2020
315
167
43
Also avoid the "smart" HBAs that Broadcom and Microchip introduced with SAS4 (24 Gbit/s); they can show the same behavior as normal RAID controllers.
Even the SAS3816 (LSI 9500) is smart enough to reply with 'f*ck off, try again later' to the polling requests, lol:
Code:
C:\Dell\Drivers\984W0>perccli64_1910.exe /c0 show all
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Windows 10
Status = Failure
Description = At least one controller is busy, try after sometime


C:\Dell\Drivers\984W0>perccli64_1910.exe /c0 show all
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Windows 10
Controller = 0
Status = Success
Description = None


Basics :
======
Controller = 0
Adapter Type =   SAS3816(A0)
Model = Dell HBA350i Adp
...
"LSI 9400 flashed to IT mode (sas/sata profile) - does that qualify?"
Yep.