Can I resurrect this seemingly dead HGST NVME SN200?

jbv1982

I have acquired a couple of those Cisco HGST SN200 Ultrastar U.2 drives (I think they came out of a Cisco UCS-M5). One of them works great in my Dell and boots right up, but the other throws a PCIe training error on my R730 and R740. My newer gaming PC (14700K/Z790) doesn’t complain about it; it just doesn’t see it.
From what I understand, these can get into a diagnostic mode that bricks them, but they can be recovered. The WD HDM CLI program shows nothing about the failed drive on any system.
These do have a micro USB port on them, which I assume is for diagnostics etc. Is it possible to flash firmware through it or kick the drive out of diagnostic mode? I have a couple of UART/console micro USB cables on the way.
Or is my best bet to find someone with a Cisco machine, or a machine that can accept hot-swapped NVMe drives, and try to get it updated that way?

Thank you!
I really don’t want to return this to the seller because it was $200 for 6.4TB.

Right now my main problem is that I can’t get it detected on anything. I’m using a U.2 to PCIe card, which I know works with the other drive, but if I can’t see the drive in the OS there’s not much I can do.
 

CyklonDX

It sounds like a hardware brick issue.
You can send it to someone like northwestrepair, but it will probably cost you $200... and without a donor board it's going to be really hard.
 

NathanM3

I also have a Cisco SN200 (SN260 HHHL 3.2TB) that is having issues. It doesn't show in dm-cli or nvme list, but /dev/nvme0 exists, lspci shows it on the PCIe bus, and nvme list-subsys -vvvvv does show the drive.
dmesg shows that the drive is continuously resetting.
Code:
nvme nvme0: resetting controller due to persistent internal error
nvme nvme0: 2/0/0 default/read/poll queues
I wonder if your issue is similar to mine, and if you get anywhere with the USB cable. I tried the diagnostic mode command and it just says the device is unavailable.
 

jbv1982

Hah, yours will at least show up! The only thing I get is a failure to boot on the Dell servers, no error and no device on my “normal” PC, and an “unknown usb device descriptor” with a normal micro USB cable. I’ll report back once I try out the UART cables.
 

NathanM3

I also bought a USB UART cable. I'm not sure what baud rate or other settings to use, though, and I couldn't get any output from it.
 

mytime34

HGST/WDC Ultrastar SN200 Enterprise NVMe Recovery – Successful Recovery from Reset Loop / Diagnostic State

Hardware:

* HGST/WDC Ultrastar SN200 7.68TB
* Model: HUSMR7676BDP3Y1
* Firmware: KNGND110
* Initial recovery environment:
  * Dell PowerEdge R740XD
  * Ubuntu Linux
* Final successful recovery environment:
  * Windows workstation (7950X system)
  * Drive moved onto a dedicated PCIe 3.0 U.2 adapter card
  * Adapter provided direct PCIe access to the SSD without enterprise backplane/riser complexity

Original Symptoms:

* Linux repeatedly logged: "resetting controller due to persistent internal error"
* Controller appeared/disappeared every ~4.7 seconds
* No namespaces existed
* No nvmeXn1 device nodes
* BIOS did not see the drive
* Windows initially did not expose storage
* HDM on Linux could not enumerate the device
* Firmware activation attempts via nvme-cli failed
* Drive appeared stuck in recovery/diagnostic/SBL state
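
To put a number on the reset cadence mentioned above, the kernel log can be parsed for the reset message and the gaps between timestamps measured. This is only a rough sketch: it assumes the default dmesg format with a bracketed seconds-since-boot timestamp, and the sample timestamps below are invented for illustration, not copied from the real log.

```python
import re
from statistics import mean

# Matches kernel log lines such as:
# [  123.456789] nvme nvme0: resetting controller due to persistent internal error
RESET_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): resetting controller"
)

def reset_intervals(log_text, ctrl="nvme0"):
    """Return the gaps (seconds) between consecutive reset messages for ctrl."""
    stamps = []
    for line in log_text.splitlines():
        m = RESET_RE.match(line)
        if m and m.group("ctrl") == ctrl:
            stamps.append(float(m.group("ts")))
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]

# Invented sample timestamps, for illustration only.
SAMPLE = """\
[  100.000000] nvme nvme0: resetting controller due to persistent internal error
[  100.900000] nvme nvme0: 2/0/0 default/read/poll queues
[  104.700000] nvme nvme0: resetting controller due to persistent internal error
[  109.400000] nvme nvme0: resetting controller due to persistent internal error
"""

if __name__ == "__main__":
    gaps = reset_intervals(SAMPLE)
    print(gaps, "mean gap:", round(mean(gaps), 2), "s")
```

On a live system the text would come from the output of dmesg rather than from SAMPLE. A steady gap points at a firmware-side restart loop rather than random link drops, which is what the rest of this post ended up suggesting.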

Important Early Findings:
Linux intermittently allowed:

* nvme id-ctrl
* fw-log
* firmware download transport
* valid identify data:
  * SN: SDM0000882DA
  * Model: HUSMR7676BDP3Y1
  * FW: KNGND110

Firmware package inspection:

* KNGND122.bin was NOT a raw firmware image
* It was a packaged/containerized enterprise firmware bundle
* Package contained:
  * FWHEADER.bin
  * PROC0-15.bin
  * SECURITY.bin
  * FCC.bin
  * StringTable.csv.gz

Extracted strings strongly suggested recovery/diagnostic behavior:

* "SYS: Go into SBL mode"
* "SYS: Crash Occurred"
* "Overlay Init Done"
* "Error: Invalid Overlay"
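
For anyone who wants to repeat the string inspection, something along these lines works on any binary blob. It is a minimal sketch that mimics the Unix strings tool; the sample bytes are a synthetic stand-in built from the messages quoted above, not real firmware data, and the keyword list is just my guess at what is worth searching for.

```python
import re

def extract_strings(blob, min_len=6):
    """Return runs of at least min_len printable ASCII bytes, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]

def find_markers(blob, keywords=("SBL", "Crash", "Overlay")):
    """Return the extracted strings that contain any of the keywords."""
    return [s for s in extract_strings(blob) if any(k in s for k in keywords)]

# Synthetic stand-in for a firmware component, for illustration only.
SAMPLE_BLOB = (
    b"\x00\x01\xff" + b"SYS: Go into SBL mode" + b"\x00\x00"
    + b"\x7f\x80" + b"Overlay Init Done" + b"\x00"
    + b"abc" + b"\x00"                      # too short, ignored
    + b"Error: Invalid Overlay" + b"\x00"
)

if __name__ == "__main__":
    for s in find_markers(SAMPLE_BLOB):
        print(s)
```

Against a real file, read it with open(path, "rb").read() and pass the bytes in. If StringTable.csv.gz is an ordinary gzip file (which its name suggests but I have not verified here), Python's gzip module should open it directly.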

Key Discovery:
The controller itself was NOT dead.

Evidence:

* PCIe enumeration always worked
* Controller firmware executed repeatedly
* NVMe admin queues initialized repeatedly
* Firmware management subsystem remained functional
* Firmware slot support existed
* Controller validated firmware structures

Linux Recovery Attempts:
Tried:

* nvme fw-download
* nvme fw-activate
* namespace commands
* PCIe ASPM disable
* APST disable
* PCIe secondary bus reset
* HDM on Linux

Results:

* Firmware transport succeeded once
* Activation failed with invalid image
* Linux HDM could never enumerate device
* PCIe bridge reset triggered GHES fatal hardware error
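
For reference, the firmware-related attempts above map onto nvme-cli invocations roughly like the ones this sketch builds. It only prints the command lines and never runs them. The device path, the slot number and the commit action are placeholders rather than the exact values used, and the kernel parameters are just one common way to turn off ASPM and APST, so treat them as an assumption about how that step was done.

```python
import shlex

DEV = "/dev/nvme0"       # placeholder controller node
IMAGE = "KNGND122.bin"   # the firmware package named in this post

# Commit actions accepted by `nvme fw-commit --action`
REPLACE_ONLY = 0           # store downloaded image in the slot, do not activate
REPLACE_AND_ACTIVATE = 1   # store image, activate at the next controller reset
ACTIVATE_EXISTING = 2      # activate the image already in the slot at next reset

# One common way to disable PCIe ASPM and NVMe APST from the kernel command line
KERNEL_PARAMS = ["pcie_aspm=off", "nvme_core.default_ps_max_latency_us=0"]

def inspect_cmds(dev):
    """Read-only commands: identify data and the firmware slot log."""
    return [["nvme", "id-ctrl", dev], ["nvme", "fw-log", dev]]

def fw_download_cmd(dev, image):
    """Transfer a firmware image to the controller."""
    return ["nvme", "fw-download", dev, f"--fw={image}"]

def fw_commit_cmd(dev, slot, action):
    """Commit/activate firmware (fw-activate is the older name of fw-commit)."""
    return ["nvme", "fw-commit", dev, f"--slot={slot}", f"--action={action}"]

def plan(dev=DEV, image=IMAGE, slot=2):
    cmds = inspect_cmds(dev)
    cmds.append(fw_download_cmd(dev, image))
    cmds.append(fw_commit_cmd(dev, slot, REPLACE_AND_ACTIVATE))
    return cmds

if __name__ == "__main__":
    print("kernel params:", " ".join(KERNEL_PARAMS))
    for cmd in plan():
        print(shlex.join(cmd))
```

Since the KNGND122 package was rejected as an invalid image, the nearest nvme-cli equivalent of a plain slot switch would be fw_commit_cmd(dev, slot, ACTIVATE_EXISTING). Whether that would have worked on this drive under Linux is not something I confirmed.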

Critical Hardware Change:
Recovery behavior improved dramatically after:

* moving the drive out of the Dell server
* installing it into a Windows workstation
* connecting it through a dedicated PCIe 3.0 U.2 adapter card

This appeared to provide:

* cleaner direct PCIe access
* more stable PCIe initialization
* better compatibility with HGST HDM tooling
* fewer enterprise backplane/PLX complications

Most Important Discovery:
Moving to Windows completely changed recovery behavior.

Windows Findings:

* Device Manager successfully detected: "WD Ultrastar SN2xx PCIe SSD Controller"
* HDM scan succeeded
* HDM firmware management worked
* Firmware slots became visible

Initial firmware slot state:

* Running from Slot 5

Firmware slots:

* Slot 1 (RO)
* Slot 2
* Slot 3
* Slot 4
* Slot 5

All reported KNGND110 firmware.
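
The same slot picture can in principle be read from the standard NVMe Firmware Slot Information log page (log ID 03h), for example from the raw output of nvme fw-log with --raw-binary. Below is a small decoder sketch based on the spec layout: byte 0 is the Active Firmware Info field and each slot revision is an 8-byte ASCII field starting at byte 8. The sample page is built by hand to resemble the state described here; it is not a dump from the actual drive.

```python
def parse_fw_slot_log(page):
    """Decode an NVMe Firmware Slot Information log page (log ID 0x03)."""
    if len(page) < 64:
        raise ValueError("need at least the first 64 bytes of the log page")
    afi = page[0]
    active_slot = afi & 0x07          # bits 2:0, slot the running image came from
    next_slot = (afi >> 4) & 0x07     # bits 6:4, slot to activate at next reset
    slots = {}
    for n in range(1, 8):             # FRS1..FRS7 live at bytes 8*n .. 8*n+7
        raw = bytes(page[8 * n : 8 * n + 8])
        rev = raw.decode("ascii", errors="replace").rstrip("\x00 ")
        if rev:
            slots[n] = rev
    return active_slot, next_slot, slots

def make_sample_page():
    """Hand-built page: running from slot 5, slots 1-5 all KNGND110."""
    page = bytearray(512)
    page[0] = 0x05
    for n in range(1, 6):
        page[8 * n : 8 * n + 8] = b"KNGND110"
    return bytes(page)

if __name__ == "__main__":
    active, pending, slots = parse_fw_slot_log(make_sample_page())
    print("active slot:", active, "pending slot:", pending)
    for n, rev in sorted(slots.items()):
        print(f"slot {n}: {rev}")
```

A non-zero pending slot after an activate command is a handy sanity check that the request was accepted before rebooting.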

Critical Recovery Step:
Activated alternate firmware slots using HDM.

Commands:

* activate Slot 2
* reboot
* activate Slot 3
* reboot
* eventually stable on Slot 4

Major Behavioral Changes:
Before:

* endless reset loops
* no namespace
* BIOS invisible
* no disk exposure

After slot changes:

* BIOS detected drive
* Windows detected drive
* Namespace Count became 1
* Full 7.68TB capacity exposed
* Disk became operational and formattable
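
If the drive later goes back into a Linux box, a quick way to confirm the "after" state is to check that a namespace node now sits next to the controller node. This is a small sketch that classifies device names; the names in the example are made up, and on a real system the input would be something like os.listdir("/dev").

```python
import re

CTRL_RE = re.compile(r"^nvme(\d+)$")        # controller node, e.g. nvme0
NS_RE = re.compile(r"^nvme(\d+)n(\d+)$")    # namespace node, e.g. nvme0n1

def summarize(names):
    """Map controller index -> sorted list of namespace IDs found for it."""
    result = {}
    for name in names:
        m = CTRL_RE.match(name)
        if m:
            result.setdefault(int(m.group(1)), [])
            continue
        m = NS_RE.match(name)
        if m:
            result.setdefault(int(m.group(1)), []).append(int(m.group(2)))
    return {ctrl: sorted(ns) for ctrl, ns in result.items()}

if __name__ == "__main__":
    # Made-up listing: nvme0 has no namespace, nvme1 has one.
    example = ["sda", "nvme0", "nvme1", "nvme1n1", "nvme1n1p1"]
    for ctrl, ns in sorted(summarize(example).items()):
        state = f"namespaces {ns}" if ns else "NO namespaces (reset loop / recovery state?)"
        print(f"nvme{ctrl}: {state}")
```

A controller with an empty namespace list matches the broken state described under Original Symptoms; partition nodes such as nvme1n1p1 are deliberately ignored.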

Final Stable State:
HDM reports:

* Running Firmware Version = KNGND110 (Loaded from Slot 4)
* Namespace Count = 1
* Capacity = 7681501126656 (bytes, i.e. 7.68 TB)
* Stable PCIe Gen3 x4 link
* Stable controller enumeration
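
As a sanity check on the reported capacity, the raw byte count can be converted to decimal and binary units and to a logical block count. This is plain arithmetic on the number above; whether the namespace is formatted with 512-byte or 4096-byte blocks is not something I checked, so both are shown.

```python
CAPACITY_BYTES = 7681501126656  # value reported by HDM in this post

def describe(nbytes):
    """Convert a byte count to TB, TiB and logical block counts."""
    return {
        "TB": nbytes / 10**12,        # decimal terabytes, as marketed
        "TiB": nbytes / 2**40,        # binary tebibytes, as many OS tools show
        "lba_512": nbytes // 512,     # block count if formatted with 512 B LBAs
        "lba_4096": nbytes // 4096,   # block count if formatted with 4 KiB LBAs
    }

if __name__ == "__main__":
    info = describe(CAPACITY_BYTES)
    print(f"{info['TB']:.2f} TB, {info['TiB']:.2f} TiB")
    print("512 B blocks:", info["lba_512"], "4 KiB blocks:", info["lba_4096"])
```

The byte count divides evenly by both block sizes, which is what you would expect from a fully exposed namespace rather than a truncated one.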

Final Conclusion:
The drive was NOT physically dead.

Root cause appears to have been:

* bad operational runtime firmware slot/state
* failed namespace/FTL initialization
* controller trapped in recovery/fallback runtime bank (Slot 5)

Switching to alternate operational slots restored:

* namespace initialization
* BIOS visibility
* stable storage exposure
* normal operation

Important Notes:

* KNGND122 firmware package could NOT be directly loaded using this HDM version
* Slot activation alone restored operation
* Do NOT assume these drives are dead simply because:
  * BIOS cannot see them
  * namespaces are missing
  * Linux shows reset loops

Windows HDM recovery plus direct PCIe access through a dedicated PCIe 3.0 U.2 adapter card was the key breakthrough.