I bought 4x drives from THIS deals post (which I posted).
3x work fine.
1x is...strange.
The 3x that work were DEFINITELY used for "read intensive" workloads.
Code:
Data Units Read: 109,454,739,956 [56.0 PB]
Data Units Written: 577,844,157 [295 TB]
Host Read Commands: 88,021,380,192
Host Write Commands: 653,025,118
...
Power On Hours: 42,956
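For a sense of scale: the NVMe spec counts Data Units in thousands of 512-byte units, so one unit is 512,000 bytes. Below is a quick Python sketch that reproduces smartctl's bracketed PB/TB figures from the raw counters and derives a read/write ratio plus a drive-writes-per-day (DWPD) figure. The DWPD framing is my own assumption: it averages over power-on hours rather than calendar time and uses the 3,820,752,101,376-byte namespace size reported in the full smartctl output.

```python
# NVMe "Data Units" are reported in thousands of 512-byte units.
DATA_UNIT_BYTES = 512 * 1000

# Counters copied from the smartctl excerpt above.
data_units_read = 109_454_739_956
data_units_written = 577_844_157
power_on_hours = 42_956
capacity_bytes = 3_820_752_101_376  # "Namespace 1 Size/Capacity" from smartctl


def to_bytes(units: int) -> int:
    """Convert an NVMe Data Units counter to bytes."""
    return units * DATA_UNIT_BYTES


read_bytes = to_bytes(data_units_read)
written_bytes = to_bytes(data_units_written)
days = power_on_hours / 24  # assumption: average over powered-on time only

print(f"read:    {read_bytes / 1e15:.1f} PB")
print(f"written: {written_bytes / 1e12:.1f} TB")
print(f"read/write ratio: {read_bytes / written_bytes:.0f}:1")
# Drive writes per day: full-capacity writes per powered-on day.
print(f"DWPD: {written_bytes / capacity_bytes / days:.3f}")
```

With these counters it works out to roughly 56.0 PB read versus about 295.9 TB written (about 189:1) and roughly 0.043 DWPD, which fits the "read intensive" description.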
But as I said, one of them is acting up.
As boot starts, all 4x have the onboard amber LED lit.
Then the LEDs all go out.
Then they all have the green LED lit.
THEN the odd drive lights its amber LED again after a few seconds.
All 4x drives show the same results with lspci -vv. Same number of active lanes. Same power states. I can't find any differences.
All 4x drives show up with lsblk.
All 4x drives respond to smartctl -a DRIVE. Except... the one weird drive has a bunch of the log values zeroed out.
It also has entries in the error log.
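Eyeballing lspci -vv across four devices is error-prone, so here is a minimal Python sketch that pulls the negotiated link (LnkSta speed and width) for each PCI address and flags any device that differs from the majority. It assumes the usual pciutils layout (device header at column 0, indented LnkSta: line). The function names are hypothetical, and the sample text is hand-written in that layout for illustration, not output from these drives.

```python
import re

# Negotiated link from an "LnkSta:" line. Tolerates the "(ok)"/"(downgraded)"
# annotations that newer pciutils versions append after speed and width.
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+GT/s)[^,]*,\s*Width\s+(x\d+)")


def link_status(lspci_vv: str) -> dict:
    """Map each PCI address in `lspci -vv` text to its negotiated (speed, width)."""
    status = {}
    current = None
    for line in lspci_vv.splitlines():
        if line and not line[0].isspace():
            # Device header lines start at column 0, e.g. "01:00.0 Non-Volatile ..."
            current = line.split()[0]
        elif current is not None:
            match = LNKSTA_RE.search(line)
            if match:
                status[current] = (match.group(1), match.group(2))
    return status


def odd_ones_out(status: dict) -> dict:
    """Return devices whose (speed, width) differs from the most common value."""
    values = list(status.values())
    if not values:
        return {}
    common = max(set(values), key=values.count)
    return {addr: link for addr, link in status.items() if link != common}


# Hand-written sample in lspci's layout (illustrative, not from these drives).
sample = (
    "01:00.0 Non-Volatile memory controller: example device\n"
    "\t\tLnkCap:\tPort #0, Speed 8GT/s, Width x4\n"
    "\t\tLnkSta:\tSpeed 8GT/s (ok), Width x4 (ok)\n"
    "02:00.0 Non-Volatile memory controller: example device\n"
    "\t\tLnkSta:\tSpeed 8GT/s (ok), Width x4 (ok)\n"
    "03:00.0 Non-Volatile memory controller: example device\n"
    "\t\tLnkSta:\tSpeed 8GT/s (downgraded), Width x2 (downgraded)\n"
)

print(odd_ones_out(link_status(sample)))
```

On a live box you would pass it the text of lspci -vv run as root; without root, lspci usually hides the capability blocks that contain LnkSta. An empty result from odd_ones_out would match what was observed here: identical links on all four drives.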
Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: HUSPR3238ADP301
Serial Number: STM0001A9185
Firmware Version: KMGNP131
PCI Vendor/Subsystem ID: 0x1c58
IEEE OUI Identifier: 0x000cca
Controller ID: 3
NVMe Version: <1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 3,820,752,101,376 [3.82 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000cca 0060074b80
Local Time is: Thu Oct 24 22:33:41 2024 EDT
Firmware Updates (0x09): 4 Slots, Slot 1 R/O
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x01): S/H_per_NS
Supported Power States
St Op      Max   Active     Idle RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W        -        -  0  0  0  0    15000   15000
 1 +    20.00W        -        -  1  1  1  1    15000   15000
 2 +    15.00W        -        -  2  2  2  2    15000   15000
 3 +    10.00W        -        -  3  3  3  3    15000   15000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data Metadt Rel_Perf
 0 +     512      0        0
 1 -     512      8        2
 2 -    4096      0        0
 3 -    4096      8        1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 109,454,739,956 [56.0 PB]
Data Units Written: 577,844,157 [295 TB]
Host Read Commands: 88,021,380,192
Host Write Commands: 653,025,118
Controller Busy Time: 2,403,146
Power Cycles: 78
Power On Hours: 42,956
Unsafe Shutdowns: 72
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: HUSPR3238ADP301
Serial Number: CJH001000DB6
Firmware Version: KMGNP131
PCI Vendor/Subsystem ID: 0x1c58
IEEE OUI Identifier: 0x000cca
Controller ID: 3
NVMe Version: <1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 3,820,752,101,376 [3.82 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000cca 0061164801
Local Time is: Thu Oct 24 22:33:57 2024 EDT
Firmware Updates (0x09): 4 Slots, Slot 1 R/O
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x01): S/H_per_NS
Supported Power States
St Op      Max   Active     Idle RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W        -        -  0  0  0  0    15000   15000
 1 +    20.00W        -        -  1  1  1  1    15000   15000
 2 +    15.00W        -        -  2  2  2  2    15000   15000
 3 +    10.00W        -        -  3  3  3  3    15000   15000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data Metadt Rel_Perf
 0 +     512      0        0
 1 -     512      8        2
 2 -    4096      0        0
 3 -    4096      8        1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 0
Data Units Written: 0
Host Read Commands: 0
Host Write Commands: 0
Controller Busy Time: 0
Power Cycles: 0
Power On Hours: 42,290
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 2
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          2     -       -  0xdead      -            0     0     -
  1          1     -       -  0xdead      -            0     0     -
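The "zeroed out" comparison can be made mechanical too. Below is a minimal Python sketch (helper names are hypothetical) that collects the Key: value pairs from the SMART/Health Information section of two smartctl -a outputs and lists the counters that are non-zero on a known-good drive but read 0 on the suspect one. The embedded strings are trimmed excerpts of the two outputs posted above.

```python
from typing import Optional


def health_fields(smartctl_text: str) -> dict:
    """Collect 'Key: value' pairs from the SMART/Health Information section."""
    fields = {}
    in_section = False
    for line in smartctl_text.splitlines():
        if line.startswith("SMART/Health Information"):
            in_section = True
        elif line.startswith("Error Information ("):
            break  # the error-log table follows; stop here
        elif in_section and ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value: str) -> Optional[int]:
    """Parse the first token: '1,234 [..]' -> 1234, '1%' -> 1, '0x00' -> 0."""
    parts = value.split()
    if not parts:
        return None
    token = parts[0].replace(",", "").rstrip("%")
    base = 16 if token.lower().startswith("0x") else 10
    try:
        return int(token, base)
    except ValueError:
        return None


def zeroed_fields(reference: dict, suspect: dict) -> list:
    """Fields that are non-zero on the reference drive but 0 on the suspect drive."""
    return [
        key for key, value in reference.items()
        if leading_number(value) and leading_number(suspect.get(key, "")) == 0
    ]


# Trimmed excerpts of the two smartctl outputs above.
good = """SMART/Health Information (NVMe Log 0x02)
Percentage Used: 1%
Data Units Read: 109,454,739,956 [56.0 PB]
Power Cycles: 78
Power On Hours: 42,956
Unsafe Shutdowns: 72
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)"""

weird = """SMART/Health Information (NVMe Log 0x02)
Percentage Used: 0%
Data Units Read: 0
Power Cycles: 0
Power On Hours: 42,290
Unsafe Shutdowns: 0
Error Information Log Entries: 2
Error Information (NVMe Log 0x01, 16 of 63 entries)"""

print(zeroed_fields(health_fields(good), health_fields(weird)))
```

On these excerpts it prints ['Percentage Used', 'Data Units Read', 'Power Cycles', 'Unsafe Shutdowns']. Power On Hours is the notable exception: it still reads 42,290 on the odd drive while the other lifetime counters read 0. For real drives you would feed it the full smartctl -a text for each device instead of the excerpts.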
The non-working drive just throws a bunch of errors if you try to create a partition on it.
I've tried the normal troubleshooting steps:
- Switch the ports
- Use a different cable/card
The broken one stays the same.
Is this thing just dead?
I'm assuming so.
Any ideas?