Intel SSD DC P4500 15.3TB - "Disable Logical State" "NVMe Status 0x4006"- anything to do ?


netswitch

Member
Sep 24, 2018
44
18
8
Hello

I have a pair of Intel SSD DC P4500 15.3TB drives that were running in a software RAID1 and that just seem to have died.

smartctl returns the following:
Code:
 smartctl -x /dev/nvme0
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE2KX160T7
Serial Number:                      PHLF7366004116PRGN
Firmware Version:                   QDV10161
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Total NVM Capacity:                 69,793,218,560 [69.7 GB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               0
Local Time is:                      Wed Oct 11 07:13:08 2023 EDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W       -        -    0  0  0  0        0       0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x4006
I can see the drive with nvme list-subsys:

Code:
 nvme list-subsys
nvme-subsys0 - NQN=nqn.2014.08.org.nvmexpress:80868086PHLF7366004116PRGN  INTEL SSDPE2KX160T7
\
 +- nvme0 pcie 0000:43:00.0 live
but nothing I try seems to get it back on track.

The drive is only 4 years old. I think it is just a software/firmware issue, but I don't see many people having the same errors.
I also don't know who supports this drive today, Solidigm or Intel. Neither of them lists it as a supported product, and I can't open a ticket with either of them.

Does anybody have an idea of what I could do?
 

gb00s

Well-Known Member
Jul 25, 2018
1,190
602
113
Poland
Oddly, there's no namespace. Usually, with error 0x4006 you should see something like "... disable logical state ...". If so, this is a catastrophic failure of the drive. But both at the same time? Were they set up in a VROC RAID1?
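
For reference, the 0x4006 value can be split into its fields. This is only a minimal sketch, assuming smartctl prints the 15-bit NVMe completion status with the same layout as the Linux kernel's nvme.h (status code in bits 0-7, status code type in bits 8-10, "More" in bit 13, "Do Not Retry" in bit 14). The function name and the small lookup table are made up for illustration and are not part of any tool.

```python
# Hypothetical helper: split an NVMe status value (as printed by smartctl)
# into its fields. The bit layout is an assumption based on the Linux
# kernel's nvme.h definitions.

# A few generic (status code type 0) status codes from the NVMe spec.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x06: "Internal Error",
    0x0B: "Invalid Namespace or Format",
}


def decode_nvme_status(status: int) -> dict:
    sc = status & 0xFF          # status code, bits 0-7
    sct = (status >> 8) & 0x7   # status code type, bits 8-10
    if sct == 0:
        name = GENERIC_STATUS.get(sc, "unknown generic status")
    else:
        name = "non-generic status"
    return {
        "sc": sc,
        "sct": sct,
        "more": bool(status & 0x2000),  # more info available in error log
        "dnr": bool(status & 0x4000),   # Do Not Retry
        "name": name,
    }


if __name__ == "__main__":
    print(decode_nvme_status(0x4006))
```

Under those assumptions, 0x4006 reads as a generic "Internal Error" with the Do Not Retry bit set, i.e. the controller itself refuses the SMART log read.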

The latest Solidigm firmware version for these drives is QDV101D1, and they are still supported. Can you boot their firmware update tool and still update the firmware? I would try that, and if it works, continue with the Solidigm Storage Tool (SST): reset the drive, delete all namespaces, and start over. At least you can give it a try.

EDIT: Check SSD Warranty and RMA Information | Solidigm Warranty
 

netswitch

Member
Sep 24, 2018
44
18
8
Thank you for your input. They were in a simple Linux mdadm RAID1, nothing fancy.
Both died at the same time, yes.

I will look for the Solidigm tools and check if I can do anything.
 

netswitch

Member
Sep 24, 2018
44
18
8
I tested both the Solidigm and the Intel firmware update ISOs, but they don't want to perform the update. I also tested with the SST tool and hit the same issue.
The "Selected drive is in a disable logical state" status seems to prevent the software from working on the drive.

I read that these drives had a huge failure rate. I am convinced this is software related, as two drives failing "at the same" time is very unlikely.
I'll keep digging, but besides an RMA, which I won't qualify for, I guess there is no clear solution.
 

joerambo

New Member
Aug 30, 2023
10
0
1
netswitch said:
I read that these drives had a huge failure rate, I am convinced this is software related as two drives failing "at the same" time is very unlikely.
Indeed. And "about 4 years" smells like yet another 32K hours bug in firmware?
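
A quick sanity check on the "about 4 years" hunch. This is just the arithmetic, assuming "32K hours" means a counter that overflows at 32,768 (2^15) power-on hours and that the drive runs 24/7. Nothing in this thread confirms that the P4500 firmware has such a bug.

```python
# Assumption: a "32K hours" bug would trigger at 2**15 = 32768 power-on
# hours on a drive that is powered continuously.

HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap days


def hours_to_years(hours: float) -> float:
    """Convert continuous power-on hours into years."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{hours_to_years(2 ** 15):.2f} years")
```

That works out to a bit under 4 years of continuous runtime, so the timing would only fit if the drives had really been powered on for that long.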
 

netswitch

Member
Sep 24, 2018
44
18
8
According to the smartctl values, they had "zero" hours of runtime when I plugged them in.
But this is very fishy, as I contacted Intel and Solidigm and neither of them is able to locate the serial numbers in their database
(yet they can identify an HP-branded one).

So out of 4 drives, 3 are dead, and I don't dare restart the server with the surviving one.
Performance-wise it gives the expected results, and the mdadm RAID1 sync completed, so they do have the 15.36TB capacity.

These were eBay purchases from China; trust me, I'll never order any hardware from there again :).
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
netswitch said:
According to the smartctl values, they had "zero" hours of runtime when I plugged them in.
But this is very fishy, as I contacted Intel and Solidigm and neither of them is able to locate the serial numbers in their database
(yet they can identify an HP-branded one).

So out of 4 drives, 3 are dead, and I don't dare restart the server with the surviving one.
Performance-wise it gives the expected results, and the mdadm RAID1 sync completed, so they do have the 15.36TB capacity.

These were eBay purchases from China; trust me, I'll never order any hardware from there again :).
There's your problem.

I've seen a LOT of mentions that no one should ever buy Intel NVMe drives out of China, because they're either fake or the SMART info has been reset, so drives near end of life come across as new.
 

netswitch

Member
Sep 24, 2018
44
18
8
It really looks like some firmware bug: the drive is detected, but the "logical state" prevents access to the data, so yes, instantly dead "by firmware".
These drives have a very high TBW endurance, even if they are considered read-intensive, so I don't think they are worn out.
I have also had heavily used Intel SSDs, and those switch to read-only mode.

The strangest thing is the serial number that is unknown to Intel and Solidigm.

I am aware of an nvme format command that can reset all counters (runtime / boots / wearout) on Samsung SSDs when the firmware is in ERRMOD status.
(https://forums.servethehome.com/index.php?threads/samsung-sm963-960gb-m-2-to-1gb-issue.21405/ )
 

netswitch

Member
Sep 24, 2018
44
18
8
One died while mdadm was still working on the survivor. I did a cold shutdown / reboot, and both drives were dead.
(And on the second pair of drives I have the same case: one of the RAID1 members is dead, but I still have the other running. I am keeping it online as long as possible to run tests.)
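
While the survivor is still online, the array state can be watched from /proc/mdstat. Below is a small sketch that flags degraded arrays by comparing the expected and active member counts in the "[n/m] [U_]" status. The sample text is made up for illustration (it is not output from my server), and the function name is hypothetical.

```python
import re

# Made-up /proc/mdstat sample showing a RAID1 with one failed member.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1[1] nvme0n1[0](F)
      15002931200 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\S+)\s*:")
STATE_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> dict:
    """Return {array_name: member_map} for arrays with missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE_RE.search(line)
        if state and current:
            expected, active = int(state.group(1)), int(state.group(2))
            if active < expected:
                degraded[current] = state.group(3)  # e.g. "_U"
            current = None
    return degraded


if __name__ == "__main__":
    # On a real system, pass the contents of /proc/mdstat instead.
    print(degraded_arrays(SAMPLE_MDSTAT))
```

An underscore in the member map marks the missing member, so "_U" means the first slot is gone and the second is still up.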
 

gb00s

Well-Known Member
Jul 25, 2018
1,190
602
113
Poland
First of all, @Patrick, the STH site is loading suuuuuper slow. A coffee break takes less time.

However, you can check the legitimacy of your drives via the Solidigm link. I was a little cautious because all 25 drives I recently bought on eBay came from a Chinese seller, in the above-mentioned clamshell plastic containers. All had a 'Warranty Expiration' reported as May 2028, which confused me: why would Gen3 drives be produced in May 2023? So I re-checked together with Solidigm, and these were reported as produced earlier but never sold. So they were registered with Solidigm by the Chinese seller as sold for the first time. This would not be possible if the drives had been sold and used before.

Further to the above, Solidigm confirmed in writing that, despite all the internet statements about Chinese sellers resetting SMART data, this is not possible: according to them, SMART data on any SSD/NVMe drive can't be reset. I take the written confirmation as it is. I also take from this that if the warranty check shows the drive was already sold much earlier while the SMART data show zero usage, I would just return the drive to avoid drama at a later stage.
 


netswitch

Member
Sep 24, 2018
44
18
8
Thank you for your input.

I have checked my SSDs against the Solidigm online tool and, as you can see below, no data is available for these serials.
I had Solidigm support on chat; they also checked their database and can't find any reference to my serial numbers.
But yes, we did not check the drives thoroughly when we received them in February; we should have checked the warranty against the Solidigm tool.
solidigmsn1.JPG
Could you possibly PM me the name of your Chinese seller?

As for the SMART data reset, I won't swear to it 100%, but I do remember that on Samsung SSDs, when they have the "960GB M.2 to 1GB issue" and you recover them using "nvme format", all SMART data values are reset. Now, this is for Samsung; maybe Intel put some protection against this in their drives.
 

netswitch

Member
Sep 24, 2018
44
18
8
Nothing on my side so far, but any information you can give about your drive is interesting (model / source / age / whether the serial number is known to Solidigm / runtime hours, ...).
 

RobstarUSA

Active Member
Sep 15, 2016
233
104
43
I also have the same problem with a P4600... Basically, I bought two used on eBay from the same seller. One has a namespace and works fine; I updated its firmware. The other is in the disabled logical state. It shows up as 69.7GB. I've got a message out to my eBay seller, who listed the drive as "used" but "no returns". I'm OK with used/no returns, but DOA is unacceptable.
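
The two reports in this thread share the same signature in the smartctl output: zero namespaces, a tiny reported capacity, and a failed SMART log read. A rough sketch of a check for that pattern follows, assuming the "Key: value" layout shown in the first post. The function names and the "suspect" rule are only a heuristic drawn from these two reports, not an official diagnostic.

```python
import re

# Excerpt of the smartctl -x output posted at the top of this thread.
SAMPLE = """\
Model Number:                       INTEL SSDPE2KX160T7
Total NVM Capacity:                 69,793,218,560 [69.7 GB]
Number of Namespaces:               0
Read NVMe SMART/Health Information failed: NVMe Status 0x4006
"""


def parse_fields(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, splitting at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def summarize_symptoms(text: str) -> dict:
    fields = parse_fields(text)
    ns = re.match(r"\d+", fields.get("Number of Namespaces", ""))
    cap = re.match(r"[\d,]+", fields.get("Total NVM Capacity", ""))
    namespaces = int(ns.group()) if ns else None
    capacity = int(cap.group().replace(",", "")) if cap else None
    smart_failed = "Read NVMe SMART/Health Information failed" in fields
    return {
        "namespaces": namespaces,
        "capacity_bytes": capacity,
        "smart_read_failed": smart_failed,
        # Heuristic only: no namespaces plus a failed SMART read.
        "suspect": namespaces == 0 and smart_failed,
    }


if __name__ == "__main__":
    print(summarize_symptoms(SAMPLE))
```

To try it on a real drive, feed it the saved text output of smartctl -x instead of the sample.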
 

netswitch

Member
Sep 24, 2018
44
18
8
Unfortunately, that doesn't work 8 months after the transaction ;)
(I tested them when I received them, and they were perfectly fine: zero hours, zero bytes written.)
 

gb00s

Well-Known Member
Jul 25, 2018
1,190
602
113
Poland
netswitch said:
Unfortunately, that doesn't work 8 months after the transaction ;)
(I tested them when I received them, and they were perfectly fine: zero hours, zero bytes written.)
Do you live in the EU, and did you buy the NVMe drives from an EU seller or from China?