We're facing the same issue, but with Dell R6515s. This doesn't appear to be an isolated incident with the PM9A3s and recent Dell servers, as seen here as well:
R6515 and samsung PM9a3
So far, we have observed similar failures: NVMe drives on the GDC5402Q and GDC5502Q firmware appear to go offline in the iDRAC, and from the OS perspective they drop to 1GB of available capacity. Listing the firmware shows "ERRORMOD". GDC5302Q and older appear unaffected.
The firmware files for the PM9A3 are not readily available on the Samsung website, so we were forced to reach out to our distributor. Samsung apparently refuses to provide support directly and also requires a signed NDA before releasing the firmware files, which is beyond infuriating. Do you happen to have a contact where you purchased your PM9A3s whom you could ask for assistance in getting the firmware files from Samsung?
Regardless, here are the results so far of what we have seen:
Upgrading to version GDC5602Q (latest) appears to fix the issue; however, another poster in the Dell forum thread linked above mentions that the issue still persists. We have yet to see the issue reappear since performing the firmware upgrade.
We weren't able to use Samsung Magician to perform the upgrade; we had to use the "Samsung SSD Toolkit for Data center" from their download page:
Samsung Magician & SSD Tools & Software Update | Samsung Semiconductor Global
Once downloaded, we ran the following to load the firmware onto the disks (again, the .bin file needed to be acquired from Samsung after signing an NDA):
First, get the disk ID (needed in the next command):
Code:
~/DCToolkit -L
================================================================================================
Samsung DC Toolkit Version 2.1.L.Q.0
Copyright (C) 2017 SAMSUNG Electronics Co. Ltd. All rights reserved.
================================================================================================
--------------------------------------------------------------------------------------------------------------------------------------------------
| Disk | Path | Model | Serial | Firmware | Optionrom | Capacity | Drive | Total Bytes | NVMe Driver |
| Number | | | Number | | Version | | Health | Written | |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 3:c | /dev/nvme3 | SAMSUNG MZQL23T8HCLS-00A07 | XXX | GDC5502Q | LNUSRG39 | 32 GB | N/A | N/A | Unknown |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 4:c | /dev/nvme4 | SAMSUNG MZQL23T8HCLS-00A07 | XXX | GDC5602Q | LNUSRG39 | 6 GB | GOOD | 0.00 TB | Unknown |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 5:c | /dev/nvme5 | SAMSUNG MZQL23T8HCLS-00A07 | XXX | GDC5502Q | LNUSRG39 | 32 GB | N/A | N/A | Unknown |
--------------------------------------------------------------------------------------------------------------------------------------------------
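If you have a lot of drives, picking the affected ones out of that table by eye gets tedious. Here is a rough sketch of a filter for it. It assumes the listing is pipe-delimited exactly as pasted above, with the firmware in the fifth column, and that "ERRORMOD" shows up in that same column; `affected_disks` is just a name I made up, so adjust the field numbers if your toolkit version prints something different.

```shell
# Reads `DCToolkit -L` output on stdin and prints "disk-id path firmware"
# for every row whose Firmware column is one of the versions we saw fail.
# Assumption: columns are separated by "|" as in the listing above.
affected_disks() {
  awk -F'|' '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 7 {
      id = trim($2); path = trim($3); fw = trim($6)
      if (fw == "GDC5402Q" || fw == "GDC5502Q" || fw == "ERRORMOD")
        print id, path, fw
    }'
}
```

You would use it as `~/DCToolkit -L | affected_disks`; header and separator rows simply don't match and are skipped.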
Load the firmware on the disk:
Code:
~/DCToolkit --disk 3:c --nvme-firmware-download --path General_PM9A3_U.2_GDC5602Q_Noformat.bin --action 1 --slot 2 --force
"Commit" the firmware to disk:
Code:
~/DCToolkit --disk 3:c --nvme-firmware-commit --action 2 --slot 2
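For several disks, it can help to generate the download and commit commands first and review them before running anything. The sketch below only prints the two commands shown above for each disk ID, with the same flags, slot, and actions we used; `flash_cmds` is a made-up helper name, and it assumes the firmware path contains no spaces.

```shell
# Print (do not run) the download and commit commands for each disk ID.
# Usage: flash_cmds FIRMWARE_FILE DISK_ID...
# The flags are copied from the commands above; nothing is executed here.
flash_cmds() {
  fw_file=$1
  shift
  for disk in "$@"; do
    printf '~/DCToolkit --disk %s --nvme-firmware-download --path %s --action 1 --slot 2 --force\n' "$disk" "$fw_file"
    printf '~/DCToolkit --disk %s --nvme-firmware-commit --action 2 --slot 2\n' "$disk"
  done
}
```

For example, `flash_cmds General_PM9A3_U.2_GDC5602Q_Noformat.bin 3:c 5:c` prints four lines that you can check and then paste one at a time.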
A reboot may be required; otherwise we don't see the firmware version change from the OS.
It also appears to be possible to rescue disks that are running GDC5502Q and have entered the "ERRORMOD" state by running the following format command (note that data will be lost). This has been hit or miss for us, though, as some drives appear to get stuck in "ERRORMOD" indefinitely:
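To confirm after the reboot that the OS actually sees the new version, one option on Linux is to read the `firmware_rev` attribute that the kernel exposes for each NVMe controller under `/sys/class/nvme`. This is a minimal sketch, not something from the toolkit: `check_fw` is a hypothetical helper, and it reports every NVMe controller in the box, not only the PM9A3s.

```shell
# Compare each NVMe controller's firmware revision against an expected value.
# Usage: check_fw EXPECTED_VERSION [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys/class/nvme; the override exists only for testing.
check_fw() {
  expected=$1
  root=${2:-/sys/class/nvme}
  for f in "$root"/nvme*/firmware_rev; do
    [ -r "$f" ] || continue                  # skip if the glob matched nothing
    ctrl=$(basename "$(dirname "$f")")
    rev=$(tr -d ' \n' < "$f")                # the attribute may be space-padded
    if [ "$rev" = "$expected" ]; then
      echo "$ctrl $rev ok"
    else
      echo "$ctrl $rev MISMATCH"
    fi
  done
}
```

Running `check_fw GDC5602Q` prints one line per controller, so any drive still on the old firmware stands out.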
Code:
~/DCToolkit --disk 3:c --nvme-format-namespace --user-data-erase
P.S. One other helpful command I found out after wasting way too much time on this - run the following to view all the available firmware slots on the drive:
Code:
~/DCToolkit --disk 3:c --nvme-get-log-pages --firmware
Hopefully GDC5602Q really did fix the issue. So far it's been stable in our lab; I'll post back if we find any further issues. Best of luck acquiring the firmware file; it's unfortunate that Samsung makes the process so difficult.