Samsung PM1643 2019.09 date code - 32k hour bug

jordanl17

New Member
Oct 29, 2022
4
0
1
I have a new fear unlocked after discovering the "32k hour bug". Yep, I just bought a bunch of cheap 3.84 TB PM1633 SSDs from eBay (returning them now), and researching those drives taught me about the 32k hour bug. I DO have Samsung PM1643 drives in production: Samsung-branded, not Dell-branded, in my PowerEdge 630, in a RAID array on a PERC. I found an old photo, and they are dated 2019.09. Should be safe, right???
 

mr44er

Active Member
Feb 22, 2020
135
45
28
they are dated 2019.09. should be safe, right???
The better question: is firmware available to patch that bug and which version show your samsungs?

Besides that, flashing firmware on drives in a production server is likely a pain, but doable without downtime. It is smoother with hot-swap: pull one drive after another, do the flash process on a spare machine, test, then plug the drive back in and let the rebuild start. Reasons for using a spare machine:
1.) to not disturb the server more than necessary; a pulled disk is already a risk
2.) sometimes the vendor's flash tool comes as a bootable .iso/.img, so booting is forced anyway

Risky mode: grab only the firmware file (sometimes possible, when the $firmware.zip contains a .bin or .lod) and flash it live on the server, waiting a good amount of time after the first disk for things to settle. A rebuild may start instantly. I have done that live on disks in a ZFS pool on FreeBSD with camcontrol. I expected a massive failure and was prepared to kill the pool that way, but everything went smoothly. On Linux, sg_write_buffer is your friend. If you use the PERC in IR mode, it may not be possible at all.

Safe mode: pull all your disks (one after another!), replacing each with a temporary spare and letting it rebuild (cheaper SAS spinners should do). When the flashing and testing have gone well, put them back.
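
For the "which version" question, the firmware revision on Linux can be read from smartctl -i output. The following is a minimal sketch, assuming the SAS output layout quoted later in this thread (a line beginning with "Revision:"). The function name fw_revision and the device path in the usage comment are placeholders, not something from the thread.

```shell
# Print the firmware revision found in `smartctl -i` output read on stdin.
# Assumption: the output contains a line such as "Revision: CQPH".
fw_revision() {
    awk -F': *' '$1 == "Revision" { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Hypothetical usage on a live system (the device name is a placeholder):
#   smartctl -i /dev/sgX | fw_revision
```

Reading from stdin keeps the parsing separate from the hardware access, so it can be tried against saved output before touching a production drive.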
 

jordanl17

New Member
Oct 29, 2022
4
0
1
From what I'm reading here, no one has obtained a "native" Samsung firmware. You apparently can only get firmware for Samsung enterprise drives through EMC, HPE, etc. It's beyond me why Samsung doesn't have a "Firmware Download" page for these enterprise drives. Insane. "Oh, you bought the drive directly from us? You're screwed then."
 

mr44er

Active Member
Feb 22, 2020
135
45
28
Yes, it's a pain. They count firmware for enterprise stuff as a 'service' they want you to pay for. Maybe legit for the time before EOL, but after that... come on, just dump all the files in a folder on the FTP server. This saves headaches and the environment. :)

Meanwhile, some info for the PM1633 (these should be the Dell-branded drives):

MZILS1T9HCHP | MZILS3T8HCJM | MZILS480HCGR | MZILS960HCHP sentenced to death on CQPG, CQPH -> fixed on CQPJ
MZILS15THMLS0D4 | MZILS7T6HMLS0D4 sentenced to death on CQN1 -> fixed on CQN3
MZILS1T9HEJH0D4 | MZILS3T8HMLH0D4 | MZILS480HEGR0D4 | MZILS480HEGR0D4 sentenced to death on CQL1, CQL3 -> fixed on CQL5
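
The table above can be turned into a quick lookup. This is only a sketch built from the revisions listed in this post. The function name fw_status is made up for illustration, and the check looks at the revision string alone, without confirming that the drive model belongs to the matching row.

```shell
# Classify a firmware revision using only the table from this post.
# Anything not listed there is reported as "unknown", not as safe.
fw_status() {
    case "$1" in
        CQPG|CQPH|CQN1|CQL1|CQL3) echo "affected" ;;
        CQPJ|CQN3|CQL5)           echo "fixed" ;;
        *)                        echo "unknown" ;;
    esac
}
```

For example, fw_status CQPH prints "affected" and fw_status CQPJ prints "fixed". A revision from another OEM branch prints "unknown", which means only that this table says nothing about it.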

Edit:
Attached files, ripped out from scos-official-R07.01.04.004.03.zip
I think they should work as .bin or .lod when extracted
 

yukas

Member
Jun 3, 2022
36
2
8
Hi,

I also have 3 EMC Unity SSDs with the same problem; please see the attachment and the output below.
We use LSI 2008 and Dell PERC H310 SAS HBAs in IT mode, with Samsung SSD Manager and the sg3_utils tools.
We've already tried all the options, and are willing to send the disks anywhere for data recovery.

All drives show these errors:

inquiry cdb: 12 00 00 00 24 00
spt_indirect, adapter: \\.\PhysicalDrive2 Length=56 ScsiStatus=0 PathId=0 TargetId=0 Lun=0
CdbLength=6 SenseInfoLength=64 DataIn=1 DataTransferLength=36
TimeOutValue=60 DataBufferOffset=124 SenseInfoOffset=60
SAMSUNG P043S7T6 EMC7680 ESV8 peripheral_type: disk [0x0]
PROTECT=0
inquiry cdb: 12 01 00 00 24 00
spt_indirect, adapter: \\.\PhysicalDrive2 Length=56 ScsiStatus=0 PathId=0 TargetId=0 Lun=0
CdbLength=6 SenseInfoLength=64 DataIn=1 DataTransferLength=36
TimeOutValue=60 DataBufferOffset=124 SenseInfoOffset=60
inquiry: pass-through requested 36 bytes (data-in) but got 19 bytes
inquiry cdb: 12 01 80 01 00 00
spt_indirect, adapter: \\.\PhysicalDrive2 Length=56 ScsiStatus=0 PathId=0 TargetId=0 Lun=0
CdbLength=6 SenseInfoLength=64 DataIn=1 DataTransferLength=256
TimeOutValue=60 DataBufferOffset=124 SenseInfoOffset=60
inquiry: pass-through requested 256 bytes (data-in) but got 16 bytes
Unit serial number: 5BNY0KA04703
inquiry cdb: 12 01 83 01 00 00
spt_indirect, adapter: \\.\PhysicalDrive2 Length=56 ScsiStatus=0 PathId=0 TargetId=0 Lun=0
CdbLength=6 SenseInfoLength=64 DataIn=1 DataTransferLength=256
TimeOutValue=60 DataBufferOffset=124 SenseInfoOffset=60
inquiry: pass-through requested 256 bytes (data-in) but got 76 bytes
LU name: 5002538b08a449d0
mode sense(10) cdb: 5a 00 01 00 00 00 00 00 fc 00
spt_indirect, adapter: \\.\PhysicalDrive2 Length=56 ScsiStatus=0 PathId=0 TargetId=0 Lun=0
CdbLength=10 SenseInfoLength=64 DataIn=1 DataTransferLength=252
TimeOutValue=60 DataBufferOffset=124 SenseInfoOffset=60
mode sense(10):
Fixed format, current; Sense key: Hardware Error
Additional sense: Logical unit failure
Raw sense data (in hex), sb_len=64, embedded_len=18
70 00 04 00 00 00 00 0a 00 00 00 00 3e 01 00 00
00 00
MODE SENSE (10) command: Medium or hardware error, type: sense key (plus blank check for tape)

All of them are locked, and I can't flash new firmware.

We also tried flashing with the EMC Unity factory utility, following KB 541103 (Encl 0_0 Drive 0: Lifecycle State Fail - FBE_BASE_DISCOVERED_DEATH_REASON_ACTIVATE_TIMER_EXPIRED), but the drives won't flash.

We also emailed Samsung about this problem, but the case was rejected because the drives are OEM... :(
 

mr44er

Active Member
Feb 22, 2020
135
45
28
AFAIK the rescue flash is only possible before the timer runs out; same sh*t with other models.
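
Since the fix has to be applied before the timer expires, it helps to know how much margin a drive has left. The sketch below is plain arithmetic on a power-on-hours figure, however you obtain it. The default limit of 32768 hours (2^15) is an assumption, since the thread only says "32k", and the function name hours_left is hypothetical.

```shell
# Hours remaining before a drive reaches the failure threshold.
#   $1 = power-on hours reported by the drive
#   $2 = threshold in hours (optional; 32768 is an ASSUMED default)
hours_left() (
    poh=$1
    limit=${2:-32768}
    left=$((limit - poh))
    # A drive already past the threshold has no margin left.
    if [ "$left" -lt 0 ]; then left=0; fi
    echo "$left"
)
```

A drive at 30000 power-on hours would have 2768 hours left under that assumption, which is roughly 115 days of continuous operation.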
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,143
597
113
New York City
www.glaver.org
from what I'm reading here, no one has obtained a "native" Samsung firmware. you apparently can only get firmware for Samsung enterprise drives for EMC, HPE, etc... it's beyond me why samsung doesn't have a "Firmware Download" page for these enterprise drives. insane. "oh you bought the drive directly from us? you're screwed then."
I believe their position is "if you buy it through official channels, you can get support via those same official channels". I don't know of anyone officially buying non-OEM'd parts for resale, and if they are they probably don't want to provide support to their customers.

I don't know if the EU's warranty / support directives apply to used drives, but that is possibly a path to take.
 

jordanl17

New Member
Oct 29, 2022
4
0
1
Thanks for all the comments. So, is it correct to say that only the Samsung PM1633 drives have the bug?
 

dgwillim

New Member
Jun 20, 2023
9
12
3
You, sir, are a saint. I have quite a few of these Dell drives (MZILS3T8HCJM) in a third-party array (Synology) after decommissioning our Compellent units. This firmware fixed a performance stutter and a bad GC bug, and cut write latency by more than half.

We were fighting Dell for a firmware fix while the array was under support, before they EOS/EOL'd it. 6 months after we took it offline, the fix was posted. Dell refused any/all requests for the firmware update (as expected).

Command Used: sg_write_buffer -m dmc_offs_save -I FMDELMZILS3T8HCJMCQPJ_NS /dev/sg14
 

varad

New Member
Jul 29, 2023
5
0
1
Hi, I am sorry to revive this, but I just got a couple of MZILS3T8HCJM disks and wanted to update the firmware using a Dell R340 with an H730P controller. I am having trouble getting the drives to show up in an OS in the first place. In the controller, the drives show up as 520B logical units. Does anyone know how to at least get the drives to show up on the OS side so I can perform the desired FW update?

Thank you in advance
 

mrpasc

Well-Known Member
Jan 8, 2022
581
325
63
Munich, Germany
It depends on what OS you use.
Windows and ESXi do not work with 520B.
Linux will do. It might help to set the H730P into "JBOD" mode first before updating the firmware.
 

varad

New Member
Jul 29, 2023
5
0
1
I tried booting from a live Ubuntu disk, and it won't pick up the disks even though the controller is in HBA mode. Do I need to install the OS directly on a separate drive?
 

mrpasc

Well-Known Member
Jan 8, 2022
581
325
63
Munich, Germany
You should not need to install the OS directly, live boot should be good.
How did you determine whether the drives were picked up? What's the output of sg_scan?
 

varad

New Member
Jul 29, 2023
5
0
1
I apologize if it seems like I need hand-holding through this whole process. Out of the 5 disks, would the 3 520B disks show up as [em]?
Is there a better way to get more info from sg_scan?

Edit: Used lsscsi and got these 5 devices. The 4TB drives aren't showing, at least not in lsscsi.
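
When a listing like this is long, it can be filtered down to the generic (sg) nodes of the Samsung disks. The sketch below assumes the usual column layout of lsscsi -g, where the second field is the device type, the third is the vendor, and the last is the /dev/sgN node. The function name samsung_sg_nodes is a placeholder.

```shell
# Print the generic device node for each SAMSUNG disk in `lsscsi -g`
# output read on stdin.
# Assumption: the columns are [H:C:T:L], type, vendor, ..., block node, sg node.
samsung_sg_nodes() {
    awk '$2 == "disk" && $3 == "SAMSUNG" { print $NF }'
}

# Hypothetical usage on a live system:
#   lsscsi -g | samsung_sg_nodes
```

Disks hidden by the controller will not appear in the listing at all, so an empty result here points at the HBA or its mode and not at the filter.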
 

dgwillim

New Member
Jun 20, 2023
9
12
3
/dev/sgX should be the devices.
You can pull device info with: smartctl -i /dev/sgX (0-5).

Example:

#Check the Drive to confirm serial number and current firmware rev.
root@host:/volume1/infrastructure/storage/samsung# smartctl -i /dev/sgX
smartctl 6.5 (build date Mar 2 2021) [x86_64-linux-3.10.105] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SAMSUNG
Product: MZILS3T8HCJM
Revision: CQPH
Compliance: SPC-4
User Capacity: 3,840,755,982,336 bytes [3.84 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x12345678901234
Serial number: 12345678901234
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Jun 20 10:56:02 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled

#Update the firmware.
root@host:/volume1/infrastructure/storage/samsung# sg_write_buffer -m dmc_offs_save -I FMDELMZILS3T8HCJMCQPJ_NS /dev/sgX

#Check to make sure the update was successful.
root@host:/volume1/infrastructure/storage/samsung# smartctl -i /dev/sgX
smartctl 6.5 (build date Mar 2 2021) [x86_64-linux-3.10.105] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SAMSUNG
Product: MZILS3T8HCJM
Revision: CQPJ
Compliance: SPC-4
User Capacity: 3,840,755,982,336 bytes [3.84 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x12345678901234
Serial number: 12345678901234
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Jun 20 10:56:16 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled

root@host:/volume1/infrastructure/storage/samsung#

For the formatting, you can follow the sticky. The block change WILL COMPLETELY ERASE THE DRIVE.

#Check the drive serial number, confirm the firmware.

root@host:/volume1/infrastructure/storage/samsung# smartctl -i /dev/sgX
smartctl 6.5 (build date Mar 2 2021) [x86_64-linux-3.10.105] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SAMSUNG
Product: MZILS3T8HCJM
Revision: CQPH
Compliance: SPC-4
User Capacity: 3,840,755,982,336 bytes [3.84 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x12345678901234
Serial number: 12345678901234
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Jun 20 10:56:02 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled

The command that worked for me for 520 to 512 was: sg_format -v --format --fmtpinfo=0 --pfu=0 --size=512 --six /dev/sgX
This may take 10+ minutes to complete. Reboot the host after the task completes for the change to take effect.
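
After the reboot, the result can be confirmed from the same smartctl -i output used above. This is a small sketch, assuming the "Logical block size: 512 bytes" line format shown in this post. The function name logical_block_size is made up for illustration.

```shell
# Print the logical block size (in bytes) from `smartctl -i` output on stdin.
# Assumption: the output contains a line such as "Logical block size: 512 bytes".
logical_block_size() {
    awk -F': *' '$1 == "Logical block size" { print $2 + 0; exit }'
}

# Hypothetical usage (the device name is a placeholder):
#   smartctl -i /dev/sgX | logical_block_size
```

A value of 512 indicates the reformat took effect, while 520 means the drive is still in its original array format.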
 

varad

New Member
Jul 29, 2023
5
0
1
Thank you! I recently came across a Reddit post that alluded to why my drives weren't getting past the controller itself. Even in 'HBA' mode, the card prevents 520B drives from passing through.

In good news, I bought an external SAS enclosure to check whether that would let me see the drives on the OS side. If that works, I shall reformat the drives using the steps listed. Again, I much appreciate the hand-holding.

In more good news, I still had my older H330 card, and it turns out that, according to this post, it is possible to flash it to 'IT Mode (HBA330)'. As a last resort I shall approach the issue from that angle.

I shall report my findings.

 

varad

New Member
Jul 29, 2023
5
0
1
No go on the SAS enclosure, although I only tried it with an enclosure limited to 6Gb/s SAS. Firmware or readability of the disks might also be an issue.
On the other hand, flashing the H330 did the trick. I was able to see the disks on the OS side and was able to format them to the 512B size as the screenshot shows.
 

dgwillim

New Member
Jun 20, 2023
9
12
3
Looks like those disks are EMC-branded Samsungs, mine are Dell (Compellent). The firmware above won't work.

Glad you were able to get them re-formatted to 512.
 

yukas

Member
Jun 3, 2022
36
2
8
Has anyone solved the problem with old disks that show this error?
FBE_BASE_DISCOVERED_DEATH_REASON_ACTIVATE_TIMER_EXPIRED
 

mrpasc

Well-Known Member
Jan 8, 2022
581
325
63
Munich, Germany
Nope, this error message states that the SSD died because the bug (timer expired after 32k hours of runtime) was triggered. As far as I know, nobody has ever been able to reactivate a disk that died that way.
Sorry, but it’s gone forever.