How does one go about testing that a storage device's PLP is actually working?


BLinux

cat lover server enthusiast
Jul 7, 2016
2,669
1,081
113
artofserver.com
Question in title. Products marketed with PLP are fine, and one can open up the drive and examine the components to see if the expected hardware is there, but how do you actually go about testing that PLP is working?

Do you just write a continuous stream of data and yank the power cord and see what happens? How do you record the last bit of data that hit the drive's cache?

Sorry if this is a dumb question, but please educate me...

Particular use cases I'm thinking about:
1) the simple one: product X claims to have PLP; confirm it works.
2) a refurbished product has PLP, but the backup power source may have degraded capacity, so how do you test that it still works?
 

Stephan

Well-Known Member
Apr 21, 2017
920
698
93
Germany
For Intel DC-type products, check https://www.intel.com/content/dam/w.../ssd-power-loss-imminent-technology-brief.pdf page 4; there are SMART attributes for it.
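For SATA models those attributes can be read with smartctl. Attribute 175 (Power_Loss_Cap_Test) is what some Intel DC SATA drives expose -- treat the ID/name as an assumption to verify against the brief for your model, and note the sample line below is an illustrative stand-in, not real drive output:

```shell
# On a live Intel DC SATA drive (requires root):
#   smartctl -A /dev/sda | grep -i power_loss
# The line below is an illustrative stand-in for such output, in the
# smartctl -A column layout (ID, name, flags, value, worst, thresh, ...).
line='175 Power_Loss_Cap_Test 0x0033   100   100   010    Pre-fail  Always       -       1'
# Pull out the attribute name (field 2) and normalized value (field 4).
out=$(echo "$line" | awk '{ printf "attr=%s normalized=%s", $2, $4 }')
echo "$out"
```

If the normalized value sits above the threshold column, the drive's own capacitor self-test is passing; a failed test should also show up in the drive's error log.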

No idea about NVMe; I just looked at some SSDs I have but couldn't make anything out. Maybe errors will show up in the error log.

Short of desoldering the capacitors and measuring them, which I wholeheartedly do not recommend unless you have Louis Rossmann-level skills, you probably have to trust that the caps work, trust the firmware to check them regularly, and watch for SMART errors. The caps employed are usually good-quality Nichicon or Panasonic parts rated for 105 °C and 2,000-5,000 hours, which at a server operating temperature of 40 °C means roughly 10x more, i.e. up to ~50,000 hours, before they go out of spec. That would be ~6 years. Presumably the PLP functionality still works even at 50% capacitance, so we are talking more like 10-20 years of capacitor lifetime.
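That derating can be sanity-checked numerically. The rule of thumb used below (electrolytic lifetime doubles for every 10 °C below the rated temperature) is a common alternative to the flat 10x factor above; the 2,000 h / 105 °C rating is taken from the post, and the result lands in the same 10-20 year ballpark:

```shell
# Back-of-envelope capacitor lifetime at server temperature, assuming the
# "lifetime doubles per 10 degC below rating" rule of thumb.
rated_hours=2000   # datasheet rating (lower bound from the post)
rated_temp=105     # degC
op_temp=40         # degC, typical server internals
hours=$(awk -v h=$rated_hours -v rt=$rated_temp -v ot=$op_temp \
  'BEGIN { printf "%.0f", h * 2^((rt - ot) / 10) }')
years=$(awk -v h=$hours 'BEGIN { printf "%.1f", h / 8760 }')
echo "$hours hours ~ $years years"
```

Even from the pessimistic 2,000-hour rating, the doubling rule predicts roughly two decades at 40 °C, consistent with the estimate above.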
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,669
1,081
113
artofserver.com
That's (almost) how a (mobile gaming) company from Serbia tested PLP with different SSDs in 2015: Power Failure Testing with SSDs
Yeah, I found that too in my own search for an answer. It's an interesting approach: they log the I/O transactions over the network, pull the plug, the "other" node that stays up has a record of what I/O happened, then they boot the downed system back up and verify against the log.
 

UhClem

just another Bozo on the bus
Jun 26, 2012
434
247
43
NH, USA
Question in title. Products marketed with PLP are fine, and one can open up the drive and examine the components to see if the expected hardware is there, but how do you actually go about testing that PLP is working?
"Trust--but verify." ... a wise maxim :)
Do you just write a continuous stream of data and yank the power cord and see what happens?
That should work ... [since you say "power cord", I assume (in script below) you mean system power (vs SSD power)]
How do you record the last bit of data that hit the drive's cache?
[assuming Linux] (shame on you, @BLinux, if that's a bad assumption)
Note, also, that we need to write directly to the tested SSD device itself (or a partition on it); else our "log" could lose sync with the SSD cache. dev-name (below) should be, e.g., /dev/nvme0n1 or /dev/nvme0n1p3.
Code:
# write 1 MiB at a time, directly to the device (O_DIRECT), logging each step
for i in {0..1000000}
do
    echo -n "$i "    # log the index about to be written
    dd if=/boot/vmlinuz of=dev-name oflag=direct bs=1M seek=$i count=1 2> /dev/null
    echo "== $i"     # log that the write completed
done
This will output:
Code:
0 == 0
1 == 1
2 == 2
...
In an attempt to (completely/significantly) fill the device cache, don't pull the cord until N (the "== N" in the output) reaches ~ C / (B - W)
where
C = cache size (in MiB)
B = bus speed (in GB/sec)
W = estimated sustained write speed (in GB/sec)
[Edit:] Hence, e.g., >>IF<< C is 100000 (~100 GB) and B is 3.2 (PCIe gen3 x4) and W is 1.2, then N = C / 2
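Plugging the example numbers into that rule of thumb (the values are the post's own; real cache size and speeds vary per drive):

```shell
# When should the device cache be full, per N ~ C / (B - W)?
C=100000   # cache size, MiB (the post's example figure)
B=3.2      # bus speed, GB/s (PCIe gen3 x4)
W=1.2      # estimated sustained write speed, GB/s
N=$(awk -v c=$C -v b=$B -v w=$W 'BEGIN { printf "%.0f", c / (b - w) }')
echo "pull the cord around iteration N = $N"
```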
Will probably take several seconds.
[Make note of that last N at cord-pull -- so it's best to use an actual console (or ssh in from a different system), which won't go blank at cord-pull.]

Then, after a fresh boot, do:
Code:
# signature: three octal words from the first line of the source file's dump
od /boot/vmlinuz | head -1 | cut -d " " -f 7-9 | tr " " x > /tmp/cut-me
# scan the device (decimal addresses), x-joined the same way, for that signature
od -Ad dev-name | tr " " x | grep `cat /tmp/cut-me`
That last command will run (and spew) for a long time. Go have a meal, watch a show ... [It could be made a lot faster by having originally written to a file (instead of the device), but that would risk the accuracy of N if, at cord-pull, the file metadata actually written lagged the (O_DIRECT) output.]

When it stops spewing, you can ctrl-C it.
For the last line of output: take the leading (decimal) number [preceding the first "x"], divide it by 1048576 (1 MiB), and compare with that N (noted just before cord-pull). How's your PLP? -- Did you get it all???

[Disclaimer: yes, I know I could have added " | tail -1 | ... cut ... | bc ..." instead of the ^C, but ... :)]
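The signature-matching step can be rehearsed on ordinary files first, with no raw device or reboot involved. This is only a sketch of the same idea: the /tmp paths are made-up stand-ins, awk replaces the cut field selection, and tr -s tolerates od's space padding:

```shell
# /tmp/plp-src stands in for /boot/vmlinuz, /tmp/plp-img for the SSD; the
# 4096-byte zero prefix simulates where the "last write" landed on the device.
printf 'PLPTESTSIGNATURE' > /tmp/plp-src
{ head -c 4096 /dev/zero; cat /tmp/plp-src; } > /tmp/plp-img
# signature: first three octal words of the source file, joined with "x"
sig=$(od -An /tmp/plp-src | head -1 | awk '{ print $1 "x" $2 "x" $3 }')
# scan the image with decimal addresses, squeezing spaces to "x" to match
hit=$(od -Ad /tmp/plp-img | tr -s ' ' x | grep "$sig")
# the leading decimal number is the byte offset where the data was found
offset=$(echo "$hit" | cut -d x -f 1)
echo "found at byte offset $offset"
```

The offset recovered this way is what gets divided by 1048576 and compared against N on the real device.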
 
Last edited:

UhClem

just another Bozo on the bus
Jun 26, 2012
434
247
43
NH, USA
Consider the above hack (an attempt at) a proof of concept. If it appears viable, there are tweaks to make it faster and more robust.

Note: edited above. C (cache size) of 100GB ?? Wrong! (for at least 5-10 years) Actually, I was thinking DRAM + SLC(?) -- but PLP only needs to protect DRAM, right?

@BLinux : rsvp w/ bugs, results, etc.
 

i386

Well-Known Member
Mar 18, 2016
4,220
1,540
113
34
Germany
but PLP only needs to protect DRAM, right?
Power-loss protection protects the controller, DRAM, and NAND. The DRAM contains data from the OS for in-flight I/O, as well as the page-mapping data the controller needs to manage the NAND.