Fusion-io ioDrive 2 1.2TB Reference Page

JohnMLTX · Apr 29, 2022

Posting in this thread since it seems like acquacow/Dave is probably the best person to ask this question.

I've got an SX350-3200 installed in my Windows 10 Pro workstation. I'm running the drivers from the Dell/Sandisk page, version 4.3.7, with firmware 8.9.9.

I initially had errors for persistent LEB map, which went away after using fio-sure-erase, but now I can't format the drive, and I get the invalid input error. I also can't attach the drive, and get an error that there's data on the drive from an older driver version.

Not really sure what to do next; I can post logs/cmd outputs on request.

Thank you so much!

gb00s · May 15, 2022

Weird things happening with all my HP versions of the 785GB ioDrive2s. One pic from one of the drives below.

Once the drives (4 of them now) hit a life endurance of <44%, they all start to show read errors in the syslog. Shortly after that the drives stop working, with 'fio-status -a' showing hardware error messages like:

Found 4 ioMemory devices in this system
Driver version: 3.2.16 build 1731

Adapter: Single Controller Adapter
HP 785GB MLC PCIe ioDrive2 for ProLiant Servers, Product Number:673644-B21, SN:3UN242D032
ioDrive2 Adapter Controller, PN:674326-001
External Power: NOT connected
PCIe Power limit threshold: 24.75W
PCIe slot available power: 25.00W
PCIe negotiated link: 4 lanes at 5.0 Gt/sec each, 2000.00 MBytes/sec total
Connected ioMemory modules:
fct0: Product Number:673644-B21, SN:1240D3513

fct0 Status unknown: Driver is in MINIMAL MODE:
Device has a hardware failure
ioDrive2 Adapter Controller, Product Number:673644-B21, SN:1240D3513
!! ---> There are active errors or warnings on this device! Read below for details.
ioDrive2 Adapter Controller, PN:674326-001
SMP(AVR) Versions: App Version: 1.0.35.0, Boot Version: 0.0.9.1
Located in slot 0 Center of ioDrive2 Adapter Controller SN:1240D3513
Powerloss protection: not available
PCI:06:00.0, Slot Number:2
Vendor:1aed, Device:2001, Sub vendor:1590, Sub device:6e
Firmware v7.1.17, rev 116786 Public
Geometry and capacity information not available.
Format: not low-level formatted
PCIe slot available power: 25.00W
PCIe negotiated link: 4 lanes at 5.0 Gt/sec each, 2000.00 MBytes/sec total
Internal temperature: 36.42 degC, max 36.42 degC
Internal voltage: avg 1.02V, max 1.02V
Aux voltage: avg 2.49V, max 2.49V
Rated PBW: 11.00 PB
Lifetime data volumes:
Physical bytes written: 0
Physical bytes read : 0
RAM usage:
Current: 0 bytes
Peak : 0 bytes

ACTIVE WARNINGS:
The ioMemory is currently running in a minimal state.

I'm unable even to detach the drives from the system. They are just dead. No firmware upgrades, low-level format or secure erase were performed. I just noticed that all VMs hosted on the drives were hanging and that the pool that had a SLOG assigned in RAID1 on two of these drives did not come up anymore. The temps of the drives are far away from the normal temps, so I guess the drives are totally dead. As mentioned above, read errors were noticed one day before.

Tried everything that I had in mind, but without detaching the cards from the system I can't do anything. Tried in 2 other motherboards, same issue. Noticed these HP drives had much better performance than the Cisco/DELL/IBM versions. But I also had problems with 2.4TB HP drives. I love these and would like to get them running again.

Any ideas?

acquacow · May 16, 2022

Once your endurance starts dropping like that, the card should be putting itself into a read-only mode, as it's retired a lot of cells and doesn't have any additional ones remaining to handle any more EB failures. Sounds like it probably tried to protect itself but then continued to encounter errors and might be dead now.

I'd be more interested to figure out what your write workload is, or what your server is and why you are repeatedly burning up cards. I've never even seen an endurance less than 80-90% on a card ever. Multiple repeat failures of the same kind are also strange.

As for John, see if you can set the driver to load the card in minimal mode and sure-erase/format it that way.

-- Dave

gb00s · May 16, 2022

Hi Dave,

thanks for the quick reply. I just checked the drive in another server and at least got an output now with the old numbers. But the card does not even attach anymore, if I'm not mistaken.

acquacow said:
I've never even seen an endurance less than 80-90% on a card ever. Multiple repeat failures of the same kind are also strange.

Here you go ...

As you can see here, the card does not attach anymore. Have to investigate this.

root@tester:~/Git/fio_utils# fio-detach /dev/fct0
Detaching: [====================] (100%)
/dev/fct0 - already detached.

I then tried to secure-erase the card and it just hangs with an error message ...

root@tester:~/Git/fio_utils# fio-sure-erase /dev/fct0
WARNING: sanitizing will destroy any existing data on the device!
Do you wish to continue [y/n]? y
Erasing blocks: [ ] ( 0%) /
Message from syslogd@tester at May 16 22:27:33 ...
kernel:[ 431.470774] usercopy: Kernel memory overwrite attempt detected to SLUB object 'fusion_user_ll_request' (offset 0, size 3960)! Erasing blocks: [ ] ( 0%)

There it is. Nothing happens and it's just stuck. I guess if I interrupt the process I will never get the card unattached during system reboot via fio-config.

acquacow said:
I'd be more interested to figure out what your write workload is, or what your server is and why you are repeatedly burning up cards.

All cards were in a RAID10 config. I bought these with around 72% life endurance, very cheap, before Chia. I'm using them to record tick data incl. L2/3 data for several financial futures/options of different exchanges and move all the data between servers as PLP protective gear. With their huge IO they are perfect for it. They failed not one by one but all together within 1-2 days, starting with EB failures when they passed the 44% life endurance mark.

Mike

EDIT: After a reboot the card is 'gone' again. Different life endurance, read/write figures etc...

acquacow · May 17, 2022

Oh... you were talking about the "Rated PBW" stat... that's just a marketing # and shouldn't mean anything.

What's odd is that Kernel message...

What OS/kernel/etc is this? Can you load up a current CentOS 7 image on something and see if you have the same issues?

Thanks,

-- Dave

tx12 · Jun 1, 2022

gb00s said:
I don't know anyone who could revive a card from LEB-map-errors.

Usually it's a software-only fault. Even when you sure-erase on gen3 to get rid of LEB map errors, you'll probably end up with a present but empty (thus invalid) LEB map.

Factory tools like nucleon or fio-fbb should be able to recreate LEB maps. The main idea is to exclude factory bad blocks and erase failures from the linearized block map. Unfortunately, I've never heard of any downloadable version of these tools. Would be so nice to get one.

Demoh · Jun 7, 2022

Here's another one for Dave. I'm dumping some info for the good of everyone, but I have put any questions at the very end.

I know hedehodo had an HP 1.0TB drive you were wanting a picture of. Well, I have come across an interesting situation. I have an HP 1.3TB that is showing up as the 1.0, model number and all. I'm thinking the factory put the wrong stickers on it OR they mis-programmed the card. I have 2 cards, both purchased from the same vendor at the same time.

The cards are identical in design and markings; the only differences are the serial stickers, some of the silkscreening where it is probably production / line codes, and the Micron FBGA code on the cards.

The correctly detected card is HP 763834-B21 / 764125-001, s/n 1405G0114, and uses FBGA code NW603, which is Micron MT29F512G08CMCABH7-6C:A chips.
The problem card is detected in fio-status as an HP 775666-B21 / 775677-001, s/n 1406G0653, and uses FBGA code NW621, which is Micron MT29F512G08CMCBBH7-6C:B.
I believe the chips are the same chip; the only difference I can find is that Digi-Key lists the packaging as Bulk and Tray, respectively. Correct me if I am wrong, but 24x chips gives (512Gb*24)/8=1536GB raw capacity.
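The raw-capacity arithmetic above can be checked with a short sketch. The function names are made up for illustration; the 512 Gbit density and the count of 24 chips come from the post, and the spare-area figures assume decimal gigabytes and ignore bad blocks and metadata, so treat them as rough.

```python
def raw_capacity_gb(chip_density_gbit: int, chip_count: int) -> float:
    """Raw NAND capacity in decimal GB: total gigabits divided by 8 bits per byte."""
    return chip_density_gbit * chip_count / 8


def spare_fraction(raw_gb: float, usable_gb: float) -> float:
    """Share of the raw NAND that is not exposed as usable capacity."""
    return (raw_gb - usable_gb) / raw_gb


raw = raw_capacity_gb(512, 24)
print(raw)  # 1536.0

# Usable sizes as reported by fio-status for the two cards (1250 GB and 1000 GB).
for usable in (1250, 1000):
    print(usable, round(spare_fraction(raw, usable) * 100, 1))  # 18.6 and 34.9
```

Under these assumptions the card formatted to 1000 GB keeps roughly a third of its NAND in reserve, versus just under a fifth for the 1250 GB card.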

Through great pain (so much help from this entire thread) I was able to get these cards upgraded to 8.9.9 rev 20200113 and running VSL 4.3.7 under ESXi 6.7. HP's site only has 4.2.5 for ESXi 6.0 published and a redacted 4.3.1 for Windows. I thought I found a Windows 4.3.4 on HP's site somewhere, which allowed me to grab the firmware to jump from 4.2.5 to 4.3.4.

4.3.4 to 4.3.7 was a bit more complex. I had to go through Dave's instructions in post #145 (page 8) of this thread and add PA006014001 (775666-B21) and PA005831103 (763834-B21) to the manifest.

Early on, while running 4.2.5, I issued a fio-format -b 4K /dev/fct0 and it errored. I issued it again with -b 512B and it didn't error, but I pressed no and it formatted the drive anyways. I checked the eui ID and it was different, so it actually did format the drive. Snippet below. I didn't really care that it formatted the drive, and I figured this bug was fixed in a later version, but I didn't test it. I'd offer this word of caution: don't run risky commands in a production environment, because you don't know what bugs may have slipped through the cracks. Let's just say there's some other place going wild over my (could have been) misfortune.

Code:

[root@localhost:~] fio-format -b 4096B /dev/fct0

/dev/fct0: Error: Sector size 4096 bytes is out of range. Sector size must be between 512 and 512 bytes.
[root@localhost:~] fio-format -b 512B /dev/fct0

/dev/fct0: Creating block device.
Block device of size 1250.00GBytes (1164.15GiBytes).
Using block (sector) size of 512 bytes.

WARNING: Formatting will destroy any existing data on the device!
Do you wish to continue [y/n]? n

WARNING: Do not interrupt formatting!

Formatting: [====================] (100%) /
/dev/fct0 - format successful.
[root@localhost:~]

All is happy, or so I thought.

So here are my questions:

I wanted to play around with 4K block sizes, but I am unable to get the unit to format as such. Even under 4.3.7 I get the same error as above about it needing to be between 512 and 512. Is this a card limitation? I wanted to perform the low-level format mentioned around posts #33 - 36 but haven't figured out how. I saw a lot of talk about fio-sure-erase, but after some research and reading the flags, the Clear and Purge functions of this command both state "metadata required for operation is not destroyed, but user-specific metadata is destroyed.", which leads me to believe this won't do it. (I pulled that from this SanDisk-published DoD/NIST document.) If this isn't a card limitation, how could I get these formatted over to 4K?

Related to the above, I was hoping to see if I could take the 1.0TB card and actually use it as a 1.3TB card. I originally thought I could use the secure-erase to wipe out the LEB map, which may help? I don't think so, because it's probably programmed in a very different area than that table.

Or, could I use the advanced features of fio-format to overprovision using -s -r or -e with maybe -f?

Below is the output of fio-status -a for these cards. Do these, with similar bytes written, appear to be new? Why are the rated PBW figures vastly different? Is it because, theoretically, the 1.0 card has an extra 200GB of unused capacity it can use as spare?

Code:

[root@localhost:~] fio-status -a

Found 1 VSL driver package:
   4.3.7 build 1205 Driver: loaded

Found 2 ioMemory devices in this system

Adapter: ioMono  (driver 4.3.7)
        HP 1.2TB HH/HL Value Endurance PCIe Workload Accelerator, Product Number:763834-B21, SN:1405G0114
        ioScale3 Adapter Controller, PN:764125-001
        Product UUID:5d58a7e5-5327-5c22-8154-848039eddf41
        PCIe Bus voltage: avg 12.25V
        PCIe Bus current: avg 0.76A
        PCIe Bus power: avg 9.31W
        PCIe Power limit threshold: 24.75W
        PCIe slot available power: unavailable
        Connected ioMemory modules:
          fct0: 12:00.0,        Product Number:763834-B21, SN:1405G0114

fct0    Attached
        ioScale3 Adapter Controller, Product Number:763834-B21, SN:1405G0114
        ioScale3 Adapter Controller, PN:764125-001
        Microcode Versions: App:0.0.44.0
        Powerloss protection: protected
        PCI:12:00.0
        Vendor:1aed, Device:3002, Sub vendor:1590, Sub device:a3
        Firmware v8.9.9, rev 20200113 Public
        1250.00 GBytes device size
        Format: v501, 2441406250 sectors of 512 bytes
        Format: v501, 1953125000 sectors of 512 bytes
        PCIe slot available power: 25.00W
        PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
        Internal temperature: 51.19 degC, max 53.65 degC
        Internal voltage: avg 1.01V, max 1.01V
        Aux voltage: avg 1.79V, max 1.81V
        Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
        Active media: 100.00%
        Rated PBW: 4.00 PB, 99.99% remaining
        Lifetime data volumes:
           Physical bytes written: 547,309,228,160
           Physical bytes read   : 29,026,565,568
        RAM usage:
           Current: 545,395,584 bytes
           Peak   : 545,395,584 bytes
        Contained Virtual Partitions:
          fioiom0:      ID:0, UUID:e6c19b3a-adf3-4b7c-a956-69c74f428520

fioiom0 State: Online, Type: block device, Device: /dev/disks/eui.e6c19b3aadf34b7c002471c74f428520
        ID:0, UUID:e6c19b3a-adf3-4b7c-a956-69c74f428520
        1250.00 GBytes device size
        Format: 2441406250 sectors of 512 bytes
        Sectors In Use: 0
        Max Physical Sectors Allowed: 2441406250
        Min Physical Sectors Reserved: 2441406250

Adapter: ioMono  (driver 4.3.7)
        HP 1.0TB HH/HL Value Endurance PCIe Workload Accelerator, Product Number:775666-B21, SN:1406G0653
        ioScale3 Adapter Controller, PN:775677-001
        Product UUID:abd4a1ec-525c-5963-a063-8cede7465b81
        PCIe Bus voltage: avg 12.25V
        PCIe Bus current: avg 0.77A
        PCIe Bus power: avg 9.46W
        PCIe Power limit threshold: 24.75W
        PCIe slot available power: unavailable
        Connected ioMemory modules:
          fct1: 13:00.0,        Product Number:775666-B21, SN:1406G0653

fct1    Detached
        ioScale3 Adapter Controller, Product Number:775666-B21, SN:1406G0653
        ioScale3 Adapter Controller, PN:775677-001
        Microcode Versions: App:0.0.44.0
        Powerloss protection: protected
        PCI:13:00.0
        Vendor:1aed, Device:3002, Sub vendor:1590, Sub device:b2
        Vendor:1aed, Device:3002, Sub vendor:1590, Sub device:a3
        Firmware v8.9.9, rev 20200113 Public
        1000.00 GBytes device size
        Format: v501, 1953125000 sectors of 512 bytes
        PCIe slot available power: 25.00W
        PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
        Internal temperature: 52.17 degC, max 54.63 degC
        Internal voltage: avg 1.01V, max 1.01V
        Aux voltage: avg 1.79V, max 1.81V
        Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
        Active media: 100.00%
        Rated PBW: 12.00 PB, 100.00% remaining
        Lifetime data volumes:
           Physical bytes written: 547,689,149,464
           Physical bytes read   : 19,245,502,080
        RAM usage:
           Current: 85,640,000 bytes
           Peak   : 545,395,584 bytes

[root@localhost:~]

Glad to see these cards are still kind of popular, I am curious to see what everybody thinks about this.

acquacow · Jun 9, 2022

Heh, interesting background and story in the post, I appreciate all the detail.

It's buried in here somewhere, but a thing to note is that there's no difference between our scale/endurance cards and the regular cards other than over-provisioning and marketing. The NAND is going to be the same last I checked, and is as you confirmed.

If you look up the endurance on the NAND chip itself and do the math aggregated out across all the chips, you can get the real PBW and such. Anything in fio-status is just a marketing # for the most part.
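As a rough sketch of that math: total write endurance is approximately raw capacity times the NAND's rated program/erase cycles, divided by write amplification. The function name is illustrative, and the P/E cycle count and write-amplification factor below are placeholder assumptions, not datasheet values for these Micron parts; substitute the figures from the actual datasheet.

```python
def estimated_pbw(raw_capacity_gb: float, pe_cycles: int,
                  write_amplification: float = 1.0) -> float:
    """Rough host-write endurance in decimal PB.

    raw_capacity_gb     -- raw NAND across all chips, in decimal GB
    pe_cycles           -- rated program/erase cycles per cell (from the datasheet)
    write_amplification -- physical writes per host write (1.0 = ideal)
    """
    if write_amplification <= 0:
        raise ValueError("write_amplification must be positive")
    total_gb = raw_capacity_gb * pe_cycles / write_amplification
    return total_gb / 1_000_000  # GB -> PB


# 1536 GB raw is the 24 x 512 Gbit figure worked out earlier in the thread.
# 3000 cycles is a hypothetical number used only to show the calculation.
print(estimated_pbw(1536, 3000))
print(estimated_pbw(1536, 3000, write_amplification=2.0))
```

With these placeholder inputs the estimate halves when write amplification doubles, which is why the workload matters as much as the chip rating.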

Not sure why you can't low-level format it to 4k. I thought all of those series of cards, at least in the 3 and 6TB capacities, only did 4k; I wasn't even aware that the < 3TB cards had 512b support. I have three 3.2s powered up with no data on them, and I guess I can try formatting them at 512 to see what happens.

Keep us posted with any other results you come up with.

Thanks!

-- Dave

Demoh · Jun 10, 2022

I'm currently waiting for more cards to come in.

The more I read, the more I am confused. Is there a function of fio-sure-erase or fio-format that does a low-level format? Regardless of yes or no, does this wipe out the LEB map? The SanDisk docs suggest that fio-sure-erase does not perform a low-level format, at least not when operated to 'Clear' or 'Purge'. Or am I missing how to actually perform a low-level format?

Up to this point in my experience with VSL3 and VSL4 cards, maybe 20-30 of them, I've never formatted any of them as 4K. Part of the instructions that I built for myself before deployment was to firmware update, detach, format 512B, and attach before building any datastores. I don't remember why after so many years, or if any of these cards ever had 4K on them out of the box, but I know I never hit an error when issuing the format with 512B set. I know I've never issued a 4K format command until these ones; now I wish I had, just to see. I'm curious what range it gives you when it errors out if you issue a format with something like 20B.

Back to the mis-stickered 1.0 that I have: I assume there's nothing I can do with it to unlock it to 1250 like the others? I checked the manifest, but because the cards are essentially identical they both get the same .bin, .rom and .pdi files. I knew that attempt was in vain before checking.

acquacow · Jun 10, 2022

I took all of the SX350-3200 cards I have on hand tonight and formatted them to 512B and back to 4K without any problem, so I guess that is supported just fine.

fio-format will definitely leave your lebmap, but will also technically low-level the drive from a data perspective.

fio-sure-erase has a few extra steps, but I think at a low level, the LEBmap needs to exist to keep track of any EBs that are marked as bad.

When you update the ioDrive, you might be able to throw an --update-mid to fio-update-iodrive and see if the midprom data controls the effective drive size in the FPGA. I'm not sure of the exact flag; I'll have to dig.

Demoh · Jun 13, 2022

What firmware and build date on the cards and version of VSL are you running? What OS?

Also, get this: the replacement card I received was factory sealed from HP (I broke the seal on the box), with 0 bytes written in fio-status -a. It's also a 1.0 instead of a 1.3 like the other one.

It gets better. I have all 3 cards side by side right now. I have the HP serial numbers, and they are almost sequential. I have 5, 8 and 9 (and I am willing to bet my source still has 6 and 7). The Fusion serial numbers that show up inside fio-status differ a bit more, though: the 1.3 is 1405G01xx while the two 1.0s are 1406G05xx with only 8 numbers in between.

The other differences between the two 1.0 cards and the 1.3 card: QA stamp locations on both sides (4 black, 1 white), NAND codes as stated above, capacitor batch codes, and silkscreening codes right above the PCIe on the back for what I assume are earlier stages of manufacturing.

Depending on how the FPGA is programmed, I'm going to wager that some cards got mixed up / put in the wrong box after programming and before labeling by HP.

acquacow · Jun 14, 2022

Demoh said:
What firmware and build date on the cards and version of VSL are you running? What OS?

Also, get this: the replacement card I received was factory sealed from HP (I broke the seal on the box), with 0 bytes written in fio-status -a. It's also a 1.0 instead of a 1.3 like the other one.

It gets better. I have all 3 cards side by side right now. I have the HP serial numbers, and they are almost sequential. I have 5, 8 and 9 (and I am willing to bet my source still has 6 and 7). The Fusion serial numbers that show up inside fio-status differ a bit more, though: the 1.3 is 1405G01xx while the two 1.0s are 1406G05xx with only 8 numbers in between.

The other differences between the two 1.0 cards and the 1.3 card: QA stamp locations on both sides (4 black, 1 white), NAND codes as stated above, capacitor batch codes, and silkscreening codes right above the PCIe on the back for what I assume are earlier stages of manufacturing.

Depending on how the FPGA is programmed, I'm going to wager that some cards got mixed up / put in the wrong box after programming and before labeling by HP.

Firmware 8.9.9 20190313
VSL 4.3.6
Windows 10

gb00s · Jun 26, 2022

I'm confused, a little bit.

I was migrating saved VM backups from a host to my backup server and felt the copying process slowing down significantly; the transfer rate dropped from 900MB/s to ~100MB/s and lower. So I checked the drives themselves and my ioDrive2. Then I found a concerning 'Reserve space status' of ~83%, while it was at 100% when I started to copy the VMs to the backup host. The unusual thing is that the drive showed just 149TB written with 'Rated PBW: 4.00 PB, 96.25% remaining', which is kind of OK math-wise.

I uploaded a screen recording showing how quickly the 'Reserve space status' deteriorates:
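The "math-wise" check can be sketched as follows. This assumes the "remaining" percentage is simply one minus physical bytes written over the rated PBW, with PB taken as 10^15 bytes; that is an inference from the fio-status figures posted in this thread, not documented driver behaviour, and the function name is illustrative.

```python
def pbw_remaining_percent(bytes_written: int, rated_pbw: float) -> float:
    """Percent of rated endurance left, assuming a linear model in bytes written."""
    rated_bytes = rated_pbw * 10**15  # decimal petabytes -> bytes
    return 100.0 * (1 - bytes_written / rated_bytes)


# Roughly 149 TB written against a 4.00 PB rating (figures from the post above).
print(round(pbw_remaining_percent(149 * 10**12, 4.00), 2))

# Exact counter values from other fio-status outputs in this thread.
print(round(pbw_remaining_percent(547_309_228_160, 4.00), 2))       # 99.99
print(round(pbw_remaining_percent(192_912_281_574_232, 17.00), 2))  # 98.87
```

The 149 TB figure is rounded, so a small gap from the reported 96.25% is expected; the two exact counters reproduce the percentages fio-status printed for those cards.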

>> Link: ioDrive1.gif

Is this card dying? I've had a lot of bad luck with HP's ioDrive2s recently, all dying like flies.

While I'm writing this and copying another ~500GB, the 'Reserve space status' is down to < 80%.

acquacow · Jun 27, 2022

That is definitely very odd to see, esp. in real time. I don't know if I've ever seen that number change from 100%.

Here are some of the 1.2TB ioDrive 2s in my inventory that are currently powered up:

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 98.87% remaining
Lifetime data volumes:
Physical bytes written: 192,912,281,574,232
Physical bytes read : 117,866,200,401,144

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 99.94% remaining
Lifetime data volumes:
Physical bytes written: 9,886,406,032,192
Physical bytes read : 18,263,958,804,376

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 99.91% remaining
Lifetime data volumes:
Physical bytes written: 15,286,218,700,416
Physical bytes read : 7,418,911,223,320

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 99.95% remaining
Lifetime data volumes:
Physical bytes written: 9,079,143,440,512
Physical bytes read : 11,123,870,783,128

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 99.77% remaining
Lifetime data volumes:
Physical bytes written: 39,659,012,134,952
Physical bytes read : 41,770,490,398,656

Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
Active media: 100.00%
Rated PBW: 17.00 PB, 99.32% remaining
Lifetime data volumes:
Physical bytes written: 114,911,372,996,336
Physical bytes read : 121,804,500,802,176

ecosse · Jun 27, 2022

Apologies - quick check. fio-status provides the following against one of my iodrive2 cards:

PCIe Bus power: avg 11.36W

Does this mean that this is the average consumption of the card per hour or does this mean something else?

gb00s · Jun 27, 2022

acquacow said:
That is definitely very odd to see, esp in real-time. I don't know if I've ever seen that number change from 100%

I also have never seen anything below 100%. Until yesterday. Also noted the 'Active media' percentage went down from 100% to 97%.

I'd better change the drive. I'm not sure if this is really the case, but since I migrated almost all of my Proxmox 6.3/4 hosts to Proxmox 7.2 with Proxmox kernel 5.15.xx, these drives keep causing issues every week now. I don't expect hardware to die due to a kernel issue, but something is odd.

accountname · Jul 8, 2022

Hi, super new user who was directed to post in this thread.

I have a Fusion-io ioScale 3.2TB, product number F11-002-3T20-CS-0001

I'm using the newest driver I could find (although, at the moment I can't tell you which one that is - but I know it came from Sandisk)

When the drive works, it is AWESOME.

My problem is when I restart my PC.

Every single time I perform a full reboot/boot, I must pray to the gods that the drive's amber light changes to green.

Otherwise the drive is unseen, and attempting to use any of the tools found in VSL Utils gives me:

Code:

fio-status requires the driver to be loaded on this platform. Exiting.

I have no idea if this is just hocus pocus, but it seems to have a better chance of working from an "extra cold" boot.

That is to say, turning everything off, flipping my PSU off, holding the power button down, and then plugging everything back in.

Doing a simple "Restart" almost guarantees that my drive won't be seen.

It is in a full-sized PCIe slot. I took the additional step of upping the amount of power available to the slot (75W).

The drive doesn't have a power cable.
I have moved the drive into different slots on this mobo; it is being used alongside a GPU, which works just fine in any of the slots available.

I have added a delay to my BIOS to allow time for the drivers to load.

I have turned off Windows/BIOS fast boot option.

I'm posting here today because I just experienced a power outage... Now after a dozen power cycles, I still don't have my drive back. ANY advice at all would be amazing. Thank you!

acquacow · Jul 20, 2022

Saw some 6.4s on ebay for $249 with free shipping... they are just amazing...

Couldn't get fio-status from the seller beforehand =P

So if you ever wondered what one of these looks like once the firmware has decided the drive has had enough... here ya go.

-- Dave

mmk · Aug 31, 2022

Thought I'd write here as this seems to be the general Fusion-io thread.

My 1.65T ioScale2 decided not to mount anymore after a normal system reboot today. The driver is complaining about a pad having failed. Does anyone have any ideas as to whether this can be fixed, at least so I can manage to grab the data off of the drive? It's not a disaster, but my latest backup is old enough for this to be annoying.

The magical log entries seem to be:

Code:

Aug 31 14:21:53 medusa kernel: [   11.537552] <3>fioerr ioDrive 0000:0a:00.0.0: Some pads are write protected:
Aug 31 14:21:53 medusa kernel: [   11.537555] <6>fioinf ioDrive 0000:0a:00.0.0:   pad 11 is write protected

This is on a Proxmox (latest version as of the reboot) host with the github-sourced vsl3 driver. To add: the driver has worked fine up until now, and this smells like a hardware issue. Restarting and powering off the system does not seem to have any effect on anything.

EDIT: also tested in CentOS7 with an intentionally old kernel and the driver downloaded from WD/Sandisk some years ago. Same result.

Full kernel messages:

Code:

Aug 31 14:21:53 medusa kernel: [   10.820565] <6>fioinf ioDrive 0000:0a:00.0.0: PMP Address: 1 1 1
Aug 31 14:21:53 medusa kernel: [   10.878886] <3>fioerr ioDrive 0000:0a:00.0.0: SMP Controller in BIST Mode
Aug 31 14:21:53 medusa kernel: [   10.878888] <6>fioinf ioDrive 0000:0a:00.0.0: SMP Controller Firmware APP  version 1.0.20 0
Aug 31 14:21:53 medusa kernel: [   10.878891] <6>fioinf ioDrive 0000:0a:00.0.0: SMP Controller Firmware BOOT version 1.0.5 1
Aug 31 14:21:53 medusa kernel: [   11.354860] <6>fioinf ioDrive 0000:0a:00.0.0: Required PCIE bandwidth 2.000 GBytes per sec
Aug 31 14:21:53 medusa kernel: [   11.354866] <6>fioinf ioDrive 0000:0a:00.0.0: Board serial number is removed
Aug 31 14:21:53 medusa kernel: [   11.354868] <6>fioinf ioDrive 0000:0a:00.0.0: Adapter serial number is removed
Aug 31 14:21:53 medusa kernel: [   11.354871] <6>fioinf ioDrive 0000:0a:00.0.0: Default capacity        1650.000 GBytes
Aug 31 14:21:53 medusa kernel: [   11.354873] <6>fioinf ioDrive 0000:0a:00.0.0: Default sector size     512 bytes
Aug 31 14:21:53 medusa kernel: [   11.354874] <6>fioinf ioDrive 0000:0a:00.0.0: Rated endurance         8.00 PBytes
Aug 31 14:21:53 medusa kernel: [   11.354876] <6>fioinf ioDrive 0000:0a:00.0.0: 85C temp range hardware found
Aug 31 14:21:53 medusa kernel: [   11.354877] <6>fioinf ioDrive 0000:0a:00.0.0: Maximum capacity        1650.000 GBytes
Aug 31 14:21:53 medusa kernel: [   11.354900] <6>fioinf ioDrive 0000:0a:00.0.0: Firmware version 7.1.17 116786 (0x700411 0x1c832)
Aug 31 14:21:53 medusa kernel: [   11.354903] <6>fioinf ioDrive 0000:0a:00.0.0: Platform version 19
Aug 31 14:21:53 medusa kernel: [   11.354905] <6>fioinf ioDrive 0000:0a:00.0.0: Firmware VCS version 116786 [0x1c832]
Aug 31 14:21:53 medusa kernel: [   11.354914] <6>fioinf ioDrive 0000:0a:00.0.0: Firmware VCS uid 0xaeb15671994a45642f91efbb214fa428e4245f8a
Aug 31 14:21:53 medusa kernel: [   11.357563] <6>fioinf ioDrive 0000:0a:00.0.0: Powercut flush: Enabled
Aug 31 14:21:53 medusa kernel: [   11.537552] <3>fioerr ioDrive 0000:0a:00.0.0: Some pads are write protected:
Aug 31 14:21:53 medusa kernel: [   11.537555] <6>fioinf ioDrive 0000:0a:00.0.0:   pad 11 is write protected
Aug 31 14:21:53 medusa kernel: [   11.835225] <3>fioerr ioDrive 0000:0a:00.0.0: MINIMAL MODE DRIVER: hardware failure.
Aug 31 14:21:53 medusa kernel: [   11.918634] <6>fioinf ioDrive 0000:0a:00.0: Found device fct0 (Fusion-io 1.65TB ioScale2 0000:0a:00.0) on pipeline 0
Aug 31 14:21:53 medusa kernel: [   11.918867] <6>fioinf Fusion-io 1.65TB ioScale2 0000:0a:00.0: probed fct0
Aug 31 14:21:53 medusa kernel: [   11.918871] <6>fioinf Fusion-io 1.65TB ioScale2 0000:0a:00.0: Attaching explicitly disabled
Aug 31 14:21:53 medusa kernel: [   11.918877] <3>fioerr Fusion-io 1.65TB ioScale2 0000:0a:00.0: auto attach failed with error EINVAL: Invalid argument

The usual fio status output (which seems to have stopped displaying certain things like the endurance stats):

Code:

# fio-status  -a|less

Found 1 ioMemory device in this system
Driver version: 3.2.16 build 1731

Adapter: ioMono
        Fusion-io 1.65TB ioScale2, Product Number:F11-003-1T65-CS-0001, SN:removed, FIO SN:removed
        ioDrive2 Adapter Controller, PN:PA005004003
        External Power: NOT connected
        PCIe Power limit threshold: 24.75W
        PCIe slot available power: unavailable
        PCIe negotiated link: 4 lanes at 5.0 Gt/sec each, 2000.00 MBytes/sec total
        Connected ioMemory modules:
          fct0: Product Number:F11-003-1T65-CS-0001, SN:removed

fct0    Status unknown: Driver is in MINIMAL MODE:
                Device has a hardware failure
        ioDrive2 Adapter Controller, Product Number:F11-003-1T65-CS-0001, SN:removed
!! ---> There are active errors or warnings on this device!  Read below for details.
        ioDrive2 Adapter Controller, PN:PA005004003
        SMP(AVR) Versions: App Version: 1.0.20.0, Boot Version: 1.0.5.1
        Powerloss protection: not available
        PCI:0a:00.0
        Vendor:1aed, Device:2001, Sub vendor:1aed, Sub device:2001
        Firmware v7.1.17, rev 116786 Public
        Geometry and capacity information not available.
        Format: not low-level formatted
        PCIe slot available power: unavailable
        PCIe negotiated link: 4 lanes at 5.0 Gt/sec each, 2000.00 MBytes/sec total
        Internal temperature: 44.30 degC, max 44.79 degC
        Internal voltage: avg 1.01V, max 1.01V
        Aux voltage: avg 2.48V, max 2.48V
        Rated PBW: 8.00 PB
        Lifetime data volumes:
           Physical bytes written: 0
           Physical bytes read   : 0
        RAM usage:
           Current: 0 bytes
           Peak   : 0 bytes

        ACTIVE WARNINGS:
            The ioMemory is currently running in a minimal state.

acquacow · Aug 31, 2022

Oof, that looks rough.

What do your logs look like from the few days prior to this? Were they getting spammed with fioinf or fioerr messages?
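One way to answer that is to count the driver's messages per day. The sketch below assumes syslog-style lines in the same shape as the kernel output quoted above (month and day as the first two fields, and a `<3>fioerr` or `<6>fioinf` tag); the function name is illustrative and the log location on any given system is not assumed.

```python
import re
from collections import Counter

# Matches the "<3>fioerr" / "<6>fioinf" tags seen in the kernel lines quoted above.
TAG_RE = re.compile(r"<\d>(fioerr|fioinf)\b")


def count_fio_messages(lines):
    """Count fioerr and fioinf messages per day.

    The day key is the first two whitespace-separated fields (e.g. 'Aug 31').
    """
    counts = Counter()
    for line in lines:
        match = TAG_RE.search(line)
        if not match:
            continue  # not a Fusion-io driver message
        day = " ".join(line.split()[:2])
        counts[(day, match.group(1))] += 1
    return counts


sample = [
    "Aug 31 14:21:53 medusa kernel: [   11.537552] <3>fioerr ioDrive 0000:0a:00.0.0: Some pads are write protected:",
    "Aug 31 14:21:53 medusa kernel: [   11.537555] <6>fioinf ioDrive 0000:0a:00.0.0:   pad 11 is write protected",
    "Aug 31 14:21:54 medusa kernel: [   12.000000] unrelated message",
]

for (day, tag), n in sorted(count_fio_messages(sample).items()):
    print(day, tag, n)
```

Feeding it the lines of an older rotated log instead of `sample` would show whether the fioerr count climbed in the days before the failure.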
