[Solved] LSI 9341-8i L2/L3 Cache Error.

Discussion in 'RAID Controllers and Host Bus Adapters' started by Apil, Feb 14, 2017.

  1. Apil

    Apil New Member

    Joined:
    Feb 14, 2017
    Messages:
    11
    Likes Received:
    0
    Hi! :)

    I have just experienced what I presume is a hardware malfunction on my SAS 9341-8i RAID card.
    While the server was running, a program that was writing to the RAID suddenly got a lot of I/O errors, and then the RAID disappeared in Windows.
    After a reboot, I now get an I/O error in Device Manager in Windows Server 2012 R2.
    And after POST, it tells me: "L2/L3 Cache error was detected on the RAID controller.
    Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot." [IMG]
    From that message I presume that the card has to be replaced to get my RAID up and running again.
    Is there any way to get this card running again?
    If not, then I have a few questions:
    Could a FW upgrade possibly solve this problem? It seems that everything else is fine (it detects all disks in the MegaRAID config, etc.).
    Will I be able to simply replace the card with no loss of data?
    Does it need to be the exact same model / FW version for me to get the RAID up again, in case I replace the card?
    Is there any other way you can help me, or any advice on this situation?
    Thank you for your time and help.
    Best regards
    Apil
     
    #1
  2. Tom5051

    Tom5051 Active Member

    Joined:
    Jan 18, 2017
    Messages:
    230
    Likes Received:
    28
    These cards are generally pretty reliable when they report a problem like this. I would suggest replacing the controller with another one that is known to work; hopefully you can borrow one from a friend.
    Otherwise, replace the controller; a firmware update is unlikely to succeed or to cure the card.
    Does the card have adequate airflow over it? They get pretty hot.

    Also, you didn't say which RAID level the array was built with. It's possible that with a replacement controller the array will still be optimal, but there is always the chance that it has degraded or failed.
    You may need backups.
    The replacement controller will attempt to read the array configuration from the disks, provided it is not corrupt.
    Quite often I move an array of 8 disks between servers, and the RAID cards pick up the array config and boot with no problem.
     
    #2
  3. Apil

    Apil New Member

    Hi Tom.

    Thanks for your reply.

    The only option I have to replace it is to buy a brand new one, so it's not that easy.
    Here I have some doubts about what will happen to the existing RAID if I replace the card.

    It's a RAID 5 with 8 disks, and all disks seem to be fine.
    But again, I cannot open the MegaRAID software in Windows anymore.
    Though in the MegaRAID config <ctrl - p> during boot, it says that the array is optimal, and it finds all disks.

    What strikes me as strange about this "L2/L3 cache error" is that the card doesn't have any cache?

    The card, from what I read, is known to run hot normally, and it has been, at around 90 degrees Celsius.
    There has been no dedicated fan straight on the card, but plenty of case airflow that passes the card, which I thought was sufficient.
    I have now set a 120 mm fan straight on the card, but it is probably too late.

    Something still tells me that it would be strange if this is a hardware fault with permanent damage,
    especially since the card does not have any cache?

    Thanks again for any help, it is very much appreciated, since my RAID is currently down :-(

    Best regards
    Apil
     
    #3
  4. Tom5051

    Tom5051 Active Member

    That is a strange error; Google has nothing about any LSI cards with this error.
    Are you able to check if there is a read cache enabled on the RAID card? If so, try disabling it. Same for the write cache, which should already be disabled since you don't have a BBU.
     
    #4
  5. Apil

    Apil New Member

    I agree, the first thing I did was try to google it, and I got absolutely nothing.

    Do you have any idea how I check if the read/write cache is enabled/disabled?
    Is it a jumper on the board, or a BIOS setting?

    Any ideas :) ?

    Best regards
    Apil
     
    #5
  6. Tom5051

    Tom5051 Active Member

    Sorry, in the RAID card BIOS. I think you said you could still get in there?
    Also, tell us a bit more about the rest of the hardware specs in this server if possible: motherboard, CPU, any other PCIe cards.
    Have you updated anything recently? Motherboard settings, BIOS updates, a new PCIe network card?
     
    #6
  7. Tom5051

    Tom5051 Active Member

    Most RAID cards that I have used over the years have the ability to enable the on-board write cache even if the backup battery (or capacitors), usually an optional extra, is not present. Likewise, the on-board read cache is usually enabled by default.
    You can also turn off each disk's read cache, but I doubt this will help with your problem.
    I think it's either the on-board read/write cache, or possibly the RAID card's dedicated CPU has some sort of L2/L3 cache.
    There isn't much difference between a general-purpose CPU and a dedicated controller CPU; the controller CPU is just built to do a specific task.
    Does that make sense?
     
    #7
  8. Apil

    Apil New Member

    Yeah, I can still access the <ctrl - p> utility after this error. I was looking around there yesterday and didn't find anything interesting.
    Do you know which option in there to disable/enable?
    Otherwise I'll try to have a look around again :)

    The server is a Dell T20 that I pulled out and placed in a custom rack-mounted case, with added case fans.
    It has a Xeon E3-1225 v2/v3 (can't remember), an Intel dual-gigabit NIC, and 8 x WD Red 3 TB disks, with a Kingston 120 SSD as the system disk.

    I did update the BIOS and firmware of the motherboard and the RAID card when I built it 6-12 months ago, because I was having problems getting the card to work (the classic "cannot start" hardware Error 10 in Device Manager). That seemed to be because the card does not have any RAM/cache, so I had to disable/enable some settings in the motherboard BIOS to get it to start. Since then it has been running flawlessly, until now.
     
    #8
  9. Apil

    Apil New Member

    Yeah, that makes sense. I just thought that since this is the 9341 and not the 9361, there was no RAM/cache on the board, and therefore it used the system RAM as memory, or the cache of the CPU, since it has no dedicated memory.

    By the way, sorry for my bad English and lack of correct terms.
     
    #9
  10. Tom5051

    Tom5051 Active Member

    Check under 'adapter properties' or 'virtual drives -> array properties'
     
    #10
  11. Apil

    Apil New Member

    Is this it ?
    [​IMG]
    If needed, I can provide some more screenshots.

    Thanks again :)
    -Apil
     
    #11
  12. Apil

    Apil New Member

    I tested this both enabled and disabled, and it didn't change anything :(
     
    #12
  13. vanfawx

    vanfawx Active Member

    Joined:
    Jan 4, 2015
    Messages:
    302
    Likes Received:
    51
    Unfortunately I think it's talking about the on-board L2/L3 cache of the raid card CPU, not the on-board RAM cache. If the CPU L2/L3 cache has failed, then it's a sign the CPU itself might be failing on the raid card.

    Hope that helps.
     
    #13
  14. Apil

    Apil New Member

    I just tried to update the firmware on the RAID card, and it seems to have done something.
    Now the error doesn't come up anymore, and I get the RAID in Windows.
    So that is something.
    Though now MegaRAID is giving me this:
    [​IMG]

    Any thoughts ?
     
    #14
  15. Apil

    Apil New Member

    Tried another reboot.
    Everything seems to work now: I can access the RAID, browse the files, etc., except I'm getting this:
    [​IMG]

    Maybe I should try to downgrade the FW?
    [Edit] Trying to update the driver in windows now.

    -Apil
     
    #15
    Last edited: Feb 14, 2017
  16. Apil

    Apil New Member

    Yay, after installing the newest driver and rebooting, there are no more pop-ups from MegaRAID with errors, and everything seems fine!
    :):):):):):)
     
    #16
  17. vanfawx

    vanfawx Active Member

    *phew* That's awesome! Usually scary messages like the one you had indicate the worst.
     
    #17
  18. Tom5051

    Tom5051 Active Member

    Nice work fixing it. Weird error for sure.
    You're right about there being no cache on that card; from your settings you can see it is set to write-through.
    If a cache were available, it would have the option for write-back, or write-back with backup battery protection.
     
    #18
  19. Apil

    Apil New Member

    Thanks again for your help, guys! :)
    -Apil
     
    #19
  20. stin9ray

    stin9ray New Member

    Joined:
    Jan 5, 2018
    Messages:
    1
    Likes Received:
    0
    Hi everybody,

    thank you for posting the above. It helped me figure out what was going on.

    And I have some good news as well: In my case I did not even have to re-flash the firmware. Here is a description of what happened to hopefully help others, but also for myself in case this happens again ;-)

    Setup: I am using the controller for a FreeNAS VM running on ESXi, with the controller passed through to the VM. As is preferred for ZFS, I use JBOD only, so there was no controller-level RAID that I had to worry about. In my case the controller is a SAS 3008 on the mobo.

    Situation: shutting down the FreeNAS VM hard-reset or purple-screened the ESXi server. On the next boot vSphere would restart the VM and I'd be back to square one. Disabling vSphere HA helped me finally get into ESXi maintenance mode. However, I am guessing that somewhere in the half a dozen or so crashes the configuration stored on the controller got corrupted.

    In FreeNAS I saw this in the system log:

    Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
    Jan 6 12:32:39 fns mfi0: Using MSI
    Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
    Jan 6 12:32:39 fns mfi0: Firmware fault
    Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
    Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
    Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
    Jan 6 12:32:39 fns mfi0: Using MSI
    Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
    Jan 6 12:32:39 fns mfi0: Firmware fault
    Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
    Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
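
    For anyone who wants to spot this condition without reading the whole log by eye, here is a minimal Python sketch that scans syslog-style lines like the ones above for the mfi firmware-fault messages. The function name find_firmware_faults and the regular expression are my own illustrative assumptions, based only on the log format shown in this post; they are not part of FreeNAS or the mfi driver.

```python
import re

# Matches syslog-style lines from an mfi device, for example:
#   "Jan 6 12:32:39 fns mfi0: Firmware fault"
# The layout (timestamp, host, device, message) is assumed from the
# log excerpt in this post only.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<dev>mfi\d+):\s+(?P<msg>.*)$"
)

# Message fragments seen in the excerpt when the firmware was not ready.
FAULT_MARKERS = ("Firmware fault", "Firmware not in READY state")


def find_firmware_faults(lines):
    """Return (timestamp, device, message) for each firmware-fault line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in FAULT_MARKERS):
            hits.append((match.group("stamp"), match.group("dev"), match.group("msg")))
    return hits


if __name__ == "__main__":
    sample = [
        "Jan 6 12:32:39 fns mfi0: Using MSI",
        "Jan 6 12:32:39 fns mfi0: Firmware fault",
        "Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6",
        "Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6",
    ]
    for stamp, dev, msg in find_firmware_faults(sample):
        print(f"{stamp} {dev}: {msg}")
```

    To use it on a real system you would pass it the lines of whatever log file holds these messages; which file that is depends on your setup, so I have left it out.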


    To make the nested setup work, I had the Oprom Control for the controller disabled in the Intel mobo BIOS. I went into the BIOS and re-enabled the oprom:
    • F2 on boot to get into BIOS
    • "Setup Menu"
    • "Advanced"
    • "PCI Configuration"
    • "PCIe Port Oprom Control"
    • "Enabled" on all entries

    On the next boot I got exactly the same error as Apil posted at the beginning of the thread:

    L2L3_cache_error.jpg

    After pressing X to continue and Ctrl-R to get into the RAID controller BIOS, I set the controller to factory defaults:
    • Ctrl-N twice to get to the "Ctrl Mgmt" page
    • lots of Tab to get to "Set Factory Defaults"
    • Ctrl-S to save
    • lots of Esc to get all the way out to the prompt that tells you to use Alt-Ctrl-Del

    factory_reset.jpg

    On the next boot the error was not there any more, and it listed the connected physical (JBOD) drives instead, as per normal. Yes.

    Clean-up: back into the mobo BIOS to disable the oprom for the controller.

    After booting ESXi, turning vSphere HA back on, and booting the FreeNAS VM, the controller, all the disks, and the ZFS mirrored pool were back as if nothing had ever happened.

    :)

    Update 2018-09-08: I am glad I made this post because it just saved my bacon again. Somebody (kids) stacked boxes in front of my home server rack, and I am assuming the controller overheated, cooked by all the disks. The LSI controller probably got into an inconsistent state when it did a thermally triggered emergency shutdown, and I can't really blame it for that. Anyhow, with my own instructions I got everything back up and running, but boy is it scary when your disks go missing.
     
    #20
    Last edited: Sep 7, 2018