Server keeps shutting down and dropping hard drives

Discussion in 'RAID Controllers and Host Bus Adapters' started by smidley, Feb 12, 2016.

  1. smidley

    smidley New Member

    Joined:
    Feb 7, 2011
    Messages:
    17
    Likes Received:
    0
    First, the specs.
    • Case: Norco 4220
    • Power Supply: Corsair 750W
    • Motherboard: Tyan S7012
    • CPU: Intel Xeon E5540
    • RAM: 64 GB ECC DDR3
    • RAID Card: LSI MegaRAID SAS 9260-16i
    • Hard Drives: 1x 256 GB SSD, 19x 2 TB spindle drives. All spindle drives are in a single RAID 5 array. Passthrough is used to present the array to the VM.
    • OS: ESXi 6 (Installed to USB thumb drive)
    • VM OS: Server 2012 R2
    This server had been super stable until the last couple of days. No changes have been made and no patching has been done lately. The server will be running just fine, and then all of a sudden the physical server shuts off. It turns itself back on after a few minutes and boots up just fine. There is nothing in the error logs to indicate a problem. Just today, I started seeing a second problem: one of the hard drives drops out of the array briefly and then starts working again. Array logs here: MegaRAID Storage Manager 15.08.01.02 Event Log - Generated on Wed Feb 10 20:41:0 - Pastebin.com

    While I was troubleshooting the hard drive issue, the entire RAID controller seemed to shut down and then came back online. I also deliberately triggered a task in my server VM that pushed the CPU to 100%, and after about a minute of that the server shut itself off again. Both of these make me suspect a power-related problem, so I'm leaning towards the power supply or the motherboard. I'm on the latest version of the motherboard BIOS. Does anyone have any suggestions?
     
    #1
  2. FMA1394

    FMA1394 Active Member

    Joined:
    Jan 11, 2013
    Messages:
    616
    Likes Received:
    174
    I agree with you on the power supply or motherboard. Check for bad caps; that's what this sounds like to me.
     
    #2
  3. smidley

    smidley New Member

    Joined:
    Feb 7, 2011
    Messages:
    17
    Likes Received:
    0
    What's the best way to check for bad caps?
     
    #3
  4. abundantmussel

    Joined:
    Jan 30, 2016
    Messages:
    35
    Likes Received:
    7
    Look for bulging in the caps or splits in them.
     
    #4
    FMA1394 likes this.
  5. DavidRa

    DavidRa Infrastructure Architect

    Joined:
    Aug 3, 2015
    Messages:
    258
    Likes Received:
    111
    Also check your cooling. The fact that it won't power up for a few minutes is potentially an indicator of overheating.
     
    #5
  6. Quasduco

    Quasduco Active Member

    Joined:
    Nov 16, 2015
    Messages:
    125
    Likes Received:
    46
    Most importantly, I think you should arrange backups somewhere other than that server ASAP. Those power losses could be causing some nasty data corruption.

    Also, not to give you too hard of a time, but you have all 19 spinners in a single RAID 5? That is a recipe for problems all by itself...
     
    #6
  7. izx

    izx Active Member

    Joined:
    Jan 17, 2016
    Messages:
    113
    Likes Received:
    38
    Did you notice any issues in the BMC/IPMI System Event Log?

    The controller keeps warning about timeouts and resets on the last port (12-15) with reference to SAS address 0x50014380085EB6C7. Is that the problem disk (most probably), or the onboard expander?

    What manufacturer/model is the problem disk?
     
    #7
  8. smidley

    smidley New Member

    Joined:
    Feb 7, 2011
    Messages:
    17
    Likes Received:
    0
    The data on the server is not critical; it's just my home server with mostly video content. I had the drives in a RAID 6 previously, but stepped down to a RAID 5 because I needed the extra space :)
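
For reference, the space gained by going from RAID 6 to RAID 5 is just one drive's worth of parity. Here is a minimal sketch of that arithmetic, assuming 19 equal-size 2 TB drives as listed in the first post. The function name is made up for illustration, and the result ignores filesystem overhead and the TB/TiB difference.

```python
def usable_capacity_tb(drive_count: int, drive_size_tb: float, parity_drives: int) -> float:
    """Approximate usable capacity of a single parity-based array.

    parity_drives is 1 for RAID 5 and 2 for RAID 6. Assumes all drives
    are the same size (hypothetical helper, not a vendor tool).
    """
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_size_tb


raid5_tb = usable_capacity_tb(19, 2, parity_drives=1)
raid6_tb = usable_capacity_tb(19, 2, parity_drives=2)

print(f"RAID 5: {raid5_tb} TB usable, survives 1 drive failure")
print(f"RAID 6: {raid6_tb} TB usable, survives 2 drive failures")
print(f"Space gained by dropping to RAID 5: {raid5_tb - raid6_tb} TB")
```

So the step down buys roughly one drive (about 2 TB) of extra space in exchange for tolerating only a single drive failure across all 19 disks.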
     
    #8
  9. smidley

    smidley New Member

    Joined:
    Feb 7, 2011
    Messages:
    17
    Likes Received:
    0
    I found out that my power supply is probably the issue. The fan isn't spinning and I'm guessing it's overheating when the server demands high power usage.
     
    #9