IBM ServeRAID M5014 / LSI MegaRAID 9260-8i replacement for VMware ESXi 7

Rowan @ Jetboy

New Member
Oct 30, 2014
11
1
3
49
Home server:
ASRock H97M Pro4 motherboard
IBM ServeRAID M5014 RAID card with LSI MegaRAID 9260-8i 12.15.0-0189 firmware and BBU
4 x 3TB WD Red hard drives in RAID 1+0
2 x 512GB Crucial MX100 SSDs in RAID 1
VMware ESXi 6.7u3

In anticipation of a clean install of ESXi 7, I've picked up another two 3TB WD Reds to expand the RAID 1+0 array, and two 1TB Samsung 860 QVOs to replace the Crucials in the RAID 1 array.

I've read that because vmklinux drivers are no longer supported, the M5014 won't work in ESXi 7. What are my options for a cheapish 8-port RAID card with a decent cache and battery backup unit? I'm happy to go the OEM firmware flash route again if necessary.
 

Rowan @ Jetboy

I've rolled the dice on a used MegaRAID 9361-8i with CacheVault and a BBU. Price-wise, it was similar to what the M5014 and BBU cost in 2015. If I end up having to get a new battery, they're relatively cheap.
 

Rowan @ Jetboy

An update:

I replaced the MX100s with a pair of 1TB Samsung QVO SSDs and added another two 3TB WD Reds to the RAID 10 array. That was in addition to swapping to a MegaRAID 9361-8i and a new install of ESXi 7.0 booting from a USB stick.

Initially, things seemed fine, but there were clearly brief drops in connectivity when copying files. The logs showed that ESXi was regularly losing connectivity to both RAID arrays, sometimes for as long as 40 seconds. Booting into the MegaRAID BIOS showed that one of the new WD Reds had been marked as foreign, leaving the RAID 10 array degraded. Reviewing the settings, I found I still had the WD Red drive caches switched on, so I turned them off. I also disabled the settings that power down unused/spare drives. After rebuilding the array and restarting, I've had no problems for 24 hours. I suspect the power settings were the culprit.
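To put numbers on drops like these, the "Lost access" and "Successfully restored access" events can be paired up per volume. A minimal Python sketch, assuming you've already pulled each event out as a (timestamp, kind, volume) tuple with ISO-8601 UTC timestamps; the function names and the "lost"/"restored" labels are hypothetical, not part of any ESXi tooling:

```python
from datetime import datetime


def parse_ts(stamp):
    # ESXi logs UTC timestamps with a trailing "Z"; datetime.fromisoformat()
    # only understands "Z" from Python 3.11, so swap it for an explicit offset.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def outage_durations(events):
    """Pair each 'lost' event with the next 'restored' event for the same volume.

    events: iterable of (timestamp, kind, volume) tuples, where kind is
    "lost" or "restored" (hypothetical labels). Returns (volume, seconds)
    tuples in the order the outages ended.
    """
    lost_at = {}
    durations = []
    # Sort by parsed time so out-of-order input still pairs correctly.
    for t, kind, volume in sorted((parse_ts(s), k, v) for s, k, v in events):
        if kind == "lost":
            # Keep the earliest unmatched loss if several arrive in a row.
            lost_at.setdefault(volume, t)
        elif kind == "restored" and volume in lost_at:
            start = lost_at.pop(volume)
            durations.append((volume, (t - start).total_seconds()))
    # A "restored" with no preceding "lost" is ignored.
    return durations
```

Losses that never get a matching restore are simply left out, which is a deliberate simplification: a volume that stays down would show up elsewhere anyway.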
 

jeffshead

New Member
Feb 11, 2015
6
0
1
52
Rowan,

Thanks for sharing!

I have an old ESXi 6.5 install with an M5015 flashed to 9260-8i that I'd like to upgrade to ESXi 7.

I'm not running vCenter Server. Do you know if all of the ESXi sensors and the MegaRAID Manager software fully work with the 9361-8i? With my current ESXi 6.5 server, I get email alerts when there is a drive or battery issue. I want to make sure I don't lose that functionality if I update to v7.

After having the 9361-8i for a while, would you still recommend this model or another?

Cheers!
 

Rowan @ Jetboy

Hi @jeffshead,

Unfortunately I'm not able to offer any recommendations as the storage connectivity drops didn't go away and I haven't managed to fix them. To update the thread:
  • MegaRAID firmware was updated to the (then) latest May build before I started.
  • Playing with the various cache and power-saving settings on the card doesn't seem to help. I've had over a week with no errors at all, followed by overnight clusters of drops more than once a minute. At one point I found two WD Reds had dropped out of the RAID 10 array, across two different mirrors, which scared me enough to pull everything off the server and start trying to eliminate possible causes.
  • The first suspects were the two new WD Reds, as they were bought just before the whole shingled (SMR) drive fiasco started. Mine aren't shingled. I've also had a failure in a mirror containing only old Reds, which had worked for five years with no issues on the old MegaRAID and ESXi 5 to 6.7. I physically unplugged the new Reds, recreated the RAID 10 array with just the four old drives, and didn't add it as an ESXi datastore. I still got dropouts (admittedly fewer) with just the RAID 1 SSD array. IMHO that rules out the WD Reds.
  • I'd run out of SATA power cables, so I was using a couple of Molex-to-SATA adapters. These were replaced with proper Seasonic OEM SATA power cables. I also swapped around the PSU sockets I was using, but it didn't fix things.
  • I upgraded the ESXi 7.0 stock lsi_mr3 driver with the latest one from Avago. No change.
  • I've just made the heartbeat change outlined here: VMware Knowledge Base. No change.
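Spotting those overnight clusters by eye gets tedious. A rough Python sketch that groups event timestamps into bursts, assuming ISO-8601 UTC timestamps like the ones ESXi writes to its logs; `find_bursts`, the 60-second gap and the three-event minimum are arbitrary choices for illustration, not anything ESXi defines:

```python
from datetime import datetime, timedelta


def parse_ts(stamp):
    # Replace the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def find_bursts(stamps, max_gap=timedelta(seconds=60), min_events=3):
    """Group timestamps into bursts where each event follows the previous one
    within max_gap. Returns (first, last, count) for bursts of at least
    min_events events; isolated drops are discarded."""
    times = sorted(parse_ts(s) for s in stamps)
    bursts, current = [], []
    for t in times:
        if current and t - current[-1] > max_gap:
            # Gap too long: close the current burst and start a new one.
            if len(current) >= min_events:
                bursts.append((current[0], current[-1], len(current)))
            current = []
        current.append(t)
    # Don't forget the burst still open at the end of the input.
    if len(current) >= min_events:
        bursts.append((current[0], current[-1], len(current)))
    return bursts
```

Chaining on the gap between consecutive events (rather than fixed one-minute windows) means a burst can run for as long as drops keep arriving less than a minute apart.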
In the Events summary I get:

Code:
Lost access to volume 5ee4fd6e-8d5bcb5c-5804-d0509929925d (Red) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

Lost access to volume 5e9b48b4-25529905-e757-d0509929925d (QVO) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

Successfully restored access to volume 5ee4fd6e-8d5bcb5c-5804-d0509929925d (Red) following connectivity issues.

Successfully restored access to volume 5e9b48b4-25529905-e757-d0509929925d (QVO) following connectivity issues.
and in vobd.log:

Code:
2020-06-17T16:48:48.170Z: [vmfsCorrelator] 349846715856us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume 5ee4fd6e-8d5bcb5c-5804-d0509929925d (Red): [Timeout] [HB state abcdef02 offset 3932160 gen 73 stampUS 349846715778 uuid 5ee4f30e-02ffd6d9-9d1a-d0509929925d jrnl <FB 0> drv 24.82]

2020-06-17T16:48:48.170Z: [vmfsCorrelator] 349845260356us: [esx.problem.vmfs.heartbeat.recovered] 5ee4fd6e-8d5bcb5c-5804-d0509929925d Red

2020-06-17T16:48:48.170Z: [vmfsCorrelator] 349846715891us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume 5e9b48b4-25529905-e757-d0509929925d (QVO): [Timeout] [HB state abcdef02 offset 3932160 gen 31 stampUS 349846715832 uuid 5ee4f30e-02ffd6d9-9d1a-d0509929925d jrnl <FB 0> drv 24.82]

2020-06-17T16:48:48.170Z: [vmfsCorrelator] 349845260434us: [esx.problem.vmfs.heartbeat.recovered] 5e9b48b4-25529905-e757-d0509929925d QVO
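The vobd.log lines above have a regular shape, so counting recoveries per datastore is easy to script. A minimal Python sketch, assuming every line follows the format in the excerpt; the regex and function names are my own, not part of any VMware tool. Only the `esx.problem.vmfs.heartbeat.*` lines are counted, because each recovery is also logged as a more verbose `vob.vmfs.heartbeat.*` line and counting both would double it:

```python
import re
from collections import Counter

# Matches lines like:
# 2020-06-17T16:48:48.170Z: [vmfsCorrelator] 349845260356us: [esx.problem.vmfs.heartbeat.recovered] <uuid> <label>
ESX_HEARTBEAT = re.compile(
    r"^(?P<ts>\S+?Z): \[\w+\] \d+us: "
    r"\[esx\.problem\.vmfs\.heartbeat\.(?P<event>\w+)\] "
    r"(?P<uuid>[0-9a-f-]+) (?P<label>.+)$"
)


def heartbeat_events(lines):
    """Yield (timestamp, event, volume_uuid, label) for each esx.problem heartbeat line."""
    for line in lines:
        m = ESX_HEARTBEAT.match(line.strip())
        if m:
            yield m["ts"], m["event"], m["uuid"], m["label"]


def count_by_volume(lines):
    """Count heartbeat events per (volume label, event type) pair."""
    return Counter((label, event) for _, event, _, label in heartbeat_events(lines))
```

Usage would be something like `with open("/var/log/vobd.log") as f: print(count_by_volume(f))`, assuming the log sits in its usual ESXi location. If both arrays always recover in the same second, as in the excerpt, that points at the controller rather than any one set of disks.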
Having changed so much in one go (ESXi version, RAID card, BBU, cables, SSDs and HDDs) and with no way to try the old card on the current ESXi version, I've made this very difficult to fix. I'm faced with some time-consuming and/or expensive choices.
 

Rowan @ Jetboy

The culprit was the used MegaRAID 9361-8i. I bought a new one and have been running it without CacheVault and BBU for a fortnight without incident. A new CacheVault and BBU will go in this week, and hopefully that'll be it.