m5015 dropping drives+MSM hogging cpu

craze

New Member
Aug 7, 2014
13
2
3
28
I've picked up an M5015 off eBay along with the advanced feature key (for SafeStore), and it's giving me some grief. I've got 4x Seagate 3TB drives (ST3000NM0043) hooked up in RAID 10, which were all working fine on an M1015 prior to this, and for reasons that escape me the controller is randomly failing all the drives in a span.

I first ran into this issue last night. I left my machine on doing a restore operation (toggling drive security results in a drive wipe) and got up today to find it had BSOD'd. Upon rebooting I found two of my drives (in the same span) had 'failed'. After marking the drives as unconfigured good, I was able to import my existing configuration and boot back up seemingly without issue. Looking in MSM, this is the only information I could find (all of these events were logged at exactly the same time):
Code:
Controller ID: 0  Unexpected sense: PD = -:-:0 Unknown Sense Code, CDB = 0x8a 0x00 0x00 0x00 0x00 0x00 0x5d 0x50 0xa3 0x00 0x00 0x00 0x01 0x00 0x00 0x00, Sense = 0x72 0x05 0x24 0x00 0x00 0x00 0x00 0x14 0x03 0x02 0x00 0x00 0x80 0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Controller ID: 0  Unexpected sense: PD = -:-:1 Unknown Sense Code, CDB = 0x8a 0x00 0x00 0x00 0x00 0x00 0x5d 0x50 0xa3 0x00 0x00 0x00 0x01 0x00 0x00 0x00, Sense = 0x72 0x05 0x24 0x00 0x00 0x00 0x00 0x14 0x03 0x02 0x00 0x00 0x80 0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Controller ID: 0  PD Error: -:-:0 (Critical 2)

Controller ID: 0  PD Error: -:-:1 (Critical 2)

Controller ID: 0  State change: PD = -:-:0  Previous = Online  Current = Failed

Controller ID: 0  State change: PD = -:-:1  Previous = Online  Current = Failed
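
For anyone trying to read the CDB and sense bytes in the log above, here is a minimal Python sketch that decodes them. It assumes the standard SCSI layouts (a 16-byte WRITE(16) CDB and descriptor-format sense data). The function names and the tiny lookup tables are made up for illustration and only cover the codes that appear in this log.

```python
# Hypothetical helpers for decoding the bytes MSM printed; not part of MSM.

def parse_hex_bytes(text: str) -> bytes:
    """Turn a string like '0x8a 0x00 ...' into raw bytes."""
    return bytes(int(token, 16) for token in text.split())


def decode_write16_cdb(cdb: bytes) -> dict:
    """Decode a WRITE(16) CDB: opcode 0x8A, 8-byte LBA, 4-byte block count."""
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    return {
        "opcode": "WRITE(16)",
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13
    }


# Partial tables: only the values seen in this log.
SENSE_KEYS = {0x05: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}


def decode_descriptor_sense(sense: bytes) -> dict:
    """Decode the header of descriptor-format sense data (0x72/0x73)."""
    if len(sense) < 8 or (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key = sense[1] & 0x0F
    asc, ascq = sense[2], sense[3]
    return {
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
        "additional_length": sense[7],
    }


CDB_TEXT = ("0x8a 0x00 0x00 0x00 0x00 0x00 0x5d 0x50 0xa3 0x00 "
            "0x00 0x00 0x01 0x00 0x00 0x00")
SENSE_TEXT = ("0x72 0x05 0x24 0x00 0x00 0x00 0x00 0x14 0x03 0x02 0x00 0x00 "
              "0x80 0x0e " + " ".join(["0x00"] * 14))

if __name__ == "__main__":
    print(decode_write16_cdb(parse_hex_bytes(CDB_TEXT)))
    print(decode_descriptor_sense(parse_hex_bytes(SENSE_TEXT)))
```

Read this way, both drives rejected the same 256-block WRITE(16) at LBA 0x5d50a300 with ILLEGAL REQUEST / INVALID FIELD IN CDB. That on its own doesn't say whether the controller or the drives were at fault, only what was reported back.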
The firmware was nearly 4 years out of date, and I was having an issue where it refused to use the on-drive cache no matter what I set the policy to (resulting in abysmal write speeds), so I updated to the latest firmware figuring that would fix it. Then about an hour ago I got the problem again, but with the other two drives in my RAID set (the update did fix the disk cache issue, however).

This time it logged no errors at all, even when the VDs went offline; I've got no record of that happening in my logs. I had the write cache on at the time so I ended up with preserved cache, but after 'fixing' the array this time and rebooting, it would just hang after prompting for the drive security passphrase. If I rebooted out of that, all of my changes had been undone, the drives were once again unconfigured bad, and the array was in the same failed state (I tried this twice with the same results, and also tried waiting for a prolonged time). I ended up having to dump the preserved cache before re-importing the array to get it to boot up properly (I have the write cache turned off right now as a result).

Lastly, since switching from my M1015 to the M5015, the MSM Java process has started hogging one of my cores. I've tried updating to the latest MSM as well as doing a clean install, but no matter what I try it's still using up a quarter of my CPU for no reason. I did not have this issue with MSM prior to switching to the M5015, but I'm not sure how that ties into it (it's the MSM services doing it; it keeps going while the GUI is closed).

Anyone have any idea what's going on here? These drives were all fine on my old M1015 in an identical RAID 10 setup (however, I was not using them with drive security enabled, due to a problem with the card). Could this be some weird hardware issue with the card? The BBU is toast according to MSM, so I could likely get eBay to have the guy refund my money if it came down to it. But moving back to my M1015 would require turning drive security back off, wiping everything in the process, and then I'd have to restore several TB of data again (and then once more after securing a replacement M5015 and turning drive security back on). Running Win7 x64 with the latest drivers, firmware, and MSM off LSI's website for the 9260-8i.
 

craze

In case anyone else ever runs into this issue, and because nobody bothered to respond: the card was faulty. I secured a replacement and everything works fine, and MSM no longer hogs one of my cores.