LSI 9270 shredding my hard drives?

lunadesign

Active Member
Aug 7, 2013
256
34
28
I've got a real mystery that's killing me and could use your help.

I have an LSI 9270-8i in a new Supermicro X9SRE-F that's running Windows 8.1. One of the virtual drives is a RAID 1 with two 4TB WD Se (WD4000F9YZ) drives.

I ran a test where I copied a folder with 350 GB of virtual machines to the virtual drive twice. Then, I used Beyond Compare to do a binary comparison of the two folders on the virtual drive.

For the first 95%, the controller is smart and is using one HDD to read the first folder and the other HDD to read the second folder. According to Task Manager, I'm seeing about 350 MB/s of read performance.

Towards the very end of the test, something happens and the controller is suddenly only reading from one HDD, the performance drops to 10 MB/s and I hear a strange mechanical groaning/growling noise from one of the hard drives. As soon as I cancel the test, the noise stops. It's also important to note that if only one drive was being used but things were otherwise normal, I should be seeing at least 80 MB/s.

If I stop the test and re-start it relatively soon, I can reproduce. If I re-start on the first 95% of data, it's great. If I re-start on the last 5% of data, it's super slow and the noise returns. What's interesting is if I let the system sit for a while and re-start on the last 5%, it's fine.

I've already tried swapping the controller and recreating the virtual drive and re-running and had the same behavior.

Does anyone have any ideas what's causing this? I desperately need to get this system up and running but don't want the controller shredding the drives.
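For anyone who wants to repeat the comparison step without Beyond Compare, here is a minimal Python sketch of a byte-for-byte folder comparison. The function name `compare_trees` and the two paths are made up for illustration, and this is not how Beyond Compare works internally. The test data also needs to be much larger than RAM (as the 350 GB set is), otherwise reads may be served from the OS cache instead of the drives.

```python
import filecmp
import os


def compare_trees(left, right):
    """Byte-compare every file under `left` with its counterpart under `right`.

    Returns a sorted list of (relative_path, problem) tuples, where problem is
    "missing" or "differs". Files that exist only under `right` are not
    reported; this sketch walks `left` only.
    """
    problems = []
    for root, _dirs, files in os.walk(left):
        rel_root = os.path.relpath(root, left)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            other = os.path.join(right, rel)
            if not os.path.isfile(other):
                problems.append((rel, "missing"))
            # shallow=False forces a content comparison instead of trusting
            # size and modification time.
            elif not filecmp.cmp(os.path.join(left, rel), other, shallow=False):
                problems.append((rel, "differs"))
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical paths: the two copies of the VM folder on the RAID 1 volume.
    for rel, problem in compare_trees(r"D:\vms_copy1", r"D:\vms_copy2"):
        print(problem, rel)
```

Because each file pair is read from both folders, this produces the same two simultaneous read streams against the virtual drive that the original test did.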
 

HellDiverUK

Active Member
Jul 16, 2014
290
52
28
47
Sounds like one of the drives is faulty. I'd be pulling the drives and doing a proper surface scan of the drives individually.
 

lunadesign

That's what I initially thought. I did what you suggested on a known, tested system and both drives had no errors. Neither showed any significant change in SMART values either.

I do have 2 more of those WD Se drives but am not sure I want to risk them on this controller. I guess I could do it and watch closely and stop at the first sign of trouble.
 

Darkytoo

Member
Jan 2, 2014
106
4
18
What did you use to test? I'm assuming you are running the latest firmware, driver, and MSM? I'm also assuming you unplugged the drives from the LSI and connected them directly to the motherboard to get SMART info? I use CrystalDiskInfo, which gives really good SMART info. I would take both drives off the LSI, put them on the onboard RAID and try the same thing. I had an LSI 9270 myself for a while and had issues with firmware updates and slow performance; I ended up sending it back after LSI support pronounced it defective. I replaced it with a 9271 and have been very happy. LSI support told me that the 9271 was the actual replacement for the 9270, and that a 9270 will not use a CacheVault module.
 

lunadesign

Thanks for your response and ideas!

I connected the drives to a test system's motherboard and used WD's DOS-based diag tools...specifically the quick and extended tests. I used HD Tune Pro and Smartmontools to get the data from the drives before and after the tests and didn't see any significant changes in SMART data.
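A before/after SMART comparison like this can also be scripted. The sketch below assumes the attribute table printed by Smartmontools' `smartctl -A` was saved to text before and after the test; the function names are hypothetical, and the parser relies on the usual ten-column attribute layout, which may not hold for every drive.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value from the table printed by `smartctl -A`."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attributes[fields[1]] = " ".join(fields[9:])
    return attributes


def changed_attributes(before_text, after_text):
    """Return {name: (raw_before, raw_after)} for attributes whose raw value changed."""
    before = parse_smart_attributes(before_text)
    after = parse_smart_attributes(after_text)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

Counters such as power-on hours and temperature are expected to change between runs; the ones worth watching are things like reallocated or pending sector counts.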

LSI thought both of my 9270's were defective based on some info I obtained using their LSIget tool and checking a serial number database. Apparently, they have had to do some re-work on some of the 9270's so they sent me two replacements that have the hardware fix. I've only tested one of the replacements so far and it still exhibits the problem.

I've also since tried swapping one of the hard drives and can still reproduce the problem.

I had not thought of trying with the onboard RAID....I'll give that a try.

I had not heard the 9271 is the replacement for the 9270. My understanding is that they are functionally the same but with different board layouts, just like the 9260 and 9261.
 

Darkytoo

I thought the same thing, but according to LSI tech support, the 9271 is the "newer" card and has different silicon than the 9270. The reason I know that to be true is this: if they were functionally the same, why does the 9270 not support CacheVault? According to tech support, the 9270 does not support it, but the 9271 does.

As far as the drives go, have you tried them individually connected to the 9271 and run the same test? I ask because I had 2-3 calls with LSI to figure out why my SSD array was so slow, and it turned out that one of the drives is either defective or incompatible with the LSI controller.
 

lunadesign

Quick update:

I've tried all of the following and have been unable to stop this problem from happening:
1) Installing the 9270-8i in a PCIe 2.0 slot
2) Trying four different 9270-8i controllers (2 originals, 2 recently received from LSI Support)
3) Installing on a different motherboard (same model but with different proc and no other cards installed)
4) Trying with a base install of Win 8.1
5) Trying with a base install of Win 7
6) Disabling PCI-X slots in the motherboard BIOS
7) Trying a different app to trigger two simultaneous reads from the RAID 1
8) Trying different read ahead / write back settings
9) Trying different versions of the LSI driver
10) Trying another set of WD4000F9YZ drives
11) Trying a smaller set of test data

I'm running out of things to try although I'm currently setting up two 3TB WD Red drives (WD30EFRX) to see if somehow the drive model is part of the equation. I'll also try a single drive config as suggested above but I really doubt that's going to trigger it as the problem seems to be related to the controller's ability to read from two drives in a RAID 1 simultaneously to speed up read performance.

LSI Support is also trying to reproduce this.

Any other ideas?
 

lunadesign

Also, any thoughts on Adaptec Series 8 controllers? I'm currently using LSI exclusively but if I can't get this working ASAP, I'm tempted to give Adaptec a try. I'd be curious for any thoughts on the quality of their hardware, software and support as compared to LSI. (But please, no religious wars. ;))
 

Chuckleb

Moderator
Mar 5, 2013
1,017
331
83
Minnesota
I must say that you have a possessed collection of hardware. That's about it. You have definitely done good testing. Only thing left is different drives like you mentioned. Could be something with the cloud drives?
 

Chuckleb

All the manufacturers target drives at certain uses. You mentioned these are the Se drives, which are targeted at "cloud" use, IIRC. WD also has the Re drives and now the Pro drives, plus the Red line, which is targeted at NAS use. Most folks like the Red drives: good cost, and you get features of the Re (RAID-enhanced) drives. I am not sure what the selling points of the Se drives were.
 

lunadesign

Ah, by "cloud drives" you were referring to the WD Se drives. Got it.

The differentiation of the WD drives has gotten increasingly confusing. The Se drives are supposed to be like the Re drives but for lighter usage (scale out) scenarios. The Se drives are very similar to the new Red Pro drives.
 

lunadesign

UPDATE:

I've made some progress in figuring this out.

It looks like I've got two problems:
1) The LSI controller should be reading from both drives in the RAID 1 VD but sometimes decides to only read from one. I can reproduce this on WD Se drives and WD Red drives.
2) The WD Se drives, even when connected to the motherboard, get angry when hit with two long read operations at the same time.

With regards to problem #2, consider the following test, run with the drives connected to the motherboard's controller and reading large (10-15 GB) files:
A) WD Red, one read process: Read rate is 150 MB/s
B) WD Red, two read processes: Read rate is 140 MB/s
C) WD Se, one read process: Read rate is 180 MB/s
D) WD Se, two read processes: Read rate is 10-15 MB/s and drive makes growling sound

I've put in a call to WD to see if there's a firmware update for this drive. I got to Level 2 support but apparently there's only one Level 3 guy and he was out today so I'll try again tomorrow.
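For anyone who wants to try something similar to tests A through D, here is a rough Python sketch that reads one or more large files at the same time and reports the combined read rate. This is not the exact procedure used above; the function names and the 4 MiB chunk size are arbitrary choices for illustration. The files need to be different, large, and not already in the OS cache, or the numbers will reflect RAM rather than the drive.

```python
import threading
import time

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read call; an arbitrary choice


def _read_all(path, totals, index):
    """Read one file sequentially and record the number of bytes read."""
    count = 0
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(CHUNK_SIZE)
            if not chunk:
                break
            count += len(chunk)
    totals[index] = count


def measure_concurrent_reads(paths):
    """Read all `paths` at the same time, one thread per file.

    Returns (total_bytes, megabytes_per_second) for the combined run.
    """
    totals = [0] * len(paths)
    threads = [
        threading.Thread(target=_read_all, args=(path, totals, i))
        for i, path in enumerate(paths)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = max(time.perf_counter() - start, 1e-9)
    total = sum(totals)
    return total, total / elapsed / 1e6
```

Calling `measure_concurrent_reads` with a single path corresponds to the one-process cases (A and C), and calling it with two paths on the same drive corresponds to the two-process cases (B and D).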
 

lunadesign

Yup. It definitely surprised me.

If anyone wants to try the same test, I can provide really easy instructions.

I'm still trying to get through to the elusive one and only Level 3 guy at WD. He was in a meeting when I called this morning.
 

Darkytoo

I have the same kind of issue with my 9271. A Samsung SSD performs poorly on the controller, but if I plug it directly into the motherboard it transfers 3x as fast. Glad you are making progress!
 

lunadesign

UPDATE:

It looks like I've got a fix for problem #2. WD's Level 3 support guy was really helpful and provided me with the latest firmware (01.01A02) for the Se drives. Now, when I hit a single WD drive with two long read ops, the drive deals with it very well. If I start both read ops at the same time, the drive struggles like before for about a second or two (with a little of the growling noise) and then quickly (within about 4-5 seconds) ramps up to 170 MB/s. If I start the second read op after the first one, it skips the struggling stage and settles in at around 170 MB/s. I'm still testing to make sure that this firmware didn't make a huge sacrifice in other areas to improve multi sequential read performance but so far it looks really good.

I'm still working with LSI on problem #1 (controller doesn't read from both drives when it should).
 

lunadesign

Darkytoo, can you please provide more detail on the Samsung SSD issue you mentioned with your 9271? What Samsung drive(s) and what RAID level are you using? And what kind of transfer are you doing and how are you measuring it?

In general, I've found that the 927x's are much better than the 926x's at keeping up with SSDs but still lag onboard controllers by roughly 10% in some areas. I experimented with Read Ahead and Write Back a lot and was surprised that in certain workloads they were great and in others really hampered performance. Since I got better all-around performance with Read Ahead and Write Back disabled, I've turned them off for my SSD-based virtual drives but am using both on my HDD-based virtual drives.
 

Darkytoo

Initially I had a stripe set with an 840 EVO, an 840 PRO and a Toshiba Q Series (I know, but some were on sale when others weren't), and I was on the phone with LSI tech support about slow transfers. In the event log I get a large number of errors on bootup from the 840 PRO, and they were blaming that drive, but I had just purchased it a month earlier. So I made a stripe set without it, and it had the same slow transfer rates. I then started making stripe sets with different drive combinations, and when I took out the 840 EVO, reads went from 600 MB/s to 1800 MB/s. I then tried that drive all by itself and its performance was still lower than the others. I then made sure both it and the LSI had the latest firmware, which they did. I just got lazy and haven't RMA'd the Samsung, as it's such a pain in the #%# to get a Samsung replacement.
 

lunadesign

Is this a three drive RAID 0?

What sort of errors were you seeing? From the LSI driver or Windows NTFS errors?

Did you try doing single drive RAID 0's to see which individual drives were working better with the LSI card?

Also, did you make sure the LSI card is properly cooled? The 927x's run very hot. LSI has told me it's okay as long as they stay under 100 C, but that seems way too hot. Both of my 9270's have 80 mm fans nearby blowing on them to keep temps in the 55 C range.