Weird slowdowns - what could be the cause?


Perry

Member
Sep 22, 2016
66
11
8
52
tl;dr - Disk usage hits 100% in Windows 10 Task Manager and system memory usage shoots up whenever we do disk-intensive write operations that are well within the capabilities of the RAID in question.

First a bit of background: We have an 8-disk RAID 5 that's inside of a motion picture scanner's host PC. The RAID controller is a Highpoint RocketRaid 2720, which we've used for many years. We were able to consistently get about 1200MB/sec when testing using the AJA Disk Speed utility, which is designed to simulate the kinds of files we work with (image sequences, one per frame, at various resolutions and bit depths). For many years this system ran fine, under Windows 7.

Last year we upgraded the scanner, and with that came a new PC. It runs Win10. When we got the system, I set up a new RAID, because we kept the old PC around since there were many jobs on that PC's array. The new system uses the Highpoint card. At first I used an LSI card, but it was overheating due to the dual GPUs in the new PC. The Highpoint is much smaller and simpler, and doesn't overheat. In any case, we had an issue with some drives where the scanner would get ahead of the speed at which the disks could write. Normally the scanner software understands this, and stops. Instead, the operating system's memory usage (according to the Task Manager) starts to shoot up when the disk array is saturated. It gets to 85%, the software stops scanning, and then we have to wait a long time for the data to be written from system memory to the drive.

The scanner starts to capture again, but now, the max speed the disks are writing at is much lower - under 100MB/s - so the only solution is to slow the scanner *way* down to reduce the throughput significantly. We thought the problem was with the drives. When the scanner was upgraded, I purchased some Seagate Terascale disks, which I'd never used before. The scanner manufacturer and I assumed that the problem was that the big caches in these disks were filling up, and that the actual native write speed is probably pretty low, which accounted for the fluctuations we were seeing.

Fast forward to a couple weeks ago, and one of those disks failed. I decided to use the opportunity to replace them all with faster disks (8x Toshiba N300 NAS drives, which we've used successfully in other arrays). Everything seemed good. I was able to get about 1350MB/s on my tests, and when scanning the problem went away.

Then I started having the same issue as described above, last week. I found that sometimes rebooting the PC fixed the problem, but it eventually came back. Today, it happened immediately after a reboot. I'm doing a scan that only requires about 450MB/s throughput, which isn't all that much, and this array should easily handle that. But it's not.
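One way to narrow this down outside the scanner software is to time per-file writes directly on the array. The following is only a minimal sketch, not the AJA utility or anything the scanner vendor provides: the function name `time_frame_writes`, the file naming, and the demo sizes are all made up for illustration. It writes fixed-size files one at a time, forces each one to the device with `os.fsync`, and reports MB/s per file, so a sudden drop in write speed shows up as a number rather than as memory growth in Task Manager.

```python
import os
import tempfile
import time


def time_frame_writes(target_dir, frame_bytes, frame_count):
    """Write frame_count files of frame_bytes each and return MB/s per file.

    Hypothetical helper for illustration; point target_dir at the array
    under test and pick frame_bytes to match the real frame size.
    """
    payload = os.urandom(frame_bytes)  # incompressible data, like image frames
    rates = []
    for i in range(frame_count):
        path = os.path.join(target_dir, f"frame_{i:06d}.bin")
        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
        # Guard against a zero interval on very small test files.
        rates.append(frame_bytes / 1e6 / max(elapsed, 1e-9))
    return rates


if __name__ == "__main__":
    # Demo run against a temporary directory with small 8 MiB "frames".
    with tempfile.TemporaryDirectory() as demo_dir:
        rates = time_frame_writes(demo_dir, 8 * 1024 * 1024, 20)
        mean_rate = sum(rates) / len(rates)
        print(f"min {min(rates):.0f} MB/s, mean {mean_rate:.0f} MB/s")
```

Because each file is synced before the next one starts, the numbers will be lower than a cached benchmark, but they should reflect what the array sustains rather than what system memory absorbs.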

So the question now is - is this a problem with the RAID card? With Windows 10?

We have 8-disk RAIDs all over the office that we pound on daily, that don't show this issue, but most of our machines are either Windows 7, MacPro, or Linux boxes. We only have a couple Win10 machines (which I absolutely hate, for a variety of reasons).

Any ideas?
 

JSchuricht

Active Member
Apr 4, 2011
198
74
28
The RocketRaid 2720 is not a great card for calculating parity. I would try to go RAID 10 if possible; that would give the card a chance of keeping up. Alternatively, you could try going back to the LSI card and stick a fan on it.
 

Lost-Benji

Member
Jan 21, 2013
424
23
18
The arse end of the planet
Highpoint RocketRaid 2720 is NOT a RAID card, it's a glorified HBA with software RAID implemented over it. There is a reason these turds are avoided. 8-disk RAID-5 is borderline asking for issues. It can only handle a single bad drive before you're in the poop. RAID-6 would be a better option for Space Vs Capacity. As to the comment about 8-disk RAID-5s all over the office, that just suggests bad networking and IT implementation. Use a decent network and a decent file server with either Hardware RAID or ZFS and a backup target of the same size.
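For reference, the trade-off between the layouts mentioned in this thread (RAID 5, RAID 6, RAID 10) on 8 disks can be worked out with simple arithmetic. This is a rough sketch using the standard textbook figures only; the helper name `raid_summary` is hypothetical, and capacity is expressed in "disks' worth" because the drive size is not stated in the thread.

```python
def raid_summary(level, disks, disk_size=1.0):
    """Return (usable_capacity, guaranteed_failures_tolerated).

    Hypothetical helper: disk_size defaults to 1.0 so the result reads as
    "number of disks' worth of space".
    """
    if level == "RAID 5":
        data_disks, tolerated = disks - 1, 1   # one disk's worth of parity
    elif level == "RAID 6":
        data_disks, tolerated = disks - 2, 2   # two disks' worth of parity
    elif level == "RAID 10":
        # Mirrored pairs: half the space. Only one failure is guaranteed
        # survivable; more survive if they land in different pairs.
        data_disks, tolerated = disks // 2, 1
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_disks * disk_size, tolerated


for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, tolerated = raid_summary(level, 8)
    print(f"{level}: {usable:.0f} disks usable, survives {tolerated} failure(s)")
```

This says nothing about write speed, which depends on how the controller or driver computes parity.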
 

Perry

Highpoint RocketRaid 2720 is NOT a RAID card, it's a glorified HBA with software RAID implemented over it. There is a reason these turds are avoided. 8-disk RAID-5 is borderline asking for issues. It can only handle a single bad drive before you're in the poop. RAID-6 would be a better option for Space Vs Capacity. As to the comment about 8-disk RAID-5s all over the office, that just suggests bad networking and IT implementation. Use a decent network and a decent file server with either Hardware RAID or ZFS and a backup target of the same size.
Thanks for your condescension. Very helpful.

We've been using cheap RocketRaids for years in multiple machines. I'm well aware that they're not real RAID cards, but they work for what we need them to do, and they've been super reliable and cheap. When I say years, I mean 10-15 years across multiple models. This particular card worked fine in a previous system (Win 7), so maybe the Win 10 implementation isn't good - we don't have too many Win10 machines. I'm in the midst of reflashing an old LSI card back to its normal state so I can try that out.

FWIW, we have a 256 TB SAN on a 40GbE backbone that can read/write at over 2GB/second. But we work with files that require north of 1GB/second on a daily basis, all day long on multiple machines. For performance reasons we run local RAIDs for caching files within the applications we use, while reading and writing master files to the SAN when necessary. This is both cost effective and fast and doesn't bog down our network while we're working. It's actually a pretty well thought out setup that works well. It's also the setup that multiple software vendors recommend (SAN-based source and render, local caches).

I prefer RAID 5 because drives fail and it buys some time.
 

Perry

Sounds like a cooling/airflow problem to me.
With the old LSI in the previous system it was definitely a cooling issue. The new machine has more airflow, with both intake and exhaust fans (in a roomy Fractal Design case, with fan intake from the bottom, passive vents from the front, and exhaust fans at the top and back). The LSI might be OK in this one, because the other PCIe cards are different/smaller since the upgrade and I don't think they run as hot.

There has been no indication that the Highpoint card or any other components are overheating, and the machine is on 24/7.
 

Perry

The RocketRaid 2720 is not a great card for calculating parity. I would try to go RAID 10 if possible; that would give the card a chance of keeping up. Alternatively, you could try going back to the LSI card and stick a fan on it.
I have a 9240-8i card that I had tried to run FreeNAS on a few years ago. I hated FreeNAS, and we instead built a SAN running on RAID 6 pools with different RAID controllers. So there are two of these cards on a shelf. One isn't recognized by the system, not sure why. The other is, and I just re-flashed it from IT mode back to the default 9240/M1015 firmware using the instructions here, but it now hangs on boot, so I'll need to figure that out. It gets as far as showing me the firmware version, but doesn't proceed beyond that point, even after 15 minutes. Hopefully I can figure that out tomorrow with a clear head.
 

Lost-Benji

Thanks for your condescension. Very helpful.
You asked the question, I gave my 2-cents, I am not here for warm and fuzzy feelings.....

We've been using cheap RocketRaids for years in multiple machines. I'm well aware that they're not real RAID cards, but they work for what we need them to do, and they've been super reliable and cheap. When I say years, I mean 10-15 years across multiple models. This particular card worked fine in a previous system (Win 7), so maybe the Win 10 implementation isn't good - we don't have too many Win10 machines. I'm in the midst of reflashing an old LSI card back to its normal state so I can try that out.

FWIW, we have a 256 TB SAN on a 40GbE backbone that can read/write at over 2GB/second. But we work with files that require north of 1GB/second on a daily basis, all day long on multiple machines. For performance reasons we run local RAIDs for caching files within the applications we use, while reading and writing master files to the SAN when necessary. This is both cost effective and fast and doesn't bog down our network while we're working. It's actually a pretty well thought out setup that works well. It's also the setup that multiple software vendors recommend (SAN-based source and render, local caches).

I prefer RAID 5 because drives fail and it buys some time.
So, a few things. By your own admission, you went cheap to start with, then spent heavier on a SAN that, looking at the maths, has either a choked network or controller/array limits at only 2GB/sec. While this is happening, you still have cheap, software-based cards that were around a long time ago and stand an extremely good chance of being turds on Windblows 10.

I suspect, though, that you (your client or users of the machines) are pulling job files from the SAN to local machines (likely on either a 1Gbps or 10Gbps network), then working on them and sending the output back?

If the SAN is a little slow with the multiple clients' I/O and you need local caching of media, use HW RAID cards and good drives, and move away from parity if you need local I/O in both directions. Highpoint gets no love from me, sorry.
 

Perry

You asked the question, I gave my 2-cents, I am not here for warm and fuzzy feelings.....
Yes, but unfortunately you weren't being very helpful, so why even bother replying?

So, a few things. By your own admission, you went cheap to start with, then spent heavier on a SAN that, looking at the maths, has either a choked network or controller/array limits at only 2GB/sec.
No, you misunderstand. The SAN has 6 pools of 10 disks, each of which is capable of doing more than 2GB/sec, on cheap disks. We have an office with typically 3 people in it at a time at most, usually two or three machines working off the SAN. There's more than enough bandwidth there. The manufacturers of our software all recommend the same basic layout and that's to use the SAN for source and render files, and local arrays for caching, so as not to bog down the network. We're talking about image sequences of hundreds of thousands of 140MB files, and it's a massive drain on any system, including large purpose-built systems for our industry that cost $75k or more.
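To put the file sizes in perspective, sustained throughput for an image sequence is just frame size times frame rate. The sketch below is only a back-of-the-envelope illustration: the helper names are hypothetical, the 8 frames per second figure is an assumed example rather than a number from our workflow, and applying the 140MB frame size to the 450MB/s scan from the first post is likewise an assumption.

```python
def required_mb_per_s(frame_mb, frames_per_second):
    """Sustained throughput needed to keep up with a frame stream."""
    return frame_mb * frames_per_second


def sustainable_fps(throughput_mb_per_s, frame_mb):
    """Frame rate a given sustained throughput can support."""
    return throughput_mb_per_s / frame_mb


# Assumed example: 140 MB frames at a hypothetical 8 frames per second.
print(required_mb_per_s(140, 8), "MB/s needed")

# Assumed example: how many 140 MB frames per second fit in 450 MB/s.
print(round(sustainable_fps(450, 140), 2), "frames/s")
```

At that frame size, even single-digit frame rates land north of 1GB/second, which is why the local arrays matter.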

I suspect, though, that you (your client or users of the machines) are pulling job files from the SAN to local machines (likely on either a 1Gbps or 10Gbps network), then working on them and sending the output back?

You know what they say about assumptions...

That's not at all what we're doing. I've now explained twice how we use the system. Also, it's a 40GbE network, as explained above.

Highpoint gets no love from me, sorry.
That's fine. People don't like them, but the reality is that we have been using them for years and they're often faster than other RAID controllers I've used (LSI/Atto/Adaptec and others). In fact, I put an LSI controller into the machine yesterday and was getting speeds about 150MB/s slower. That's enough for that machine, but slower than the cheap Highpoint card. Maybe Highpoint, or this card, is not good with Windows 10, which would be a shame because they worked great for us for about a decade on Windows 7. We've had both LSI and Adaptec cards fail on us in the past few years, and so far no hardware issues with the Highpoints.

This seems to be a Windows 10 thing, though, since this card worked fine under Windows 7. Tomorrow I will probably put it in the Windows 7 machine I pulled the LSI card out of yesterday and see if I can make it fail there. My bet is that it'll be fine.
 