Optimal RAID setup for older Dell R515/PERC H700/SAS Drives


Perry

Member
Sep 22, 2016
66
11
8
52
Background: We are a film scanning and restoration service working primarily with archival film. We've been using a homegrown RAID with SATA drives for years as the RAID pool for our TigerStore SAN. I recently acquired eight Dell R515 servers from a friend's company that was decommissioning them. Each has a PERC H700 controller and 12x 4TB SAS 6Gbps 7200RPM drives. Our goal is to migrate the older SATA RAID boxes over to these Dell servers. So I'm starting with one, running CentOS, which serves the volume up via iSCSI to our TigerStore server over our 40GbE network (which in turn shares it with all the workstations that have the SAN drivers installed).

In our current setup we use RAID 6 volumes. This has been reliable and reasonably fast. RAID 5 was faster, but the extra level of redundancy with RAID 6 made me more comfortable because, of course, drives fail. I would like to try to get better performance, so I'm looking at RAID 50, since that should be similar to RAID 6 in terms of safety, but with faster speeds. Right now we can do about 1GB/s reads and writes (usually faster on reads) using the SATA RAIDs. I am hoping the SAS drives in the Dells will eke out a bit more speed, in part because the RAID pools are larger (12 drives vs 8). The sustained speed per disk is similar to the drives we have now.

The kinds of files we work with vary, but are all pretty big. They come in two forms:

1) Quicktime files: these are single movie files, and range in size from a few gigabytes to multiple terabytes (yes, per file). A typical file is 250-500GB in size. Most of the files we work with are Quicktime files.
2) Image sequences: these are folders containing sequentially numbered images, individual image sizes range from 12MB to 200MB, and there can be tens of thousands of them in a folder.

What I would like to get a handle on is the following:

- What are the optimal settings in the H700 controller for a middle ground that deals well with both formats? We need good read and write speeds, as these files are often played back in real time (sometimes faster than real time, such as when copying or rendering). I'm wondering specifically about read and write policies, as well as stripe size, but if there's other stuff to consider, I'm open.
- Is RAID-50 a good choice here? It seems to strike a good balance between speed, reliability, and capacity (we'll get about 30TB from each server).

I decided to just take a stab at it on Monday and set up a 12-drive pool on the controller as RAID-50 (background initialization, because I saw no other option). More than 55 hours later, it's still only 75% done. The RAID is accessible, but I would imagine performance is degraded while the background init is going on. That said, the performance is really, really bad. Very laggy (gaps of many seconds between files when copying a folder full of them) and with write speeds in the range of what I'd expect from a USB 2 external drive: bursts around 500MB/s for a moment, then settling down to a 30MB/s or so average (according to the Windows file copy dialog's stats). Is this because of the background init? Happy to wait until it's done, but I would have expected more given the speed of the drives installed, even with an init in progress.
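
For reference, the controller's current policies and the background-init progress can also be checked from the CentOS side. Here's a minimal sketch, assuming Dell OMSA (omreport) is installed, the array lives on controller 0, and the usual field labels apply (they vary a bit between OMSA versions):

```python
#!/usr/bin/env python3
"""Minimal sketch: dump the PERC H700 virtual disk policies and init progress.

Assumes Dell OpenManage (omreport) is installed and the array is on
controller 0; the exact field labels differ between OMSA versions.
"""
import subprocess

FIELDS = ("State", "Progress", "Read Policy", "Write Policy", "Stripe Element Size")

out = subprocess.run(
    ["omreport", "storage", "vdisk", "controller=0"],
    check=True, capture_output=True, text=True,
).stdout

# omreport prints "Label : Value" pairs, one per line, per virtual disk.
for line in out.splitlines():
    if any(line.strip().startswith(field) for field in FIELDS):
        print(line.strip())
```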

Thanks,

-perry
 
  • Like
Reactions: Samir

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
If this is about film archival, I would recommend not using hardware RAID at all, but ZFS instead.

If you need to use RAID, I would recommend the following:
RAID 1 for the OS (2x SSDs)

For the data, 10x 4TB SAS disks:

RAID 5 with the 4TB disks would give you 36TB, roughly 9x read speed, 1x write speed, and tolerance for one disk failure. RAID 6 could be better, as it offers better data protection during a rebuild than RAID 5.

RAID 50 won't give you much in terms of performance (it may actually reduce it; in the real world you can expect around 7-8x read speed and 0.7-1x write speed).

RAID 10 would deliver 20TB with roughly 10x read and 5x write speed, and can potentially survive more than one disk failure - highly dependent on which drive fails after the first one.
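
To put rough numbers on those options, a small back-of-the-envelope sketch (assuming the 10 data-bay drives suggested above at 4TB each; purely theoretical, ignoring controller and filesystem overhead):

```python
#!/usr/bin/env python3
"""Usable capacity and fault tolerance for the layouts discussed above.

Assumes 10 x 4TB data drives (two bays reserved for OS SSDs); theoretical only.
"""
DRIVES = 10
SIZE_TB = 4

# name: (data drives, failures always survived, failures survived at best)
layouts = {
    "RAID 5":              (DRIVES - 1, 1, 1),
    "RAID 6":              (DRIVES - 2, 2, 2),
    "RAID 50 (2x5 spans)": (DRIVES - 2, 1, 2),  # one failure tolerated per span
    "RAID 10 (5 mirrors)": (DRIVES // 2, 1, 5), # one failure tolerated per mirror
}

for name, (data, worst, best) in layouts.items():
    print(f"{name:21s} usable ~{data * SIZE_TB:>3} TB, "
          f"survives {worst} failure(s) guaranteed, up to {best} with luck")
```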


Notes:
8TB SAS HDDs can be bought on eBay for around $58, so it makes little sense not to use them.
IT-mode controllers are a much better option if you choose ZFS.
(With 10x 4TB in RAIDZ1 it will deliver more than 34TB of disk space; depending on compression and your files, that could end up as more than 40TB usable, with much higher performance.)
Don't run the OS on the same RAID as your data disks.
The backplane has 4 subchannels, so you should use the first 3 slots for the system disks; the last 3 disk slots will take much longer to access/refresh.
Disable caching on the system SSDs to increase performance of your SAS HDDs.
 

Perry

Member
Sep 22, 2016
66
11
8
52
Thanks. We are a service, scanning and restoring film. We're not storing anything long term: things are on our machines for a few weeks, then cleared off when the job is done and delivered to the client, but some stuff lingers for months. So volatile setups like RAID 0 are out, except in specific systems (where we typically have a local RAID 0 when needed - like for caching large files in our restoration system - things that can easily be regenerated if need be and need to be on a very fast disk).

I have looked at ZFS and don't have the time or energy to really get into it. I've been using RAID for about 30 years, so it's something I'm comfortable with. We have also invested in a SAN metadata server, which works well for us. We have a setup that works; I'm just looking to swap out the hardware we're using for something of higher quality than the slightly janky setup we have. Our current setup has been in regular use for some time, but we're outgrowing the storage capacity we have. The Dells will get us a fair bit more space. I'm aware of the SAS drive options and yes, we will eventually upgrade those, but for now I'm sticking with what's in the machines and just want to maximize performance.

RAID 10 makes me nervous - we have had drive failures in the past and weren't alerted by the system in time to replace the drive before there was another failure, and we lost data. Thus RAID 6 in the current setup, which buys us enough time to deal with a recovery. Plus, 20TB for a RAID 10 isn't big enough - we need more room and don't want to invest in new drives at this time.

Hmm.
 
  • Like
Reactions: Samir

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
RAID 10 is the safest in terms of disk failure (in this setup it's basically 5 mirrored pairs striped together).
RAID 6 provides better consistency during a rebuild, but is likely slower than RAID 5.

Keep in mind, RAID is not a backup.
Most often RAID doesn't protect your data consistency at all either (ZFS does more in this case); its rebuild process only works per block, not per file.

It's also worth investing in 4Kn disks instead. With normal 512e you are losing something like 12% of disk space to sector inefficiency; with sectors correctly sized and formatted to 4K you lose only around 3%. (In Windows' case, if you have 100GB of data to allocate, on 512 it will take around 112GB of logical space on disk, while on 4Kn it will take around 103GB. Windows doesn't always read the partition size correctly after switching to the larger blocks - overall it's messy, and compression can really skew those numbers, so the figures Windows shows aren't always right.)
 
  • Love
Reactions: Samir

Samir

Post Liker and Deal Hunter Extraordinaire!
Jul 21, 2017
3,257
1,447
113
49
HSV and SFO
I've been working with RAID since 1995, and if I were in your position, RAID 0+1 would feel safer to me than RAID 6, as the second drive failure in an R6 scenario is likely to happen during a rebuild, simply because of the sizes we're dealing with here. Yes, R6 can sustain any 2 drives failing, but as long as it isn't the second drive of a mirrored pair, a 0+1 setup can literally lose half the drives and still not completely fail. At a 3-drive failure R6 is gone, while 0+1 can survive. Plus, since speed is a concern, 0+1 is faster too.
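
To put a number on the "which drive fails second" argument, here's a quick Monte Carlo sketch comparing RAID 6 against 6 mirrored pairs across 12 drives (striped mirrors; pure combinatorics, ignoring rebuild windows and unrecoverable read errors):

```python
#!/usr/bin/env python3
"""Monte Carlo sketch: survival odds when k drives out of 12 die together.

RAID 6 survives any 2 losses and no more; 6 mirrored pairs survive as long
as no pair loses both members. Ignores rebuild time, URE rates, etc.
"""
import random

DRIVES, PAIRS, TRIALS = 12, 6, 200_000

def mirrors_survive(failed):
    # Drives 2p and 2p+1 form pair p; the array dies if any pair is fully lost.
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(PAIRS))

for k in (2, 3):
    ok = sum(mirrors_survive(set(random.sample(range(DRIVES), k)))
             for _ in range(TRIALS))
    raid6 = "always survives" if k <= 2 else "always fails"
    print(f"{k} simultaneous failures: RAID 6 {raid6}; "
          f"mirrored pairs survive ~{100 * ok / TRIALS:.1f}% of the time")
```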

You could also do some more exotic stuff, like running Xpenology with Synology's Hybrid RAID and then sharing that volume via iSCSI. This would probably give you pretty good performance as well as safety, since the RAID controller at that point isn't being used and all the system RAM can be used for caching - even with 8 slots you can still load up to 256GB on the cheap, since DDR3 is well under $1/GB (and you could probably get it far under 50 cents/GB if you purchase a large lot for all the servers).

I too agree that zfs is the way to go on something like this, but I've also never worked with it so I completely understand your hesitation to try it.
 
  • Like
Reactions: DavidWJohnston

Perry

Member
Sep 22, 2016
66
11
8
52
The background initialization on the array finished overnight, so I was able to do more testing this morning. Speed is an order of magnitude better, but still too slow. With Write Back and caching enabled, I'm able to get about 350MB/s writes (according to the Windows 10 file copy progress), with some spikes at around 500MB/s. This is roughly half of what we need.

Digging deeper, it looks like the SAS drives in these have slightly lower sustained speeds than the SATA drives we've been using.

I have the drive formatted with a 512k strip size.

On another identical machine (which will take all morning to set up, probably), I am formatting the array as RAID6, same strip size, fast init (found the setting for that), so hopefully I can compare speeds this afternoon.
 
  • Like
Reactions: Samir

Samir

Post Liker and Deal Hunter Extraordinaire!
Jul 21, 2017
3,257
1,447
113
49
HSV and SFO
Digging deeper, it looks like the SAS drives in these have slightly lower sustained speeds than the SATA drives we've been using.
Doesn't surprise me, as the newest 7200rpm SATA/SAS drives keep improving transfer rates over the older ones. But still, with 12 drives that write performance seems slow (if it's all cached), as my HP DL380 G5 with a P400 and just 4x 600GB 10k SAS drives hits over 1GB/sec when cached.

The other thing I just thought of that you could try is TrueNAS, as that's fairly robust and has good performance.
 

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
That seems about right for RAID 6 write-back with a 512MB cache on a SAS2 backplane (RAID 6 does not give him any write performance benefit).
This box is really dated. Increasing the cache on the RAID controller may help some, or using a newer controller with more cache.

10-15k SAS disks more often average out at around 250MB/s r/w on their own, but they are limited in capacity and often more expensive than consumer-grade SSDs.



Just a note: an R720xd (12-bay) can be bought from eBay for something like 250-400 USD, and it's a much more performant system (potentially under 1k with 12x 4TB disks, too). The R515/R710 backplane and PERC H700 are also limited to a certain disk size, if I recall... I don't think you can run disks above 6TB on it.
 

Perry

Member
Sep 22, 2016
66
11
8
52
That seems about right for RAID 6 write-back with a 512MB cache on a SAS2 backplane (RAID 6 does not give him any write performance benefit).
This box is really dated. Increasing the cache on the RAID controller may help some, or using a newer controller with more cache.
Well, we've had more than double that performance in our homegrown setup for a few years now. The motherboard and controller card in the system I built are from the same era as the Dells - around 2014. The drives are newer and larger, but they are consumer-grade 8TB 5400RPM WD Reds (shucked from EasyStore externals) in RAID 6 pools of only 8 drives. Yet those arrays are more than twice as fast as 7200RPM enterprise SAS drives from just a couple of years earlier? I find it hard to believe that the age of the hardware is the problem here. This feels more like a configuration thing to me.

I set up two more Dells with identical configurations except for the strip size: one is 512k and one is 1M. Both are RAID 6. While the system will allow a fast init, once it's idle it starts a background initialization, so I probably won't be able to properly test this until Monday. I'll report back what I find.
 

Perry

Member
Sep 22, 2016
66
11
8
52
I was able to briefly test these two volumes today. With the 512k strip size, formatted with a 512k allocation unit size, I get decent results when writing a 30GB Quicktime file to the volume (about 1GB/s for the first half, then about 750MB/s for the rest). That's probably due to caching. The source drive is an NVMe RAID 0 in our restoration system, via 40GbE.

With a 1M strip size, formatted with a 1M allocation unit size, I get a much more consistent speed - 1GB/s until about 80% of the file, then 750MB/s for the rest. All in all, not bad.

I think there's some caching happening here and I'm going to need to do tests with some bigger files to see.

I also tested image sequences - about 55GB of 40MB image files - and those copied very inconsistently to the 512k format. It was hard to get an average because it was all over the map (I'd guess about 250MB/s). They copied much more consistently, at about 350MB/s, to the RAID with the 1M formatting.

Again, I think I need to test this further. I'd also like to reformat the RAID-50 with a 512k allocation unit size to match that RAID's strip size, but I can't until next week because we have some files on there we need to access early in the week.
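
One thing worth keeping in mind while comparing these: the full-stripe width is the strip (stripe element) size times the number of data drives, and sequential writes only become cheap full-stripe writes once an I/O spans the whole stripe. A tiny sketch, assuming a 12-drive RAID 6 like the test arrays above:

```python
#!/usr/bin/env python3
"""Full-stripe width for the two test arrays (assumes 12-drive RAID 6)."""

DATA_DRIVES = 12 - 2  # RAID 6 spends two drives' worth of capacity on parity

for strip_kb in (512, 1024):
    full_stripe_mb = strip_kb * DATA_DRIVES / 1024
    print(f"{strip_kb:4d} KB strip x {DATA_DRIVES} data drives = "
          f"{full_stripe_mb:.0f} MB full stripe")
```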

Just to make clear what the setup is:

-RAIDs are created in the BIOS or the Dell OMSA web interface
-Read policy is Adaptive, Write Back is on
-Partitions are made in gparted (GPT) and left unformatted
-Partitions are served via iSCSI to the TigerStore server (a rough sketch of this step is below)
-TigerStore runs on Windows, so the mounted iSCSI targets are formatted as NTFS (the RAID 50 got the default AUS)
-The TigerStore clients see the shared volumes as if they were locally mounted, and in a format that looks like the native filesystem, so performance is roughly equivalent across platforms - Mac, Windows, Linux

All connections tested are on a 40GbE network - a mix of copper DACs and fiber, soon to be all fiber.
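
For anyone curious about the Linux side of that export step, here's a rough sketch using the LIO target stack (targetcli) on CentOS; the device path and both IQNs are placeholders for whatever your environment actually uses:

```python
#!/usr/bin/env python3
"""Sketch: export a raw (unformatted) RAID partition over iSCSI via targetcli.

Everything below is a placeholder -- substitute the real partition, the
target IQN you want, and the TigerStore server's initiator IQN.
"""
import subprocess

DEVICE = "/dev/sdb1"                                    # GPT partition, left unformatted
BACKSTORE = "r515_raid"                                 # arbitrary backstore name
TARGET_IQN = "iqn.2024-01.lan.r515:raid"                # hypothetical target IQN
INITIATOR_IQN = "iqn.1991-05.com.microsoft:tigerstore"  # hypothetical Windows initiator IQN

def targetcli(*args):
    """Run one non-interactive targetcli command, stopping on any error."""
    subprocess.run(["targetcli", *args], check=True)

targetcli("/backstores/block", "create", f"name={BACKSTORE}", f"dev={DEVICE}")
targetcli("/iscsi", "create", TARGET_IQN)
targetcli(f"/iscsi/{TARGET_IQN}/tpg1/luns", "create", f"/backstores/block/{BACKSTORE}")
targetcli(f"/iscsi/{TARGET_IQN}/tpg1/acls", "create", INITIATOR_IQN)
targetcli("saveconfig")
```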
 

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
The best way to kill normal HDD performance is having multiple people writing to it or trying to read from it at the same time.
 

Perry

Member
Sep 22, 2016
66
11
8
52
The best way to kill normal HDD performance is having multiple people writing to it or trying to read from it at the same time.
Huh? What does that have to do with any of this testing?

For what it's worth, I'm not only well aware of this, but we have the SAN structured specifically to avoid this situation. Rarely is one person accessing a given RAID at the same time as someone else.

All the testing I've done above was on RAIDs that only I have access to, for now.
 

Moopere

New Member
Mar 19, 2023
26
3
3
Mmm I'd be cautious of reading too much into the results that taskmgr or Windows Explorer give you. Simple copies of huge files in the normal Windows GUI way will usually invoke massive caching and the system will generally try to use as much free RAM as is available which will skew your benchmarking attempts.

Any chance you could use a tool like Crystal Disk Mark? It has its own set of accuracy problems of course but I've found it to be really useful as a pretty repeatable rule-of-thumb when fooling around with RAID cards and systems trying to work out what you're trying to work out now.

I'm really unclear in my mind as to how the Dell R515's are connected to your SAN. Are you doing your RAID builds and tests on the individual R515 servers with DAS disks? Or are you connecting them to the SAN in some way that I can't (yet) get my head around?
 

Perry

Member
Sep 22, 2016
66
11
8
52
Any chance you could use a tool like Crystal Disk Mark? It has its own set of accuracy problems of course but I've found it to be really useful as a pretty repeatable rule-of-thumb when fooling around with RAID cards and systems trying to work out what you're trying to work out now.
Maybe. But I don't really put too much stock in benchmarks. In our day to day we use Windows and Mac built-in file copying tools so that's really all we care about - that we get enough speed using the tools we use on a daily basis. Benchmarking can be interesting and certainly useful in some situations, but doesn't always apply to real-world scenarios. I can't tell you how many people I know who built custom workstations based on benchmark results only to be disappointed with the actual results, and then to find that a different combination of less expensive hardware actually gets better results for their specific use case (I'm talking about things like CPU vs GPU for processing, but the idea is the same - a great system on paper isn't necessarily great in production). But sure, it'll be interesting to see what we get when I have some time to do that.

I'm really unclear in my mind as to how the Dell R515's are connected to your SAN. Are you doing your RAID builds and tests on the individual R515 servers with DAS disks? Or are you connecting them to the SAN in some way that I can't (yet) get my head around?
Each Dell R515 is running Linux. The RAID arrays are built on that machine using the PERC H700 hardware controller. In Linux we partition the RAID volumes in gparted as GPT, but we do not format them in Linux. Instead, we create an iSCSI target for that partition and serve it up to the TigerStore server. When it's first mounted on the TigerStore server, since it's unformatted, Windows will ask you to format it. From the perspective of the TigerStore machine, it's as if the RAID is direct-attached, but it's actually connected over 40GbE/iSCSI. The volume is formatted as NTFS on the TigerStore server using the standard Windows disk formatting tools, and it shows up there like any other attached disk or array.

TigerStore client workstations connect to the server via a driver installed on each workstation. The server can share specific volumes with specific workstations. For example, we'll have a couple of volumes that are primarily used by our color correction systems, and a couple that are used by the restoration system. One of our film scanners uses one. Then we have a bunch of LTO-7 and LTO-8 sized volumes that are used when preparing deliverables for clients or doing in-house project backups. On each TigerStore client, the driver makes it appear as if the mounted volume is in a native format (kind of - on the Mac it shows up as "unknown," but it behaves like a native drive, with performance like you'd get from an HFS+ volume). That is, it's not sharing an NTFS volume to all platforms, which would have severely degraded performance on the Macs. It appears to each machine as if it were a native drive, so the performance is good regardless of the kind of workstation you're using. It's basically a virtual file system. TigerStore handles file locking so that things don't get overwritten if two machines are accessing the same volume at the same time.

There are a couple reasons we settled on this setup, after years of trying a bunch of arrangements: One is that having each RAID on a separate server helps to maximize performance because you don't have contention if two people are really pounding on the same underlying array at the same time - on a single-machine setup like we have now, one underlying array might have 2-3 partitions, so if two machines are hitting different partitions on the same array, there's a performance hit. Second is that if we have a RAID failure, it's isolated to just the volumes that RAID holds. That lets the TigerStore server keep working and serving up most of the volumes, while we repair the damaged RAID, which can take days to rebuild. Basically it doesn't shut us down for days on end, like a massive DAS RAID on the TigerStore that needs repair could. The third reason was cost. Capacity-locked pre-built systems from companies like Facilis or Quantum are massively expensive and proprietary, with very expensive ongoing maintenance contracts. We were able to build our own hardware backend and just use the TigerStore software, for a few thousand dollars, plus a few hundred bucks a year for support. Our setup allows for unlimited workstations and limited space (though we can upgrade the capacity - we currently have a 256TB license which is plenty for us). This lets us have lots of workstations connected to the SAN if we need. You could also have unlimited capacity and pay per-seat, but because we have a dozen or so computers that sometimes need access it made more sense to limit our space since we don't need unlimited storage.
 

Pete.S.

Member
Feb 6, 2019
56
24
8
RAID50 is less reliable than RAID6. RAID50 can't survive two disk failures if it's the wrong two drives.

I think you should do the math on the theoretical speed with your drives' sustained transfer rate and RAID config to see where you're at.

I would expect close to theoretical numbers for very large files; otherwise you have significant bottlenecks in your hardware or software.
 

Moopere

New Member
Mar 19, 2023
26
3
3
Thanks for the clarity on how you're mounting the RAID sets on your SAN. This is really interesting. I wonder if I can bodge a test lab together using this idea - just to see if I can (smile).

On the benchmark thing: yes, I understand your view and I agree with you. In this case, though, we don't really care what the numbers are - so long as they are consistent. What we need to do is create a repeatable artificial load on the disk subsystem so that we can attempt to measure the difference in performance when manipulating the various RAID card settings.

Despite the inherent inaccuracy in any benchmark, they do provide good broad results. So if CDM gave me 800MB/s with a 256KB stripe but 1000MB/s with a 512KB stripe - and if that was repeatable - then I'd be confident that I'm better off with the 512KB stripe. Note that CDM allows you to adjust the test parameters, so you can try to approximate the type of load you're likely to be running.

I'm not all that concerned with the veracity of the 1000MB/s... just that it's faster. Also, it's got to be able to give me broadly sensible numbers so I can diagnose any actual problems.

I've spent a lot of time over the years trying to eke the best out of my RAID systems. If your testing regime isn't rigorously repeatable, it will drive you bonkers. It's a long data chain from your Windows front end all the way through the SAN and the connected servers. Latency and caching will be everywhere.

I'd start with the individual machines. Test the hell out of one of them (as they are the same) and then set the others up to match. Run a couple of benches on each once they're done to ensure there's nothing funny going on, then plug them in and diagnose the rest of the system - chances are high at this point that you'll get the throughput you'd hoped for.
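
If CDM doesn't fit the workflow, fio can give the same kind of repeatable number from either end of the chain (it runs on Linux and Windows). A minimal sketch; the test directory is a placeholder and should point at the mounted volume under test:

```python
#!/usr/bin/env python3
"""Repeatable sequential throughput check with fio (fio must be installed).

TEST_DIR is a placeholder -- point it at the volume you want to measure.
"""
import subprocess

TEST_DIR = "/mnt/raid_under_test"   # or a drive-letter path when run on Windows

jobs = {
    # Streaming write of one big file, roughly like laying down a Quicktime master.
    "seq-write": ["--rw=write", "--bs=1M", "--size=30G"],
    # Streaming read of the same kind of file.
    "seq-read": ["--rw=read", "--bs=1M", "--size=30G"],
}

for name, extra in jobs.items():
    subprocess.run(
        ["fio", f"--name={name}", f"--directory={TEST_DIR}",
         "--direct=1", "--numjobs=1", *extra],
        check=True,
    )
```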

Because your use case is so specific, you might well get some real-world benefit from a good amount of tweaking. For my part, I work mostly with VM servers, and though I've spent time fooling with RAID settings, there is almost nothing to be gained IMHO from tweaking beyond a RAID's default settings.

Talking of hoped-for speed improvements: in my view, the SATA versus SAS differences are likely to be unmeasurable... assuming both are 7200RPM. People always say that SAS drives should be faster, and drive manufacturers' specs bear that out, but there's not much in it - minimum expected IOPS for a Seagate 7200RPM SATA drive (IronWolf 6TB) is about 84, while a Seagate Exos 7E8 2TB (7200RPM) is about 90 IOPS.

I would, however, expect a little something because you're adding 4 extra spindles: that should be good for an extra 60-70MB/s minimum (RAID 6 writes).
 
  • Like
Reactions: Perry

Pete.S.

Member
Feb 6, 2019
56
24
8
Considering the age of the R515, your drives are probably from that era as well. They might be something like 4TB Seagate Constellation ES.3 SAS drives, which have a max sustained transfer rate of 175 MB/s.

The sustained transfer rate will keep decreasing until you reach the center of the drive, where you will have about 50% of the max.

12 drives in RAID 6 is 10 drives of data, so the theoretical max transfer rate is 1750MB/s at the outer tracks, down to 875MB/s at the inner ones.
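
Working that out explicitly (a sketch using the 175 MB/s outer-track figure assumed above):

```python
#!/usr/bin/env python3
"""Theoretical sequential ceiling for 12 x 4TB drives in RAID 6 (sketch)."""

DRIVES = 12
PARITY = 2                     # RAID 6
OUTER_MB_S = 175               # assumed sustained rate at the outer tracks
INNER_MB_S = OUTER_MB_S * 0.5  # roughly half of that by the inner tracks

data_drives = DRIVES - PARITY
print(f"outer tracks: ~{data_drives * OUTER_MB_S:.0f} MB/s")  # ~1750 MB/s
print(f"inner tracks: ~{data_drives * INNER_MB_S:.0f} MB/s")  # ~875 MB/s
```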
 

ano

Well-Known Member
Nov 7, 2022
634
259
63
You're probably going to see a very nice performance gain ditching the HW RAID and going ZFS.

We have redone some older HW RAID systems, just reflashing the RAID controller from IR to IT (HBA) mode, and seen huge gains, even on old v1/v2/v3/v4 Xeons and DDR3 stuff.

Recently changed a 72x 900GB 10k SAS setup from hardware RAID 10 on an X8-generation Supermicro to ZFS RAIDZ2 with several vdevs on an X10 Supermicro; it's massively faster, kinda funny really.

Same for a 24-disk 1200GB 10k SAS system already on X10-gen hardware.
 

CyklonDX

Well-Known Member
Nov 8, 2022
784
255
63
Here's a link to a ZFS storage benchmark I was doing at home some time ago. It's flawed to a point, as its limiting factor was reading files from a single NVMe; I think it should have been done from memory to get proper results. Still, it shows a very big edge over any hardware RAID with HDDs.

What it's doing:
It reads a 32GB KVM disk image (with Windows etc. on it, not zeros) and writes it out to a new location.
(The image sits on a storage cache that tops out at 1900MB/s, so that is the limit.)

The second test copies 4096 x 1M log files that differ in content but still sit on the same storage cache, capped at 1.9GB/s.
The 4096 x 1M case, though, seems very close to what you would get without cache, as I do see a lot of L2ARC misses while it runs.


Each stream represents multiple instances of this action.
(In short, say 2 people were doing the same thing at the same time, going up to 24 people, which is a significant load for just 4 disks.)
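
For anyone who wants to approximate the multi-stream part of this at home, a crude sketch (the source file and destination directory are placeholders; each "stream" copies the same source to its own destination file, and the aggregate rate is reported):

```python
#!/usr/bin/env python3
"""Crude multi-stream copy benchmark in the spirit of the test described above.

SRC and DST_DIR are placeholders; adjust STREAMS to simulate more users.
"""
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SRC = Path("/tank/test/disk-image.qcow2")   # hypothetical ~32 GB source file
DST_DIR = Path("/tank/test/copies")
STREAMS = 4

DST_DIR.mkdir(parents=True, exist_ok=True)

def one_stream(i):
    # Each "stream" is one full copy of the source image.
    shutil.copyfile(SRC, DST_DIR / f"copy_{i}")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    list(pool.map(one_stream, range(STREAMS)))
elapsed = time.perf_counter() - start

total_mb = STREAMS * SRC.stat().st_size / 1e6
print(f"{STREAMS} streams, {total_mb:.0f} MB total -> "
      f"{total_mb / elapsed:.0f} MB/s aggregate")
```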


A typical RAID would die on such a test, as most high-end controllers top out at 2GB of cache RAM.
(There are some really expensive ones that have 8+GB... but they get really expensive really fast.)
(Here, by contrast, the box in some cases had 64GB of RAM and/or NVMe as additional cache.)


Here's some interesting stuff from the server during the test (it shows a lot of CPU usage for the 32G file instances, and that almost everything is actually being loaded into the ARC and L2ARC caches).
[Screenshots: CPU usage and ARC/L2ARC cache statistics captured during the test]
 