FastPath is very much misunderstood.
1. When enabled, the controller raises its queue depth to something like 975.
2. In an ESXi VM environment, the LSI Parallel and BusLogic Parallel virtual adapters run at QD 1, LSI SAS at 16 (or 32), and PVSCSI at 32 (but tunable up to 254).
3. You must also tweak the scheduler quantum and the outstanding-request depth for ingress, and use SIOC if permitted (see the sketch after this list). The fair-sharing algorithm on ESXi is completely stupid - for instance, a VM with a latency-sensitive (Java) timer will tick at 1042/sec while a regular VM runs at 84/sec, and the sharing algorithm was based on world changes (similar to ticks). The scheduler also penalizes you for seeks that are more than 2000 sectors apart, on the assumption that you are using hard drives, which you are not. The maximum for that setting is something like 2 million, but the linear distance doesn't really count for much.
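A minimal sketch of those host-side knobs, assuming ESXi 5.5 or later (where the outstanding-request depth became a per-device setting; on older builds it is the global Disk.SchedNumReqOutstanding advanced option). The device ID naa.xxx is a placeholder - verify defaults and maximums on your own build before changing anything:

```
# Scheduler quantum: consecutive requests issued per world before switching
# (default 8, max 64)
esxcli system settings advanced set -o /Disk/SchedQuantum -i 64

# Seek-locality window in sectors (default 2000, max 2000000); on SSDs,
# raising it stops random I/O from being treated as an expensive long seek
esxcli system settings advanced set -o /Disk/SectorMaxDiff -i 2000000

# Per-device outstanding requests (DSNRO); capped by the adapter queue depth
esxcli storage core device set -d naa.xxx -O 64
```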
Just by tweaking the driver to PVSCSI at QD 64 with a disk quantum of 64, you can see immediate gains in benchmarks that use QD 64 as part of their testing, which results in heavy queue-depth performance. Without this you'll never get to QD 64, due to inherent limits built into the OS.
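On the guest side, VMware documents raising the PVSCSI queue depth through a registry value on Windows (on Linux it's the vmw_pvscsi module parameters cmd_per_lun and ring_pages). A sketch for a Windows guest - MaxQueueDepth=64 matches the QD 64 above, 254 is the documented ceiling, and a reboot is required:

```
REM Grow the PVSCSI request ring and the per-device queue depth
REG ADD HKLM\SYSTEM\CurrentControlSet\services\pvscsi\Parameters\Device /v DriverParameter /t REG_SZ /d "RequestRingPages=32,MaxQueueDepth=64"
```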
It does make you wonder whether a bare-metal OS needs the same tweaking to perform. Buffering (easy to remove, or to tune for performance) and system caching all play into the mix here.
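For comparison, a minimal sketch of the equivalent bare-metal knobs on Linux, assuming /dev/sda is the SSD in question (the elevator name varies by kernel - "noop" on older kernels, "none" on blk-mq):

```
# Per-device queue depth at the SCSI layer
echo 64 > /sys/block/sda/device/queue_depth

# Use a minimal elevator so the kernel stops reordering for seek locality
echo noop > /sys/block/sda/queue/scheduler
```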
Most people will find they cannot sustain a high queue depth for any period of time, except during massive ETL jobs.
Something I've been wondering is whether defragmentation (since TRIM is not an option) would help. Given that a large database could have 6 million extents if auto-grown (log files too), there is a natural order of extra work to manage this at both the hypervisor and VM level. A lot of people use thin provisioning and auto-grow by default - they are the SQL Server defaults.
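Before bothering with a defrag pass, it's worth putting a number on the fragmentation. Sysinternals contig has an analyze-only mode that reports fragment counts per file; a sketch, with D:\SQL as a placeholder path for the data and log files:

```
REM -a analyzes only: reports fragments per file without moving anything
contig -a D:\SQL\*.mdf
contig -a D:\SQL\*.ldf
```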
I was only able to reach QD 1 a few times without altering my server: optimize for ad-hoc queries, forced parameterization, MAXDOP 1 (per-query MAXDOP 4), lock pages in RAM with 32 GB max server memory, and enabling the Resource Governor with no classifier script, allowing 49% of RAM per query (!! reduces tempdb usage by 80% for me !!).
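Most of those settings are scriptable; a sketch in T-SQL via sqlcmd, where 32768 MB and 49% mirror the numbers above and MyDb is a placeholder (lock pages in memory itself is granted through Windows local security policy, not T-SQL). The bigger per-query memory grant is what keeps sorts and hashes out of tempdb:

```
REM Instance-level knobs
sqlcmd -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'optimize for ad hoc workloads', 1; EXEC sp_configure 'max degree of parallelism', 1; EXEC sp_configure 'max server memory (MB)', 32768; RECONFIGURE;"
REM Forced parameterization is per database
sqlcmd -Q "ALTER DATABASE [MyDb] SET PARAMETERIZATION FORCED;"
REM Allow up to 49% of query memory per request
sqlcmd -Q "ALTER WORKLOAD GROUP [default] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 49); ALTER RESOURCE GOVERNOR RECONFIGURE;"
```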
I feel defragmentation must be considered! If you are dealing with shared worlds, the work to seek across 100 fragments could cost you a couple of context switches, and because of its [lack of consideration] for the application/SSD FastPath case, ESXi will not let you stretch your legs out.
I'm going to run some benchmarks now that I have the LSI 2308 in the DL360e and see if it performs any differently (better). The DL360e is a cheap server: it comes with a B120i SATA RAID controller on the motherboard [optional 512MB FBWC] and a B320i RAID controller on the riser [the LSI 2308 uses the same cache, so I think only one can be active]. The riser has dual SAS connectors, while the B120i on the motherboard has one SAS connector with only 6 SATA ports enabled. The B320i has both ports enabled for 8 drives, but you have to install the KEY to allow SAS; the key comes with the server if you buy it with the LSI riser board. Lastly, you can throw in a P420/1GB FBWC controller, but like most full-featured RAID controllers, the extra junk adds overhead.
The P420 only performs with SSDs when the cache is set 100% write, 0% read. If you try to disable acceleration entirely, it sucks. Benchmark testing of this is inadequate. In theory, if you use read-ahead you might increase throughput (remember - defragment the drive!) at the cost of latency, but in essence you are forcing a high QD, and until you reach QD 32 per drive in your RAID you are not being penalized much.
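The cache split and per-LD acceleration are scriptable through HP's hpssacli (later renamed ssacli); a sketch, assuming the controller sits in slot 0 and logical drive 1 is the SSD array:

```
# 0% read / 100% write, per the results above
hpssacli ctrl slot=0 modify cacheratio=0/100

# Toggle the array accelerator (controller caching) for one logical drive
hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable
```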
I don't have the FBWC for the B120i/B320i so I can't test that.
Also, HP authenticates drives: if you choose to use their system of status lights, any drive that doesn't match HP's signature gets booted. There are ways around this - some folks sell pir8 sleds now - but IMO if you are going to go that route, just live without the drive lights, label your drives with their serial/WWN, and rely on agents to tell you what has failed.
HP is rather smart in that, in simple mode, it defaults to picking RAID 1+0 pairs that sit on separate cables and are optimal to the physical layout, for both horizontal and vertical RAID. LSI, not so much: you have to force the drives onto separate cable pairs. There is something to this, as I had a double-drive failure on a RAID-10 because I left drives 0/1 as one arm of the RAID-10 span; I should have mixed them up. I should also have benchmarked whether it is faster to have all drives split evenly across the arms.
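If you'd rather not trust simple mode, the pairing can be made explicit on the HP side too; a hedged sketch with hpssacli, where the port:box:bay IDs are placeholders - the intent is that each mirror pair spans the 1I and 2I cables:

```
# RAID 1+0 with mirror pairs deliberately split across the two ports
hpssacli ctrl slot=0 create type=ld drives=1I:1:1,2I:1:5,1I:1:2,2I:1:6 raid=1+0
```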
LSI recommends RAID-0 only for FastPath performance, which is really unacceptable given that their controllers love to reset with the 840/840 PRO (unsupported, remember) and drop drives like flies.
You can tell you're using an LSI controller with HP when it has "SMARTer" options: drop drives on regular failure, or drop drives on "SMART error". I've tried SMARTer on this B320i.
It will be interesting to compare the B120i to the B320i and see whether the on-motherboard Intel C600 chipset SATA is faster than the LSI.
If you put the B320i into IT mode, I suspect it will give its big brother, the P420/1GB FBWC, a serious run for its money.
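For the record, crossflashing an LSI 2308 to IT firmware is normally done with LSI's sas2flash utility from a DOS/EFI shell; a sketch only - the firmware image name is a placeholder, and erasing flash on a vendor-branded board like the B320i is strictly at your own risk:

```
# List adapters, erase the existing image, then flash IT firmware + boot ROM
sas2flash -listall
sas2flash -o -e 6
sas2flash -o -f 2308it.bin -b mptsas2.rom
```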
This DL360e was the cheapest model out there. I think it was like $999 with a simple E5-2403 quad core [no frills], with the B320i and SAS enabler key, real rails, and no front video port dongle [grr].