SuperMicro Performance hell - CSE-847E1C-R1K28JBOD

Chrisbrns

chrisbrns.com
Nov 13, 2011
29
5
3
Central Florida
www.chrisbrns.com
I'm going crazy over here troubleshooting a performance issue that I can't seem to iron out. Wondering if I am overlooking something simple.


Configurations:


Hard Drives: 24x Seagate 10TB 7.2K SAS 12Gbps – Model: ST10000NM0096 (revision: E001)

HBA: Dell PERC H830 – Latest firmware (25.5.0.0018) (Dell's rebranded LSI SAS3108 chipset)
- Have 2 cards, both with the same configuration; wiped and recreated the config – same results

Server: Dell R620 (2x servers attempted)

Chassis: CSE-847E1C-R1K28JBOD

OS: Windows Server 2016 Standard – Fully patched, vanilla configuration

RAID configurations tested

o RAID 10 (24x drives)

o Block size: 1MB

Performance: subpar, 600-700MBps

Performance tests with DiskBench / IOMeter both show 600-700MBps
Copy/paste between two virtual disks in Windows: the same, if not slower
Across the network: the same
Disk queuing is very low – 0.0x-1.5 under high load (basic data copy)
CPUs: 2x quad-core Xeon E5 (8 cores total)
196GB RAM – swapped out with smaller amounts for ECC memory tests; all came back clean
Tried different SAS DAC cables between the controller and the JBOD
Drives show up in OpenManage as connected at 12Gbps – everything is green and looks happy
Tried a few different driver versions as well
Swapped the HBA with another working card of the exact same model
Swapped servers with another R620 with a similar CPU/RAM configuration

We called SuperMicro and asked for help, as we suspect this could be an enclosure issue - perhaps the backplane firmware, or some type of bottleneck on the JBOD itself. The IPMI is basically useless: it has no configurable functions that pertain to this, and can't even tell me what the hell is going on.

What do you guys think? The SuperMicro whitepaper says I should be getting 2.4GBps average with 24 drives at minimum performance. I'm getting nowhere near that, and I HAVE to put this thing in production soon.
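For scale, the numbers in the post imply a suspiciously low per-spindle rate. A quick back-of-the-envelope check, using 650MB/s as the midpoint of the observed 600-700MB/s and the whitepaper's 2.4GB/s aggregate minimum (the ~100MB/s-per-drive floor is just that figure divided by 24, an assumption rather than a measured spec):

```shell
# Back-of-the-envelope: what the observed aggregate implies per drive.
per_drive=$(awk -v o=650 -v d=24 'BEGIN { printf "%.1f", o / d }')
echo "Observed per-drive rate: ${per_drive} MB/s"

# SuperMicro whitepaper minimum: 2.4 GB/s aggregate across 24 drives.
floor=$(awk -v d=24 'BEGIN { printf "%.1f", 2400 / d }')
echo "Whitepaper per-drive floor: ${floor} MB/s"
```

Roughly 27MB/s per drive against an expected 100MB/s+ suggests a shared-link or expander bottleneck rather than a drive problem.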
 

MiniKnight

Well-Known Member
Mar 30, 2012
3,037
941
113
NYC
The other side of it is that the PERC controllers may not like the SM chassis expander firmware. Dell is eccentric like that: they put a lot of logic into their RAID firmware and server expander stacks.

SM chassis are usually dumb expanders, like the one you have, with very little logic, so that SDS vendors can use a vanilla platform, run their own software, and fit custom-made faceplates. Dell has its own faceplates and firmware.
 

Chrisbrns

chrisbrns.com
Nov 13, 2011
29
5
3
Central Florida
www.chrisbrns.com
MiniKnight said:
The other side of it is that the PERC controllers may not like the SM chassis expander firmware. Dell is eccentric like that: they put a lot of logic into their RAID firmware and server expander stacks.

SM chassis are usually dumb expanders, like the one you have, with very little logic, so that SDS vendors can use a vanilla platform, run their own software, and fit custom-made faceplates. Dell has its own faceplates and firmware.
Interestingly enough, I have 3 DataOn JBOD enclosures. They're much more expensive than the SuperMicro, but they have no problem with the H830. If there were a problem with the JBOD, wouldn't the controller bark about it? It serves up the drives and has no complaints in our other configurations... This is why I am perplexed by this issue.
 

Chrisbrns

chrisbrns.com
Nov 13, 2011
29
5
3
Central Florida
www.chrisbrns.com
Update: Purchased a MegaRAID SAS 9380-8e, per SuperMicro's supported-controller list. Same performance, same speeds, same problems. So throwing money at this isn't working (another grand thrown down the tubes).

So I have the right controller, the right drives, the right enclosure, and still the same issues. Where am I going wrong here?
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,118
569
113
New York City
www.glaver.org
Chrisbrns said:
So I have the right controller, the right drives, the right enclosure, and still the same issues. Where am I going wrong here?
I didn't see where you specified the type of cable(s) you were using to connect the expander chassis to the Dell server. I would suggest using the LSI controller (not the Dell one) and going into the pre-boot menus (normally Ctrl-C). Go into the "SAS Topology" menu, which should show you physical links, expanders, and drives depending on what keys you press. You want to make sure that at least 4 SAS channels are in use and that everything has correctly negotiated 12Gbit/sec.

If that all looks good, grab a Live CD of your favorite non-Windows operating system and boot it. I use FreeBSD, but any of the other BSDs or Linux distros should work just fine. Get to the command line and find out what the operating system calls the drives. On FreeBSD this will be something like /dev/mfid0 or /dev/da0, and on Linux it should be /dev/sda. You may have more than one unit, so you'll need to find the correct unit(s) for your expander drives.
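If the Live CD is Linux, the negotiated per-phy link rates are also readable straight out of sysfs, no vendor tooling needed. A sketch (assumes the kernel's SAS transport class is loaded, which it will be with any LSI/PERC driver; any phy stuck at 6.0 or 3.0 Gbit instead of 12.0 points at a cable or expander negotiation problem):

```shell
# List the negotiated SAS link rate for every phy the kernel knows about.
# Prints nothing on a machine with no SAS HBA present.
rates=$(for phy in /sys/class/sas_phy/*; do
    [ -r "$phy/negotiated_linkrate" ] || continue   # skip if attribute absent
    printf '%s: %s\n' "${phy##*/}" "$(cat "$phy/negotiated_linkrate")"
done)
printf '%s\n' "$rates"
```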

Now use the command "dd if=yourdrive of=/dev/null bs=1m count=bignum", where yourdrive is the device name you found above and bignum is some big number (note that GNU dd on Linux wants an uppercase bs=1M). A count of 1024 would read 1GB from the drive(s) - 1024 * 1M. This will do a read from the expander drive(s) and just throw the data away. At the end it should report the number of bytes transferred, the elapsed time, and the bytes/second. You should see the drive activity LEDs on the expander drives blinking during this test. If they don't, you're probably using the wrong device name. Here's an example:
Code:
(0:87) host:~terry# dd if=/dev/mfid0 of=/dev/null bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 1.467066 secs (731897398 bytes/sec)
This will give you the read speed from your drive(s). If this looks good, you can do a write test. IMPORTANT - this will overwrite the drive(s) and you will lose any data you have on them. You will need to re-partition and re-format in Windows when you're done. Presumably this system is in testing and you don't have any valuable data on the drives, but you have been warned! Use "dd if=/dev/zero of=yourdrive bs=1m count=bignum" and see what speeds you get. Note: Do not use /dev/random even though it seems like a good idea - the random device is much slower as it needs to repeatedly gather entropy.
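Since dd reports raw bytes/sec, a quick one-liner can turn its summary line into MB/s for comparison against the whitepaper figures. This parses the BSD-style summary line from the example above (the awk field split assumes that exact parenthesized format):

```shell
# Convert the bytes/sec figure from dd's summary line into MB/s.
# Sample line taken from the example output above.
line='1073741824 bytes transferred in 1.467066 secs (731897398 bytes/sec)'
mbps=$(printf '%s\n' "$line" | awk -F'[()]' '{ split($2, a, " "); printf "%.0f", a[1] / 1048576 }')
echo "${mbps} MB/s"   # ~698 MB/s for this sample
```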

If read speeds are good but write speeds are bad, take a look at the volume cache settings. LSI RAID controllers normally offer "write-through" and "write-back" if there is non-volatile storage on the adapter. Write-back is generally faster as it returns the "I'm done" status to the operating system as soon as the data is in the controller's cache.
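The write-back vs. write-through effect described above can be seen even without a RAID controller: on a Linux Live CD, dd against a scratch file reports a much higher rate when writes merely land in the page cache than when it is forced to flush to stable storage. A rough illustration (GNU dd; conv=fdatasync plays the write-through role here, and the target is just a temp file, not one of your array volumes):

```shell
tmp=$(mktemp)

# "Write-back"-like: dd returns as soon as the data is in the page cache,
# so the reported rate is inflated.
dd if=/dev/zero of="$tmp" bs=1M count=64 2>&1 | tail -1

# "Write-through"-like: conv=fdatasync forces the data to stable storage
# before dd reports, so the figure reflects real backing-store speed.
dd if=/dev/zero of="$tmp" bs=1M count=64 conv=fdatasync 2>&1 | tail -1

size=$(wc -c < "$tmp")
rm -f "$tmp"
```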

You can also try the command-line utility (if present on your Live CD) to look at cache status, drive status / topology, etc. Here are some FreeBSD examples:
Code:
(0:93) host:~terry# mfiutil show volumes
mfi0 Volumes:
  Id     Size    Level   Stripe  State   Cache   Name
 mfid0 (  744G) RAID-5      64K OPTIMAL Enabled  <SYSDISK>

(0:94) host:~terry# mpsutil show devices
B____T    SAS Address      Handle  Parent    Device        Speed Enc  Slot  Wdt
00   19   5000cca2604bf8e9 0011    0001      SAS Target    6.0   0001 03    1
00   20   5000cca2604db305 0012    0002      SAS Target    6.0   0001 02    1
00   21   5000cca2604ea531 0013    0003      SAS Target    6.0   0001 01    1
00   22   5000cca2604ddce5 0014    0004      SAS Target    6.0   0001 00    1
00   23   5000cca2604ea6a5 0015    0005      SAS Target    6.0   0001 05    1
00   24   5000cca2604ddf9d 0016    0006      SAS Target    6.0   0001 04    1
These are looking at two different controllers and sets of drives, by the way.
 

Chrisbrns

chrisbrns.com
Nov 13, 2011
29
5
3
Central Florida
www.chrisbrns.com
Terry, thank you for the write-up! Really good advice. My colleague and I were about to head to the datacenter to attempt this Sunday evening when we made a quick call to swap out the host server for a different generation instead of the same model. Bingo! The R620, for whatever reason, does not perform no matter what configuration we used. We swapped in an R710 (older-generation everything) and boom! Everything started working 3-4 times faster.

I can't explain it. Troubleshooting performance sometimes takes weird measures to find root cause.

Thanks for all your help!