Napp-it SuperStorage Server 6048R-E1CR36L Performance

Bronko

Member
May 13, 2016
102
7
18
101
I would like to share our new storage server build configuration, along with my question of whether the performance meets expectations:
(like build example 2.5 from here: Napp-it storageserver build examples)

Barebone: Supermicro | Products | SuperStorage Servers | 4U | 6048R-E1CR36L
(HBA: SAS3 via LSI 3008 controller; IT mode)
CPU: 2x Intel® Xeon® Haswell-EP Series Processor E5-2630 v3, 2.40 GHz, 8-Core
RAM: 256GB (8x 32GB) Samsung DDR4-2133 CL15 (DDP2Gx4) LRDIMM
rpool: 2x 960GB Samsung SM863 Series, SATA/600, 2.5" (tantalum capacitors for power-loss protection)
data-pool: 24x 8TB HGST Ultrastar He8 helium hard drives (P/N: 0F23651-HUH728080AL4200)
L2ARC: 1x 400GB Intel® Solid-State Drive DC P3700 Series NVMe
SLOG: 2x 120GB Samsung SM863 Series, SATA/600, 2.5" (tantalum capacitors for power-loss protection)
OS: OmniOS 5.11 omnios-r151018-ae3141d April 2016
Napp-it: 16.05 PRO (evaluation)

Code:
# zpool status
  pool: rpool
state: ONLINE
  scan: resilvered 65.9G in 0h2m with 0 errors on Fri May 13 10:27:32 2016
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0
            c5t5d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank1
state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA23B0D0E11d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c2t5000CCA23B0C20C9d0  ONLINE       0     0     0
            c2t5000CCA23B0CA94Dd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c2t5000CCA23B07B701d0  ONLINE       0     0     0
            c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c2t5000CCA23B0BE229d0  ONLINE       0     0     0
            c2t5000CCA23B0C0935d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c2t5000CCA23B0D25C9d0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c2t5000CCA23B0B9121d0  ONLINE       0     0     0
            c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c2t5000CCA23B0BFBF1d0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c2t5000CCA23B0C0901d0  ONLINE       0     0     0
            c2t5000CCA23B0D1BB5d0  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c2t5000CCA23B0C00B1d0  ONLINE       0     0     0
            c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c2t5000CCA23B0A3AE9d0  ONLINE       0     0     0
            c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          mirror-12                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
        cache
          c3t1d0                   ONLINE       0     0     0

errors: No known data errors

IOzone Performance:

(iozone -a -g 256G)

data-pool (tank1) write, compression = off:
iozone_write.png
data-pool (tank1) write, compression = lz4:
iozone_write_lz4.png
data-pool (tank1) read, compression = off:
iozone_read.png
data-pool (tank1) read, compression = lz4:
iozone_read_lz4.png
 
Last edited:

gea

Well-Known Member
Dec 31, 2010
2,520
852
113
DE
Nice equipment you are playing with...

About the expectations.
You have 13 mirrors of disks that can each deliver a sustained r/w rate of around 200 MB/s, which means the whole pool can give a sustained write performance of about 2600 MB/s and a read performance of around 5200 MB/s, if reads can be served from both disks of a mirror simultaneously.
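The arithmetic behind that estimate can be sketched as follows (assuming ~200 MB/s sustained per disk, and taking gea's mirror count at face value):

```shell
# Back-of-the-envelope pool throughput estimate.
PER_DISK=200                      # MB/s sustained per He8 disk (assumption)
MIRRORS=13                        # gea's count (includes the log mirror)
WRITE=$((MIRRORS * PER_DISK))     # writes land on one disk's worth per mirror
READ=$((MIRRORS * PER_DISK * 2))  # reads can be served by both halves
echo "expected write ~${WRITE} MB/s, read ~${READ} MB/s"
```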

If we now ignore the LZ4 values, as these must be slightly worse with random data and much better with compressible data (we want to check pool performance, not LZ4 quality), your benchmarks look like this (ignoring very small file sizes):

read: 6000-8000 MB/s
write: 2000-5000 MB/s at medium file sizes

This is as expected; the values above the raw-disk estimate come from the ZFS ARC read cache and the write cache.

Have you forced sync?
I suppose not. If you need sync, you should set sync=always and test again.
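That re-test could look like this from the shell (a sketch; tank1 is the data pool from the first post, and the iozone flags match the benchmark already used):

```shell
zfs set sync=always tank1    # force every write to be synchronous
iozone -a -g 256G            # re-run the same benchmark as before
zfs inherit sync tank1       # revert to sync=default afterwards
```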

Other comments:
your rpool is huge.
60 GB is enough; for enterprise needs, an Intel S3510-80 is cheaper and perfectly adequate

Your P3700 as L2Arc is perfect (if you need an L2Arc); check the L2ARC statistics (arcstat) for cache needs.
A slower L2Arc would limit write throughput with such a fast pool, as new data must be written to both the pool and the L2Arc.

Do you need sync write?
Your Slog may not be fast enough; compare with the P3700 as Slog in a write benchmark with sync=always.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,046
1,583
113
CA
Very nice build!

As @gea mentioned, if you're using sync then you should replace the SLOG with a P3700. Based on what you've used for the rest of your build, it's really the only component not matching the performance of the rest.

The 400GB L2ARC seems a bit small for that much storage (if you're needing L2ARC); maybe swap the 400GB P3700 to SLOG duty and get a 2TB P3700 for L2ARC.
 

sth

Active Member
Oct 29, 2015
304
49
28
Very good timing. Coincidentally, I've just spec'd a very similar system running a pair of P3700s for SLOG and L2ARC cache... I hope to pull some stats from it next week. Thanks for sharing.
 

Bronko

Member
May 13, 2016
102
7
18
101
Nice equipment you are playing with...

About the expectations.
You have 13 mirrors of disks that can each deliver a sustained r/w rate of around 200 MB/s, which means the whole pool can give a sustained write performance of about 2600 MB/s and a read performance of around 5200 MB/s, if reads can be served from both disks of a mirror simultaneously.

If we now ignore the LZ4 values, as these must be slightly worse with random data and much better with compressible data (we want to check pool performance, not LZ4 quality), your benchmarks look like this (ignoring very small file sizes):

read: 6000-8000 MB/s
write: 2000-5000 MB/s at medium file sizes

This is as expected; the values above the raw-disk estimate come from the ZFS ARC read cache and the write cache.
Thanks gea for your reply!
(btw the mirror-12 is the log-mirror)

Have you forced sync?
I suppose not. If you need sync, you should set sync=always and test again.
...
Do you need sync write?
Your Slog may not be fast enough; compare with the P3700 as Slog in a write benchmark with sync=always.
Currently we plan more CIFS and iSCSI than NFS (sync) requests.
OK, for these tests I removed the log mirror-12 (this isn't possible with the napp-it web GUI: you have to remove the whole mirror, not a single SSD from within mirror-12, after unconfiguring each SSD of course) and the L2ARC NVMe from the tank1 data-pool, in order to extend tank1 with the P3700 NVMe as SLOG instead:
Code:
# zpool status tank1
  pool: tank1
state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA23B0D0E11d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c2t5000CCA23B0C20C9d0  ONLINE       0     0     0
            c2t5000CCA23B0CA94Dd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c2t5000CCA23B07B701d0  ONLINE       0     0     0
            c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c2t5000CCA23B0BE229d0  ONLINE       0     0     0
            c2t5000CCA23B0C0935d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c2t5000CCA23B0D25C9d0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c2t5000CCA23B0B9121d0  ONLINE       0     0     0
            c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c2t5000CCA23B0BFBF1d0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c2t5000CCA23B0C0901d0  ONLINE       0     0     0
            c2t5000CCA23B0D1BB5d0  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c2t5000CCA23B0C00B1d0  ONLINE       0     0     0
            c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c2t5000CCA23B0A3AE9d0  ONLINE       0     0     0
            c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          c3t1d0                   ONLINE       0     0     0

errors: No known data errors
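The GUI steps described above correspond roughly to these commands (a sketch; device names are taken from the zpool status listings in this thread):

```shell
zpool remove tank1 mirror-12   # a log mirror can be removed as a whole vdev
zpool remove tank1 c3t1d0      # drop the P3700 from its cache (L2ARC) role
zpool add tank1 log c3t1d0     # re-add the P3700 as the single SLOG
```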

IOzone Performance:
(iozone -a -i 0 -g 256G)

data-pool (tank1) compression = off, sync = always
iozone_write_NVMe_SLOG.png

Oops, should I have expected that, given this?
Robert Milkowski's blog: ZFS - synchronous vs. asynchronous IO
sync=always
For the ultra-cautious, every file system transaction is
written and flushed to stable storage by a system call return.
This obviously has a big performance penalty.
Next
Your P3700 as L2Arc is perfect (if you need an L2Arc); check the L2ARC statistics (arcstat) for cache needs.
Will check it via "zpool iostat -v tank1 30" in further tests...

A slower L2Arc would limit write throughput with such a fast pool, as new data must be written to both the pool and the L2Arc.
Is that still true, given what Brendan said?:
ZFS L2ARC (Brendan Gregg)
What about writes - isn't flash memory slow to write to?
The L2ARC is coded to write to the cache devices asynchronously, so write latency doesn't affect system performance. This allows us to use "read-bias" SSDs for the L2ARC, which have the best read latency (and slow write latency).
Next
Other comments:
your rpool is huge.
60 GB is enough; for enterprise needs, an Intel S3510-80 is cheaper and perfectly adequate
Yes, it is ;-) and it's not necessary for the storage rpool itself. But if I have a Solaris option in my network, I always have the zone playground in mind, against the idea: a storage is a storage is a storage... ;-)
 
Last edited:

Bronko

Member
May 13, 2016
102
7
18
101
Very nice build!

As @gea mentioned, if you're using sync then you should replace the SLOG with a P3700. Based on what you've used for the rest of your build, it's really the only component not matching the performance of the rest.

The 400GB L2ARC seems a bit small for that much storage (if you're needing L2ARC); maybe swap the 400GB P3700 to SLOG duty and get a 2TB P3700 for L2ARC.
Thank you, T_Minus.
Currently I suppose the 256GB of RAM combined with the 400GB L2ARC should be enough, but I will check the L2ARC stats in further tests.
(Where is my second NVMe? ;-)

But for now, the performance comparison between the mirrored SSDs and the NVMe for the SLOG.

IOzone Performance:
(iozone -a -i 0 -g 256G)

SLOG: 2x 120GB Samsung SM863 Series, SATA/600, 2.5" (first configuration)
data-pool (tank1) compression = off, sync = default: (picture from above)
iozone_write.png

SLOG: 1x 400GB Intel® Solid-State Drive DC P3700 Series NVMe
data-pool (tank1) compression = off, sync = default
iozone_write_NVMe_SLOG_sync_stand.png

Cool, no performance loss with the slower mirrored SSD SLOG (it was even a bit faster), in this test scenario of course.
 
Last edited:

solaris12

Member
Jul 3, 2013
33
0
6
I would like to share our new storage server build configuration, along with my question of whether the performance meets expectations:
(like build example 2.5 from here: Napp-it storageserver build examples)

Barebone: Supermicro | Products | SuperStorage Servers | 4U | 6048R-E1CR36L
(HBA: SAS3 via LSI 3008 controller; IT mode)
CPU: 2x Intel® Xeon® Haswell-EP Series Processor E5-2630 v3, 2.40 GHz, 8-Core
RAM: 256GB (8x 32GB) Samsung DDR4-2133 CL15 (DDP2Gx4) LRDIMM
rpool: 2x 960GB Samsung SM863 Series, SATA/600, 2.5"
data-pool: 24x 8TB HGST Ultrastar He8 helium hard drives (P/N: 0F23651-HUH728080AL4200)
L2ARC: 1x 400GB Intel® Solid-State Drive DC P3700 Series NVMe
SLOG: 2x 120GB Samsung SM863 Series, SATA/600, 2.5"
OS: OmniOS 5.11 omnios-r151018-ae3141d April 2016
Napp-it: 16.05 PRO (evaluation)

Code:
# zpool status
  pool: rpool
state: ONLINE
  scan: resilvered 65.9G in 0h2m with 0 errors on Fri May 13 10:27:32 2016
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0
            c5t5d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank1
state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank1                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c2t5000CCA23B0CEF7Dd0  ONLINE       0     0     0
            c2t5000CCA23B0D18F9d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c2t5000CCA23B0CDAE9d0  ONLINE       0     0     0
            c2t5000CCA23B0D0E11d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c2t5000CCA23B0C20C9d0  ONLINE       0     0     0
            c2t5000CCA23B0CA94Dd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c2t5000CCA23B07B701d0  ONLINE       0     0     0
            c2t5000CCA23B0C9CD5d0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c2t5000CCA23B0BE229d0  ONLINE       0     0     0
            c2t5000CCA23B0C0935d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c2t5000CCA23B0BFDA9d0  ONLINE       0     0     0
            c2t5000CCA23B0D25C9d0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c2t5000CCA23B0B9121d0  ONLINE       0     0     0
            c2t5000CCA23B0BFCA1d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c2t5000CCA23B0BDA41d0  ONLINE       0     0     0
            c2t5000CCA23B0BFBF1d0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c2t5000CCA23B0CE5B9d0  ONLINE       0     0     0
            c2t5000CCA23B0CE7A9d0  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c2t5000CCA23B0C0901d0  ONLINE       0     0     0
            c2t5000CCA23B0D1BB5d0  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c2t5000CCA23B0C00B1d0  ONLINE       0     0     0
            c2t5000CCA23B0C9BD5d0  ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c2t5000CCA23B0A3AE9d0  ONLINE       0     0     0
            c2t5000CCA23B0CF6D9d0  ONLINE       0     0     0
        logs
          mirror-12                ONLINE       0     0     0
            c1t5002538C401C745Fd0  ONLINE       0     0     0
            c1t5002538C401C7462d0  ONLINE       0     0     0
        cache
          c3t1d0                   ONLINE       0     0     0

errors: No known data errors

IOzone Performance:

(iozone -a -g 256G)

data-pool (tank1) compression = off:
View attachment 2572
data-pool (tank1) compression = lz4:
View attachment 2573
data-pool (tank1) compression = off:
View attachment 2570
data-pool (tank1) compression = lz4:
View attachment 2571
How much does it cost?



Sent from my iPhone using Tapatalk
 

gea

Well-Known Member
Dec 31, 2010
2,520
852
113
DE
about sync write
If you enable sync, all data is written through the RAM-based write cache (a few seconds' worth) in a fast sequential way, and is additionally logged on a per-datablock basis to the log device. With disks and without a very fast Slog, performance can drop dramatically, sometimes down to 10% of non-sync values. This is the reason why, when you really need sync with high iops, one should use a fast Slog with very low latency, high write iops and powerloss protection, or use SSD-only pools.

You need secure sync write when you cannot afford to lose a few seconds of confirmed writes on a crash, especially for databases, or for iSCSI/NFS when you put older filesystems like ext or ntfs on top, as a crash may then result in a corrupted filesystem. So prefer sync with an Slog for iSCSI and NFS. For SMB this is largely irrelevant, as ZFS itself is always consistent after a crash and only the file currently being written is lost.

For ARC and L2ARC usage you can use the arcstat.pl script (included with napp-it)
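A minimal way to watch those statistics (the script name is as gea states; the kstat names are the standard illumos ARC counters):

```shell
arcstat.pl 10                      # ARC size and hit/miss rates every 10 seconds
kstat -p zfs:0:arcstats:l2_hits    # raw L2ARC hit counter
kstat -p zfs:0:arcstats:l2_misses  # raw L2ARC miss counter
```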

about write performance of an L2Arc
Usually the write performance of an L2Arc is not critical. But with >10G network performance and a fast pool in a high-end setup, this may be different. In such a case, write performance and iops become relevant, as new data must somehow be written to both the pool and the cache.
 
Last edited:
  • Like
Reactions: Chuntzu

Bronko

Member
May 13, 2016
102
7
18
101
about sync write
If you enable sync, all data is written through the RAM-based write cache (a few seconds' worth) in a fast sequential way, and is additionally logged on a per-datablock basis to the log device. With disks and without a very fast Slog, performance can drop dramatically, sometimes down to 10% of non-sync values. This is the reason why, when you really need sync with high iops, one should use a fast Slog with very low latency, high write iops and powerloss protection, or use SSD-only pools.
Apart from the SSD-only pool option, this is exactly what I chose, especially with the NVMe Slog and sync=always from above, and I got poor performance?!

You need secure sync write when you cannot afford to lose a few seconds of confirmed writes on a crash, especially for databases, or for iSCSI/NFS when you put older filesystems like ext or ntfs on top, as a crash may then result in a corrupted filesystem. So prefer sync with an Slog for iSCSI and NFS. For SMB this is largely irrelevant, as ZFS itself is always consistent after a crash and only the file currently being written is lost.
If sync = default is configured, do the transaction demands make the decision:
NFS -> sync
iSCSI -> sync ???
CIFS/SMB -> async

Is that right?

For ARC and L2ARC usage you can use the arcstat.pl script (included with napp-it)
Thanks gea!
 
Last edited:

gea

Well-Known Member
Dec 31, 2010
2,520
852
113
DE
Apart from the SSD-only pool option, this is exactly what I chose, especially with the NVMe Slog and sync=always from above, and I got poor performance?!


If sync = always is configured, do the transaction demands make the decision:
NFS -> sync
iSCSI -> sync ???
CIFS/SMB -> async

Is that right?
Sync write is not related to a network protocol. Any writing process can request sync write, and with sync=default ZFS will use sync, but only when requested. Even napp-it requests sync on rpool, as it requires sync for working file locking when several processes concurrently read or write the same file.

NFS, for example, does not need sync for anything other than applications like ESXi that request sync for filesystem consistency, or for data consistency, for example when running a database in a VM.

CIFS/SMB usually does not need sync, as you mostly do not use databases or applications with file locking over SMB; and when you do, sync is requested and used.

With iSCSI it depends. With sync=default, a zvol follows the write-back cache setting of the volume-based logical unit. Write-back cache disabled means sync is enabled.
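For reference, the write-back cache setting gea mentions can be inspected and changed per logical unit with COMSTAR's stmfadm (a sketch; the LU GUID below is a placeholder):

```shell
stmfadm list-lu -v                         # shows the "Writeback Cache" state per LU
stmfadm modify-lu -p wcd=true 600144F0...  # wcd=true disables the write cache => sync behaviour
```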

You can force a particular behaviour for all writes with sync=disabled or sync=always; it then does not matter what the writing process has requested.
 
Last edited:

Bronko

Member
May 13, 2016
102
7
18
101
Sync write is not related to a network protocol. Any writing process can request sync write, and with sync=default ZFS will use sync, but only when requested. Even napp-it requests sync on rpool, as it requires sync for working file locking when several processes concurrently read or write the same file.
Not related to a network protocol, but rather to the Solaris service itself: I assumed network/nfs/server always requests sync writes from ZFS, but I was wrong. Thanks...
(btw there was a writing mistake, what I meant was "If sync = default", changed my post above)
NFS, for example, does not need sync for anything other than applications like ESXi that request sync for filesystem consistency, or for data consistency, for example when running a database in a VM.
If sync=default is configured, it is the application's transaction demands that dictate the sync request to ZFS, independently of the underlying network protocol and target service. (Maybe it isn't worth writing down, but in the context of my misunderstanding it should be OK.)

CIFS/SMB usually does not need sync, as you mostly do not use databases or applications with file locking over SMB; and when you do, sync is requested and used.
Why not? I remember your performance analysis for SMB2:
https://napp-it.org/doc/downloads/performance_smb2.pdf
SMB2 vs NFS
I have had stability problems with NFS on OSX, as disconnects happen, as well as very bad performance of NFS.
Even with jumbo frames, the best I got without any tweaks was 70 MB/s write and 150 MB/s read.
Together with the missing authentication and authorisation of NFS, this is not an option.
 
Last edited:

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,046
1,583
113
CA
You should be getting more than 100 MB/s with sync writes and a P3700 NVMe SLOG, I would think, based on tests I've seen @gea run in the past.

I'll re-test with my RAIDZ2 & ZeusRAM, but I was getting 3x what you got in certain write tests... I think it may be even higher now that I've tweaked some configurations too. (Keep in mind I'm running in a VM, not bare metal, so mine should be worse.)
 

Bronko

Member
May 13, 2016
102
7
18
101
You should be getting more than 100 MB/s with sync writes and a P3700 NVMe SLOG, I would think, based on tests I've seen @gea run in the past.
But it is. I have scaled the y-axis for a better analysis:

IOzone Performance:
(iozone -a -i 0 -g 256G)

data-pool (tank1) compression = off, sync = always
same as above (y-axis adapted):
iozone_write_NVMe_SLOG_sync_allw_y_scale.png
 
Last edited:

whitey

Moderator
Jun 30, 2014
2,770
866
113
38
Do I interpret that graph to be roughly 8Gbps or am I WAY out in left field?