NAS OS for NVME-OF and/or 100GbE

Drooh · May 7, 2019

oxynazin said:
Have you looked at spdk? (Storage Performance Development Kit)
https://dqtibwqq6s6ux.cloudfront.ne...e-reports/SPDK_nvmeof_perf_report_19.01.1.pdf
I want to try it by myself just for education purposes but have no time at the moment.
I don't know is it production ready but if you are experimenting you may try it and post results here.

Also for RAID for NVMe you can try Intel VROC. The key is inexpensive for you to try it I think: https://www.amazon.com/Intel-Compon...ds=vroc+intel&qid=1557198183&s=gateway&sr=8-3
https://www.intel.com/content/dam/w...fs/virtual-raid-on-cpu-vroc-product-brief.pdf
Again have no experience with it, just as brainstorm.

I appreciate all the ideas for brainstorming, it’s definitely got me researching new avenues and leading to some progress.

I have tried intel VROC with mixed thoughts. My biggest few issues there are:

-Creating the RAID in the pre-boot environment doesn’t actually create a raid volume that’s recognized by anything other than windows installer. Linux, MacOS, and Windows OS see it as separate drives after the volume is created, it requires an additional software defined raid to be created. Which, unless it’s for a Windows OS boot drive, there’s still OS managing the RAID. Which is ok, but what happened to me due to how consumer mobos bifurcate, the volume can be destroyed without notice. On x299 platform, I had a test raid volume
For both Mac and Win. It worked fine, then I installed another NIC, which reallocated the lanes allocated for VROC. If you boot with part of the volume missing, the array is degraded and there is no coming back. Happened when testing R0 and R1. If it was a raid 5, I assume if 2 Drives didn’t present themselves, even without issues, then the same would happen. It’s cool for testing or if your settings never change, but it’s just too risky. I wouldn’t mind the risk if I could rebuild a failed 4tb array over 100GbE. I could do that in no time, so losing an R0 which could be rebuilt in minutes, no issues for me. If I can’t get information to another machine that fast

-also the performance, compared to a single NvME for boot OS, I can’t say it makes a difference to me. I don’t find any constraints booting or operating on a sigle NvME. Just like my UnRAID VM’s which are on an NvME array that hits 25 gigabytes per second. Gigabytes, not gigabits. I’m like, so what, 95% of everything loads instantaneously with a single high performance NvME. The scale
Achieved is of MS or lower. Latency is already not detectable by human sensory features. Which is all that matters for my project. For a bank, it might show benefit.

-the last thing is along those lines. If the speed doesn’t matter to the OS, the only other use case for VROC would be for transferring files. And I could get some fantastic internal transfer rates, but it begs the question, what’s the use case of transferring faster if not outside the computer itself? I’d made sense when 8tb NvME’s weren’t on the horizon.

The only use case I have is the one at hand. Mainly concerned with read speed of the streaming content, and very little need to write that data to the client. If I could get 40GbE going, I can handle the slower write toke of a single NvME.

That’s my rationale on giving up on VROC

Drooh · May 7, 2019

Also, as I started researching proprietary solutions, i didn’t really find much for hardware/software bundled in a complete package. I’m sure there are options out there, aside from qnap and synology junk, I just haven’t discovered them yet. (And I refer to it as junk because many of their advertised performance specs can’t be achieved. I called them out on it when I spent on an expensive qnap solution that didn’t get anywhere close to what was advertised. They told me they tested under ideal circumstances with ideal equipment, in their lab environment. My response was I have better spec’d client machines, faster drives, better NIC’s, and I’m in an advanced lab environment. I asked them if they could help me replicate their test, and they balked. They wouldn’t give me any information on how they achieved what they advertise. And as I kept raising issue, they wanted me to allow remote access to my network, which was certainly not going to happen)

I decided to give my sales rep a call yesterday. The one who sold me the supermicro NvME Server’s. He didn’t have a clue, but told me to contact supermicro. So I’m in process of getting some suggestions from them. For the time being, they pointed me to

Software-Defined Storage | Super Micro Computer, Inc.

Which gives me a new list of software solutions to explore, some of which were mentioned in this thread.

I’m not super confident in the first 1-2 I started to research, as I saw some reports from other users here, that performance is poor - I think that was a brief skim of the VMware offering.

I’ll research those and report back.

Also, my girlfriend, who works for IBM, put me in touch with a sales guy to talk about a potential proprietary solution. We shall see if there is anything to offer in the realm of affordability.

Ideally, budget for a proprietary system is $10-15k. Which I don’t think is going to happen, even if i am bringing my own NvMe’s.

BoredSysadmin · May 7, 2019

Drooh said:
Ideally, budget for a proprietary system is $10-15k. Which I don’t think is going to happen, even if i am bringing my own NvMe’s.

Decent budget. That should be more than enough to for IBM's Flash v9000 system bezels /s
Source: We have in-house Pure Storage AFA - they aren't the cheapest, but one of the cheaper AFA SANs makers.
I think that entry-level pure //m10 with 5tb (raw) system starts from $50k

Drooh · May 7, 2019

BoredSysadmin said:
Decent budget. That should be more than enough to for IBM's Flash v9000 system bezels /s
Source: We have in-house Pure Storage AFA - they aren't the cheapest, but one of the cheaper AFA SANs makers.
I think that entry-level pure //m10 with 5tb (raw) system starts from $50k

Nice, thanks for the suggestion of v9000. I just checked it out. Wondering now if my
Girlfriend’s employee discount will work on that. Haha.

Looks like it has what I need, and support behind it. That’s a definite possibility. There’s some functionality that is a little
Overkill for me now, but will likely be useful in the near future.

Drooh · May 8, 2019

I've blown my own mind a bit. I'm wondering if someone could help me figure out why this is the case. Because this is likely the key to solving my problem. Or, at least a possible route.

I was working with a few VM's a moment ago. Just a couple of Windows 10 VM's. Transferring between the NVME and SSD array on the same server. Lightning crazy fast transfer. Over 20 gigabytes per second. When I transfer from the same two arrays, outside of a VM, from a docker container, i get no such speed.

This made me think, what would happen if I ditched the virtual bridge, and used the physical bridge to connect to the LAN. And then made me think, let me setup a separate bridge to a physical machine directly attached via 40GbE NIC. I did that, and it saturates the 40Gbe Connection with ease. I'm about to try with the 100GbE NIC. I'm already mystified by this, and I am going to freak out, if it saturates the 100GbE line.

For clarity's sake, my testing has always been Client -> 40Gbe Switch -> Server; or Client ->40Gbe Direct Attach to Server. And I'm testing from client to a share on the physical server.

Now, I have not changed any physical connection, but I am using a Windows 10 VM to initiate the same transfer.

So VM (on Server) -> Bridge -> Switch -> Client; or VM (on Server) -> Bridge -> Client

This method like i said is same physical connection, same share, just initiating from the VM. And I guess the only thing I can figure out is that the VM handles the SMB share differently than however it is handled in Linux.

I don't understand what is going on here, but it is starting to make me a little optimistic.

RageBone · May 8, 2019

with Samba on linux, you have to specifically enable "Multichannel" which contains the - SMB Direct - RDMA capabilitys. If you take a look at the used protocol, it has to be smb3_11 as far as i know.

I have no specific experience with windows, but as far as i have read, SMB Direct is Windows Version as in Home, Pro, Server, specific.
So it could be that all your VMs come with full enabled SMB Direct support out of the box, therefore working flawlessly as hoped and expected in that case.
Ther are a few powershell commands to tell you if you are actually having RDMA capable hardware, if it enabled and working.
I think from what i have read, traffic over rdma won't be counted under network-traffic any more and should only appear in specific Performance counters.

On the other hand, maybe you have such great Hardware at your disposal, that there isn't any bottleneck keeping you below 40GbE in that single scenario.
Though, it is rather hard to tell from your description.
Be aware that different Windows Versions come with differeing SMB support, and especially on Linux, that all used softwares have to support the wanted protocol. GVFS and the Cifs Mount didn't support 3_11 till a few month ago if i have noticed that correctly.

Drooh · May 8, 2019

RageBone said:
with Samba on linux, you have to specifically enable "Multichannel" which contains the - SMB Direct - RDMA capabilitys. If you take a look at the used protocol, it has to be smb3_11 as far as i know.

I have no specific experience with windows, but as far as i have read, SMB Direct is Windows Version as in Home, Pro, Server, specific.
So it could be that all your VMs come with full enabled SMB Direct support out of the box, therefore working flawlessly as hoped and expected in that case.
Ther are a few powershell commands to tell you if you are actually having RDMA capable hardware, if it enabled and working.
I think from what i have read, traffic over rdma won't be counted under network-traffic any more and should only appear in specific Performance counters.

On the other hand, maybe you have such great Hardware at your disposal, that there isn't any bottleneck keeping you below 40GbE in that single scenario.
Though, it is rather hard to tell from your description.
Be aware that different Windows Versions come with differeing SMB support, and especially on Linux, that all used softwares have to support the wanted protocol. GVFS and the Cifs Mount didn't support 3_11 till a few month ago if i have noticed that correctly.

Incredibly useful info. Thank you. The VM’s are some form of distribution for developers which were issued for use by the organization I work for in my other life. Probably does support that protocol.

Are there any other settings or tunings I should execute? You don’t have to elaborate, you can point me to a good source, if possible. Upon my research, it’s hard to locate this information, and once located, vetting the source can be challenging. If you know of a good article, guide, manual, etc, let me know.

For instance, in the NIC configuration, there are several options, that are foreign to me. I’ve researched each one, but still foreign to me at this point.

Hardware is top notch. These particular servers are two Supermicro 1029uz; Dual Xeon Gold 6130, 256GB RAM, Intel P4500 NvME’s. Then there is 1 2029u Dual Xeon Scalable Platinum (can’t remember model off-hand), 256GB RAM, with intel 545 SSD’s, 2 900p optane. Then a 6048 with Samsung EvO SSD’s, 3108 sas raid, 2 900P’s, IronWolf HDD’s, this has dual v4 Xeon e5’d, and 128gb RAM. The others are x9 and x10 supermicro’s with e5’s, 128gb Ram, various mix of enterprise and consumer ssd’s, those are mostly GPU servers. All with Mellanox Connectx-3,4 or 5, or Chelsio t580/6200.

Client machines are mainly i9’s with 64gb RAM, some i7’s. Either 770p, Samsung EVO, or WD Black NvME’s.

Hoping this actually gets me to the light. I’d much rather Have the next spend be on new GPU servers rather than spending more on this problem.

kapone · May 8, 2019

Lemme put it this way...

I went through a similar exercise a while back (https://forums.servethehome.com/ind...ared-storage-or-wtf-is-this-so-complex.21410/) for something similar, and I wasn't even targeting 100gbe...just 40gbe. But..I mean REAL 40gbe, i.e. I can saturate the pipe with anywhere from 10-20x nodes that'll be accessing the storage.

Long story short...all kinds of BS from a variety of vendors, and the absolute cheapest bid from a vendor was > $50K (with a support contract). That's for 40gbe...

maze · May 13, 2019

Maybe have a look at some HPC related storage solutions or go to High-Performance Computing: It's all about the FLOPS. and ask around a bit.

I imagine these guys would be looking at super high performance aswell for their storage operation.

Drooh · Jul 10, 2019

I figured I should come back and give an update. I made some progress!

I went through all of supermicro’s software defined storage solutions. Among the Solutions was Excelero. I talked with them
extensively, but the software license was cost prohibitive for the subscription.

I learned that even with $15k+ per year software solutions, getting extreme throughout from SMB shares to windows and Mac just is not there yet. I was able to learn some tweaks to get full 20GbE throughout.

I did however saturate a 40GbE, then 100GbE line with iSER.

That does the trick for now, and i guess we shall see what the future holds for actual shares

Rand__ · Jul 10, 2019

Please make sure to describe your final solution for others (me included) to follow

RageBone · Jul 11, 2019

cool that iSer was able to saturate 100GbE, how does the host and client side look like in terms of support for it?
Another option might be the slightly older but maybe more supported SRP, scsi +rdma.
iPXE does not supüport iSER, but SRP.

TrumanHW · Aug 18, 2019

Drooh said:
I want to note, on the single NvME vs Array; I was trying to say a single NvME over network is faster than the entire array of NvME's over the network. That's the weirdest problem to assess. The only thing that I can think is the BTRFS RAID array's interaction with the network stack is somehow different than a single attached drive's interaction with the network stack. I have not a clue why or where to start in an assessment. Given this is the case, there may be some legitimacy to trying to implement some sort of pooling software for linux. Or just upgrade to 4TB NvMe's, possibly 8tb NvME's (some are on the way), keeping single files below 4TB's should not be an issue. That may work as a low(ish) cost solution.....

Hey buddy - I'm really hoping you come back to update us on this project. I'm VERY interested to hear what the problem was.

I think your situation is analogous to many others (and mine) ... where the equipment SHOULD -- according to the claims of others ... be faster.

I have 8x 10TB HGST SAS drives ... RAIDZ-2 ... and I get 170MB/s ...

AMAZING! Right?

I also have 4x Samsung PM983 4TB NVMe connected via HighPoint U.2 controller...
In RAID-0 I can get up to 9GB/s.

I have a 10GbE setup ... but like I said -- the max was 170MB/s. I'd REALLY like to know what the bottlenecks are also. I have a feeling there're quite a few people keeping an eye on your progress and wondering "where'd he go?"

If anyone demands proof of the performance I'll benchmark it when I'm done running a task on those drives.

eskiuzmi · Dec 10, 2022

gea said:
ZFS is not a high performance but a high security/ high capacity filesystem. While it offers superiour rambased read/write caches to compensate this.

Does that mean ZFS is terrible for video editing?

i386 · Dec 10, 2022

"it depends"

eskiuzmi · Dec 10, 2022

i386 said:
"it depends"

on what? biggest factor after file size.

gea · Dec 10, 2022

eskiuzmi said:
Does that mean ZFS is terrible for video editing?

ZFS ist ideal for video editing due its proven datasecurity without silent data errors, snaps for versioning (every few minutes when needed) and ultrafast ongoing backups based on ZFS snaps even with huge and open files under high load.

The point is that you need a system and pool that can deliver the needed sequential performance ex 400MB/s+ per editing station with 4k. Unless you want many concurrent active users this can be done especially when you can outsource some hot data to a local NVMe-

i386 · Dec 10, 2022

media devices you use (ssds vs hdd)
number of devices
raid type

with enough hdds you could build huge raid50/60 type arrays (or mirrors/raid 10) for "performant" video editing

eskiuzmi · Dec 10, 2022

gea said:
ZFS ist ideal for video editing due its proven datasecurity without silent data errors, snaps for versioning (every few minutes when needed) and ultrafast ongoing backups based on ZFS snaps even with huge and open files under high load.

The point is that you need a system and pool that can deliver the needed sequential performance ex 400MB/s+ per editing station with 4k. Unless you want many concurrent active users this can be done especially when you can outsource some hot data to a local NVMe-

Timeline can be 4 or 5 video tracks deep and over 20 audio tracks deep so that's well north of 2GB/s. The few gen3 m.2 I have should be fine if striped. But before I get to RAID-ing these I'm gathering intel, like now.
ZFS Snapshot backup looks like RAID 0+1, what's the benefit of each? It seems that ZFS is only as fast as other FS as long as your system is full of RAM or if the network throughput is high enough to flush that cache, is that how it works?

i386 said:
with enough hdds you could build huge raid50/60 type arrays (or mirrors/raid 10) for "performant" video editing

Ok I could try good old mirrored stripe first, HDD are cheap ... and noisy... wait, RAID +0 writes to the mirror at the same time, doesn't it so that'll be noisy. Maybe that's a benefit of ZFS snapshot backup over RAID 0+1, it can be scheduled off hours.

BoredSysadmin · Dec 10, 2022

eskiuzmi said:
on what? biggest factor after file size.

You got two good answers. One more I'd say ZFS could become limiting in terms of performance if you'll be using nvme drives for storage AND have a faster than 100gig (or 12.5GBytes/S) network. ZFS DirectIO should address most of these performance problems later on.
Sources:

Xinnor CEO on its high-speed RAID software and disks, DPUs, ZFS, ZNS, SMR and HPC – Blocks and Files

B&F talks to Xinnor CEO about its tech

blocksandfiles.com

https://builders.intel.com/docs/datacenterbuilders/high-performance-raid-software-for-fast-nvme-storage-systems-with-intel-optane-dc-storage.pdf

NAS OS for NVME-OF and/or 100GbE

New Member

New Member

Not affiliated with Maxell

New Member

New Member

Active Member

New Member

Well-Known Member

Active Member

New Member

Well-Known Member

Active Member

Active Member

New Member

Well-Known Member

New Member

Well-Known Member

Well-Known Member

New Member

Not affiliated with Maxell