Strange problem: SMB read is capped to ~500MB/s


gnattu

New Member
Jun 21, 2022
7
1
3
My server setup:
omnios-r151042
X520-DA2 nic
Test pool is on Intel Optane 900P so Disk IO is not a problem

I've tried two different Windows clients, one running Windows 11 and the other Windows Server 2019, with an X557 10G NIC and a Mellanox CX3, and both of them can only read (copy from the server) at about 500MB/s, or around half of a 10Gbit link. If the two clients read at the same time, they can achieve around 800MB/s aggregated.

Write speed to the server is fine and I can hit a solid 1GB/s.

iperf is showing 9.4Gbit/s both from and to the server (with and without -R).
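Roughly this kind of test, shown here with iperf3 syntax and a placeholder server address:

Code:
# on the OmniOS server
iperf3 -s

# on the Windows client: client -> server, then server -> client (-R)
iperf3 -c 192.168.1.10 -t 30
iperf3 -c 192.168.1.10 -t 30 -R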

What could be the problem causing this "cap"?
 

tinfoil3d

QSFP28
May 11, 2020
903
437
63
Japan
I assume you verified that local read speeds for those files on that host are much higher than 500 or 800MB/s?
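Something like this on the host itself would show it (the path is just an example; use a file larger than RAM, or the ARC will skew the result):

Code:
# sequential read of the test file straight to /dev/null on the OmniOS box
dd if=/tank/share/bigfile.mkv of=/dev/null bs=1024k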
 

i386

Well-Known Member
Mar 18, 2016
4,634
1,763
113
36
Germany
How are you testing?

(slow) Explorer copy?
Or something multithreaded like robocopy?

Edit:
More question(s)
I assume the 500MByte/s is for large IO (movie, ISO, etc. files)?
 

gnattu

New Member
Jun 21, 2022
7
1
3
How are you testing?

(slow) Explorer copy?
Or something multithreaded like robocopy?
Just an Explorer copy, but for a single video file I don't think that would cause any performance problem.

I copied a video file of about 5GB from the server to a local NVMe SSD, and the speed mostly stays around 500MB/s (it does go above from time to time, but not by much, and stays below 600). I even tried multiple SSDs to see if the SSD was the bottleneck, but no luck.

I've retried with iSCSI and I'm getting better results, much closer to 1GB/s.
 

i386

Well-Known Member
Mar 18, 2016
4,634
1,763
113
36
Germany
I don't think it would cause any performance problem
I rarely get over 700MByte/s with explorer copy, but can go over 1.3GByte/s with robocopy and 4+ threads...

Edit: try crystaldiskmark/diskspd with multiple threads to rule out any storage problems
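Roughly what I mean, run from PowerShell on the Windows client (paths, file size and thread counts are just examples):

Code:
# multithreaded copy from the share to a local NVMe drive
robocopy \\server\share D:\test bigfile.mkv /MT:8

# multithreaded sequential read test against the share with diskspd
diskspd -c8G -b1M -d30 -t4 -o8 -w0 -Su \\server\share\testfile.dat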
 

gnattu

New Member
Jun 21, 2022
7
1
3
I rarely get over 700MByte/s with explorer copy, but can go over 1.3GByte/s with robocopy and 4+ threads...

Edit: try crystaldiskmark/diskspd with multiple threads to rule out any storage problems
Robocopy is not better in my case, but CrystalDiskMark is showing good results:

[Screenshot: CrystalDiskMark results over the SMB share]

So what is the problem? :confused:
 

i386

Well-Known Member
Mar 18, 2016
4,634
1,763
113
36
Germany
Robocopy is not better in my case, but CrystalDiskMark is showing good results:

Nice, you're maxing out the 10GbE link.

So what is the problem? :confused:
You can rule out the underlying storage, it's fast enough.

For Explorer copy: Microsoft has documentation about tweaking it (don't have the link right now). Usually I wouldn't change these settings...
For robocopy: try different parameters; for large files I use the "/J" parameter (unbuffered IO).
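Something along these lines (share and destination paths are just examples):

Code:
# unbuffered copy of a large file from the share
robocopy \\server\share D:\test bigfile.mkv /J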
 

gnattu

New Member
Jun 21, 2022
7
1
3
For Explorer copy: Microsoft has documentation about tweaking it (don't have the link right now). Usually I wouldn't change these settings...
For robocopy: try different parameters; for large files I use the "/J" parameter (unbuffered IO).
robocopy /J is about the same speed as Explorer.

No matter how fast the benchmarks are, if the real-world performance is not good then it means nothing.

I just tried a very indirect way: I ran another Linux server, mounted the NFS export from OmniOS on it, then re-shared that mount via Samba (roughly the setup sketched below).
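Roughly (hostname, paths and the share name are placeholders):

Code:
# on the Linux relay box: mount the OmniOS NFS export
mount -t nfs omnios:/tank/share /mnt/relay

# /etc/samba/smb.conf on the Linux box: re-export the NFS mount over SMB
[relay]
    path = /mnt/relay
    read only = no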

And I'm getting a faster speed than going from Windows to OmniOS directly :confused:

This is really confusing: if OmniOS's SMB server is slow, why does it show a great benchmark result?
 

gea

Well-Known Member
Dec 31, 2010
3,486
1,370
113
DE
Most benchmarks use large sequential transfers, while "real world" is often many small files. Another point is benchmarks with file sizes smaller than the read/write cache. This is why you often use workload-based benchmarks, e.g. AJA for video files, or filebench workloads like fivestream read/write or webserver load.

I have no explanation why SMB - OmniOS should be slower than SMB - SAMBA - NFS - OmniOS. Often the multithreaded OmniOS SMB server is faster than single-threaded SAMBA, although not as fast as Solaris with native ZFS. There must be a setting in OmniOS involved. Maybe one item is SMB multipath on Linux, or some dependency between Windows and OmniOS.

What I have seen with OmniOS performance is a dependency on NIC drivers on Windows: an update of my Intel 10G NIC drivers once increased performance massively, as did some Windows settings like disabling interrupt throttling. Also, OmniOS IP settings should be optimized (run the napp-it default tuning). Jumbo frames can be helpful.
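For example, something like this (adapter and link names are just examples; the exact property name depends on the driver):

Code:
# Windows (PowerShell): check/disable interrupt moderation on the 10G adapter
Get-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Interrupt Moderation"
Set-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Interrupt Moderation" -DisplayValue "Disabled"

# OmniOS: check/set the MTU for jumbo frames
dladm show-linkprop -p mtu ixgbe0
dladm set-linkprop -p mtu=9000 ixgbe0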
 

gnattu

New Member
Jun 21, 2022
7
1
3
Most benchmarks use large sequential transfers, while "real world" is often many small files. Another point is benchmarks with file sizes smaller than the read/write cache. This is why you often use workload-based benchmarks, e.g. AJA for video files, or filebench workloads like fivestream read/write or webserver load.

I have no explanation why SMB - OmniOS should be slower than SMB - SAMBA - NFS - OmniOS. Often the multithreaded OmniOS SMB server is faster than single-threaded SAMBA, although not as fast as Solaris with native ZFS. There must be a setting in OmniOS involved. Maybe one item is SMB multipath on Linux, or some dependency between Windows and OmniOS.

What I have seen with OmniOS performance is a dependency on NIC drivers on Windows: an update of my Intel 10G NIC drivers once increased performance massively, as did some Windows settings like disabling interrupt throttling. Also, OmniOS IP settings should be optimized (run the napp-it default tuning). Jumbo frames can be helpful.
I just tried macOS and got a similar read speed, so it can't be a Windows-side problem but rather an OmniOS SMB problem.

I have the default tuning from napp-it and am using MTU 9000 on both sides.
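One way to double-check from Windows that 9000-byte frames actually make it end to end (server IP is a placeholder):

Code:
# don't-fragment ping with an 8972-byte payload (9000 minus IP/ICMP headers)
ping -f -l 8972 192.168.1.10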

The SMB - SAMBA - NFS - OmniOS path is only faster in read, and much slower in write. CrystalDiskMark also shows a much lower number there, but the copy speed is really close to it (~900MB/s in the benchmark, ~800MB/s in a real-world copy).

The thing that really confuses me is that I'm copying a single, really large file, so I should get what the benchmark shows me, or at least something close to it, but I only get about half.
 

aodix85

New Member
Jan 26, 2020
2
0
1
I see that this thread is old, but I am wondering if you figured this one out. I have a small homelab with a pfSense router. The pfSense router has quad 10G NICs all bridged together, and I am running CrystalDiskMark between three Proxmox servers running SMB shares through Ubuntu containers and a Windows client. The client has a 25Gb CX4, as does one server, while the others use 10Gb NICs. The client connects through the bridged connection in the one server, so from that server to the client I can get ~2GB/s. But when I run the benchmark to either of the other servers on 10Gb connections I can get full 1GB/s write speeds, while my READ speeds are capped at 555MB/s. It isn't the ZFS pools, because I have tried one SSD, a stripe of 4 SSDs, a RAID5 array of three disks and another RAID10 array (all ZFS), and they're all consistently 540-560MB/s - they're all hitting the same cap.
I can't figure out if it is the pfSense router with the quad 10G NICs, the Proxmox 25G/10G CX4 bridge, the cheap 10G L2 switch that I have between those servers and the router, or something else... and the fact that the read speeds are so slow when the writes are where I expect they should be, that part has me stumped... anyway, if you didn't sort this out then I guess I will start going through logs and such...
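A first sanity check here would probably be comparing single-stream vs. multi-stream raw TCP throughput across that routed/bridged path, to see whether a single TCP connection is the limiter rather than SMB or ZFS (server address is a placeholder):

Code:
# on one of the 10Gb Proxmox servers
iperf3 -s

# on the Windows client: single stream, then 4 parallel streams, both directions
iperf3 -c 10.0.0.20 -t 30
iperf3 -c 10.0.0.20 -t 30 -P 4
iperf3 -c 10.0.0.20 -t 30 -R
iperf3 -c 10.0.0.20 -t 30 -P 4 -R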
 

bugacha

Active Member
Sep 21, 2024
457
135
43
I see that this thread is old, but I am wondering if you figured this one out. I have a small homelab with a pfSense router. The pfSense router has quad 10G NICs all bridged together, and I am running CrystalDiskMark between three Proxmox servers running SMB shares through Ubuntu containers and a Windows client. The client has a 25Gb CX4, as does one server, while the others use 10Gb NICs. The client connects through the bridged connection in the one server, so from that server to the client I can get ~2GB/s. But when I run the benchmark to either of the other servers on 10Gb connections I can get full 1GB/s write speeds, while my READ speeds are capped at 555MB/s. It isn't the ZFS pools, because I have tried one SSD, a stripe of 4 SSDs, a RAID5 array of three disks and another RAID10 array (all ZFS), and they're all consistently 540-560MB/s - they're all hitting the same cap.
I can't figure out if it is the pfSense router with the quad 10G NICs, the Proxmox 25G/10G CX4 bridge, the cheap 10G L2 switch that I have between those servers and the router, or something else... and the fact that the read speeds are so slow when the writes are where I expect they should be, that part has me stumped... anyway, if you didn't sort this out then I guess I will start going through logs and such...

You're not alone. I can't pass 450MB/s on TrueNAS with a 10GbE connection to a Windows client.

Here is my story: https://forums.servethehome.com/ind...formance-windows-11-client.46884/#post-456598