Strange problem: SMB read is capped to ~500MB/s


gnattu

New Member
Jun 21, 2022
7
1
3
My server setup:
omnios-r151042
X520-DA2 NIC
Test pool is on an Intel Optane 900P, so disk I/O is not the bottleneck

I've tried two different Windows clients, running Windows 11 and Windows Server 2019, with 10G NICs (X557 and Mellanox CX3). Both of them can only read (copy from the server) at about 500MB/s, or around half of a 10Gbit link. If the two clients read at the same time, they achieve around 800MB/s aggregate speed.

Writing to the server is fine, and I can hit a solid 1GB/s.

iperf shows 9.4Gbit/s both from and to the server (with and without -R).

What could be the problem causing this "cap"?
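
For reference, the figures above can be put into the same units. This is only a minimal sketch, assuming decimal units (1 MB = 10^6 bytes) and ignoring any SMB protocol overhead; the function names are made up for illustration.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a rate in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


def mbyte_to_gbit_per_s(mbyte_per_s: float) -> float:
    """Convert a rate in MB/s to Gbit/s (decimal units, 8 bits per byte)."""
    return mbyte_per_s * 8 / 1000


# iperf result from the post, expressed in MB/s
print(f"{gbit_to_mbyte_per_s(9.4):.0f} MB/s")
# observed SMB read speed from the post, expressed in Gbit/s
print(f"{mbyte_to_gbit_per_s(500):.1f} Gbit/s")
```

So the network path measured by iperf has room for well over twice the observed SMB read rate.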
 

tinfoil3d

QSFP28
May 11, 2020
873
400
63
Japan
I assume you verified that local read speeds for those files on that host are much higher than 500 or 800MB/s?
 

i386

Well-Known Member
Mar 18, 2016
4,217
1,540
113
34
Germany
How are you testing?

(slow) Explorer copy?
Or something multithreaded like robocopy?

Edit:
One more question:
I assume the 500MByte/s is for large I/O (movie, ISO, etc. files)?
 

gnattu

Just Explorer copy, but for a single video file I don't think it would cause any performance problem.

I copied a video file of about 5GB from the server to a local NVMe SSD, and the speed never stays above 500MB/s (it does go above from time to time, but not by much and always below 600MB/s). I even swapped in several different SSDs to see whether the SSD was the bottleneck, but no luck.

I've retried with iSCSI and I'm getting better results, much closer to 1GB/s.
 

i386

I don't think it would cause any performance problem
I rarely get over 700MByte/s with explorer copy, but can go over 1.3GByte/s with robocopy and 4+ threads...

Edit: try crystaldiskmark/diskspd with multiple threads to rule out any storage problems
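
If neither tool is at hand, the single-stream versus multi-threaded comparison can be approximated with a short script. This is a rough sketch and not a replacement for diskspd: `read_range`, `measure_read` and the 1 MiB block size are assumptions chosen for illustration, and the client-side cache can inflate repeated runs on the same file.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024 * 1024  # 1 MiB per read call (an assumption, tune as needed)


def read_range(path: str, start: int, length: int) -> int:
    """Read `length` bytes starting at offset `start`; return bytes actually read."""
    done = 0
    # buffering=0 disables Python's own buffer; the OS cache may still serve reads
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        while done < length:
            chunk = f.read(min(BLOCK_SIZE, length - done))
            if not chunk:  # unexpected end of file
                break
            done += len(chunk)
    return done


def measure_read(path: str, threads: int) -> tuple[int, float]:
    """Split the file into contiguous ranges and read them in parallel threads.

    Returns (total bytes read, throughput in MB/s).
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0.0
    per_thread = -(-size // threads)  # ceiling division
    ranges = [(off, min(per_thread, size - off)) for off in range(0, size, per_thread)]
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(lambda r: read_range(path, r[0], r[1]), ranges))
    elapsed = time.perf_counter() - t0
    return total, total / 1e6 / max(elapsed, 1e-9)
```

Calling `measure_read` on a large file on the share (for example a hypothetical `\\server\share\video.mkv`) with 1, 2, 4 and 8 threads shows whether the read rate scales with parallel streams or stays flat.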
 

gnattu

Robocopy is not better in my case, but CrystalDiskMark is showing good results:

[Attachment: Screen Shot 2022-08-19 at 16.38.30.png]

So what is the problem? :confused:
 

i386

gnattu said: "Robocopy is not better in my case, but crystal diskmark is showing good results"
Nice, you're maxing out the 10GbE link.

gnattu said: "So what is the problem"
You can rule out the underlying storage, it's fast enough.

For Explorer copy: Microsoft has documentation about tweaking it (I don't have the link right now). Usually I wouldn't change these settings...
For robocopy: try different parameters. For large files I use the "/j" parameter (unbuffered I/O).
 

gnattu

robocopy /J is about the same speed as explorer.

No matter how fast the benchmarks are, if the real-world performance is not good then they mean nothing.

I just tried a very indirect route: running another Linux server, mounting the NFS share from OmniOS, then sharing it via Samba.

And I'm getting faster speeds than from Windows to OmniOS directly :confused:

This is really confusing: if OmniOS's SMB server is slow, why does it show a great benchmark result?
 

gea

Well-Known Member
Dec 31, 2010
3,140
1,182
113
DE
Most benchmarks use large sequential transfers, while "real-world" use is often many small files. Another issue is benchmarks with file sizes smaller than the read/write cache. This is why you often use workload-based benchmarks, e.g. AJA for video files, or filebench with e.g. the fivestream read/write or webserver workloads.

I have no explanation why SMB - OmniOS should be slower than SMB - SAMBA - NFS - OmniOS. Often the multithreaded OmniOS SMB server is faster than the single-threaded SAMBA, although not as fast as Solaris with native ZFS. There must be a setting in OmniOS involved. Possible factors are SMB Multipath on Linux or some dependencies between Windows and OmniOS.

What I have seen regarding OmniOS performance is a dependency on the NIC drivers on Windows: an update of my Intel 10G NIC drivers increased performance massively, as did some Windows settings such as disabling interrupt throttling. The OmniOS IP settings should also be optimized (run the napp-it default tuning). Jumbo frames can be helpful.
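
To put a rough number on the jumbo frame point, here is a small worked example. It is a sketch under stated assumptions: plain IPv4 and TCP headers without options, standard Ethernet framing overhead, and a hypothetical helper name `tcp_goodput_gbit`. It only covers wire efficiency, not the reduced per-packet CPU and interrupt load, which is often the larger effect.

```python
ETH_OVERHEAD = 38    # bytes per frame: preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40  # bytes per packet: IPv4 20 + TCP 20, assuming no options


def tcp_goodput_gbit(link_gbit: float, mtu: int) -> float:
    """Upper bound on TCP payload rate for a given link speed and MTU."""
    payload = mtu - IP_TCP_HEADERS  # application bytes carried per packet
    on_wire = mtu + ETH_OVERHEAD    # bytes the packet occupies on the wire
    return link_gbit * payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput_gbit(10, mtu):.2f} Gbit/s")
# MTU 1500: 9.49 Gbit/s
# MTU 9000: 9.91 Gbit/s
```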
 

gnattu

I just tried macOS and got a similar read speed, so it cannot be a Windows-side problem but rather an OmniOS SMB problem.

I have the default tuning from napp-it and am using MTU 9000 on both sides.

The SMB - SAMBA - NFS - OmniOS path is only faster for reads, and much slower for writes. CrystalDiskMark also shows a much lower speed on that path, but the copy speed is really close to it (~900 in the benchmark, ~800 in a real-world copy).

What really confuses me is that I'm copying a single, really large file, so I should get what the benchmark shows or at least something close, but I only get about half of it.