So I have this Mellanox ConnectX-3 and ESXi 6.7U1 and I want to tune ..

Discussion in 'VMware, VirtualBox, Citrix' started by svtkobra7, Dec 6, 2018.

  1. svtkobra7

    svtkobra7 Active Member

    Joined:
    Jan 2, 2017
    Messages:
    316
    Likes Received:
    64
    I have two ConnectX-3 cards: port 1 goes to the switch, port 2 is a direct connection between hosts. Each is set up with a virtual switch + port group in ESXi (6.7 U1), with a private IP assigned in FreeNAS ... (the intended purpose is a high-bandwidth transport for replication, segmented from all other traffic)

    So when FreeNAS-01 & FreeNAS-02 talk (iperf) they get ~14 Gbps (and I thought that seemed "low"), but when I create a vmkernel for ESXi to use, and use the same exact iperf settings, I get ~34 Gbps.

    My theory is that the default settings of ESXi are "conservative" and TSO et al can be manually tweaked to allow more of that juice to be passed through to the VM ...

    I found a few bits regarding this on the web, tried to tweak the settings, and ended up totally screwing the network settings on both hosts. And it looks like the settings have changed in format etc ...

    Anyone have any ideas for optimization here? Thanks in advance.

    Reference link = [SOLVED]Mellanox ConnectX 3 can't get 40G only 10G
     
    #1
  2. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    3,435
    Likes Received:
    487
    I assume you tried
    https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters
    https://www.mellanox.com/related-do...ide_for_Mellanox_Network_Adapters_Archive.pdf

    There is also a vmware tuning guide which I can't seem to find atm, but honestly it never gave me any improvement (neither did the above).

    I never found out why performance was sometimes cranky and sometimes just flew, and I decided it must be driver issues or bugs (like the rumours of packet loss issues with CX3 on ESX that were on the old MLX forums; I never found any definitive answer).
     
    #2
  3. svtkobra7

    Happy New Year! Thanks for your reply and sorry for the delay on my end.


    • Honestly I've spent more time searching for answers as opposed to taking that guide and attempting to deploy it, but upon review, I have tried much of what is noted in there. Insofar as I could tell, no impact.
    • Don't you need the MLNX-OFED driver to perform any meaningful tuning in ESXi? If so, I think the fact that there is no OFED driver for 6.7 would explain its retirement.
    • You confirmed separately that there is no hack to get the OFED driver in 6.7, right? I thought I read another post with conflicting remarks / hack to get it to work.
    • I think the issue is my lack of knowledge more than anything. For example: (1) enabling jumbo frames on the switch got me consistent speeds around line rate on the 10G switched port (where before I couldn't crack 9G), and (2) I picked up ~5G by playing with the tunables in FreeNAS (to ~19G).
    • But when testing ESXi <=> ESXi only (never a real world case), since I was able to hit ~34G, I want to say it's not a hardware issue, rather a configuration one.
    • For sure I could live with ~19G, but when testing that over some duration, and looking at the bitrate swings between reporting intervals, I want to say I'm way off target with the tunables / missing something else. I can't test ATM, but the swings were massive (if memory serves as low as ~10-13G and as high as 28G) over 15 seconds or so, and then repeat although I'm not sure how cyclically. I have no clue what I'm doing here, so unsure if that suggests misconfig / packet loss.
    I've tried so many different iterations of tunables in FreeNAS, I don't know what got me the highest bitrate, but they were somewhere close to what is presented below (more or less in alignment with your other post @ FreeNAS).

    Anything else to give a go / advice as to how to investigate those huge bitrate swings?

    Interface Options = mtu 9000 rxcsum txcsum lro tso

    Tunables =

    kern.ipc.maxsockbuf = 157,286,400
    kern.ipc.nmbclusters = 13,104,606
    kern.ipc.somaxconn = 4096
    kern.random.harvest.mask = 351

    net.inet.tcp.mssdflt = 1448

    net.inet.tcp.recvspace = 4,194,304
    net.inet.tcp.sendspace = 4,194,304

    net.inet.tcp.recvbuf_inc = 524,288
    net.inet.tcp.sendbuf_inc = 16,384

    net.inet.tcp.recvbuf_max = 67,108,864
    net.inet.tcp.sendbuf_max = 67,108,864

    net.inet.tcp.recvbuf_auto = 1
    net.inet.tcp.sendbuf_auto = 1
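
    The values above are shown with thousands separators for readability, whereas sysctl values are plain integers. A minimal Python sketch that turns the listed tunables into name=value lines follows; the dictionary simply copies the numbers from this post and is not a tuning recommendation.

```python
# Tunables copied from the post above, exactly as displayed (with commas).
TUNABLES = {
    "kern.ipc.maxsockbuf": "157,286,400",
    "kern.ipc.nmbclusters": "13,104,606",
    "kern.ipc.somaxconn": "4096",
    "kern.random.harvest.mask": "351",
    "net.inet.tcp.mssdflt": "1448",
    "net.inet.tcp.recvspace": "4,194,304",
    "net.inet.tcp.sendspace": "4,194,304",
    "net.inet.tcp.recvbuf_inc": "524,288",
    "net.inet.tcp.sendbuf_inc": "16,384",
    "net.inet.tcp.recvbuf_max": "67,108,864",
    "net.inet.tcp.sendbuf_max": "67,108,864",
    "net.inet.tcp.recvbuf_auto": "1",
    "net.inet.tcp.sendbuf_auto": "1",
}


def to_sysctl_lines(tunables):
    """Return 'name=value' lines with the thousands separators removed."""
    lines = []
    for name, shown in tunables.items():
        value = int(shown.replace(",", ""))  # raises ValueError on bad input
        lines.append(f"{name}={value}")
    return lines


if __name__ == "__main__":
    for line in to_sysctl_lines(TUNABLES):
        print(line)
```

    Written this way it is also easier to see the sizes involved: 4,194,304 bytes is 4 MiB, 67,108,864 bytes is 64 MiB, and 157,286,400 bytes is 150 MiB.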
     
    #3
  4. Rand__

    HNY!

    Unfortunately I never found a root cause for that either. I resorted to "fast enough" most of the time, assuming that even bad days will not impact me too much anyway since most of the time storage is slower.
    Then of course I get hit by it whenever I try faster storage (see my recent napp-it attempts), so now I am looking into RDMA-enabled traffic and/or 100Gbps instead (or to be more precise, I went the CX4 route hoping they'd have more consistent behavior, and then one thing led to another with x16 cards and more PCIe slots and Optane, and now I have a gigantic mess I have no time to resolve ;) ).
    But thankfully I went for a 4-node vSAN that just churns on most of the time - slowly but resiliently, even with 2 boxes down for whatever issue VMware came up with again.

    Sorry for ranting, annoying topic;)
     
    #4
  5. svtkobra7

    • Oh, so you did observe the "burstiness" I'm referring to as well?
    • Care to quantify fast enough (and was it a direct connection)?
    • That was a beautiful segue you presented ... my actual bottleneck at the moment is that I cannot get FreeNAS GUI-based "Snapshot replication tasks" beyond ~2G (yes, you read that right). O/c I can blow that away via other means. I'm apparently not alone with that issue, either. So I'm "shopping" ... I just haven't ever had the time to really sit down and dive into napp-it (but soon).
    • I'm sure your '19 roll out of RANdOF will be a smashing success! :)
    • Tangent => and damn that enthusiast Optane pass through issue will never be resolved, I swear. Recent FreeNAS 11.2 release = FreeBSD 11.2 and even on ESXi 6.7 U1 10764712 = no dice (was hoping one of these updates would resolve).
    • read = @Rand__ was tinkering around / broke it / don't want to admit it LOL. :)
    • Some of the issues you've posted are crazy ... and I thought I had bad luck.
    • Please, any time (you are more than entitled after all the guidance you have offered me!)
     
    #5
  6. Rand__

    Yes, I had what you call "burstiness" and I call unexplainable/irregular behavior on ESX too with CX3s. One reason I went to CX4s. O/c the new plan (Skylake based) says go CX3+CX4 but that's a different story ;)

    Fast enough atm means ~300 MB/s to/from local SSDs from/to vsan as I thankfully rarely need to evac away from vsan atm.
    And actually performing updates, not even tinkering... but yes :p

    Optane Pass through - no idea what the issue is. That reminds me I wanted to see whether that hit the 4800X too. I need to start a list :(

    Replication - that's CPU-bound unfortunately - have you observed single core utilization?
     
    #6
  7. svtkobra7

    • Is there a better term (just curious; it's not that I care much about the name, I just like learning)? ;)
    • Is that representative of packet loss, potentially?
    • I do like stories!
    • So I was thinking network speed (you mentioned it flew).
    • I know - I gotta give you a hard time.
    • It has something to do with PCI numbering if memory of the bug ticket serves, but there is the passthru.map workaround, which doesn't work for me, I guess because I have too much enthusiast Optane and not enough 4800X. Interestingly it does resolve the kernel panic @ boot (as reported in the bug ticket); however, as soon as I put the slightest pressure on it, it dies and reports resetting NVMe controller, but it is never reset.
    • I can 100% confirm it does NOT affect the P4800X which is on the compatibility list.
    • Further I read somewhere that enthusiast Optane + ESXi = non INTL sanctioned fun and apparently will void the Optane 5 year warranty if failure results from use in ESXi. But of course I read that on the net, so it may be BS, but if not, enforcement would pose a mighty issue.
    • Right, because it's single-threaded. But to be honest, I don't know what WCPU represents other than that it is weighted; I didn't think 88.24% = 88.24% of a CPU's max. Learn me.
    • But, 3.0 GHz (2690 v2) = 2G? AND that is with Encryption Cipher = Disabled + Replication Stream Compression = Off (user-definable parameters in snapshot replication). So the notion that it is maxed out due to encrypting / decrypting doesn't make sense to me.
    • Further to that point, I've run it the other way on 2.8 GHz (2680 v2), whose base clock is ~7% lower, and I don't see a 7% reduction in throughput.
    • So the solution is a 14 GHz CPU to saturate 10G with a single core (assumption = your pool supports that speed)? ( (10000/8 = 1250) / (2000/8 = 250) = 5) * 2.8 GHz = 14 GHz ... where can I get one of those?
    Code:
    root@FreeNAS-02[~]# top
    last pid: 62824;  load averages:  6.19,  9.15,  9.23    up 0+01:27:05  17:41:30
    62 processes:  2 running, 60 sleeping
    CPU:  4.6% user,  0.0% nice, 32.0% system,  2.3% interrupt, 61.1% idle
    Mem: 33M Active, 269M Inact, 731M Laundry, 190G Wired, 4330M Free
    ARC: 180G Total, 20M MFU, 179G MRU, 322M Anon, 316M Header, 23M Other
         174G Compressed, 175G Uncompressed, 1.00:1 Ratio
    Swap: 4096M Total, 4096M Free
    
      PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    53704 root           1 100    0 27260K 22224K CPU5    5  13:46  88.24% sshd
    53707 root           1  39    0  7836K  3980K piperd  7   4:12  26.09% zfs
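
    The back-of-envelope scaling from the last bullet can be written out as a short Python sketch. It assumes, as the post does, that single-core throughput scales linearly with clock speed, which is a simplification rather than a measured relationship; the function names are made up for illustration.

```python
def mbytes_per_sec(mbits_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbits_per_sec / 8


def clock_needed_ghz(target_mbps, observed_mbps, observed_clock_ghz):
    """Clock speed needed to reach target_mbps, assuming linear scaling."""
    return target_mbps / observed_mbps * observed_clock_ghz


if __name__ == "__main__":
    print(f"10G link  : {mbytes_per_sec(10_000):.0f} MB/s")
    print(f"observed  : {mbytes_per_sec(2_000):.0f} MB/s")
    needed = clock_needed_ghz(10_000, 2_000, 2.8)
    print(f"clock need: {needed:.1f} GHz")
```

    The same helper shows the gap between the two hosts: 2.8 GHz is about 6.7% below 3.0 GHz, so under the linear assumption the slower box would lose roughly that share of throughput.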
    
     
    #7
  8. Rand__

    Dunno, don't think there is an official name for it, but burstiness sounds too positive :p
    I never saw packet loss as in individual packets, just 'not transmitting' iirc.

    Yes, but who is going to fix that for unsupported drives. But on the other hand, why does it differ between 900 and 4800x in the first place, can't imagine Intel specifically changing this to prevent usability... although nowadays...
    I don't believe the warranty loss though, too far-fetched. And not needed: if you use it outside specified parameters (too many writes) you're out of luck anyway.

    Hm, 2 G, i.e. 200MB/s? Does not sound too bad ;) But I know what you mean, GUI-based replication is slow as hell with ssh. Still waiting for faster means too. What's your receiving side? Also 3 GHz?
     
    #8
  9. svtkobra7

    • Get your head out of the gutter, I said "burstiness" not "bustiness" :):):)
    • Not VMware and not INTL, FreeBSD will eventually incorporate a driver that works, I assume.
    • Seriously? INTL absolutely is the reason and I assume something in the firmware is the cause.
    • Regardless of how similar they may be from a component standpoint, clearly INTL targeted the P4800x @ enterprise and 900p @ enthusiasts (o/c). P4800x = isdct management / INTL vib / etc. 900p = none of that.
    • Likely done to ensure enterprises weren't procuring an undue share of Star Citizen Sabre Raven codes!
    • Kidding, but what % of INTL revenue is from enterprise vs. consumer? I don't know, but surely if I were INTL I would want to protect my enterprise product offer, and the substantial premium it commands.
      • P4800x 375GB = $1k+
      • 900p 280 GB = ~$280
    • Its true actually ...
    Code:
    “Additionally, the Product will not be subject to this Limited Warranty if used in: (i) any compute, networking or storage system that supports workloads or data needs of more than one concurrent user or one or more remote client device concurrently; (ii) any server, networking or storage system that is capable of supporting more than one CPU per device; or (iii) any device that is designed, marketed or sold to support or be incorporated into systems covered in clauses (i) or (ii).”
    • Well, more precisely 2 Gbps = 250 MB/s, but yes.
    • And, yes, that sounds horrible. That is the same as 2 x 1G links! If that's all she's got, what is the need for 10G networking? (rhetorical and useless comment)
    • From an ideological perspective, lack of replication speed seems so nonsensical to me (given ZFS's focus on data integrity, which also suggests data backup). How many times have I had a pool issue and been presented with the dreaded "RESTORE FROM A BACKUP" comment via CLI? WELL, WHAT BACKUP, I COULD NEVER REPLICATE MY ZETTABYTE (as in Z of ZFS) VIA REPLICATION? argh!!!
    • push = 2690 v2 (3.0 Ghz base)
    • pull = 2680 v2 (2.8 Ghz base)
     
    #9
  10. Rand__

    Lol, that's your imagination running wild, not mine;)

    Hm, good to know. So have to correct my (still relatively good) image of them (again) ...

    Actually it's not that big a deal to speed that up; it's just the process initiated via the FreeNAS GUI that is horribly slow

    https://forums.servethehome.com/ind...acheive-smokin-zfs-send-recv-transfers.13988/
     
    #10
  11. svtkobra7

    • LOL ;)
    • Interesting, eh? I'm surprised this is news to you and don't know if you were joking about not knowing or not. ;) Maybe it is time to correct my (sterling) image of you (hopefully not)? :cool:
    • But as to the warranty, as asserted in my original comment, and seconded by yourself, there is no way for INTL to ever confirm whether a drive was used in a 4P server or desktop. It would all come down to LBAs Written ...
    • That warranty verbiage is targeted towards enterprise and not the homelab. A consumer RMA for Optane used in a server would have no issue (save for above); however, I could see an enterprise RMA for potentially dozens of consumer drives raising an eyebrow.
    • I don't know that it should have any impact on your image: (a) they have an enterprise product offer and (b) consumer product offer. Those offers are tailored accordingly - it's just business. And especially relevant here: While 100% of the members of this forum agree that all consumers deserve 500k IOPS and sub 10 us latency, what % of the consumer base actually needs that kind of performance?
    • Why is it horribly slow if all the middleware is doing is executing (similar) commands in the background, anyway?
    • Understood and perhaps I illustrated my point poorly - slightly frustrated and sorry for the rant.
    • Perhaps the answer to my frustration isn't as clean and automated as a GUI based solution, but I'm not sure how fair it is to compare a command entered 1 time to a global replication schema to keep two pools in sync via snapshots held for varying periods of time. Surely it can and has been done (replication scheme via CLI only), but I want to be able to blame someone else (and not myself) when I need to restore from backup and that backup is borked.
     
    #11
  12. Rand__

    Well, I knew they had different policies for enthusiast and enterprise; I just was not aware that they explicitly stated so (and limited it to that extent), since that implies that a lot of actual enthusiasts (dual CPU for home use) are not eligible (per se), and more than one user can be run on almost any system (except Win Home edition I assume) nowadays (theoretically)

    And the commands are not the same, the 'speedy' (not to say bursty;)) solution uses zfs send|receive and netcat while the official solution uses ssh to transfer the data from one box to the other. The latter is more secure but inherently slower
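
    To make the difference concrete, here is a hedged Python sketch that only builds the two pipeline shapes as command strings. The dataset, snapshot, host and port values are hypothetical, the exact commands the FreeNAS middleware runs are not shown in this thread, and the netcat form assumes BSD-style `nc -l <port>` syntax. Note that netcat sends the stream unencrypted, which is why it only makes sense on a trusted link.

```python
import shlex


def ssh_pipeline(snapshot, host, target):
    """zfs send piped through ssh: encrypted in transit, one sshd process."""
    return (
        f"zfs send {shlex.quote(snapshot)} | "
        f"ssh {shlex.quote(host)} zfs receive {shlex.quote(target)}"
    )


def netcat_pipelines(snapshot, host, port, target):
    """Return (receiver_cmd, sender_cmd) for a plain TCP transfer via nc."""
    receiver = f"nc -l {int(port)} | zfs receive {shlex.quote(target)}"
    sender = f"zfs send {shlex.quote(snapshot)} | nc {shlex.quote(host)} {int(port)}"
    return receiver, sender


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(ssh_pipeline("tank/data@snap1", "10.2.0.52", "backup/data"))
    recv_cmd, send_cmd = netcat_pipelines("tank/data@snap1", "10.2.0.52", 8023, "backup/data")
    print(recv_cmd)  # run first, on the receiving box
    print(send_cmd)  # then run on the sending box
```

    In the ssh form every byte passes through a single ssh/sshd process on each side; the netcat form drops that step from the data path.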
     
    #12
  13. svtkobra7

    • Joking with you o/c!
    • Agree with your remarks, but I wouldn't put much thought into the actual warranty verbiage as it's entirely unenforceable (and I don't believe it was intended to be). But I'll bite ...
      • That warranty, written as is, would dissuade "thrifty" corporate IT procurement (a material % of INTL revenue). Which I believe is the purpose of how it is written.
      • I'd love to know the % of home users that have DP systems (it has to be low single digits, if even 1%, and an immaterial % of INTL revenue). Beyond that, what % of those home users with DP systems are the "Original Purchaser" of an Optane drive? The warranty only applies to the "Original Purchaser". I don't have the statistics, but the fact that a code for a silly spaceship was included with consumer Optane drives suggests INTL doesn't think there are that many, which supports my point. (Gamers are going to go for 1P systems with high clocks, right?) Further, many of those building DP systems are going to look to the secondary market for procurement. That same system builder throwing 2 "retired" Xeons in a workstation is going to be more inclined to drop a second-hand Optane drive in their build.
      • Regarding the number of users, it specifies "concurrent" users. I think you caught that, thus your "theoretically" remark. ;)
    • In summary, I wouldn't let this knowledge update keep you up tonight ;), as I don't believe INTL is out to screw you over. In their eyes, you don't even exist (on their income statement). And I don't believe the "niche" group that you (we) represent as a consumer presents any threat, either.
    • Understood, thus my comment "(similar) commands" ;) but my point was the middleware could execute the same commands if designed for such, eh?
    • I'll take busty(•_ㅅ_•)on my secure LAN anytime. Er bursty/speedy, whatever.

    And generally regarding this convo = :)
     
    #13
  14. svtkobra7

    @Rand__ I wanted you to see just how nasty that 40G direct connection looks, before I throw in the towel on this.

    In summary: The 10G port looks OK @ 9.73 Gbps (97.3% of theoretical max speed) with 3,780 retransmits and the 40G port looks like $hit @ 14.0 Gbps (35% of theoretical max) with 11,547 retransmits. I'm sure in time I'll figure out what is causing those 3,780 retransmits (but I ran the maths and they only reduce bitrate by 0.15 Gbps). The same settings were in play on both hosts and for both iperf3 runs.

    The -V switch provides a CPU utilization summary; however, I'm unsure as to what it represents, but it does seem very low on the remote for the crippled 40G direct port (makes sense I suppose).

    I have achieved higher bitrates on the direct connection by playing with the tunables; however, that comes with the cost of many more retransmits.

    Looking forward:
    • I wonder if I should buy 2 more QSFP to SFP+ adapters and two more DACs and abandon the direct connection. It was a nice idea to have a high bandwidth link reserved just for FreeNAS replication, but given the lack of replication performance there, I'm not sure how relevant that idea is any longer.
    • I could create 2 x Dynamic LAG (LACP) on the ICX-6450, 1 per host, and it seems so easy I won't have to bother @fohdeesha.
    • Having not gone down this path before, let me ask the stupid question, does that accomplish anything? I think I would get 20G per host of bandwidth, lower CPU utilization due to load balancing and not having retransmits, and fault tolerance to boot, right?
    lag ESXi-01_LAG dynamic
    ports ethernet 1/2/1 to 1/2/2
    primary-port 1/2/1
    deploy

    lag ESXi-02_LAG dynamic
    ports ethernet 1/2/3 to 1/2/4
    primary-port 1/2/3
    deploy

    Port 1 = Switched @ 10G (1st graph below)
    • BITRATE = 9.73 Gbits/sec
    • TOTAL RETR = 3,780
    • CPU Utilization: local/sender 51.2% (2.8%u/48.3%s), remote/receiver 42.1% (4.1%u/38.0%s)
    Port 2 = Direct @ 40G (2nd graph below)
    • BITRATE = 14.0 Gbits/sec
    • TOTAL RETR = 11,547
    • CPU Utilization: local/sender 61.0% (1.7%u/59.4%s), remote/receiver 13.9% (1.0%u/12.8%s)
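
    The two summaries above can be put on the same footing with a small Python sketch that computes line-rate utilization and average retransmits per second. The figures are taken from the two bullet lists; the 60-second duration comes from the -t 60 switch mentioned in this post, and the dictionary layout is just an illustrative choice.

```python
DURATION_S = 60  # both runs used -t 60

RUNS = {
    "Port 1 (switched, 10G)": {"line_rate_gbps": 10.0, "bitrate_gbps": 9.73, "retr": 3780},
    "Port 2 (direct, 40G)": {"line_rate_gbps": 40.0, "bitrate_gbps": 14.0, "retr": 11547},
}


def summarize(run, duration_s=DURATION_S):
    """Return (percent of line rate, retransmits per second) for one run."""
    utilization_pct = run["bitrate_gbps"] / run["line_rate_gbps"] * 100
    retr_per_s = run["retr"] / duration_s
    return utilization_pct, retr_per_s


if __name__ == "__main__":
    for name, run in RUNS.items():
        pct, rps = summarize(run)
        print(f"{name}: {pct:.1f}% of line rate, {rps:.1f} retransmits/s")
```

    On these numbers the direct port retransmits roughly three times as often per second while reaching only about a third of its line rate.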
    Graphs
    I ran iperf3 with no switches save for -t 60 and -V, took the output and graphed BITRATE on the left Y-axis and the RETR field on the right Y-axis. RETR is apparently the number of TCP segments retransmitted, e.g. after loss due to congestion or corruption.

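    [Graph 1: Port 1, switched @ 10G - iperf3 BITRATE (left axis) and RETR (right axis) over 60 seconds]
    [Graph 2: Port 2, direct @ 40G - iperf3 BITRATE (left axis) and RETR (right axis) over 60 seconds]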
     
    #14
  15. Rand__

    Ok, that's FreeNAS to FreeNAS box I assume?
    Does iperf3 show retransmits out of the box or did you get these from something else? I almost never use v3

    LACP will only be useful if you run multiple concurrent sessions, not to increase single-session bandwidth
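
    A rough Python sketch of why that is: a link aggregation group typically picks the member link per flow by hashing packet header fields, so every packet of one TCP session lands on the same link. The hash inputs and the use of CRC32 here are stand-ins chosen for illustration; the actual algorithm used by the switch or by ESXi is not stated in this thread.

```python
import zlib


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Map a flow to one LAG member link (illustrative hash, not a vendor's)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


if __name__ == "__main__":
    n_links = 2  # e.g. a 2 x 10G LAG per host

    # One session: every packet hashes to the same member link,
    # so a single stream can never exceed one link's bandwidth.
    single = {pick_link("10.0.0.51", "10.0.0.52", 42341, 5201, n_links) for _ in range(1000)}
    print("links used by one session:", sorted(single))

    # Several sessions with different source ports may spread across links.
    many = {pick_link("10.0.0.51", "10.0.0.52", port, 5201, n_links) for port in range(42341, 42357)}
    print("links used by 16 sessions:", sorted(many))
```

    So a 2 x 10G LAG offers up to 20G in aggregate across many flows, but a single replication stream would still top out at 10G.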
     
    #15
  16. svtkobra7

    • Correct, I assigned that port to its own vSwitch and Port Group and added to the FreeNAS VM.
    • Static IP manually configured in FreeNAS with iface options: mtu 9000 rxcsum txcsum tso lro
    • Standard tunables IMO ... and to call out a couple of points:
    1. This site suggests that sysctl kern.ipc.maxsockbuf=157286400 (my value = 16777216), but as stated when I started to get more aggressive with the tunables, retransmits increased quite a bit. Also, I wanted to see what stable tunables for the switched port produced on the direct port.
    2. This site suggests additional tunables are needed (CX-3 in focus), but AFAIK they don't exist in FreeBSD, only Linux. Still the MLXNET tuning mentioned is missing without the driver I believe.
    The one item I can't reconcile =
    • Early on, I created a vmkernel just for this NIC so I could assign it a static IP.
    • ~34 Gbps = ESXi <=> ESXi without any issue.
    • To me (and I'm sure I'm missing numerous items), that suggests the "raw performance" is there, even without OFED; however, some unknown ESXi network setting is not optimal for the direct connection. But as stated, it's not a bottleneck, more of an annoyance / curiosity than anything (and I've already burned way too much time trying to figure this out).
    • Correct - retransmissions are shown standard.
      • --logfile file send output to a log file. (new in iPerf 3.1) Code tags, embedded in spoilers below, present the logfile should you wish to peruse.
      • + -V, --verbose give more detailed output
      • = a bit more detail such as cpu util / congestion control algo used / etc (iPerf - iPerf3 and iPerf2 user documentation)
      • = pulled the log into Excel and graphed it (like jperf for dummies who can't figure out how to use jperf, i.e. me)
    • Man, and here I thought newer was better (I have that "old tech" dogma stuck in my head). I did notice that iperf and iperf3 produced different results and that wasn't due to any "noise" as I ran each test from ESXi-01 as server and then again from ESXi-02 as server, and that redundant testing showed very little variance between the two iperf3 tests.
    • But if we don't need the latest release, that means we can go back to 6.5, load up OFED and actually get some nice numbers. I actually considered it just to see.
    • I wonder if you can flip back to 6.5 and attach a 6.7 Host Profile to it, without issue? I suspect not, but that approach would make that test quite easy.
    Code:
    iperf 3.5
    FreeBSD FreeNAS-01 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+fc3d65faae6(HEAD): Thu Dec 20 16:12:30 EST 2018     root@nemesis.tn.ixsystems.com:/freenas-releng-final/freenas/_BE/objs/freenas-releng-final/freenas/_BE/os/sys/FreeNAS.amd64 amd64
    Time: Thu, 03 Jan 2019 11:46:55 GMT
    Connecting to host 10.0.0.52, port 5201
          Cookie: 2kgpnuuhz3ddivvozdrw2b2rytcabtfiydhc
          TCP MSS: 8960 (default)
    [  6] local 10.0.0.51 port 42341 connected to 10.0.0.52 port 5201
    Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 60 second test, tos 0
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  6]   0.00-1.00   sec  1.16 GBytes  9.94 Gbits/sec    0   3.77 MBytes     
    [  6]   1.00-2.00   sec  1.15 GBytes  9.91 Gbits/sec    0   5.30 MBytes     
    [  6]   2.00-3.00   sec  1.15 GBytes  9.92 Gbits/sec    0   6.84 MBytes     
    [  6]   3.00-4.00   sec  1.15 GBytes  9.89 Gbits/sec    0   9.01 MBytes     
    [  6]   4.00-5.00   sec   744 MBytes  6.24 Gbits/sec  217   1.01 MBytes     
    [  6]   5.00-6.00   sec  1.15 GBytes  9.86 Gbits/sec    3   3.66 MBytes     
    [  6]   6.00-7.00   sec  1.15 GBytes  9.89 Gbits/sec    0   5.25 MBytes     
    [  6]   7.00-8.00   sec  1.15 GBytes  9.87 Gbits/sec    0   6.51 MBytes     
    [  6]   8.00-9.00   sec  1.15 GBytes  9.87 Gbits/sec    0   8.51 MBytes     
    [  6]   9.00-10.00  sec  1.15 GBytes  9.88 Gbits/sec    0   10.9 MBytes     
    [  6]  10.00-11.00  sec  1.15 GBytes  9.89 Gbits/sec    0   13.9 MBytes     
    [  6]  11.00-12.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  12.00-13.00  sec  1.15 GBytes  9.88 Gbits/sec    0   14.0 MBytes     
    [  6]  13.00-14.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  14.00-15.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  15.00-16.00  sec  1.15 GBytes  9.90 Gbits/sec    0   14.0 MBytes     
    [  6]  16.00-17.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  17.00-18.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  18.00-19.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  19.00-20.00  sec  1.14 GBytes  9.83 Gbits/sec    0   14.0 MBytes     
    [  6]  20.00-21.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  21.00-22.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  22.00-23.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  23.00-24.00  sec  1.15 GBytes  9.90 Gbits/sec    0   14.0 MBytes     
    [  6]  24.00-25.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  25.00-26.00  sec  1.15 GBytes  9.88 Gbits/sec    0   14.0 MBytes     
    [  6]  26.00-27.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  27.00-28.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  28.00-29.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  29.00-30.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  30.00-31.00  sec  1.15 GBytes  9.88 Gbits/sec    0   14.0 MBytes     
    [  6]  31.00-32.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  32.00-33.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  33.00-34.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  34.00-35.00  sec   941 MBytes  7.89 Gbits/sec  1560   7.66 MBytes     
    [  6]  35.00-36.00  sec  1.15 GBytes  9.90 Gbits/sec    0   8.52 MBytes     
    [  6]  36.00-37.00  sec  1.15 GBytes  9.89 Gbits/sec    0   9.33 MBytes     
    [  6]  37.00-38.00  sec  1.15 GBytes  9.89 Gbits/sec    0   10.8 MBytes     
    [  6]  38.00-39.00  sec  1.15 GBytes  9.89 Gbits/sec    0   13.0 MBytes     
    [  6]  39.00-40.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  40.00-41.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  41.00-42.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  42.00-43.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  43.00-44.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  44.00-45.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  45.00-46.00  sec  1.15 GBytes  9.90 Gbits/sec    0   14.0 MBytes     
    [  6]  46.00-47.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  47.00-48.00  sec  1.14 GBytes  9.83 Gbits/sec    0   14.0 MBytes     
    [  6]  48.00-49.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  49.00-50.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  50.00-51.00  sec   746 MBytes  6.26 Gbits/sec  2000    350 KBytes     
    [  6]  51.00-52.00  sec  1.11 GBytes  9.55 Gbits/sec    0   3.69 MBytes     
    [  6]  52.00-53.00  sec  1.15 GBytes  9.89 Gbits/sec    0   5.25 MBytes     
    [  6]  53.00-54.00  sec  1.15 GBytes  9.89 Gbits/sec    0   6.90 MBytes     
    [  6]  54.00-55.00  sec  1.15 GBytes  9.88 Gbits/sec    0   9.12 MBytes     
    [  6]  55.00-56.00  sec  1.15 GBytes  9.89 Gbits/sec    0   11.6 MBytes     
    [  6]  56.00-57.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  57.00-58.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  58.00-59.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    [  6]  59.00-60.00  sec  1.15 GBytes  9.89 Gbits/sec    0   14.0 MBytes     
    - - - - - - - - - - - - - - - - - - - - - - - - -
    Test Complete. Summary Results:
    [ ID] Interval           Transfer     Bitrate         Retr
    [  6]   0.00-60.00  sec  68.0 GBytes  9.73 Gbits/sec  3780             sender
    [  6]   0.00-60.01  sec  68.0 GBytes  9.73 Gbits/sec                  receiver
    CPU Utilization: local/sender 51.2% (2.8%u/48.3%s), remote/receiver 42.1% (4.1%u/38.0%s)
    snd_tcp_congestion htcp
    rcv_tcp_congestion htcp
    
    iperf Done.
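
    One way to arrive at a figure close to the ~0.15 Gbps cost quoted earlier in the thread is to add up the throughput shortfall in the seconds where retransmits spike and spread it over the whole run. This Python sketch uses values read from the log above; the method itself is an assumption, since the poster's actual calculation is not shown.

```python
TEST_SECONDS = 60
BASELINE_GBPS = 9.89  # typical per-second bitrate in the log above

# Per-second bitrates in the three intervals with large Retr counts
# (4-5 s, 34-35 s and 50-51 s).
DIP_GBPS = [6.24, 7.89, 6.26]

# Every non-zero Retr value in the log, in order of appearance.
RETR_COUNTS = [217, 3, 1560, 2000]


def average_loss_gbps(baseline, dips, seconds):
    """Average bitrate lost over the run due to the dip intervals."""
    shortfall = sum(baseline - dip for dip in dips)
    return shortfall / seconds


if __name__ == "__main__":
    print("total retransmits:", sum(RETR_COUNTS))
    loss = average_loss_gbps(BASELINE_GBPS, DIP_GBPS, TEST_SECONDS)
    print(f"average loss: {loss:.3f} Gbps")
```

    The retransmits are concentrated in a handful of one-second intervals rather than spread evenly, which matches the sawtooth in Cwnd after each dip.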

    Code:
    iperf 3.5
    FreeBSD FreeNAS-01 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+fc3d65faae6(HEAD): Thu Dec 20 16:12:30 EST 2018     root@nemesis.tn.ixsystems.com:/freenas-releng-final/freenas/_BE/objs/freenas-releng-final/freenas/_BE/os/sys/FreeNAS.amd64 amd64
    Time: Thu, 03 Jan 2019 11:55:28 GMT
    Connecting to host 10.2.0.52, port 5201
          Cookie: 4inwi4ie4ri2s6eobtnbq2n5hd2yvbhjsvjk
          TCP MSS: 8960 (default)
    [  6] local 10.2.0.51 port 53469 connected to 10.2.0.52 port 5201
    Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 60 second test, tos 0
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  6]   0.00-1.00   sec  1.39 GBytes  11.9 Gbits/sec  291   1.35 MBytes    
    [  6]   1.00-2.00   sec  2.06 GBytes  17.7 Gbits/sec  423    988 KBytes    
    [  6]   2.00-3.00   sec  1.47 GBytes  12.6 Gbits/sec  362    804 KBytes    
    [  6]   3.00-4.00   sec  2.19 GBytes  18.8 Gbits/sec  245   1.88 MBytes    
    [  6]   4.00-5.00   sec  1.24 GBytes  10.7 Gbits/sec  104   1.02 MBytes    
    [  6]   5.00-6.00   sec  1.55 GBytes  13.3 Gbits/sec  102   1.27 MBytes    
    [  6]   6.00-7.00   sec  1.86 GBytes  16.0 Gbits/sec  161   1.35 MBytes    
    [  6]   7.00-8.00   sec  1.53 GBytes  13.1 Gbits/sec  173   1.13 MBytes    
    [  6]   8.00-9.00   sec  2.24 GBytes  19.2 Gbits/sec  120   1.57 MBytes    
    [  6]   9.00-10.00  sec  1.55 GBytes  13.3 Gbits/sec  133   3.55 MBytes    
    [  6]  10.00-11.00  sec  1.96 GBytes  16.8 Gbits/sec  475   1.09 MBytes    
    [  6]  11.00-12.00  sec  1.74 GBytes  15.0 Gbits/sec  102    673 KBytes    
    [  6]  12.00-13.00  sec  1.58 GBytes  13.5 Gbits/sec  103    585 KBytes    
    [  6]  13.00-14.00  sec  1.06 GBytes  9.17 Gbits/sec   43    997 KBytes    
    [  6]  14.00-15.00  sec  1.70 GBytes  14.6 Gbits/sec  332    691 KBytes    
    [  6]  15.00-16.01  sec  1.15 GBytes  9.86 Gbits/sec   52    271 KBytes    
    [  6]  16.01-17.00  sec  1.05 GBytes  9.08 Gbits/sec  364    657 KBytes    
    [  6]  17.00-18.00  sec  1.88 GBytes  16.1 Gbits/sec   26   3.98 MBytes    
    [  6]  18.00-19.00  sec  2.33 GBytes  20.0 Gbits/sec  400   4.88 MBytes    
    [  6]  19.00-20.00  sec   917 MBytes  7.69 Gbits/sec  260   1.70 MBytes    
    [  6]  20.00-21.00  sec  1.48 GBytes  12.7 Gbits/sec  173   1.84 MBytes    
    [  6]  21.00-22.01  sec  1.48 GBytes  12.7 Gbits/sec  267    454 KBytes    
    [  6]  22.01-23.00  sec  1.52 GBytes  13.1 Gbits/sec  217    901 KBytes    
    [  6]  23.00-24.00  sec  1.95 GBytes  16.7 Gbits/sec  189    665 KBytes    
    [  6]  24.00-25.00  sec  1.47 GBytes  12.6 Gbits/sec  148    280 KBytes    
    [  6]  25.00-26.00  sec  1.96 GBytes  16.8 Gbits/sec   76   3.75 MBytes    
    [  6]  26.00-27.00  sec  1.15 GBytes  9.82 Gbits/sec  264    551 KBytes    
    [  6]  27.00-28.00  sec  1.11 GBytes  9.61 Gbits/sec  179    507 KBytes    
    [  6]  28.00-29.00  sec  1.74 GBytes  14.9 Gbits/sec   39   1.35 MBytes    
    [  6]  29.00-30.00  sec  2.09 GBytes  18.0 Gbits/sec  113   1.79 MBytes    
    [  6]  30.00-31.00  sec  1.57 GBytes  13.5 Gbits/sec  187   1.20 MBytes    
    [  6]  31.00-32.00  sec  1.61 GBytes  13.9 Gbits/sec  118   1015 KBytes    
    [  6]  32.00-33.00  sec  1.50 GBytes  12.8 Gbits/sec  116    822 KBytes    
    [  6]  33.00-34.00  sec  1.62 GBytes  13.9 Gbits/sec  134    621 KBytes    
    [  6]  34.00-35.00  sec  1.67 GBytes  14.4 Gbits/sec   70   1.01 MBytes    
    [  6]  35.00-36.00  sec  1.54 GBytes  13.2 Gbits/sec   28   3.44 MBytes    
    [  6]  36.00-37.00  sec  1.56 GBytes  13.4 Gbits/sec  280    743 KBytes    
    [  6]  37.00-38.00  sec  1.85 GBytes  15.9 Gbits/sec  135   2.48 MBytes    
    [  6]  38.00-39.00  sec  1.66 GBytes  14.3 Gbits/sec  269   1.13 MBytes    
    [  6]  39.00-40.00  sec   931 MBytes  7.81 Gbits/sec   37   1.91 MBytes    
    [  6]  40.00-41.00  sec  1.66 GBytes  14.2 Gbits/sec  175   1006 KBytes    
    [  6]  41.00-42.00  sec  1.56 GBytes  13.4 Gbits/sec  117   1.46 MBytes    
    [  6]  42.00-43.00  sec  1.91 GBytes  16.4 Gbits/sec  243   1.07 MBytes    
    [  6]  43.00-44.00  sec  1.45 GBytes  12.4 Gbits/sec  186   1.62 MBytes    
    [  6]  44.00-45.00  sec  1.69 GBytes  14.5 Gbits/sec  141    945 KBytes    
    [  6]  45.00-46.00  sec  1.24 GBytes  10.6 Gbits/sec   43   2.40 MBytes    
    [  6]  46.00-47.00  sec  1.78 GBytes  15.3 Gbits/sec  362    691 KBytes    
    [  6]  47.00-48.00  sec  1.09 GBytes  9.40 Gbits/sec   12   3.11 MBytes    
    [  6]  48.00-49.00  sec  1.23 GBytes  10.5 Gbits/sec  475   1.14 MBytes    
    [  6]  49.00-50.00  sec  2.65 GBytes  22.8 Gbits/sec  477   3.54 MBytes    
    [  6]  50.00-51.00  sec  2.24 GBytes  19.3 Gbits/sec  468   2.19 MBytes    
    [  6]  51.00-52.00  sec  1.32 GBytes  11.3 Gbits/sec  198    857 KBytes    
    [  6]  52.00-53.00  sec  1.18 GBytes  10.2 Gbits/sec  145   1.27 MBytes    
    [  6]  53.00-54.00  sec  2.49 GBytes  21.4 Gbits/sec  131    875 KBytes    
    [  6]  54.00-55.00  sec  1.44 GBytes  12.3 Gbits/sec   43   2.00 MBytes    
    [  6]  55.00-56.00  sec  2.30 GBytes  19.7 Gbits/sec  117   1.92 MBytes    
    [  6]  56.00-57.00  sec  1.42 GBytes  12.2 Gbits/sec  259   1.43 MBytes    
    [  6]  57.00-58.00  sec  1.34 GBytes  11.5 Gbits/sec  185   2.55 MBytes    
    [  6]  58.00-59.00  sec  1.56 GBytes  13.4 Gbits/sec  108    131 KBytes    
    [  6]  59.00-60.00  sec  2.04 GBytes  17.5 Gbits/sec  352   2.83 MBytes    
    - - - - - - - - - - - - - - - - - - - - - - - - -
    Test Complete. Summary Results:
    [ ID] Interval           Transfer     Bitrate         Retr
    [  6]   0.00-60.00  sec  97.4 GBytes  14.0 Gbits/sec  11547             sender
    [  6]   0.00-60.00  sec  97.4 GBytes  13.9 Gbits/sec                  receiver
    CPU Utilization: local/sender 61.0% (1.7%u/59.4%s), remote/receiver 13.9% (1.0%u/12.8%s)
    snd_tcp_congestion htcp
    rcv_tcp_congestion htcp
    
    iperf Done.

    • $hit that's right, it's been a while since I've had any exposure to it, I'm such a newb.
    • I suppose my thinking is flawed entirely here, i.e. I was thinking that putting it on the switch would eliminate the retransmits (and it would if they existed in actual use and not just a benchmark), but that is only because it is limiting the bitrate. But that theoretical bitrate won't ever be seen anyway, so I'm trying to solve an issue that doesn't exist?
    • Further, there is more than one way to limit the bitrate, most easily by doing so via ESXi to 10G.
    • Running these to the switch doesn't achieve anything other than burning my last 2 SFP+ ports, but perhaps once I figure out how to migrate to a dvswitch having 2 uplinks to each host is preferable?
    Thanks for your reply / assistance as always.
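
    To put the retransmit question in numbers, here is a minimal Python sketch that normalises the Retr count by the amount of data moved. It uses only the sender summary lines from the two runs above; the helper name `retr_per_gbyte` is made up for illustration.

    ```python
    # Sender summary figures copied from the two iperf3 runs above.
    runs = {
        "9.73 Gbits/sec run": {"gbytes": 68.0, "retr": 3780},
        "14.0 Gbits/sec run": {"gbytes": 97.4, "retr": 11547},
    }

    def retr_per_gbyte(gbytes, retr):
        """Retransmitted segments per GByte transferred (hypothetical helper)."""
        return retr / gbytes

    for name, r in runs.items():
        print(f"{name}: {retr_per_gbyte(r['gbytes'], r['retr']):.1f} Retr/GByte")
    ```

    The faster run retransmits roughly twice as often per GByte, which is at least consistent with the idea that capping the bitrate hides the retransmits rather than fixing whatever causes them.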
     
    #16
  17. Rand__

    Indeed weird. Could be one of mine;)

    1. Have you tried without rxcsum/txcsum? There were some issues around either of those with CX3s, I think
    2. In https://forums.servethehome.com/index.php?threads/fighting-new-mellanox-connectx-3-setup.23009/ somebody reported they got the very old 1.8.2 driver going for ESXi, you could give that a try
    3. Have you tried passing a virtual card to FreeNAS directly - or the whole card if you can run on your onboard 10G? Not sure that fits your requirements though

    I don't know whether you can attach a 6.7 profile to 6.5, I'd think rather not. Theoretically it could work if you have not used any of the new functionality, but who knows what they did under the hood.

    New tech vs old tech - that totally depends on your (or rather the manufacturer's) point of view... it has advantages for the creator and if you are lucky then you benefit too, but ...

    2 Uplinks ... well it won't hurt except a little more power usage. Gives you redundancy and more potential bandwidth if you run two+ streams concurrently (FreeNAS sync + vMotion or FT etc) :)
     
    #17
  18. svtkobra7

    • Sorry didn't follow that comment. Are you talking about the links provided? If so, they were not to your post, but separately that is where I got the "rxcsum txcsum tso lro" from.
    • Actually yes, I tested the same results I presented to you, WITHOUT the addition of the interface options: "rxcsum txcsum tso lro"
    • On 10G, 7.18 Gbits/sec / 27,830 Retr, then I added "rxcsum txcsum tso lro" = 9.73 Gbits/sec / 3,780 Retr. So, those options resulted in more speed and fewer retransmissions.
    • Didn't test on 40G in same manner.
    • I think I will find some time this w/e to try. Thanks!
    • I did actually. I had issues with the port being detected as "up" (Media Status = Down).
    • Knowing FreeNAS only has love for Chelsio, I decided it wasn't worth the time to try and test further.
    • I agree likely not. With our luck, I'd end up losing my backup pool somehow thanks to ESXi throwing a fit. While theoretically impossible for that destruction to occur, I'm sure I could make it happen. ;)
    • Oh no, it was a joke I thought you would get ... your old tech comment regarding NVMe = preference for new,
    • so I was suggesting I'd follow that ideology and use iperf3 and not iperf.
    • Anyway, speaking of new tech ... Going CX-4 could be an option ... grab these => Mellanox IBM 46M2201 Dual Port ConnectX 4X 2 por PCI-E card | eBay I'd have to research if they can be crossflashed etc.
    • Put existing on ebay.
    • Any change would have to be a Rand__ validated design o/c ;). That said, I think it makes sense however you slice up that 10G + 10G: while I need 1 of my integrated INTL AT-2 NICs for pfSense WAN, I could direct connect the other @ 10G.
    • I like the idea of a reserved direct connection for FreeNAS replication, but considering I'll never replicate beyond 2 Gbps, if I only had 20G (10G + 10G), I think it would make more sense to forgo the direct connection and go switched for everything.
    • But it sounds like I can have my cake and eat it too by getting to 30G. I totally see where all of this is headed, I'm going to end up with an ICX-6610 instead of an ICX-6450 LOL.
    But in summary, I'm happy with the config as is ... maybe I do two uplinks + INTL direct ... maybe CX-4 ... maybe as is. This was just one of those things that you put a bunch of time into and it frustrates the $hit out of you trying to figure it out (at least I've learned a bit).
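
    For reference, the 10G comparison quoted in the bullets above (with and without "rxcsum txcsum tso lro") can be worked out with a few lines of Python. This is only arithmetic on the two figures already posted, nothing was re-measured; `pct_change` is an illustrative helper name.

    ```python
    # Figures quoted in the bullets above (10G link, FreeNAS to FreeNAS).
    without_offload = {"gbps": 7.18, "retr": 27830}
    with_offload = {"gbps": 9.73, "retr": 3780}  # rxcsum txcsum tso lro enabled

    def pct_change(before, after):
        """Relative change from before to after, in percent."""
        return (after - before) / before * 100.0

    speed = pct_change(without_offload["gbps"], with_offload["gbps"])
    retr = pct_change(without_offload["retr"], with_offload["retr"])
    print(f"throughput:  {speed:+.1f}%")
    print(f"retransmits: {retr:+.1f}%")
    ```

    That is about 35% more throughput and about 86% fewer retransmits on the 10G link. The 40G link was not tested in the same manner, so this says nothing about it.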

    Separately, I'm happy to report that replication is moving along nicely. I totally did not realize that the fastest approach was to use nc or mbuffer and THEN set up replication in FreeNAS. I know you made a point to call out nc / mbuffer (and I've used them before, I just didn't realize they could be used to seed replication). I didn't connect the dots and you practically poked me in the eye with them. :) So I've been doing that for each dataset, with some really nice speeds, and then turning on replication with the following schema:
    1. Interval = 5 min | TTL = 1 hour
    2. Interval = 1 hour | TTL = 1 day
    3. Interval = 1 day | TTL = 1 week
    4. Interval = 1 week | TTL = 1 month
    So far 20TiB in sync, near real time @ 5 min. Dozens of TiB more to go ...
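
    As a rough sanity check on that schema, this Python sketch estimates how many snapshots each tier keeps alive per dataset once it has been running longer than its TTL. Assumptions: "1 month" is treated as 30 days, and the count is simply TTL divided by interval, so the real number may be off by one depending on how FreeNAS expires snapshots.

    ```python
    from datetime import timedelta

    # (interval, TTL) pairs from the schema above.
    # Assumption: "1 month" is treated as 30 days.
    tiers = [
        (timedelta(minutes=5), timedelta(hours=1)),
        (timedelta(hours=1), timedelta(days=1)),
        (timedelta(days=1), timedelta(weeks=1)),
        (timedelta(weeks=1), timedelta(days=30)),
    ]

    def retained(interval, ttl):
        """Approximate number of live snapshots once the tier is in steady state."""
        return ttl // interval

    counts = [retained(i, t) for i, t in tiers]
    print(counts, sum(counts))  # [12, 24, 7, 4] 47
    ```

    So roughly 47 snapshots per dataset in steady state, most of them from the 5 min and 1 hour tiers.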
     
    #18
  19. Rand__

    Quick reply only since on the road.
    Don't buy those "cx4" cards, they are CX3s in truth (most likely)
     
    #19
  20. Rand__

    No, the weirdness of the problem which might be one of my weird problems;)

    Individually or all? I think it was just one which was not working properly.

    Hm yeah does not look as easy as it should.

    Got it, no worries. Just a case of me being a knowitall and commenting on obvious stuff;)
    Those have CX3 level connectors, not the smoother CX4/5 level ones, that's why I don't think they are CX4. Also very very cheap for CX4.

    Lots of (relatively) cheap 40G options out there nowadays :)
    Glad you are getting there:)
     
    #20