Can't Get Over 400MB/s On One Workstation...


mattr

Member
Aug 1, 2013
120
11
18
I have a bunch of machines all connected to a US-16-XG, all using Supermicro AOC-STGN cards. They all hit 700-800MB/s consistently except the Xeon E3-1285v6/X11SSM-F machine. Copying files to the file servers is my concern here. The Xeon E3 machine is stuck at 400MB/s. I've swapped PCIe slots and tried every combination of 10G NICs/SFP+/DACs, all with the same results. I know the combination of hardware I'm using works, since one of the file servers uses the same board and NIC. The only difference with that machine is that its boot drive is an NVMe SSD installed in a PCIe-to-NVMe adapter card. However, I can't see how that would impact throughput. Could just having a PCIe-to-NVMe card installed really be causing a PCIe lane bottleneck? Aren't the onboard SATA ports sharing that bandwidth anyway? So in theory, wouldn't I be seeing the same slowness on the file server with the same motherboard? I've been swapping hardware around for months trying to get this machine up to speed with no luck. Any ideas?


Role        | CPU             | Motherboard
Workstation | Xeon E3-1285v6  | Supermicro X11SSM-F
Plex        | i5-8600K        | Supermicro X11SCA
Workstation | i7-6800K        | ASRock X99 Extreme4
File Server | i3-6100T        | Supermicro X11SSM-F
File Server | Xeon L5630      | Supermicro X8DTH-6F
ESXi        | Dual E5-2650Lv3 | Dell
Workstation | i5-9600K        | ASRock Z370 Extreme4
 

uldise

Active Member
Jul 2, 2020
209
72
28
Try booting some live Linux with and without that NVMe drive installed and compare iperf3 results.
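
If it helps, this is roughly what I mean, assuming an Ubuntu/Debian-style live image and that one of the machines that already hits 700-800MB/s can run the server end (addresses are placeholders):

Bash:
# on a known-good machine, start the server end
iperf3 -s

# on the problem machine, booted from the live image - run once with the
# NVMe installed and once with it physically removed, then compare
sudo apt install -y iperf3
iperf3 -c <server-ip> -t 30        # transmit direction
iperf3 -c <server-ip> -t 30 -R     # receive direction (reverse)

If the numbers jump once the NVMe is out, the lane allocation is the prime suspect.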
 

jdnz

Member
Apr 29, 2021
81
21
8
I presume that's 400MB/s on a file copy - have you tested the individual elements involved separately? (i.e. iperf3 the LAN side to make sure you can get ~1GB/s on that card in that board, then benchmark the SSD on its own to make sure you're not having link negotiation issues with the NVMe)

the Intel C236 on that Supermicro board only has 20 PCIe lanes - the STGN will be asking for 8, the NVMe could be asking for up to 4, and I presume you also have a GPU of some sort plugged in - do you know how many lanes it's asking for?
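
A rough way to split those tests apart on the problem machine, assuming a Linux environment; the device path and job parameters below are only illustrative, so check lsblk before pointing fio at anything:

Bash:
# LAN only - should sit near line rate if the NIC has the lanes it wants
iperf3 -c <fileserver-ip> -t 30

# NVMe only - sequential read straight off the device, bypassing the page cache
sudo fio --name=seqread --filename=/dev/nvme0n1 --rw=read --bs=1M \
    --iodepth=32 --ioengine=libaio --direct=1 --runtime=30 --time_based

# link negotiation of the NVMe and the NIC
sudo lspci -vv | grep -E 'Non-Volatile|Ethernet controller|LnkCap:|LnkSta:'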
 

mattr

Member
Aug 1, 2013
120
11
18
I don't have a GPU, but I do have a 9341-8i installed. All components were tested independently. I've tested with a Sabrent Rocket 1TB NVMe and a Samsung 970 PRO 512GB, both of which test well over 2000MB/s read and write. I swapped the NIC into the other workstation to make sure it was capable. I also tried an X520, and tried SFP+/fiber as well as multiple DACs.

The file server has an AOC-STGN and a 9211-8i installed on the same X11SSM-F board and hits 800MB/s with ease. The only real difference is that the file server boots from a RAID1 pair of SATA DOMs on the powered onboard SATA DOM ports, while the workstation boots from the PCIe NVMe. I still need to test a live image without the NVMe installed, like uldise suggested, to see if that NVMe is somehow tying up the lanes the NIC needs.
 

jdnz

Member
Apr 29, 2021
81
21
8
the 9341-8i is x8, the same as the STGN - so in combination with the NVMe card and anything else allocating PCIe lanes it could be causing a drop in speed, with cards getting fewer lanes than they are asking for

can you test with the RAID card pulled? The other option would be to move the 10GbE card from a true x8 slot to the x4-in-x8 slot - those cards only need a full x8 if you're running BOTH ports; a single port runs just fine on x4 (I'm doing exactly that myself with an AOC-STG-i2T)
 

pod

New Member
Mar 31, 2020
15
7
3
By my probably poor count, the AOC-STGN is running as an x1 link at its native Gen2 speed of about 4Gb/s. The processor supports 16 lanes: the 9341 takes 8, the motherboard uses 3 for LAN/BMC (11 used), and the NVMe device/card uses 4? (15 used), which would leave only a single lane for the NIC. Lots of bandwidth left, but underused lanes. Running from SATA instead of NVMe, or a Gen3 10Gb card, is probably the simplest (cheapest) solution. Or I could be loony.
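
For what it's worth, the back-of-the-envelope numbers for a Gen2 x1 link line up almost exactly with that 400MB/s cap (this is just arithmetic, not a measurement):

Bash:
# PCIe 2.0 runs 5 GT/s per lane with 8b/10b encoding, so 4 Gb/s usable per lane
echo $(( 5 * 8 / 10 ))       # 4 Gb/s per lane
echo $(( 4 * 1000 / 8 ))     # 500 MB/s theoretical on an x1 link
# subtract typical PCIe/TCP overhead and you land right around 400 MB/s,
# while an x4 link (~2000 MB/s) would clear a single 10GbE port easily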
 

Scarlet

Member
Jul 29, 2019
86
38
18
By my probably poor count, the AOC-STGN is running as an x1 link at its native Gen2 speed of about 4Gb/s.
This could be it. Try checking the actual link width of your PCIe cards. Also check the manual of your Supermicro mainboard; they are usually quite detailed about which PCIe lanes are shared between which devices/slots.
Here's an example of a ConnectX-2 (PCIe 2.0 x8); note the LnkCap section, which describes what the adapter is capable of, and the LnkSta section, which describes the actual negotiated link.

Bash:
# lspci -s 03:00.0 -vvv
03:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
        Subsystem: Mellanox Technologies Device 0015
        Physical Slot: 0-1
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 27
        NUMA node: 0
        Region 0: Memory at fb200000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at f9800000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at fb100000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Product Name: ConnectX-2 SFP+
                Read-only fields:
                        [PN] Part number: MNPA19-XTR
                        [EC] Engineering changes: A2
                        [SN] Serial number: MT1140X00835
                        [V0] Vendor specific: PCIe Gen2 x8
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific: N/A
                        [YA] Asset tag: N/A
                        [RW] Read-write area: 105 byte(s) free
                End
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-4f-a4-40
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core
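
If you just want the negotiated width and speed without the full dump, something like this should do (substitute the bus address your AOC-STGN shows up at):

Bash:
# find the card's bus address
lspci | grep -i ethernet

# compare what it can do (LnkCap) with what it actually negotiated (LnkSta)
sudo lspci -s <bus:dev.fn> -vv | grep -E 'LnkCap:|LnkSta:'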
 

mattr

Member
Aug 1, 2013
120
11
18
So I removed the 9341-8i so the NVMe and the AOC-STGN were the only PCIe devices. The NIC only needs 2 lanes to reach 10G, and the NVMe needs 3-4 lanes to reach its max speed. The PCIe-to-NVMe adapter only has 4 PCIe lanes.

What's killing me here is that I have another machine with the same motherboard, same NIC, and a slower HBA, and it's hitting 10G with no issue.
 

jdnz

Member
Apr 29, 2021
81
21
8
So I removed the 9341-8i so the NVMe and the AOC-STGN were the only PCIe devices. The NIC only needs 2 lanes to reach 10G, and the NVMe needs 3-4 lanes to reach its max speed. The PCIe-to-NVMe adapter only has 4 PCIe lanes.

What's killing me here is that I have another machine with the same motherboard, same NIC, and a slower HBA, and it's hitting 10G with no issue.
is it a single or dual port AOC-STGN? Remember it's old (Intel X520 chipset) and hence PCIe 2.0 - that's why it needs 8 lanes to drive two 10GbE ports

next I'd try iperf3 from Windows - current win64 builds are here ( Home • Files.Budman.pw - referenced from Iperf 3.10.1 Windows build )
 

mattr

Member
Aug 1, 2013
120
11
18
is it a single or dual port AOC-STGN? Remember it's old (Intel X520 chipset) and hence PCIe 2.0 - that's why it needs 8 lanes to drive two 10GbE ports

next I'd try iperf3 from Windows - current win64 builds are here ( Home • Files.Budman.pw - referenced from Iperf 3.10.1 Windows build )
It's a single port. I've tried with a dual-port X520 and a dual-port AOC-STGN. I currently have the NIC in an x8 CPU PCIe slot and the NVMe in an x4 PCH PCIe slot. There shouldn't be any PCIe bottlenecks.
 

mattr

Member
Aug 1, 2013
120
11
18
did you test with iperf3? to eliminate other factors..
I tried iperf3 to and from multiple machines and they all max out at about 100MB/s. I'm using the default settings. Even on my main workstation, which gets 800+MB/s on actual file copies.
 

jdnz

Member
Apr 29, 2021
81
21
8
I tried iperf3 to and from multiple machines and they all max out at about 100MB/s. I'm using the default settings. Even on my main workstation, which gets 800+MB/s on actual file copies.
that's not right - what is the machine you're using to run the server end, and what command line options are you using?
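
100MB/s is also suspiciously close to 1GbE line rate - if the 10G interfaces have their own subnet/addresses, it's worth pinning both ends of the test to them (and adding parallel streams) to make sure the traffic isn't sneaking out another NIC. Something like this, with placeholder addresses:

Bash:
# server end, bound to its 10G address
iperf3 -s -B 10.0.10.5

# client end: target the 10G address, bind the 10G source, four parallel streams
iperf3 -c 10.0.10.5 -B 10.0.10.20 -P 4 -t 30
iperf3 -c 10.0.10.5 -B 10.0.10.20 -P 4 -t 30 -R   # reverse direction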
 

mattr

Member
Aug 1, 2013
120
11
18
Bash:
iperf3 -s
iperf3 -c remotehost -i 1 -t 30

Just the default commands from the documentation.

I've done Windows to TrueNAS and TrueNAS to Windows with 3 different Windows machines and 2 different TrueNAS servers. Actual file copies between all of those machines (except the problem child) hit 700-800MB/s consistently.
 

mattr

Member
Aug 1, 2013
120
11
18
Here is a screenshot from one of the Windows machines. I regularly copy files from this workstation to the TrueNAS server at 700-800MB/s.
 

uldise

Active Member
Jul 2, 2020
209
72
28
I see a 1GbE Ethernet adapter here. Does your Windows machine have more than one NIC connected?