Dell C6100 and ConnectX-2 mezzanine card running at PCIe x4, not x8


Chuntzu

Active Member
Jun 30, 2013
383
98
28
Well, I am having trouble finding any answers to this problem. I have a number of the ConnectX-2 C6100 mezzanine cards set up and working wonderfully in Windows Server 2012 R2, but the links are running at half speed. For example, when running large-block reads and writes to RAM disks I reach 1.4 GB/s, whereas I have had other ConnectX-2 cards hit 3.2 GB/s in the same nodes. When I open Device Manager, I find that they are running at x4, not x8 speeds... I have 2.10 firmware loaded on them and RDMA is running fine. Here is a copy of the device manager log.

My Log
Driver Version : 4.60.17718.0
Firmware Version : 2.10.720
Port Number : 1
Bus Type : PCI-E 5.0 Gbps x4
Link Speed : ----
Part Number : 0JR3P1
Device Id : 26428
Revision Id : B0
Current MAC Address : 00-24-E8-FF-91-0C
Permanent MAC Address : 00-24-E8-FF-91-0C
Network Status : Disconnected
Adapter Friendly Name : Ethernet 3
IPv4 Address : 192.168.3.2
Adapter User Name : 0xffff-IPoIB
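For what it's worth, the observed numbers are consistent with a Gen2 link training at half width; here is a quick sanity check (a sketch, assuming PCIe 2.0's 8b/10b encoding and ignoring packet overhead):

```python
# Back-of-the-envelope PCIe 2.0 throughput ceiling per link width.
# 8b/10b encoding means only 8 of every 10 bits on the wire are payload.
def pcie2_ceiling_gbytes(lanes):
    gt_per_s = 5.0           # PCIe Gen2 signaling rate per lane
    encoding = 8.0 / 10.0    # 8b/10b line-code efficiency
    return gt_per_s * encoding * lanes / 8  # bits -> bytes

print(f"x4 ceiling: {pcie2_ceiling_gbytes(4):.1f} GB/s")  # ~2.0 GB/s
print(f"x8 ceiling: {pcie2_ceiling_gbytes(8):.1f} GB/s")  # ~4.0 GB/s
```

The observed 1.4 GB/s sits under the ~2.0 GB/s x4 ceiling, while the 3.2 GB/s seen with other cards fits under the ~4.0 GB/s x8 ceiling.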

For reference, here is the image from the main site posting showing x8 speeds.

Just to let you know what I have tried so far, without success:
-changed out the mezzanine riser PCIe extension piece
-used a full-size (non-mezzanine) card, which works at full speed
-firmware up to date, and most current drivers installed
-BIOS up to date
-card is plugged into a Voltaire 4036 with up-to-date firmware
-card still shows up as x4 after unplugging and re-plugging
 

Aluminum

Active Member
Sep 7, 2012
431
46
28
Random thought, are you sure that slot is x8? It may have the pins but not be wired for it.

The C6100 thread is like a billion pages, so no idea if someone has already posted a block diagram for the various boards.
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Yeah, I am sure: per Dell, per the thread, and per the main site page explaining both the C6100 and flashing firmware to these mezzanine cards. I don't suppose you or anyone else knows a way to query how many lanes a PCI Express slot has, in Windows or Linux, with or without a device installed? I struck out trying to find a program that does that.
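On Linux at least, with a device installed, one option is reading the link-width files the kernel exposes in sysfs. A minimal sketch (assuming a kernel recent enough to provide `current_link_width`/`max_link_width`; the helper name is mine):

```python
import glob
import os

def read_link_widths(root="/sys/bus/pci/devices"):
    """Map each PCI device (BDF) to (current_link_width, max_link_width)."""
    widths = {}
    for dev in glob.glob(os.path.join(root, "*")):
        cur = os.path.join(dev, "current_link_width")
        mx = os.path.join(dev, "max_link_width")
        if os.path.isfile(cur) and os.path.isfile(mx):
            with open(cur) as f_cur, open(mx) as f_max:
                widths[os.path.basename(dev)] = (int(f_cur.read()), int(f_max.read()))
    return widths

if __name__ == "__main__":
    for bdf, (cur, mx) in sorted(read_link_widths().items()):
        note = "  <-- downtrained" if cur < mx else ""
        print(f"{bdf}: x{cur} (device max x{mx}){note}")
```

With no device in the slot this shows nothing, of course; it only compares what the card can do against what it actually negotiated.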
 

MiniKnight

Well-Known Member
Mar 30, 2012
3,072
973
113
NYC
To me, if it is truly only x4, then it can barely handle a dual 10GbE connection, right? That seems overly constraining, especially since the 5520 chipset has plenty of PCIe lanes to make that slot an x8.
 

sag

Member
Apr 26, 2013
34
6
8
I installed 2012 R2 on one of my nodes, and this is what I am showing for my mezzanine card.

Driver Version : 4.60.17718.0
Firmware Version : 2.10.720
Port Number : 1
Bus Type : PCI-E 5.0 Gbps x8
Link Speed : 32.0 Gbps/Full Duplex
Part Number : 0JR3P1
Device Id : 26428
Revision Id : B0
Current MAC Address : 00-24-E8-FF-5D-51
Permanent MAC Address : 00-24-E8-FF-5D-51
Network Status : Connected
Adapter Friendly Name : Local Area Connection
IPv4 Address : 172.20.0.1
Adapter User Name : 0xffff-IPoIB


Have you tried installing 2012 R2 on a different node? It could be that the node is bad. You don't really have to reinstall; I think you could just swap the HDD into another node that has the mezzanine card.
 

lmk

Member
Dec 11, 2013
128
20
18
Hey all,

By fluke, I happened to come across this post (via an entirely separate search on Google) and had to do a double-take to make sure I wasn't seeing things. I have been having the very same problem with some JR3P1 cards I got a few days ago, too!

Now the best (most important!) part is that the card WAS x8 and IS NOW x4. Specifically, another forum user owned it and had posted a screenshot months ago of the adapter properties in Windows showing x8. I have the same card in my C6100, and when checking adapter properties, it shows as x4.

We have been privately trying to figure out what the heck is going on, but don't have any leads. I have tried virtually every option in the BIOS, reseating, other QSFP cables, dual PSUs, etc. I will be trying another couple of nodes next, and I am waiting on the original owner to send me an AIDA scan to compare how his C6100 reports the PCI Express slots and speeds.

This was discovered after spending days testing the cards and never being able to reach the 3,000+ MB/s everyone else has shown in benchmarks (e.g. the Mellanox firmware-flashing and RDMA with SMB 3.0 thread).

In the meantime, in reply to Aluminum, I found this document showing the Dell mezzanine slot and the cards: http://www.mellanox.com/pdf/prod_adapter_cards/ConnectX_IB_Magnesium_User_Manual_1_0.pdf

All documentation I found states the Dell mezzanine slot is a PCI Express 2.0 x8 slot. Again, another user had the SAME card and it registered as x8, while in my server it shows as x4.
 

lmk

Member
Dec 11, 2013
128
20
18
I found his screenshot from a few months ago (hence the older driver; could that be it?) and my screenshot (same card/MAC/etc.).

How come I cannot post attachments/screenshots?

Also, can you download the trial of AIDA64 Extreme (zip or installer; just make sure to run the zip version after extracting all of it) in Windows and check PCI Devices in it? It should show the PCI Express slots and their speeds: x4, x8, x16, etc.
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
When I get home from work tonight I will give AIDA64 a shot. I am glad I am not the only one with this issue. And I have multiple nodes acting up right now, doing the same thing.
 

lmk

Member
Dec 11, 2013
128
20
18
Okay, let me know.

In the meantime, the usual obligatory details for all the nodes:

BIOS = 1.71
BMC = 1.33
FCB = PIC16 (I have to check what FW version)
CPUs = Dual L5639s
RAM = 24GB

Chassis disk bays = 24
 

Aluminum

Active Member
Sep 7, 2012
431
46
28
HWinfo64 is another program that can dig up pretty good info about buses and such. I learned from it that socket 115x has an extra x4 of PCIe 2.0 lanes from the CPU that many boards do not enable (buried deep in Intel docs).

Your problem is not all that unique, though; a lot of video card owners have similar issues. Since these boards probably don't have muxes like many consumer SLI/CrossFire-oriented boards do (#1 cause), it's probably driver/OS related (#2 cause).

I'm assuming you reseated it once to be sure though right? #3 cause ;)
 

lmk

Member
Dec 11, 2013
128
20
18
I booted all four nodes: two with Mellanox cards and two with the LSI SAS1068E cards.

All four were booted to CentOS 6.5 live CDs and queried with lspci -vvvv (plus another 20 combinations: -x, -d, -b, -t, etc.) to try to get it to list the link width. I even found a Myricom document showing how to query devices and read the hex dump to get it, but my output doesn't line up with their results and interpretation.

So far, nothing. Meanwhile, I see other people have used the command on other devices and had the link width show up, either as a value hardcoded to show what the device supports or as the actual negotiated width.

Still looking...
 

lmk

Member
Dec 11, 2013
128
20
18
I should have waited a few more minutes before accepting the results :)

It turns out the missing info was not listed because of permissions; running lspci -vvv as root (sudo) actually returns usable results.

LnkCap: Width x8

but...

LnkSta: Width x4

more looking to do...
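For anyone following along: per the PCIe spec, that LnkSta line comes from the 16-bit Link Status register (offset 0x12 in the PCI Express capability, i.e. what `setpci -s <bdf> CAP_EXP+0x12.w` would dump), with the current link speed in bits 3:0 and the negotiated width in bits 9:4. A small decoder (a sketch; the function name is mine):

```python
def decode_lnksta(reg16):
    """Decode a 16-bit PCIe Link Status register value."""
    speeds = {1: "2.5GT/s", 2: "5GT/s", 3: "8GT/s"}
    speed = speeds.get(reg16 & 0xF, "unknown")  # bits 3:0 = current link speed
    width = (reg16 >> 4) & 0x3F                 # bits 9:4 = negotiated link width
    return speed, width

# e.g. a raw register value of 0x0042 is a Gen2 link trained at x4:
print(decode_lnksta(0x0042))  # ('5GT/s', 4)
```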
 

lmk

Member
Dec 11, 2013
128
20
18
There are 3 PCI Express root ports that show up with lspci: ports 1, 3, and 7. Is that normal? Does the C6100 have an inactive one? Or maybe the riser counts as a root port?

Anyway, I queried the PCI Express devices and used "lspci -nn" to get the device IDs, then used those IDs to check the three root ports directly:
lspci -vvv -d 8086:3408
lspci -vvv -d 8086:340a
lspci -vvv -d 8086:340e

1st (root port 1): LnkCap: Speed 5GT/s, Width x4. LnkSta: Speed 2.5GT/s, Width x4.
2nd (root port 3): LnkCap: Speed 5GT/s, Width x8. LnkSta: Speed 5GT/s, Width x4.
3rd (root port 7): LnkCap: Speed 5GT/s, Width x16. LnkSta: Speed 2.5GT/s, Width x0.
 

lmk

Member
Dec 11, 2013
128
20
18
So it looks like the last port (port 7) is the PCI Express x16/full-length/"proper" slot, and it shows LnkSta Width x0 because there is no card installed.

Now, the other two could hold some answers, since one port shows the speed dropping from 5GT/s to 2.5GT/s and the other shows the width dropping from x8 to x4.
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Great work! I have mandatory OT at work, so I have been working extra-long hours this week and weekend; I will check tonight when I get home. For some reason, that 2.5GT/s link makes me think those ports might be for the regular Ethernet NICs... possibly those somehow need a driver/BIOS update? I know that is a long shot, but it might be something. I have a feeling when I check tonight I am going to come up with the same results. With a little luck we will have this solved in no time.
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
I just got done reading a PCI-SIG doc indicating that power-saving states can down-regulate both the PCIe generation (2.5 vs 5.0 GT/s) and the width (x8 vs x4) to save power. So another possibility is that adjusting the power-saving settings in the BIOS could help. This is the line that identifies what is going on: "Sending TS1 PAD in Recovery.Idle to drive LTSSM to Config (to initiate link width upconfigure)". Since you have ruled out the Mellanox card as the issue (i.e., the LSI cards are also registering at x4), that narrows the scope down to BIOS power-saving settings, or unreliable link issues causing the down-step in speeds. It doesn't seem the OS is causing the down-stepping, since CentOS and Server 2012 report the same speed. So a firmware or BIOS setting seems to be the issue now. Hopefully this makes sense and I'm on the right track.
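If it is power management, the relevant knob is the ASPM field in bits 1:0 of the Link Control register (offset 0x10 in the PCI Express capability, readable via setpci or lspci -vvv as root). A quick decoder for that raw value (a sketch; the field layout is per the PCIe 2.0 spec, and the function name is mine):

```python
def decode_lnkctl_aspm(reg16):
    """Decode the ASPM Control field (bits 1:0) of a PCIe Link Control register."""
    aspm = {0: "ASPM disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}
    return aspm[reg16 & 0x3]

# e.g. a Link Control value of 0x0000 means no active-state power management:
print(decode_lnkctl_aspm(0x0000))  # 'ASPM disabled'
```

Note that ASPM only explains speed/width reduction while the link is idle; a link that stays at x4 under full load points elsewhere.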
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
"Understanding PCI 2.0 Bandwidth Management" was the name of the PCI-SIG PDF/white paper/overview, if you want to look through it.
 

lmk

Member
Dec 11, 2013
128
20
18
Hey all,

I finally took out my nodes and did the good ol' physical inspection...

Not good... "PCIE x4 Mezzanine Card" is printed on the motherboard...

Please check and confirm if you have physical access.

If so, also: is your C6100 the 24-bay or 12-bay version? And does it have the mezzanine cut-out slot, or did it have to be modified?
 

zoroyoshi

New Member
Dec 10, 2013
20
6
3
Intel 5520 chipset -- 36 PCI Express lanes
Intel 5500 chipset -- 24 PCI Express lanes

Maybe some C6100 motherboards (DCS version?) use the 5500 chipset.