Hi,
I have a weird problem. I purchased four used, identical Sun QDR X4242A InfiniBand cards (rebranded Mellanox MHQH29B) for my 4-node Supermicro 6027TR-HTR (X9DRT-HF) server. Three of the four cause the boot to hang at POST code 91, which is the stage where the PCI devices get initialized. The fourth one lets its node boot fine. Swapping nodes/PCI slots doesn't help, and I know all the PCI slots work fine. "OK, so you have 3 dead cards."...except not. Here's the weird part: they all boot fine in my workstation (i7-5960X, X99-SLI motherboard) and are recognized by lspci and ibstat.
I did some more troubleshooting. The only difference I could find was the card firmware version. The one that works in the Supermicro server has 2010 firmware, while all of the others have 2012 firmware. That's kind of odd to me...you'd think the older firmware would be the one with problems, not the newer.
My Supermicro server nodes had a 2013 BIOS, which I updated (after updating the IPMI firmware) to the latest (2015) BIOS. Same problem. I tried various BIOS settings; none helped. I tried the SMBus pin-covering trick, but that didn't help either. The older-firmware card still works with the system, but the newer-firmware cards all prevent the boot from completing.
Any ideas?
I don't have a Sun/Oracle support contract, so I can't access any of the firmware updates for their cards. I also don't have a copy of the older firmware, so I can't roll back my 2012 cards. I could try to reflash them with the latest Mellanox ConnectX-2 firmware for the MHQH29B using my desktop, but I'd like to exhaust all other options first, because that looks difficult to do.
Thanks