LGA3647 ESXi build to host my Oracle Apps/Databases

BennyT

Active Member
Dec 1, 2018
137
37
28
*edit: You were right @Rand__ ... there's no real benefit to 3 UPI links on a dual-socket board. Found this diagram:
2019-01-09_14-22-30.png
 

BennyT

Something I just thought of. The PSU I was using has a switch on it for Multi-rail / Single-rail. I had it set to Multi-rail when I was trying to get the board to POST. Not sure if that had any effect on why the DPI board wouldn't POST. It seems like a long shot, but I wish I had tried the other setting before returning the "dead" board. Just something I'll remember when I get the replacement board.

Also, I've come to the conclusion that I will stick with the X11DPI-NT (same model I returned to the seller) instead of switching to an X11DPH-(T). The deciding factor was the onboard OCuLink ports on the DPI board. Those are about a $200 value if I were to purchase an AOC with NVMe ports. The X11DPH does have one more M.2 slot and one more PCIe slot than the DPI, but I think 6 expansion slots should be sufficient :)

I'd probably try the X11DPH if I were to use engineering sample processors (DPH is one of the few dual socket 3647 boards that are proven to work with ES processors).
 

BennyT

I'm looking at VMware's compatibility list and it doesn't list the Intel C620 chipset for SATA controllers. I was going to try using the three onboard miniSAS ports to feed three of the Norco backplanes. I wasn't going to use RAID, just simple JBOD. It would save the cost of buying a new HBA or RAID card.

upload_2019-1-11_19-48-26.png

The Intel C610 and earlier SATA controllers are listed, but not the C620:
upload_2019-1-11_20-6-2.png


Can anyone confirm whether Intel's C620 SATA controller works with ESXi 6.5+?

If ESXi drivers won't work with the Intel C620 controller, that's okay. It is a good excuse to upgrade to a good SAS card. The Norco 4224 chassis backplane is direct attach, not an expander backplane.

So I'd need to purchase an HBA:

- 24i HBA like this LSI 9305 for about $600 USD: LSI 9305-24i x8 lane, PCIe 3.0 SAS 9305 12 Gb/s SAS Host Bus Adapter - Newegg.com

OR- instead of a 24i HBA, get an 8i + expander:

- 8i HBA LSI 9300 12Gbps for about $400 USD: https://www.amazon.com/LSI-LSI00344-9300-8i-PCIE3-0-Controller/dp/B00PH9VG8Y
- plus a 12Gbps 9-port expander for about $550 USD: HPE - storage SAS bus extender - SAS 12Gb/s - 870549-B21 - SCSI Adapters/Controller - CDW.com
Approx $950 total

OR- instead of HBA, go with RAID card:

- I could get a hardware RAID controller in case I later want to set up RAID 10, but initially flash it for use as an HBA
Here is a 24i LSI 9361 RAID controller for about $1100 USD: https://www.amazon.com/Broadcom-LSI...UTF8&qid=1547260662&sr=1-23&keywords=lsi+9361
- add a CacheVault and lithium battery kit to the RAID controller for about $170 USD


You can see why I hope to use onboard SATA controllers initially.
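To put rough numbers on the three routes above, here is a small shell sketch that just sums the approximate prices quoted in this post. The variable names are made up for illustration, and the prices are the ballpark figures from the listings above, not current quotes.

```shell
#!/bin/sh
# Approximate USD prices as quoted in the post (variable names are illustrative).
hba_24i=600        # LSI 9305-24i
hba_8i=400         # LSI 9300-8i
expander=550       # HPE 12Gb/s SAS expander
raid_24i=1100      # LSI 9361 24i RAID controller
cachevault=170     # CacheVault kit for the RAID card

echo "Option 1, 24i HBA:           \$$hba_24i"
echo "Option 2, 8i HBA + expander: \$$((hba_8i + expander))"
echo "Option 3, 24i RAID + cache:  \$$((raid_24i + cachevault))"
```

Every add-in route costs at least $600, which is the whole appeal of getting the onboard C620 SATA ports working first.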

Thanks!
 

Rand__

Well-Known Member
Mar 6, 2014
6,128
1,495
113
I never checked but I can't remember any issues.
I most likely attached an existing SSD running 6.7 to the board using the SATA DOM (orange) connectors rather than the SAS HD ones, but as far as I can tell they should run fine.
 

BennyT

I was thinking about trying to feed two of the Norco SATA backplanes via the OCuLink (SFF-8611) connectors on the motherboard. OCuLink is meant for PCIe devices such as an NVMe drive, but I read somewhere that I may be able to set the ports to SATA mode in the BIOS.

I began searching for OCuLink (SFF-8611, x4 PCIe, 42-pin) to miniSAS (SFF-8087) cables and could not find any. I was surprised, because I did find numerous companies making OCuLink to miniSAS HD (SFF-8643) cables.

Actually, I did see one company making OCuLink to miniSAS (SFF-8087) cables, but with a minimum order quantity of 25... and $41 per cable.

Or I could remove a backplane and plug cables directly into the drives, but I'd lose hot-swap on that backplane.

Or I could buy an HBA card, but it would be nice to fully utilize the connectors on the board.

This is where the DPH motherboard might be nice to have. The DPH board doesn't have OCuLink... but the two M.2 slots that it does have connect directly to the CPU rather than via the PCH as on the DPI board. On the DPH I could use an M.2 to NVMe adapter (female connector for SFF-8643), then run a miniSAS HD (SFF-8643) to miniSAS (SFF-8087) cable to the backplane.

I bring all this up now because I'm ordering a replacement motherboard today or tomorrow and I am still vacillating between the DPI and DPH for the above reasons.

*edit: never mind. I think I'll stick with the X11DPi after all. I'll simply use the three SAS connectors on the mobo to feed 3 of the six backplanes. Eventually I'll get an HBA.
 

BennyT

Ordered the replacement X11DPI-NT from wiredzone.com.

The original seller I purchased from was reputable on Amazon (they had very good feedback), and they honored the full refund and paid for the return shipping.

But I decided this time to try wiredzone rather than the Amazon seller for a few reasons.

1) Wiredzone is officially listed as "authorized" on Supermicro's website.
2) They drop-ship most motherboards from Supermicro. This means I'm not likely to get a motherboard from very old inventory with old firmware/BIOS etc. that may have sat on a reseller's shelf for a year or two.
3) If a board arrives DOA, Wiredzone will cross-ship the replacement overnight while I ship back the dead board. Faster turnaround.
4) They have live chat and I can get a full transcript of the chat. This is really nice to have and much faster than email exchanges. I also prefer chat over voice calls because it's so much easier to understand/communicate, in my opinion.

I'm not trying to promote their company; this is the first time I've ordered from Wiredzone. Just stating my reasons for selecting them this time around. I'll post my experience after I get the board, but so far it has been good (I've already chatted with them via their live chat to understand how returns work if I get another dead board).
 

BennyT

Hello. I'm going to have to call Supermicro support. I must be doing something wrong. I've received the new board, delivered straight from Supermicro, and I'm having the exact same problem as before. I couldn't possibly have had two dead boards. This time I'm running it outside the chassis on the cardboard box.

IPMI Hardware Info again shows the same bogus hardware tree with parts I don't even have installed (it shows two 6126s; I have a single 6130). It shows all 16 DIMM slots filled with 16GB Hynix sticks; I have only two 32GB Hynix sticks. I really think that is carry-over from the factory test hardware.

Maybe I need to flash a new BIOS?

Here is what the main screen shows in IPMI:
upload_2019-1-23_17-24-10.png


Same problems as before: nothing is displayed on the VGA output. IPMI doesn't know the system is powered on, presumably because the BIOS is failing? Maybe the CPU is bad? I don't hear any beeps during boot-up, but I'll try again. Maybe I didn't connect the buzzer speaker properly.
 

BennyT

*EDIT: recapping my issues --


short summary of my hardware:

Supermicro X11DPI-NT motherboard

1x Xeon Gold 6130

2x Hynix 32GB 2666 sticks

PSU: Corsair HX1200



I can access IPMI, but these are the issues I'm having:

-system does not POST.

-powered on, there is no VGA output via the motherboard's AST2500 BMC.

-powered on, IPMI doesn't see the system as being on. I'm unable to power the system off from IPMI because it doesn't see it as being on.

-powered off, IPMI can power on the system, but IPMI still doesn't see it as powered on (and it doesn't POST and there is no VGA output).

-iKVM says "no signal"

-html5 window, similar to iKVM, says "no signal"

-no beeps/buzzers from the speaker attached to JD1 header (speaker header) even if I remove all RAM sticks

-Hardware Info shows incorrect hardware. It shows dual Gold 6126 CPUs and all 16 DIMM slots populated with 16GB Hynix sticks. I'm assuming this is left over from factory testing, because I have a single 6130 and only two 32GB RDIMMs. Both motherboards, the one I RMA'd and the new one I just received, show this weird hardware info tree.

-the CPU is cold to the touch even after the system has been powered on for 30+ minutes.
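For what it's worth, the power state the BMC believes in can also be read from another machine with `ipmitool chassis power status`, which helps rule out a web GUI glitch. A minimal sketch: `power_state` is a made-up helper name, and the BMC address and credentials in the comment are placeholders.

```shell
#!/bin/sh
# Reduce the one-line output of `ipmitool chassis power status` to on/off/unknown.
# power_state is a hypothetical helper name, not part of ipmitool.
power_state() {
    case "$(cat)" in
        *"Chassis Power is on"*)  echo on ;;
        *"Chassis Power is off"*) echo off ;;
        *)                        echo unknown ;;
    esac
}

# Against a real BMC (placeholder address and credentials):
#   ipmitool -I lanplus -H 192.0.2.10 -U ADMIN -P 'password' chassis power status | power_state

# Offline demonstration with the string ipmitool prints for a powered-off chassis:
echo "Chassis Power is off" | power_state
```

If this reports "off" while the board's power LED is lit and the fans are spinning, the BMC and the rest of the system disagree, which matches the symptoms listed above.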



What I've tried:

-confirmed the 8-pin CPU power is connected (I also tried connecting two of the 8-pin cables from the PSU even though I only have one CPU).

-confirmed the 24-pin is connected properly. The BMC heartbeat LED is pulsing, which means the BMC is fine. The power LED on the board near the front-panel headers is green, which means the board sees power.

-confirmed that I have the proper DIMM slots populated (P1DIMM-A1 and P1DIMM-D1) for a single CPU with two sticks of RAM. This info is on the quick reference guide that came with the board, and it is also in the manual.

-booted with the motherboard outside the chassis, with no difference.

-removed the CMOS battery, reset CMOS by jumping the CMOS pads, and reinstalled the battery (according to the manual).

-tried altering the number of turns/torque on the LGA3647 CPU heatsink screws. The manual states no more than 12 lbf. I think that is "pounds-force", and because the length unit is absent it is probably 12 inch-pounds (lbf-in). I don't have an inch-pound torque wrench or torque screwdriver, so I've been trying to guess what "feels" like 12 inch-pounds.

-socket pins look good. No bent pins. I confirmed this using my smartphone's magnifier app to zoom in really close.
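Since many small torque screwdrivers are marked in N·m rather than inch-pounds, a quick conversion may help. This is only a sketch and assumes the manual's figure really is 12 lbf-in, as guessed above; `lbfin_to_nm` is a made-up helper name.

```shell
#!/bin/sh
# Convert a torque in pound-force inches (lbf-in) to newton-metres.
# 1 lbf-in = 0.1129848 N-m (from 1 lbf = 4.4482216 N and 1 in = 0.0254 m).
lbfin_to_nm() {
    awk -v t="$1" 'BEGIN { printf "%.2f\n", t * 0.1129848 }'
}

lbfin_to_nm 12   # prints 1.36
```

So the limit would be roughly 1.36 N·m, which is not much more than firmly finger-tight on a small driver.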

I RMA'd the first board because when I couldn't get any beep codes I figured the board was simply dead. But I'm having the exact same issues on this 2nd board. Both boards are the X11DPI-NT. I find it hard to believe two boards are dead. The first board I bought from a reseller on Amazon. This 2nd board I bought from Wiredzone.com, and it was drop-shipped direct from Supermicro in San Jose, CA.

What I've not tried yet:

- I've not tried flashing the BIOS. To do this through IPMI I would need to acquire an IPMI license (about $20, I think). Or I may try a USB stick with FreeDOS and the BIOS files on it, plugged directly into the board. I'll have to read up on exactly how to do this. I'm unsure what my current BIOS version is.

- I've not tried replacing the CPU. Maybe I have a bad CPU. I'm seriously considering this one, as I have only 7 days before returns are no longer accepted.

*update: I talked with the CPU seller and they issued me an RMA for the CPU. I'm going crazy with these RMAs, I'm telling ya.
*also had a chat with friends on reddit in r/homelab... from the dialogue there, it seems that if it were bad RAM I would've heard beep codes. Since I'm not hearing any beeps, it is likely a bad CPU or PSU. Another indication of a bad CPU: it is cold to the touch even after being powered on for 30+ minutes. I have a few spare PSUs from other servers that I will try tonight. But I still have my RMA return label if I need to mail the CPU back.
 

itronin

Well-Known Member
Nov 24, 2018
894
560
93
Denver, Colorado
Have you tried wearing a brown fedora and strapping a bullwhip & pistol to your hip while troubleshooting? ;) You, sir, are on a full quest of pulp proportions. "BennyT and the Gold Motherboard".... Sorry, I don't mean to make light of your situation. I think all of us who build systems, and especially servers, have been somewhere similar before, so keep plugging away - it's truly a process of elimination now and you seem to be whittling down the possible culprits.
 

BennyT

I'm definitely learning a lot from this experience.

I tried a different PSU but had the same issues. So I've removed the CPU, reboxed it, and dropped it off at UPS for shipping back to the seller.


Now I'll grab a pizza and go watch "Raiders of the Lost Ark". Ha
 

BennyT

And the results using the replacement CPU? SUCCESS

man oh man that was a learning experience for me
IMG_20190207_092510_01.jpg

iKVM HTML5 works as expected now:
upload_2019-2-7_10-4-37.png


IPMI reports correct hardware
upload_2019-2-7_10-5-52.png


Thanks for the supportive feedback, encouragements and for your patience as you watched me lose my mind the last few pages.

Next up is to examine the BIOS, plug in the drive backplane and other peripherals, and eventually install ESXi onto my 80GB SSD.

At the moment I hear the fans speed up and slow down every few seconds. I have the PWM fan wall plugged into FANA. I have a lot of learning to do.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,395
506
113
BennyT said: Thanks for the supportive feedback, encouragements and for your patience as you watched me lose my mind the last few pages.
I've got a voodoo doll of your replacement CPU and I'll be sticking pins in it shortly! You should probably also book an exorcist and buy some gremlin repellent.

BennyT said: At the moment I hear the fans speed up and slow down every few seconds. I have the PWM fan wall plugged into FANA. I have a lot of learning to do.
This is almost certainly due to the speed of one or more of your PWM fans dropping below the BMC's lower critical (LCR) threshold. The good news is that you can usually solve this by lowering the lower thresholds so the assert stops happening. The IPMI event log and web GUI should show you which fan(s) are causing the asserts and why, and the output of `ipmitool sensor` should give you a full rundown of the current fan RPMs and what the upper and lower thresholds are currently set to.
 

BennyT

@EffrafaxOfWug you were right

IPMI sees my two 80mm exhaust fans (FAN5 and FAN6) running below a preset threshold...

Now to install SMCIPMITool so I can adjust and save the corrected config.


FAN1 is my CPU1 fan; FAN5/6 are my 80mm exhaust fans; FANA is my fan wall with three 120mm fans.
upload_2019-2-7_13-40-0.png

This shows the state after IPMI briefly ramps up ALL FANS because the two exhaust fans are "lower critical". Two seconds later they settle back to 400 RPM and the alert is raised again.
upload_2019-2-7_13-40-42.png
 

EffrafaxOfWug

If you're using Linux, ipmitool can also set the thresholds from within the OS (and do a lot more useful stuff besides). If you're happy with the thermal performance of your fans at 400 RPM, you can set LNR/LCR/LNC all to 400 RPM like so:

Code:
ipmitool sensor thresh FAN5 lower 400 400 400
ipmitool sensor thresh FAN6 lower 400 400 400
Might need a BMC reset to take effect.
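With more than a couple of fans it can be handy to generate these commands rather than type them. A minimal dry-run sketch in plain sh; `thresh_cmds` is a made-up helper name, and the values are simply the ones from the example above.

```shell
#!/bin/sh
# Dry run: print, but do not execute, the ipmitool commands that would set the
# three lower thresholds (LNR LCR LNC) on each named fan sensor.
# thresh_cmds is a hypothetical helper name.
thresh_cmds() {
    lnr=$1; lcr=$2; lnc=$3
    shift 3
    for fan in "$@"; do
        echo "ipmitool sensor thresh $fan lower $lnr $lcr $lnc"
    done
}

thresh_cmds 400 400 400 FAN5 FAN6

# After reviewing the printed lines, apply them as root with:
#   thresh_cmds 400 400 400 FAN5 FAN6 | sh
# If the new values do not show up afterwards, a BMC cold reset is one thing to try:
#   ipmitool mc reset cold
```

Printing first and piping to sh only after a look keeps a typo in a sensor name from silently doing nothing, or doing the wrong thing.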

If you want to see what your existing threshold is, run ipmitool sensor and you'll get output like this:
Code:
effrafax@wug:~$ ipmitool sensor | grep -i fan
FAN1             | 900.000    | RPM        | ok    | 300.000   | 500.000   | 600.000   | 25300.000 | 25400.000 | 25500.000
FAN2             | 800.000    | RPM        | ok    | 300.000   | 500.000   | 600.000   | 25300.000 | 25400.000 | 25500.000
FAN3             | na         |            | na    | na        | na        | na        | na        | na        | na
FAN4             | 700.000    | RPM        | ok    | 300.000   | 500.000   | 600.000   | 25300.000 | 25400.000 | 25500.000
FANA             | na         |            | na    | na        | na        | na        | na        | na        | na
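Columns from left to right are: sensor name, current reading, unit, status (whether the reading is considered OK or not), then the six thresholds LNR, LCR, LNC, UNC, UCR, UNR (lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable). The upper ones are never set to realistic values on the SM boards I've used, so it's only the lower thresholds you need to try tweaking.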
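Given that column layout, picking out the offending fans can be scripted. A rough sketch, assuming the pipe-separated format shown above; `fans_below_lcr` is a made-up helper name and the sample rows are invented to resemble the fans in this thread, not captured from a real board.

```shell
#!/bin/sh
# List fan sensors whose current reading is below their lower critical (LCR)
# threshold. Reads `ipmitool sensor` output on stdin; fans_below_lcr is a
# hypothetical helper name.
fans_below_lcr() {
    awk -F'|' '
    {
        # Trim the padding ipmitool puts around each column.
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    }
    # Column 2 = reading, 3 = unit, 6 = LCR. Skip absent ("na") sensors.
    $3 == "RPM" && $2 != "na" && $6 != "na" && ($2 + 0) < ($6 + 0) {
        printf "%s: %d RPM is below LCR %d\n", $1, $2, $6
    }'
}

# Made-up sample in the same layout, loosely modelled on the fans in this thread.
sample='FAN1 | 900.000 | RPM | ok | 300.000 | 500.000 | 600.000 | 25300.000 | 25400.000 | 25500.000
FAN5 | 400.000 | RPM | cr | 300.000 | 500.000 | 600.000 | 25300.000 | 25400.000 | 25500.000
FANA | na | | na | na | na | na | na | na | na'

printf '%s\n' "$sample" | fans_below_lcr
# On a live system: ipmitool sensor | fans_below_lcr
```

For the sample this prints a single line for FAN5; fans reporting `na` are skipped rather than treated as zero.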
 