Unstable SYS-2027R-WRF running ESXi: what should I test?


azev

Well-Known Member
Jan 18, 2013
I bought two of these servers off fleabay, but I've been having the hardest time with them recently.
Before sending them to the colo, I ran Memtest for a few days and it passed with no errors.
For some reason, soon after it arrived at the colo, the server started rebooting at random times.
I can't find any usable logs, or maybe I just don't know which keywords to look for.

The hardware is set up as follows:

- Dual E5-2679
- 128 GB DDR3 (8x 16 GB)
- 2x Emulex OCe11102 10GbE CNA
- H310
- 2x 240 GB Vertex 4 SSD

I tried both ESXi 6 and ESXi 5.5 with the same result, although ESXi 5.5 usually lasts longer before it randomly reboots.

I tried running Prime95 using UBD, and it would run fine until it ran out of memory.

Right now I am installing Windows Server 2012 and plan to run Prime95 on it for a few days.

I have owned numerous Supermicro servers in the past and never had one with a problem like this.
Does anyone have suggestions for other tests I can run to validate the stability of the hardware?

Thanks
 

Blinky 42

Active Member
Aug 6, 2015
PA, USA
Do you have any sense of whether it is basically idle or busy when it reboots? What is the approximate uptime between reboots?
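One way to answer the uptime question is to collect each boot timestamp (for example from the host's logs or the IPMI event log) and compute the gaps between them. A minimal Python sketch, assuming the timestamps have already been extracted in ISO 8601 form; the sample values below are made up for illustration:

```python
from datetime import datetime, timedelta
from statistics import median


def uptime_intervals(boot_times: list[str]) -> list[timedelta]:
    """Return the time between consecutive boots, given ISO 8601 timestamps."""
    parsed = sorted(datetime.fromisoformat(t) for t in boot_times)
    # Pair each boot with the next one; the difference is the uptime in between.
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    # Hypothetical boot timestamps, not real data from this thread.
    boots = ["2015-09-01T02:10:00", "2015-09-01T14:40:00", "2015-09-03T09:05:00"]
    gaps = uptime_intervals(boots)
    for gap in gaps:
        print("uptime:", gap)
    print("median uptime:", median(gaps))
```

Very regular gaps tend to point at something scheduled or load-related, while widely scattered gaps (like the 15 minutes to 20 hours mentioned below) are harder to pin on any single cause.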

Can you read the temperature sensors etc. from IPMI and see if anything is out of spec?
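To check the sensors without eyeballing every line, here is a minimal Python sketch that scans text in the pipe-separated layout `ipmitool sensor` typically prints. It assumes the column order name | reading | units | status | LNR | LCR | LNC | UNC | UCR | UNR (verify against your ipmitool version), and the sample output is hypothetical:

```python
from typing import Optional


def to_float(value: str) -> Optional[float]:
    """Convert an ipmitool field to float; ipmitool prints "na" when a value is missing."""
    try:
        return float(value)
    except ValueError:
        return None


def out_of_spec(sensor_output: str) -> list[str]:
    """Return names of threshold sensors that look out of spec.

    Assumed column order (as printed by `ipmitool sensor`):
    name | reading | units | status | LNR | LCR | LNC | UNC | UCR | UNR
    """
    flagged = []
    for line in sensor_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        name, reading, unit, status = fields[:4]
        if unit == "discrete":
            continue  # discrete sensors report a status bitmask, not thresholds
        value = to_float(reading)
        upper_non_critical = to_float(fields[7])
        if status not in ("ok", "na"):
            flagged.append(name)  # ipmitool already reports a threshold crossing
        elif value is not None and upper_non_critical is not None and value >= upper_non_critical:
            flagged.append(name)  # at or above the upper non-critical threshold
    return flagged


# Hypothetical sample output for illustration only, not from the servers in this thread.
SAMPLE = """\
CPU1 Temp   | 45.000 | degrees C | ok     | 0.000 | 0.000 | 0.000 | 80.000 | 85.000 | 90.000
CPU2 Temp   | 82.000 | degrees C | nc     | 0.000 | 0.000 | 0.000 | 80.000 | 85.000 | 90.000
FAN1        | na     | RPM       | na     | na    | na    | na    | na     | na     | na
PS1 Status  | 0x1    | discrete  | 0x0100 | na    | na    | na    | na     | na     | na
"""

if __name__ == "__main__":
    for sensor in out_of_spec(SAMPLE):
        print("check:", sensor)
```

Against a live BMC you could pass it the stdout of `ipmitool sensor` (or `ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sensor` for a remote one), captured with `subprocess.run(..., capture_output=True, text=True)`.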

I also like to run Breakin from Advanced Clustering Technologies as part of my normal testing routine.

I had an odd situation with a brand new X10 board recently where it would lock up if left running the OS but basically idle, at truly random intervals anywhere between 15 minutes and 20 hours. Since it was new, I ended up RMA'ing it after wasting the better part of a week trying to figure out what was going on. The replacement board didn't show any issues, so I'm guessing it was a fluke.
 

azev

Well-Known Member
Jan 18, 2013
Hi Blinky, thanks for Breakin, that is one kewl ISO to add to my arsenal of tools :)
Anyway, I tried running the test, and so far it's been running for almost 24 hours with no issues.
Initially I thought the mobo could be bad, but then I tried running ESXi on the 2nd server with exactly the same result.
The 2nd server has already been running Prime95 on Windows Server 2012 for over 24 hours, and it has been stable.
I have opened a case with Supermicro, but unfortunately so far they have only pointed me to a VMware KB article.
Right now I am planning to reinstall ESXi 6 on both servers and try again, hopefully getting a log file that is usable for determining the cause of the reboots.
 

azev

Well-Known Member
Jan 18, 2013
Just a quick update: the issue was apparently caused by the ESXi power management setting. The default was Balanced, and since I changed it to High Performance, the servers have been up for a few weeks now.
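If you have several hosts to fix, the same change can in principle be scripted instead of clicked through in the vSphere Client. A minimal sketch, assuming VMware's Python SDK (pyVmomi, not mentioned in this thread) and an already-retrieved `vim.HostSystem` object. In the vSphere API, `HostPowerSystem.ConfigurePowerPolicy` takes a policy key, and High Performance is normally listed with `shortName` `"static"` (Balanced is `"dynamic"`); check both against the API reference for your ESXi version before relying on them:

```python
def find_policy_key(available_policies, short_name: str = "static") -> int:
    """Return the key of the power policy whose shortName matches.

    Assumes each entry looks like a vSphere HostPowerPolicy object
    (attributes: key, name, shortName).
    """
    for policy in available_policies:
        if policy.shortName == short_name:
            return policy.key
    raise ValueError(f"no power policy with shortName {short_name!r}")


def set_high_performance(host) -> None:
    """Switch an ESXi host to the High Performance ("static") policy.

    `host` is assumed to be a pyVmomi vim.HostSystem; connecting to the
    host or vCenter (e.g. with pyVim.connect.SmartConnect) is left out.
    """
    power_system = host.configManager.powerSystem
    key = find_policy_key(power_system.capability.availablePolicy, "static")
    # Only reconfigure if the host is not already on the desired policy.
    if power_system.info.currentPolicy.key != key:
        power_system.ConfigurePowerPolicy(key=key)
```

Looking the key up by `shortName` rather than hard-coding a number avoids depending on how a given host numbers its policies.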