Intel S2600WTT based server - Nmi activated - system halted

Patrick

Administrator
Staff member
Dec 21, 2010
12,512
5,800
113
Hopefully this helps someone in the future. I was trying to install Ubuntu on an Intel S2600WTT based server, using the LSI SAS2308-IR mezzanine card with a RAID 1 pair of SAS SSDs, when I got a "Nmi activated - system halted" error:
[Screenshot of the error: upload_2015-11-19_9-23-48.png]

I had 16x 16GB DDR4 DIMMs, so I pulled eight and tried using 8x Samsung 16GB DIMMs. That did not fix it. I then tried 8x Micron 16GB DIMMs and that did not fix it. Finally, I tried 8x SK Hynix 16GB DIMMs and that also did not work. At that point I felt pretty good about it not being a memory error, as the results from Google might have suggested.

I then tried updating the BIOS with the newest (October 2015) version for the board, but the system kept going back to "Nmi activated - system halted".

I pulled the battery and let it sit for 15 minutes. It still gave me "Nmi activated - system halted" whenever it got through POST (including the LSI sequence).

Finally, I decided to just start pulling drives. I pulled both Samsung 200GB SAS SSDs and voilà! The system booted the UEFI update package off of the USB key.

Very odd error, but hopefully it is a lesson that someone can benefit from in the future. I know a lot of us are using Intel branded servers. I was very surprised to see this with an Intel mezzanine card in an Intel server.
 

JeffroMart

Member
Jun 27, 2014
61
18
8
45
We've seen this in the past with a lot of higher end PCI and PCIe add-in NICs over the years, such as Silicom, Interface Masters, and Napatech cards, but I've never seen an Intel mezzanine card trip the NMI. That slot obviously uses PCIe as well, so maybe you have a bad mezz card? Were the Samsung drives connected to it or onboard?
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,512
5,800
113
They were connected to the mezzanine card. I think they were 1625 200GB drives. I am going to try another pair before bringing it back to the datacenter.
 

Lost-Benji

Member
Jan 21, 2013
424
23
18
The arse end of the planet
I am feeling Jeffro is on the money with the riser card. Go into the BIOS and look for any settings along the lines of FRB (Fault Resilient Booting) or watchdogs and knock them on the head. Also test with a Windows load, as you may have issues with Ubuntu playing with it, or an OS watchdog-style setting or problem.
RAM is not the issue or it would tell you with a DIMM indication on the board and fault code lights at rear.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,512
5,800
113
Well this is more encouraging:
Code:
# zpool list
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
p3600400gb    372G   276K   372G         -     0%     0%  1.00x  ONLINE  -
rpool         186G  1.13G   185G         -     0%     0%  1.00x  ONLINE  -
xs1715800gb   744G   276K   744G         -     0%     0%  1.00x  ONLINE  -
I just gave up on the SAS expander. The pools are now on 2x Samsung XS1715 800GBs and Intel P3600 400GBs.
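For anyone who wants to check pool health like this from a script rather than by eye, here is a minimal sketch in Python. It only parses text in the column layout shown above; the function names (`parse_zpool_list`, `unhealthy_pools`) are made up for illustration, and it assumes every row has the same number of whitespace-separated fields as the header.

```python
def parse_zpool_list(text):
    """Parse `zpool list` style table text into a list of dicts.

    Assumes the layout shown in the post above: one header row,
    then one whitespace-separated row per pool.
    """
    # Drop blank lines and the pasted shell prompt line ("# zpool list").
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    if not lines:
        return []
    header = lines[0].split()
    pools = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        pools.append(dict(zip(header, fields)))
    return pools


def unhealthy_pools(pools):
    """Return the names of pools whose HEALTH column is not ONLINE."""
    return [p["NAME"] for p in pools if p["HEALTH"] != "ONLINE"]


# Sample input copied from the output posted above.
SAMPLE = """\
# zpool list
NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
p3600400gb    372G   276K   372G         -     0%     0%  1.00x  ONLINE  -
rpool         186G  1.13G   185G         -     0%     0%  1.00x  ONLINE  -
xs1715800gb   744G   276K   744G         -     0%     0%  1.00x  ONLINE  -
"""

if __name__ == "__main__":
    pools = parse_zpool_list(SAMPLE)
    for pool in pools:
        print(pool["NAME"], pool["SIZE"], pool["HEALTH"])
    print("Unhealthy pools:", unhealthy_pools(pools))
```

On a live system the same functions could be fed the captured output of `zpool list` instead of the pasted sample; the column set may differ between ZFS versions, which is why the sketch keys everything off the header row rather than fixed positions.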
 
May 4, 2015
39
16
8
41
I know this is a necro bump, but I ran into the same error message as above. I went into the BIOS and disabled SERR reporting, and the system fired right up after that. This was on an Intel S2600 board. Hopefully this helps someone in the future.
 