Problems booting Kernel >4.10, stuck at boot screen


darkconz

Member
Jun 6, 2013
I am running into a problem on my Intel 4-node server. The server board is the S2600JF, and all 4 nodes in the chassis have the latest firmware available on the Intel site.

All boards are equipped with 32 GB of RAM and 2x E5-2630L CPUs.

I first noticed the issue after updating the Proxmox nodes to 5.1: none of the servers would boot. I have run several tests since then. I installed Proxmox 5.0 and it worked, then reinstalled 5.1 and it failed to boot again.

The last message on the screen when it hangs is "IPMI kcs interface initialized".

Things I have tested:
1. Installed Proxmox 5.0 then upgraded to 5.1 (failed to boot after reboot)
2. Installed Debian then installed Proxmox 5.1 on top of it (failed to boot after reboot)
3. Installed Debian with various kernel versions (4.9.x worked, 4.10.17 worked, 4.13.x failed, 4.14.x failed, 4.15.x failed)

So this is leading me to think there is something I am not seeing here related to the new kernel.

Can somebody please point me in the right direction?

Thanks


Sent from my iPhone using Tapatalk
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
I haven't used Debian in a while, but does it boot with the 'quiet' option by default? If so, remove that from the GRUB command line and see what messages appear during the boot failure. That might tell you a lot more about why it is failing to boot.
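For reference, on a stock Debian grub2 setup it's usually something like this (the variable name and paths assume the default /etc/default/grub layout, so adjust if yours differs):

# one-time test: at the GRUB menu press 'e', find the line starting with "linux",
# delete the word "quiet", then press Ctrl-X (or F10) to boot
# to make it permanent, edit /etc/default/grub so the default options drop "quiet":
#   GRUB_CMDLINE_LINUX_DEFAULT=""
# then regenerate the grub config and reboot:
update-grub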
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
Are you mounting anything, like an iSCSI block device, before the interface it depends on has come up?
I ask because I have a hang issue the other way round. Mine hangs on reboot, from Debian 9 onward; Debian 8 worked perfectly though. So far I haven't found a fix for it, just a script proposed as a possible workaround :)
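If you do have any network-backed mounts, it's worth checking that they're flagged to wait for the network. A typical fstab entry would look something like this (the device path and mount point here are just made-up examples):

# example /etc/fstab line for an iSCSI-backed filesystem; _netdev tells
# systemd/mount to wait for the network before trying to mount it
/dev/disk/by-path/ip-192.0.2.10:3260-iscsi-iqn.2003-01.org.example:target0-lun-0  /mnt/iscsi  ext4  defaults,_netdev  0  2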
 

darkconz

Member
Jun 6, 2013
Nope, this is a fresh install of the OS and kernel with no other additions.

I also removed the 'quiet' flag at boot, but still nothing pops out stating the obvious.


Sent from my iPhone using Tapatalk
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
Nope, this is a fresh install of the OS and kernel with no other additions.

I also removed the 'quiet' flag at boot, but still nothing pops out stating the obvious.


Sent from my iPhone using Tapatalk
Well, you can either post what you see or describe it in more detail than in the OP. Do you know what stage of the boot process you are in when the boot fails? Past GRUB? Past kernel load? Past initramfs? Past the pivot to root? Etc.

If you have a late-stage boot failure, I would recommend booting into single user mode first and seeing if you can at least get that far. If you can, then get yourself to multi-user.target, figure out which service startup is causing the failure, disable it, and see if the system boots.
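On a systemd-based install the usual way to pick the target at boot is from the kernel command line; something like this (press 'e' at the GRUB menu, append one of these to the "linux" line, then boot with Ctrl-X / F10):

systemd.unit=rescue.target      # single user / rescue shell
systemd.unit=emergency.target   # even earlier in the boot, if rescue also hangs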
 

MiniKnight

Well-Known Member
Mar 30, 2012
NYC
I haven't seen this one myself yet. I'm seeing more of the ZFS initialization timing errors lately, where you need to add a boot delay to fix them.
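For what it's worth, in those cases the delay usually goes on the kernel command line; something like the following (rootdelay is the generic kernel/initramfs knob, and the exact option can differ depending on how root-on-ZFS is set up):

# in /etc/default/grub, give the pool devices extra time to appear, then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"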
 

darkconz

Member
Jun 6, 2013
Thanks for the suggestion about going into single user mode. I can get into single user mode, but I am not sure how to get to multi-user.target. Is that done with the boot parameter systemd.unit=multi-user.target?
 

BLinux

cat lover server enthusiast
Jul 7, 2016
artofserver.com
Thanks for the suggestion about going into single user mode. I can get into single user mode, but I am not sure how to get to multi-user.target. Is that done with the boot parameter systemd.unit=multi-user.target?
OK, so that's actually good news. This seems to be a late-stage boot issue: if you can get to single user mode, it means you got past GRUB, past kernel loading, past the initramfs and everything that happens there. Getting into single user mode usually also means you have successfully pivoted to your system root disk.

So you've now narrowed it down quite a bit, to a problem with some service starting. To get to multi-user.target:

systemctl isolate multi-user.target

However, that will probably just put you back into the problem situation you were seeing before. What I might do first, since you have access to the root disk in single user mode, is look through the logs to see if you can find the problem. Every Linux distro logs slightly differently, but most of the time it is in /var/log. Sometimes there is even a log of the last failed boot in /var/log/dmesg.log or something like that; I would look for that, or perhaps boot.log or similar. I would also look at /var/log/messages.
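Concretely, from the single user shell something like this is a reasonable starting point (journalctl only helps if persistent journaling is enabled, and on Debian the syslog file may be /var/log/syslog instead of /var/log/messages):

less /var/log/boot.log        # console output from boot, if your distro writes it
less /var/log/messages        # general syslog (or /var/log/syslog on Debian)
dmesg | tail -n 50            # kernel messages from the current boot
journalctl -b -1 -p err       # errors from the previous boot, if the journal is persistent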

If you find some information about the culprit, feel free to post it and we can take a closer look. The goal at this point is to isolate the problem down to a single service. We can then confirm it by disabling that service and rebooting, to see whether the system boots successfully with the problematic service disabled.
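The usual commands for that step would be along these lines ('suspect.service' is just a placeholder for whatever unit turns out to be the culprit):

systemctl list-units --state=failed   # anything already marked as failed?
systemctl list-jobs                   # jobs stuck waiting during the hang
systemctl disable suspect.service     # keep the suspect from starting at boot
reboot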

Once you isolate it down to a single service, we can modify that service's systemd unit definition to collect diagnostics before it starts, so you can see the state of the system at the moment the service starts. Then re-enable the service and reboot to collect the diagnostic data. From there we can troubleshoot why that specific service is failing to start and why it blocks the boot.
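As a sketch of that last step, with a hypothetical 'suspect.service' and log path, a systemd drop-in with an ExecStartPre line can dump some system state right before the unit starts:

mkdir -p /etc/systemd/system/suspect.service.d
cat > /etc/systemd/system/suspect.service.d/debug.conf <<'EOF'
[Service]
# dump some state to a file just before the service starts
ExecStartPre=/bin/sh -c 'date > /root/suspect-pre.log; ip addr >> /root/suspect-pre.log; systemctl list-jobs >> /root/suspect-pre.log'
EOF
systemctl daemon-reload
systemctl enable suspect.service    # re-enable it for the test boot
reboot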