A workstation / home server build: IPMI versus vPro

twojays

New Member
Apr 23, 2021
1
0
1
I'm looking into building a decent home workstation that doubles as a home server, running 24/7, accessible and manageable through VPN. I'm planning to run virtual machines on it, although I'm undecided yet what type, likely a Proxmox hypervisor (type 1). Since I'm regularly abroad and since my place regularly suffers from power cuts, I want to have a machine-level (rather than OS-level) management solution available so that I can restart it remotely when necessary. I first presumed this boiled down to IPMI being a must, but then I came across Intel vPro's AMT. It is not clear to me what this allows for, however, and how exactly it differs from IPMI. For one, I understand it does not use its own Ethernet port; rather, it seems to utilize some pin(s) on ordinary ports. I also understand from Intel's web pages and several other forums that vPro is more business oriented. But what exactly does that mean? It is bugging me that I can't find anything concrete about this solution; most information I come across is marketing material. If vPro would allow me to remotely monitor and (re)start my workstation, then it would have my preference over IPMI simply because it allows for a cheaper and more inclusive workstation-oriented motherboard.

In particular, I'm considering the Xeon W-1290P processor for this build, because it offers sufficient performance, seems to idle at very low power (https://www.servethehome.com/intel-xeon-w-1290p-benchmarks-and-review-a-top-end-sku/3/), has Quick Sync for hardware acceleration (as Intel does in general), and I found a good deal for it. I would also prefer to stick to an mATX form factor, partly because I like the Node 804 case, but also because I do not intend to equip it with multiple expansion cards in the future (perhaps only a single GPU). With its W480 chipset and a desire for IPMI, this does not leave a lot of options: the ASRock Rack W480D4U or the Supermicro X12SCZ-F or X12SCZ-TLN4F, which cost an arm and a leg (especially the latter, for offering 10GbE over 1GbE). By contrast, the GIGABYTE W480M Vision W, for example, does not offer IPMI but supports vPro, comes at less than half the cost and includes many additional peripherals (including 2.5GbE, audio ports, display ports, USB 3.2 and optical S/PDIF). Would its support of vPro serve my purpose of remote management?
 

Whaaat

Active Member
Jan 31, 2020
169
44
28
If you only need to remotely reboot your workstation, vPro is enough; IPMI offers much wider monitoring and management capabilities.
 

Stephan

Active Member
Apr 21, 2017
247
123
43
Germany
Honestly, get an IPMI machine that has Intel i211 and not i219 network chips. Or better yet, something else. Reasons: the Intel i219 kind of network chips that usually go with vPro-enabled machines have serious bugs. If you intend to run Linux, prepare for ethtool workarounds to disable certain hardware offloads. Yep, even in machines released in 2021. The problem is already at least 5 years old. If you fancy some admin gore, google the e1000e driver and "stuck" or "hung". vPro is also badly supported by Intel. Sometimes it works, sometimes not. Keep a vPro client like MeshCommander open when booting? The NIC could be stuck at whatever speed it had, and that could be 10 Mbps (a power saving measure). IPMI will work all of the time, because it is a separate solution. Also, Comet Lake is still plagued by the Meltdown/Spectre slowdown for context switches. Only in the latest 11th gen has Intel fixed this to get back to pre-2016 levels. But power consumption is through the roof with 11th gen Intel.

I say get an ECC capable Ryzen board with IPMI. An ASRock Rack X570D4U-2L2T should work nicely if board (chipset, ethernet) is cooled properly with a 120/140mm fan. ECC UDIMMs. Ryzen 5800X. Look here: https://forums.servethehome.com/ind...0d4u-2l2t-build-for-esxi-and-nas-usage.30993/ for more ideas.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,395
499
83
I'd echo Stephan's advice, but I'm biased because I'm already running a Ryzen server-lite platform with IPMI (X470D4U with a 3700X and ECC) and it's a very nice little package.

I can't speak for the NIC funkiness myself as I'm using either i210s or X710s.

AMT isn't really in the same league as IPMI; IPMI is a full out-of-band management system. AMT was, I think, initially aimed at business desktops to allow things like remote poweron and poweroff for power-saving reasons. Certainly last time I used it, using the remote console via AMT was a complete pain in the arse, the HTMLv5 console in modern IPMI implementations is so much nicer (and personally I much prefer being able to put the IPMI NIC on a dedicated network separate from the main data one). I don't think AMT allows for mounting remote media either.

An old thread from StH asking much the same thing:
 

Whaaat

Here is what Dell's implementation of AMT looks like (port 16992). Should the workstation hang, AMT is enough to reboot it, and once the OS loads correctly you regain full control.

amt.PNG
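
Before relying on that web page from abroad, it may be worth checking over the VPN that the AMT ports answer at all. Below is a minimal Python sketch, not a definitive tool: it only tests TCP reachability of port 16992 (the plain-HTTP port mentioned above) and 16993 (the TLS variant), and says nothing about which AMT feature level the machine has. The function name and the address in the usage note are illustrative assumptions.

```python
import socket

AMT_HTTP_PORT = 16992   # AMT web UI over plain HTTP
AMT_HTTPS_PORT = 16993  # AMT web UI over TLS


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False
```

Calling `port_open("192.0.2.10", AMT_HTTP_PORT)` with the workstation's real address in place of the placeholder tells you whether the management engine is listening, even while the host OS is hung or powered off.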
 

Stephan

Just to add about AMT, Intel has "differentiated" the living daylights out of this feature. They hate their customers and until Ryzen/Epyc came out, the market by necessity let them get away with it, because there was no good alternative.

For starters, there are different versions of AMT, check out the table on bottom: Intel® AMT SDK Implementation and Reference Guide

In addition to correct chipset (e.g. I use Q77, C236), you need the correct BIOS with correct AMT license set by manufacturer. Good luck finding out pre-purchase, when sales material only blabs about "AMT vPro". The few freaks left using AMT I think either buy a sample machine and then the 1000 they planned, or download a BIOS update and then analyse the ME portion with the excellent platomav/MEAnalyzer to find out what AMT they will truly get. This is nuts.

Usually you only get "Standard Manageability" and that is a webserver like shown in the picture by Whaaat, running on an embedded x86 core (used to be ARC) within the Q- or C-chipset PCH. No KVM features! You can inventory the machine and turn it on/off, that's it. Can't mount an ISO etc. Next to useless! Did I say useless, well, exploitable n months after release usually, like by sending it an empty password hash string. NSA knew exactly why they wanted this whole thing off the moment the platform has started up and the main CPU is booting.

When you are abroad and need to fix things, instead of "Standard Manageability" you want "Full". Or a cluster with auto-failover in case a node fails? Going off on a tangent here. Well, I have seen people patch the "Standard Manageability" flag to zero (enabling all features) and hunting down a suitable "RGN" (never booted "clean" stock image) "Corporate" ME flash image for their board. Because if you have "Consumer" or "Slim" (latter only in use by Apple I think), then again, no KVM. Google "Intel Firmware Repositories: (CS)ME, (CS)TXE, (CS)SPS, GSC, PMC, PCHC, PHY & OROM" just in case you want to swap the PMC portion of your ME image for something else (kidding).

Not done yet! You also need a Core i5/7 (no i3!) and also no -K version of those CPUs. I kid you not. So if you are thinking like me "great, Q77 board and that 3550-K from the bin, will use KVM on this", you will get a nice Intel differentiated product surprise. Oh yeah and you need an iGPU CPU with internal Intel graphics.

And if you persisted like a mad man and got all this working, you will be pleasantly surprised when using the shared AMT/host Intel i219-LM 1 Gbps ethernet port, that you will only get ~90 MByte/s instead of 115. DMA bug: Performance with Intel i218/i219 NIC - Hetzner Docs "This fix slighly slows down DMA access times to prevent the NIC to hang up on heavy UDP traffic." In actuality "it depends". Linux 5.10+ kernels appear to have introduced a regression that lets e1000e devices hang even with tso off gso off. No problem on consumer boards, that hang every couple hours will probably go unnoticed. Different matter if you run your Asterisk VOIP and the interface goes away for 5-10 seconds while the kernel does its best to reset the device and bring it back.
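
For reference, the "tso off gso off" workaround mentioned above is normally applied with `ethtool -K`. The following is a small Python sketch that builds and runs that command; the interface name `eno1` used in the note below is an assumption, the helper names are made up for illustration, and as noted above the workaround may not be sufficient on every kernel.

```python
import subprocess


def offload_off_cmd(iface, features=("tso", "gso")):
    """Build the ethtool argv that switches the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface):
    """Apply the workaround; needs root and the ethtool binary installed."""
    subprocess.run(offload_off_cmd(iface), check=True)
```

`offload_off_cmd("eno1")` yields the equivalent of `ethtool -K eno1 tso off gso off`. The setting does not survive a reboot, so it would have to be reapplied from whatever network configuration mechanism the distribution uses.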

Still want AMT?

I say go buy that Ryzen board with ECC and IPMI. Whatever bugs AMD or Asrock might have in their chips and their IPMI, it just can't be worse.
 

jabuzzard

New Member
Mar 22, 2021
20
8
3
Whaaat said: If you only need to remotely reboot your workstation, vPro is enough; IPMI offers much wider monitoring and management capabilities.
Not strictly true. If you have AMT version 11.6+ or 12.0.20+ you can download MeshCommander into the AMT flash and get a KVM and IDE redirection right out the web page. You can even make it out of band if you are willing to burn an ethernet port on the board for that. It makes for a cheap lights out management solution.
 

Whaaat

jabuzzard said: If you have AMT version 11.6+ or 12.0.20+ you can download MeshCommander into the AMT flash and get a KVM and IDE redirection right out the web page
If I understand it correctly, to get KVM functionality you need a processor with an iGPU. Most server Xeons (except E3 maybe) do not have an iGPU, which makes AMT useless for KVM. A BMC, by contrast, provides its own dedicated GPU, so with an IPMI-enabled server motherboard you will have KVM with any processor.
 

RageBone

Active Member
Jul 11, 2017
421
109
43
Yes and no. I haven't yet seen an IPMI solution that can grab the picture off a dedicated GPU, which is probably what a workstation will be using.
This means that the moment your OS boots and displays a picture on your monitor(s), the iKVM from the IPMI shows either a black screen or the last picture(s) it showed before.
My experience is based on a Supermicro X10DRi-F with a GTX 750 Ti. The iKVM kept displaying the screens from before entering the BIOS, with the text "system initializing ., .., ...".

It is a technically solvable problem, but i don't know any available solution for that yet.
 

jabuzzard

Indeed you need to pick your CPU carefully, but you can get full remote functionality; the only thing you sacrifice is ECC RAM. Personally I am upgrading my home server of 11 years from a Supermicro X7SLA, which is still going strong and has even been doing Plex duties for six years now, to a Supermicro X11SCV-Q with an i5-9500T to let me run MeshCommander. If I went with a Xeon I would have to hunt around for a second-hand part of unknown provenance to get something with Quick Sync, or run an Nvidia card, which means a much higher power draw and the complication of the integrated KVM not working.
 

Stephan

If you want ECC (and you should), you need a socket 1151/1151v2/1200 entry-level Xeon with iGPU. ECC UDIMM only, though; RDIMMs only work with the bigger E5 Xeons. Why iGPU? Because Intel ME has no idea how to pull framebuffer contents from a Radeon or GeForce GPU.

Also be aware that most motherboards do not support headless mode for AMT KVM, so you either need a monitor connected or one of those "emulators" in the form of a HDMI or DP plug that simulate a connected display. Otherwise your iGPU will power off and go into RC6 power saving mode once the driver is loaded by the operating system, and you will see no picture over KVM. I've seen such headless support only in Gen7 NUCs by Intel in their "Visual BIOS".

On Supermicro you can set console redirection to AMT (Serial Port Console Redirection -> EMS Console Redirection: Enabled + EMS Console Redirection Settings -> Out-of-Band Mgmt Port: AMT SOL) and in Linux add boot parameter "console=ttyS4,115200" so that systemd will create an agetty console which you can then use to login. Might need that port in /etc/securetty and then root login over serial should work. That would sidestep any KVM issue.
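
A quick way to confirm that the boot parameter actually took effect is to look at the kernel command line of the running system. The Python sketch below parses a command-line string for `console=` entries; the function names are made up for illustration, and `ttyS4` is simply the port from the Supermicro example above and may differ on other boards.

```python
def console_params(cmdline):
    """Return the values of all console= parameters, in command-line order."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("console=")
    ]


def has_serial_console(cmdline, device="ttyS4", baud=115200):
    """True if a console= entry names the device at the given baud rate."""
    wanted = f"{device},{baud}"
    # startswith also accepts suffixes such as "115200n8".
    return any(value.startswith(wanted) for value in console_params(cmdline))
```

On a live system the input would come from `open("/proc/cmdline").read()`.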

I saw someone use a Raspberry Pi to create a poor man's KVM, might be another (better) option than all this AMT hassle.
 

Whaaat

Stephan said: I saw someone use a Raspberry Pi to create a poor man's KVM, might be another (better) option than all this AMT hassle.
As long as OP only wants to be able to remotely restart a poorly behaving machine, even a smart plug with Wi-Fi control (plus a power meter as a bonus) will be enough. And the 'always on after power fail' setting in the BIOS, of course.
 

Stephan

Whaaat said: As long as OP only wants to be able to remotely restart a poorly behaving machine, even a smart plug with Wi-Fi control (plus a power meter as a bonus) will be enough. And the 'always on after power fail' setting in the BIOS, of course.
Beg to disagree. If you are away from home you want KVM in addition to power controls, whatever kind, accessible by VPN. Imagine CMOS has lost its contents because that 20 cent 3V coin cell died way ahead of its time. Now every time on boot you get a "Settings reset to defaults, to continue hit F3" and machine just sits there. So try every key blind?

To round out the measures: I recommend the "errors=panic" mount option, so that when the slightest thing is wrong with a block device, the kernel will panic and reboot. This usually prevents more serious corruption in the fs too, and for sure alerts you that something bigger might be wrong.

Add to sysctl kernel.panic=1 and kernel.panic_on_oops=1 so kernel will also panic and reboot on oopses.
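
To verify that these two sysctls are actually in effect after a reboot, they can be read back from /proc/sys. The Python sketch below reports every key whose current value differs from the wanted one; the helper names are illustrative, and the `root` parameter exists only so the check can be pointed at a different directory.

```python
from pathlib import Path

# Reboot one second after a panic, and treat an oops as a panic.
WANTED = {"kernel.panic": "1", "kernel.panic_on_oops": "1"}


def sysctl_path(key, root="/proc/sys"):
    """Translate a dotted sysctl key into its file below root."""
    return Path(root, *key.split("."))


def mismatches(wanted=WANTED, root="/proc/sys"):
    """Return {key: (current, wanted)} for every setting that differs."""
    diff = {}
    for key, want in wanted.items():
        try:
            current = sysctl_path(key, root).read_text().strip()
        except OSError:
            current = None  # key missing or unreadable
        if current != want:
            diff[key] = (current, want)
    return diff
```

An empty result from `mismatches()` means both settings are live; anything else lists the current value next to the wanted one.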

Finally use wdctl and add to file /etc/systemd/system.conf the lines RuntimeWatchdogSec=1min and WatchdogDevice=/dev/insertdevicehere to point systemd to the watchdog device. On my C236 board I also need this:

$ cat /etc/udev/rules.d/98-watchdog.rules
ACTION=="add", KERNEL=="watchdog*", SUBSYSTEM=="watchdog", ATTR{identity}=="iTCO_wdt", TAG+="systemd", SYMLINK+="watchdog-ich"
ACTION=="add", KERNEL=="watchdog*", SUBSYSTEM=="watchdog", ATTR{identity}=="iamt_wdt", TAG+="systemd", SYMLINK+="watchdog-amt"

because different kernels enumerate watchdog0 watchdog1... in different order. Here I opted for the ICH watchdog.
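
To see which watchdogN currently carries which identity on a given boot, the same `identity` attribute the udev rules match on can be read from sysfs. The Python sketch below assumes the kernel exposes /sys/class/watchdog/watchdogN/identity (which depends on watchdog sysfs support being enabled); the function names are made up for illustration.

```python
from pathlib import Path


def list_watchdogs(root="/sys/class/watchdog"):
    """Map each watchdog device name to its identity string."""
    found = {}
    for dev in sorted(Path(root).glob("watchdog*")):
        ident = dev / "identity"
        if ident.is_file():
            found[dev.name] = ident.read_text().strip()
    return found


def device_for(identity, root="/sys/class/watchdog"):
    """Return the /dev path of the first watchdog with this identity, or None."""
    for name, ident in list_watchdogs(root).items():
        if ident == identity:
            return f"/dev/{name}"
    return None
```

`device_for("iTCO_wdt")` shows which node the ICH watchdog landed on this boot. The udev symlink remains the more robust choice for system.conf, since it does not depend on the enumeration order.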

Of course, as soon as you're away travelling, some other failure mode will put your house offline. :) Cat jumping onto the server case and hitting the power button, shutting the server down (seen it live; good choice of location, Fractal Design and NZXT). Nearby lightning strike frying the VDSL modem frontend chip (seen it). Power outage with the UPS running dry, but nothing comes back up once power is restored because you forgot to tell the UPS to kill power (done that). Dead PSU (ran for 10 years and I ignored its age). And my favorite, Intel microcode bugs! Something in the wake of Meltdown/Spectre came down as an update, but when applied the machine started oops'ing every couple of days. Uh oh - you think PSU, RAM, flakey mainboard. Being the tough nerd that you are, you read up on crashkernel= and how you can let the kernel start a small new Linux instance in case of a panic, which compresses a crash dump from RAM to disk and then reboots. You recompile the kernel with full debug symbols, simulating and bugfixing the process by crashing your system on purpose, multiple times, until it is finally all working great. Because it wouldn't be the first regression that GregKH has backported into the LTS branch. Then you wait. A few days later it happens, but what is going on? Why is the machine just frozen? No crash dump?? Resigned, you hit reset. Thinking through the last couple of updates you made, it dawns on you what might be going on. You downgrade to the JCC bugfix microcode because that one really is a serious bummer, and magically, the bug is gone. You contemplate sending a wtf letter to Intel Haifa's design team but never do it.

I think one of my next projects will be a PCEngines APU4D4 with 4G/5G modem, that runs Asterisk for chan_dongle or something else that can react to SMS, and that will turn on an outbound VPN tunnel once it receives an SMS with a password.
 

Whaaat

Stephan said: Imagine CMOS has lost its contents because that 20 cent 3V coin cell died way ahead of its time. Now every time on boot you get a "Settings reset to defaults, to continue hit F3" and machine just sits there. So try every key blind?
I have never seen a dead battery in a machine that is constantly connected to mains power. As I understand it, the battery is not in use as long as at least standby voltage from the PSU is applied. I have servers that have run constantly since 2008 and none of them has ever asked for a battery replacement. But thanks for the reminder, I'll replace them as a preventive measure.
As you correctly noticed, there is a whole bunch of hardware problems that can be resolved neither by KVM nor even by a random person on site. For instance, I suffered two sudden PSU deaths in ProCurve switches and one in a Cisco router. In this case even KVM via VPN will not help you, because you will no longer have VPN)))
 

jabuzzard

Whaaat said: I have never seen a dead battery in a machine that is constantly connected to mains power. As I understand it, the battery is not in use as long as at least standby voltage from the PSU is applied. I have servers that have run constantly since 2008 and none of them has ever asked for a battery replacement. But thanks for the reminder, I'll replace them as a preventive measure.
You need to run more servers for longer in that case :) Basically the batteries even with no power draw have a limited life span. Eventually the battery goes bad. At this point I would note that the terminal voltage is probably still 3V but it can't actually provide sufficient current so monitoring that is no good. Then for some reason you have a power outage. My favourite being that a PSU goes bad and trips the breaker on the panel. At that point the battery can't keep the CMOS going and when the power comes back it all goes to pot and you have a whole rack of machines to replace the batteries in.

I would note that, in my experience, double PSU failures are more common than one might imagine, having been on the receiving end of several now. Basically the two PSUs in a piece of equipment are likely off the same production line at more or less the same time. The first PSU goes, the second has to immediately take twice the load it had before (or even go from nothing to 100%, depending on how they are configured), decides it can't do that and promptly fails itself.
 

Stephan

Yes, CMOS backup battery is not used when server is always on. It ages and self-discharges though, so that is a classic when you get to a customer to swap out ancient gear. Turn off old server that ran for 10 years to say redo the even older UPS, turn server back on and after 10 years the battery is done and so are the BIOS settings.

Of course you are supposed to employ a stack of switches that run MLAG and will do hitless failover in case the PSU of one goes bad. Or during a firmware upgrade of one. This is STH after all, right? I personally like Arista: https://cheatography.com/sh-arista/cheat-sheets/arista-mlag/pdf/
 

Whaaat

Stephan said: Turn off old server that ran for 10 years to say redo the even older UPS, turn server back on and after 10 years the battery is done and so are the BIOS settings.
Nice trap, huh? Have you found a solution for this classic surprise?
 

Stephan

To avoid surprises, the first measure will always be to virtualize old servers onto new hardware using some sort of imaging technology, without rebooting. For Windows down to NT4 SP6 we used Drive Snapshot by Tom Ehlert in the past; for old Linux boxes, stopping basically everything except sshd and the kernel and ssh'ing off a tar.gz is usually good enough. Anything even older, like that plant nursery still running a C-64 in production that I saw in my feed three days ago, usually means the wrong customer for us (never invests in anything) or a rate hike beyond what they are willing to spend (to cover the various inevitable unexpected 'surprises' that always lurk at such jobs).