HP Blade ESXi Driver/firmware updates

modder man

Active Member
Jan 19, 2015
Wondering if anyone has any great practices for updating firmware/drivers on HP blades? On our last go-round of updates it took us 18 months to get them all updated. At that rate we will always be behind. It looks like the only way to update all firmware is with the SPP, and then drivers still have to be updated after that. Is that an inaccurate statement?
 

markpower28

Active Member
Apr 9, 2013
Update one node at a time, firmware/BIOS first, then drivers. That's why UCS is taking over...
 

modder man

Active Member
Jan 19, 2015
Yeah, we are starting to switch to UCS, though it's a very slow process for us. There are just too many servers to switch them over quickly.
 

markpower28

Active Member
Apr 9, 2013
There are lots of videos from last VMworld; the one thing that's consistent is that UCS is in every data center.
 

modder man

Active Member
Jan 19, 2015
Yeah, we are getting there; I think we have 200 or so UCS blades at this point. Unfortunately it has not been made our new standard yet, so we are also still buying Gen9 blades. That said, we will have some HP in our environment for at least 5-6 years.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
Depending on your environment (assuming you're using C7000 enclosures?) you might be able to get away with using the SPP in tandem with Enclosure Firmware Management; that'll allow you to do the firmware relatively easily if you can afford the boxes being offline. It's a slow process though, and riddled with gotchas - for instance, it'll fail if your update ISO is bigger than 4 GB, so you need to make a custom SPP which... well, goes against half the point of the SPP really.
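Since the 4 GB EFM limit only bites after you've already uploaded the ISO, a tiny pre-flight check saves a wasted cycle. A minimal sketch (the limit value is taken from the observation above; the function name is my own):

```python
import os

# Enclosure Firmware Management reportedly rejects update ISOs over 4 GB,
# so check a (possibly customised) SPP image before bothering to upload it.
EFM_ISO_LIMIT = 4 * 1024**3  # 4 GiB

def iso_fits_efm(path: str) -> bool:
    """Return True if the ISO at `path` is under the assumed EFM size limit."""
    return os.path.getsize(path) < EFM_ISO_LIMIT
```

If the check fails, you're back to building a cut-down custom SPP with only the components your hardware actually needs.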

For our Windows boxes we normally do drivers and firmware in one fell swoop during the maintenance window: run up HPSUM on a workstation, point it at your Windows servers, and it'll go do its scan and be able to apply firmware packages as well as updated drivers and software. Sometimes this is a bit hit-and-miss (e.g. a NIC firmware or driver update would take the machine off the network so HPSUM couldn't continue doing its thing) but we've not had a problem like that for a couple of years now. Then reboot the box to apply the new firmware. It's technically riskier than doing the two separately, yes, but it makes for vastly lower maintenance overhead.

Your HP rep might tell you that OneView is The Way Forward and will make your firmware management issues go away. Short answer: no it won't. Long answer: f**k no it won't and you'll lose sleep, hair and SLAs over it. Avoid at all costs if at all possible.

It's really only our ESX servers we care about here in terms of firmware (windows and linux nodes get fix-on-fail treatment unless there's a security vuln) so we can take them in and out of service relatively easily and HPSUM does an OK job of updating the firmware and agents on them without issue. Now if only HP would learn to start doing QA on their firmware releases we might start seeing some customer satisfaction.

Long story short, our experience is that keeping up with HP's furious release schedule for fixing all their own bugs is a rod for your own back, and you're better off holding off on upgrades that aren't applicable to you and are only feature enhancements (or can be mitigated through DiD). Quicker to read the release notes and decide upgrades aren't worth it than to do HP's QA for them, IMNSHO :) I too used to have upgrade-itis, but five years of dealing with HP firmware has made me exceedingly conservative in this regard.

/cynical and occasionally exasperated HP customer that found this site after investigating the word on the street for commodity whitebox builds
 
Reactions: NME and markpower28

markpower28

Active Member
Apr 9, 2013
For HP, I believe you can set up a management server to push BIOS/firmware. You can update Windows drivers remotely; not sure about other OSes.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
Yeah, you can do remote upgrades for Windows, Linux and VMware, for both the drivers and software as well as the firmware components. We found it much, much easier to have special management workstations running the SPPs and manually pushing updates to groups during maintenance windows (as we'd need someone online to do testing after the box was rebooted anyway) than to use any of HP's attempts at centralised maintenance and have them run amok. Our estate runs to a few hundred blades across thirty or so enclosures; not sure about other teams.

It helped that we had buy-in from our management (going from prior experience with HP support) that "upgrade the firmware!" (and the ensuing problems that caused) would not be an automatic step in resolving issues unless HP could point to the release notes in question that claimed the issue was fixed and it was indeed the same one we were experiencing.
 

TuxDude

Well-Known Member
Sep 17, 2011
My typical process is to manually put the blades into maintenance mode in vCenter first, then use VMware Update Manager to install any VMware updates as well as HP drivers/agents/etc. When that is complete and the blade is still in maintenance mode, I reboot it into the SPP and let that bring the firmware up to date. When that's done, I reboot back into ESX, exit maintenance mode, and move on to the next one. The vast majority of the time is spent waiting for the SPP, which makes it very easy to work on a bunch of blades in parallel - around here that usually means one node from each cluster in maintenance mode simultaneously, but I only have 30-ish blades to do; at your larger scale you can probably have significantly more blades offline at once while keeping enough resources online for production. I don't have a problem getting 10 blades fully up to date with drivers/firmware during a regular 8-hour day - being able to do more in parallel, a single guy should be able to have all 300 of your blades done in under a month, easy.
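The loop above (one node per cluster at a time, round after round) can be sketched roughly as follows. The step strings are placeholders of my own invention, not real commands - in practice each would be a PowerCLI or vSphere API call - and the point being illustrated is the interleaving, which keeps only one node per cluster in maintenance mode at any moment:

```python
def rolling_update_plan(hosts_by_cluster):
    """Yield ordered update steps, taking one host per cluster per round."""
    rounds = max(len(hosts) for hosts in hosts_by_cluster.values())
    for i in range(rounds):
        # Round i: the i-th host of every cluster that still has one left.
        batch = [hosts[i] for hosts in hosts_by_cluster.values() if i < len(hosts)]
        for host in batch:
            yield f"enter-maintenance-mode {host}"
            yield f"vum-remediate {host}"        # VMware patches + HP drivers/agents
            yield f"reboot-to-spp {host}"        # firmware pass via the SPP ISO
            yield f"exit-maintenance-mode {host}"
```

With 30 blades in, say, six clusters, that's six blades in flight per round, and most of the wall-clock time is the SPP runs happening in parallel.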
 

modder man

Active Member
Jan 19, 2015
The architect of our environment did not set up host-level redundancy, as in we cannot just pull a host out and migrate VMs to another. Each VM has a dedicated host and can only live on that host, so obviously if the host goes down, so does the VM. We provided redundancy at the VM level instead: if a client needs 4 VMs, they are allocated 8, spread across different hosts. So even though the loss of one VM does not take the client down, we are still not allowed to take hosts down without a large amount of planning. The process is very broken in my mind. It would be much easier with the use of vMotion.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
That sounds so very broken to me that I think it warrants a

o_0

on its own line. It's throwing away one of vmware's biggest advantages (namely, no reliance on a single bit of hardware) for a start, and replacing it with AFAICT application-level complexity... ugh. Just ugh. I'm sure there were reasons for it but part of me thinks that "suffering from a serious head wound" isn't a very valid business case... ;)

We're not allowed to take any of our production ESX nodes down during business hours either, and most of our clusters are small enough that we can't take down more than two nodes at a time without violating resource constraints, but as TuxDude points out, you've still got a lot of options for parallelisation whichever way you slice it.
 

TuxDude

Well-Known Member
Sep 17, 2011
I have to agree - if you aren't using VMware to turn a collection of physical boxes into a single pool of resources where no individual host matters, then how are you justifying the license cost? VMware isn't cheap (even for us with academic pricing, where MS is practically free), and resource-pooling/vMotion/storage-vMotion are the game-changing features brought to the table that make it worth it.

I do have to have a change-request approved ahead of time including the schedule of when I plan to work on the hosts - but with that I am allowed to work on production hosts during business hours (one node per farm). It's just balancing risk vs cost - the change-request gives business people a chance to say "no you can't work on things next week, finance is doing year-end budgeting or whatever" so risk to important processes is minimized, and doing it during business hours means no overtime, so costs are minimized as well. I don't think I've had to work evenings/weekends for a vmware-related upgrade in probably 5 years, though major SAN upgrades are still done after hours.
 

modder man

Active Member
Jan 19, 2015
That's what I wonder. It would seem that the biggest features of VMware are just flat-out not being used. I know it is partially to save on Citrix's licensing costs. That said, I really have no exposure to that part of our environment; I am only responsible for maintaining hosts at the host level.
 

NME

New Member
Apr 7, 2011
@EffrafaxOfWug: Did you try HP OneView 2.0? Is it worth trying?
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
I think the last version we allowed anywhere near our servers was 1.20. Our experiences with that were enough to convince us that the methodology behind it was fundamentally broken. By all means try it if you've got the spare coin (it's very definitely not free if you want to use it as anything other than a not-that-great flashy dashboard, and SIM is orders of magnitude more useful), but the advanced management features (most explicitly the VCEM-alike stuff) plain didn't work, the API reference was a) incomplete and b) frequently wrong, and you even lose the ability to manage your blades through the OAs. This all-or-nothing approach, combined with several enclosure-breaking bugs (to the extent we had to erase several OAs and redeploy the OneView appliance from scratch multiple times), earned it a coveted "Until Hell Freezes Over" medal.

We never got as far as attempting to manage firmware with it, though from what modder man says it wouldn't help him any anyway, given that his problem seems to be the way their nodes are architected.

Back on topic - modder man, from the sounds of it, and given that you're not directly exposed to the environment, it sounds like your company may well have cheaped out and gone for the "free" ESX hypervisor-only route without any of the advanced features from virtual centre. You're still stuck between Iraq and a hard place for getting the hosts bounced given your rather ridiculous constraints, but yeah, you'll at least be able to install things on the fly... though I think doing so would be meaningless without being able to reboot straight afterwards to verify that the new drivers/firmware are loaded and actually work.
 

modder man

Active Member
Jan 19, 2015
We definitely do not have free ESXi. We have 27 vCenters in our datacenter, due to host-count limitations. As you said, it is just a strange architecture, with the way they chose to deploy Citrix application servers on top of ESXi. Since the Citrix app servers are redundant, they felt no need to make the hosts redundant as well. While I get that from a prod standpoint, it makes maintenance a giant pain in the butt.
 

markpower28

Active Member
Apr 9, 2013
PVS with XenApp? It is true HA is handled at the app/OS layer, and it is true local storage performance is better than SAN for the write cache. But a company that has 27 vCenters should be able to hire a few entry-level admins to handle hardware :)
 

modder man

Active Member
Jan 19, 2015
That was the way it was handled last go-round. We had 10 entry-level guys, and it still took 18 months. I was just making sure that I wasn't missing any other obvious options. We have a plink script that I was using to install individual CPs. It works well and keeps the downtime lower; the problem is that I can't find CPs for everything, e.g. NIC firmware.
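For anyone curious what the plink approach looks like: the usual trick is to drive `esxcli software vib install` over SSH with an offline-bundle zip. A minimal sketch that just builds the command line (hostname, user and depot path here are placeholders, not values from this thread; real use would also want error handling and a maintenance-mode check first):

```python
def plink_cp_install(host: str, user: str, bundle: str) -> list[str]:
    """Return the plink argv that installs the offline bundle `bundle`
    on ESXi host `host` over SSH."""
    # esxcli's -d flag takes a depot (offline-bundle zip) path on the host.
    remote_cmd = f"esxcli software vib install -d {bundle}"
    # -batch disables interactive prompts so the script can't hang on them.
    return ["plink", "-batch", "-ssh", f"{user}@{host}", remote_cmd]
```

Feed the argv list to `subprocess.run` per host; the limitation modder man notes still applies, since components like NIC firmware often aren't shipped as individual CPs at all.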
 

markpower28

Active Member
Apr 9, 2013
With the amount of HP investment you have, have you explored your options with your HP rep yet?
 

modder man

Active Member
Jan 19, 2015
I personally have not, as we have an architect here who works with HP. According to him, we have been pushing them for years to fix the lack of functionality at this kind of scale in their firmware update tools. Like you, I find it odd that HP would not be a bit more inclined to fix this, as they are about to lose a very large client.