HP H220 Fine under Ubuntu 16.04 LTS, but causes corruption under Latest Proxmox?


mattlach

Active Member
Aug 1, 2014
328
91
28
Hey all,

I have been using a HP H220 controller flashed to IT mode in one of my HP servers running Ubuntu 16.04 LTS for some time.

One of the ports had a single SAS cable going to the backplane (which has a built-in SAS expander), and the other was connected to a SATA breakout cable for my boot drives (two Samsung SSDs mirrored in ZFS).

I recently retired this server, and when I did, I decided to move the H220 controller over to my other server. As soon as I installed and connected it, ZFS freaked out, found lots of errors on the boot SSDs (the same kind of mirrored SSD boot config as on the old server), and wanted to resilver.

I panicked, shut down, and then:
- Checked all cable connections: no issues found, and it did not solve the problem.
- Replaced the SAS breakout cable in case it was damaged: this did not solve the problem either.

Then I moved the SATA drives to another known-good machine to check and repair the ZFS mirror. Luckily, no data was lost.

After this, I gave up and plugged the drives into the onboard SATA ports on the server, where they work just fine without any errors reported.

The retired server, where the H220 worked just fine, was an old HP DL180 G6 running Ubuntu 16.04 LTS.

The new server, where the H220 caused corruption issues, is a custom build around a Supermicro X9DRI-F running Proxmox VE 6 (Debian-based).


Does anyone have any idea why it would work perfectly in one server but cause corruption in another? If possible, I'd like to use this card, as it is PCIe Gen 3 and offers more bandwidth than the old SAS2008 controllers I am using right now, but I don't trust it after that last issue.

Could this be a driver/firmware "phase" mismatch?

How do I check which driver phase each has to make sure they match?

Much obliged,
Matt
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,672
1,081
113
artofserver.com
you probably need to share some more info. i don't know specifically what you mean by "ZFS freaked out"... that could mean a lot of things, from read errors, write errors, checksum errors, or that you saw visions of zombies with the letters "Z" "F" "S" on their foreheads.

would also be helpful if you show the kernel logs for the driver. something like "dmesg |grep mpt" should show you. would also be helpful if you can get the scsi and ata driver messages.
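
As a rough sketch of the log collection suggested above: the helper below keeps only kernel log lines that mention the SAS HBA driver or the SCSI/ATA layers. It assumes the card is handled by the mpt2sas or mpt3sas module, which is the usual case for LSI-based HBAs but has not been confirmed for this system. The function name `hba_log_filter` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): read kernel log lines
# on stdin and keep only those that mention the mpt2sas/mpt3sas driver,
# the SCSI layer, an ATA port (ata1, ata2, ...) or a disk such as [sda].
hba_log_filter() {
    grep -iE 'mpt[23]sas|scsi|ata[0-9]+|\[sd[a-z]+\]'
}

# Typical use (reading dmesg may require root):
#   dmesg | hba_log_filter
```

If mpt3sas turns out to be the module in use, `modinfo -F version mpt3sas` should print the driver version, which can then be compared against the firmware version the driver reports in the filtered log output.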
 

mattlach

BLinux said:
you probably need to share some more info. i don't know specifically what you mean by "ZFS freaked out"... that could mean a lot of things, from read errors, write errors, checksum errors, or that you saw visions of zombies with the letters "Z" "F" "S" on their foreheads.

By "ZFS freaked out" I mean I got a ton of checksum errors. At first there were hundreds on one of the drives, but none on the other drive in the mirror, which caused ZFS to drop the drive. I then shut down, checked all the cable connections, and replaced the SAS cable just in case. I cleared the errors and tried a resilver, and got a small number of errors (11?) on the other drive too during the resilver process. I then interrupted it, removed the drives, and popped them into another server using SATA. A resilver found absolutely nothing wrong with them in that server.

I then moved them back to the affected server, and just plugged them into onboard SATA instead, without any issues.
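
For anyone trying to capture the same kind of per-drive checksum counts, here is a minimal sketch that pulls the devices with a non-zero CKSUM column out of `zpool status` output. It assumes the usual `NAME STATE READ WRITE CKSUM` table layout, and the function name `cksum_errors` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): read `zpool status`
# output on stdin and print "<device> <count>" for every row of the config
# table whose CKSUM column (5th field) is not "0".
cksum_errors() {
    awk '
        # The table starts right after the NAME ... CKSUM header line.
        $1 == "NAME" && $NF == "CKSUM" { in_cfg = 1; next }
        # A blank line ends the table.
        in_cfg && NF == 0 { in_cfg = 0 }
        # String comparison, so abbreviated counts are also treated as non-zero.
        in_cfg && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Typical use:
#   zpool status | cksum_errors
```

Saving that output before clearing the errors makes it easier to tell later whether the errors follow a particular drive, cable or controller port.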

BLinux said:
would also be helpful if you show the kernel logs for the driver. something like "dmesg |grep mpt" should show you. would also be helpful if you can get the scsi and ata driver messages.

Hmm. These are probably long gone from the dmesg history. I have a few LXC containers running PHP, which absolutely spam me with AppArmor error messages every time they run the PHP sessionclear script (every 30 minutes per guest). This is apparently a known bug in Ubuntu, which I just have to deal with for now.

Actually, it wouldn't be in dmesg anymore anyway, as I have rebooted several times since I removed the SAS card.
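
That said, kernel messages from earlier boots may still be on disk. A minimal sketch, assuming the system writes kernel logs to `kern.log` files under `/var/log` and rotates them with gzip (a common Debian-style setup, not verified for this server); the function name `search_old_kern_logs` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): search the current and
# rotated kernel logs in a directory (default: /var/log) for mpt2sas/mpt3sas
# driver messages.
search_old_kern_logs() {
    log_dir="${1:-/var/log}"
    # zgrep reads both plain and gzip-compressed files; -h drops file names.
    zgrep -hiE 'mpt[23]sas' "$log_dir"/kern.log* 2>/dev/null
}

# Typical use (reading /var/log may require root):
#   search_old_kern_logs
```

On systems with a persistent systemd journal, `journalctl -k -b -1` shows the kernel messages from the previous boot instead.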

I guess I could reinstall it and attach some drives I don't care about to see if I can provoke the issue into returning.