Can't migrate VMs in clustered Proxmox


JC Connell

Member
Apr 17, 2016
I added an old laptop as a second node to my existing Proxmox homelab. The goal was to experiment and learn, and I wanted to be able to take advantage of live migration. I can't migrate VMs or transfer them via backups at all, and I'm not sure why. Looking for some help.

The first node already had Proxmox installed with a handful of VMs. I installed Proxmox on the second node, created a cluster on the first node and added the second node to the cluster. This was easy.

On the first node, there is a single root drive (local). There is also a ZFS VM pool with 2 SSDs in a RAID 0 configuration (r0ssd400gb).

On the second node, there is a ZFS pool with 2 SSDs in RAID 0 (local). I've also configured a ZFS directory as r0ssd500gb.

I would like the local storage (root) on Node 1 to be used only for the OS, backups, templates, etc. I would like the SSD arrays on Nodes 1 and 2 to be used only for VMs. In some cases it seems I can transfer VMs from the SSD array on Node 2 to the root drive on Node 1. I can also transfer VMs created on Node 2 to Node 1, but I cannot transfer VMs created on Node 1 to Node 2. I've also noticed that r0ssd400gb is listed as not active on the second node.
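For reference, here is roughly how I've been checking the cluster-wide storage definitions and their per-node status (just a sketch with standard Proxmox commands, output omitted):
Code:
# Storage definitions live in one cluster-wide file; this shows which storages
# every node is expected to provide:
cat /etc/pve/storage.cfg
# Per-node view; a ZFS pool that only exists on Node 1 shows as inactive on Node 2:
pvesm status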


LXC From Node 1 to Node 2:
Code:
Aug 16 21:47:16 starting migration of CT 252 to node 'pve2' (10.0.1.15)
Aug 16 21:47:16 found local volume 'r0ssd400gb-zfs:subvol-252-disk-1' (in current VM config)
send from @ to r0ssd400gb/subvol-252-disk-1@__migration__ estimated size is 1.07G
total estimated size is 1.07G
TIME        SENT  SNAPSHOT
cannot open 'r0ssd400gb/subvol-252-disk-1': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
warning: cannot send 'r0ssd400gb/subvol-252-disk-1@__migration__': Broken pipe
Aug 16 21:47:16 ERROR: command 'set -o pipefail && zfs send -Rpv r0ssd400gb/subvol-252-disk-1@__migration__ | ssh root@10.0.1.15 zfs recv r0ssd400gb/subvol-252-disk-1' failed: exit code 1
Aug 16 21:47:16 aborting phase 1 - cleanup resources
Aug 16 21:47:16 ERROR: found stale volume copy 'r0ssd400gb-zfs:subvol-252-disk-1' on node 'pve2'
Aug 16 21:47:16 start final cleanup
Aug 16 21:47:16 ERROR: migration aborted (duration 00:00:00): command 'set -o pipefail && zfs send -Rpv r0ssd400gb/subvol-252-disk-1@__migration__ | ssh root@10.0.1.15 zfs recv r0ssd400gb/subvol-252-disk-1' failed: exit code 1
TASK ERROR: migration aborted
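The failing step seems to be the zfs recv on pve2: "dataset does not exist" presumably means there is no pool named r0ssd400gb on the second node for the stream to land in. Something like this should confirm it (just a sketch; the IP is the one from the log above):
Code:
# On the target node, list the pools and datasets that actually exist there:
ssh root@10.0.1.15 zpool list
ssh root@10.0.1.15 zfs list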

Here are some photos (Imgur album).
 
I'm guessing your issue is with the error message "found stale volume copy" on node "pve2". Can you list the zvols on node 2?
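Something along these lines should show them (a sketch; run it on node 2):
Code:
# List pools and all datasets/zvols present on node 2:
zpool list
zfs list -t all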

 

JC Connell

Member
Apr 17, 2016
There is a picture of the zvols in the Imgur album above. I've noticed that the r0ssd400gb zvol from Node 1 is listed as not active on Node 2.
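From what I can tell, that "not active" status is expected when a storage definition is cluster-wide but the pool only exists on Node 1. Restricting the storage entry to the node that actually has the pool should at least make the status accurate (a sketch; the storage ID r0ssd400gb-zfs is taken from the migration log):
Code:
# Limit the r0ssd400gb-zfs storage entry to the node that really has the pool:
pvesm set r0ssd400gb-zfs --nodes pve1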
 

kroem

Active Member
Aug 16, 2014
Maybe I'm totally wrong, but I don't think you can transfer from ZFS (node1) to ZFS (node2). You (or at least I) need to transfer ZFS (node1) > NFS > ZFS (node2).

That's my experience... I'd love to be wrong though :)
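If you go the NFS-hop route, the disk moves can be done from the CLI; a rough sketch (the VM ID, disk name and storage IDs here are made up for illustration):
Code:
# Move the disk onto an NFS storage both nodes can see:
qm move_disk 100 scsi0 nfs-store
# Migrate the VM to the other node:
qm migrate 100 pve2
# Move the disk back onto the target node's local ZFS storage:
qm move_disk 100 scsi0 r0ssd400gb-zfs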
 

Patrick

Administrator
Staff member
Dec 21, 2010
I have not set this up on Proxmox for some time so they may have changed something. From what I remember you used to not be able to live migrate from local ZFS storage on machine 1 to local ZFS storage on machine 2 and therefore you had to use shared storage (e.g. a FreeNAS machine or using Proxmox GlusterFS/ Ceph) to be able to live migrate.

I think qemu can do live migration of the machine including storage but I thought Proxmox had only implemented moving the machine part. That is why live migration works with shared storage but not local ZFS.

Again, that may be outdated since the last Proxmox cluster I built to use this was around a year ago, and my brain cell that contains this information has aged. I would love to be wrong on this one.
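For what it's worth, the distinction shows up in the migrate command itself; a rough sketch (the VM ID and node name are illustrative):
Code:
# Live migration moves the running machine state only, so with local ZFS it
# expects the disks to already be on shared storage:
qm migrate 100 pve2 --online
# Offline migration can also copy local disks (for ZFS, via the zfs send/recv
# pipeline the log above attempts):
qm migrate 100 pve2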
 

rubylaser

Active Member
Jan 4, 2013
Michigan, USA
Yes, this is still how it functions. You need some form of shared storage to live migrate, or you shut down the VM and do an offline migration directly.
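For the offline route, a sketch of the commands (the CT ID is the one from the log above; the VM ID is illustrative):
Code:
# Containers migrate offline only, so stop first:
pct shutdown 252
pct migrate 252 pve2
# A stopped VM can be migrated the same way:
qm migrate 101 pve2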
 

JC Connell

Member
Apr 17, 2016
A Proxmox dev was kind enough to send me some help on their forums. I was able to achieve the desired result by naming both pools the same thing. The names aren't as descriptive of the pools now (descriptive names being a trick I picked up from a tutorial here at STH), but it works. Live migration of LXC is not possible, but live migration of VMs and offline migration of both are possible.

Host 1:
- Root (Proxmox, backups)
- RAID 0 of Toshiba SSDs = 400GB

Host 2:
- Root (Proxmox, backups)
- RAID 0 of Samsung/Crucial SSDs = 500GB

Both RAID 0 arrays are named "r0ssd400gb". All drives are ZFS.
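In case it helps anyone else, the workaround boils down to the pool on each node having the same name plus a single cluster-wide storage entry pointing at it; a rough sketch (the device paths and storage ID are placeholders):
Code:
# On host 2: create the striped (RAID 0) pool under the same name as host 1's pool:
zpool create r0ssd400gb /dev/disk/by-id/ssd1 /dev/disk/by-id/ssd2
# Define the ZFS storage once; it applies cluster-wide and resolves to the local
# pool of that name on whichever node the guest runs on:
pvesm add zfspool r0ssd400gb-zfs --pool r0ssd400gb --content images,rootdir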

I don't have the hardware yet to run a machine for shared storage but it's something I'd like to do very soon.
 

Patrick

Administrator
Staff member
Dec 21, 2010
@JC Connell great workaround. I use descriptive names just to make it easy to manage in larger clusters. Maybe that is a bad habit.
 

RobstarUSA

Active Member
Sep 15, 2016
You can use "non-shared" storage with an active/active DRBD setup with LVM on top. I did this for quite some time before switching back to a single server. This way you can use local disks with DRBD on each node, and you have to mark the LVM storage as shared. When a VM starts on one node, the virtual disk state (at the LVM level) changes to "o" for the logical volume, meaning it's open. On the other node, that state will replicate and it will know the first node has the volume open.

I did this a couple of Proxmox VE versions ago and it worked flawlessly. This was with DRBD 8, BTW. The only downside is that you need 2x as much storage to replicate everything between the two nodes. I've never tried this with more than two nodes or with a DRBD version > 8.
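The Proxmox side of it is mostly just the shared flag on the LVM storage; a rough sketch of the /etc/pve/storage.cfg entry (the storage ID and VG name are made up):
Code:
lvm: drbd-lvm
        vgname drbdvg
        shared 1
        content images
The open state shows up in the lv_attr column of lvs (the 6th character goes to "o" while a node has the LV open), e.g. lvs -o lv_name,lv_attr drbdvg.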
 