Proxmox - Disappearing Datasets


gb00s

As the Proxmox forum isn't helping with my current ZFS issue, I'm asking here in the hope of being pointed in the right direction.

I updated/upgraded the system and rebooted, which was followed by the mysterious 'loss' of datasets. The datasets are not really gone, as I can still list them with 'zfs list':
root@pve2:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
storage                    129G  7.78T  29.3G  /storage
storage/backups             24K  7.78T    24K  /storage/backups
storage/iso                 24K  7.78T    24K  /storage/iso
storage/vm                 100G  7.78T    24K  /storage/vm
storage/vm/vm-100-disk-0   100G  7.86T  9.72G  -
Checking what keeps the datasets from mounting at reboot showed me that zfs-mount.service is no longer working:
root@pve2:/var/log/apt# systemctl status zfs-mount.service
● zfs-mount.service - Mount ZFS filesystems
Loaded: loaded (/lib/systemd/system/zfs-mount.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-07 21:40:43 CEST; 13min ago
Docs: man:zfs(8)
Process: 1629 ExecStart=/sbin/zfs mount -a (code=exited, status=1/FAILURE)
Main PID: 1629 (code=exited, status=1/FAILURE)
Aug 07 21:40:43 pve2 systemd[1]: Starting Mount ZFS filesystems...
Aug 07 21:40:43 pve2 zfs[1629]: cannot mount '/storage': directory is not empty
Aug 07 21:40:43 pve2 systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 21:40:43 pve2 systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
Aug 07 21:40:43 pve2 systemd[1]: Failed to start Mount ZFS filesystems.
As this is a backup server only, a quick fix is to run 'zfs mount -O -a', put it into /etc/crontab, and run it at @reboot. All the datasets come back. But that's not OK. As I'm not a ZFS professional, I could not find a proper solution. I'm also a bit confused about why ZFS doesn't act like 'old school Linux', where everything gets mounted if configured correctly, no matter whether a directory is empty or not.
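For reference, the stop-gap crontab entry would look roughly like this - a sketch only, assuming the system crontab format (with a user field) and the usual /sbin/zfs path; the -O flag tells zfs mount to overlay-mount on top of a non-empty directory:

# /etc/crontab - workaround: force-mount all ZFS datasets at every boot
@reboot root /sbin/zfs mount -O -a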

If someone has a bit of good advice, please let me know. Would be much appreciated.

Thank you in advance.

Regards

Mike
 

gb00s

I saw the post, yes, but I hadn't tried it so far, as it just looks like a workaround and I couldn't find a post confirming that this specific issue is solved the way described in that thread. So I mounted manually and backed up all the current backups. Then I did the steps from the last post, including setting the correct cache file. Without that it won't work.
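For anyone finding this later, the cache-file step boils down to something like the following sketch, assuming the pool is named 'storage' and the default cache path read by zfs-import-cache.service:

# rewrite the pool's cache file so the import service can find it at boot
zpool set cachefile=/etc/zfs/zpool.cache storage
# on Debian/Proxmox it may also be worth refreshing the initramfs afterwards
update-initramfs -u -k all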

I'm just still curious how ZFS can 'lose' the cache.

Thank you. For the moment it seems to be solved. Have a nice weekend.
 

gb00s

No, this issue is not solved. I have the same behavior on another machine now.

I made a 'Backups' directory under my 'storage' pool and scheduled some overnight daily and weekly backups. Unfortunately, the pool must somehow have 'unmounted' itself: the backups all ran, were saved under the plain directory /storage/Backups/, and filled up my whole root. Full system crash. The ZFS datasets were 100% mounted before. I set the backups up through the Proxmox GUI as always ...

[Attachment: Proxmox_Backups.png]


ADD:

root@pve1:~# systemctl status zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2020-08-09 20:46:28 CEST; 1 day 16h ago
Docs: man:zpool(8)
Main PID: 1399 (code=exited, status=1/FAILURE)

Aug 09 20:46:28 pve1 systemd[1]: Starting Import ZFS pools by cache file...
Aug 09 20:46:28 pve1 zpool[1399]: invalid or corrupt cache file contents: invalid or missing cache file
Aug 09 20:46:28 pve1 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Aug 09 20:46:28 pve1 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Aug 09 20:46:28 pve1 systemd[1]: Failed to start Import ZFS pools by cache file.
root@pve1:~#
That's just what I found while the machine was sitting idle around Aug 09 20:46:28. No backups scheduled. No other jobs scheduled at the time. No startup. Nothing. Mysterious.
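In case anyone wants to check whether the cache file itself is present and readable, a quick sanity check could look like this (assuming the default /etc/zfs/zpool.cache location):

ls -l /etc/zfs/zpool.cache        # does the cache file exist with a non-zero size?
zdb -C -U /etc/zfs/zpool.cache    # dump the pool configuration stored in that cache file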
 

WANg

gb00s said: No, this issue is not solved. I have the same behavior on another machine now. [...]
What was in the /storage directory before you manually ran zfs mount on top of it? Some stale lock files and/or temporary files that could be deleted?
 

gb00s

Nearly 'everything'. But I was able to mount it manually again, set the overlay property, and provide a new cache config. I'm just curious why one host or the other loses its cache config and fails to mount. If I had removed a log/cache device or something like that, I would understand it. But the pool was untouched for weeks.
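For completeness, that manual recovery amounts to roughly the following sketch, assuming the pool/dataset is named 'storage' ('overlay' is a per-dataset ZFS property):

zfs mount -O storage                              # overlay-mount despite the non-empty mountpoint
zfs set overlay=on storage                        # allow overlay mounts for this dataset from now on
zpool set cachefile=/etc/zfs/zpool.cache storage  # write a fresh cache file for the import service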
 

WANg

gb00s said: Nearly 'everything'. But I was able to mount it manually again, set the overlay property, and provide a new cache config. [...]
No, I mean: before you use that mountpoint to mount the ZFS filesystem, it's supposed to be a blank directory. Were there a few files in there by accident?
 

gb00s

Of course, when first used it was empty. Later, directories were added. It isn't blank now and hasn't been for a while. Or maybe I don't fully understand your question.
 

WANg

gb00s said: Of course, when first used it was empty. Later, directories were added. [...]
Okay. Unmount the filesystem; instead of showing a ZFS mount when you do a 'df -h .', it should then show the regular Linux (ext4) directory underneath, which is supposed to be empty. Does it contain any files?
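Concretely, something along these lines, assuming the dataset is 'storage' mounted at /storage:

zfs umount storage    # unmount the dataset
df -h /storage        # should now show the root filesystem, not a zfs mount
ls -la /storage       # anything listed here lives in the plain directory underneath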
 

gb00s

So I understood you correctly. If the ZFS filesystem is mounted, it shows, as expected, the following:
/storage
/storage/Backups
/storage/iso
/storage/vm/vm-100-disk-0
:
/storage/gnops
If the filesystem is unmounted, it shows, also as expected, the following:
/storage/Backups
And that's correct, because the ../Backups directory was created in Proxmox for the purpose of saving backups there. Backups were created and placed there for several weeks, so the directory /storage/Backups/ was never empty. The host rebooted several times without any problems. But one day the host couldn't mount the filesystem anymore. I know this can happen if the cache/log gets lost or is destroyed on purpose, but here nothing was done to the cache/log and the drives are just fine.
 

WANg

gb00s said: [...] The directory /storage/Backups/ was never empty. The host rebooted several times without any problems. But one day the host couldn't mount the filesystem anymore. [...]
Hmm... move ../Backups somewhere else and see if the issue goes away. Otherwise, check whether there is a directory permissions/ownership issue, and then comb your apt logs for changes - upstream maintainers sometimes do rather innocuous things to seemingly unrelated libraries that can break services like ZFS in unexpected ways. I still remember how a single argument switch in nfsd caused a bunch of home directories to freeze up on Kerberos ticket expiration at my last gig. That one took me WEEKS to unravel.
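If it helps, the apt-log and ownership checks can be as simple as this (log path is the Debian default; adjust the directories to your layout):

grep -i zfs /var/log/apt/history.log    # recent package operations that touched zfs
ls -ld /storage /storage/Backups        # ownership and permissions of the mountpoint directories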
 

Wolvez

I've had this problem a few times. What happens is that Proxmox creates the folder for a directory-type storage if it doesn't exist. For some reason, Proxmox sometimes doesn't wait for the ZFS pool to be mounted at boot before checking whether the directory exists. It sees the directory isn't there, so it creates it; the pool then fails to mount because the directory isn't empty. Sometimes it happens with just some datasets and not the whole pool.
I fix it like this (rough command sketch below):
1. Remove all directory-type storage on the pool from Proxmox.
2. Export the pool.
3. Make sure the pool isn't mounted, then delete everything from the directory where it is supposed to mount.
4. Import the pool.
5. Re-add the storage removed earlier.
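A rough command-line sketch of those steps - the pool name 'storage', the storage ID 'backups-dir' and the paths are placeholders, and removing/re-adding the directory storage can just as well be done in the GUI:

pvesm remove backups-dir    # 1. remove the directory-type storage(s) that live on the pool
zpool export storage        # 2. export the pool so nothing is mounted
rm -rf /storage/*           # 3. clean out whatever was created in the now-plain directory
zpool import storage        # 4. re-import; datasets mount into a clean directory again
pvesm add dir backups-dir --path /storage/Backups --content backup    # 5. re-add the storage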
 

gb00s

Thank you to both @WANg && @Wolvez.

I deleted the directories under /storage/*. From the logs, I can see that ZFS mounts 'very late' in the boot process. I have chosen the 'dirty' way and put '@reboot root zfs mount -O -a' into /etc/crontab. I always recommended Proxmox up to version 5 due to its super stability, but with the move to Proxmox 6 it started having weird issues almost weekly. It feels like a rushed-out product, which has made me quite unsure about it.
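For anyone who would rather chase the boot-ordering issue than live with the crontab workaround, systemd's own tooling can show where the ZFS units land in the boot sequence, for example:

systemd-analyze blame | grep -i zfs                 # how long the zfs units took during boot
systemd-analyze critical-chain zfs-mount.service    # what zfs-mount.service had to wait on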
 

dswartz

Wolvez said: I've had this problem a few times. What happens is that Proxmox creates the folder for a directory-type storage if it doesn't exist. [...]
Back when I was using Proxmox, I got burned by this more than once. Complaints on their forum were not received gracefully. For that reason (among others), I ended up moving away from Proxmox.
 

WANg

dswartz said: Back when I was using Proxmox, I got burned by this more than once. [...]
Yeah, their forums are nearly as insufferable as the FreeNAS/TrueNAS Core ones.
 