It seems like lately everything I get involved with falls apart. I had a couple of disks fail at home, losing my redundancy from a rather large ZFS raidz2 pool. No problem, I thought, I can still pull the data off in a degraded state while I wait on replacements. Nope, it wasn't to be: 10 minutes in, another disk crapped out and the array went offline, so no getting that data back anytime soon, if at all. Of course, ZFS being the helpful soul that it is, kept trying to resilver everything in sight ad nauseam, making it almost impossible to achieve any meaningful disk access.
Moving on, I pull the two definitely dud disks and work on repairing the last one, which I do, eventually, and get it running again. I break a sweat, but finally recover 10TB of data successfully, albeit now spread over at least half a dozen random smaller disks of varying ages. Not a great place to be in, but at least I got my data back. Replacement disks arrive and I set about getting things back together. More resilvering...
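One habit I picked up from all this: before touching anything, check which devices the pool actually thinks are healthy. As a sketch (the pool name `tank` and the device table below are hypothetical, hard-coded here so the filter can be shown standalone; in real use you'd pipe `zpool status` straight into it), a one-liner over the status device table pulls out anything that isn't ONLINE:

```shell
# Hypothetical 'zpool status' device table, hard-coded for illustration.
# In real use: zpool status tank | awk '...'
status='  NAME        STATE     READ WRITE CKSUM
  tank        DEGRADED     0     0     0
    raidz2-0  DEGRADED     0     0     0
      sda     ONLINE       0     0     0
      sdb     FAULTED      3     0     0
      sdc     UNAVAIL      0     0     0'

# Print every row whose STATE column is not ONLINE (skipping the header)
echo "$status" | awk '$2 != "" && $2 != "STATE" && $2 != "ONLINE" {print $1, $2}'
```

Anything this prints is a candidate for `zpool replace` (or, in my case, for pulling outright).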
Wondering why I didn't just wipe the pool and restore from a backup? Well, I do have one, but it's on the other side of the planet and would involve even more pain to get it here. The really important stuff I have backed up locally, but that only amounts to about 4TB and is easily portable.
Moving on again, stupidity took hold of me at this point, as it does, and I decided that, as it was all bolloxed up anyway, I may as well upgrade everything along the way and start fresh. I cut my teeth not too long ago on Debian, and although I have dabbled with other distros over the years, I like it and it has been good to me, so I move up to using Buster for all my VMs. I'm no expert by any means, but I have been playing with it long enough to find my way around, and anyway, that's what Google was invented for, i.e. to give you all the answers to the questions you didn't actually ask, but decide you need to know now because they popped up in front of your eyeballs during a search and you didn't want to waste the electricity.
I also decide to bring my ESXi up to date and apply the latest 6.7u3 patch. I thought, in for a penny...
I should have had someone slap me at this point.
Well, the ESXi thing went sideways immediately, and I was left with a non-bootable host. That was a real pain in the proverbial to get going again, and took me almost 2 days of head scratching, desktop hypervisors and a new SanDisk USB stick to fix. I did eventually get back to where I started, but my head well and truly hurt afterward. My only saving grace was that I usually keep at least 6 old saved config bundles on my laptop, having learned my lesson after last time. As it happened, I needed the oldest one. Now, for some reason I can't explain, despite being able to happily patch up and down until now, I no longer can. The patches all fail with the dreaded errno 28 (out of disk space, even when there's plenty) and for various odd random reasons: tools issues, various file issues, cryptic error messages, etc. It's an issue that is going to come back to haunt me, I can feel it, but I'll take the win; for the time being the host is back up and running, so I move on to upgrading all of my VMs. This should be the easy bit, I thought.
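For anyone who hasn't been burned yet: those config bundles are trivial to generate before patching. On an ESXi host with SSH enabled, the stock vim-cmd tool will produce one. These commands run on the host itself, so this is a reference fragment rather than something to run locally:

```shell
# Flush any pending config changes to the bootbank, then build the bundle
vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config
# The second command prints a download URL for a configBundle-<hostname>.tgz;
# fetch that from your workstation and keep a few generations of it.
```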
It all looks like it's going great, except that now, for some reason, my iSCSI config will not load on (re)boot. Much, much digging led me to a bug report from April, before Buster was released I hasten to add (so it could have been fixed beforehand, or at least an update released), stating that one of the libraries used by targetcli is missing a couple of things. Now, it does run if I manually restore my config, but the first couple of times I rebooted I was like, wtf, where did my targets go? Then, not realising that my config was still there, just not loaded, I proceeded to exit the shell, wiping out said config. The pain didn't stop there either, as the fancy new ZFS 0.8.1 doesn't want to play straight out of the box. Again, after much pain trawling through other people's tales of woe, I finally stumble my way through, one by one, all of the issues I was seeing and begin figuring out what the hell was going wrong. I don't remember having any of these ZFS issues when just testing Buster, but I was using 0.8.0 then. Well, it turns out that some libraries are in the wrong place for a start; then there's the fact that it can't import your pool; then it will, if you do it manually, but won't on boot; oh wait, was that my iSCSI, wtf, where did my targets go? You get the idea...
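For the record, the manual workaround I ended up settling on was roughly the following. The pool name `tank` is an assumption, and the service names are the ones shipped on my install (worth confirming with `systemctl list-unit-files` on yours) — a sketch of what worked for me, run as root, not a fix for the underlying bug:

```shell
# iSCSI: put the saved target config back by hand
# (Debian keeps targetcli's saved config under /etc/rtslib-fb-target/)
targetcli restoreconfig /etc/rtslib-fb-target/saveconfig.json

# ZFS: import the pool manually, then pin it in the cachefile so the
# import service can find it on the next boot
zpool import tank
zpool set cachefile=/etc/zfs/zpool.cache tank
systemctl enable zfs-import-cache.service zfs.target
```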
In the end I deleted everything I had just done, got a bare Debian VM install going and took a snapshot. I got so tired of trying to remember which failed steps had to be reversed that I just blew it away each time and tried some other approach.
I get to the end of all of that misery and finally get things running again, and then the power supply in the host dies, taking the bloody RAM with it. So I quickly do a shonky "get the wife happy again" wiring job (meaning I'll need to spend even more money replacing my new Seasonic cables), swap in some untested RAM and get things going again while I'm waiting on replacements. All was going great, that is, up until tonight, when some of my disks began dropping out again, quickly followed by ZFS trying to build me a life raft out of 1s and 0s, which means I can't get at my damned data again. I feared the worst, and could hear my wallet sobbing in the kitchen from the basement, but it turns out that the onboard LSI controller on the mainboard is now dying. So I have added in a spare controller to get me going yet again just now, and have just ordered a new X9SCM-F to swap out the mainboard, because I know where this is heading. I think when I finally get done with this, I'll have the chassis re-galvanised; it feels like that's the only thing I haven't replaced yet.
I swear, I should go out and buy a lottery ticket tonight...