Storage Space - lost virtual disk


BestGear

Member
Aug 25, 2014
59
3
8
44
Guys

If you have any ideas, it would be very much appreciated... bear with me....

I have been running a Server 2012 R2 Storage Spaces pool of six 3TB drives for years with no issues. The server OS was upgraded to Server 2019 around a year ago, but the storage pool itself was not upgraded at the time of the OS upgrade.

One physical drive started to show SMART errors (cautions), so I wanted to replace the failing drive.

I upgraded the storage space to 2019 level. This worked (or at least gave no errors and continued to work fine).

I added a new 3TB drive to the pool - which it appeared to do, again without error.

Several hours later, I attempted to remove the failing drive, and it started the process.... but before it completed, the server blue screened.
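
For anyone reading later: a commonly recommended order for swapping a drive is to add the new disk, retire the old one, repair the virtual disks, wait for the repair jobs to finish, and only then remove the old disk. The sketch below is a hedged illustration of that order, not a record of what was run on this server. The pool name "Pool1" and the serial number are hypothetical placeholders.

```powershell
# Hypothetical names: substitute your own pool name and disk serial number.
$pool = Get-StoragePool -FriendlyName 'Pool1'
$old  = Get-PhysicalDisk | Where-Object SerialNumber -eq 'OLD-DISK-SERIAL'

# Add every disk that is currently eligible for pooling (the replacement).
Add-PhysicalDisk -StoragePoolFriendlyName $pool.FriendlyName -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

# Retire the failing disk so no new allocations land on it.
$old | Set-PhysicalDisk -Usage Retired

# Rebuild the pool's virtual disks onto the remaining healthy disks.
$pool | Get-VirtualDisk | Repair-VirtualDisk

# Check progress; repeat until no repair or regeneration jobs are listed.
Get-StorageJob

# Only after the jobs have completed, remove the retired disk.
Remove-PhysicalDisk -StoragePoolFriendlyName $pool.FriendlyName -PhysicalDisks $old
```

Waiting for Get-StorageJob to come back empty before the final step is the point of the sketch: the removal then has nothing left to migrate.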

This server has *never* blue screened - so no reason to believe it is hardware failure. Oh, the box has plenty of RAM too.

Anyway - now the server is back up - the virtual disk has disappeared.

Get-StoragePool shows the FriendlyName as Primordial, OperationalStatus OK, HealthStatus Healthy, IsPrimordial = True, not read-only, and the correct capacity (i.e. seven times 3TB).

Disk Management shows only one 3TB drive - which is the 3TB drive that it had started to remove from the pool. The other drives are not visible in Disk Management, but ARE all visible when you check with Get-PhysicalDisk, so they are all there.
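
The state described above can be captured with a few read-only queries. This is a minimal sketch that only lists information and changes nothing; the column selection is my own choice.

```powershell
# Every pool, including the primordial one, with the fields quoted above.
Get-StoragePool | Format-Table FriendlyName, OperationalStatus, HealthStatus, IsPrimordial, IsReadOnly, Size

# Only real (non-primordial) pools. An empty result means Windows
# did not recognise any pool on the attached disks.
Get-StoragePool -IsPrimordial $false

# Virtual disks that the Storage Spaces stack currently knows about.
Get-VirtualDisk | Format-Table FriendlyName, OperationalStatus, HealthStatus, Size
```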

Server Manager > Storage Pools shows Primordial with a single physical disk assigned - which is the one that was awaiting removal.

Before I attempt any manoeuvres to try and repair this, does anyone have advice on how to approach this fix?

Blowing the drives away and restoring the data is an option (as it's all backed up) but time is against me on that one as the restore will take ages.


Any advice is very much valued and appreciated.


David
 

BestGear

Member
Aug 25, 2014
59
3
8
44
Hi....

I tried ReclaiMe (no license purchase) - just to see what it sees on the drives... and it shows two virtual disks - both named with the disk volume I was expecting.

The first one says damaged-repairable but the second is healthy and intact.

Shame the tool won't fix it in place - looks like you pay the $$$$ to copy the files off to another storage location.

I have a backup, so I ain't paying to get it back, but I can't believe there is not a sensible way to recover in place?!


David
 

BestGear

Member
Aug 25, 2014
59
3
8
44
I think I have spotted something that may be of help... the drive I tried to remove is showing in Get-PhysicalDisk as "CanPool True" and the other six spindles are all sitting with OperationalStatus as "Starting, OK" - any thoughts?
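
A hedged sketch of a query that puts the pooling and status fields side by side for every disk. The property names are standard Get-PhysicalDisk output; the choice of columns is mine.

```powershell
# One row per physical disk. CannotPoolReason explains why CanPool is False
# (for example, because the disk already belongs to a pool).
Get-PhysicalDisk |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, SerialNumber, CanPool, CannotPoolReason, OperationalStatus, HealthStatus, Usage, Size -AutoSize
```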

upload_2020-3-22_17-9-57.png
 

BestGear

Member
Aug 25, 2014
59
3
8
44
Hi, and thanks for the reply.

Repair-SpacesConfiguration - weirdly, this gives the error below:

upload_2020-3-22_18-14-12.png

I am surprised by this - it is running on an up-to-date Server 2019 DC instance. I cannot figure out why it's offline as all drives are there - with the only detail of note (above) being six drives stating "Starting, OK".


Next - the storage space IS flagged as primordial - which it "went" after the blue screen reboot....

upload_2020-3-22_18-17-23.png

So - I wonder if the "starting, ok" is a red herring... but that would explain why server manager sees the pool - see below....

upload_2020-3-22_18-18-12.png

Please note: the physical disks only show the drive that was being prepped for removal - and the drives that above show as "starting, ok" are not listed as assigned - even in the primordial.

upload_2020-3-22_18-19-4.png


Weird - having spent this afternoon reading online, I fear that zapping the drives and restoring the data may be the way forward... but I live in hope that something can be done - especially as ReclaiMe shows the pool and virtual disk DATA1 (the single one within that pool) as being healthy.


upload_2020-3-22_18-22-45.png

The top DATA1 instance, shows the "disk missing or bad metadata" - that shows the config with the drive I was trying to remove, removed, and the new drive not added.

The second DATA1 listed shows the pool with the original 6 drives, and the newly added drive missing.

The drive I was removing was still operational but had shown a single reallocated sector, hence I was getting it out before trouble appeared...

I think the "OperationalStatus" set to "starting, OK", for drives is a curve ball as the OperationalStatus for the primordial is "OK".



Any suggestions!??!?!


David
 

BestGear

Member
Aug 25, 2014
59
3
8
44
Well, after many hours, I am looking at zapping the drives and going for a restore....

I hate giving up - I would have much preferred to learn from the experience!

Bottom line is, the drives are sitting as "Starting, OK" as operational status, and that is what I believe is preventing them coming online.

I don't believe there are any hardware issues - there were none before adding a disk, other tools all see the drives, and SMART reports clean.

If anyone has some last minute thoughts, they would be appreciated (other than the advice NOT to use Storage Spaces...).


David
 

ecosse

Active Member
Jul 2, 2013
463
111
43
I had a quick google - I know nothing about storage spaces - but I hate to see someone in this situation. I didn't find much that looked relevant - in the ones I found, people gave up and started over. I unfortunately only have two general comments:
  1. What do the Windows system event logs say? Is there anything interesting in there at all?
  2. If you haven't, try patching Windows up to the latest patch level. I have had things miraculously work after a patch cycle even though there is nothing in the release notes to suggest a fix
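
For the first point, a minimal sketch of pulling recent events with Get-WinEvent. The seven-day window and the 50-event limit are arbitrary choices, and the Storage Spaces driver log name is assumed to be present on the OS build in question.

```powershell
# Errors (level 2) and warnings (level 3) from the System log, last 7 days.
Get-WinEvent -FilterHashtable @{
    LogName   = 'System'
    Level     = 2, 3
    StartTime = (Get-Date).AddDays(-7)
} | Select-Object TimeCreated, ProviderName, Id, Message

# Most recent events from the Storage Spaces driver's own log.
Get-WinEvent -LogName 'Microsoft-Windows-StorageSpaces-Driver/Operational' -MaxEvents 50 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message
```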
Best of luck!

OK I will reference this site. https://www.checkyourlogs.net/the-c...k-lost-communication-storagespacesdirect-hci/ - if you reset a disk will you lose data?
 

BestGear

Member
Aug 25, 2014
59
3
8
44
Thanks ecosse (Are you from, or in, Scotland? I am beside Edinburgh...)

Logs show no errors... the only clue if you like is that the drives were stuck at "Starting, OK" - and never appear as started.

Zero errors of any sort - just no storage space, and you can't get a sniff of the virtual disk if the underlying storage space is not in good form.

I lifted the drive set out and into another box - this time a fully patched Windows 10, and it remained the same - no change at all.

I have ended up wiping all the drives (used DiskGenius as it would see the drives when other tools would not) and setting it all up again and just started restoring the 10-11TB of data.


All good fun... and even with resilience in spindles, shows that backups are still mandatory...


David
 

ecosse

Active Member
Jul 2, 2013
463
111
43
My father was Scottish - he got me at an early age; if only I'd known how sh1t we are at football :) I live near Newark (UK).

What are you storing on this array? Not that this negates your backup advice but this is why I like snapraid for media files - the disks are native to the OS so a single loss of a disk shouldn't kibosh the entire array.
 

Net-Runner

Member
Feb 25, 2016
81
22
8
41
Since you have already wiped the data, my advice is a little bit too late, but I will still share my experience for those guys who will find this thread later using search.

I had a similar issue a while ago. The only difference in my case was that it was not a BSOD but a server room blackout. Another difference is that Windows Server 2019 was installed from the start. The rest of the story is pretty much the same. Drive failure. Added a replacement. Started the process. Puff!

Unfortunately, wiping everything was not an option for me at all. It was a customer's server, and some data on it had no backup. I tried pretty much everything I could google. By the end of the day, I had worked out a way to get the pool back, but the trick is far from obvious.

If the disks are visible in Disk Management, you can spin up a virtual machine on that same server with a freshly installed Windows 10 or Windows Server 2019 and pass the disks through to the virtual machine. The guest OS has no prior knowledge of the storage spaces but detects the disk headers and lets you recognize the pool and repair it. If this is not an option, then reinstalling the OS, booting a fresh temporary Windows OS from an external USB drive, or physically moving the disks to another computer/server also allows you to fix this problem.

I hope it will help someone.
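
A rough sketch of the pass-through approach, assuming the hypervisor is Hyper-V (the post does not say which one was used). The VM name 'RecoveryVM' and disk number 2 are hypothetical, and the host steps would be repeated for each member disk.

```powershell
# --- On the host (assumes Hyper-V; names and numbers are hypothetical) ---
# A disk must be offline on the host before it can be passed through.
Set-Disk -Number 2 -IsOffline $true
Add-VMHardDiskDrive -VMName 'RecoveryVM' -ControllerType SCSI -DiskNumber 2

# --- Inside the guest, once all member disks are attached ---
# See whether a real (non-primordial) pool is detected.
Get-StoragePool -IsPrimordial $false

# A pool that arrives from another system may come up read-only; clear that.
Get-StoragePool -IsPrimordial $false | Set-StoragePool -IsReadOnly $false

# Attach the virtual disks so they show up in Disk Management.
Get-VirtualDisk | Connect-VirtualDisk
```

Nothing here upgrades the pool, so the on-disk format stays as it was on the original host.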
 

gregsachs

Active Member
Aug 14, 2018
562
192
43
That is seriously genius.
One note is that Storage Spaces has different versions, so be sure not to accidentally upgrade the storage space to a newer version than the original host supports. I know 2012 and 2016 are different versions, and I'd expect that 2019 is as well. I don't know whether Windows 10 supports the 2016 Storage Spaces version or not.
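
A small sketch of checking the pool version before touching anything. The pool name 'Pool1' is a hypothetical placeholder.

```powershell
# Show the on-disk version of each real pool.
Get-StoragePool -IsPrimordial $false | Format-Table FriendlyName, Version

# Update-StoragePool is the cmdlet that performs the upgrade. It is left
# commented out here because the upgrade is one-way and cannot be undone.
# Update-StoragePool -FriendlyName 'Pool1'
```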
 

Net-Runner

Member
Feb 25, 2016
81
22
8
41
Thank you very much for that great tip and clarification. Forgot to mention that the same (or at least similar) OS version should be used. I thought this one is somewhat obvious but who knows :)
 

BestGear

Member
Aug 25, 2014
59
3
8
44
ecosse said:
What are you storing on this array? Not that this negates your backup advice but this is why I like snapraid for media files - the disks are native to the OS so a single loss of a disk shouldn't kibosh the entire array.

All media - so not worried about write speeds.... will certainly look at snapraid as I really like the idea of the drives being native to the OS....and backup to disk is a costly game.
 

BestGear

Member
Aug 25, 2014
59
3
8
44
Net-Runner said:
If the disks are visible in Disk Management, you can spin up a virtual machine on that same server with a freshly installed Windows 10 or Windows Server 2019 and pass the disks through to the virtual machine. The guest OS has no prior knowledge of the storage spaces but detects the disk headers and lets you recognize the pool and repair it. If this is not an option, then reinstalling the OS, booting a fresh temporary Windows OS from an external USB drive, or physically moving the disks to another computer/server also allows you to fix this problem.

I hope it will help someone.

Hi - I did try moving all spindles over to another box that had Windows 10 running physically - and it showed me the same error state. Given that the "host" was Server2019 all patched up and the Windows 10 box was latest too, I had high hopes...

Did not think to try your approach but kinda expect it to be the same as I found doing it physically.... may be wrong though.

All good stuff this... just need to always make sure the backups are ok!

Out of interest, I have swapped out many drives (increasing capacity) on Windows10 and found the simpler GUI interface (to Server2019) easy and so far, robust in operation...which is surprising given their common parentage.


David
 

Net-Runner

Member
Feb 25, 2016
81
22
8
41
BestGear said:
Hi - I did try moving all spindles over to another box that had Windows 10 running physically - and it showed me the same error state. Given that the "host" was Server2019 all patched up and the Windows 10 box was latest too, I had high hopes...
Did not think to try your approach but kinda expect it to be the same as I found doing it physically.... may be wrong though.
All good stuff this... just need to always make sure the backups are ok!

Sad to hear that. This approach has helped my colleagues and me several times when messing about with Storage Spaces. And yes, you are right; it does not matter whether it is physical or virtual. In my case, I just didn't have the option of a second physical box that could host all my drives. It looks like, in your case, the whole Storage Space got critically corrupted because of a direct upgrade from 2012 to 2019, which is a massive leap into the future. Possibly a smoother path like 2012->2016->2019 would have made things better, but who knows :-(

BestGear said:
Out of interest, I have swapped out many drives (increasing capacity) on Windows10 and found the simpler GUI interface (to Server2019) easy and so far, robust in operation...which is surprising given their common parentage.

Have you tried using Windows Admin Center for this purpose? I have not, but rumor has it that it is even better than all the classic thick tools put together.
 

BestGear

Member
Aug 25, 2014
59
3
8
44

Hi - yes, did try Admin Center but got fed up with it as you must keep it updated else it times out... pain in the butt when you just want to nip in for a quick tweak. Other than that - it's a good tool - saves using PowerShell for some more complex setups.


The biggest disappointment was that the ReclaiMe tool saw the space and showed it was in good (recoverable) shape. It would not fix it in place but would have allowed me to copy the data off the drives, which would have been a saviour if I did not have a backup.


David