Hello,
I'm running Napp-IT on OmniOS in a VM running on ESXi.
ESXi : 6.7.0 Update 1 (Build 10764712)
OmniOS: OmniOS v11 r151030ex
Napp-IT: 21.06a7
3x LSI 2008 HBAs passed through to the Napp-IT VM
3 storage pools in total:
1) Mirrored SSDs (primary storage for VMs)
2) Striped mirrors (20TB) (retired pool - now just used for temporary items - mostly empty)
3) Raidz2 pool (40TB) - this is the pool that's giving me trouble. It's fairly full - about 4.5TB free.
This configuration has been running for many years with minimal intervention.
I have backups of the most critical data from this pool (documents, photos, etc.), but it stores a ton of media which I don't typically back up. Total pool size (usable) is just under 40TB.
Since yesterday evening, I've been troubleshooting an issue that cropped up with the raidz2 pool on my home fileserver. I noticed it was showing 2 disks as removed.
After shutting down all of the running VMs, I was able to shut down the server as well. I physically looked at the front of my chassis for obvious signs of trouble and, seeing nothing out of the ordinary, reseated all of the drives by removing and reinserting them while the server was powered off. After restart, the drives showed up as online, and a resilver completed successfully.
I had been running older versions of OmniOS and Napp-IT, so I decided to update both. Napp-IT went from 18.12 to 21.06, and OmniOS from r151028 to r151030.
After all of this completed successfully, I decided to initiate a scrub of the pool and called it a night after monitoring its progress for an hour or so. This morning I woke up to the same issue again - 2 disks showing as removed from the same pool. It could be my imagination (I don't think it is), but I believe it was 2 different drives that showed as removed this time.
I shut down the running VMs and rebooted the Napp-IT VM (no full shutdown - just an init 6 from the CLI), and the drives have shown up as online again and started resilvering again.
So - I'm looking for suggestions as to what I should do next. I've already physically looked at the server, but I haven't cracked it open - I haven't been inside the chassis for a few months. The server is quite old (2x Xeon E5-2650 v2 CPUs), as are the HBAs (3x IBM ServeRAID M1015), but it still meets my performance needs.
Thanks in advance for your help!
PS - I have a copy of what dmesg shows, but I'm not clear on the correct way to share that kind of info on STH. It shows "multipath status: degraded:" errors for the two devices that were "removed" (sd26 and sd30).
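In case it helps, here's roughly how I pulled just the affected sd instances out of a saved copy of the log with standard shell tools. The sample lines below are stand-ins I made up in the same general format as the real messages - the device paths and WWNs are not from my actual log:

```shell
# Hypothetical sample of the degraded-path messages (format modeled on the
# real dmesg output; the disk paths/WWNs here are made up for illustration).
cat > /tmp/dmesg_sample.txt <<'EOF'
genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c500a1b2c3d4 (sd26) multipath status: degraded: path 1 mpt_sas1/disk@w5000c500a1b2c3d6,0 is offline
genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c500e5f60718 (sd30) multipath status: degraded: path 2 mpt_sas2/disk@w5000c500e5f6071a,0 is offline
genunix: [ID 936769 kern.info] sd29 at scsi_vhci0: unit-address g5000c500deadbeef
EOF

# Keep only the degraded-path events and print the sd instance from each.
grep 'multipath status: degraded' /tmp/dmesg_sample.txt |
  sed -n 's/.*(\(sd[0-9]*\)).*/\1/p'
# prints:
# sd26
# sd30
```

Running the same filter against the real log gives me a short list of affected instances (sd26 and sd30 in my case) that I can then match up to physical disks.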