Hi STH,
Help
For a few years I have been running a small ESXi box on an Intel Skull Canyon NUC (NUC6i7KYK, 32GB non-ECC RAM) with 3 active VMs. Nothing was under heavy load and everything was fine. But after 3 years of 24/7 service it gave up: the fan died and it kept acting up in different ways.
Because of that I upgraded to new hardware. On the new hardware I re-installed VMware on a new USB stick, moved the SSDs over from the old hardware and imported the existing datastores.
But now nothing is stable. After about a day or two of uptime with all VMs running I get a PSOD with "Spin Count Exceeded - Possible Deadlock". See the screenshots attached below. I tried searching the forums, but the two posts about Spin Count don't appear to be related.
Hardware:
- Barebone: RS100-E10-Pi2
- BIOS: Version 3103 (latest)
- FW: Version 1.13.6 (latest)
- CPU: Xeon E-2236
- RAM: 64GB of ECC RAM - Kingston KTH-PL424E/16G
- ESXi: 6.7u3 build 16075168 (also tested 7.0 and multiple 6.7u3 builds)
- Datastore:
  - SSD1: INTEL SSDPEKKW51
  - SSD2: WD - WDS500G3X0C-00SJ
- USB: SanDisk UltraFit 32GB - SDCZ430-032G-G46
VMs:
- VMware vCenter
  - 2 vCPUs
  - 10GB RAM
- GrayLog
  - 2 vCPUs
  - 4GB RAM
- Cisco Firepower Management Center (FMC)
  - 4 vCPUs
  - 16GB RAM
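As a quick sanity check on the allocations listed above, here is a small Python sketch that totals what the three VMs are configured with and compares it to the host. The VM numbers come from the list; the host thread count (6 cores / 12 threads for the Xeon E-2236) is an assumption taken from Intel's published spec, not something measured on this box, and the variable names are only labels for this illustration.

```python
# Configured resources per VM, as listed in this post.
VM_SPECS = {
    "vCenter": {"vcpus": 2, "ram_gb": 10},
    "GrayLog": {"vcpus": 2, "ram_gb": 4},
    "FMC":     {"vcpus": 4, "ram_gb": 16},
}

# Host capacity. RAM is from the hardware list; the thread count is an
# assumption based on Intel's spec for the Xeon E-2236 (6C/12T).
HOST_RAM_GB = 64
HOST_THREADS = 12

total_vcpus = sum(vm["vcpus"] for vm in VM_SPECS.values())
total_ram_gb = sum(vm["ram_gb"] for vm in VM_SPECS.values())

print(f"vCPUs allocated: {total_vcpus} of {HOST_THREADS} host threads")
print(f"RAM allocated:   {total_ram_gb} GB of {HOST_RAM_GB} GB")
```

On paper the host is not overcommitted on either vCPUs or RAM, so plain resource exhaustion looks unlikely, though this says nothing about what ESXi itself reserves.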
Though if I run with only the vCenter and GrayLog VMs it appears stable. It is also stable with no VMs running. So it appears to be somehow related to the FMC VM, though I have to admit I cannot remember whether I tested with the FMC VM alone. I have just started that test.
I have run MemTest86 v8.4 numerous times with a 100% pass each time. What else could I try to figure out what is going on? Is there some data I could capture and share to help?
Any help is greatly appreciated.
Errors seen:
From the iKVM: "ID: 1 CPU_CATERR sensor of type processor logged a IERR"
From the BIOS: [screenshot]
PSODs: [screenshots 1 to 3]
MemTest: [screenshot]
/Klaus