After playing around with a test setup for a week, I've gotten reasonably familiar with OmniOS and napp-it, but I've still got a recurring problem that I can't seem to work out.
Occasionally, and it seems inevitably, the whole machine will simply hang. napp-it stops responding, shares disappear, the computer itself stops displaying anything to the screen (the monitor goes into standby, receiving no signal). I can do nothing but a hard-reset / hard-poweroff to get it running again.
Once I get it going again, I can't find any mention of an error anywhere (although I'm pretty clueless about where to look).
The nearest I've found to something suspicious is in the System>Faults log, for fmdump -I. The first line logged after the machine is started up again is always:
"Jan 11 00:31:51.9433 ireport.os.sunos.panic.savecore_failure"
I thought this might indicate something unpleasant, but I've since discovered that it writes that line even after I've powered down the machine correctly.
I've got equipment turning up next week to build my server proper, and I'm really hoping this is just an odd phantom problem with the Dell T7600 workstation I'm testing on, and that it will vanish once I build the server. So far I've tested with different HBAs and different zpool drives. The only things in common have been the machine itself and an old Intel 40GB SSD I've been using as a boot drive. The SSD has perfect SMART data and doesn't seem to have any faults of its own, so I'm presuming it's not that (although anything is possible, I suppose).
Does anyone have any ideas what might be causing this, or can think of an effective way to diagnose the problem?