So I've been running into an incredibly annoying problem with my new-to-me setup.
The full timeline is in my thread on the nas4free forums: System panics randomly and reboots - Forums
Every one to three days, my nas4free system panics or freezes outright.
Initially, I had my two PERC H310s passed through with Proxmox as the host. Thinking this was the problem, I tried nas4free on bare metal, but I still have the same problem.
Long story short, I've tried all of the following with no luck: swapping CPUs, swapping motherboards, swapping RAM (many different sticks), moving the HBAs to different PCI slots, moving the disks to different locations on the backplane, swapping power supplies, adding cooling to the IOH, adding cooling to the HBAs, using only one HBA, changing from a VM to a bare-metal install, and swapping to an entirely different CPU/mobo combo.
When running on bare metal, the whole screen freezes, and no logs are produced.
When running in the VM, I get a panic and some output:
Code:
May 21 18:23:34 nas syslogd: kernel boot file is /boot/kernel/kernel
May 21 18:23:34 nas kernel:
May 21 18:23:34 nas kernel:
May 21 18:23:34 nas kernel: Fatal trap 12: page fault while in kernel mode
May 21 18:23:34 nas kernel: cpuid = 1; apic id = 01
May 21 18:23:34 nas kernel: fault virtual address = 0x10
May 21 18:23:34 nas kernel: fault code = supervisor read data, page not present
May 21 18:23:34 nas kernel: instruction pointer = 0x20:0xffffffff80a12e05
May 21 18:23:34 nas kernel: stack pointer = 0x28:0xfffffe0237e76980
May 21 18:23:34 nas kernel: frame pointer = 0x28:0xfffffe0237e769e0
May 21 18:23:34 nas kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
May 21 18:23:34 nas kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 21 18:23:34 nas kernel: processor eflags = interrupt enabled, resume, IOPL = 0
May 21 18:23:34 nas kernel: current process = 2232 (transmission-daemon)
May 21 18:23:34 nas kernel: trap number = 12
May 21 18:23:34 nas kernel: panic: page fault
May 21 18:23:34 nas kernel: cpuid = 1
May 21 18:23:34 nas kernel: KDB: stack backtrace:
May 21 18:23:34 nas kernel: #0 0xffffffff80a909d0 at kdb_backtrace+0x60
May 21 18:23:34 nas kernel: #1 0xffffffff80a531f6 at vpanic+0x126
May 21 18:23:34 nas kernel: #2 0xffffffff80a530c3 at panic+0x43
May 21 18:23:34 nas kernel: #3 0xffffffff80ed75fb at trap_fatal+0x36b
May 21 18:23:34 nas kernel: #4 0xffffffff80ed78fd at trap_pfault+0x2ed
May 21 18:23:34 nas kernel: #5 0xffffffff80ed6f7a at trap+0x47a
May 21 18:23:34 nas kernel: #6 0xffffffff80ebcfd2 at calltrap+0x8
May 21 18:23:34 nas kernel: #7 0xffffffff80a07869 at _fdrop+0x29
May 21 18:23:34 nas kernel: #8 0xffffffff80a0a10e at closef+0x21e
May 21 18:23:34 nas kernel: #9 0xffffffff80a07c18 at closefp+0x98
May 21 18:23:34 nas kernel: #10 0xffffffff80ed7fcf at amd64_syscall+0x40f
May 21 18:23:34 nas kernel: #11 0xffffffff80ebd2bb at Xfast_syscall+0xfb
May 21 18:23:34 nas kernel: Uptime: 18h54m55s
May 21 18:23:34 nas kernel: (da3:mps1:0:1:0): Synchronize cache failed
May 21 18:23:34 nas kernel: (da4:mps1:0:4:0): Synchronize cache failed
May 21 18:23:34 nas kernel: Automatic reboot in 15 seconds - press a key on the console to abort
May 21 18:23:34 nas kernel: Rebooting...
May 21 18:23:34 nas kernel: cpu_reset: Restarting BSP
May 21 18:23:34 nas kernel: cpu_reset_proxy: Stopped CPU 1s
Either way, the symptoms are the same. The system runs for hours to days without issue under a very light workload (smbd for files, afp on another dataset for Time Machine, and transmission; 2 users total), then randomly freezes or panics, combined with a high CPU temp (due to the freeze, I assume).
Somebody on the nas4free forum suggested using only one HBA, but that made no difference.
At this point all I can think of is some sort of issue with the HBAs: either a problem specific to flashed H310s, or the firmware I'm running (P19; I also tried the latest P20 release 7).
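One thing that might help narrow this down is comparing panics across reboots to see whether the faulting process and call path are always the same. Below is a rough Python sketch for that, not something I've used in the troubleshooting above: `summarize_panic` is a made-up helper name, and the regular expressions simply assume the log lines look like the excerpt in this post.

```python
import re

# Hypothetical helper: pull the key facts out of a FreeBSD-style panic
# excerpt as it appears in syslog. The patterns assume the line layout
# seen in the log above; other kernels or versions may format differently.
TRAP_RE = re.compile(r"Fatal trap (\d+): (.+)")
PROC_RE = re.compile(r"current process\s*=\s*(\d+) \(([^)]+)\)")
FRAME_RE = re.compile(r"#\d+ 0x[0-9a-f]+ at (\w+)\+0x[0-9a-f]+")


def summarize_panic(log_text):
    """Return the trap, the faulting process, and the backtrace symbols."""
    summary = {"trap": None, "process": None, "frames": []}
    for line in log_text.splitlines():
        trap = TRAP_RE.search(line)
        if trap:
            summary["trap"] = (int(trap.group(1)), trap.group(2).strip())
            continue
        proc = PROC_RE.search(line)
        if proc:
            summary["process"] = (int(proc.group(1)), proc.group(2))
            continue
        frame = FRAME_RE.search(line)
        if frame:
            summary["frames"].append(frame.group(1))
    return summary


# A few lines copied from the panic above, used as sample input.
SAMPLE = """\
May 21 18:23:34 nas kernel: Fatal trap 12: page fault while in kernel mode
May 21 18:23:34 nas kernel: current process = 2232 (transmission-daemon)
May 21 18:23:34 nas kernel: #6 0xffffffff80ebcfd2 at calltrap+0x8
May 21 18:23:34 nas kernel: #7 0xffffffff80a07869 at _fdrop+0x29
May 21 18:23:34 nas kernel: #8 0xffffffff80a0a10e at closef+0x21e
"""

if __name__ == "__main__":
    print(summarize_panic(SAMPLE))
```

If every panic shows the same process and the same frames, that would point more toward software than toward flaky hardware, though that is only a rule of thumb.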
If anybody has any suggestions at all, please let me know.
I'm losing sleep over this thing.