pfSense interrupt issue?


gigatexal

I'm here to learn
I've got an old Lenovo/IBM ThinkStation desktop that I've converted into a pfSense router. I noticed this in the top output and got alarmed. I enabled device polling but disabled all the optional offloading in the Advanced tab.

Any help would be welcome.

Specs: Core 2 Duo E8500.
4GB or so of RAM.
80GB SATA HDD.

Code:
last pid: 96142;  load averages:  0.76,  0.24,  0.09  up 0+22:56:07    21:29:47
129 processes: 5 running, 110 sleeping, 14 waiting

Mem: 15M Active, 88M Inact, 144M Wired, 279M Buf, 3173M Free
Swap: 8192M Total, 8192M Free


  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    9 root     -16 ki-1     0K    16K CPU1    1   1:25  99.27% [idlepoll]
   11 root     155 ki31     0K    32K RUN     1  22.9H  54.88% [idle{idle: cpu1}]
   11 root     155 ki31     0K    32K RUN     0  22.9H  41.89% [idle{idle: cpu0}]
73306 root      22    0   223M 31844K piperd  0   0:00   0.10% php-fpm: pool lighty (php-fpm)
    0 root     -92    0     0K   256K -       0   1:39   0.00% [kernel{em2 que}]
    0 root     -16    0     0K   256K swapin  1   0:39   0.00% [kernel{swapper}]
    0 root     -92    0     0K   256K -       1   0:37   0.00% [kernel{em1 que}]
    5 root     -16    -     0K    16K pftm    0   0:17   0.00% [pf purge]
   12 root     -60    -     0K   224K WAIT    0   0:17   0.00% [intr{swi4: clock}]
17935 root      20    0 12456K  2176K select  1   0:07   0.00% /usr/local/sbin/apinger -c /var/etc/apinge
    4 root     -16    -     0K    32K -       0   0:07   0.00% [cam{scanner}]
   15 root     -16    -     0K    16K -       0   0:05   0.00% [rand_harvestq]
26910 unbound   20    0 55212K 23028K kqread  0   0:04   0.00% /usr/local/sbin/unbound -c /var/unbound/un
   12 root     -88    -     0K   224K WAIT    1   0:03   0.00% [intr{irq17: uhci1 uhc}]
38467 root      52   20 17136K  2424K wait    1   0:03   0.00% /bin/sh /var/db/rrd/updaterrd.sh
52536 root      20    0 21156K  4508K select  0   0:02   0.00% /usr/local/sbin/miniupnpd -f /var/etc/mini
   20 root      16    -     0K    16K syncer  0   0:02   0.00% [syncer]
33058 dhcpd     20    0 24844K 13124K select  1   0:02   0.00% /usr/local/sbin/dhcpd -user dhcpd -group _
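
For anyone wanting to double-check what polling/offload state the NICs are actually in, the interface flags can be read straight from a shell on the box. This is just a quick sketch using the em1/em2 names visible in the top output above; I'm assuming the GUI checkboxes end up toggling the usual FreeBSD ifconfig capability flags, which I haven't verified on this pfSense version.

Code:
# show the active capability flags (look for POLLING, RXCSUM, TXCSUM, TSO4, etc.
# on the "options=" line); em1/em2 are the interface names from the top output
ifconfig em1 | grep -i options
ifconfig em2 | grep -i options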
Adding to the original post:

Here's what vmstat -i says with polling off.

Code:
$ vmstat -i
interrupt                          total       rate
irq17: uhci1 uhci4+               140003          1
irq20: hpet0                    93869792       1126
irq257: em1                      7254111         87
irq258: em2                      8261133         99
Total                          109525039       1314
And with polling on:

Code:
$ vmstat -i
interrupt                          total       rate
irq17: uhci1 uhci4+                 3713          6
irq20: hpet0                      610265       1125
irq257: em1                          414          0
irq258: em2                          583          1
Total                             614975       1134
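
As far as I know, the "rate" column in vmstat -i is an average since boot rather than an instantaneous figure, so right after toggling polling it can lag reality. A rough sketch for eyeballing the current rate is to diff two snapshots taken a known interval apart:

Code:
# take two snapshots ten seconds apart; the per-IRQ deltas divided by 10
# give the current interrupt rate, independent of the since-boot average
vmstat -i > /tmp/irq.before
sleep 10
vmstat -i > /tmp/irq.after
diff /tmp/irq.before /tmp/irq.after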
 

Danic

Member
Mostly I've read that polling should be avoided and isn't supported by all drivers. Maybe vmstat -i could show which driver is generating all the interrupts? This thread may shed some light on the issue.
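
If you want to confirm whether the em driver on that box even advertises polling support, ifconfig can list the supported capabilities. A minimal sketch below, assuming the interface name from your top output; POLLING only shows up on kernels built with DEVICE_POLLING.

Code:
# -m lists supported media and capabilities; compare the "capabilities="
# line (what the driver can do) against "options=" (what is enabled now)
ifconfig -m em1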
 

gigatexal

I'm here to learn
Checking that thread out now. Here's what vmstat -i says with polling off.

Code:
$ vmstat -i
interrupt                          total       rate
irq17: uhci1 uhci4+               140003          1
irq20: hpet0                    93869792       1126
irq257: em1                      7254111         87
irq258: em2                      8261133         99
Total                          109525039       1314
And with polling on:

Code:
$ vmstat -i
interrupt                          total       rate
irq17: uhci1 uhci4+                 3713          6
irq20: hpet0                      610265       1125
irq257: em1                          414          0
irq258: em2                          583          1
Total                             614975       1134
 

Danic

Member
Look into the CPU power states. I had a Core 2 Quad that had serious timekeeping issues when running powerd (in my case, the CPU governor in Linux) while the CPU power-saving features were disabled in the BIOS. Time issues = latency issues? Also, you may not notice time drift because of the NTP server/client in pfSense.

Just for humor and history: when I was having the time issues, my COD4 server would run like the Matrix lobby scene, all the gunfire in slow motion with random speed-ups. Network file transfers would also send data 'faster' than gigabit; I thought I had some amazing data compression going on, but nope, it was all because the box couldn't keep time. The end solution for me was to enable the BIOS CPU power-saving features (SpeedStep and C-states) and force the CPU governor to max performance.
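
If you want to see what the box is actually doing power-wise before touching the BIOS, these read-only sysctls are a reasonable starting point on FreeBSD/pfSense. The names assume the standard cpufreq/ACPI drivers are loaded; nothing here changes state.

Code:
# current CPU frequency and the available P-states
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels
# C-states the CPU offers and how often each one is actually used
sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_usage
# deepest C-state the OS is allowed to enter
sysctl hw.acpi.cpu.cx_lowest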
 

TuxDude

Well-Known Member
My familiarity with BSD is nowhere near where it is with Linux, but I highly suspect that lots of interrupts from HPET are perfectly normal. HPET interrupts drive low-level kernel timekeeping, process accounting, etc., and having 1,000 or more per second is not unusual.
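
For what it's worth, you can check which hardware the kernel picked for its tick and how fast it is ticking; the ~1,100/s hpet0 rate roughly lines up with the usual kern.hz default of 1000. A quick read-only check, using sysctl names from stock FreeBSD which pfSense should inherit:

Code:
# which event timer and timecounter the kernel is using, and the tick rate
sysctl kern.eventtimer.timer kern.eventtimer.choice
sysctl kern.timecounter.hardware kern.hz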
 

TuxDude

Well-Known Member
Does it? The only shot of 'top' output you posted looks to me like the CPU is spending its time polling the NICs, which is unrelated to HPET interrupts, and it is from when you said you "enabled device polling but disabled all the optional offloading". At least, I'm assuming that was the active configuration during that 'top' capture, and it makes sense to me: using polling on the NICs means the OS is constantly asking the NIC "do you have any packets for me?" over and over and over, which would cause high CPU usage (related to the idlepoll task? my lack of BSD familiarity is making me guess at things here).

When you disabled polling you only showed before/after 'vmstat -i' output and not 'top' output, but we can see that the rate of HPET interrupts is virtually identical in either config, while with polling disabled the NICs are now sending some interrupts as well. With polling disabled the OS is no longer repeating "any packets yet?" over and over and is actually idle (the idle{idle: cpuX} tasks in 'top', though again I'm not familiar with BSD process/idle-time accounting), and when a NIC does receive a packet it sends an interrupt to let the OS know.

As a general best practice, I would recommend keeping polling disabled and enabling as much optional offload as you can (with the caveat that some NICs/drivers/firmwares don't implement them all, or have bugs and are unstable with certain options enabled).
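
If it helps, here's roughly how I'd flip those settings from a shell for a quick test before committing to them in the GUI. The interface name and the one-offload-at-a-time approach are just my assumptions, and changes made with ifconfig don't survive a reboot or an interface reconfigure:

Code:
# turn polling off (only meaningful on a DEVICE_POLLING kernel)
ifconfig em1 -polling
# enable offloads one at a time and watch for problems before adding the next
ifconfig em1 rxcsum txcsum
ifconfig em1 tso4
# confirm what actually took effect
ifconfig em1 | grep -i options
# if the link misbehaves, back the last change out, e.g. ifconfig em1 -tso4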
 

gigatexal

I'm here to learn
This is what was worrying:


   11 root     155 ki31     0K    32K CPU0    0  66.8H 100.00% [idle{idle: cpu0}]
   11 root     155 ki31     0K    32K RUN     1  25.4H 100.00% [idle{idle: cpu1}]


Code:
last pid: 24215;  load averages:  0.00,  0.00,  0.00  up 3+10:05:46    09:16:09
130 processes: 3 running, 113 sleeping, 14 waiting

Mem: 17M Active, 119M Inact, 137M Wired, 415M Buf, 3146M Free
Swap: 8192M Total, 8192M Free


  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root     155 ki31     0K    32K CPU0    0  66.8H 100.00% [idle{idle: cpu0}]
   11 root     155 ki31     0K    32K RUN     1  25.4H 100.00% [idle{idle: cpu1}]
14617 root      21    0   223M 36936K piperd  1   0:00   0.10% php-fpm: pool lighty (php-fpm)
    9 root     -16 ki-1     0K    16K pollid  0  71.7H   0.00% [idlepoll]
   12 root     -60    -     0K   224K WAIT    0   0:52   0.00% [intr{swi4: clock}]
    5 root     -16    -     0K    16K pftm    0   0:41   0.00% [pf purge]
    0 root     -16    0     0K   256K swapin  0   0:40   0.00% [kernel{swapper}]
26696 root      20    0 12456K  2176K select  1   0:18   0.00% /usr/local/sbin/apinger -c /var/etc/apinge
33916 unbound   20    0 87980K 57644K kqread  0   0:17   0.00% /usr/local/sbin/unbound -c /var/unbound/un
    0 root     -92    0     0K   256K -       0   0:12   0.00% [kernel{em2 que}]
   15 root     -16    -     0K    16K -       0   0:11   0.00% [rand_harvestq]
   12 root     -88    -     0K   224K WAIT    0   0:10   0.00% [intr{irq17: uhci1 uhc}]
47026 root      52   20 17136K  2424K wait    1   0:09   0.00% /bin/sh /var/db/rrd/updaterrd.sh
59223 root      20    0 21156K  4496K select  0   0:07   0.00% /usr/local/sbin/miniupnpd -f /var/etc/mini
    0 root     -92    0     0K   256K -       1   0:07   0.00% [kernel{em1 que}]
    4 root     -16    -     0K    32K -       0   0:07   0.00% [cam{scanner}]
   20 root      16    -     0K    16K syncer  0   0:07   0.00% [syncer]
40985 dhcpd     20    0 24844K 13124K select  0   0:06   0.00% /usr/local/sbin/dhcpd -user dhcpd -group _
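
A couple of other read-only views worth watching alongside this, both standard FreeBSD tools so they should be on a stock pfSense install:

Code:
# per-CPU usage summary instead of the combined line
top -P
# one-second samples; the "id" column on the far right is idle percentage
vmstat 1 5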
 

gigatexal

I'm here to learn
Both NICs are Intel. It's fine now: core temps are much more normal, and the box is silent and performant for my use case. I would like the interrupt counts to be lower, but it is what it is. I'll probably have some more time to tinker with it this weekend, but the internet is so vital to my house that I have to schedule some downtime, lol.
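
If the steady hpet0 rate ever becomes worth chasing, the only knob I'm aware of is the kernel tick rate. Treat this as a hypothetical tweak rather than a recommendation, since it trades timer resolution for fewer timer interrupts, and on pfSense custom loader tunables are usually kept in /boot/loader.conf.local so they survive config regeneration:

Code:
# lower the tick rate from the default 1000 (takes effect after a reboot)
echo 'kern.hz=250' >> /boot/loader.conf.local
# verify afterwards
sysctl kern.hz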