pfSense on Physical Switch


SycoPath

Active Member
Oct 8, 2014
I've been running virtualized pfSense for years now under ESXi with a layer 3 switch. I've been thinking, and I did some googling, but I don't see any switch hardware that will run pfSense as its OS. There are some gateway-style appliances, and I could whitebox it with one of the mini PCs from the great deals thread, but that would just be another power draw. Why is there no switching hardware that runs pfSense as its OS? It seems like it would be massively powerful and solve a lot of problems. I'd love to install pfSense on my Brocade ICX6610-48P hardware. Why has no company done this yet? Is there some security best practice I'm missing? It seems like it would save cost, power draw, and rack space in an enterprise environment. Or am I missing the most obvious answer: that everyone just uses the built-in routing OS that comes with their hardware?
 

fohdeesha

Kaini Industries
Nov 20, 2016
Because switches operate fundamentally differently. Their "routing OS" running on the management CPU (the actual "computer" part of the switch you want to run pfSense on) doesn't actually do any of the routing. It simply programs routes etc. into an ASIC, a highly specialized piece of silicon that actually does all the routing, switching, ACLs, etc. This is massively different from pfSense (and any x86 firewall distro), which performs traffic alteration right in the OS. For this reason, adding new features to a switch like NAT and other specialized traffic modification generally involves spinning up entirely new ASIC hardware.

Even if you could rewrite the entirety of pfSense to no longer alter traffic in the x86 domain but instead program the needed routes, firewall rules, etc. into an ASIC, you'd have very slim pickings: it's pretty rare for these devices to support layer 4/5/6+ features like NAT and everything else firewalls get used for, because it's extremely costly to an ASIC's throughput/PPS. This is why layer 3 switches / hardware routers and "firewalls" have always been two distinct markets.
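
To make that split concrete, here's a toy Python sketch (nothing in it is a real ASIC SDK; the class and function names are made up purely for illustration). The control plane writes an entry into a hardware table once and the silicon forwards every later packet on its own, while a software firewall has to burn CPU cycles on every single packet:

Code:
class FakeAsicTable:
    """Stand-in for the TCAM/route tables inside the switching silicon."""
    def __init__(self):
        self.entries = {}  # prefix -> egress port

    def program_route(self, prefix, egress_port):
        # Control-plane work: happens once per route change, not per packet.
        self.entries[prefix] = egress_port

    def forward(self, dst_prefix):
        # Data-plane work: in real hardware this lookup happens in silicon
        # at line rate, with no involvement from the management CPU.
        return self.entries.get(dst_prefix, "drop")

def software_firewall_forward(packet, rules):
    # pfSense-style model: the OS touches every packet and evaluates rules
    # in software, so throughput is bounded by CPU cycles per packet.
    for match, action in rules:
        if match(packet):
            return action
    return "drop"

asic = FakeAsicTable()
asic.program_route("10.0.0.0/24", egress_port=7)         # done once
print(asic.forward("10.0.0.0/24"))                        # repeated per packet, in hardware

rules = [(lambda p: p["dport"] == 443, "pass")]
print(software_firewall_forward({"dport": 443}, rules))   # repeated per packet, on the CPU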
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
fohdeesha said:
Because switches operate fundamentally differently. Their "routing OS" running on the management CPU doesn't actually do any of the routing; it simply programs routes into an ASIC, which does all the routing, switching, and ACLs in silicon. [...]
Just curious: how does Fortigate's "hardware acceleration" relate to the architecture you're describing?
Hardware acceleration overview
 

SycoPath

Active Member
Oct 8, 2014
fohdeesha said:
Because switches operate fundamentally differently. Their "routing OS" running on the management CPU doesn't actually do any of the routing; it simply programs routes into an ASIC, which does all the routing, switching, and ACLs in silicon. [...]
I knew about the switching ASICs, but I don't see how it would be that costly. Everything could be electronically identical. Most of the specialty routing, VPN termination, NAT, etc. is already handled in the switch's CPU, which is usually connected to the ASIC via some internal Ethernet connection anyway. I don't see why some BSD drivers to control the ASIC couldn't be added to pass configuration data to the existing ASIC, with pfSense just running on the switch CPU. Many modern architectures are just off-the-shelf FPGA chips with super secret vendor firmware anyway. The only real reason I can think of is that vendors want to guard their trade secrets, won't provide any usable API to the ASIC, and will sue anyone who attempts to reverse engineer it out of existence.
 

SycoPath

Active Member
Oct 8, 2014
BoredSysadmin said:
Just curious: how does Fortigate's "hardware acceleration" relate to the architecture you're describing?
Hardware acceleration overview
From what I understand, it's essentially like how Intel CPUs handle AES offloading: there are hardware instructions built in that simply get called and passed data, instead of using many sequential general-purpose instructions to get the same result. Think of it like taking the back roads to another city versus taking a purpose-built highway with crazy high speed limits that only goes to one place.
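
A loose software-only analogy for that idea (this is not AES-NI itself, just "hand the work to one specialized fast path instead of many generic steps"):

Code:
import timeit

data = list(range(100_000))

def general_purpose_path():
    total = 0
    for x in data:      # many small generic operations, the "back roads"
        total += x
    return total

def offloaded_path():
    return sum(data)    # one call into an optimized, specialized routine

print("generic loop :", timeit.timeit(general_purpose_path, number=100))
print("one fast call:", timeit.timeit(offloaded_path, number=100))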
 

fohdeesha

Kaini Industries
Nov 20, 2016
SycoPath said:
I knew about the switching ASICs, but I don't see how it would be that costly. Most of the specialty routing, VPN termination, NAT, etc. is already handled in the switch's CPU, which is usually connected to the ASIC via some internal Ethernet connection anyway. [...]
Specialty routing, VPN, and tunnels are not handled in the CPU on most switches; they're handled in the ASIC. Doing it on the CPU would be incredibly slow: for one, most of these CPUs are very underpowered ARM devices, since they're only meant for supervisory control, and the majority of switches only have a 100 Mbps (or at best 1 Gbps) lane connecting the management CPU to the ASIC.

As I stated, even if you could rewrite pfSense to control an ASIC (and trust me, it would be an entire rewrite; the name pfSense comes from doing firewalling with PF under FreeBSD, and you would be totally removing and replacing that code stack), you still would not be able to do the things you're asking, because fast switching silicon can't manipulate packets beyond layer 3/4, which is what's required for NAT, IDS, complex firewalling, etc. This is why even white box switching software like Cumulus doesn't have this support: there's maybe a tiny handful of ASIC hardware out there that can do it, and it is incredibly expensive (layer 4/5/6+ packet manipulation at billions of PPS is not cheap). If what you're asking for were that easy, companies like Cisco would not have to spin up two different silicon lines for their routing/switching hardware and their ASA firewall line. Doing it in one would save them millions of dollars, but the hardware requirements are too vastly different to get away with it without majorly crippling one or the other. Same with Brocade and the FastIron vs. ADX lines, or Juniper and the EX vs. SRX lines: vastly different silicon. I'm not sure what FPGA hardware you're talking about (maybe in some firewalls for specialty encryption tasks), but an FPGA would never be fast enough for the hundred-plus gigabit per second packet manipulation that even cheap switches support.
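
To see why NAT in particular is awkward for a fixed forwarding pipeline, here's a toy source-NAT sketch in Python (no real packets or APIs, just the logic): every packet needs a stateful table lookup plus L3/L4 header rewrites, and reply traffic needs the reverse translation, which is exactly the kind of per-packet modification an L2/L3 pipeline isn't built for.

Code:
PUBLIC_IP = "203.0.113.1"
nat_table = {}        # (private_ip, private_port) -> public_port
reverse_table = {}    # public_port -> (private_ip, private_port)
next_public_port = 40000

def snat_outbound(pkt):
    # New flows allocate state; every packet gets its L3/L4 headers rewritten.
    global next_public_port
    key = (pkt["src_ip"], pkt["src_port"])
    if key not in nat_table:
        nat_table[key] = next_public_port
        reverse_table[next_public_port] = key
        next_public_port += 1
    pkt["src_ip"] = PUBLIC_IP          # rewrite L3 header
    pkt["src_port"] = nat_table[key]   # rewrite L4 header (real NAT also fixes checksums)
    return pkt

def snat_inbound(pkt):
    # Replies need the reverse translation, which means shared per-flow state.
    orig_ip, orig_port = reverse_table[pkt["dst_port"]]
    pkt["dst_ip"], pkt["dst_port"] = orig_ip, orig_port
    return pkt

out = snat_outbound({"src_ip": "192.168.1.10", "src_port": 51515,
                     "dst_ip": "198.51.100.5", "dst_port": 443})
print(out)
print(snat_inbound({"src_ip": "198.51.100.5", "src_port": 443,
                    "dst_ip": PUBLIC_IP, "dst_port": out["src_port"]}))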

And regarding ASIC APIs, the big ones are very open. Broadcom (who does the switching silicon for at least half of the major switching vendors out right now) has theirs fully available on GitHub; this is part of what Cumulus uses, for example. You can look at the API calls and silicon specifications and see pretty clearly that there's no way to make them start doing things like NAT, which involves modifying packet headers beyond the initial specification.

This question gets asked a lot on the various white box software forums & mailing lists: Cumulus Linux support for NAT | Cumulus Networks community
 

sean

Member
Sep 26, 2013
In addition to the ASIC APIs, there's the Linux kernel's switchdev. It allows the kernel to offload L2 and L3 forwarding (and only those; note there's no mention of protocols higher in the stack) to a switch ASIC. This would probably get you the closest to what you want, but the L4+ processing still has to happen on the CPU. I recall the 6610 uses an MPC8541-type processor. You can look up how "fast" it is and guess whether it could even do 1 gigabit of plain routing.
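
Rough numbers on that, assuming a clock somewhere around 800 MHz (about right for that class of PowerPC part, though I haven't checked the 6610's exact speed) and a very generous budget that ignores on-wire overhead, interrupts, and memory latency:

Code:
CLOCK_HZ = 800e6   # assumed management-CPU clock
LINK_BPS = 1e9     # 1 gigabit

for frame_bytes in (1500, 64):
    pps = LINK_BPS / (frame_bytes * 8)   # packets per second at line rate
    cycles_per_pkt = CLOCK_HZ / pps      # CPU cycle budget per packet
    print(f"{frame_bytes:>4}B frames: {pps:>12,.0f} pps, {cycles_per_pkt:>8,.0f} cycles/packet")

# ~9,600 cycles per 1500-byte packet might be enough for plain forwarding;
# ~400 cycles per 64-byte packet is hopeless once a full OS network stack
# (never mind pf) sits in the path.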
 

oddball

Active Member
May 18, 2018
You can kinda do this with Juniper hardware.

In most cases Juniper gear is merchant silicon with an x86 control plane and a virtualized switch/firewall/router environment. All of these run Wind River Linux as the hypervisor with special passthroughs.

They have the NFX platform (note: the NFX250 is a SWEET device) that exposes all of this. You get 2x 10GbE ports and 12x 1GbE ports, with Linux as the hypervisor and Junos running in a Docker container. You can load pfSense as a VM and, using some VLAN glue magic, route your traffic through the VMs.

We talked to the pfSense people about running TNSR on this platform, and they said firewalling 20GbE full duplex wouldn't be an issue. Licensing was cheap, something like $900/yr for a 10G license.

You can even chain VMs through VLANs, so maybe you have an OpenBSD router that passes traffic internally to a pfSense machine before hitting the switch ports.

The NFX supports any KVM VM; we have a Palo Alto VM running that passes traffic to Junos and back to our core. You can chain anything together that you want.

In theory you can run pfSense on an Arista switch in a VM as well. The issue is that the CPU doesn't have enough horsepower to firewall at the 10/40/100GbE rates the ports run at. There are also weird quirks about giving a VM on the switch access to the switch itself. You can get 1GbE very easily, but anything more and you need tricks.

If you're really serious about this, I'd look at a Juniper or Arista switch. Research like crazy the chipset, the CPU socket, and the capabilities of the backplane. Then purchase the switch, rip out the CPU, upgrade to the fastest possible, and boost the RAM. From there, load pfSense into a VM on the switch itself. With a little magic it wouldn't surprise me if you could get 2-3 Gbps of firewall capability. Or... just buy an NFX250 and get 20GbE of capability out of the box.
 

SycoPath

Active Member
Oct 8, 2014
I didn't realize the vendors had pushed so much into the ASIC these days. I just did a lot of reading and it's really impressive how much work has gone into squeezing out every tiny bit of performance possible. I still don't see why you would need much horsepower in the CPU, or why a product like this doesn't exist already. Wire-speed routing of billions of packets per second would obviously take insane horsepower if done strictly on the CPU.

My question is still: why force 100% of all traffic through the CPU? Looking at this from an engineering standpoint, I don't see why the ASIC can't stay nearly identical to what it is and only push the selective packets that need CPU interaction up to the CPU, with all other packets just passing through the ASIC as they would in a standard switch. The ASIC would need a hardware path designed to funnel this traffic to the CPU, and it would be limited to the link speed from ASIC to CPU, but it could still be very fast. There's no reason a 100GbE port couldn't go to the CPU, a direct PCIe bus, or something akin to Intel's QPI links. A multi-core ARM CPU with some extra ASICs or an FPGA built into the die could do some amazing stuff here. I just don't understand why no companies have really done this yet. I'm no chip engineer, and I'm glossing over a good chunk of the research and work that would be required, but it seems entirely doable.

Imagine having 48 GbE PoE ports and 4-8 100GbE ports on a single 1U device that did all of your routing and firewalling and could be stacked with more switches if you needed more ports. That would be a huge top-of-rack solution and would solve a real need, especially in medium-size organizations and satellite offices. I can see workloads where this wouldn't be feasible: if, for example, you expected the vast majority of your traffic to hit the CPU instead of the ASIC, you would need real big-iron firewalls, but that seems like it should be the niche, not the mainstream. Granted, pfSense is probably a bad example because of its specific architecture, but I was going more for the concept; not "why doesn't pfSense run on my switch," more "why doesn't this product category exist?"
 

oddball

Active Member
May 18, 2018
ASICs are expensive and have long development cycles.

Multi-core CPUs are cheap and plentiful. The big movement today is toward DPDK and VPP. On pfSense, OpenBSD, iptables, etc., packet processing is essentially single-threaded: a stream of packets comes in, gets operated on, and is shipped off. With that you're limited to a few Gbps, due to CPU speed.

The alternative is DPDK or VPP, where multiple streams are processed at once, or a stream is broken into pieces and processed on different threads. Juniper advertises 100G firewalling with a 17-core machine and 32-64GB of RAM. Cisco supposedly demoed routing 1Tbps with VPP and DPDK. This is 100% the future. It's a LOT cheaper and it's scalable: since everything is a VM, you just reboot with a higher CPU allocation if you want more speed.
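
The core idea is just spreading flows across worker cores. A minimal Python sketch of the dispatch logic (real DPDK/VPP pins poll-mode worker threads to cores and does this in C; this only shows the flow-hashing idea):

Code:
from collections import defaultdict

NUM_WORKERS = 4

def flow_hash(pkt):
    # Hash the 5-tuple so every packet of a flow lands on the same worker,
    # keeping per-flow state local and avoiding cross-core locking.
    return hash((pkt["src_ip"], pkt["dst_ip"],
                 pkt["src_port"], pkt["dst_port"], pkt["proto"])) % NUM_WORKERS

queues = defaultdict(list)
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "8.8.8.8", "src_port": 12345, "dst_port": 443, "proto": "tcp"},
    {"src_ip": "10.0.0.2", "dst_ip": "1.1.1.1", "src_port": 23456, "dst_port": 53,  "proto": "udp"},
    {"src_ip": "10.0.0.1", "dst_ip": "8.8.8.8", "src_port": 12345, "dst_port": 443, "proto": "tcp"},
]

for pkt in packets:
    queues[flow_hash(pkt)].append(pkt)   # each worker core polls its own queue

for worker, q in sorted(queues.items()):
    print(f"worker {worker}: {len(q)} packet(s)")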

Look at the Juniper vSRX datasheet (here). With 2 vCPUs and 4GB of RAM, an x86 vSRX on KVM can process 14Gbps (max) and 3Gbps (IMIX). As a comparison, Juniper's SRX345 is a custom-built Octeon with eight cores and 4GB of RAM, and it can handle 5Gbps at its absolute best, more like 1.5Gbps IMIX.

At 17 cores and 64GB of RAM, Juniper claims 98Gbps (max, 27Gbps IMIX) of firewalling capacity. The only difference between the 14Gbps and 98Gbps figures is the number of cores and the RAM allocated to the VM; it's nearly perfectly scalable.
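
Quick back-of-envelope on those quoted figures (datasheet numbers as cited here, not independently verified):

Code:
for label, cores, gbps in [("2 vCPU", 2, 14), ("17 vCPU", 17, 98)]:
    print(f"{label}: {gbps / cores:.1f} Gbps per core (max)")
# Roughly 7 vs 5.8 Gbps per core: close to linear, which is the whole point.
# Give the VM more cores and RAM and the firewall throughput follows.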

Almost everyone is going this route. The new Firepower stuff is a virtual machine in a container, like Juniper's SRX. Palo Alto uses some custom ASICs, but their VM-Series gets higher throughput with fewer cores by utilizing DPDK. And of course Netgate (the pfSense folks) built TNSR on VPP/DPDK rather than the original pfSense stack, and they hit some crazy metrics too. In theory you can replicate the TNSR stuff on your own; their CTO has mentioned on a few forums that 100% of their tech is open source and available, but you need to know where to look and understand how to hook it up. The new secret sauce is making it all work together and making it easy to configure.

Fortinet is the only vendor still pouring money into custom ASIC design. Outside of Fortinet, everyone else has realized it's an x86 with a merchant-silicon fabric and a custom OS running in a container.

Link to the Cisco reference: Can a BSD system replicate the performance of high-end router appliance? : networking
 

oddball

Active Member
May 18, 2018
One other point: I think the real reason no one combines these things is that if a badly configured firewall pegs the control plane at 100%, the switch itself could stop passing traffic.

Switching traffic and firewalling are two completely different things. Switches are hardware; the x86 is just the control plane for configuration and L3 functionality. Most packets just forward straight off the ASIC. A firewall needs to inspect the packet, so on a switch you'd have to have some mechanism to push data off the data plane into the control plane for inspection. This is why the Arista VMs can't talk to the switch itself: there isn't a bridge between the data and control planes for traffic.

I would say these devices do exist. Juniper's branch SRX firewalls have 16x GbE ports that act as switch ports but are also firewalled. The same is true for Palo Alto and Fortinet. The use case is a branch office with a few devices that can plug directly into the firewall.
 

blinkenlights

Active Member
May 24, 2019
oddball said:
Most packets just forward straight off the ASIC. A firewall needs to inspect the packet, so on a switch you'd have to have some mechanism to push data off the data plane into the control plane for inspection.
Right. Unless I'm mistaken, even the most impressive soft-switch/router solutions (thanks for mentioning TNSR) are still store-and-forward, introducing higher latency than a switch doing cut-through (partial-frame) forwarding.
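
For a sense of scale, the store-and-forward penalty per hop is at least one full serialization delay of the frame, since the whole frame has to be received before transmission starts (quick Python arithmetic):

Code:
def serialization_us(frame_bytes, link_gbps):
    # Time to clock the whole frame onto the wire, in microseconds.
    return frame_bytes * 8 / (link_gbps * 1e3)

for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gbps: 1500B frame adds {serialization_us(1500, gbps):6.2f} us, "
          f"64B frame adds {serialization_us(64, gbps):6.3f} us")
# Roughly 1.2 us extra per hop for a full-size frame at 10 Gbps: small in
# absolute terms, but it's why latency-sensitive fabrics prefer cut-through.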

As suggested, there are a number of white box SDN/NFV forums where you can learn about soft networking sorcery.