Firewall throughput measurements: OPNsense on APU4d4, OPNsense in a Proxmox VM, and OpenWRT on Turris Omnia

Why

For a few weeks, I have been struggling to get good throughput out of OPNsense on my low-power test box, an APU4d4. While OPNsense is very well done from a firewall rules management point of view (although I am not happy that forwarding rules cannot specify both incoming and outgoing interfaces, as is possible with Linux Netfilter…) and has many features of expensive firewall products (including web interface based management for clustering/failover), the FreeBSD/HardenedBSD kernel seems to be struggling with higher throughputs. After making no progress through simple trial and error with various settings gathered from different howto guides, I decided to first measure my current status properly.

In the last ca. 10 years, I have been running my home lab setup with OpenWRT based routers (for a long time on a Mikrotik RB2011, which is extremely power efficient for what it can do), more recently on a Turris Omnia for the automatic updates coupled with maximum flexibility (and the snapshot features are really well integrated). However, for teaching our course on “Network Security” at the Institute of Networks and Security at JKU Linz, we decided to use OPNsense because it comes with an easy-to-understand web interface and is open source. A direct comparison therefore seems useful.

All systems under test have a roughly equal IP (v4 and v6) and firewall rules configuration. For completeness, I compare the OPNsense installation on the APU4d4 to a similarly configured OPNsense instance inside a VM on my Proxmox host.

How

My setup is pretty simple: a Proxmox server hosting a small number of VMs that are connected to a DMZ VLAN, attached through a Linux host bridge that connects virtio network interfaces for the VMs with a tagged VLAN on the hardware NIC as a trunk to the local Ethernet switch. On the same switch, I have a desktop connected through a 1Gbps link. The switch is configured as a pure L2 switch (with multiple VLANs), all routing is done through the firewalls under test. One VM on the Proxmox host runs the iperf3 service, the host itself (on a different VLAN) as well as the separate desktop run iperf3 clients.

The three systems to compare are:

| | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| CPU | Marvell Armada 380/385 | AMD GX-412TC | Intel Celeron G3900 |
| CPU speed | 1.6 GHz | 1 GHz (1.4 GHz boost) | 2.8 GHz |
| CPU cores | 2 | 4 | 2 |
| RAM | 2GB | 4GB | 4GB |
| NIC | builtin | Intel i211AT | virtio (vhost_net) / Intel 82599 |
| OS version | 5.1.10 (Linux kernel 4.14.222) | 21.1.4 (FreeBSD 12.1-RELEASE-p15-HBSD) | OPNsense 21.1 (FreeBSD 12.1-RELEASE-p12-HBSD) |
| Power usage (under load) | 14-16W | 9-14W | marginal (the host is running anyways) |

OPNsense on the APU4d4 has the recommended settings from here and here applied. OPNsense inside the VM has NIC hardware offload features disabled, and the VM is configured with the recommended settings from here, with all VLANs terminated on the Linux host kernel side and bridged into the VM as independent virtual network interfaces (the consensus seems to be that VLAN tag handling is faster on Linux than on BSD in such a virtualized setting).

Results

First I took a baseline measurement with an iperf3 client running on the VM host itself, connecting to an iperf3 server running within a Debian 10 VM, without any of these test systems in the routing path: just a virtio network connection on a single VLAN / IP subnet. This baseline was CPU bound, as my Proxmox host (with a low-power CPU) ran at around 90% load over both cores during the test.

All measurements were taken with iperf3 in TCP mode with 1 or 4 parallel streams and 20 repetitions:

iperf3 -c <server IP> -P <number of streams> -t 20
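To collect such numbers without copying them from the terminal, iperf3 can emit its report as JSON with the `-J` flag. The following is a minimal sketch, not the script used for the measurements in this post. The function names `run_iperf3` and `summarize` are made up for illustration, and reading the receiver-side throughput is an assumption about which figure one wants to report. It assumes a TCP test, where the `end` section of the report contains `sum_sent` and `sum_received`, and where the `retransmits` counter is only present on platforms that expose it.

```python
import json
import subprocess
from typing import Optional, Tuple


def run_iperf3(server: str, streams: int = 1, seconds: int = 20, ipv6: bool = False) -> dict:
    """Run one iperf3 TCP test against `server` and return the parsed JSON report."""
    cmd = ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]
    if ipv6:
        cmd.append("-6")  # force IPv6
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)


def summarize(report: dict) -> Tuple[float, Optional[int]]:
    """Return (receiver throughput in Mbps, sender retransmits or None)."""
    end = report["end"]
    mbps = end["sum_received"]["bits_per_second"] / 1e6
    retransmits = end["sum_sent"].get("retransmits")  # missing on some platforms
    return round(mbps, 1), retransmits
```

A run with four streams would then be `summarize(run_iperf3("<server IP>", streams=4))`, with the server address filled in as in the command above.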

| Baseline | Average throughput (retry packets) |
|---|---|
| IPv4 1 stream | 4.47 Gbps (273 retries) |
| IPv6 1 stream | 4.25 Gbps (229 retries) |
| IPv4 4 streams | 4.45 Gbps (4233 retries) |
| IPv6 4 streams | 4.48 Gbps (5247 retries) |

Measuring from the VM host to the VM (but on different VLANs, forcing traffic to be routed through the firewall under test), first without IPsec active (all transfer rates in Mbps):

| VM->VM no IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 695 (1485 retr) | 493 (1095 retr) | 1090 (148 retr) |
| IPv6 1 stream | 422 (1327 retr) | 341 (719 retr) | 714 (156 retr) |
| IPv4 4 streams | 732 (10981 retr) | 736 (12236 retr) | 1140 (1993 retr) |
| IPv6 4 streams | 415 (6570 retr) | 629 (10386 retr) | 793 (997 retr) |

Note that the VM->VM measurements, when going through the VM OPNsense instance, are all on the same physical host and therefore not bound by any hardware network limits, but only CPU and efficiency of the 3 network stacks involved. It’s interesting that IPv6 traffic was faster than IPv4 in this case.

Then with two IPsec tunnels to external sites configured and up/routed, but with the traffic under test explicitly not being routed through the tunnels. That is, those tunnel policies are loaded in the kernel, but the test traffic should not be matched by those policies. As we can see in the results, there is however a very clear performance impact on OPNsense when we hit CPU limits (I have only measured 4 streams, as we already know that single stream performance is limited on OPNsense with low power CPUs):

| VM->VM with IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 689 (10369 retr) | 551 (9520 retr) | 869 (1526 retr) |
| IPv6 4 streams | 405 (5854 retr) | 413 (4911 retr) | 799 (430 retr) |
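The size of the IPsec effect is easier to judge as a relative drop. The short sketch below only redoes arithmetic on the 4-stream VM->VM figures from the two tables above; the names `RESULTS` and `relative_drop` are introduced here for illustration.

```python
# 4-stream throughput in Mbps, copied from the two VM->VM tables:
# (without IPsec tunnels loaded, with IPsec tunnels loaded)
RESULTS = {
    ("Turris Omnia", "IPv4"): (732, 689),
    ("Turris Omnia", "IPv6"): (415, 405),
    ("APU4d4 OPNsense", "IPv4"): (736, 551),
    ("APU4d4 OPNsense", "IPv6"): (629, 413),
    ("VM OPNsense", "IPv4"): (1140, 869),
    ("VM OPNsense", "IPv6"): (793, 799),
}


def relative_drop(before: float, after: float) -> float:
    """Throughput lost relative to the run without IPsec, in percent."""
    return (before - after) / before * 100.0


for (system, family), (no_ipsec, with_ipsec) in RESULTS.items():
    print(f"{system:16} {family}: {relative_drop(no_ipsec, with_ipsec):5.1f} %")
```

On these figures, OPNsense on the APU4d4 loses roughly a quarter of its throughput for IPv4 and roughly a third for IPv6, while the Turris Omnia stays within about 6 percent in both cases.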

Repeating the measurements from a physically separate client, with the test traffic going through a physical switch to the (physical or virtual) firewall under test, then (for the two physical firewalls, not the VM one) through the same switch (different VLAN) and to the Proxmox host, where the VLAN tagged traffic is bridged into the VM running the iperf3 server:

| VM->VM no IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 771 (669 retr) | 525 (62 retr) | 737 (15 retr) |
| IPv6 1 stream | 435 (586 retr) | 401 (38 retr) | 653 (9 retr) |
| IPv4 4 streams | 777 (1805 retr) | 867 (302 retr) | 784 (661 retr) |
| IPv6 4 streams | 413 (873 retr) | 663 (662 retr) | 699 (513 retr) |

| VM->VM with IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 737 (1577 retr) | 562 (304 retr) | 710 (505 retr) |
| IPv6 4 streams | 402 (1584 retr) | 585 (206 retr) | 728 (220 retr) |
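One way to read the single-stream numbers is to express them as a share of what the same firewall manages with four streams. The sketch below does this for the measurements from the physically separate client without IPsec; it is plain arithmetic on the figures above, and `CLIENT_RESULTS` and `single_stream_share` are names chosen here for illustration.

```python
# Throughput in Mbps from the physically separate client, without IPsec:
# (1 stream, 4 streams)
CLIENT_RESULTS = {
    ("Turris Omnia", "IPv4"): (771, 777),
    ("Turris Omnia", "IPv6"): (435, 413),
    ("APU4d4 OPNsense", "IPv4"): (525, 867),
    ("APU4d4 OPNsense", "IPv6"): (401, 663),
    ("VM OPNsense", "IPv4"): (737, 784),
    ("VM OPNsense", "IPv6"): (653, 699),
}


def single_stream_share(one_stream: float, four_streams: float) -> float:
    """Single-stream throughput as a percentage of the 4-stream throughput."""
    return one_stream / four_streams * 100.0


for (system, family), (one, four) in CLIENT_RESULTS.items():
    print(f"{system:16} {family}: {single_stream_share(one, four):5.1f} %")
```

On these numbers, the APU4d4 reaches only about 60 percent of its 4-stream throughput with a single stream, while the other two systems stay above 90 percent.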

Conclusions

For standard routing and firewalling of multiple parallel streams, OPNsense on a low-power APU4d4 system performs a bit better (noticeably better with IPv6) than a Turris Omnia, while drawing slightly less electrical power under load. OPNsense has the advantage of a much nicer UI for firewall rules (including the possibility to define host objects and groups spanning IPv4 and IPv6), more control in terms of monitoring the firewall, nicely integrated modules like VPN protocols, and the beginnings of an API for automated configuration. Pretty much all of that can also be done with OpenWRT, but mostly on the shell or through a wide variety of config files. None of these physical systems reach a full Gbps firewalling speed like the even lower powered Mikrotik systems with RouterOS and Fasttrack do.

However, there are currently three areas of concern:

  • Single stream performance is worse, and this is a known problem for FreeBSD kernels. For a single stream (e.g. uploads/downloads to/from a local fileserver), OPNsense (both physical on the APU4d4 and virtual on a power efficient server CPU) is limited to about half the maximum Ethernet throughput. This may or may not be relevant to your use case.
  • When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput is decreased by nearly 1/3. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try with VTI based IPsec routing to see if the in-kernel policy matching is a problem.
  • These tests intentionally deactivated some of the interesting OPNsense features such as traffic analysis with samplicate/flowd_aggregate. Enabling them will again cost around 150-200Mbps throughput on the APU4d4, stacked on top of the performance drop of IPsec if all are enabled.
René Mayrhofer
Professor of Networks and Security & Director of Engineering at Android Platform Security; pacifist, privacy fan, recovering hypocrite; generally here to question and learn
