I feel like we need some agreement on what "offload" really means.
Modern NICs can "offload" some of the packet checksumming to reduce kernel overhead when processing packets (on TX, RX, or both). The purpose of enabling switchdev mode in a NIC is to place each packet, based on its flow tuple, in the right region of memory, which reduces memory copies, interrupts, etc. This is typically done in conjunction with SR-IOV and virtual machines, where these problems exist. So "offload" in this context is more of an L2/L3/L4 header match.
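For contrast with the header-match meaning, the checksum kind of offload is just arithmetic that the NIC does in place of the CPU. Here is a minimal Python sketch of the standard Internet checksum (RFC 1071) used by IPv4, TCP and UDP. The function name is mine, and this is only an illustration of the work involved, not how any kernel or driver implements it.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    # Pad to an even number of bytes so the data splits into 16-bit words.
    if len(data) % 2:
        data += b"\x00"

    total = 0
    for i in range(0, len(data), 2):
        # Combine two bytes into one big-endian 16-bit word and accumulate.
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # The checksum is the one's complement of the folded sum.
    return ~total & 0xFFFF


# A 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
```

Without offload, the host does this per packet in software. With TX/RX checksum offload enabled, the NIC fills in or verifies the field instead.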
Intel had some good videos on this topic years ago:
and
I'm not saying it isn't possible to treat a PCI NIC as an in-hardware router, but some application is going to need to decide what to do with each new flow (e.g. tc, a learning switch in OVS, etc.).
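To make that split concrete, here is a toy Python model of the idea: a "hardware" table that only matches on a flow tuple, plus a software callback that is consulted the first time a flow is seen. Every name in it (`FlowKey`, `ToyEswitch`, `example_policy`, the `"vf0"`/`"drop"` actions) is hypothetical and made up for illustration. It does not reflect the tc, OVS, or any driver API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical L3/L4 flow tuple used as the match key."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


class ToyEswitch:
    """Toy model: exact-match table in 'hardware', decisions in software."""

    def __init__(self, slow_path: Callable[[FlowKey], str]) -> None:
        self.table: Dict[FlowKey, str] = {}  # stands in for the NIC flow table
        self.slow_path = slow_path           # stands in for tc / OVS / etc.
        self.misses = 0

    def receive(self, key: FlowKey) -> str:
        action = self.table.get(key)
        if action is None:
            # Table miss: software decides, then installs the rule so that
            # later packets of this flow are matched without asking again.
            self.misses += 1
            action = self.slow_path(key)
            self.table[key] = action
        return action


def example_policy(key: FlowKey) -> str:
    """Made-up policy: steer HTTP to a VF, drop everything else."""
    return "vf0" if key.dst_port == 80 else "drop"


switch = ToyEswitch(example_policy)
web = FlowKey("10.0.0.1", "10.0.0.2", 6, 40000, 80)
first = switch.receive(web)   # miss: the policy is consulted
second = switch.receive(web)  # hit: served from the table
```

The point of the sketch is that the table itself never makes a decision. It only remembers what the software told it to do for a given tuple.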
Note: If you want to use eswitch/switchdev mode, you need to use the representor interfaces it creates, not the actual ones.