Netfilter’s flowtable infrastructure


This documentation describes the Netfilter flowtable infrastructure which allows you to define a fastpath through the flowtable datapath. This infrastructure also provides hardware offload support. The flowtable supports the layer 3 IPv4 and IPv6 and the layer 4 TCP and UDP protocols.

Overview

Once the first packet of the flow successfully goes through the IP forwarding path, from the second packet on, you might decide to offload the flow to the flowtable through your ruleset. The flowtable infrastructure provides a rule action that allows you to specify when to add a flow to the flowtable.

A packet that finds a matching entry in the flowtable (i.e. a flowtable hit) is transmitted to the output netdevice via neigh_xmit(); hence, packets bypass the classic IP forwarding path (the visible effect is that you do not see these packets from any of the Netfilter hooks coming after ingress). If there is no matching entry in the flowtable (i.e. a flowtable miss), the packet follows the classic IP forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3 source and destination, layer 4 source and destination ports and the input interface (useful in case there are several conntrack zones in place).

The ‘flow add’ action allows you to populate the flowtable; the user selectively specifies which flows are placed into the flowtable. Hence, packets follow the classic IP forwarding path unless the user explicitly instructs flows to use this new alternative forwarding path via policy.
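
Since ‘flow add’ is a regular rule action, any nftables selector can be used to narrow down which flows are offloaded. As a purely illustrative sketch (the port number is arbitrary), the following ruleset only places flows towards TCP port 443 into the flowtable, while everything else keeps using the classic forwarding path:

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                tcp dport 443 flow add @f
        }
}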

The flowtable datapath is represented in Fig.1, which describes the classic IP forwarding path including the Netfilter hooks and the flowtable fastpath bypass.

                                       userspace process
                                        ^              |
                                        |              |
                                   _____|____     ____\/___
                                  /          \   /         \
                                  |   input   |  |  output  |
                                  \__________/   \_________/
                                       ^               |
                                       |               |
    _________      __________      ---------     _____\/_____
   /         \    /          \     |Routing |   /            \
-->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
   \_________/    \__________/     ----------   \____________/          ^
     |      ^                          |               ^                |
 flowtable  |                     ____\/___            |                |
     |      |                    /         \           |                |
  __\/___   |                    | forward |------------                |
  |-----|   |                    \_________/                            |
  |-----|   |                 'flow offload' rule                       |
  |-----|   |                   adds entry to                           |
  |_____|   |                     flowtable                             |
     |      |                                                           |
    / \     |                                                           |
   /hit\_no_|                                                           |
   \ ? /                                                                |
    \ /                                                                 |
     |__yes_________________fastpath bypass ____________________________|

             Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are mangled according to the NAT policy that is specified from the classic IP forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented traffic is passed up to follow the classic IP forwarding path since the transport header is missing and, in this case, flowtable lookups are not possible. TCP RST and FIN packets are also passed up to the classic IP forwarding path to release the flow gracefully. Packets that exceed the MTU are also passed up to the classic forwarding path to report packet-too-big ICMP errors to the sender.
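
Because the NAT binding is stored in the flowtable entry, masqueraded or DNATed flows keep their translation when they take the fastpath; no extra flowtable configuration is needed. A minimal sketch combining masquerading with the flowtable (device names and the NAT policy are illustrative only, not part of the upstream example):

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                meta l4proto { tcp, udp } flow add @f
        }
        chain z {
                type nat hook postrouting priority 100; policy accept;
                oifname "eth0" masquerade
        }
}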

Example configuration

Enabling the flowtable bypass is relatively easy: you only need to create a flowtable and add one rule to your forward chain:

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                ip protocol tcp flow add @f
                counter packets 0 bytes 0
        }
}

This example adds the flowtable ‘f’ to the ingress hook of the eth0 and eth1 netdevices. You can create as many flowtables as you want in case you need to perform resource partitioning. The flowtable priority defines the order in which hooks are run in the pipeline; this is convenient in case you already have an nftables ingress chain (make sure the flowtable priority is smaller than the priority of the nftables ingress chain, so that the flowtable runs earlier in the pipeline).
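
For instance, if you already have an ingress chain at priority 10 on one of the devices, keeping the flowtable at priority 0 makes sure the flowtable hook is evaluated first (a sketch with illustrative priorities):

table netdev filter {
        chain ingress {
                type filter hook ingress device eth0 priority 10; policy accept;
        }
}
table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
}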

The ‘flow add’ action from the forward chain ‘y’ adds an entry to the flowtable for the TCP SYN-ACK packet coming in the reply direction. Once the flow is offloaded, you will observe that the counter rule in the example above does not get updated for the packets that are being forwarded through the forwarding bypass.

You can identify offloaded flows through the [OFFLOAD] tag when listing your connection tracking table.

# conntrack -L
tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2

Layer 2 encapsulation

Since Linux kernel 5.13, the flowtable infrastructure discovers the real netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the VLAN ID / PPPoE session ID which are used for the flowtable lookups. The flowtable datapath also deals with layer 2 decapsulation.

You do not need to add the PPPoE and the VLAN devices to your flowtable, instead the real device is sufficient for the flowtable to track your flows.
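
For example, for traffic arriving on a hypothetical VLAN subinterface such as eth0.10, listing only the real device is enough; the VLAN device itself does not have to appear in the devices list:

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0 };
        }
}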

Bridge and IP forwarding

Since Linux kernel 5.13, you can add bridge ports to the flowtable. The flowtable infrastructure discovers the topology behind the bridge device. This allows the flowtable to define a fastpath bypass between the bridge ports (represented as eth1 and eth2 in the example figure below) and the gateway device (represented as eth0) in your switch/router.

        fastpath bypass
 .-------------------------.
/                           \
|           IP forwarding   |
|          /             \ \/
|       br0               eth0 ..... eth0
.       / \                          *host B*
 -> eth1  eth2
     .           *switch/router*
     .
     .
   eth0
 *host A*

The flowtable infrastructure also supports bridge VLAN filtering actions such as PVID and untagged. You can also stack a classic VLAN device on top of your bridge port.

If you would like your flowtable to define a fastpath between your bridge ports and your IP forwarding path, you have to add your bridge ports (as represented by the real netdevice) to your flowtable definition.
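
Following the figure above, a flowtable definition for the switch/router would therefore list the bridge ports eth1 and eth2 together with the gateway device eth0, rather than the br0 device (a sketch using the device names from the figure):

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1, eth2 };
        }
}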

Counters

The flowtable can synchronize packet and byte counters with the existing connection tracking entry by specifying the counter statement in your flowtable definition, e.g.

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
                counter
        }
}

Counter support is available since Linux kernel 5.7.
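
Assuming connection tracking accounting is enabled (net.netfilter.nf_conntrack_acct=1), the synchronized counters then show up as the usual packets=/bytes= fields when listing the conntrack table (illustrative output):

# sysctl -w net.netfilter.nf_conntrack_acct=1
# conntrack -L
tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 packets=1024 bytes=1313448 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 packets=512 bytes=33280 [OFFLOAD] mark=0 use=2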

Hardware offload

If your network device provides hardware offload support, you can turn it on by means of the ‘offload’ flag in your flowtable definition, e.g.

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
                flags offload;
        }
}

There is a workqueue that adds the flows to the hardware. Note that a few packets might still run over the flowtable software path until the workqueue has a chance to offload the flow to the network device.

You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when listing your connection tracking table. Please note that the [OFFLOAD] tag refers to the software offload mode, so there is a distinction between [OFFLOAD], which refers to the software flowtable fastpath, and [HW_OFFLOAD], which refers to the hardware offload datapath being used by the flow.
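
A hardware offloaded flow therefore shows up like this instead (illustrative output, mirroring the software example above):

# conntrack -L
tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [HW_OFFLOAD] mark=0 use=2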

The flowtable hardware offload infrastructure also supports DSA (Distributed Switch Architecture).

Limitations

The flowtable behaves like a cache. The flowtable entries might get stale if either the destination MAC address or the egress netdevice that is used for transmission changes.

This might be a problem if:

  • You run the flowtable in software mode and you combine bridge and IP forwarding in your setup.

  • Hardware offload is enabled.

More reading

This documentation is based on the LWN.net articles [1] [2]. Rafal Milecki also made a very complete and comprehensive summary called “A state of network acceleration” that describes how things were before this infrastructure was mainlined [3], and it also makes a rough summary of this work [4].

[1] Flow offload infrastructure [LWN.net]
[2] Flow offload infrastructure [LWN.net]
[3] [LEDE-DEV] A state of network acceleration
[4] [LEDE-DEV] A state of network acceleration
