# FlowBlaze: Stateful Packet Processing in Hardware

Salvatore Pontarelli, **Roberto Bifulco**, Marco Bonola, Carmelo Cascone, Marco Spaziani, Valerio Bruschi, Davide Sanvito, Giuseppe Siracusano, Antonio Capone, Michio Honda, Felipe Huici, Giuseppe Bianchi

Università degli Studi di Roma Tor Vergata

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No. 761493 ("5GTANGO") and No. 762057 ("5G-PICTURE").

\*Now at ONF

### State Machines

Roberto **State Machines**

- State machines to the rescue of complex forms
- The Rise Of The State Machines

Orchestrating a brighter world

### Programmable NICs

NICs with programmable ASICs, SoC, FPGAs...

E.g., Microsoft [AccelNet NSDI '17, NSDI '18]

### Making programming easier

**High-level Synthesis**

- **Faster programming**
- **Expressive**
- **Hardware expertise**

ClickNP [Sigcomm '16], Emu [ATC '17]

**Match-Action Abstraction**

- **Faster programming**
- **NF Logic focused**
- **Limited support for state**

P4 [CCR '14], Domino [Sigcomm '16]

### Match-Action Abstraction Limitations

**Match-Action pipeline**

- State in tables: large, read only (written from the control plane)
- State in registers: small, read/write

### Extending Match-Action abstractions

**Match-Action pipeline** → **FlowBlaze**

### Match-Action vs Finite State Machine (FSM)

- Match-Action: if match then action (a table)
- FSM: if (match, state) then action

### Multiple state machines?

### Example: Drop a flow after its 10th packet

FSM transitions (states S and B, counter c):

- S → S: Any pkt → c=c+1, fwd
- S → B: Any pkt, c=10 → drop
- B → B: Any pkt → drop

| Flow ID             | State |
|---------------------|-------|
| IPdst = 192.168.0.1 | S     |
| IPdst = 192.168.0.2 | S     |
| IPdst = 192.168.0.3 | B     |

- Each flow's FSM evolves on its own
- Per-flow state is common in network functions

### Introducing per-flow state

Flow context table:

| Flow ID (IP dst) | State | Register |
|------------------|-------|----------|
| 192.168.0.1      | S     | c=7      |
| 192.168.0.2      | S     | c=4      |
| 192.168.0.3      | B     | c=10     |

FSM transition table:

| Pkt Header | State | Cond. | Action        |
|------------|-------|-------|---------------|
| *          | B     | *     | drop          |
| *          | S     | c=10  | State=B, drop |
| *          | S     | c<10  | fwd           |

[Figure: pkt headers and metadata enter the flow ctx table (flow state), followed by "if match" / "if state" matching, ALUs, and global state; the example FSM is shown alongside.]

### Implementation issues

#### Insertion in the flow context table

[Figure: pkt headers and metadata enter the flow ctx table (flow state, global state). Insertion = new flow. Cuckoo hash table: variable insertion time.]

### Handling variable insertion time

Flow table: Cuckoo hash

- Efficient
- Constant lookup-time
- Variable insertion-time

Waiting for insertion!! **Latency increase**

### Flow context insertion handling

- Lookup time scales with pkt arrival rate
- Insertion time scales with flow arrival rate

### Implementation issues

- Insertion in the flow context table
- **State update latency**

### Avoiding race conditions

- **Throughput** reduction
- **Latency increase**
- **Performance** degradation only in unlikely cases

[Figure: the example FSM, with transitions "Any pkt → c=c+1, fwd" and "Any pkt → drop".]

### Does it work?
### Use case

- Server Load Balancer
- UDP Stateful Firewall
- Port Knocking Firewall
- Flowlet load balancer
- Traffic Policer
- Big Flow Detector
- SYN flood Detection and Mitigation
- TCP optimistic ACK detection
- TCP super spreader detection
- Dynamic NAT
- vEPC subscriber's quota verification
- Switch Paxos Coordinator
- Switch Paxos Acceptor
- In-network KVS cache

**FlowBlaze** provides the same performance for all use cases

### FlowBlaze: NetFPGA@156.25MHz

Compared to: DPDK-VPP on Xeon X3470@2.93GHz, Intel 82599 10GbE NIC

### Stress test

Test: 40Gb/s@64B (NetFPGA line rate)

| Trace | Max # active flows (IP src) | Max # active flows (IP src,dst) | Max # active flows (5-tuple) | Max # new flows/ms (IP src) | Max # new flows/ms (IP src,dst) | Max # new flows/ms (5-tuple) |
|-------|------|------|------|-----|-----|-----|
| UNI1  | 575  | 997  | 4k   | 13  | 19  | 39  |
| UNI2  | 948  | 3k   | 7k   | 20  | 42  | 42  |
| MW15  | 12k  | 130k | 152k | 38  | 112 | 114 |
| CHI15 | 92k  | 147k | 178k | 135 | 144 | 144 |

Flow distributions

### Conclusion

**FlowBlaze**

- FSM Abstraction for packet processing
- Efficient FPGA implementation

**Benefits**

- Can keep state for 100Ks of flows in flow tables
- Saves several CPU cores for stateful NFs
- Power efficient (check the paper!)
- Low latency (check the paper!)

### Check the paper, there's a lot more!

### FlowBlaze is open

Both software and hardware implementations maintained by

https://github.com/axbryd/FlowBlaze

Thank you! Visit us and check our demo at the poster session
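
The "drop a flow after its 10th packet" example from earlier in the talk can be approximated in software as a minimal sketch. It is not the FlowBlaze implementation or its API: the names `FlowContext`, `TRANSITIONS`, and `process_packet` are hypothetical, and a Python dict stands in for the hardware cuckoo hash table. It only illustrates the idea of a per-flow context lookup followed by an "if (match, state) then action" table.

```python
from dataclasses import dataclass


@dataclass
class FlowContext:
    """Per-flow entry of the (hypothetical) flow context table."""
    state: str = "S"  # every new flow starts in state S
    c: int = 0        # per-flow packet counter (the register)


# Each row: (current state, condition on the register, next state,
# register increment, action). The first matching row wins.
# The header match is "*" (any packet) in this example, so it is omitted.
TRANSITIONS = [
    ("S", lambda ctx: ctx.c == 10, "B", 0, "drop"),  # 10 packets seen: block
    ("S", lambda ctx: ctx.c < 10, "S", 1, "fwd"),    # count and forward
    ("B", lambda ctx: True, "B", 0, "drop"),         # blocked flows stay blocked
]


def process_packet(ip_dst, flow_table):
    """Look up (or insert) the flow context, then apply one FSM transition."""
    # Assumption: a plain dict replaces the cuckoo hash table; a miss
    # inserts a fresh context, mirroring "insertion = new flow".
    ctx = flow_table.setdefault(ip_dst, FlowContext())
    for state, cond, next_state, inc, action in TRANSITIONS:
        if ctx.state == state and cond(ctx):
            ctx.c += inc
            ctx.state = next_state
            return action
    raise LookupError("no transition for state %r" % ctx.state)


if __name__ == "__main__":
    table = {}
    for i in range(1, 13):
        print(i, process_packet("192.168.0.1", table))
```

Each flow gets its own context, so a second destination address starts again from state S with c=0 regardless of what happened to the first flow.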