Baker: Scaling OVN with Kubernetes API Server
Han Zhou
OpenStack Summit Boston 2017
Why OVN?
OVS is GREAT.
OVN makes it GREATER!
OVN Challenges
● OVN is distributed, but not fully …
○ Can we distribute northd?
[Diagram: Central (Northd; NB and SB in OVSDB) connected to multiple HVs, each running OVN-Controller and OVS]
OVN Challenges
● OVSDB SB
○ No clustering (yet)
[Diagram: Central (Northd; NB and SB in OVSDB) connected to multiple HVs, each running OVN-Controller and OVS]
It is nothing but distributed state management ...
Scale-out with Baker
● Distributed northd
○ Computes lflows for local only
● Scale-out central cluster
○ K8S API server framework
○ Backed by ETCD
○ Clustering
● Distributed agents
○ Watch for local objects only
○ Translate objects to NB DB
[Diagram: Central cluster of three Baker API servers, each backed by ETCD, exposing a RESTful API to the HVs; each HV runs a Baker Agent, Northd, NB, SB, OVN-Controller and OVS]
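The "watch for local objects only" idea can be sketched as a filter in front of the agent's translation step. This is a minimal illustration, not Baker's code: the `PortEvent` type, its `host` field and the dict standing in for the local NB DB are all assumptions, and a real deployment would more likely apply the filter on the server side of the watch, a detail the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortEvent:
    """Hypothetical watch event for a logical port object."""
    kind: str        # "ADDED", "MODIFIED" or "DELETED"
    port_name: str
    host: str        # hypervisor the port is bound to (assumed field)


def local_events(events, local_host):
    """Yield only the events for ports bound to this hypervisor."""
    for ev in events:
        if ev.host == local_host:
            yield ev


def apply_to_local_nb(nb, ev):
    """Mirror one event into a dict standing in for the local NB DB."""
    if ev.kind == "DELETED":
        nb.pop(ev.port_name, None)
    else:
        nb[ev.port_name] = {"host": ev.host}


events = [
    PortEvent("ADDED", "p1", "hv-1"),
    PortEvent("ADDED", "p2", "hv-2"),
    PortEvent("ADDED", "p3", "hv-1"),
    PortEvent("DELETED", "p1", "hv-1"),
]
nb_hv1 = {}
for ev in local_events(events, "hv-1"):
    apply_to_local_nb(nb_hv1, ev)
print(sorted(nb_hv1))  # ['p3']
```

The agent on hv-1 never sees p2, so its local NB DB and the northd next to it only handle state for ports on that hypervisor.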
One more thing ...
● Northd and ovn-controller are all distributed
● They process data related to local HV only
But what does this mean?
In terms of overlay ...
● Logical-to-physical mapping state (port binding) for connectivity
● Doesn’t scale when everyone talks to everyone else in a *large* zone
○ Maybe not the case for public cloud or small-to-medium enterprise cloud
○ But it is a typical use case for eBay’s private cloud
Are we solving the right problem?
● Connectivity vs. segmentation
● L2 segmentation vs. L3 segmentation
● Address sets (L3) based segmentation
○ ACL: default deny, whitelist access
○ IPAM:
■ Use IPs efficiently
■ Summarized CIDRs to reduce address set size
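To illustrate how summarized CIDRs shrink an address set, here is a small sketch using Python's standard `ipaddress` module. The member addresses are made up for the example; the slides do not say how Baker or its IPAM performs the summarization.

```python
import ipaddress


def summarize(addresses):
    """Collapse individual addresses into the smallest covering CIDR list."""
    nets = [ipaddress.ip_network(a) for a in addresses]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Eight contiguous addresses plus one outlier (illustrative values).
members = [f"10.0.0.{i}" for i in range(8)] + ["10.0.1.5"]
summary = summarize(members)
print(len(members), "->", len(summary), summary)
# 9 -> 2 ['10.0.0.0/29', '10.0.1.5/32']
```

The gain depends on allocation: only contiguous, aligned ranges collapse, which is why efficient IP use and summarization appear together under IPAM.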
Flat network
● Reuse OVN abstraction and pipeline
○ Port security
○ ARP proxy
○ ACL
○ LB
○ …
○ But NOT overlay
● Use a localnet port to connect to the physical network directly
● Data to be processed by each HV depends on the size of the AddressSets used by ACLs that apply to ports on the HV
Baker Object Model
● Similar to the OVN NB schema
○ Logical Port
■ Addresses
■ Port security
○ ACL
○ Address Set
○ Load balancer (TBD)
○ ...
● Differences
○ No Logical Switch (local)
○ Port-SecGroup binding
○ ACL: SecGroup instead of individual ports in inport/outport
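The object model above might look roughly like the following dataclasses. All class and field names here are hypothetical, chosen to mirror the bullet points; the slides do not show Baker's actual schema. The `direction`, `action` and `$address_set` match syntax follow OVN NB conventions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalPort:
    name: str
    addresses: List[str]                      # "MAC IP" strings, as in OVN NB
    port_security: List[str] = field(default_factory=list)


@dataclass
class AddressSet:
    name: str
    addresses: List[str]


@dataclass
class PortSecGroupBinding:
    port: str
    sec_group: str


@dataclass
class ACL:
    direction: str     # "from-lport" or "to-lport"
    sec_group: str     # takes the place of a port in inport/outport
    match: str
    action: str
    priority: int = 1000   # arbitrary default for the sketch


port = LogicalPort("p1", ["fa:16:3e:00:00:01 10.0.0.5"],
                   ["fa:16:3e:00:00:01 10.0.0.5"])
clients = AddressSet("web_clients", ["10.0.1.0/24"])
binding = PortSecGroupBinding(port="p1", sec_group="web")
rule = ACL("to-lport", "web", "ip4.src == $web_clients", "allow-related")
print(rule)
```

Note that there is no Logical Switch object: the ACL names a security group, and the binding object ties ports to that group.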
Neutron Plugin
● Support security group, with API extensions
○ Address set - support external IPs from legacy systems
○ Security group rule packet logging
Scalability - Control plane throughput
● Test
○ E2E: Neutron - Baker - OVS
○ Simulated 1k HVs on 10 BMs
■ OVS/OVN 2.7
○ 1 node Neutron + mysql
○ 1 node Baker API server + ETCD
■ K8s 1.6 pre-release, etcd 3.0
● Result for single client (parallel test TBD)
○ Result impacted by SG (address set) size
○ ~3 ports/sec for SG size 1K
Scalability - Latency
● Test
○ E2E from Neutron to OVS flow installation for the port created
■ Create port from neutron, bind port in ovs on HV
■ Wait:
● ovn-nbctl wait-until Logical_Switch_Port <port> up=true
● ovn-nbctl --wait=hv sync
○ Create ports on top of existing 10K ports, 1K HVs, SG size 1K
○ 10K * 3 (flows/ACL) = 30K flows / ovs port
● Result
○ Avg 2 sec
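A latency measurement of this shape could be scripted as below. This is a sketch under assumptions: the two `ovn-nbctl` commands are the ones listed on the slide, but `create_and_bind` is a hypothetical callback standing in for "create port from Neutron, bind port in OVS", and the dry run at the end uses stand-ins so the code executes without an OVN deployment.

```python
import subprocess
import time


def wait_commands(port):
    """Return the two wait commands from the slide as argv lists."""
    return [
        ["ovn-nbctl", "wait-until", "Logical_Switch_Port", port, "up=true"],
        ["ovn-nbctl", "--wait=hv", "sync"],
    ]


def measure_latency(port, create_and_bind, run=subprocess.check_call):
    """Time one port from creation until both waits return (seconds)."""
    start = time.monotonic()
    create_and_bind(port)            # hypothetical: Neutron create + OVS bind
    for argv in wait_commands(port):
        run(argv)                    # blocks until the condition holds
    return time.monotonic() - start


# Dry run: record the commands instead of executing them.
calls = []
elapsed = measure_latency("port-1", create_and_bind=lambda p: None,
                          run=calls.append)
print(calls, elapsed >= 0)
```

With the default `run`, each command blocks until its condition is met, so the elapsed time covers the whole path from port creation to flow installation on the HV.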
Improvement - ovn-controller
● Flow computation blocks flow installation
● Improvement: avoid repeated computation while in-flight messages to OVS are pending
● Test result (SG size 10k, flow installation for 10 ports on HV):
○ 10k * 3 * 10 = 300k OVS flows
○ Before: 50 min
○ After: 16 sec
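The improvement can be pictured with a toy main loop that skips recomputation while earlier messages to OVS are still pending. This is one reading of the bullet above, not the actual ovn-controller change; the `Controller` class and the one-message-per-flow accounting are assumptions. The last lines redo the slide's arithmetic.

```python
class Controller:
    """Toy main loop; not the actual ovn-controller code."""

    def __init__(self, compute_flows):
        self.compute_flows = compute_flows   # expensive full recomputation
        self.in_flight = 0                   # messages sent to OVS, not yet acked
        self.compute_calls = 0

    def iterate(self):
        """Run one loop pass; return True if flows were recomputed."""
        if self.in_flight > 0:
            # Earlier messages are still pending, so skip the recomputation.
            return False
        flows = self.compute_flows()
        self.compute_calls += 1
        self.in_flight += len(flows)         # one message per flow (assumption)
        return True

    def ack(self, count):
        """Record that OVS has processed `count` pending messages."""
        self.in_flight = max(0, self.in_flight - count)


ctl = Controller(lambda: ["flow-a", "flow-b", "flow-c"])
ctl.iterate()                 # computes and sends 3 messages
for _ in range(5):
    ctl.iterate()             # skipped: messages still in flight
ctl.ack(3)
ctl.iterate()                 # computes again
print(ctl.compute_calls)      # 2

total_flows = 10_000 * 3 * 10         # SG size * flows per ACL * ports
speedup = (50 * 60) / 16              # 50 min before, 16 sec after
print(total_flows, speedup)           # 300000 187.5
```

Seven loop passes cost two computations instead of seven, and the reported numbers correspond to a speedup of roughly 190x.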
Other Lessons learned
● Postpone ACL expansion from Neutron to the HV
○ Introduce a port-group binding object in Baker
○ Use port-group instead of lport in “inport/outport” in ACLs
○ Baker agent expands ACLs on the HV for local lports only
○ Benefit:
■ Reduced Neutron overhead
■ Reduced API calls from Neutron to Baker
■ Reduced data size in Baker
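The late expansion described above can be sketched as a function the agent would run on each HV. The dict layout, the `port_group` key and the `expand_acl` helper are hypothetical; only the idea (substitute local member ports for the group in inport/outport) comes from the slide.

```python
def expand_acl(acl, bindings, local_ports):
    """Expand a port-group ACL into per-port ACLs for local ports only."""
    field = "inport" if acl["direction"] == "from-lport" else "outport"
    expanded = []
    for port, group in bindings:
        if group != acl["port_group"] or port not in local_ports:
            continue
        per_port = {k: v for k, v in acl.items() if k != "port_group"}
        per_port["match"] = f'{field} == "{port}" && {acl["match"]}'
        expanded.append(per_port)
    return expanded


acl = {
    "direction": "to-lport",
    "port_group": "web",
    "match": "ip4.src == $web_clients",
    "action": "allow-related",
}
bindings = [("p1", "web"), ("p2", "web"), ("p3", "db")]
local = expand_acl(acl, bindings, local_ports={"p1", "p3"})
print([a["match"] for a in local])
# ['outport == "p1" && ip4.src == $web_clients']
```

Neutron and Baker store one ACL per group rule, while each HV materializes entries only for its own member ports (here p1, but not the remote p2).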
Other Lessons learned
● Baker RESTful API: use Protobuf instead of JSON-RPC
○ 10 - 20 % throughput increase for SG size 1k - 10k
○ Lower CPU cost on API-server
Thanks!
Q & A
