Kubernetes @ Squarespace
Microservices on Kubernetes in a Datacenter
Kevin Lynch
klynch@squarespace.com
Agenda
01 The problem with static infrastructure
02 Kubernetes fundamentals
03 Kubernetes networking in a datacenter
04 Adapting microservices to Kubernetes
05 Managing Kubernetes clusters
Microservices Journey: A Story of Growth
2013: small (< 50 engineers)
build product & grow customer base
whatever works
2014: medium (< 100 engineers)
we have a lot of customers now!
whatever works doesn't work anymore
2016: large (100+ engineers)
architect for scalability and reliability
organizational structures
2017: XL (200+ engineers)
Challenges with a Monolith
What were the increasingly difficult challenges with a monolith?
● Reliability
● Performance
● Engineering agility/speed, cross-team coupling
● More time spent firefighting rather than building new functionality
Challenges with a Monolith
● Minimize failure domains
● Developers are more confident in their changes
● Squarespace can move faster
Solution: Microservices!
Operational Challenges
● Engineering org grows…
● More features...
● More services…
● More infrastructure to spin up…
● Ops becomes a blocker...
Stuck in a loop
Traditional Provisioning Process
● Pick ESX with available resources
● Pick IP
● Register host to Cobbler
● Register DNS entry
● Create new VM on ESX
● PXE boot VM and install OS and base configuration
● Install system dependencies (LDAP, NTP, CollectD, Sensu…)
● Install app dependencies (Java, FluentD/Filebeat, Consul, MongoS…)
● Install the app
● App registers with discovery system and begins receiving traffic
Containerization & Kubernetes Orchestration
● Difficult to find resources
● Slow to provision and scale
● Discovery is a must
● Metrics system must support short-lived metrics
● Alerts are usually per instance
Static infrastructure and microservices do not mix!
Kubernetes Provisioning Process
● kubectl apply -f app.yaml
Kubernetes Fundamentals
● apiVersion & kind
○ type of object
● Metadata
○ Names, annotations, labels
● Spec & Status
○ What you want to happen...
○ … versus reality
apiVersion: v1
kind: Pod
metadata:
name: nginx
namespace: default
annotations:
squarespace.net/build: nginx-42
labels:
app: frontend
...
spec:
containers:
- name: nginx
image: nginx:latest
...
status:
hostIP: 10.122.1.201
podIP: 10.123.185.9
phase: Running
qosClass: BestEffort
startTime: 2017-07-31T02:08:25Z
...
Common Objects: Pods
● Basic deployable workload
● Group of 1+ containers
● Defines resource requirements
● Defines storage volumes
○ Ephemeral storage
○ Shared storage (NFS, CephFS)
○ Block storage (RBD)
○ Secrets
○ ConfigMaps
○ more...
spec:
containers:
- name: location
image: .../location:master-269
ports: ...
resources:
limits:
cpu: 2
memory: 4Gi
requests:
cpu: 2
memory: 4Gi
volumeMounts:
- name: config
mountPath: /service/config
- name: log-dir
mountPath: /data/logs
volumes:
- name: config
configMap:
name: location-config
- name: log-dir
emptyDir: {}
Common Objects: Deployments
● Declarative
● Defines a type of pod to run
● Defines desired #
● Supports basic operations
○ Can be rolled back quickly!
○ Can be scaled up/down
● Meant to be stateless apps!
kind: Deployment
spec:
replicas: 3
selector:
matchLabels:
service: location
strategy:
rollingUpdate:
maxSurge: 100%
maxUnavailable: 0
type: RollingUpdate
template:
... pod info here ...
Common Objects: Services
● Make pods addressable internally and externally
○ IP
○ DNS
○ External Load Balancer
apiVersion: v1
kind: Service
metadata:
name: location
namespace: core-services
spec:
type: ClusterIP
clusterIP: 10.123.79.211
selector:
service: location
ports:
- name: traffic
port: 8080
- name: admin
port: 8081
Kubernetes in a datacenter?
Kubernetes Networking
● Kubernetes CNI (Container Network Interface) is pluggable
● Different plugins for different network topologies
○ Flannel
○ Calico
○ Weave
○ Kubenet
○ VXLAN
Calico Networking
● No network overlay required!
○ No nasty MTU issues
○ No performance impact
● Communicates directly with existing L3 network
● BGP Peering with Top of Rack switch
Kubernetes Architecture
Calico Networking
● Engineers can think of Pod IPs as normal hosts
(they’re not)
○ Ping works
○ Consul works normally
○ Browser communication works
○ Shell sorta works (kubectl exec -it pod sh)
Spine and Leaf Layer 3 Clos Topology
● All work is performed at the leaf/ToR switch
● Each leaf switch is separate Layer 3 domain
● Each leaf is a separate BGP domain (ASN)
● No Spanning Tree Protocol issues seen in L2 networks (convergence time, loops)
(diagram: two spine switches, each connected to four leaf switches)
Spine and Leaf Layer 3 Clos Topology
● Simple to understand
● Easy to scale
● Predictable and consistent latency (hops = 2)
● Allows for Anycast IPs
Calico Networking
● Each worker announces its pod IP ranges
○ Aggregated to /26
● Each master announces an External Anycast IP
○ Used for component communication
● Each ingress tier announces the Service IP range
ip addr add 10.123.0.0/17 dev lo
etcdctl set
/calico/bgp/v1/global/custom_filters/v4/services
'if ( net = 10.123.0.0/17 ) then { accept; }'
Spine and Leaf Layer 3 Clos Topology
(diagram: hosts attached under each leaf switch; more hosts are added rack by rack)
Spine and Leaf Layer 3 Clos Topology
● Not quite that easy…
● Switches started issuing ICMP Redirects
● Eventually crashed...
● Causing routes to be dropped
(diagram: the leaf switch is the Layer 3/Layer 2 boundary, with pods on hosts below it)
Spine and Leaf Layer 3 Clos Topology
● Switches issued ICMP Redirects
○ Allows for more efficient routes
○ Each host is a router!
● Eventually routes flapped to Calico, dropping connections
○ A redirect was issued on every packet
Spine and Leaf Layer 3 Clos Topology
● Route Reflectors pass full routing table to Calico
○ Host traffic on the same switch is no longer routed
○ No ICMP Redirects!
(diagram: a Route Reflector on the leaf peers with the Calico agent on each host)
How do we run Java in a container?
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
(diagram: Microservice Pod running the Java microservice with fluentd and consul sidecars)
Quality of Service Classes
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● BestEffort
○ No resource constraints
○ First to be killed under pressure
● Guaranteed
○ Requests == Limits
○ Last to be killed under pressure
○ Easier to reason about resources
● Burstable
○ Take advantage of unused resources!
○ Can be tricky with some languages
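The three classes above follow a simple rule that can be sketched in Python (a simplification of the real kubelet logic, assuming a single-container pod; the helper name is hypothetical):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Approximate Kubernetes QoS classification for a single-container pod.

    requests/limits map resource names ("cpu", "memory") to quantities.
    """
    if not requests and not limits:
        return "BestEffort"  # no constraints at all; first to be evicted
    # Guaranteed: limits set for both cpu and memory, and requests
    # (defaulting to the limit when unset) equal the limits.
    if all(r in limits for r in ("cpu", "memory")) and \
       all(requests.get(r, limits[r]) == limits[r] for r in ("cpu", "memory")):
        return "Guaranteed"
    return "Burstable"  # anything in between

# The pod spec on this slide (requests == limits) is Guaranteed:
print(qos_class({"cpu": 2, "memory": "4Gi"}, {"cpu": 2, "memory": "4Gi"}))  # Guaranteed
```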
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● Kubernetes assumes no other processes are consuming significant resources
● Completely Fair Scheduler (CFS)
○ Schedules a task based on CPU Shares
○ Throttles a task once it hits CPU Quota
● OOM Killed when memory limit exceeded
Microservice Pod Definition
resources:
requests:
cpu: 2
memory: 4Gi
limits:
cpu: 2
memory: 4Gi
● Shares = CPU Request * 1024
● Total Kubernetes Shares = # Cores * 1024
● Quota = CPU Limit * 100ms
● Period = 100ms
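Plugging the pod above into those formulas (a sketch of the arithmetic only, not kubelet code; the 32-core worker is an assumed example):

```python
CFS_PERIOD_US = 100_000  # 100ms, the default CFS period

def cfs_settings(cpu_request: float, cpu_limit: float, node_cores: int):
    """CPU shares and quota as Kubernetes derives them from a pod spec."""
    shares = int(cpu_request * 1024)           # relative weight under contention
    total_shares = node_cores * 1024           # what the node has to hand out
    quota_us = int(cpu_limit * CFS_PERIOD_US)  # hard cap per 100ms period
    return shares, total_shares, quota_us

# The 2-CPU request/limit pod from the slides, on a 32-core worker:
print(cfs_settings(cpu_request=2, cpu_limit=2, node_cores=32))  # (2048, 32768, 200000)
```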
Java in a Container
● JVM is able to detect # of cores via sysconf(_SC_NPROCESSORS_ONLN)
● Many libraries rely on Runtime.getRuntime().availableProcessors()
○ Jetty
○ ForkJoinPool
○ GC Threads
○ That mystery dependency...
Java in a Container
● Provide a base container that calculates the container’s resources!
● Detect # of “cores” assigned
○ /sys/fs/cgroup/cpu/cpu.cfs_quota_us divided by /sys/fs/cgroup/cpu/cpu.cfs_period_us
● Automatically tune the JVM:
○ -XX:ParallelGCThreads=${core_limit}
○ -XX:ConcGCThreads=${core_limit}
○ -Djava.util.concurrent.ForkJoinPool.common.parallelism=${core_limit}
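The base-container logic can be sketched as follows (a Python approximation; the cgroup v1 file names are the ones listed above, and the flag list mirrors the slide):

```python
import math
import os

def detect_core_limit(cgroup_dir="/sys/fs/cgroup/cpu"):
    """Derive the container's effective core count from its CFS quota/period."""
    try:
        with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us")) as f:
            quota = int(f.read())
        with open(os.path.join(cgroup_dir, "cpu.cfs_period_us")) as f:
            period = int(f.read())
    except OSError:
        return os.cpu_count()  # not in a cgroup v1 container
    if quota <= 0:             # -1 means "no limit set"
        return os.cpu_count()
    return max(1, math.ceil(quota / period))

def jvm_flags(core_limit):
    """JVM options tuned to the container, per the slide."""
    return [
        f"-XX:ParallelGCThreads={core_limit}",
        f"-XX:ConcGCThreads={core_limit}",
        f"-Djava.util.concurrent.ForkJoinPool.common.parallelism={core_limit}",
    ]

print(jvm_flags(2))
```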
Java in a Container
● Use Linux preloading to override availableProcessors()

#include <stdlib.h>
#include <unistd.h>

/* HotSpot asks this function for the core count; preloading an override
   lets CONTAINER_CORE_LIMIT dictate the answer. */
int JVM_ActiveProcessorCount(void) {
  char* val = getenv("CONTAINER_CORE_LIMIT");
  return val != NULL ? atoi(val) : sysconf(_SC_NPROCESSORS_ONLN);
}

/* Build as a shared library and preload it (example file names):
   gcc -shared -fPIC -o libnumcpus.so numcpus.c
   LD_PRELOAD=/path/to/libnumcpus.so java ... */
https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
How do we monitor our applications?
Traditional Monitoring & Alerting
● Graphite does not scale well with ephemeral instances
● Easy to have combinatoric explosion of metrics
● Application and system alerts are tightly coupled
● Difficult to create alerts on SLAs
● Difficult to route alerts
Traditional Monitoring & Alerting
(diagram: the application pushes metrics to a central store; per-host checks probe application health, the service, and system health)
Traditional Monitoring & Alerting
● Efficient for ephemeral instances
● Stores tagged data
● Easy to have many smaller instances (per team or complex system)
● Prometheus Operator runs everything in Kubernetes!
Kubernetes Monitoring & Alerting
● Alerts are defined with the application code!
● Easy to define SLA alerts
● Routing is still difficult
Prometheus Operator (architecture diagrams)
How do we keep everything organized?
● Namespaces
○ Isolates groups of objects
■ Developer
■ Team
■ System or Service
○ Good for permission boundaries
○ Good for network boundaries
● Most objects are namespaced
apiVersion: v1
kind: Namespace
metadata:
name: core-services
annotations:
squarespace.net/contact: |
team@squarespace.com
spec:
finalizers:
- kubernetes
status:
phase: Active
● Need to keep certain objects up to date in each namespace
● Need to keep objects synchronized across different datacenters
○ RBAC Policies
○ Prometheus instance per team
○ Keys to access Ceph
○ External Service Endpoints
○ Consul configurations and keys
● kubectl gets too complicated to manage these…
● Everything gets out of sync very quickly
● Kubernetes CustomResourceDefinitions allow us to define types
○ Deploy a service in Kubernetes to manage Kubernetes!
(diagram: a Namespace Operator watches the API Server for new team definitions and keeps each team's namespace, e.g. SRE and Core Services, in sync)
How do we handle dependencies?
Dependency Management
(diagram: Microservice Pod running the Java microservice with fluentd and consul sidecars)
● Deployments are committed alongside the service code
● Deployments also define their own dependencies...
● How do you update Consul across 1 service? 5 services? 100 services?
Dependency Management
● Kubernetes 1.7 introduces Custom Initializers
○ Register Sidecar Initializer for Deployments
○ Deploy service
○ Inject sidecar containers
○ …
○ Profit!
(diagram: the Sidecar Injector adds FluentD and Consul sidecar containers to the pod template alongside the Java microservice)
Dependency Management
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: location
namespace: core-services
annotations:
sidecar.injector.squarespace.net/consul: "true"
apiVersion: injector.squarespace.net/v1alpha1
kind: Sidecar
metadata:
name: consul
spec:
annotation: sidecar.injector.squarespace.net/consul
containers:
- name: consul
image: consul:0.8.5
...
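The injector's core merge step can be sketched in Python (a simplification; the real initializer ran server-side against the Kubernetes API, and the dict shapes mirror the YAML above):

```python
def inject_sidecars(deployment: dict, sidecar: dict) -> dict:
    """Append a Sidecar resource's containers to a Deployment's pod template
    when the Deployment opts in via the Sidecar's annotation."""
    annotations = deployment["metadata"].get("annotations", {})
    if annotations.get(sidecar["spec"]["annotation"]) != "true":
        return deployment  # not opted in; leave untouched
    pod_spec = deployment["spec"]["template"]["spec"]
    pod_spec.setdefault("containers", []).extend(sidecar["spec"]["containers"])
    return deployment

deployment = {
    "metadata": {"annotations": {"sidecar.injector.squarespace.net/consul": "true"}},
    "spec": {"template": {"spec": {"containers": [{"name": "location"}]}}},
}
sidecar = {
    "spec": {
        "annotation": "sidecar.injector.squarespace.net/consul",
        "containers": [{"name": "consul", "image": "consul:0.8.5"}],
    },
}
injected = inject_sidecars(deployment, sidecar)
print([c["name"] for c in injected["spec"]["template"]["spec"]["containers"]])
# ['location', 'consul']
```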
Conclusion
● Kubernetes and containerization are hard…
○ Don’t give up!
● Services first!
○ Monitor the service, not the instances
● The Kubernetes API model is powerful!
○ Declare what you want and write code to manage that state
QUESTIONS?
Thank you!
squarespace.com/careers
Kevin Lynch
klynch@squarespace.com
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Editor's Notes
  • #4: Small: Whatever works Ship features Medium: running into scaling problems code complexity issues A single engineer can’t understand the entire stack Large: Millions of customers depend on us Failure is not an option How do we plan for the future today?
  • #6: Split up the code Define smaller fault domains scalable units Clearly defined scope Engineers can confidently make changes move faster
  • #7: What does this mean for operations? Last year fewer than a dozen services existed; today more than 50 are in production or actively developed.
  • #8: Typical workflow for provisioning a VM at Squarespace. Currently takes about 30 minutes to provision a VM. There are definitely some optimizations to be made here: use VM templates (hard to generalize given space constraints, but not so much of a problem for microservices); use VMware vMotion and other tools for auto-migrating and finding free resources.
  • #9: The big takeaway: requires a robust discovery mechanism for services; can’t easily get by with static names. This can be as simple as DNS or load balancers, or something more complex (ZooKeeper, etcd, Consul); each has tradeoffs. Metrics: Graphite metrics are not meant to be ephemeral: they are long-lived metrics that are expensive to create and are not efficiently aggregated (no tagging support!). Difficult to control where data is coming from and how much data is coming in; easy to blow out disk or send faulty metrics. Centralized metrics can lead to… Alerts: Sensu alerts are per instance; system…
  • #10: Declarative infrastructure: state what you want, and Kubernetes will do the rest. A bit simplified, as there are a lot of moving parts.
  • #11: API objects == YAML. All objects are represented by YAML descriptions. Each object kind/apiVersion maps directly to an API endpoint. Metadata has information about the object. The magic lies in the Spec & Status.
  • #12: Pods are the building blocks in Kubernetes. These are our modern-day VMs. Everything else in the Kubernetes API makes pods easier to work with and reason about.
  • #14: AWS and GCE integrate seamlessly with Services
  • #15: Depends on Networking
  • #17: BGP peering with ToR not required… Can do full mesh in AWS but I won’t get into it here.
  • #20: Very SIMPLE. Each leaf is a Top of Rack switch. All devices are exactly the same number of segments away.
  • #22: Calico is backed by Etcd… It’s super easy to leverage this
  • #24: Not quite that easy...
  • #25: Not quite that easy...
  • #27: Not quite that easy...
  • #28: Our services could talk to each other now, but we hit terrible performance problems.
  • #29: Now that we know what each pod looks like in Kubernetes, this is our typical pod structure at Squarespace. Most microservices are written in Java with a customized framework; Fluentd for log shipping; a Consul sidecar for discovery. Seems simple enough, but there are so many problems we didn’t realize we had.
  • #30: Kubernetes resource constraints aren’t enough. Need an understanding of cgroups.
  • #31: Kubernetes resource constraints aren’t enough. Need an understanding of cgroups.
  • #32: Kubernetes resource constraints aren’t enough
  • #34: Can’t rely on shares: difficult to predict from inside the container how many shares are allocated on the whole machine… Could do some deeper introspection… Will allow for
  • #36: Push vs. pull metrics. Same Grafana, same ELK.
  • #38: Difficult to separate the health of the system from the health of the application. Difficult to separate the health of an instance from the service. Metrics are collected per host.
  • #39: Sensu: app and system alerts are tightly coupled. Overwhelming & confusing to everyone except the guy who designed the system; does not promote a sense of ownership. Hard to get a single view: Graphite checks vs. instance checks.
  • #42: Alerts are defined with code. Alerts are service oriented: only relevant alerts are defined (active requests, error rates, response times, # of instances up). Encourages developer ownership.
  • #43: Infrastructure as code
  • #44: I mentioned earlier that all of the other objects are meant to make it easier to work with pods... Most objects are namespaced, which is used for organization, security, and access control. How do we keep them in sync?
  • #48: Difficult to keep sidecars updated and correct across many service repositories; upgrading Consul can become a nightmare with versioning. In the Ansible world this was easy… host groups and roles could be applied from a centralized configuration.
  • #51: We learned a lot running Kubernetes in our datacenters