Using Kubernetes to increase developer velocity (without sacrificing quality)
Adam Schepis
Architect @ CloudHealth Technologies
About Me
● At CloudHealth I build things in the cloud that help our customers confidently build things in the cloud.
● I love working on distributed systems with high scalability requirements.
● I have met Spiderman.
@aschepis
Our Challenges
Growth
Our Challenges
Maturing Market
Our Challenges
Innovation in the Cloud
● AWS - 100+ feature announcements in 2018
● Azure - 13 announcements (chunkier)
● GCP - Next '18 in July (more than 100 announcements last year)
CloudHealth and Kubernetes
● Started our Kubernetes journey in early 2017
● Running a number of production workloads
● Kubernetes is a key component of platform overhaul in 2018
7 © 2018 CLOUDHEALTH® TECHNOLOGIES INC.
Our Stack
Helm
● Service Lifecycle
● Values templates
● Trivial rollback
● Canary deployments were a bit tricky (would love suggestions)
Linkerd
● Service discovery
● Circuit breakers
● Metrics
● Distributed tracing (via Zipkin)
Romana
● CNI
● Cloud Native
● Works well with AWS
● Not a full mesh network
Primary Clusters
Development
● ~50 Nodes
● Namespace per developer
● Devs given free rein within their namespace
● Collaboration via linkerd
Test/Staging
● ~20 Nodes
● Stable version of each service
● Namespace per pull request
● Integration tests w/ new code + stable services
Production
● ~50 Nodes
● Namespace per service group
● Tight restrictions on access
● Distributed tracing
Development Environments
● ch CLI tool
● Light wrapper around setup, dev, and service lifecycle
● simplifies and accelerates dev workflow
● commands
○ ch init
○ ch new service <foo>
○ ch build
○ ch deploy
○ ch run (build + deploy)
○ ch supervisor
Consistency drives both velocity and quality
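The `ch run` composition listed above (build followed by deploy) could be sketched roughly as follows. This is a minimal illustration, not the actual `ch` tool: the image naming, the `./charts/<service>` path, the `image.tag` chart value, and the choice of `docker build` and `helm upgrade --install` as the underlying commands are all assumptions made for the example.

```python
from typing import List

Command = List[str]


def build_command(service: str, tag: str) -> Command:
    """Command that builds a container image for the service (assumed: docker)."""
    return ["docker", "build", "-t", f"{service}:{tag}", "."]


def deploy_command(service: str, tag: str, namespace: str) -> Command:
    """Command that installs or upgrades the service's Helm release in a namespace.

    The chart path and the `image.tag` value name are hypothetical.
    """
    return [
        "helm", "upgrade", "--install", service, f"./charts/{service}",
        "--namespace", namespace,
        "--set", f"image.tag={tag}",
    ]


def run_commands(service: str, tag: str, namespace: str) -> List[Command]:
    """`run` is simply build followed by deploy, in that order."""
    return [build_command(service, tag), deploy_command(service, tag, namespace)]


if __name__ == "__main__":
    # Print the plan instead of executing it, so the sketch has no side effects.
    for cmd in run_commands("auth", "dev", "adam"):
        print(" ".join(cmd))
```

Keeping each step as a plain command list is one way to keep a wrapper "light": a developer can read exactly what would be run underneath.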
Development Environments
● Service endpoints are always in helm chart values
○ Injected as environment variables
○ Can use any namespace: https://auth/graphql for my namespace or https://auth.david/graphql for David's namespace
● Test locally by using linkerd endpoint as http_proxy to reach remote services in desired namespace
● Namespace for each developer
● Collaborate without deploying the world.
Collaborating in a shared dev cluster
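As a rough sketch of the pattern on this slide, the snippet below builds a service URL for either the caller's own namespace (bare `auth` host) or another developer's namespace (`auth.david` host), reads an endpoint from an environment variable, and prepares an HTTP client that goes through a proxy. The variable name `AUTH_ENDPOINT`, the helper names, and any proxy address passed in are hypothetical; only the host naming pattern comes from the slide.

```python
import os
import urllib.request
from typing import Optional


def service_url(service: str, namespace: Optional[str] = None, path: str = "/graphql") -> str:
    """Build a service URL; with no namespace, the bare service name is the host."""
    host = service if namespace is None else f"{service}.{namespace}"
    return f"https://{host}{path}"


def endpoint_from_env(var: str, default: str) -> str:
    """Read an endpoint injected as an environment variable, with a fallback."""
    return os.environ.get(var, default)


def proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Opener that sends HTTP requests through the given proxy (e.g. a linkerd endpoint)."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    print(service_url("auth"))            # my own namespace
    print(service_url("auth", "david"))   # David's namespace
    print(endpoint_from_env("AUTH_ENDPOINT", service_url("auth")))
```

Because the endpoint is just a value, pointing a local service at a colleague's namespace only requires changing that value, not redeploying anything else.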
Our Build Pipeline
Pull Request
Human Gate
Staging (PR namespace)
Prod (Canary)
Prod (Svc Group)
- Unit
- Integration
- Pact (Contracts)
- Javadoc
- RDoc
- etc.
Published to S3.
When things go wrong
● Each PR has its own namespace to deploy service into
● Integration tests operate against stable versions of services it depends on
● When failure happens dev can:
○ Access resources in namespace through linkerd with custom header
○ Shell into a pod to check it out
○ Look at logs
○ Exercise failing service manually
○ Access UI of failing service (if one exists)
Failed Builds
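To make the namespace-per-PR idea concrete, here is a small sketch that derives a namespace name from a pull request and lists the kind of `kubectl` commands a developer might use to inspect a failed build. The `pr-<number>-<service>` naming scheme is an assumption, not the scheme used in the talk; the only hard constraint applied is that Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and `-`, at most 63 characters).

```python
import re
from typing import List


def pr_namespace(service: str, pr_number: int) -> str:
    """Derive a namespace name for a PR (hypothetical scheme: pr-<number>-<service>)."""
    raw = f"pr-{pr_number}-{service}".lower()
    # Replace anything that is not allowed in a DNS label with '-'.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    # Enforce the 63-character limit and avoid a trailing '-'.
    return cleaned[:63].strip("-")


def debug_commands(namespace: str, pod: str) -> List[List[str]]:
    """Commands for looking at a failing build that is still deployed in its namespace."""
    return [
        ["kubectl", "get", "pods", "--namespace", namespace],
        ["kubectl", "logs", pod, "--namespace", namespace],
        ["kubectl", "exec", "-it", pod, "--namespace", namespace, "--", "sh"],
    ]


if __name__ == "__main__":
    ns = pr_namespace("Auth_Service", 123)
    print(ns)
    for cmd in debug_commands(ns, "auth-service-0"):
        print(" ".join(cmd))
```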
When things go wrong
● Human gates between canary and full prod deploy
● Canaries can be validated by
○ verifying that it is serving production requests
○ looking at error reporting service
○ using linkerd headers to ensure a request against canary
○ tailing logs
○ looking at performance and application metrics compared to current production
code
● Can temporarily scale down to 0.
● Rollback with helm is fast and trivial.
Bad Canaries
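One of the checks above, comparing canary metrics to current production, can be sketched as a simple error-rate comparison that feeds the human gate. The function names, the `tolerance` default, and the three verdict strings are assumptions for illustration; the talk does not describe an automated rule. The two commands at the end correspond to the "scale down to 0" and "rollback with helm" options.

```python
from typing import List


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there is no traffic."""
    return errors / requests if requests else 0.0


def canary_verdict(canary_errors: int, canary_requests: int,
                   prod_errors: int, prod_requests: int,
                   tolerance: float = 0.01) -> str:
    """Suggest 'wait', 'promote', or 'rollback' for a human to confirm."""
    if canary_requests == 0:
        return "wait"  # canary is not serving production requests yet
    canary = error_rate(canary_errors, canary_requests)
    prod = error_rate(prod_errors, prod_requests)
    return "promote" if canary <= prod + tolerance else "rollback"


def rollback_commands(release: str, revision: int,
                      deployment: str, namespace: str) -> List[List[str]]:
    """Commands to take a bad canary out of rotation and roll the release back."""
    return [
        ["kubectl", "scale", "deployment", deployment,
         "--replicas=0", "--namespace", namespace],
        ["helm", "rollback", release, str(revision)],
    ]


if __name__ == "__main__":
    print(canary_verdict(10, 100, 10, 1000))
    for cmd in rollback_commands("auth", 3, "auth-canary", "auth-group"):
        print(" ".join(cmd))
```

A verdict like this is only one input; logs, the error reporting service, and application metrics would still be checked before the gate is passed.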
In Summary
● Consistency in tooling == velocity and quality
● Shared dev cluster
○ collaboration, shared understanding
● Namespace/deploy per PR
○ Velocity - faster to diagnose and fix test failures
○ Quality - easier to reach root cause via live debugging
● Canary Builds
○ Quickly detect bad deploys without heavy impact to customers
○ Confidence in deploys post-canary
Thank you! Questions?
Adam Schepis
@aschepis

Kubernetes: Increasing velocity without sacrificing quality


Editor's Notes

  • #3: I'm Adam, architect at CloudHealth. What gets me excited in the morning is building systems (often distributed) with high scalability requirements.
  • #4: We have grown (a lot!): 30 to 260 in 3 years, engineering 10 -> 70. Code "grew organically" with us. More devs + big, complex platform + tribal knowledge = a drag on velocity.
  • #5: Market has matured. QUALITY! Our customers aren't early adopters anymore. No tolerance for product or data quality issues.
  • #6: Innovation in the Cloud. VELOCITY! 100+ announcements in 6 weeks of 2018. Azure: 13 very chunky announcements. Hybrid cloud/datacenters: enterprise customers ask for this; cloud + datacenter will exist for the foreseeable future in large enterprises. International growth: Alibaba, supporting many currencies.
  • #7: Decided on k8s in early 2017. Evaluated ECS, Mesos, Docker Swarm. We run production workloads for background analytics and batch jobs, and for serving data in mainline customer requests in the application. Kubernetes is one of the backbones of our platform overhaul strategy in 2018.
  • #8: We use Helm (wrapped in some light custom tooling) for managing service lifecycles. It has worked very well. Canary deployments were a bit painful; I would love to talk to people who have done canaries via Helm, or who use Helm and do canary deploys. Linkerd for our service mesh: it runs as a DaemonSet in k8s, so teams don't have to worry about deploying sidecars (and the platform team doesn't have to run around explaining why they should). We get distributed tracing via the Zipkin telemeter. Romana CNI: we originally used Weave but had some issues as the cluster approached 50 nodes; this may have been our inexperience. Romana has been beneficial for us since we are on AWS and it intelligently manages route tables for us, avoiding a limitation imposed by AWS. Like pretty much everyone else, we also use a whole bunch of other technologies for building, delivering, and monitoring services.
  • #9: Dev cluster: shared by the engineering team. Each engineer has a namespace and they can deploy
  • #10: Golang, built for macOS and Linux. Light wrapper: enough to make things faster, not so heavy that you can't see under the covers. ch init: set up dev env, set up dev tools (kubectl, helm, ...), minikube (not by default anymore). Self-provision access to the development/staging cluster through Google auth.
  • #11: Adding service http_proxy. No native support on Node. 😢
  • #13: Reasons for failures: unit test failure, integration test failure, performance regressions, contract validation failures. What can a dev do? Because the failing build still lives in a namespace, a dev can inspect the running service, perform tests, etc.