THE CLOUD CONNECTIVITY COMPANY
Implementing a Reliable, Auto-Healing Scalable Platform at VMware
GopalaKrishna Sandur Kustigi
Rajshekhar Deodurg
Date: 16th July 2020

Robust API Gateway is Critical for IT Enterprise Integration
VMware IT identified implementing an API gateway on a reliable, auto-healing, scalable platform as a priority.
We chose multiple Kubernetes platforms to enable auto-failover and auto-scaling capabilities in a dual data center setup.
Centralized log management and analytics provide deep operational visibility and enable faster troubleshooting.
Customized monitoring scripts execute auto-failover between the data centers.
Combined use of several open-source and VMware tools equips VMware IT to perform chaos engineering.

It Starts with Architecture
[Architecture diagram. Labeled components: CDN; Global Traffic Manager; Data center-1 (Active) and Data center-2 (Standby); firewalls; proxy servers; load balancers; and, in each data center, a Kubernetes cluster with Cassandra pods, Kong pods, application pods, and a load balancer service.]

Architecture Continued…
Active/Standby dual data center setup
Auto-failover upon failure detection in the primary data center
Auto-failover implemented with a custom monitoring script at Layer-1
Integrated with monitoring and log aggregation tools
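
The slides do not show the monitoring script itself. As a rough sketch of how such a script might decide when to fail over, the example below triggers a switch after a number of consecutive failed health checks. The `probe` and `switch_to_standby` hooks, the class name, and the threshold of 3 are all assumptions made for illustration, not details from the deck.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Fails over after `threshold` consecutive failed health checks.

    `probe` and `switch_to_standby` are hypothetical hooks: the slides do not
    say how health is checked or how traffic is redirected.
    """
    probe: Callable[[], bool]              # True when the active site is healthy
    switch_to_standby: Callable[[], None]  # e.g. repoint the global traffic manager
    threshold: int = 3                     # assumed value, tune per environment
    failures: int = 0
    failed_over: bool = False

    def tick(self) -> bool:
        """Run one health check; return True if failover was triggered now."""
        if self.failed_over:
            return False
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failed check
        if healthy:
            self.failures = 0  # a single success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.switch_to_standby()
            self.failed_over = True
            return True
        return False
```

Requiring several consecutive failures is one common way to avoid failing over on a single transient error; a scheduler or loop would call `tick()` at a fixed interval.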

Metrics and Dashboard Monitoring
Enables Early Detection and Notification Across the Application Stack
VMware Tanzu Observability by Wavefront monitors the Kubernetes clusters and pods.
Tanzu Observability collects data from multiple services and sources across the entire application stack, providing real-time resource utilization of the Kubernetes platform and at the Kong application pod level.
Tanzu Observability also supports alerts for different failure scenarios, such as pod restarts and exceeded resource limits.
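
To make the alert scenarios concrete, here is a small, tool-agnostic sketch of the kind of conditions involved. It does not use Tanzu Observability's alert syntax or API; the `PodSample` type, the alert names, and both thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PodSample:
    name: str
    restarts: int            # restarts observed in the evaluation window
    memory_used_mib: float
    memory_limit_mib: float


def evaluate_alerts(samples: List[PodSample],
                    max_restarts: int = 3,
                    memory_ratio: float = 0.9) -> Dict[str, List[str]]:
    """Return a mapping of pod name to the alert names it triggered."""
    alerts: Dict[str, List[str]] = {}
    for s in samples:
        fired = []
        if s.restarts > max_restarts:
            fired.append("pod_restarts")
        # Guard against a zero limit (no limit configured) before dividing.
        if s.memory_limit_mib > 0 and s.memory_used_mib / s.memory_limit_mib >= memory_ratio:
            fired.append("memory_near_limit")
        if fired:
            alerts[s.name] = fired
    return alerts
```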

Tanzu Observability Dashboard (screenshot)

Centralized Log Management and Analytics
We integrated Kong API gateway logs and Kubernetes platform logs with vRealize Log Insight to achieve centralized log management and analytics during production issues.
vRealize Log Insight helps us manage logs with its actionable dashboards.
Log pattern monitoring was enabled with vRealize Log Insight to provide deep operational visibility and faster troubleshooting across different layers.
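
As an illustration of what log pattern monitoring means in practice, the sketch below counts how many log lines match a set of named patterns. It is independent of vRealize Log Insight, and the pattern names and regular expressions are hypothetical; real ones would depend on the Kong and Kubernetes log formats in use.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical patterns, chosen only for illustration.
PATTERNS: Dict[str, re.Pattern] = {
    "upstream_timeout": re.compile(r"upstream timed out", re.IGNORECASE),
    "http_5xx": re.compile(r'" 5\d\d '),  # status field in an access-log line
    "oom_killed": re.compile(r"OOMKilled"),
}


def count_patterns(lines: Iterable[str]) -> Counter:
    """Count, per pattern name, how many lines contain a match."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A count that rises above its usual level for a pattern is the kind of signal a dashboard or alert could be built on.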

vRealize™ Log Insight Dashboard (screenshot)

Chaos Engineering Enabled
Make-Break-Remake
Chaos is the art of breaking things purposefully but in a controlled manner.
We used a combination of open-source tools, in-house tools, and VMware products to create two types of chaos:
Self-induced infrastructure failures: CPU load, memory load, kill-process fault, disk space fault, etc.
Self-induced network failures: network delay on container network overlays and the virtual Ethernet port, packet drop.
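
One way to read "make-break-remake" as a controlled procedure is sketched below: check that the system is healthy, inject a fault, observe, and always remove the fault afterwards. The function and its three hooks (`steady_state`, `inject`, `rollback`) are hypothetical and do not represent the tooling used in the talk.

```python
from typing import Callable


def run_experiment(steady_state: Callable[[], bool],
                   inject: Callable[[], None],
                   rollback: Callable[[], None]) -> str:
    """Run one make-break-remake cycle and report the outcome."""
    if not steady_state():
        return "skipped"           # never break a system that is already unhealthy
    try:
        inject()                   # break: e.g. add CPU load or network delay
        survived = steady_state()  # did the platform absorb the fault?
    finally:
        rollback()                 # remake: always remove the fault, even on error
    if not steady_state():
        return "not_recovered"
    return "passed" if survived else "degraded"
```

Putting the rollback in a `finally` block is what keeps the experiment controlled: the fault is removed even if the injection step itself fails.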

Chaos Dashboard (screenshot)

Synthetic API Monitoring Results
To measure the availability of the Kong system, we use a custom monitoring tool that continuously invokes Tier 1 APIs at regular intervals.
With these in-house tools for monitoring, analytics, and alerting in place, together with the multi-data-center architecture, we have achieved high availability.
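
The slides do not state how availability is computed from the synthetic checks. A minimal sketch, assuming availability is simply the share of successful probes and that a probe succeeds on any 2xx status, could look like this; `probe_once`, `availability_percent`, and the `call` hook are hypothetical names.

```python
from typing import Callable, Iterable


def probe_once(call: Callable[[], int]) -> bool:
    """Invoke one API through `call`, which returns an HTTP status code."""
    try:
        return 200 <= call() < 300
    except Exception:
        return False  # timeouts and connection errors count as failures


def availability_percent(results: Iterable[bool]) -> float:
    """Percentage of successful probes over the measured period."""
    results = list(results)
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(results) / len(results)
```

For example, 999 successful probes out of 1000 would give an availability of about 99.9 percent under this definition.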

Q&A
