This document discusses VMware IT's implementation of a reliable, auto-healing, and scalable API gateway platform. They used multiple Kubernetes platforms across two data centers to enable auto-failover and auto-scaling. Centralized logging and monitoring tools provide visibility and enable fast troubleshooting. Custom scripts perform auto-failover between data centers upon failure detection. A combination of open-source and VMware tools allow for chaos engineering and testing of resiliency.