Secrets of Performance Tuning Java on Kubernetes
Understand impact of resource constraints in the JVM
Bruno Borges
Microsoft Developer Division
Java Engineering Group (JEG), September 2022
DevDiv Java Engineering Group
Agenda
JVM inside Containers and on Kubernetes: what you must know!
• JVM Default Ergonomics
• Understand the default values of the JVM
• How the amount of memory and CPU impacts selection of Garbage Collector
• JVM Garbage Collectors
• Recommendations for better starting points in Cloud native applications
• How to tune GCs
• Java on Kubernetes
• Recommended starting points
• Topologies
• Conclusion
JVM Default Ergonomics
There’s always premature optimization
Survey Summary (150 ppl)
• Most devs are deploying JVM workloads in containers with:
• Up to 4 CPUs (65%)
• Up to 4 GB RAM (65%)
• I/O intensive (50%)
• Overall
• Up to 2 GB (48%)
• Up to 3 CPUs (50%)
65% of ppl
JVM Ergonomics
• APM Partner
• Millions of production JVMs analysed
• Majority with 1 CPU
• Majority with 1GB or less RAM
• Majority with GC not configured
• Typical ‘fixes’ to Perf issues:
• Increase heap size
• More replicas
• Migration to another stack
• Ultimately, increased COGS
JVM Ergonomics
Default settings when no JVM tuning is specified.
• Available Processors
• Runtime.availableProcessors()
• Read by:
• ForkJoinPool
• Garbage Collector
• Frameworks and libraries
• If inside containers (Linux only)
• Will compute based on the cgroup CPU settings: cpu.cfs_period_us, cpu.cfs_quota_us, cpu.shares
• TLDR: 1-1000m = 1 cpu; 1001-2000m = 2 cpus …
• If not, whole machine set of processors
• Default Garbage Collector
• Java 11 or later: SerialGC or G1GC
• Java 8: SerialGC or ParallelGC
• The real default: SerialGC
• Other (G1, Parallel) if 2+ processors and 1792MB or more memory available
• Default Max Heap Size
• It can be 50%, 25%, or 1/64 of available memory.
• It depends!
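To see which defaults a given JVM actually picked, a small program can print them at startup. This is a minimal sketch using only standard JDK APIs; the class name ErgonomicsReport is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports what JVM ergonomics selected for this process.
final class ErgonomicsReport {

    /** Names of the garbage collector beans registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Inside a Linux container this reflects the cgroup CPU settings.
        System.out.println("Available processors: " + rt.availableProcessors());
        // Maximum heap the JVM will attempt to use, in MB.
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Collectors: " + collectorNames());
    }
}
```

On HotSpot, SerialGC typically shows up as "Copy" and "MarkSweepCompact", ParallelGC as "PS Scavenge" and "PS MarkSweep", and G1GC as "G1 Young Generation" and "G1 Old Generation". Running it with different container CPU and memory limits makes the ergonomics visible.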
Warning
Upcoming changes in July and October
• JDK-8230305: JDK 11 cgroups v2 – memory limits not respected
• Fix coming on OpenJDK 11 update in July 2022
• Red Hat, Corretto, Microsoft, Temurin, etc
• Other vendors may or may not have patched already
• JDK-8281181: JDK 11,17,18 – do not use CPU Share by default
• “cpu.shares” is computed based on the request value
• To compute the # of active processors, later versions won’t (by default) read CPU Shares.
• This was giving inconsistent values such as
JVM Ergonomics Demo
Main lesson
Always configure your JVM.
Do NOT
$ java -jar myapp.jar
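One way to enforce this lesson is a startup check that warns when the JVM was launched without explicit heap or collector flags. The sketch below is a heuristic, not an official mechanism: the class JvmConfigGuard and its matching rules are assumptions made for illustration, and it only inspects the arguments reported by the runtime MXBean.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup guard: warns if the JVM relies on default ergonomics.
final class JvmConfigGuard {

    /** True if the arguments set a maximum heap (-Xmx or -XX:MaxRAMPercentage). */
    static boolean hasHeapSetting(List<String> jvmArgs) {
        return jvmArgs.stream().anyMatch(a ->
                a.startsWith("-Xmx") || a.startsWith("-XX:MaxRAMPercentage="));
    }

    /** True if the arguments select a collector explicitly, e.g. -XX:+UseParallelGC. */
    static boolean hasCollectorSetting(List<String> jvmArgs) {
        return jvmArgs.stream().anyMatch(a ->
                a.startsWith("-XX:+Use") && a.endsWith("GC"));
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasHeapSetting(jvmArgs)) {
            System.err.println("WARNING: no -Xmx or -XX:MaxRAMPercentage set; heap size left to ergonomics");
        }
        if (!hasCollectorSetting(jvmArgs)) {
            System.err.println("WARNING: no -XX:+Use...GC flag set; collector left to ergonomics");
        }
    }
}
```

Launched as plain `java -jar myapp.jar`, both warnings would print; with something like `-XX:+UseParallelGC -XX:MaxRAMPercentage=75`, neither would.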
JVM Garbage Collectors
What to know
• Poorly tuned GC leads to
• High pause times
• High % of time spent pausing
• Starvation of threads
• OutOfMemoryError (OOME)
• Tuning GC is worth it
• Performance gains lead to Cost savings
• Setting Heap size is not enough
• Understanding the workload is key
• Select appropriate Garbage Collector
• Enough CPUs
• Performance requirements and SLAs
• The JVM Heap
• Contiguous block of memory
• Entire space is reserved
• Only some space is allocated
• Broken up into different areas or regions
• Object Creation / Removal
• Objects are created by application (mutator) threads
• Objects are removed or relocated by Garbage Collection
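The "high % of time spent pausing" symptom can be roughly estimated from inside the application. The following sketch sums the collection time reported by the standard GarbageCollectorMXBean and divides it by JVM uptime. The class name GcPauseShare is hypothetical, and for concurrent collectors the reported time is not all stop-the-world, so treat the result as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: approximate share of JVM uptime spent in garbage collection.
final class GcPauseShare {

    /** Percentage of uptime spent collecting; 0 when uptime is not positive. */
    static double percentOfUptime(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    /** Accumulated collection time across all collector beans, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %.2f%% of uptime%n",
                percentOfUptime(totalCollectionMillis(), uptime));
    }
}
```

GC logs or JFR data give a more precise picture; this is only a quick in-process signal.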
Heap Size Configuration
• Manually configure Heap
• -Xmx
• Set value in MB: 256m
• Set value in GB: 2g
• Great for well-sized workloads
• -XX:MaxRAMPercentage
• Set value in percentage: 75
• Great for workloads to be scaled along container memory limits
• WARNING: watch out for Off Heap Memory usage (e.g., Apache Spark, Pinot, Elasticsearch, etc.)
• Recommended starting point
• Servers
• Set to whatever the application needs
• Containers
• Set to whatever the application needs but no more than 75% of container memory limit
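The arithmetic behind -XX:MaxRAMPercentage is simple enough to sketch: the maximum heap is roughly the container memory limit multiplied by the percentage. The helper below (HeapSizing is a made-up name) shows what 75% means for a given limit; the real JVM may round or align the value slightly.

```java
// Hypothetical helper: approximate max heap for a container limit and a percentage.
final class HeapSizing {

    /** Approximate max heap in bytes for a container limit and a MaxRAMPercentage value. */
    static long heapBytesFor(long containerLimitBytes, double maxRamPercentage) {
        if (maxRamPercentage <= 0 || maxRamPercentage > 100) {
            throw new IllegalArgumentException("percentage must be in (0, 100]");
        }
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long limit = 4L * 1024 * 1024 * 1024; // a 4 GiB container limit, chosen for illustration
        long mb = 1024 * 1024;
        System.out.println("Heap at 25%: " + heapBytesFor(limit, 25) / mb + " MB");
        System.out.println("Heap at 75%: " + heapBytesFor(limit, 75) / mb + " MB");
    }
}
```

Whatever is left (25% at the recommended setting) has to cover Metaspace, thread stacks, and any off-heap buffers, which is why off-heap-intensive workloads may need a lower percentage.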
Garbage Collectors
Recommendations
| | Serial | Parallel | G1 | Z | Shenandoah |
|---|---|---|---|---|---|
| Number of cores | 1 | 2+ | 2+ | 2+ | 2+ |
| Multi-threaded | No | Yes | Yes | Yes | Yes |
| Java Heap size | < 4 GB | < 4 GB | > 4 GB | > 4 GB | > 4 GB |
| Pause | Yes | Yes | Yes | Yes (<1ms) | Yes (<10ms) |
| Overhead | Minimal | Minimal | Moderate | Moderate | Moderate |
| Tail-latency Effect | High | High | High | Low | Moderate |
| JDK version | All | All | JDK 8+ | JDK 17+ | JDK 11+ |
| Best for | Single core, small heaps | Multi-core small heaps. Batch jobs, with any heap size. | Responsive in medium to large heaps (request-response/DB interactions) | Responsive in medium to large heaps (request-response/DB interactions) | Responsive in medium to large heaps (request-response/DB interactions) |
“[GC] Tuning is basically trying to optimize this [object] moving as ‘move as little as possible as late as possible so as to not disturb the flow.’”
Monica Beckwith
Principal Software Engineer
Microsoft Java Engineering Group
Watch Monica’s Tuning and Optimizing Java Garbage Collection (infoq.com)
JVM Ergonomics and GCs – Summary
Java 11+ - OpenJDK Ergonomics will use, by default, SerialGC or G1GC
• G1GC only when 2+ available processors and 1792+ MB available memory – regardless of heap size.
• SerialGC otherwise.
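The selection rule above can be written down as a tiny function. This is a simplified model of the rule as stated on the slide, not the JVM's actual source; the class DefaultCollector and its parameters are hypothetical.

```java
// Hypothetical model of the default collector choice described above.
final class DefaultCollector {

    /**
     * Returns the collector ergonomics would pick, per the slide's rule:
     * G1GC (Java 11+) or ParallelGC (Java 8) only with 2+ processors and
     * 1792+ MB of available memory, SerialGC otherwise.
     */
    static String select(boolean java11OrLater, int availableProcessors, long availableMemoryMb) {
        boolean serverClass = availableProcessors >= 2 && availableMemoryMb >= 1792;
        if (!serverClass) {
            return "SerialGC";
        }
        return java11OrLater ? "G1GC" : "ParallelGC";
    }

    public static void main(String[] args) {
        // A 1-CPU container gets SerialGC no matter how much memory it has.
        System.out.println(select(true, 1, 4096));
        // Two processors and 2 GB cross both thresholds.
        System.out.println(select(true, 2, 2048));
    }
}
```

Note that the heap size never appears as an input: only the processor count and the available memory matter, which is why small containers so often end up on SerialGC.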
ParallelGC in general outperforms G1GC for smaller heaps
• Up to 4GB, ParallelGC performs better as a throughput GC.
• Between 2-4GB, ParallelGC may still perform better for throughput, but G1GC could be considered.
• ParallelGC still triggers Stop the World (StW) pauses, which hurt tail latency.
Heap size is not properly dimensioned for containers by Ergonomics
• Default ergonomics typically caps the heap at 1/4 of available memory (the container limit when inside a container); 1/64 is the default initial heap size.
• Make sure a heap size is defined, either with -Xmx or with -XX:MaxRAMPercentage. As a starting point, allocate up to 75% of the container memory limit.
Java on Kubernetes
Kubernetes CPU Throttling
How it impacts the JVM
• CPU requests on Kubernetes are for CPU time
• “1000m” does NOT mean a single vCPU, or core.
• “1000m” means the application can consume a full CPU cycle per CPU period.
• “1000m” allows an application with multiple threads to run in parallel.
• When all threads combined consume “1000m” in CPU time, the application is throttled.
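Example
• Threads A, B, and C run in parallel: Thread A spends 400m, Thread B spends 500m, Thread C spends 100m.
• The 1000m budget is used up halfway through the period, so the app must now wait the remaining 500m (half the period) for the next cycle.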
• Java applications are, in general, multi-threaded
• Concurrent GCs will have their own threads.
• Web apps and REST/gRPC microservices will have their own threads.
• Database Connection Pools will have their own threads.
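The thread example above can be reproduced with a small simulation. The sketch assumes a simplified model: all threads start at the beginning of a CFS period, run in parallel on separate cores, and a 1000m limit with a 100ms period gives a budget of 100ms of CPU time, so "400m" becomes 40ms of CPU time. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical, simplified model of CFS throttling within a single period.
final class CfsThrottling {

    /**
     * Returns how long (in ms) the application sits throttled in one period.
     * demandsMs holds the CPU time each thread wants; threads run in parallel.
     */
    static double throttledMillis(double periodMs, double quotaMs, double[] demandsMs) {
        double[] demands = demandsMs.clone();
        Arrays.sort(demands);
        double consumed = 0; // CPU time used so far by all threads combined
        double now = 0;      // wall-clock time since the period started
        int running = demands.length;
        for (double demand : demands) {
            // Until the next thread finishes, every running thread burns CPU time.
            double burn = (demand - now) * running;
            if (consumed + burn >= quotaMs) {
                now += (quotaMs - consumed) / running; // moment the quota runs out
                return Math.max(0, periodMs - now);
            }
            consumed += burn;
            now = demand;
            running--;
        }
        return 0; // quota never exhausted
    }

    public static void main(String[] args) {
        // Threads A, B, C from the example: 400m, 500m, 100m of a 100ms period.
        System.out.println(throttledMillis(100, 100, new double[] {40, 50, 10})); // prints 50.0
    }
}
```

With four threads that each stay busy for the whole period, the same budget is gone after 25ms and the application is throttled for the remaining 75ms, which is the situation a heavily multi-threaded JVM can run into.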
JVM on Kubernetes
• Trick the JVM
• Limit may be 1000m, but you may still tell the JVM it can use 2 or more processors!
• Use this flag: -XX:ActiveProcessorCount
• JVM Available Processors
• Up to 1000m: 1 proc
• 1001-2000m: 2 procs
• 2001-3000m: 3 procs
• …
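The mapping from CPU limit to available processors listed above is a round-up to the next whole CPU. A minimal sketch follows (ActiveProcessors is a hypothetical name; the real JVM derives the value from the cgroup quota and period):

```java
// Hypothetical helper mirroring the millicore-to-processor mapping above.
final class ActiveProcessors {

    /** Processors the JVM would report for a CPU limit given in millicores. */
    static int fromCpuLimit(int limitMillicores) {
        if (limitMillicores <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return (limitMillicores + 999) / 1000; // integer round-up
    }

    public static void main(String[] args) {
        for (int limit : new int[] {500, 1000, 1001, 2000, 2001}) {
            System.out.println(limit + "m -> " + fromCpuLimit(limit) + " processor(s)");
        }
        // What this JVM reports; -XX:ActiveProcessorCount=N overrides the computed value.
        System.out.println("This JVM reports: " + Runtime.getRuntime().availableProcessors());
    }
}
```

Starting the JVM with -XX:ActiveProcessorCount=2 under a 1000m limit would make the last line print 2, which is the "trick" described above.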
CPU Throttling
How the JVM is throttled on Kubernetes
[Diagram: a Java Virtual Machine spread across CPU 1 to CPU 4, with Garbage Collector threads and application threads serving HTTP requests.]
• CPU Limit: 1000m
• Each request: 200m
• Remaining CPU time: 200m
• GC Work (total): 200m
• Remaining CPU time: 0m
• Remaining CFS Period: 100ms, then 80ms, then 60ms
Application throttled for 60ms
Kubernetes: Better Starting Points
Recommendations to follow instead of JVM Ergonomics
| CPU Limits | Garbage Collector | Heap size |
|---|---|---|
| Up to 1000m | Not recommended | <2GB or >4GB |
| Up to 1000m | ParallelGC | 2 to 4GB |
| 2000m | ParallelGC | Up to 4GB |
| 2000m | G1GC | More than 4GB |
| 4000m | G1GC | More than 4GB |
| 4000m | ZGC, Shenandoah | 4GB to 32GB |
| 4000m | ZGC | 32GB or more |
With 1000m or less, set:
-XX:ActiveProcessorCount=2
Memory Limits
JVM Heap at 75%
Benchmark
Latency: lower is better. Throughput: higher is better.
[Charts: Latency; Throughput]
github.com/brunoborges/aks-jvm-benchmark
JVM Memory
• Metaspace
• Native memory region for application metadata such as class definitions.
• Grows as needed.
• May be cleaned up if classes or classloaders are no longer reachable in the Heap
• JVM Flags: MetaspaceSize (initial) and MaxMetaspaceSize
• MaxMetaspaceSize is a large number
[Diagram: JVM.large and JVM.small each contain a Heap and a Metaspace; the Heap differs in size, the Metaspace is the same size in both.]
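Because Metaspace is roughly the same size for the same application regardless of heap size, it takes a larger share of a small JVM's memory. The sketch below reads the current Metaspace usage through the standard memory pool beans and computes that share. The class name is hypothetical, the pool name "Metaspace" is what HotSpot uses (other JVMs may differ), and the 128 MB figure in main is an illustrative number, not a measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: inspects Metaspace and its share relative to the heap.
final class MetaspaceUsage {

    /** Bytes used by the "Metaspace" pool, or -1 if this JVM exposes no such pool. */
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    /** Metaspace as a percentage of heap plus Metaspace. */
    static double sharePercent(long metaspaceBytes, long heapBytes) {
        return 100.0 * metaspaceBytes / (metaspaceBytes + heapBytes);
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used (bytes): " + usedBytes());
        long mb = 1024 * 1024;
        // The same 128 MB of Metaspace next to a small heap and a large heap.
        System.out.printf("Share with 512 MB heap: %.1f%%%n", sharePercent(128 * mb, 512 * mb));
        System.out.printf("Share with 8 GB heap: %.1f%%%n", sharePercent(128 * mb, 8192 * mb));
    }
}
```

Many small replicas each pay this fixed cost, which is one reason fewer, larger JVMs can use the same cluster memory more efficiently.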
Azure Kubernetes Cluster
Short but wide – 6 x 4 = 24 vCPUs
[Diagram: six VMs (VM 1 to VM 6) with 4 vCPUs each, hosting JVM replicas and control components.]
• D4 v3 VM $0.192/hour
• 4 vCPU
• 16 GB
• JVM
• 1 vCPU
• 2 GB RAM
• Garbage Collector selected by Ergonomics:
• Serial GC
• Concurrent/Parallel GCs won’t be effective
• Constant CPU Throttling on each JVM
• Constant Stop-the-World by GC
• High latency, low throughput
• Total Resources Consumed
• 18 JVM replicas
• 18 vCPUs
• 36 GB of RAM (of 96)
Estimate: $840.96
Azure Kubernetes Cluster
Tall but narrow – 3 x 8 = 24 vCPUs
[Diagram: three VMs (VM 1 to VM 3) with 8 vCPUs each, every VM hosting one JVM and a control component.]
• D8 v3 VM $0.384/hour
• 8 vCPUs
• 32 GB
• JVM
• 8 GB RAM
• 4 vCPUs
• Total Resources Consumed
• 12 vCPUs
• 24 GB of RAM (of 96)
• Garbage Collector (recommended):
• G1GC
• Benefits
• CPU Throttling unlikely
• Lower latency, higher throughput
Savings:
- 9 vCPUs on standby
- 72 GB of RAM on standby
Estimate: $840.96 (same cost)
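The "same cost" claim for the two topologies can be checked with a few lines of arithmetic. The sketch assumes a 730-hour month, which is an assumption that happens to reproduce the $840.96 estimate; the hourly prices are the ones quoted on the slides, and the class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helper: monthly cluster cost from VM count and hourly price.
final class TopologyCost {

    static final int HOURS_PER_MONTH = 730; // assumption: average hours in a month

    /** Monthly cost in USD for vmCount VMs at the given hourly price. */
    static BigDecimal monthlyCost(int vmCount, String hourlyPriceUsd) {
        return new BigDecimal(hourlyPriceUsd)
                .multiply(BigDecimal.valueOf(vmCount))
                .multiply(BigDecimal.valueOf(HOURS_PER_MONTH));
    }

    public static void main(String[] args) {
        // Short but wide: 6 x D4 v3; tall but narrow: 3 x D8 v3.
        System.out.println("Short but wide: $" + monthlyCost(6, "0.192"));
        System.out.println("Tall but narrow: $" + monthlyCost(3, "0.384"));
    }
}
```

Both topologies come out at the same monthly figure, so the comparison is purely about how the 24 vCPUs and 96 GB are carved up among JVMs.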
A/B Routing Multiple Topologies
Monitor the topologies for resource consumption, latency, and throughput.
Load Balancer
Topology A
Smaller JVMs
Multiple replicas
Topology B
Larger JVMs
Fewer replicas
Steps to Address Perf Issues
Optimize runtime for the workload
• Understand Your Tech Stack
• Understand how the runtime responds to workloads
• Understand JVM Ergonomics
• Understand JVM Garbage Collectors
• Observe and Analyze
• Monitor with Azure App Insights and other APM solutions
• Analyze JVM data with JDK Flight Recorder (JFR) and Microsoft JFR Streaming
• Analyze Garbage Collection logs with GC analyzers and Microsoft GCToolKit
• Reorganize existing resources
• Consume the same amount of resources
• Increase the performance
• Maintain or reduce the cost
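For the "Observe and Analyze" step, JDK Flight Recorder can also be driven from code. The sketch below uses the JDK's built-in jdk.jfr API (available in JDK 11 and later) to record garbage collection events around a workload and count them. It does not use Microsoft JFR Streaming or GCToolKit; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: records jdk.GarbageCollection events around a workload.
final class GcEventsWithJfr {

    /** Runs the workload under a JFR recording and returns the number of GC events seen. */
    static int countGcEvents(Runnable workload) {
        try {
            Path output = Files.createTempFile("gc-events", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable("jdk.GarbageCollection"); // only GC events are recorded
                recording.start();
                workload.run();
                recording.stop();
                recording.dump(output); // write the recorded data to the temp file
            }
            List<RecordedEvent> events = RecordingFile.readAllEvents(output);
            Files.deleteIfExists(output);
            return events.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // System.gc() is only a request; the count may be zero if explicit GC is disabled.
        System.out.println("GC events recorded: " + countGcEvents(System::gc));
    }
}
```

In production the recording would more commonly be started with JVM flags or an APM agent; the in-code form is handy for tests and quick experiments.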
Conclusion
Java on Kubernetes scaling
• Different workloads may need different topologies
• Scaling out with more replicas is not a silver bullet for performance increase
• Give more resources to JVMs in the beginning
• Fewer replicas, more CPU/memory
• Start with Parallel GC for smaller heaps
• Avoid JVM default ergonomics
• Ensure you know which GC is being used
• Increase performance by understanding bottlenecks
• Analyse JFR data
• Analyse GC logs
• Scale out, and up, as needed
developer.microsoft.com/java
Java development with Microsoft
Thank you!
Microsoft Developer Division
Java Engineering Group (JEG)
