Brought to you by
Get Lower Latency and Higher Throughput for Java Applications
Simon Ritter
Deputy CTO at
Simon Ritter
Deputy CTO
■ Java Champion and two-time JavaOne Rockstar
■ 99th percentile is the hard part of performance
■ Away from work, my son and I are restoring a Classic Mini
JVM Performance Challenges
■ Latency
● Biggest issue is Garbage Collection
● Stop-the-world pauses for almost all collectors
● Pauses are typically proportional to heap size, not live data
■ Throughput
● Adaptive JIT compilation: Interpreted, C1 compiled, C2 compiled
● Deoptimisations
● Level of optimisation is key
■ Warmup
● Time taken to get to fully optimised code for all hot methods
● Restart of an application requires the same warmup work to be carried out
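The warm-up effect described on this slide can be observed, roughly, by timing consecutive batches of calls to the same method. The following is a minimal sketch, not a benchmark: the method `work`, the class name and the batch sizes are all hypothetical, and the timings depend entirely on the JVM and machine. A proper measurement would use a dedicated harness.

```java
final class WarmupSketch {
    // Written to on every call so the JIT cannot discard the work as dead code.
    static volatile long sink;

    // Hypothetical hot method: sums the squares of 0..n-1.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Times `batches` consecutive batches of calls; returns nanoseconds per batch.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink = work(1_000);
            }
            nanos[b] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 10_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("batch %d: %d us%n", b, nanos[b] / 1_000);
        }
    }
}
```

On a JVM with adaptive compilation, the early batches are often slower than the later ones, and restarting the program repeats that pattern.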
Azul Platform Prime: An Alternative JVM
■ Based on OpenJDK source code
■ Passes all Java SE TCK/JCK tests
● Drop-in replacement for other JVMs
● No application code changes, no recompilation
■ Hotspot collectors replaced with C4
■ C2 JIT compiler replaced with Falcon
■ ReadyNow! warm up elimination technology
Azul Continuously Concurrent Compacting Collector (C4)
C4 Basics
■ Generational (young and old)
● Uses the same collector for both
● For efficiency rather than pause containment
■ All phases are parallel
■ No STW compacting fallback
● Heap scales from 512 MB to 12 TB (with no change to GC latency)
■ Algorithm is mark, relocate, remap
■ Only supported on Linux
● Sophisticated OS memory management interaction
Loaded Value Barrier
■ Read barrier
● Tests all object references as they are loaded
■ Enforces two invariants
● Reference is marked through
● Reference points to correct object position
■ Minimal performance overhead
● Test and jump (2 instructions)
● x86 architecture reduces this to one micro-op
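The two invariants can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions: the `Ref` class, the `forwarding` map, the `slowPathCount` counter and the write-back of the corrected reference are illustrative choices, not Azul's implementation. The real barrier is a test and jump in JIT-compiled code rather than a method call.

```java
import java.util.HashMap;
import java.util.Map;

final class LvbSketch {
    // Toy reference: an address plus a flag saying whether the marker has seen it.
    static final class Ref {
        final int address;
        final boolean markedThrough;

        Ref(int address, boolean markedThrough) {
            this.address = address;
            this.markedThrough = markedThrough;
        }
    }

    // Hypothetical forwarding table: old address -> new address of relocated objects.
    final Map<Integer, Integer> forwarding = new HashMap<>();
    int slowPathCount = 0;

    // Toy barrier applied whenever a reference is loaded from a field slot.
    Ref load(Ref[] fields, int slot) {
        Ref ref = fields[slot];
        // Fast path: both invariants already hold, so only the test is paid for.
        if (ref.markedThrough && !forwarding.containsKey(ref.address)) {
            return ref;
        }
        // Slow path: restore both invariants and store the corrected reference
        // back, so that the next load of this slot takes the fast path.
        slowPathCount++;
        int current = forwarding.getOrDefault(ref.address, ref.address);
        Ref fixed = new Ref(current, true);
        fields[slot] = fixed;
        return fixed;
    }

    public static void main(String[] args) {
        LvbSketch heap = new LvbSketch();
        heap.forwarding.put(100, 200); // the object at 100 has moved to 200
        Ref[] fields = { new Ref(100, false) };
        System.out.println(heap.load(fields, 0).address); // 200
        System.out.println(heap.load(fields, 0).address); // 200
        System.out.println(heap.slowPathCount);           // 1
    }
}
```

The second load returns without entering the slow path because the first load already corrected the stored reference.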
Concurrent Mark Phase
[Figure: Root Set, GC Threads and App Threads]
Relocation Phase
[Figure: Compaction of objects A B C D E into A’ B’ C’ D’ E’]
A -> A’ B -> B’ C -> C’ D -> D’ E -> E’
Remapping Phase
[Figure: App Threads and GC Threads, with A -> A’ B -> B’ C -> C’ D -> D’ E -> E’]
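The relocate and remap steps can be sketched sequentially on a toy heap. This is an illustration only: the array-based heap, the `live` flags and the method names are hypothetical, and everything runs on a single thread, whereas the slides describe the phases as running alongside application threads.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class RelocateRemapSketch {
    // Relocation: copy live objects into a compacted array and record
    // old index -> new index (the "A -> A'" mappings on the slides).
    static Map<Integer, Integer> relocate(String[] from, boolean[] live, String[] to) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int next = 0;
        for (int i = 0; i < from.length; i++) {
            if (live[i]) {
                to[next] = from[i];
                forwarding.put(i, next);
                next++;
            }
        }
        return forwarding;
    }

    // Remapping: rewrite each stale reference to the object's new position.
    // Assumes every reference points at a live (relocated) object.
    static void remap(int[] refs, Map<Integer, Integer> forwarding) {
        for (int i = 0; i < refs.length; i++) {
            refs[i] = forwarding.get(refs[i]);
        }
    }

    public static void main(String[] args) {
        String[] from = { "A", "dead", "B", "dead", "C" };
        boolean[] live = { true, false, true, false, true };
        String[] to = new String[3];
        int[] refs = { 4, 0, 2 }; // references to C, A and B by old index

        Map<Integer, Integer> forwarding = relocate(from, live, to);
        remap(refs, forwarding);

        System.out.println(Arrays.toString(to));   // [A, B, C]
        System.out.println(Arrays.toString(refs)); // [2, 0, 1]
    }
}
```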
Measuring Platform Performance
■ jHiccup
■ Spends most of its time asleep
● Minimal effect on performance
● Wakes every 1 ms
● Records the delta between when it expected to wake up and when it actually woke
● Measured effect is what would be experienced by your application
■ Generates histogram log files
● These can be graphed for easy evaluation
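The measurement idea on this slide can be sketched in a few lines. This is not jHiccup's code: the class and method names are hypothetical, and the sketch keeps raw values in an array instead of writing histogram log files.

```java
import java.util.Arrays;

final class HiccupSketch {
    // Sleeps for intervalMillis, `samples` times, and records how much longer
    // than requested each sleep took (in nanoseconds, never negative).
    static long[] measure(int samples, long intervalMillis) {
        long[] hiccups = new long[samples];
        long intervalNanos = intervalMillis * 1_000_000L;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Arrays.copyOf(hiccups, i); // keep what was measured so far
            }
            long elapsed = System.nanoTime() - start;
            hiccups[i] = Math.max(0L, elapsed - intervalNanos);
        }
        return hiccups;
    }

    public static void main(String[] args) {
        long[] hiccups = measure(1_000, 1); // wake roughly every 1 ms
        long max = 0;
        for (long h : hiccups) {
            max = Math.max(max, h);
        }
        System.out.printf("samples=%d, max hiccup=%d us%n", hiccups.length, max / 1_000);
    }
}
```

Because the measuring thread does nothing except sleep, any large delay it observes comes from the platform (JVM, OS, hardware) and not from application code.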
Eliminating ElasticSearch Latency
[Figure: latency charts for HotSpot and Azul Prime, 128 GB heap]
Azul Falcon JIT Compiler
Advancing Adaptive Compilation
■ Replacement for C2 JIT compiler
■ Azul Falcon compiler
● Based on latest compiler research
● Built on the LLVM project
■ Better performance
● Better intrinsics
● More inlining
● Fewer compiler excludes
Vector Code Example
■ Conditional array cell addition loop
● Hard for a compiler to recognise as a candidate for vector instructions
private void addArraysIfEven(int[] a, int[] b) {
    if (a.length != b.length)
        throw new RuntimeException("length mismatch");
    for (int i = 0; i < a.length; i++)
        if ((b[i] & 0x1) == 0)
            a[i] += b[i];
}
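To try the loop above in isolation, a small harness along the following lines could be used. The class name and input values are illustrative, and the method is made static only so it can be called from `main`. Running it shows the loop's result, not whether a particular JIT vectorises it.

```java
import java.util.Arrays;

final class AddArraysIfEvenDemo {
    // Same logic as the slide's method: add b[i] to a[i] only where b[i] is even.
    static void addArraysIfEven(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new RuntimeException("length mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            if ((b[i] & 0x1) == 0) {
                a[i] += b[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3, 4 };
        int[] b = { 10, 11, 12, 13 };
        addArraysIfEven(a, b);
        System.out.println(Arrays.toString(a)); // [11, 2, 15, 4]
    }
}
```

The data-dependent `if` inside the loop body is what makes the loop awkward to vectorise: each element decides independently whether the addition happens.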
Traditional JVM JIT
■ Per-element jumps
■ 2 elements per iteration
Falcon JIT
■ Using AVX2 vector instructions
■ 32 elements per iteration
■ Broadwell E5-2690-v4
Recent Customer Success Story
■ Leading cloud-based IT security company
● Cloud security, compliance and other services
■ Big Kafka user
● 2.5 billion messages across Kafka clusters daily
● Initially approached us about their Cassandra clusters and eliminating latency
■ Kafka improvements
● 20% performance gain, out-of-the-box, with no tuning
● Falcon improved code generation
● Resulted in a 15% saving in cloud hardware costs
● Platform Prime was effectively cheaper than free!
ReadyNow! Warmup Elimination Technology
■ Save JVM JIT profiling information
● Classes loaded
● Classes initialised
● Instruction profiling data
● Speculative optimisation failure data
■ Data can be gathered over a much longer period
● JVM/JIT profiles quickly
● Significant reduction in deoptimisations
■ Able to load, initialise and compile most code before main()
Impact on Latency
[Figure: latency before and after]
Compile Stashing Effect
[Figure: performance over time, without and with Compile Stashing]
Up to 80% reduction in compile time and 60% reduction in CPU load
Summary
Improving Java Performance
■ Collect and re-use profiles to reduce warm-up time
■ Use alternative JIT compilation strategies
■ Eliminate GC STW pauses through use of read-barrier
■ Azul working to deliver better Java performance.
Brought to you by
Simon Ritter
sritter@azul.com
@speakjava
