Java Team
OpenJDK: In the New Age of Concurrent Garbage Collectors
HotSpot’s Regionalized GCs
Monica Beckwith
JVM Performance
java-performance@Microsoft
@mon_beck
Agenda
Part 1 – Groundwork & Commonalities
Laying the Groundwork
Stop-the-world (STW) vs concurrent collection
Heap layout – regions and generations
Basic Commonalities
Copying collector – from and to spaces
Regions – occupied and free
Collection set and priority
July 10th, 2020
Agenda
Part 2 – Introduction & Differences
Introduction to G1, Shenandoah and Z GCs
Algorithm
Basic Differences
GC phases
Marking
Barriers
Compaction
Groundwork: Stop-the-world vs concurrent collections
Stop-the-world aka STW GC
[Diagram: Application Threads -> Safepoint Requested -> GC Threads -> GC Completed -> Application Threads]
Handshakes
Thread local handshakes vs Global
Time To Safepoint (TTSP)
Concurrent GC
[Diagram: Application Threads running alongside GC Threads]
Groundwork: Heap layout - regions and generations
Heap Layout
[Diagram: the Heap as laid out by Z GC and Shenandoah GC, and by G1 GC with a Young Generation and an Old Generation]
Commonalities: Copying collector – from and to spaces
Copying aka Evacuating Collector
[Diagram sequence: a Heap split into From Space and To Space. GC ROOTS (THREAD 1 STACK to THREAD N STACK, STATIC VARIABLES, ANY JNI REFERENCES) point at objects in From Space; the reachable objects are copied into To Space.]
Commonalities: Regions – occupied and free
Occupied and Free Regions
[Diagram: heap regions, some occupied and some free]
• List of free regions
• In case of generational heap (like G1), the occupied regions could be young, old or
humongous
Commonalities: Collection set and priority
Collection Priority and Collection Set
[Diagram: occupied and free heap regions]
• Priority is to reclaim regions with most garbage
• The candidate regions for collection/reclamation/relocation are said to be in a collection set
• There are thresholds based on how expensive a region can get and on the maximum number of regions to collect
• Incremental collection aka incremental compaction or partial compaction
• Usually needs a threshold that triggers the compaction
• Stops after the desired reclamation threshold or free-ness threshold is reached
• Doesn’t need to be stop-the-world
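The selection policy in the bullets above can be sketched roughly as follows. This is only an illustration, not the policy of G1, Shenandoah or Z GC: the `Region` class, the `maxLiveBytes` cost threshold and the `maxRegions` cap are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative model only; names and thresholds are hypothetical.
final class CollectionSetSketch {
    static final class Region {
        final int id;
        final long capacityBytes;
        final long liveBytes;

        Region(int id, long capacityBytes, long liveBytes) {
            this.id = id;
            this.capacityBytes = capacityBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return capacityBytes - liveBytes;
        }
    }

    // Regions with the most garbage come first. Regions with too much live
    // data are skipped as too expensive, and selection stops at maxRegions.
    static List<Region> chooseCollectionSet(List<Region> occupied, long maxLiveBytes, int maxRegions) {
        List<Region> byGarbage = new ArrayList<>(occupied);
        byGarbage.sort(Comparator.comparingLong(Region::garbageBytes).reversed());

        List<Region> collectionSet = new ArrayList<>();
        for (Region region : byGarbage) {
            if (collectionSet.size() >= maxRegions) {
                break;
            }
            if (region.liveBytes > maxLiveBytes) {
                continue; // copying this much live data would cost too much
            }
            collectionSet.add(region);
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> occupied = List.of(
                new Region(0, 1024, 900),
                new Region(1, 1024, 100),
                new Region(2, 1024, 400),
                new Region(3, 1024, 0));
        for (Region region : chooseCollectionSet(occupied, 512, 2)) {
            System.out.println("region " + region.id + " garbage=" + region.garbageBytes());
        }
    }
}
```

Because only a capped subset of regions is evacuated per cycle, the work is incremental rather than a full-heap compaction.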
Introduction: G1, Shenandoah & Z - Algorithms
Algorithm and Other Considerations
Garbage Collectors | G1 GC | Shenandoah GC | Z GC
Regionalized? | Yes | Yes | Yes
Generational? | Yes | No | No
Compaction? | Yes, STW, Forwarding address in header | Yes, Concurrent, Forwarding Pointer | Yes, Concurrent, Colored Pointers
Target Pause Times? | 200ms | 10ms | 10ms
Concurrent Marking Algorithm? | SATB | SATB | Striped
Differences – G1
GC Phases of Marking and Compaction
G1 GC | Gist
Initial Mark | Mark objects directly reachable by the roots
Concurrent Root Region Scanning | Since initial mark is piggy-backed on a young collection, the survivor regions need to be scanned
Concurrent Marking | Snapshot-at-the-beginning (SATB) algorithm
Final Marking | Drain SATB buffers; traverse unvisited live objects
Cleanup | Identify and free completely free regions, sort regions based on liveness and expense
STW Compaction | Move objects in collection set to “to” regions; free regions in collection set
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
Concurrent Marking
Logical snapshot of the heap
SATB marking guarantees that all garbage objects that are present at the start of the
concurrent marking phase will be identified by the snapshot
But application mutates its object graph
Any new objects are considered live
For any reference update, the mutator needs to log the previous value in a log
queue
This is enabled by a pre-write barrier
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
•https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
Snapshot-at-the-beginning (SATB) Algorithm
Barriers
SATB Pre-Write Barrier
The pseudo-code of the pre-write barrier for an assignment of the form x.f := y is:
    if (marking_is_active) {
        pre_val := x.f;
        if (pre_val != NULL) {
            satb_enqueue(pre_val);
        }
    }
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
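The pseudo-code above can be mimicked in plain Java to see what ends up in the log. This is a toy model, not HotSpot code: `Node`, `satbQueue` and `writeField` are hypothetical names, and a single shared queue stands in for the log queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an SATB pre-write barrier; not HotSpot code.
final class SatbBarrierSketch {
    static final class Node {
        Node f; // the reference field being updated (x.f)
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    static boolean markingIsActive = false;
    // Hypothetical stand-in for the SATB log queue.
    static final Deque<Node> satbQueue = new ArrayDeque<>();

    // Performs x.f = y, logging the overwritten value first while marking runs.
    static void writeField(Node x, Node y) {
        if (markingIsActive) {
            Node preVal = x.f;
            if (preVal != null) {
                satbQueue.add(preVal);
            }
        }
        x.f = y;
    }

    public static void main(String[] args) {
        Node x = new Node("x");
        x.f = new Node("old");
        markingIsActive = true;
        writeField(x, new Node("new"));
        System.out.println("logged: " + satbQueue.peek().name);
    }
}
```

The previous value is logged before the store, so an object that was reachable in the snapshot stays visible to the marker even after the mutator drops the last reference to it.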
Barriers
Post Write Barrier
Consider the following assignment:
object.field = some_other_object
G1 GC will issue a write barrier after the reference is updated, hence the name.
G1 GC filters the need for a barrier by way of a simple check as explained below:
(&object.field XOR &some_other_object) >> log2(RegionSize)
If the check evaluates to zero, a barrier is not needed.
If the check != zero, G1 GC enqueues the card in the update log buffer
https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
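The cross-region filter can be tried out on raw addresses. A minimal sketch, assuming 1 MiB regions so that the shift amount is log2 of the region size (20); `needsBarrier` and `LOG_REGION_SIZE` are names made up for this example, and addresses are modelled as plain `long` values.

```java
// Sketch of the cross-region check; constants and names are illustrative.
final class CrossRegionFilterSketch {
    // Assumption: 1 MiB regions, so the shift amount is log2(1 MiB) = 20.
    static final int LOG_REGION_SIZE = 20;

    // True when the field and the object it now refers to lie in different regions.
    static boolean needsBarrier(long fieldAddress, long referentAddress) {
        return ((fieldAddress ^ referentAddress) >>> LOG_REGION_SIZE) != 0;
    }

    public static void main(String[] args) {
        // Both addresses fall inside the region starting at 0x100000.
        System.out.println(needsBarrier(0x100000L, 0x1FFFFFL));
        // Neighbouring addresses on either side of a region boundary.
        System.out.println(needsBarrier(0x0FFFFFL, 0x100000L));
    }
}
```

XOR clears every bit the two addresses share, so after shifting out the in-region offset bits, a zero result means both addresses carry the same region number.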
STW Compaction
Forwarding Pointer in Header
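[Diagram: A Java Object made of a Header and a Body. The Header holds the Mark Word and a Pointer to an InstanceKlass; the forwarding Pointer is stored in the Mark Word together with two low-order bits (b b).]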
GC workers compete to install the forwarding pointer
From source:
• An InstanceKlass is the VM level representation of a Java class. It contains all information needed for a class at execution runtime.
• When marked the bits will be 11
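The race to install the forwarding pointer can be illustrated with a compare-and-set. This is a sketch under simplifying assumptions: the `forwardee` field is a hypothetical stand-in for the forwarding pointer kept in the mark word, and objects are ordinary Java objects rather than raw heap memory.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of GC workers racing to forward one object; not HotSpot code.
final class ForwardingInstallSketch {
    static final class Obj {
        final String payload;
        // Hypothetical stand-in for the forwarding pointer in the mark word.
        final AtomicReference<Obj> forwardee = new AtomicReference<>(null);

        Obj(String payload) {
            this.payload = payload;
        }
    }

    // Each worker copies speculatively, then tries to publish its copy.
    // Exactly one compare-and-set succeeds; losers adopt the winner's copy.
    static Obj evacuate(Obj from) {
        Obj existing = from.forwardee.get();
        if (existing != null) {
            return existing;
        }
        Obj copy = new Obj(from.payload);
        return from.forwardee.compareAndSet(null, copy) ? copy : from.forwardee.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Obj from = new Obj("payload");
        Obj[] results = new Obj[4];
        Thread[] workers = new Thread[results.length];
        for (int i = 0; i < workers.length; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> results[slot] = evacuate(from));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // Every worker should report the same to-space copy.
        for (Obj result : results) {
            System.out.println(result == results[0]);
        }
    }
}
```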
Differences – Z
GC Phases of Marking and Compaction
Z GC | Gist
Initial Mark | Mark objects directly reachable by the roots
Concurrent Marking | Striping - GC threads walk the object graph and mark
Final Marking | Traverse unvisited live objects; weak root cleaning
Concurrent Prepare for Compaction | Identify collection set; reference processing
Start Compaction | Handles roots into the collection set
Concurrent Compaction | Move objects in collection set to “to” regions
Concurrent Remap (done with Concurrent Marking of next cycle since it walks the object graph) | Fixup of all the pointers to now-moved objects
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
Striping
Heap divided into logical stripes
GC threads work on their own stripe
Minimizes shared state
Load barrier to detect loads of non-marked object pointers
Concurrent reference processing
Thread local handshakes
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Diagram: the Heap is divided into repeating stripes 0, 1, … n; GC Thread0 works on Stripe 0, GC Thread1 on Stripe 1, …, GC Threadn on Stripe n]
Concurrent Marking
Barriers
Read Barrier – For References
Update a “bad” reference to a “good” reference
Can be a self-healing/repairing barrier when it updates the source memory location
Imposes a set of invariants –
“All visible loaded reference values will be safely “marked through” by the
collector, if they haven’t been already.
All visible loaded reference values point to the current location of the safely
accessible contents of the target objects they refer to.”
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting
Collector, in 'Proceedings of the international symposium on Memory management' , ACM, New York, NY,
USA , pp. 79--88 .
Loaded Reference Barrier
Example
Object o = obj.fieldA;   // Loading an object reference from heap
load_barrier(register_for(o), address_of(obj.fieldA));
if (o & bad_bit_mask) {
    slow_path(register_for(o), address_of(obj.fieldA));
}
Example
mov 0x20(%rax), %rbx     // Object o = obj.fieldA;
test %rbx, 0x16(%r15)    // Bad color?
jnz slow_path            // Yes -> Enter slow path and mark/relocate/remap,
                         // adjust 0x20(%rax) and %rbx
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
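The fast-path test and the self-healing step can be modelled on a `long[]` standing in for heap fields. This is a single-threaded sketch with made-up color bits (`COLOR_A`, `COLOR_B`). The slow path here only recolors the reference, where a real collector would mark, relocate or remap.

```java
// Toy model of a self-healing load barrier; bit positions are illustrative.
final class LoadBarrierSketch {
    // Hypothetical color bits; which one counts as "bad" changes with the GC phase.
    static final long COLOR_A = 1L << 42;
    static final long COLOR_B = 1L << 43;
    static final long COLOR_MASK = COLOR_A | COLOR_B;

    static long goodColor = COLOR_A;
    static int slowPathCalls = 0;

    static long badBitMask() {
        return COLOR_MASK & ~goodColor;
    }

    // Loads fields[index]; a bad color sends the load through the slow path.
    static long loadReference(long[] fields, int index) {
        long o = fields[index];
        if ((o & badBitMask()) != 0) {
            o = slowPath(o);
            fields[index] = o; // self-healing: the next load takes the fast path
        }
        return o;
    }

    // Stand-in for mark/relocate/remap: only fixes the color.
    static long slowPath(long o) {
        slowPathCalls++;
        return (o & ~COLOR_MASK) | goodColor;
    }

    public static void main(String[] args) {
        long[] fields = { 0x1000L | COLOR_B };
        loadReference(fields, 0);
        loadReference(fields, 0);
        System.out.println("slow path calls: " + slowPathCalls);
    }
}
```

Writing the repaired reference back to the field is what keeps the slow path rare: each stale reference pays the cost once.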
Core Concept
Colored Pointers
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Diagram: layout of a 64-bit colored pointer]
Bits 0-41: Object Address
Bits 42-45: Marked0, Marked1, Remapped, Finalizable (one bit each)
Bits 46-63: Unused
Marked0/Marked1: Object is known to be marked?
Remapped: Object is known to not be pointing into the relocation set?
Finalizable: Object is reachable only through a Finalizer?
Metadata stored in the unused bits of the 64-bit pointers
Virtual address mapping/tagging
Multi-mapping on x86-64
Hardware support on SPARC, aarch64
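The bit layout can be expressed as masks over a `long`. A minimal sketch, assuming the four metadata bits sit directly above a 42-bit address in the order Marked0, Marked1, Remapped, Finalizable. The helper names are hypothetical and the multi-mapping of virtual addresses is not modelled.

```java
// Sketch of colored-pointer masks; the exact bit order is an assumption.
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 42; // bits 0-41
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    // Strips the metadata bits and returns the plain object address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    static boolean isRemapped(long pointer) {
        return (pointer & REMAPPED) != 0;
    }

    // Replaces whatever color the pointer had with the given one.
    static long withColor(long pointer, long color) {
        return address(pointer) | color;
    }

    public static void main(String[] args) {
        long marked = withColor(0x1234L, MARKED0);
        long remapped = withColor(marked, REMAPPED);
        System.out.println(Long.toHexString(address(remapped)) + " remapped=" + isRemapped(remapped));
    }
}
```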
Concurrent Compaction
Load barrier to detect object pointers into the collection set
Can be self-healing
Off-heap forwarding tables make it possible to immediately release and reuse virtual and physical memory
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
Off-Heap Forwarding Tables
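The idea of keeping forwarding information outside the evacuated region can be sketched with a map from old to new addresses. This is an illustration only: the class and method names are hypothetical, and a `HashMap` stands in for whatever table structure the collector really uses.

```java
import java.util.HashMap;
import java.util.Map;

// Toy off-heap forwarding table: old address -> new address.
final class ForwardingTableSketch {
    private final Map<Long, Long> table = new HashMap<>();

    // Called when an object is moved out of a region in the collection set.
    void recordRelocation(long fromAddress, long toAddress) {
        table.put(fromAddress, toAddress);
    }

    // Returns the current address: the new one if the object moved, else unchanged.
    long remap(long address) {
        return table.getOrDefault(address, address);
    }

    public static void main(String[] args) {
        ForwardingTableSketch forwarding = new ForwardingTableSketch();
        forwarding.recordRelocation(0x1000L, 0x9000L);
        System.out.println(Long.toHexString(forwarding.remap(0x1000L)));
        System.out.println(Long.toHexString(forwarding.remap(0x2000L)));
    }
}
```

Since nothing about the move is stored in the old copy, the memory of the evacuated region can be handed back as soon as its objects are copied, while stale pointers are still fixed up lazily through the table.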
Differences – Shenandoah
GC Phases of Marking and Compaction
https://wiki.openjdk.java.net/display/shenandoah/Main
Shenandoah GC | Gist
Initial Mark | Mark objects directly reachable by the roots
Concurrent Marking | Snapshot-at-the-beginning (SATB) algorithm
Final Marking | Drain SATB buffers; traverse unvisited live objects; identify collection set
Concurrent Cleanup | Free completely free regions
Concurrent Compaction | Move objects in collection set to “to” regions
Initial Update Reference | Initialize the update reference phase
Concurrent Update Reference | Scans the heap linearly; update any references to objects that have moved
Final Update Reference | Update roots to point to to-region copies
Concurrent Cleanup | Free regions in collection set
Concurrent Marking
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
•https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
Snapshot-at-the-beginning (SATB) Algorithm
Barriers
SATB Pre-Write Barrier - Recap
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
Needed for all updates
Check if “marking-is-active”
SATB_enqueue the pre_val
Barriers
Read Barrier – For Concurrent Compaction
Here’s an assembly code snippet for reading a field:
mov 0x10(%rsi),%rsi ; *getfield value
Here’s what the snippet looks like with Shenandoah:
mov -0x8(%rsi),%rsi ; read of forwarding pointer at address object - 0x8
mov 0x10(%rsi),%rsi ; *getfield value
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
Barriers
Copying Write Barrier – For Concurrent Compaction
Needed for all updates to ensure to-space invariant
Check if “evacuation_in_progress”
Check if “in_collection_set” and “not_yet_copied”
CAS (fwd-ptr(obj), obj, copy)
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
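The read barrier and the copying write barrier can be put side by side in a small model. This is a sketch, not Shenandoah code: the `fwd` field plays the role of the forwarding pointer, which points at the object itself until a copy exists, and the `Obj` fields and helper names are invented for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of forwarding-pointer read and write barriers; not Shenandoah code.
final class BrooksBarrierSketch {
    static final class Obj {
        int value;
        final boolean inCollectionSet;
        // Forwarding pointer: refers to the object itself until it is copied.
        final AtomicReference<Obj> fwd = new AtomicReference<>();

        Obj(int value, boolean inCollectionSet) {
            this.value = value;
            this.inCollectionSet = inCollectionSet;
            fwd.set(this);
        }
    }

    static boolean evacuationInProgress = false;

    // Read barrier: always go through the forwarding pointer.
    static Obj readBarrier(Obj obj) {
        return obj.fwd.get();
    }

    // Copying write barrier: make sure the write lands in the to-space copy.
    static Obj writeBarrier(Obj obj) {
        if (evacuationInProgress && obj.inCollectionSet && obj.fwd.get() == obj) {
            Obj copy = new Obj(obj.value, false);
            obj.fwd.compareAndSet(obj, copy); // CAS(fwd-ptr(obj), obj, copy)
        }
        return obj.fwd.get(); // the winner of the race, ours or another thread's
    }

    static void setValue(Obj obj, int value) {
        writeBarrier(obj).value = value;
    }

    static int getValue(Obj obj) {
        return readBarrier(obj).value;
    }

    public static void main(String[] args) {
        Obj obj = new Obj(1, true);
        evacuationInProgress = true;
        setValue(obj, 42);
        System.out.println(getValue(obj) + " copied=" + (readBarrier(obj) != obj));
    }
}
```

Writes never touch a from-space copy once evacuation has started, which is the to-space invariant the slide refers to; reads pay one extra dereference per access.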
Barriers
Loaded Reference Barrier - Recap
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting
Collector, in 'Proceedings of the international symposium on Memory management' , ACM, New York, NY,
USA , pp. 79--88 .
https://developers.redhat.com/blog/2019/06/27/shenandoah-gc-in-jdk-13-part-1-load-reference-barriers/
Ensure strong ‘to-space invariant’
Utilize barriers at reference load
Check if fast-path-possible; else do-slow-path
Concurrent Compaction
Brooks Style Indirection Pointer
[Diagram: A Java Object (Header, Body) with an Indirection Pointer]
Forwarding pointer is placed before the object
Additional work of dereferencing per object
Concurrent Compaction
Forwarding Pointer in Header
[Diagram: the From Space Java Object's Header is replaced (X) by a Forwarding Pointer to the To Space Copy Java Object (Header, Body)]
https://developers.redhat.com/blog/2019/06/28/shenandoah-gc-in-jdk-13-part-2-eliminating-the-forward-pointer-word/
Performance!
Variability: OpenJDK 8 LTS -> OpenJDK 11 LTS
JDK 11 LTS shows significantly less variability than JDK 8 LTS for responsiveness
[Charts: SPECjbb2015 Full System Capacity and Responsiveness for Run 1 to Run 7, on JDK 8 LTS and on JDK 11 LTS; % STD Dev of Full System Capacity and Responsiveness, JDK 8 LTS vs JDK 11 LTS]
With G1 GC
Out-of-box* GC Performance
OpenJDK 8 LTS -> OpenJDK 11 LTS
"-Xmx150g -Xms150g -Xmn130g"
G1 GC became the default GC
[Chart: Full System Capacity and Responsiveness for JDK 8 LTS, JDK 11 LTS, JDK 12 and JDK 13; higher is better]
Out-of-box* OpenJDK GC Performance
Innovation happens at tip
*With Xmx=Xms
[Chart: Full System Capacity and Responsiveness for PGC, G1GC and ZGC, JDK tip vs JDK 11; higher is better]
GCs Head-to-Head Performance
[Chart: Full System Capacity and Responsiveness for shenandoah; z; g1, base+ng; parallel, base+xmn; parallel, base+ng; higher is better]
Further Reading
https://www.youtube.com/watch?v=VCeHkcwfF9Q
https://www.usenix.org/legacy/events/vee05/full_papers/p46-click.pdf
http://mail.openjdk.java.net/pipermail/zgc-dev/2017-December/000047.html
http://hg.openjdk.java.net/zgc/zgc/file/ffab403eaf14/src/hotspot/share/gc/z/zBarrier.cpp
https://wiki.openjdk.java.net/display/zgc/Main
https://www.azul.com/files/c4_paper_acm1.pdf
© Copyright Microsoft Corporation. All rights reserved.

OpenJDK Concurrent Collectors

  • 1. Java Team OpenJDK: In the New Age of Concurrent Garbage Collectors HotSpot’s Regionalized GCs Monica Beckwith JVM Performance java-performance@Microsoft @mon_beck
  • 2. Agenda Part 1 – Groundwork & Commonalities Laying the Groundwork Stop-the-world (STW) vs concurrent collection Heap layout – regions and generations Basic Commonalities Copying collector – from and to spaces Regions – occupied and free Collection set and priority July 10th, 2020
  • 3. Agenda Part 2 – Introduction & Differences Introduction to G1, Shenandoah and Z GCs Algorithm Basic Differences GC phases Marking Barriers Compaction July 10th, 2020
  • 4. Groundwork : Stop-the world vs concurrent collections
  • 5. Stop-the-world aka STW GC Application Threads GC Threads Application Threads Safepoint Requested GC Complete d Application Threads GC Threads Application Threads Safepoint Requested GC Complete d Handshakes Thread local handshakes vs Global Time To Safepoint (TTSP)
  • 7. Groundwork : Heap layout - regions and generations
  • 8. Heap Layout Heap Z GC Shenandoah GC Young Generation G1 GC Old Generation
  • 9. Commonalities : Copying collector – from and to spaces From To
  • 10. HeapFrom Space To Space O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O GC ROOTS THREAD 1 STACK THREAD N STACK STATIC VARIABLES ANY JNI REFERENCES Copying aka Evacuating Collector
  • 11. O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O GC ROOTS THREAD 1 STACK THREAD N STACK O O O O O STATIC VARIABLES ANY JNI REFERENCES O OO O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O Copying aka Evacuating Collector
  • 12. Copying aka Evacuating Collector O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
  • 13. Commonalities : Regions – occupied and free
  • 14. Occupied and Free Regions O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O • List of free regions • In case of generational heap (like G1), the occupied regions could be young, old or humongous
  • 16. Collection Priority and Collection Set O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO OOOO • Priority is to reclaim regions with most garbage • The candidate regions for collection/reclamation/relocation are said to be in a collection set • There are threshold based on how expensive a region can get and maximum regions to collect • Incremental collection aka incremental compaction or partial compaction • Usually needs a threshold that triggers the compaction • Stops after the desired reclamation threshold or free-ness threshold is reached • Doesn’t need to be stop-the-world
  • 17. Introduction : G1, Shenandoah & Z - Algorithms
  • 18. Algorithm and Other Considerations Garbage Collectors G1 GC Shenandoah GC Z GC Regionalized? Yes Yes Yes Generational? Yes No No Compaction? Yes, STW, Forwarding address in header Yes, Concurrent, Forwarding Pointer Yes, Concurrent, Colored Pointers Target Pause Times? 200ms 10ms 10ms Concurrent Marking Algorithm? SATB SATB Striped
  • 20. GC Phases of Marking and Compaction G1 GC Gist Initial Mark Mark objects directly reachable by the roots Concurrent Root Region Scanning Since initial mark is piggy-backed on a young collection, the survivor regions need to be scanned Concurrent Marking Snapshot-at-the-beginning (SATB) algorithm Final Marking Drain SATB buffers; traverse unvisited live objects Cleanup Identify and free completely free regions, sort regions based on liveness and expense STW Compaction Move objects in collection set to “to” regions; free regions in collection set •C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
  • 21. Concurrent Marking Logical snapshot of the heap SATB marking guarantees that all garbage objects that are present at the start of the concurrent marking phase will be identified by the snapshot But application mutates its object graph Any new objects are considered live For any reference update, the mutator needs to log the previous value in a log queue This is enabled by a pre-write barrier •C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion. •https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf Snapshot-at-the-beginning (SATB) Algorithm
  • 22. Barriers SATB Pre-Write Barrier The pseudo-code of the pre-write barrier for an assignment of the form x.f := y is: if (marking_is_active) { pre_val := x.f; if (pre_val != NULL) { satb_enqueue(pre_val); } } •C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
  • 23. Barriers Post Write Barrier Consider the following assignment: object.field = some_other_object G1 GC will issue a write barrier after the reference is updated, hence the name. G1 GC filters the need for a barrier by way of a simple check as explained below: (&object.field XOR &some_other_object) >> RegionSize If the check evaluates to zero, a barrier is not needed. If the check != zero, G1 GC enqueues the card in the update log buffer https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf •C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
  • 24. STW Compaction Forwarding Pointer in Header BodyHeader A Java Object Pointer Pointer to an InstanceKlass Mark Word b b GC workers compete to install the forwarding pointer From source: • An InstanceKlass is the VM level representation of a Java class. It contains all information needed for at class at execution runtime. • When marked the bits will be 11
  • 26. GC Phases of Marking and Compaction Z GC Gist Initial Mark Mark objects directly reachable by the roots Concurrent Marking Striping - GC threads walk the object graph and mark Final Marking Traverse unvisited live objects; weak root cleaning Concurrent Prepare for Compaction Identify collection set; reference processing Start Compaction Handles roots into the collection set Concurrent Compaction Move objects in collection set to “to” regions Concurrent Remap (done with Concurrent Marking of next cycle since walks the object graph) Fixup of all the pointers to now-moved objects http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
  • 27. Striping Heap divided into logical stripes GC threads work on their own stripe Minimizes shared state Load barrier to detect loads of non-marked object pointers Concurrent reference processing Thread local handshakes http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf Heap GC Thread0 GC Thread1 GC Threadn … 0 1 … n 0 1 … n 0 1 … n Stripe 0 Stripe 1 Stripe n Concurrent Marking
  • 28. Barriers: Read Barrier – For References (Loaded Reference Barrier). Updates a “bad” reference to a “good” reference. Can be a self-healing (self-repairing) barrier when it also updates the source memory location. Imposes a set of invariants: “All visible loaded reference values will be safely ‘marked through’ by the collector, if they haven’t been already. All visible loaded reference values point to the current location of the safely accessible contents of the target objects they refer to.” Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings of the international symposium on Memory management', ACM, New York, NY, USA, pp. 79-88.
  • 29. Example: Object o = obj.fieldA; /* loading an object reference from heap */ load_barrier(register_for(o), address_of(obj.fieldA)); if (o & bad_bit_mask) { slow_path(register_for(o), address_of(obj.fieldA)); }
  • 30. Example (assembly): mov 0x20(%rax), %rbx /* Object o = obj.fieldA; */ test %rbx, (0x16)%r15 /* bad color? */ jnz slow_path /* yes -> enter slow path and mark/relocate/remap, adjust 0x20(%rax) and %rbx */ Source: http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
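The two example slides above can be mimicked in plain Java to show the fast-path test and the self-healing store. This is a toy model under assumptions: references are `long` values held in an array that stands in for heap slots, the mask values are illustrative, and the slow path only recolors the reference instead of marking, relocating or remapping.

```java
// Sketch of a load barrier with self-healing. The class, its fields and the
// mask values are hypothetical; a real collector does this in generated code.
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1; // assumed address bits
    static final long MARKED0 = 1L << 42;            // assumed "bad" color here
    static final long REMAPPED = 1L << 44;           // assumed "good" color here

    long badMask = MARKED0;     // colors that force the slow path right now
    long goodColor = REMAPPED;  // color given to healed references
    int slowPathCalls = 0;      // counter kept only to make the effect visible

    /** Loads slots[index] and heals the slot if the reference has a bad color. */
    long loadReference(long[] slots, int index) {
        long ref = slots[index];
        if ((ref & badMask) != 0) {   // fast-path test, like test + jnz
            ref = slowPath(ref);
            slots[index] = ref;       // self-healing: repair the source location
        }
        return ref;
    }

    private long slowPath(long ref) {
        slowPathCalls++;
        // A real slow path would mark, relocate or remap; the sketch only recolors.
        return (ref & ADDRESS_MASK) | goodColor;
    }
}
```

Because the slot itself is repaired, a second load of the same field takes the fast path, which is the benefit self-healing is meant to provide.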
  • 31. Core Concept: Colored Pointers. [Figure: 64-bit pointer layout. Bits 0-41: object address. Bits 42-45: metadata bits Marked0, Marked1, Remapped, Finalizable. Bits 46-63: unused.] Marked0/Marked1: object is known to be marked? Remapped: object is known to not be pointing into the relocation set? Finalizable: object is reachable only through a Finalizer? Metadata is stored in the unused bits of the 64-bit pointers. Virtual address mapping/tagging. Multi-mapping on x86-64. Hardware support on SPARC, aarch64. Source: http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
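The pointer layout on this slide lends itself to a few bit masks. The sketch below assumes the address occupies bits 0-41 and the four metadata bits sit at positions 42-45 in the order Marked0, Marked1, Remapped, Finalizable; the class and method names are hypothetical.

```java
// Sketch of colored-pointer bit manipulation. Bit positions follow the slide's
// layout (address in bits 0-41, metadata in 42-45); the exact order of the
// metadata bits is an assumption.
final class ColoredPointerSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    /** Strips the color and returns only the object address bits. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Returns only the metadata (color) bits of the pointer. */
    static long color(long pointer) {
        return pointer & METADATA_MASK;
    }

    /** Returns the same address carrying exactly the given color. */
    static long recolor(long pointer, long newColor) {
        return address(pointer) | newColor;
    }

    /** True when the pointer carries the given color bit. */
    static boolean hasColor(long pointer, long colorBit) {
        return (pointer & colorBit) != 0;
    }
}
```

Keeping the color in the pointer rather than in the object means the collector can tell whether a reference needs attention without touching the object it points to.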
  • 32. Concurrent Compaction: Off-Heap Forwarding Tables. Load barrier to detect object pointers into the collection set. Can be self-healing. Off-heap forwarding tables make it possible to release and reuse virtual and physical memory immediately. Source: http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
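A forwarding table kept outside the relocated region can be sketched as a small open-addressing hash table from an object's old offset to its new address. Everything here is an assumption made for illustration (names, hashing, single-threaded use, non-negative offsets); a real collector would need atomic inserts so that competing threads agree on one copy.

```java
// Sketch of an off-heap forwarding table: old offset -> new address.
// Single-threaded toy; capacity must be a power of two and must stay larger
// than the number of entries so that probing always finds an empty slot.
final class ForwardingTableSketch {
    static final long EMPTY = -1L;          // marks an unused slot
    static final long NOT_FORWARDED = -1L;  // returned when no entry exists

    private final long[] fromOffsets;
    private final long[] toAddresses;
    private final int mask;

    ForwardingTableSketch(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        fromOffsets = new long[capacity];
        toAddresses = new long[capacity];
        mask = capacity - 1;
        for (int i = 0; i < capacity; i++) {
            fromOffsets[i] = EMPTY;
        }
    }

    private int slotFor(long fromOffset) {
        // Multiplicative hash; the upper bits are folded into the table range.
        return (int) ((fromOffset * 0x9E3779B97F4A7C15L) >>> 32) & mask;
    }

    /** Records the mapping unless one exists; returns the address that won. */
    long insertIfAbsent(long fromOffset, long toAddress) {
        int i = slotFor(fromOffset);
        while (fromOffsets[i] != EMPTY) {
            if (fromOffsets[i] == fromOffset) {
                return toAddresses[i];   // another relocation got there first
            }
            i = (i + 1) & mask;          // linear probing
        }
        fromOffsets[i] = fromOffset;
        toAddresses[i] = toAddress;
        return toAddress;
    }

    /** Returns the new address for an old offset, or NOT_FORWARDED. */
    long lookup(long fromOffset) {
        int i = slotFor(fromOffset);
        while (fromOffsets[i] != EMPTY) {
            if (fromOffsets[i] == fromOffset) {
                return toAddresses[i];
            }
            i = (i + 1) & mask;
        }
        return NOT_FORWARDED;
    }
}
```

Since the mapping lives in the table and not in the old object, the memory of the evacuated region is no longer needed to resolve stale pointers, which is what allows it to be released early.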
  • 34. GC Phases of Marking and Compaction: Shenandoah GC (phase: gist). Initial Mark: mark objects directly reachable by the roots. Concurrent Marking: snapshot-at-the-beginning (SATB) algorithm. Final Marking: drain SATB buffers; traverse unvisited live objects; identify collection set. Concurrent Cleanup: free completely free regions. Concurrent Compaction: move objects in collection set to “to” regions. Initial Update Reference: initialize the update reference phase. Concurrent Update Reference: scans the heap linearly; update any references to objects that have moved. Final Update Reference: update roots to point to to-region copies. Concurrent Cleanup: free regions in collection set. Source: https://wiki.openjdk.java.net/display/shenandoah/Main
  • 35. Concurrent Marking: Snapshot-at-the-beginning (SATB) Algorithm. Sources: C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion; https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
  • 36. Barriers: SATB Pre-Write Barrier - Recap. Needed for all updates. Check if “marking-is-active”. SATB_enqueue the pre_val. Source: C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
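The three steps of the SATB pre-write barrier can be sketched as below. The model is deliberately simplified and its names are hypothetical: the marking flag is an ordinary field instead of a thread-local one, the SATB buffer is an `ArrayDeque`, and skipping null previous values is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a snapshot-at-the-beginning pre-write barrier.
final class SatbPreWriteBarrierSketch {
    static final class Node {
        Node field;
    }

    boolean markingActive;                             // "marking-is-active" flag
    final Deque<Node> satbQueue = new ArrayDeque<>();  // stand-in for the SATB buffer

    /** Performs holder.field = newValue with the pre-write barrier in front. */
    void storeField(Node holder, Node newValue) {
        if (markingActive) {                 // cheap guard when marking is off
            Node preVal = holder.field;      // value about to be overwritten
            if (preVal != null) {            // assumption: null needs no enqueue
                satbQueue.add(preVal);       // SATB_enqueue(pre_val)
            }
        }
        holder.field = newValue;             // the actual store
    }
}
```

Enqueuing the overwritten value keeps every object that was reachable at the start of marking visible to the collector, even if the application drops its last reference during the cycle.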
  • 37. Barriers: Read Barrier – For Concurrent Compaction. Here’s an assembly code snippet for reading a field: mov 0x10(%rsi),%rsi ; *getfield value. Here’s what the snippet looks like with Shenandoah: mov -0x8(%rsi),%rsi ; read of forwarding pointer at address object - 0x8, followed by mov 0x10(%rsi),%rsi ; *getfield value. *Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016). Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9. 10.1145/2972206.2972210.
  • 38. Barriers: Copying Write Barrier – For Concurrent Compaction. Needed for all updates to ensure the to-space invariant. Check if “evacuation_in_progress”. Check if “in_collection_set” and “not_yet_copied”. CAS (fwd-ptr(obj), obj, copy). *Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016). Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9. 10.1145/2972206.2972210.
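The checks and the CAS listed on this slide can be sketched with an `AtomicReference` standing in for the forwarding pointer. The sketch assumes a Brooks-style pointer that initially refers to the object itself; the class, field and method names are hypothetical, and the "copy" is just a new Java object, not an allocation in a to-space region.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a copying write barrier that preserves the to-space invariant.
final class CopyingWriteBarrierSketch {
    static final class Obj {
        // Forwarding pointer; assumed to point at the object itself until copied.
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        final boolean inCollectionSet;
        int payload;

        Obj(boolean inCollectionSet, int payload) {
            this.inCollectionSet = inCollectionSet;
            this.payload = payload;
            forwardee.set(this);
        }
    }

    boolean evacuationInProgress;

    /** Returns the copy that is safe to write to. */
    Obj writeBarrier(Obj obj) {
        if (!evacuationInProgress || !obj.inCollectionSet) {
            return obj.forwardee.get();           // nothing to evacuate
        }
        Obj current = obj.forwardee.get();
        if (current != obj) {
            return current;                       // already copied by someone
        }
        Obj copy = new Obj(false, obj.payload);   // speculative to-space copy
        // CAS(fwd-ptr(obj), obj, copy): exactly one competing thread wins.
        if (obj.forwardee.compareAndSet(obj, copy)) {
            return copy;
        }
        return obj.forwardee.get();               // lost the race, use the winner
    }
}
```

A caller would write through the returned object, for example `barrier.writeBarrier(o).payload = 42;`, so that no write ever lands in a from-space copy that is about to be discarded.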
  • 39. Barriers: Read Barrier – For Concurrent Compaction. *Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016). Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9. 10.1145/2972206.2972210.
  • 40. Barriers: Copying Write Barrier – For Concurrent Compaction. *Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016). Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9. 10.1145/2972206.2972210.
  • 41. Barriers: Loaded Reference Barrier - Recap. Ensure strong ‘to-space invariant’. Utilize barriers at reference load. Check if fast-path-possible; else do-slow-path. Sources: Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings of the international symposium on Memory management', ACM, New York, NY, USA, pp. 79-88; https://developers.redhat.com/blog/2019/06/27/shenandoah-gc-in-jdk-13-part-1-load-reference-barriers/
  • 42. Concurrent Compaction: Brooks-Style Indirection Pointer. Forwarding pointer is placed before the object. Additional work of dereferencing per object. [Figure: an indirection pointer in front of a Java object's header and body.]
  • 43. Concurrent Compaction: Brooks-Style Indirection Pointer. Forwarding pointer is placed before the object. Additional work of dereferencing per object.
  • 44. Concurrent Compaction: Forwarding Pointer in Header. [Figure labels: To-Space Copy (Header, Body); From-Space Java Object (Forwarding Pointer, Body).] Source: https://developers.redhat.com/blog/2019/06/28/shenandoah-gc-in-jdk-13-part-2-eliminating-the-forward-pointer-word/
  • 46. Variability: OpenJDK 8 LTS -> OpenJDK 11 LTS, with G1 GC. JDK 11 LTS shows significantly less variability than JDK 8 LTS for responsiveness. [Figures: SPECjbb2015 Full System Capacity and Responsiveness across Runs 1-7 for JDK 8 LTS and for JDK 11 LTS; % standard deviation of Full System Capacity and Responsiveness, JDK 8 LTS vs JDK 11 LTS.]
  • 47. Out-of-box* GC Performance: OpenJDK 8 LTS -> OpenJDK 11 LTS. "-Xmx150g -Xms150g -Xmn130g". G1 GC became the default GC. [Figure: Full System Capacity and Responsiveness for JDK 8 LTS, JDK 11 LTS, JDK 12 and JDK 13; higher is better.]
  • 48. Out-of-box* OpenJDK GC Performance: Innovation happens at tip. *With Xmx=Xms. [Figure: Full System Capacity and Responsiveness for PGC, G1GC and ZGC, JDK tip vs JDK 11; higher is better.]
  • 49. GCs Head-to-Head Performance. [Figure: Full System Capacity and Responsiveness for shenandoah; z; g1, base+ng; parallel, base+xmn; parallel, base+ng; higher is better.]
  • 51. © Copyright Microsoft Corporation. All rights reserved.

Editor's Notes

  • #11: Root set includes: thread-local variables, references embedded in generated code, interned Strings, references from classloaders (e.g. static final references), JNI references, JVMTI references. Having a larger root set generally means longer pauses with Shenandoah; see below for diagnostic techniques.
  • #13: Compacting garbage collection algorithms have been shown to have smaller memory footprints and better cache locality than in-place algorithms like Concurrent Mark and Sweep (CMS).
  • #22: Objects allocated during the concurrent marking phase will be considered live but they are not traced, thus reducing the marking overhead. The technique guarantees that all live objects that were alive at the start of the marking phase are marked and traced and any new allocations made by the concurrent mutator threads during the marking cycle are marked as live and consequently not collected.
  • #23: The marking_is_active condition is a simple check of a thread-local flag that is set to true at the start of marking, during the initial mark pause. Guarding the rest of the pre-barrier code with this check reduces the overhead of executing the remainder of the barrier code when marking is not active. Since the flag is thread-local and its value may be loaded multiple times, it is likely that any individual check will hit in cache, further reducing the overhead of the barrier.
  • #24: Remembered Sets
  • #25: Klass pointer Live objects that need to be evacuated are copied to thread-local GC allocation buffers (GCLABs) allocated in target regions. Worker threads compete to install a forwarding pointer to the newly allocated copy of the old object image. With the help of work stealing [2], a single “winner” thread helps with copying and scanning the object. Work stealing also provides load balancing between the worker threads. Each section here marks a heap word. That would be 64 bits on 64-bit architectures and 32 bits on 32-bit architectures. The first word is the so-called mark word, or header of the object. It is used for a variety of purposes. For example, it can keep the hash-code of an object; it has 3 bits that are used for various locking states; some GCs use it to track object age and marking status; and it can be “overlaid” with a pointer to the “displaced” mark, to an “inflated” lock, or, during GC, the forwarding pointer. The second word is reserved for the klass pointer. This is simply a pointer to the Hotspot-internal data structure that represents the class of the object. Arrays would have an additional word next to store the array length. What follows is the actual payload of the object, that is, fields and array elements.
  • #27: Weak root cleaning (string table)
  • #28: Finalizable mark: Final reachable: object about to be finalized
  • #29: writes and reads always happen into/from the to-space copy = strong to-space invariant.
  • #32: Finalizable mark: Final reachable: object about to be finalized
  • #33: Self-healing is where Java threads will help out
  • #38: SGC: no need to update refs; forwarding pointer. When we start, register %rsi contains the address of the object, and the field is at offset 0x10.
  • #39: To-space invariant: all writes are made into the object in to-space, even writes of primitives and locking of objects. Exotic barriers: acmp (pointer comparison), CAS, clone.
  • #42: writes and reads always happen into/from the to-space copy = strong to-space invariant.
  • #43: Memory and throughput overhead