Optimizing your Java Applications for multi-core hardware
Prashanth K Nageshappa
[email_address]
Java Technologies, IBM
Agenda
- Evolution of Processor Architecture
- Why should I care?
- Think about Scalability
- How to exploit Parallelism in Java
- JVM optimizations for multi-core scalability
As The World Gets Smarter, Demands On IT Will Grow
- Smart energy grids, smart healthcare, smart food systems, intelligent oil field technologies, smart supply chains, smart retail
- IT infrastructure must grow to meet these demands: global scope, processing scale, efficiency
- 10x: digital data is projected to grow tenfold from 2007 to 2011
- 1 trillion: devices will be connected to the internet by 2011
- 25 billion: global trading systems are under extreme stress, handling billions of market data messages each day
- 70%: on average is spent on maintaining current IT infrastructure versus adding new capabilities
Hardware Trends
- Increasing transistor density
- Clock speed leveling off
- More cores
- Non-Uniform Memory Access (NUMA)
- Main memory getting larger
In 2010 POWER Systems Brings Massive Parallelism

Processor  Year  Process  Threads/core  Cores/chip  Sockets/server  Threads
POWER4™    2001  180 nm   1             2           16              32
POWER5™    2004  130 nm   2             2           32              128
POWER6™    2007  65 nm    2             2           32              128
POWER7™    2010  45 nm    4             8           32              1024
Why should I care?
- Your application may be re-used
- Better performance
- Better leverage of additional resources
  - Cores, hardware threads, memory, etc.
Think about scalability
- Serial bottlenecks inhibit scalability
- Organize your application into parallel tasks
  - Consider the task executor API (java.util.concurrent.Executor and ExecutorService)
  - Too many threads can be just as bad as too few
- Do not rely on the JVM to discover opportunities
  - No automatic parallelization
  - Java class libraries do not exploit vector processor capabilities
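The advice to organize work into parallel tasks without creating too many threads can be sketched with a fixed-size thread pool. This is a minimal illustration, not code from the talk: the class name ParallelTasks, the method runTasks, and the squaring workload are all hypothetical, and sizing the pool to the number of available processors is just one reasonable starting point. It uses lambdas, so it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the name and the workload are illustrative only.
class ParallelTasks {

    // Runs taskCount independent tasks on a pool sized to the hardware threads
    // reported by the JVM, then returns the sum of their results.
    static long runTasks(int taskCount) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final long n = i;
                // Stand-in for real, independent work: square the task index.
                results.add(pool.submit(() -> n * n));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10));
    }
}
```

The pool bounds the number of threads no matter how many tasks are submitted, which is the point of the "too many threads" bullet.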
Think about scalability
- Load imbalance
  - Workload not evenly distributed
  - Consider breaking large tasks into smaller ones
  - Change serial algorithms to parallel ones
- Tracing and I/O
  - A bottleneck unless updates are infrequent or the log is striped (RAID)
  - Blocking disk/console I/O inhibits scalability
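One way to read "breaking large tasks into smaller ones" is to split a serial loop into many chunks, so that idle workers can pick up remaining chunks when some finish early. The sketch below is an assumption-laden illustration: the class ChunkedSum, the method parallelSum, and the chunkSize parameter are hypothetical names, and a plain array sum stands in for a real workload. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; names and workload are illustrative only.
class ChunkedSum {

    // Sums the array by splitting it into chunks of at most chunkSize elements
    // and summing each chunk as a separate task.
    static long parallelSum(int[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> chunks = new ArrayList<>();
            int start = 0;
            while (start < data.length) {
                final int from = start;
                // Compute the end index in long arithmetic to avoid int overflow.
                final int to = (int) Math.min((long) start + chunkSize, data.length);
                chunks.add(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                });
                start = to;
            }
            long total = 0;
            // invokeAll waits for every chunk to complete before returning.
            for (Future<Long> f : pool.invokeAll(chunks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Choosing more chunks than workers helps with load imbalance, but very small chunks add scheduling overhead, so the chunk size is something to measure rather than guess.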
Synchronization and locking
- J9's three-tiered locking: spin, yield, OS
- Avoid synchronization in static methods
- Consider breaking long synchronized blocks into several smaller ones
  - May be bad if it results in many context switches
- The Java Lock Monitor (JLM) tool can help
  - http://perfinsp.sourceforge.net/jlm.html
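The suggestion to shorten synchronized blocks can be illustrated by moving thread-local work out of the critical section. The following is only a sketch under assumptions: the class ResultLog and its methods are hypothetical, and string formatting stands in for whatever work does not touch shared state. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are illustrative only.
class ResultLog {
    // A private lock object on the instance, rather than a static synchronized
    // method, which would serialize all callers on the Class object.
    private final Object lock = new Object();
    private final List<String> entries = new ArrayList<>();

    // Coarse version: the formatting work is done while holding the lock.
    void recordCoarse(int value) {
        synchronized (lock) {
            String line = format(value);
            entries.add(line);
        }
    }

    // Narrow version: only the shared-state update is inside the lock.
    void recordNarrow(int value) {
        String line = format(value); // touches no shared state, needs no lock
        synchronized (lock) {
            entries.add(line);
        }
    }

    int size() {
        synchronized (lock) {
            return entries.size();
        }
    }

    private static String format(int value) {
        return "value=" + value;
    }

    // Starts the given number of threads, each recording perThread entries,
    // and returns the final number of entries.
    static int demo(int threads, int perThread) {
        ResultLog log = new ResultLog();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    log.recordNarrow(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log.size();
    }
}
```

Both versions are correct; the narrow one simply holds the lock for less time. As the slide warns, splitting one block into many tiny ones can backfire if it causes more lock acquisitions and context switches, so a tool such as JLM is the way to check.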
Synchronization and locking
- Volatiles
  - The compiler will not cache the value
  - Creates a memory barrier
- Avoid synchronized container classes
  - Building scalable data structures is difficult
  - Use java.util.concurrent (j/u/c)
- Non-blocking object access
  - Possible with j/u/c
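A typical use of a volatile field is a stop flag that one thread writes and another thread polls. The sketch below assumes that use case; the class StopFlag and the method spinFor are hypothetical names. Note that volatile gives visibility, not atomicity: the iterations counter here is safe only because a single thread writes it and the caller reads it after join(). It assumes Java 8 or later.

```java
// Hypothetical example class; names are illustrative only.
class StopFlag {
    // volatile: the worker re-reads the field on every loop iteration, so a
    // write from another thread becomes visible without any locking.
    private volatile boolean running = true;

    // Written only by the worker thread; read by the caller after join().
    private long iterations;

    // Lets a worker spin for roughly the given time, then stops it through
    // the volatile flag and returns how many iterations it completed.
    static long spinFor(long millis) {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (flag.running) {
                flag.iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(millis);
            flag.running = false; // volatile write, seen by the worker's next read
            worker.join();        // join makes the worker's writes visible here
        } catch (InterruptedException e) {
            flag.running = false; // make sure the worker still terminates
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return flag.iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + spinFor(50));
    }
}
```

Without volatile, the JIT compiler would be allowed to hoist the read of running out of the loop, and the worker might never stop.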
java.util.concurrent package
- Introduced in Java SE 5
- An alternative form of strong synchronization
  - Lighter weight and more scalable than intrinsic locks
- Contents
  - java.util.concurrent.atomic.*
  - java.util.concurrent.locks.*
  - Concurrent collections
  - Synchronizers
  - Task executors (Executor, ExecutorService)
j/u/c/atomic.*
- Atomic primitives
  - A strong form of synchronization
  - But does not use locks: non-blocking
  - Exploits hardware atomic instructions such as compare-and-swap
  - Supports compound actions
- Classes: AtomicBoolean, AtomicInteger, AtomicIntegerArray, AtomicIntegerFieldUpdater, AtomicLong, AtomicLongArray, AtomicLongFieldUpdater, AtomicMarkableReference, AtomicReference, AtomicReferenceArray, AtomicReferenceFieldUpdater, AtomicStampedReference
j/u/c/atomic.*
- Getters and setters: get, set, lazySet
- Updates: getAndSet, getAndAdd/getAndIncrement/getAndDecrement, addAndGet/incrementAndGet/decrementAndGet
- CAS: compareAndSet/weakCompareAndSet
- Conversions: toString, intValue, longValue, floatValue, doubleValue
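The atomic operations listed above can be sketched with AtomicInteger: incrementAndGet for a lock-free shared counter, and a compareAndSet retry loop for a compound "update if larger" action. The class AtomicDemo and its methods are hypothetical names chosen for this illustration. It assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; names are illustrative only.
class AtomicDemo {

    // Compound action built from CAS: raise max to candidate if candidate is
    // larger. The loop retries when another thread changed the value between
    // our read and our compareAndSet.
    static void updateMax(AtomicInteger max, int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return; // nothing to do
            }
            if (max.compareAndSet(current, candidate)) {
                return; // our update won
            }
            // CAS failed: another thread updated max, so re-read and retry.
        }
    }

    // Increments one shared counter from several threads without any lock
    // and returns the final value.
    static int countWithThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With a plain int and counter++ the same test would lose updates, because the read, add, and write are separate steps.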
j/u/c/locks.*
- Problems with intrinsic locks
  - Impossible to back off from a lock attempt (deadlock)
  - Lack of features: read vs. write locks, fairness policies
  - Block-structured: must lock and release in the same method
- j/u/c/locks
  - Greater flexibility for locks and conditions
  - Non-block-structured
  - Provides reader-writer locks: why block other readers?
  - Better scalability
j/u/c/locks.*
- Interfaces: Condition, Lock, ReadWriteLock
- Classes: ReentrantLock, ReentrantReadWriteLock, LockSupport, AbstractQueuedSynchronizer
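Two of the features that intrinsic locks lack can be sketched with these classes: backing off from a lock attempt with a timed tryLock, and letting readers proceed in parallel with a ReentrantReadWriteLock. The class LockDemo, its methods, and the settings map are hypothetical names used only for illustration. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class; names are illustrative only.
class LockDemo {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, String> settings = new HashMap<>();

    // Many threads may hold the read lock at the same time.
    String read(String key) {
        rw.readLock().lock();
        try {
            return settings.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers and other writers.
    void write(String key, String value) {
        rw.writeLock().lock();
        try {
            settings.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Backing off: runs the action only if the lock can be acquired within
    // the timeout, and reports whether it ran.
    static boolean runIfAvailable(ReentrantLock lock, long timeoutMillis, Runnable action) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // gave up instead of blocking forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock(); // explicit locks must always be released in finally
        }
    }
}
```

The flexibility comes at a cost: unlike a synchronized block, nothing releases an explicit lock automatically, so the try/finally pattern is essential.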
j/u/c.* - Concurrent Collections
- Concurrent, thread-safe implementations of several collections
  - HashMap -> ConcurrentHashMap
  - TreeMap -> ConcurrentSkipListMap
  - ArrayList -> CopyOnWriteArrayList
  - Set -> CopyOnWriteArraySet
  - Queues -> ConcurrentLinkedQueue or one of the blocking queues
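Replacing a synchronized HashMap with ConcurrentHashMap can be sketched as a shared counter table that needs no external lock. The class WordCounter and its methods are hypothetical names; the putIfAbsent idiom is used here because it is available on ConcurrentMap since Java 5, although newer JDKs offer shorter alternatives. The threaded helper uses lambdas, so the sketch as written assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; names are illustrative only.
class WordCounter {
    private final ConcurrentMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Thread-safe increment without a lock around the whole map.
    void add(String word) {
        AtomicLong counter = counts.get(word);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing value if another thread won the race.
            counter = counts.putIfAbsent(word, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    long count(String word) {
        AtomicLong counter = counts.get(word);
        return counter == null ? 0 : counter.get();
    }

    // Counts the same word from several threads and returns the final total.
    static long countConcurrently(int threads, int perThread) {
        WordCounter wc = new WordCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    wc.add("java");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wc.count("java");
    }
}
```

Note that a plain get followed by put would be a race even on a concurrent map; the atomic putIfAbsent step is what makes the compound action safe.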
Strains on the VM
- Excessive use of temporary memory can lead to increased garbage collector activity
  - A stop-the-world GC pauses the application
- Excessive class loading
  - Updating the class hierarchy
  - Invalidating JIT optimizations
  - Consider creating a “startup” phase
- Transitions between Java and native code
  - VM access lock
Memory Footprint
- Little control over object allocation in Java
  - Small short-lived objects are easier to cache
  - Large long-lived objects are likely to cause cache misses
  - Memory Analysis Tool (MAT) can help
- Consider using large pages to reduce TLB misses
  - -Xlp, requires OS support
- Tune your heap settings
  - Heap lock contention with a flat heap
Affinitizing JVMs
- Can exploit the cache hierarchy on a subset of cores
- The JVM working set can fit within the physical memory of a single node in a NUMA system
- Linux: taskset, numactl
- Windows: start
Is my application scalable?
- Low CPU means resources are not maximized
  - Evaluate whether the application has too few or too many threads
  - Locks and synchronization
  - Network connections, I/O
  - Thrashing: the working set is too large for physical memory
- High CPU is generally good, as long as resources are spent in application threads doing meaningful work
- Evaluate where time is being spent
  - Garbage collection
  - VM/JIT
  - OS kernel functions
  - Other processes
- Tune, tune, tune
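As a rough first check of how much time goes to garbage collection, the standard java.lang.management beans can be queried from inside the application. This is a coarse sketch and not a substitute for the profiling tools the talk mentions: the class GcOverhead and its methods are hypothetical names, and the reported collection times are approximate and may be undefined for some collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical example class; names are illustrative only.
class GcOverhead {

    // Approximate accumulated GC time in milliseconds across all collectors.
    // A collector that does not support this metric reports -1 and is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Accumulated GC time divided by JVM uptime: a rough indicator only.
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.println("GC time (ms): " + totalGcMillis());
        System.out.println("GC fraction of uptime: " + gcFraction());
    }
}
```

A fraction that keeps growing under steady load suggests looking at allocation rates and heap settings before adding more threads.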
Write Once, Tune Everywhere
- Tools: HealthCenter, GCMV, MAT
  - http://www.ibm.com/developerworks/java/jdk/tools/
- Dependence on the operating system
  - Memory allocation
  - Socket layer
- Tune for hardware capabilities
  - How many cores?
  - How much memory?
  - What is the limit on network access?
  - Are there storage bottlenecks?
IBM Java Execution Model is Built for Parallelism
- JIT Compiler
  - Generates high-performance code for application threads
  - Customizes execution to the underlying hardware
  - Optimizes locking performance
  - Asynchronous compilation thread
- Application Threads
  - Java software threads are executed on multiple hardware threads
  - Thread-safe libraries with scalable concurrency support for parallel programming
- Garbage Collector
  - Manages memory on behalf of the application
  - Must balance throughput against observed pauses
  - Exploits multiple hardware threads
Configurable Garbage Collection policies
- Multiple policies to match varying user requirements
  - Pause time, throughput, memory footprint, and GC overhead
- All modes exploit parallel execution
  - Dynamic adaptation to the number of available hardware cores and threads
  - GC scalability is independent of user application scalability
  - Very low overhead (<3%) on typical workloads
How do GC policies compare? - optthruput
- Optimize throughput
- Highly parallel GC + streamlined application thread execution
- May cause longer pause times
- -Xgcpolicy:optthruput
[Figure: timeline of Thread 1 to Thread n, showing Java execution and GC phases over time]
Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
How do GC policies compare? - optavgpause
- Optimize pause time
- GC cleans up concurrently with application thread execution
- Sacrifice some throughput to reduce average pause times
- -Xgcpolicy:optavgpause
[Figure: timeline of Thread 1 to Thread n, showing Java execution, concurrent tracing, and GC phases over time; illustrative only]
How do GC policies compare? - gencon
- Balanced
- Clean up many short-lived objects concurrently with application threads
- Some pauses needed to collect longer-lived objects
- -Xgcpolicy:gencon
[Figure: timeline of Thread 1 to Thread n, showing Java execution, concurrent tracing, scavenge GC, and global GC phases over time; illustrative only]
How do GC policies compare? - subpools
- Scalable
  - Scalable GC focused on the larger multiprocessor machines
  - Improved object allocation algorithm
  - May not be appropriate for small-to-midsize configurations
  - -Xgcpolicy:subpool
- Uses multiple free lists
- Tries to predict the size of future allocation requests based on earlier allocation requests
- Recreates free lists at the end of each GC based on these predictions
- While allocating objects on the heap, free chunks are chosen using a “best fit” method, as opposed to the “first fit” method used in other algorithms
- Concurrent marking is disabled
JVM optimizations for multi-core scalability
- Lock removal across JVM and class libraries
- java.util.concurrent package optimizations
- Better working set for cache efficiency
- Stack allocation
- Remove/optimize synchronization
- Thread-local storage for send/receive buffers
- Non-blocking containers
- Asynchronous JIT compilation on a separate thread
- Right-sized application runtimes
Thank You
Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Brazilian Portuguese), Danke (German)
Questions?
Email: [email_address]
http://www.ibm.com/developerworks/java/
Special notices ©  IBM Corporation 2010. All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views.  They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant.  While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.  Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.  Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment.  The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.  
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Optimizing your java applications for multi core hardware

  • 1. Optimizing your Java Applications for multi-core hardware Prashanth K Nageshappa [email_address] Java Technologies IBM
  • 2. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 3. As The World Gets Smarter, Demands On IT Will Grow Smart energy grids Smart healthcare Smart food systems Intelligent oil field technologies Smart supply chains Smart retail IT infrastructure must grow to meet these demands global scope, processing scale, efficiency Digital data is projected to grow tenfold from 2007 to 2011. Devices will be connected to the internet by 2011 1 Trillion Global trading systems are under extreme stress, handling billions of market data messages each day 25 Billion 70% on average is spent on maintaining current IT infrastructure versus adding new capabilities 10x
  • 4. Hardware Trends Increasing transistor density Clock Speed leveling off More number of cores Non-Uniform Memory Access Main memory getting larger
  • 5. In 2010 POWER Systems Brings Massive Parallelism 2001 180 nm 2004 130 nm 2007 65 nm 2010 45 nm POWER7™ 4 threads/core 8 cores/chip 32 sockets/server 1024 threads POWER6™ 2 threads/core 2 cores/chip 32 sockets/server 128 threads POWER5™ 2 threads/core 2 cores/chip 32 sockets/server 128 threads POWER4™ 1 thread/core 2 cores/chip 16 sockets/server 32 threads Threads
  • 6. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 7. Why should I care? Your application may be re-used Better performance Better leverage additional resources Cores, hardware threads, memory etc
  • 8. Think about scalability Serial bottlenecks inhibit scalability Organize your application into parallel tasks Consider TaskExecutor API Too many threads can be just as bad as too few Do not rely on JVM to discover opportunities No automatic parallelization Java class libraries do not exploit vector processor capabilities
  • 9. Think about scalability Load imbalance Workload not evenly distributed Consider breaking large tasks into smaller ones Change serial algorithms to parallel ones Tracing and I/O Bottleneck unless infrequent updates or log is striped (RAID) Blocking disk/console I/O inhibit scalability
  • 10. Synchronization and locking J9's Three-tiered locking Spin Yield OS Avoid synchronization in static methods Consider breaking long synchronized blocks into several smaller ones May be bad if results in many context switches Java Lock Monitor (JLM) tool can help http://perfinsp.sourceforge.net/jlm.html
  • 11. Synchronization and locking Volatiles Compiler will not cache the value Creates memory barrier Avoid synchronized container classes Building scalable data structures is difficult Use java.util.concurrent (j/u/c) Non-blocking object access Possible with j/u/c
  • 12. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 13. java.util.concurrent package Introduced in Java SE 5 Alternative strong synchronization Lighter weight, better scalability Comparing to intrinsic locks java.util.concurrent.atomic.* java.util.concurrent.locks.* ConcurrentCollections Synchronizers TaskExecutor
  • 14. j/u/c/atomic.* Atomic primitives Strong form of synchronization But does not use lock – non blocking Exploit atomic instructions such as compare-and-swap in hardware Supports compounded actions AtomicLongFieldUpdater AtomicMarkableReference AtomicReference AtomicReferenceArray AtomicReferenceFieldUpdater AtomicStampedReference AtomicBoolean AtomicInteger AtomicIntegerArray AtomicIntegerFieldUpdater AtomicLong AtomicLongArray
  • 15. j/u/c/atomic.* Getter and setters get set lazySet Updates getAndSet getAndAdd/getAndIncrement/getAndDecrement addAndGet/incrementAndGet/decrementAndGet CAS compareAndSet/weakCompareAndSet Conversions toString, intValue, longValue, floatValue, doubleValue
  • 16. j/u/c/locks.* Problems with intrinsic locks Impossible to back off from a lock attempt Deadlock Lack of features Read vs write Fairness policies Block-structured Must lock and release in the same method j/u/c/locks Greater flexibility for locks and conditions Non-block-structured Provides reader-writer locks Why block other readers? Better scalability
  • 17. j/u/c/locks.* Interfaces: Condition Lock ReadWriteLock Classes: ReentrantLock ReentrantReadWriteLock LockSupport AbstractQueuedSynchronizer
  • 18. j/u/c.* - Concurrent Collections Concurrent, thread safe implementations of several collections HashMap -> ConcurrentHashMap TreeMap -> ConcurrentSkipListMap ArrayList -> CopyOnWriteArrayList ArraySet -> CopyOnWriteArraySet Queues -> ConcurrentLinkedQueue or one of the blocking queues
  • 19. Strains on the VM Excessive use of temporary memory can lead to increased garbage collector activity Stop the world GC pauses the application Excessive class loading Updating class hierarchy Invalidating JIT optimizations Consider creating a “startup” phase Transitions between Java and native code VM access lock
  • 20. Memory Footprint Little control over object allocation in Java Small short lived objects are easier to cache Large long lived objects likely to cause cache misses Memory Analysis Tool (MAT) can help Consider using large pages for TLB misses -Xlp, requires OS support Tune your heap settings Heap lock contention with flat heap
  • 21. Affinitizing JVMs Can exploit cache hierarchy on a subset of cores JVM working set can fit within the physical memory of a single node in a NUMA system Linux: taskset, numactl Windows: start
  • 22. Is my application scalable? Low CPU means resources are not maximized Evaluate if application has too few/many threads Locks and synchronization Network connections, I/O Thrashing working set is too large for physical memory High CPU is generally good, as long as resources are spent in application threads, doing meaningful work Evaluate where time is being spent Garbage collection VM/JIT OS Kernel functions Other processes Tune, tune, tune
  • 23. Write Once, Tune Everywhere HealthCenter, GCMV, MAT http://www.ibm.com/developerworks/java/jdk/tools/ Dependence on operating System Memory allocation Socket layer Tune for hardware capabilities How many cores? How much memory? What is the limit on network access? Are there storage bottlenecks?
  • 24. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 25. IBM Java Execution Model is Built for Parallelism JIT Compiler Garbage Collector Application Threads Generates high performance code for application threads Customizes execution to underlying hardware Optimizes locking performance Asynchronous compilation thread Java software threads are executed on multiple hardware threads Thread safe libraries with scalable concurrency support for parallel programming Manages memory on behalf of the application Must balance throughput against observed pauses Exploits many multiple hardware threads
  • 26. Configurable Garbage Collection policies: Multiple policies to match varying user requirements: pause time, throughput, memory footprint and GC overhead. All modes exploit parallel execution: dynamic adaptation to the number of available hardware cores and threads; GC scalability is independent of user application scalability. Very low overhead (<3%) on typical workloads.
  • 27. How do GC policies compare? - optthruput (Optimize Throughput): Highly parallel GC plus streamlined application thread execution. May cause longer pause times. Option: -Xgcpolicy:optthruput. [Figure: timeline of Thread 1 to Thread n showing Java execution and GC phases.] The picture is only illustrative and doesn't reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
  • 28. How do GC policies compare? - optavgpause (Optimize Pause Time): GC cleans up concurrently with application thread execution. Sacrifices some throughput to reduce average pause times. Option: -Xgcpolicy:optavgpause. [Figure: timeline of Thread 1 to Thread n showing Java execution, concurrent tracing and GC phases.] The picture is only illustrative and doesn't reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
  • 29. How do GC policies compare? - gencon (Balanced): Cleans up many short-lived objects concurrently with application threads. Some pauses are needed to collect longer-lived objects. Option: -Xgcpolicy:gencon. [Figure: timeline of Thread 1 to Thread n showing Java execution, concurrent tracing, scavenge GC and global GC phases.] The picture is only illustrative and doesn't reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
  • 30. How do GC policies compare? - subpools (Scalable): Scalable GC focused on larger multiprocessor machines, with an improved object allocation algorithm. May not be appropriate for small-to-midsize configurations. Uses multiple free lists. Tries to predict the size of future allocation requests based on earlier allocation requests, and recreates the free lists at the end of each GC based on these predictions. While allocating objects on the heap, free chunks are chosen using a "best fit" method, as opposed to the "first fit" method used in other algorithms. Concurrent marking is disabled. Option: -Xgcpolicy:subpool.
  • 31. JVM optimizations for multi-core scalability: Lock removal across the JVM and class libraries; java.util.concurrent package optimizations; better working set for cache efficiency; stack allocation; remove/optimize synchronization; thread-local storage for send/receive buffers; non-blocking containers; asynchronous JIT compilation on a separate thread; right-sized application runtimes.
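The items above describe work done inside the JVM and class libraries, but applications benefit most when they use the non-blocking containers that `java.util.concurrent` already provides. As an application-side illustration (not the JVM-internal implementation), the sketch below has several producer threads fill a `ConcurrentLinkedQueue` without any `synchronized` block; the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical example (not from the slides): producers share a non-blocking queue.
class NonBlockingQueueDemo {

    /** Starts the given number of producers, each offering perProducer items; returns the final queue size. */
    static int fillConcurrently(int producers, int perProducer) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    // offer() is thread-safe and does not take a lock.
                    queue.offer(id * perProducer + i);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("items enqueued: " + fillConcurrently(4, 1000));
    }
}
```

Replacing a `synchronized` list with a container like this removes one serial bottleneck, though whether it helps a given application still has to be measured.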
  • 32. Thank You: Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Brazilian Portuguese), Danke (German). Questions? Email: [email_address] http://www.ibm.com/developerworks/java/
  • 33. Special notices © IBM Corporation 2010. All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. 
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.