Friday, June 14, 2013

Java GC in Numbers – Parallel Young Collection

This is the first article in a series in which I would like to study the effect of various HotSpot JVM options on the duration of STW pauses associated with garbage collection.

This article studies how the number of parallel threads affects the duration of the young collection Stop-the-World pause. HotSpot JVM has several young GC algorithms. My experiments cover the following combinations:

  • Serial young (DefNew), Mark Sweep Compact old
  • Parallel young (ParNew), Mark Sweep Compact old
  • Serial young (DefNew), Concurrent Mark Sweep old
  • Parallel young (ParNew), Concurrent Mark Sweep old
  • There is also the PSNew (Parallel Scavenge) algorithm, which is similar to ParNew, but it cannot be used together with Concurrent Mark Sweep (CMS), so I have ignored it.
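
To my knowledge, on JDK 6/7 the four combinations above correspond to -XX:+UseSerialGC, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC -XX:-UseParNewGC and -XX:+UseConcMarkSweepGC, and the number of parallel GC threads is set with -XX:ParallelGCThreads=N. Treat that mapping as my assumption rather than the exact benchmark setup. A minimal sketch for checking which collectors a JVM actually runs with (the class name ActiveCollectors is made up for illustration):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {

    /** Returns one line per collector registered in this JVM: name, collection count, total time. */
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time ms=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

The names reported depend on the JVM version and the flags in effect, so they are a quick way to confirm that a given flag set selected the young and old collectors you expected.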

    In the experiments, I used a synthetic benchmark that produces an evenly distributed load on the memory subsystem. The size of the young generation was the same for all experiments (64MiB). Two versions of HotSpot JVM were used: JDK 6u43 (VM 20.14-b01) and JDK 7u15 (VM 23.7-b01).

    The test box was equipped with two CPUs, each with 12 cores x 2 hardware threads (48 hardware threads in total).

    Mark Sweep Compact

    Mark Sweep Compact is prone to regular full GCs, so it is not an option for pause-sensitive applications. But it shares the same young collection algorithms and code with the concurrent collector and produces less noisy results, so I included it to better understand the concurrent case.

    The difference between the single-thread case and the 48-thread case is significant, so the numbers are presented in two charts.

    It is noteworthy (though not surprising) that the serial algorithm performs slightly better than the parallel one with a single thread. The discrepancy between Java 6 and Java 7 is also interesting, but I have no explanation for it at the moment.

    From the charts above you can see that more threads are better, but it is not obvious how much better. The charts below show effective parallelization (the 8-thread case is taken as the base value, because smaller numbers of threads produce fairly noisy results).

    You can see almost linear parallelization up to 16 threads. It is also worth noting that 48 threads are considerably faster than 24, even though there are only 24 physical cores. The effect of parallelization is slightly better for larger heap sizes.
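
To make "effective parallelization" concrete, here is a small sketch of how I would compute it from measured pause times, assuming it means the speedup over the base run normalized by the growth in thread count. The class and method names are hypothetical, and the pause values in main are placeholders, not measurements from these experiments.

```java
class Parallelization {

    /** Speedup of a run relative to the base run: how many times shorter the pause became. */
    static double speedup(double basePauseMs, double pauseMs) {
        return basePauseMs / pauseMs;
    }

    /**
     * Efficiency relative to the base run. 1.0 means the pause shrank exactly
     * in proportion to the added threads (linear scaling); lower means worse.
     */
    static double efficiency(int baseThreads, double basePauseMs, int threads, double pauseMs) {
        return speedup(basePauseMs, pauseMs) * baseThreads / threads;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only: 8 threads is the base case.
        int baseThreads = 8;
        double basePauseMs = 80.0;
        System.out.println("16 threads: " + efficiency(baseThreads, basePauseMs, 16, 40.0));
        System.out.println("48 threads: " + efficiency(baseThreads, basePauseMs, 48, 20.0));
    }
}
```

With this definition a perfectly linear curve stays at 1.0, and any drop below it shows how much of the added hardware is not turning into shorter pauses.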

    Concurrent Mark Sweep

    Concurrent Mark Sweep is the collector used for pause-sensitive applications, and young collection pause time is something you probably really care about if you have consciously chosen CMS. The same hardware and the same benchmark were used.
    Results are below.

    Compared to Mark Sweep Compact, the concurrent algorithm produces much noisier results (especially for small numbers of threads).

    Java 7 systematically shows worse performance than Java 6, though not by much.

    The parallelization diagrams show the same picture: linear scalability that degrades as the number of threads grows (the experiment conditions are slightly different for the CMS and MSC cases, so a direct comparison of these diagrams is not valid).

    Conclusions

    The tests have confirmed that the parallel young collection algorithms in HotSpot JVM scale extremely well with the number of CPU cores. Having a lot of CPU cores on a server will help you greatly with JVM Stop-the-World pauses.

    Source code

    The source code used for benchmarking and its description are available on GitHub.
    github.com/aragozin/jvm-tools/tree/master/ygc-bench

    Thursday, May 23, 2013

    Ad-hoc diagnostic tools for JVM

    There is a bunch of graphical tools that help you look inside a running JVM (VisualVM and JConsole are extremely helpful). Unfortunately, sometimes (almost always in my case) you may find yourself in an SSH console on a headless server, side by side with your JVM process, trying to investigate a problem.

    Why is CLI important?

    You can connect to a remote JVM process via JMX in VisualVM, and there are other interesting tools like CRaSH offering a lot of goodies for troubleshooting. But …

    • You may be behind a firewall, with a broker SSH relay as the only way to access the environment.
    • All remote tools require ahead-of-time setup on the JVM side.
    • Remote connections must be secured properly, which is a huge burden.

    CLI tools leverage OS security and work in your SSH console, relieving you of all the pains above.

    Stock tools

    Of course, there are CLI tools in your JDK package. Let me highlight a few tools from Oracle’s stock JDK.

    jps

    This one lists your JVM processes (instead of doing ps … | grep java). Similar to ps, it can display the command line arguments of a process. It is useful for finding the PID of the JVM you are interested in, which is required by the other tools.

    jmap

    This little tool allows you to take ad hoc heap dumps and calculate memory footprint histograms by class. It can also be used to force a full GC on the target JVM. Be careful: some jmap operations can cause STW pauses in the target JVM.

    jstack

    Dump your thread stacks, look through your locks.
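
For comparison, similar information is available from inside a JVM through the standard ThreadMXBean. The sketch below is an in-process analogue, not how jstack itself is implemented; the class name InProcessThreadDump is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class InProcessThreadDump {

    /** Builds a text dump of all threads in this JVM: state, stack frames and held monitors. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                out.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            for (MonitorInfo monitor : info.getLockedMonitors()) {
                out.append("    locked ").append(monitor).append('\n');
            }
        }
        // findDeadlockedThreads returns null when no deadlock is detected
        long[] deadlocked = threads.findDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```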

    jstat

    The JVM exposes a lot of internal details (e.g. memory sizes, compilation statistics, etc.). jstat can report some of this data. The output is fairly cryptic (and machine oriented), but nevertheless helpful.

    jinfo

    Introspect -XX options of a running JVM. You can also change some of them for a live JVM process (e.g. enable -XX:+PrintGCDetails to add GC details to the application log).
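
The same kind of introspection is possible from inside the JVM through HotSpotDiagnosticMXBean. This is only an in-process sketch, not what jinfo does internally, and it needs Java 7 or later for getPlatformMXBean. I use HeapDumpOnOutOfMemoryError here instead of PrintGCDetails because I am sure it is a manageable (writeable) flag across HotSpot versions; the class name VmOptions is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class VmOptions {

    static HotSpotDiagnosticMXBean diagnostics() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Describes one -XX option: current value, whether it can be changed at runtime, and its origin. */
    static String describe(String name) {
        VMOption option = diagnostics().getVMOption(name);
        return option.getName() + "=" + option.getValue()
                + " (writeable=" + option.isWriteable()
                + ", origin=" + option.getOrigin() + ")";
    }

    /** Changes a writeable option on the live JVM and returns the value read back afterwards. */
    static String set(String name, String value) {
        diagnostics().setVMOption(name, value);
        return diagnostics().getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println(describe("HeapDumpOnOutOfMemoryError"));
        set("HeapDumpOnOutOfMemoryError", "true");
        System.out.println(describe("HeapDumpOnOutOfMemoryError"));
    }
}
```

Options that are not writeable make setVMOption throw an exception, so checking isWriteable() first is the safer pattern.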

    Behind the scenes

    Behind the scenes, the JVM has several internal protocols that can be used by diagnostic tools (not counting JMX):

    • Attach API – lightweight protocol for connection to JVM processes.
    • Perf data – shared memory based protocol for JVM to expose its performance counters. Using shared memory makes it very lightweight.
    • JDI, JDWP, JVMTI – components of Java Platform Debugger Architecture

    Both the Attach API and perf data are lightweight, fairly unobtrusive for the monitored application, and can easily be used from Java code.
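
As a rough illustration of using the Attach API from Java code, the sketch below lists the local JVMs visible to the current user, similar in spirit to jps. It assumes a full JDK: tools.jar on the classpath for JDK 6 to 8, or the jdk.attach module on later versions. The class name LocalJvms is made up for illustration.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.ArrayList;
import java.util.List;

class LocalJvms {

    /** Returns "pid displayName" for every JVM the Attach API can see on this host. */
    static List<String> list() {
        List<String> lines = new ArrayList<String>();
        for (VirtualMachineDescriptor vm : VirtualMachine.list()) {
            // id() is the process id on HotSpot; displayName() is usually the main class plus arguments
            lines.add(vm.id() + " " + vm.displayName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : list()) {
            System.out.println(line);
        }
    }
}
```

Only JVMs running under the same OS user are normally visible, which is the OS-level security the CLI approach relies on.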

    Swiss Java Knife – CLI tools for ad hoc JVM diagnostics

    Some time ago I blogged about a few tools using the Attach API and JMX: jtop and gcrep. Since then, a few things have changed:

    • the wrapper around the Attach API has been factored out into a separate module, org.gridkit.lab:jvm-attach-api:1.1
    • the tools have been consolidated into a single JAR and modularized (so it is easy to add new commands or build a JAR with a specific set of commands)

    But most importantly, cool new features have been added:

    java -jar sjk.jar jps

    Stock jps can print the command line, but that is as far as it can help you. SJK’s version of jps lets you choose which information about a process you would like to see. E.g. you can add the value of a specific system property or -XX flag to the output. Another improvement is the built-in filter option. You can filter Java processes by command line (which includes the main class name) and system properties.

    java -jar sjk.jar ttop

    The thread top command has also been improved. A sort option has been added, and you can also limit the number of threads in the output to the top N. Additionally, the filter option will help you monitor only certain threads.

    java -jar sjk.jar mxdump

    This command dumps all MBeans from a process in JSON format.
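
To give an idea of what such a dump involves, here is a simplified sketch that walks the platform MBean server of the current JVM and prints every readable attribute as a JSON string. It is not SJK's implementation (mxdump connects to another process, and its output format may differ); the class name MBeanDump and the flattening of all values to strings are my simplifications.

```java
import java.lang.management.ManagementFactory;
import java.util.TreeSet;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MBeanDump {

    /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    /** Dumps all MBeans as one JSON object: bean name -> { attribute name -> value as string }. */
    static String dump(MBeanServer server) {
        StringBuilder json = new StringBuilder("{");
        boolean firstBean = true;
        for (ObjectName name : new TreeSet<ObjectName>(server.queryNames(null, null))) {
            MBeanAttributeInfo[] attrs;
            try {
                attrs = server.getMBeanInfo(name).getAttributes();
            } catch (Exception e) {
                continue; // bean disappeared or cannot be introspected
            }
            json.append(firstBean ? "\n  " : ",\n  ").append(quote(name.getCanonicalName())).append(": {");
            firstBean = false;
            boolean firstAttr = true;
            for (MBeanAttributeInfo attr : attrs) {
                if (!attr.isReadable()) {
                    continue;
                }
                String value;
                try {
                    value = String.valueOf(server.getAttribute(name, attr.getName()));
                } catch (Exception e) {
                    value = "<unavailable: " + e.getClass().getSimpleName() + ">";
                }
                json.append(firstAttr ? "" : ", ").append(quote(attr.getName()))
                    .append(": ").append(quote(value));
                firstAttr = false;
            }
            json.append("}");
        }
        return json.append("\n}").toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(ManagementFactory.getPlatformMBeanServer()));
    }
}
```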

    java -jar sjk.jar hh

    The “heap histo” command is an extended version of jmap -histo. The killer feature is the --dead flag, which displays a histogram of the garbage in the heap (the tool actually takes two histograms, of all objects and of live objects, and shows the difference).
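
The difference step can be sketched as follows, assuming each histogram has already been parsed into a map from class name to bytes (for example from the output of jmap -histo and jmap -histo:live). The class name DeadHistogram is hypothetical and this is not SJK's actual code.

```java
import java.util.Map;
import java.util.TreeMap;

class DeadHistogram {

    /**
     * Subtracts the live histogram from the full one.
     * Both maps go from class name to bytes; the result keeps only classes with garbage.
     */
    static Map<String, Long> dead(Map<String, Long> all, Map<String, Long> live) {
        Map<String, Long> dead = new TreeMap<String, Long>();
        for (Map.Entry<String, Long> entry : all.entrySet()) {
            Long liveBytes = live.get(entry.getKey());
            long garbage = entry.getValue() - (liveBytes == null ? 0L : liveBytes);
            // The two histograms are taken at different moments, so the difference
            // can come out negative; such classes are simply dropped.
            if (garbage > 0) {
                dead.put(entry.getKey(), garbage);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only.
        Map<String, Long> all = new TreeMap<String, Long>();
        all.put("[B", 1000L);
        all.put("java.lang.String", 400L);
        Map<String, Long> live = new TreeMap<String, Long>();
        live.put("[B", 250L);
        live.put("java.lang.String", 400L);
        System.out.println(dead(all, live));
    }
}
```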

    You can download sjk jars here.
    Or build from sources: https://github.com/aragozin/jvm-tools

    Tuesday, May 14, 2013

    TechTalk: Java GC - Theory and Practice

    I'm glad to announce an upcoming tech talk at our user group in Moscow: "Java GC - Theory and Practice".

    The event will be held on 16 May in Moscow; an online broadcast will be available (the talk will be in Russian).

    Registration is open at http://aragozin.timepad.ru/event/60137/

    Agenda:

    Theory:

    Garbage collection algorithms. The weak generational hypothesis. Write barrier mechanisms. A mathematical model of pause duration for Concurrent Mark Sweep.

    Practice: HotSpot JVM, Concurrent Mark Sweep GC

    JVM memory sizing principles. Tuning the young generation collector. Fragmentation. Special references. Pauses not related to garbage collection.

    "Science fiction"

    Working with off-heap memory. Shared read-only heap region. G1 and trends in HotSpot development.