JAVA
PERFORMANCE
PUZZLERS
DOUGLAS Q. HAWKINS
LEAD JIT DEVELOPER
AZUL SYSTEMS
@dougqh
dougqh@gmail.com
AGENDA
LOOK AT SOME INTERESTING
OFTEN UNINTUITIVE PERFORMANCE CASES
SEE WHAT WE CAN LEARN FROM THEM
A  Adding 1..1395 Numbers
B  Adding 1..1396 Numbers
C  Adding 1..1397 Numbers
[Chart: time per operation (log ns) vs. iterations (0 to 450); phases: Interpreter, 1st JIT, 2nd JIT, Repeat…, Deoptimize & Repeat]
JMH JAVA MEASUREMENT HARNESS:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class Benchmark {
@Setup
public void setup() {}
@Benchmark
public void benchmark1() { … }
@Benchmark
public void benchmark2() { … }
…
}
01A
add1395 avgt 10 3.790 ± 0.300 ns/op
add1396 avgt 10 3.784 ± 0.336 ns/op
add1397 avgt 10 2767.567 ± 351.909 ns/op
01A
RESULTS
PERFORMANCE IS
FULL OF SURPRISES.
PERFORMANCE CLIFF:
HUGE METHOD LIMIT
add1395 7993 bytes
add1396 7999 bytes
add1397 8005 bytes
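The byte counts above straddle HotSpot's 8000-byte cutoff, above which a method is left uncompiled. The following is a minimal sketch for checking whether the running JVM applies that cutoff. It assumes a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class and method names are illustrative and not from the talk.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class HugeMethodFlag {
    /** Returns the flag value ("true" or "false"), or "unknown" if it cannot be read. */
    static String dontCompileHugeMethods() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("DontCompileHugeMethods").getValue();
        } catch (RuntimeException e) {
            // Non-HotSpot JVMs, or builds without this flag, end up here.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("DontCompileHugeMethods = " + dontCompileHugeMethods());
    }
}
```

On HotSpot, starting the JVM with `-XX:-DontCompileHugeMethods` turns the cutoff off, which is one way to confirm that the cliff comes from method size rather than from the arithmetic itself.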
A
int x = 0;
for ( int i = 1; i <= 1395; ++i ) {
  x += i;
}
return x;
B
int x = 0;
for ( int i = 1; i <= 1396; ++i ) {
  x += i;
}
return x;
C
int x = 0;
for ( int i = 1; i <= 1397; ++i ) {
  x += i;
}
return x;
01B
add1395 avgt 10 435.372 ± 19.225 ns/op
add1396 avgt 10 436.955 ± 18.688 ns/op
add1397 avgt 10 434.245 ± 11.635 ns/op
NOW EQUALLY — BAD
01B
A
System.arraycopy
B
int[] dest = new int[src.length];
for ( int i = 0; i < src.length; ++i ) {
  dest[i] = src[i];
}
C
array.clone
02A
arrayCopy avgt 10 95598.200 ns/op
cloneArray avgt 10 94848.814 ns/op
manual avgt 10 93309.838 ns/op
02A
RESULTS
100k
int[] original = randomInts(100_000);
int[] copy1 = new int[original.length];
long startTime1 = System.nanoTime();
for ( int i = 0; i < original.length; ++i ) {
copy1[i] = original[i];
}
System.out.printf("copy loop: % 10d ns%n",
System.nanoTime() - startTime1);
int[] copy2 = new int[original.length];
long startTime2 = System.nanoTime();
System.arraycopy(
original, 0,
copy2, 0, original.length);
System.out.printf("arraycopy: % 10d ns%n",
System.nanoTime() - startTime2);
VS.
02B
02B
RESULTS
copy loop: 1646957 ns
arraycopy: 54827 ns
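A likely factor in the gap above is that each variant runs exactly once, before the JIT has compiled anything. The sketch below runs both copies through warm-up rounds before timing them. `WARMUP_ROUNDS` and the helper names are assumptions for illustration, and a harness such as JMH remains the more reliable tool.

```java
import java.util.Arrays;
import java.util.Random;

final class WarmedUpCopy {
    static final int WARMUP_ROUNDS = 2_000; // assumed value; tune for your JVM

    static int[] manualCopy(int[] src) {
        int[] dest = new int[src.length];
        for (int i = 0; i < src.length; ++i) {
            dest[i] = src[i];
        }
        return dest;
    }

    static int[] arrayCopy(int[] src) {
        int[] dest = new int[src.length];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    public static void main(String[] args) {
        int[] original = new Random(42).ints(100_000).toArray();

        // Warm up both paths so the timed runs are more likely to hit compiled code.
        long sink = 0;
        for (int i = 0; i < WARMUP_ROUNDS; ++i) {
            sink += manualCopy(original)[i % original.length];
            sink += arrayCopy(original)[i % original.length];
        }

        long t1 = System.nanoTime();
        int[] copy1 = manualCopy(original);
        long t2 = System.nanoTime();
        int[] copy2 = arrayCopy(original);
        long t3 = System.nanoTime();

        System.out.printf("copy loop: %10d ns%n", t2 - t1);
        System.out.printf("arraycopy: %10d ns%n", t3 - t2);
        // Use the results so the copies cannot be discarded as dead code.
        System.out.println(Arrays.equals(copy1, copy2) + " " + sink);
    }
}
```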
PERFORMANCE IS
FULL OF INTRICACIES.
O(N) != O(N)
A  Integer.valueOf(x)
B  new Integer(x)
C  x
03
public java.lang.Integer valueOf();
Code:
0: aload_0
1: getfield offset:I
4: aload_0
5: getfield nums:[I
8: arraylength
9: if_icmplt 17
12: aload_0
13: iconst_0
14: putfield offset:I
17: aload_0
18: getfield nums:[I
21: aload_0
22: dup
23: getfield offset:I
26: dup_x1
27: iconst_1
28: iadd
29: putfield offset:I
32: iaload
33: invokestatic Integer.valueOf(I)
36: areturn
public java.lang.Integer auto();
Code:
0: aload_0
1: getfield offset:I
4: aload_0
5: getfield nums:[I
8: arraylength
9: if_icmplt 17
12: aload_0
13: iconst_0
14: putfield offset:I
17: aload_0
18: getfield nums:[I
21: aload_0
22: dup
23: getfield offset:I
26: dup_x1
27: iconst_1
28: iadd
29: putfield offset:I
32: iaload
33: invokestatic Integer.valueOf(I)
36: areturn
javap -c
03
public static Integer valueOf(int i) {
if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);
}
static final int low = -128;
// configurable through
// -Djava.lang.Integer.IntegerCache.high
static final int high = 127;
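The effect of the cache can be observed through object identity. This is a small sketch with an illustrative class name: values from -128 to 127 always box to the same object, while the result for larger values depends on the JVM's cache settings.

```java
final class IntegerCacheDemo {
    /** True when two boxings of the same value yield the same object. */
    static boolean sameInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // identity comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // inside the cache: same object
        System.out.println(sameInstance(1000)); // outside the default cache: depends on JVM settings
    }
}
```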
03
@Param({"100", "1000", "10000"})
private int range;
PARAMETERIZING JMH
@Setup
public void setUp() {
ThreadLocalRandom random = ThreadLocalRandom.current();
nums = new int[1_000_000];
for ( int i = 0; i < nums.length; ++i ) {
nums[i] = random.nextInt(-range, range);
}
}
03
RESULTS
(range)
auto 100 avgt 10 5.132 ± 0.316 ns/op
auto 1000 avgt 10 8.184 ± 1.551 ns/op
auto 10000 avgt 10 6.996 ± 1.401 ns/op
new_ 100 avgt 10 6.328 ± 0.973 ns/op
new_ 1000 avgt 10 6.083 ± 0.651 ns/op
new_ 10000 avgt 10 6.243 ± 1.031 ns/op
valueOf 100 avgt 10 5.096 ± 0.116 ns/op
valueOf 1000 avgt 10 8.488 ± 1.957 ns/op
valueOf 10000 avgt 10 7.155 ± 1.382 ns/op
03
RESULTS
(ns/op)    -100 to 100      -1,000 to 1,000   -10,000 to 10,000
new        6.328 ± 0.973    6.083 ± 0.651     6.243 ± 1.031
autobox    5.132 ± 0.316    8.184 ± 1.551     6.996 ± 1.401
valueOf    5.096 ± 0.116    8.488 ± 1.957     7.155 ± 1.382
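One way to read these numbers is to ask how often `Integer.valueOf` can return a cached object for each range. The sketch below computes that fraction for values drawn uniformly from `[-range, range)`, assuming the default cache bounds of -128 and 127. The class and method names are illustrative, and this covers only the cache, not the hardware effects the talk goes on to discuss.

```java
final class CacheHitFraction {
    static final int CACHE_LOW = -128;
    static final int CACHE_HIGH = 127; // default upper bound, assumed unchanged

    /** Fraction of the values in [-range, range) that fall inside the Integer cache. */
    static double hitFraction(int range) {
        int lo = Math.max(-range, CACHE_LOW);
        int hi = Math.min(range - 1, CACHE_HIGH);
        int cached = Math.max(0, hi - lo + 1);
        return cached / (2.0 * range);
    }

    public static void main(String[] args) {
        for (int range : new int[] {100, 1_000, 10_000}) {
            System.out.printf("range %6d: %.4f of values are cached%n",
                              range, hitFraction(range));
        }
    }
}
```

With a range of 100 every value is cached. With 1,000 and 10,000 most calls allocate, so `valueOf` pays for the range check and the allocation.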
EVERYTHING MATTERS:
HARDWARE INCLUDED
[Chart: ints vs. ArrayList vs. LinkedList for sizes 100, 1000, 10000, 100000, 1000000; y-axis 0 to 8000000]
http://cr.openjdk.java.net/~shade/scratch/ArrayVsLinked.java
LINKED LIST
ARRAY LIST
ARRAY
O(N) != O(N)
A  list.toArray()
B  list.toArray(new Object[0])
C  list.toArray(new Object[list.size()])
D  list.toArray(new String[0])
E  list.toArray(new String[list.size()])
SAYS…
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
04
toArray avgt 10 54.084 ± 10.000 ns/op
toArraySized avgt 10 58.555 ± 0.745 ns/op
toArrayUnsized avgt 10 54.025 ± 0.343 ns/op
toStringArraySize avgt 10 154.291 ± 2.060 ns/op
toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op
RESULTS
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
04
STRING[] SLOWER THAN OBJECT[]?
Object[] objects = new Integer[20];
objects[0] = "foo";
Possible Runtime Check
Sometimes JIT Eliminates It
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
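The runtime check mentioned above exists because Java arrays are covariant: the two-line snippet compiles, but the store is rejected at run time. The sketch below shows the check firing, using a hypothetical helper name.

```java
final class ArrayStoreCheck {
    /** Returns true if storing value into array[0] is rejected at run time. */
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value; // the JVM checks value against the array's real element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails(new Integer[20], "foo")); // true
        System.out.println(storeFails(new Object[20], "foo"));  // false
    }
}
```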
TRICKY WHEN ARRAY IS PASSED IN
list.toArray(new String[…]);
OBJECT[] IS COMMONLY USED,
SO SPECIAL CASE.
toArray avgt 10 54.084 ± 10.000 ns/op
toArraySized avgt 10 58.555 ± 0.745 ns/op
toArrayUnsized avgt 10 54.025 ± 0.343 ns/op
toStringArraySize avgt 10 154.291 ± 2.060 ns/op
toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op
IS WRONG.
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
04
WHY IS NO ARRAY / UNSIZED FASTER
// allocate
dest = malloc(sizeof(E) * len);
// zero-initialize
for ( int i = 0; i < len; ++i ) {
dest[i] = null;
}
// copy
for ( int i = 0; i < len; ++i ) {
dest[i] = src[i];
}
Dead Stores
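For reference, the variants measured above can be written side by side as follows. This is a sketch with illustrative method names. All of them return the same elements, and the zero-length form lets the collection allocate the result array itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToArrayVariants {
    /** Passes a zero-length array; the list allocates a String[] of the right size. */
    static String[] unsized(List<String> list) {
        return list.toArray(new String[0]);
    }

    /** Passes a pre-sized array; the list copies into the caller's array. */
    static String[] sized(List<String> list) {
        return list.toArray(new String[list.size()]);
    }

    /** No argument: the result is an Object[], not a String[]. */
    static Object[] untyped(List<String> list) {
        return list.toArray();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(Arrays.toString(unsized(list)));
        System.out.println(Arrays.toString(sized(list)));
        System.out.println(untyped(list).getClass().getSimpleName());
    }
}
```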
A
Integer.toString(NUM)
B
"" + NUM
C
StringBuilder builder = new StringBuilder();
builder.append(NUM);
builder.toString();
05A
static final int NUM =
ThreadLocalRandom.current().nextInt()
builder avgt 10 47.688 ± 3.866 ns/op
concat avgt 10 33.118 ± 0.840 ns/op
toString avgt 10 41.105 ± 2.005 ns/op
05A
javap -c
public java.lang.String concat();
Code:
0: new StringBuilder
3: dup
4: invokespecial StringBuilder."<init>":()V
7: ldc ""
9: invokevirtual StringBuilder.append(LString;)
12: getstatic NUM:I
15: invokevirtual StringBuilder.append(I)
18: invokevirtual StringBuilder.toString()
21: areturn
05A
public java.lang.String concat() {
return new StringBuilder().
append("").
append(NUM).
toString();
}
IN JAVA...
05B
return new StringBuilder().
append("").
append(NUM).
toString();
VS.
StringBuilder builder =
new StringBuilder();
builder.append(NUM);
return builder.toString();
05B
05B
RESULTS
builder avgt 10 41.122 ± 1.688 ns/op
concat avgt 10 35.173 ± 3.092 ns/op
concatLikeBuilder avgt 10 32.536 ± 2.302 ns/op
COMPILERS ARE
GLORIFIED REGEX-ES.
static final int NUM = 1 << 20
builder avgt 10 40.734 ± 0.997 ns/op
concat avgt 10 3.343 ± 0.205 ns/op
toString avgt 10 32.877 ± 0.450 ns/op
05C
public java.lang.String concat();
Code:
0: ldc "1048576"
2: areturn
public java.lang.String concat() {
return "1048576";
}
javap -c
05C
CONSTANT FOLDING & PROPAGATION
static final int NUM = 1 << 20
(constant fold)
static final int NUM = 1_048_576
"" + NUM
(constant propagate)
"" + 1_048_576
(constant fold)
"1048576"
05C
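The folding above can be observed without `javap`. Under the Java Language Specification, a constant expression of type String is interned, while a run-time concatenation creates a new String each time. The sketch below relies on that rule; `RANDOM_NUM` and the method names are illustrative.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ConstantFoldDemo {
    static final int NUM = 1 << 20; // compile-time constant
    static final int RANDOM_NUM = ThreadLocalRandom.current().nextInt(); // not a constant

    /** Folded by javac into the literal "1048576". */
    static String folded() {
        return "" + NUM;
    }

    /** Concatenated at run time; each call builds a new String. */
    static String notFolded() {
        return "" + RANDOM_NUM;
    }

    public static void main(String[] args) {
        // Identity comparisons on purpose: they reveal whether a literal was reused.
        System.out.println(folded() == "1048576");      // true
        System.out.println(notFolded() == notFolded()); // false
    }
}
```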
PERFORMANCE
COMPOSES
UNINTUITIVELY.
A
String str = "";
for ( int i = 0; i < 100; ++i ) {
  str += i;
}
B
StringBuilder builder = new StringBuilder();
for ( int i = 0; i < 100; ++i ) {
  builder.append(i);
}
06
RESULTS
builder avgt 10 979.112 ± 59.893 ns/op
concat avgt 10 2661.941 ± 135.251 ns/op
06
FAST IN ONE CONTEXT
CAN BE SLOW IN ANOTHER.
RESULTS
               speed                       objects allocated*   memory consumed*
concat         2661.941 ± 135.251 ns/op    400 objects          71800 bytes
builder         979.112 ± 59.893 ns/op       9 objects           1640 bytes
sizedBuilder    887.717 ± 62.569 ns/op       4 objects           1464 bytes
* Memory usage measured separately with Caliper
06
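The three variants in the table can be sketched as follows. The talk does not show the `sizedBuilder` source, so the capacity parameter is an assumption. The numbers 0 to 99 concatenate to 190 characters, so a capacity of 190 avoids any regrowth.

```java
final class ConcatVsBuilder {
    /** Repeated += : each iteration builds a new String. */
    static String concat(int n) {
        String str = "";
        for (int i = 0; i < n; ++i) {
            str += i;
        }
        return str;
    }

    /** Default-capacity builder: the backing array grows as needed. */
    static String builder(int n) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; ++i) {
            builder.append(i);
        }
        return builder.toString();
    }

    /** Pre-sized builder: no regrowth if capacity covers the final length. */
    static String sizedBuilder(int n, int capacity) {
        StringBuilder builder = new StringBuilder(capacity);
        for (int i = 0; i < n; ++i) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String expected = concat(100);
        System.out.println(expected.length());
        System.out.println(expected.equals(builder(100)));
        System.out.println(expected.equals(sizedBuilder(100, 190)));
    }
}
```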
APPLIES TO COLLECTIONS, TOO.
HashSet 16
HashMap 16
Hashtable 16
LinkedList 1
ArrayList 10
Vector 10
StringBuilder 16
StringBuffer 16
http://www.slideshare.net/cnbailey/memory-efficient-java
O(N)?
A  obj.invoke with 1 type (monomorphic)
B  obj.invoke with 2 types (bimorphic)
C  obj.invoke with 3 types (trimorphic)
(megamorphic)
07A
THERE IS A CLIFF
AT 3 TYPES.
https://shipilev.net/blog/2015/black-magic-method-dispatch/
func.apply(x);
<= 2 types: JIT inlines both targets behind type checks
if ( func.getClass() == Square.class ) {
  x * x;
} else if ( func.getClass() == Cube.class ) {
  x * x * x;
} else {
  …
}
> 2 types: call stays a virtual dispatch
func.apply(x);
@Setup
public void setup() {
for ( int i = 0; i < 20_000; ++i ) {
if ( morphism >= 1 ) func = new Square();
call();
if ( morphism >=2 ) func = new Cube();
call();
if ( morphism >= 3 ) …
call();
if ( morphism >= 4 ) …
call();
}
// regardless of morphism --
// use Square in the end
func = new Square();
}
@Benchmark
public int call() {
int x = nums[index];
index =
(index + 1) % nums.length;
return func.apply(x);
}
call 1 avgt 10 8.120 ± 0.103 ns/op
call 2 avgt 10 8.225 ± 0.113 ns/op
call 3 avgt 10 8.170 ± 0.329 ns/op
call 4 avgt 10 8.189 ± 0.241 ns/op
RESULTS
07A
(morphism)
NO CLIFF?
@Benchmark
public int callLoop() {
int sum = 0;
for ( int x: nums ) {
sum += call(x);
}
return sum;
}
public int call(int x) {
return func.apply(x);
}
07B
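A self-contained version of the call-site setup might look like the sketch below. `Square` and `Cube` follow the slides. The `Func` interface shape and the `Negate` type are assumptions added so that the single `func.apply(x)` call site sees more than two receiver types.

```java
final class DispatchDemo {
    interface Func {
        int apply(int x);
    }

    static final class Square implements Func {
        public int apply(int x) { return x * x; }
    }

    static final class Cube implements Func {
        public int apply(int x) { return x * x * x; }
    }

    // Hypothetical third type, used only to pollute the call-site profile.
    static final class Negate implements Func {
        public int apply(int x) { return -x; }
    }

    /** The single call site whose type profile the JIT records. */
    static int call(Func func, int x) {
        return func.apply(x);
    }

    static int callLoop(Func func, int[] nums) {
        int sum = 0;
        for (int x : nums) {
            sum += call(func, x);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] nums = {1, 2, 3};
        Func[] funcs = {new Square(), new Cube(), new Negate()};
        // Feed three receiver types through the same call site, as the setup loop does.
        for (int i = 0; i < 20_000; ++i) {
            callLoop(funcs[i % funcs.length], nums);
        }
        System.out.println(callLoop(new Square(), nums));
    }
}
```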
https://shipilev.net/blog/2015/black-magic-method-dispatch/
07B
RESULTS
callLoop 1 avgt 10 4079.187 ± 269.537 ns/op
callLoop 2 avgt 10 6090.224 ± 209.573 ns/op
callLoop 3 avgt 10 20508.673 ± 18484.645 ns/op
callLoop 4 avgt 10 20271.124 ± 17767.914 ns/op
(morphism)
PERFORMANCE
ISN’T ADDITIVE.
FAST + FAST = SLOW
PERFORMANCE IS FULL OF SURPRISES.
UNINTUITIVE
INTRICACIES
SENSITIVITIES
NON-OBVIOUS CLIFFS
NOT ADDITIVE
PERFORMANCE IS FULL OF SURPRISES.
DON’T WORRY *TOO* MUCH.
JUST WRITE CLEAN CODE.
ONLY WORRY ABOUT THE HOTTEST CODE,
IMPROVE AND MEASURE *CAREFULLY*.
REMEMBER BEST WAY TO ADD 1…1396
int x = 0;
x += 1; x += 2; x += 3; x += 4; x += 5; x += 6; x += 7; x += 8; x += 9; x += 10;
x += 11; x += 12; x += 13; x += 14; x += 15; x += 16; x += 17; x += 18; x += 19; x += 20;
x += 21; x += 22; x += 23; x += 24; x += 25; x += 26; x += 27; x += 28; x += 29; x += 30;
x += 31; x += 32; x += 33; x += 34; x += 35; x += 36; x += 37; x += 38; x += 39; x += 40;
x += 41; x += 42; x += 43; x += 44; x += 45; x += 46; x += 47; x += 48; x += 49; x += 50;
x += 51; x += 52; x += 53; x += 54; x += 55; x += 56; x += 57; x += 58; x += 59; x += 60;
x += 61; x += 62; x += 63; x += 64; x += 65; x += 66; x += 67; x += 68; x += 69; x += 70;
x += 71; x += 72; x += 73; x += 74; x += 75; x += 76; x += 77; x += 78; x += 79; x += 80;
x += 81; x += 82; x += 83; x += 84; x += 85; x += 86; x += 87; x += 88; x += 89; x += 90;
x += 91; x += 92; x += 93; x += 94; x += 95; x += 96; x += 97; x += 98; x += 99; x += 100;
x += 101; x += 102; x += 103; x += 104; x += 105; x += 106; x += 107; x += 108; x += 109; x += 110;
x += 111; x += 112; x += 113; x += 114; x += 115; x += 116; x += 117; x += 118; x += 119; x += 120;
x += 121; x += 122; x += 123; x += 124; x += 125; x += 126; x += 127; x += 128; x += 129; x += 130;
x += 131; x += 132; x += 133; x += 134; x += 135; x += 136; x += 137; x += 138; x += 139; x += 140;
x += 141; x += 142; x += 143; x += 144; x += 145; x += 146; x += 147; x += 148; x += 149; x += 150;
x += 151; x += 152; x += 153; x += 154; x += 155; x += 156; x += 157; x += 158; x += 159; x += 160;
x += 161; x += 162; x += 163; x += 164; x += 165; x += 166; x += 167; x += 168; x += 169; x += 170;
x += 171; x += 172; x += 173; x += 174; x += 175; x += 176; x += 177; x += 178; x += 179; x += 180;
x += 181; x += 182; x += 183; x += 184; x += 185; x += 186; x += 187; x += 188; x += 189; x += 190;
x += 191; x += 192; x += 193; x += 194; x += 195; x += 196; x += 197; x += 198; x += 199; x += 200;
x += 201; x += 202; x += 203; x += 204; x += 205; x += 206; x += 207; x += 208; x += 209; x += 210;
A
int n = 1395;
return n * (n + 1) / 2;
B
int n = 1396;
return n * (n + 1) / 2;
C
int n = 1397;
return n * (n + 1) / 2;
A  2.515 ± 0.043 ns/op
B  2.532 ± 0.089 ns/op
C  2.580 ± 0.042 ns/op
01C
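As a quick check that the closed form matches the loop, here is a small sketch with illustrative names. It uses `int` arithmetic as the slides do, so `n * (n + 1)` would overflow for n above 46340.

```java
final class SumTo {
    /** Sums 1..n with a loop, as in the earlier variant. */
    static int loopSum(int n) {
        int x = 0;
        for (int i = 1; i <= n; ++i) {
            x += i;
        }
        return x;
    }

    /** Gauss's closed form; overflows int for n above 46340. */
    static int closedForm(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1395, 1396, 1397}) {
            System.out.println(n + ": " + loopSum(n) + " == " + closedForm(n));
        }
    }
}
```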
REFERENCES
ALEKSEY SHIPILËV
http://shipilev.net/
PSYCHOMATIC LOBOTOMY SAW
http://psy-lob-saw.blogspot.com/
MECHANICAL SYMPATHY
http://mechanical-sympathy.blogspot.com/
JAVA SPECIALIST NEWSLETTER
http://www.javaspecialists.eu/
