Shooting the Rapids:
Getting the Best from Java 8
Streams
Kirk Pepperdine
@kcpeppe
Maurice Naftalin
@mauricenaftalin
About Kirk
• Specialises in performance tuning
• speaks frequently about performance
• author of performance tuning workshop
• Co-founder jClarity
• performance diagnostic tooling
• Java Champion (since 2006)
About Maurice
• Co-author / Author (book covers shown on the slide)
• Java Champion
• JavaOne Rock Star
Agenda
• Introduction
– lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy Of The Commons
• Justifying the Overhead
Example: Processing GC Logfile
⋮
2.869: Application time: 1.0001540 seconds
5.342: Application time: 0.0801231 seconds
8.382: Application time: 1.1013574 seconds
⋮
Goal: summarise the stopped-time values, e.g. sum=2.181635, or in full:
DoubleSummaryStatistics
{count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
Regex:
Application time: (\d+\.\d+)
Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");
⋮
Matcher matcher = stoppedTimePattern.matcher(logRecord);
String value = matcher.group(1);
Processing GC Logfile: Old School Code
Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");
String logRecord;
double value = 0;

while ((logRecord = logFileReader.readLine()) != null) {
    Matcher matcher = stoppedTimePattern.matcher(logRecord);
    if (matcher.find()) {
        value += Double.parseDouble(matcher.group(1));
    }
}
What is a Lambda?
Predicate<Matcher> matches = new Predicate<Matcher>() {
    @Override
    public boolean test(Matcher matcher) {
        return matcher.find();
    }
};
All the anonymous class really contributes is the parameter and the body of test(); keeping just those, joined by ->, gives the lambda form:
Predicate<Matcher> matches = matcher -> matcher.find();
A lambda is a function
from arguments to result
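As an aside (not on the slides), the same predicate can also be written as a method reference and used like any other Predicate; a minimal, invented sketch:

import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LambdaDemo {
    public static void main(String[] args) {
        Pattern stoppedTimePattern = Pattern.compile("Application time: (\\d+\\.\\d+)");
        Predicate<Matcher> matches = Matcher::find;   // equivalent to matcher -> matcher.find()

        Matcher matcher = stoppedTimePattern.matcher("2.869: Application time: 1.0001540 seconds");
        if (matches.test(matcher)) {
            System.out.println(matcher.group(1));     // prints 1.0001540
        }
    }
}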
Processing Logfile: Stream Code
DoubleSummaryStatistics summaryStatistics =
    logFileReader.lines()                                  // data source: start streaming
        .map(input -> stoppedTimePattern.matcher(input))   // map each line to a Matcher
        .filter(matcher -> matcher.find())                 // filter out uninteresting bits
        .map(matcher -> matcher.group(1))                  // extract group
        .mapToDouble(s -> Double.parseDouble(s))           // map String to Double
        .summaryStatistics();                              // aggregate results
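Putting it together, a minimal self-contained version of the pipeline (the file name "gc.log", the try-with-resources block, and the enclosing class are assumptions added for the example, not from the slides):

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.DoubleSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoppedTimeSummary {
    public static void main(String[] args) throws Exception {
        Pattern stoppedTimePattern =
                Pattern.compile("Application time: (\\d+\\.\\d+)");
        try (BufferedReader logFileReader = Files.newBufferedReader(Paths.get("gc.log"))) {
            DoubleSummaryStatistics summaryStatistics = logFileReader.lines()
                    .map(stoppedTimePattern::matcher)   // map each line to a Matcher
                    .filter(Matcher::find)              // keep only lines that match
                    .map(matcher -> matcher.group(1))   // extract the captured seconds value
                    .mapToDouble(Double::parseDouble)   // String -> double
                    .summaryStatistics();               // count / sum / min / average / max
            System.out.println(summaryStatistics);
        }
    }
}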
What is a Stream?
• A sequence of values
• Source and intermediate operations set the stream up lazily:
Stream<String> groupStream =
    logFileReader.lines()                       // source
        .map(stoppedTimePattern::matcher)       // intermediate
        .filter(Matcher::find)                  //   operations
        .map(matcher -> matcher.group(1));
• The terminal operation pulls the values down the stream:
DoubleSummaryStatistics statistics =
    logFileReader.lines()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();                   // terminal operation
Visualising Sequential Streams
[Animation, “Values in Motion”: values x0…x3 flow one at a time from the Source through the Map and Filter intermediate operations into the Reduction terminal operation; some values pass the filter (✔), others are dropped (❌)]
How Does That Perform?
Old School: 80200ms
Sequential: 25800ms
(>9m lines, MacBook Pro, Haswell i7, 4 cores, hyperthreaded)
Stream code is faster because operations are fused
Can We Do Better?
Parallel streams make use of multiple cores
• split the data into segments
• each segment processed by its own thread
- on its own core – if possible
Splitting the Data
Implemented by a Spliterator:
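A quick, invented demonstration of splitting, using an ArrayList (which splits well) as the source:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));

        Spliterator<Integer> right = list.spliterator();
        Spliterator<Integer> left  = right.trySplit();            // carves off roughly the first half

        left.forEachRemaining(i -> System.out.print(i + " "));    // 1 2 3 4
        System.out.println();
        right.forEachRemaining(i -> System.out.print(i + " "));   // 5 6 7 8
    }
}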
Visualizing Parallel Streams
[Animation: after splitting, the segments x0…x3 are processed concurrently, each flowing through its own map and filter before the partial results are combined; again some values pass the filter (✔) and others are dropped (❌)]
Stream Code
DoubleSummaryStatistics summaryStatistics =
    logFileReader.lines().parallel()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
Results of Going Parallel:
• No benefit from using parallel streams while streaming data
Agenda
• Introduction
– lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy Of The Commons
• Justifying the Overhead
Poorly Splitting Sources
• Some sources split much worse than others
– LinkedList vs. ArrayList
• Streaming I/O is bad
– kills the advantage of going parallel
Streaming I/O Bottleneck
[Diagram: the values x0…x3 all have to come through the single sequential I/O source, which feeds the parallel pipeline one value at a time and becomes the bottleneck]
LineSpliterator
[Diagram: the GC log records held in a MappedByteBuffer, with the current spliterator’s coverage marked; trySplit picks the midpoint (mid) of that coverage and advances it to the next newline, so the new spliterator’s coverage ends on a line boundary and the remaining coverage starts on one]
Included in JDK9 as FileChannelLinesSpliterator
LineSpliterator – results
StreamingIO: 56s
Spliterator: 88s
(>9m lines, MacBook Pro, Haswell i7, 4 cores, hyperthreaded)
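The spliterator itself is not shown on the slides; below is a much simplified sketch of the idea, not the real FileChannelLinesSpliterator. It assumes the whole file fits in one ByteBuffer, int offsets, and UTF-8 text with '\n' line endings:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Spliterator;
import java.util.function.Consumer;

// Illustrative sketch: a Spliterator over a ByteBuffer that splits at the newline
// nearest the midpoint, so each half covers only whole lines.
final class LineSpliterator implements Spliterator<String> {
    private final ByteBuffer buffer;
    private int position;        // next byte to read
    private final int limit;     // one past the last byte covered

    LineSpliterator(ByteBuffer buffer, int position, int limit) {
        this.buffer = buffer;
        this.position = position;
        this.limit = limit;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (position >= limit) return false;
        int start = position;
        int end = start;
        while (end < limit && buffer.get(end) != '\n') end++;     // find end of this line
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) bytes[i] = buffer.get(start + i);
        position = end + 1;                                       // skip past the newline
        action.accept(new String(bytes, StandardCharsets.UTF_8));
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        int mid = position + (limit - position) / 2;
        while (mid < limit && buffer.get(mid) != '\n') mid++;     // move split point to a line boundary
        if (mid >= limit - 1) return null;                        // too small to be worth splitting
        Spliterator<String> prefix = new LineSpliterator(buffer, position, mid + 1);
        position = mid + 1;                                       // this spliterator keeps the tail
        return prefix;
    }

    @Override
    public long estimateSize() { return limit - position; }       // bytes remaining, an estimate only

    @Override
    public int characteristics() { return ORDERED | NONNULL | IMMUTABLE; }
}

It could be driven with something like StreamSupport.stream(new LineSpliterator(mappedBuffer, 0, size), true) after mapping the file into memory; the JDK9 spliterator handles details this sketch ignores, such as files larger than a single buffer, \r\n line endings, and character sets.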
When to Use Parallel Streams?
• Task must be recursively decomposable
– subtasks for each data segment must be independent
• Source must be well-splitting
• Enough hardware to support all VM needs
– there may be other business afoot
• Overhead of splitting must be justified
– intermediate operations need to be expensive
– and CPU-bound
http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
Agenda
• Introduction
– lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy Of The Commons
• Justifying the Overhead
Tragedy of the Commons
You have a finite amount of hardware
– it might be in your best interest to grab it all
– but if everyone behaves the same way…
Agenda
• Introduction
– lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy Of The Commons
• Justifying the Overhead
Justifying the Overhead
CPNQ performance model:
C - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
Justifying the Overhead
Need to amortize setup costs
– N*Q needs to be large
– Q can often only be estimated
– N often should be >10,000 elements
If P is the number of processors, the formula assumes that
intermediate tasks are CPU bound
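As a rough illustration (the numbers are invented): if Q is around 1 µs per element, then N = 1,000 gives only about 1 ms of useful work, which the cost of splitting, task handoff and result combination will swamp; N = 10,000,000 at the same Q gives roughly 10 s of work, which is worth spreading across cores.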
Don’t Have Too Many Threads!
• Too many threads cause frequent handoffs
• It costs ~80,000 cycles to hand off data between threads
• You can do a lot of processing in 80,000 cycles!
Fork/Join
• Parallel streams implemented by Fork/Join framework
• added in Java 7, but difficult to code
• parallel streams are more usable
• Each segment of data is submitted as a ForkJoinTask
• ForkJoinTask.invoke() spawns a new task
• ForkJoinTask.join() retrieves the result
• How Fork/Join works and performs is important to your
latency picture
Common Fork/Join Pool
Fork/Join by default uses a common thread pool
- default number of worker threads == number of logical cores - 1
- (submitting thread is pressed into service)
- can configure the pool via system properties:
    java.util.concurrent.ForkJoinPool.common.parallelism
    java.util.concurrent.ForkJoinPool.common.threadFactory
    java.util.concurrent.ForkJoinPool.common.exceptionHandler
- or create our own pool…
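For example, a minimal sketch of configuring the parallelism (the property is read when the common pool is first initialised, so it must be set very early, or passed on the command line):

// Equivalent command-line form:
//   -Djava.util.concurrent.ForkJoinPool.common.parallelism=4
public class CommonPoolConfig {
    public static void main(String[] args) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");

        // The common pool is created lazily on first use; this reads the setting back.
        System.out.println(java.util.concurrent.ForkJoinPool.commonPool().getParallelism()); // 4
    }
}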
Custom Fork/Join Pool
When used inside a ForkJoinPool, the ForkJoinTask.fork()
method uses the current pool:
ForkJoinPool ourOwnPool = new ForkJoinPool(10);
ourOwnPool.invoke(
() -> stream.parallel().
⋮
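A completed sketch of that idea, using submit(...).get() since ForkJoinPool.invoke expects a ForkJoinTask rather than a lambda; the pool size of 10 and the summing pipeline are placeholders for illustration, and running a parallel stream inside a custom pool this way is a common trick rather than documented behaviour:

import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CustomPoolDemo {
    public static void main(String[] args) throws Exception {
        ForkJoinPool ourOwnPool = new ForkJoinPool(10);
        try {
            // The terminal operation runs inside our pool, so the parallel stream's
            // ForkJoinTasks are forked into it instead of the common pool.
            long sum = ourOwnPool.submit(
                    () -> LongStream.rangeClosed(1, 1_000_000).parallel().sum()
            ).get();
            System.out.println(sum);
        } finally {
            ourOwnPool.shutdown();
        }
    }
}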
Don’t Have Too Few Threads!
• Fork/Join pool uses a work queue
• If tasks are CPU bound, no use increasing the size of the
thread pool
• But if not CPU bound, they are sitting in queue
accumulating dead time
• Can make thread pool bigger to reduce dead time
• Little’s Law tells us:
Number of tasks in the system = Arrival rate * Average service time
Little’s Law Example
System receives 400 transactions per second and it takes 100 ms to clear a request
- Number of tasks in system = 400 * 0.100 = 40
On an 8-core machine with a CPU-bound task
- implies 32 tasks are sitting in the queue accumulating dead time
- average response time is 600 ms, of which 500 ms is dead time
- ~83% of the response time is spent waiting
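To make the arithmetic explicit, a small sketch that just replays the numbers above (the class and variable names are invented for the example):

public class LittlesLawExample {
    public static void main(String[] args) {
        double arrivalRatePerSec = 400.0;   // 400 requests arrive per second
        double serviceTimeSec    = 0.100;   // each takes 100 ms of CPU to clear
        int    cores             = 8;

        double tasksInSystem = arrivalRatePerSec * serviceTimeSec;        // Little's Law: 40
        double tasksQueued   = tasksInSystem - cores;                     // 32 waiting for a core
        double deadTimeSec   = (tasksInSystem / cores) * serviceTimeSec;  // ~0.5 s spent queued
        double responseSec   = deadTimeSec + serviceTimeSec;              // ~0.6 s end to end

        System.out.println("tasks queued: " + tasksQueued);
        System.out.printf("waiting fraction = %.0f%%%n", 100 * deadTimeSec / responseSec); // ~83%
    }
}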
ForkJoinPool Observability
ForkJoinPool comes with no visibility
- need to instrument ForkJoinTask.invoke()
- gather data from ForkJoinPool to feed into Little’s Law
For example (pseudocode: getMonitor() is not a JDK API; it marks where the instrumentation hooks would go):
public final V invoke() {
    ForkJoinPool.common.getMonitor().submitTask(this);     // record task submission
    int s;
    if ((s = doInvoke() & DONE_MASK) != NORMAL) reportException(s);
    ForkJoinPool.common.getMonitor().retireTask(this);      // record task completion
    return getRawResult();
}
Conclusions
Sequential stream performance comparable to imperative code
Going parallel is worthwhile IF
- task is suitable
- data source is suitable
- environment is suitable
Need to monitor the JDK to understand bottlenecks
- Fork/Join pool is not well instrumented
Questions?