Scala parallel-collections

Parallel Collections
with Scala

Jul 6, 2012 > Vikas Hazrati > vikas@knoldus.com > @vhazrati

Motivation

Multiple-cores

Popular Parallel Programming remains a formidable challenge.

Implicit Parallelism

scala> val list = (1 to 10000).toList

scala> list.map(_ + 42)

scala> list.par.map(_ + 42)

scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)

scala> res0.map(println(_))
1
2
3
4
5
res1: List[Unit] = List((), (), (), (), ())

scala> res0.par.map(println(_))
3
1
4
2
5
res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())

ParArray

ParVector

mutable.ParHashMap

mutable.ParHashSet

immutable.ParHashMap

immutable.ParHashSet

ParRange

ParTrieMap (collection.concurrent.TrieMap is new in 2.10)

Caution: performance benefits become visible only at around several thousand elements in the collection

Depends on
Machine architecture
JVM vendor and version
Per-element workload
Specific collection – ParArray, ParTrieMap
Specific operation – transformer (filter), accessor (foreach)
Memory management

map, fold and filter
scala> val parArray = (1 to 1000000).toArray.par
scala> parArray.fold(0)(_+_)
res3: Int = 1784293664

scala> val narArray = (1 to 1000000).toArray

scala> narArray.fold(0)(_+_)
res5: Int = 1784293664

I did not notice a difference on my laptop.

scala> parArray.fold(0)(_+_)
res6: Int = 1784293664

creating a parallel collection

With a new
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]

Taking a sequential collection and converting it
val pv = Vector(1,2,3,4,5,6,7,8,9).par

Parallel collections can be converted back to sequential collections with seq
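
A minimal sketch of the round trip from sequential to parallel and back. It assumes Scala 2.12 or earlier, where .par is built in; on 2.13+ the parallel collections live in the separate scala-parallel-collections module and need the commented-out import. The object and method names (RoundTrip, doubledViaPar) are hypothetical, chosen only for this illustration.

```scala
// Needed on Scala 2.13+ only (with the scala-parallel-collections module):
// import scala.collection.parallel.CollectionConverters._

object RoundTrip {
  // Hypothetical helper: doubles every element in parallel, then converts
  // the parallel result back to a sequential Vector with seq.
  def doubledViaPar(v: Vector[Int]): Vector[Int] = {
    val pv = v.par               // sequential -> parallel (ParVector)
    val doubled = pv.map(_ * 2)  // runs on the parallel collection
    doubled.seq                  // parallel -> sequential
  }

  def main(args: Array[String]): Unit = {
    val v = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9)
    // map keeps element order, so the result matches the sequential map
    println(doubledViaPar(v) == v.map(_ * 2)) // true
  }
}
```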

Some collections (lists, queues, streams) are inherently sequential

They are converted to parallel by copying their elements into a similar parallel collection

An example is List: it’s converted into a standard immutable parallel
sequence, which is a ParVector.
Overhead!

Array, Vector, HashMap do not have this overhead

how does it work?

Map reduce ?

by recursively “splitting” a given collection, applying an operation on each partition
of the collection in parallel, and re-“combining” all of the results that were completed
in parallel.
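
The split / apply / combine idea can be sketched with plain Futures. This is only an illustration of the principle, not how the library implements it: parSum and its threshold parameter are hypothetical names, and the real parallel collections use their own splitters, combiners and task scheduling.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitCombineSketch {
  // Hypothetical helper: recursively splits the vector in half until a piece
  // has at most `threshold` elements, sums each piece in its own Future,
  // then combines the partial sums.
  def parSum(xs: Vector[Int], threshold: Int = 1000): Future[Long] =
    if (xs.length <= threshold) Future(xs.foldLeft(0L)(_ + _))
    else {
      val (left, right) = xs.splitAt(xs.length / 2) // "split"
      val l = parSum(left, threshold)               // both halves are started
      val r = parSum(right, threshold)              // before either is awaited
      for (a <- l; b <- r) yield a + b              // "combine"
    }

  def main(args: Array[String]): Unit = {
    val total = Await.result(parSum((1 to 1000000).toVector), 10.seconds)
    println(total) // 500000500000
  }
}
```

The partial sums are accumulated as Long, so the total does not wrap around the way an Int sum of the same range does.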

Side-effecting operations
Non-associative operations

side-effecting operation

scala> var sum = 0
sum: Int = 0

scala> val list = (1 to 1000).toList.par

scala> list.foreach(sum += _); sum
res7: Int = 452474

scala> var sum = 0
sum: Int = 0

scala> list.foreach(sum += _); sum
res8: Int = 497761

scala> var sum = 0
sum: Int = 0

scala> list.foreach(sum += _); sum
res9: Int = 422508
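
The varying totals above come from several threads updating the same var without synchronization. Two ways to get a stable result are sketched below: let the collection do the aggregation, or use an atomic counter if a side effect is really required. The sketch assumes Scala 2.12 or earlier (on 2.13+ add the scala-parallel-collections module and the commented-out import); SafeSum, viaSum and viaAtomic are hypothetical names.

```scala
// Needed on Scala 2.13+ only (with the scala-parallel-collections module):
// import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.atomic.AtomicInteger

object SafeSum {
  // No shared mutable state: the collection combines partial sums itself.
  def viaSum(n: Int): Int = (1 to n).toList.par.sum

  // Shared state, but every update is atomic, so no increments are lost.
  def viaAtomic(n: Int): Int = {
    val counter = new AtomicInteger(0)
    (1 to n).toList.par.foreach(x => counter.addAndGet(x))
    counter.get
  }

  def main(args: Array[String]): Unit = {
    println(viaSum(1000))    // 500500
    println(viaAtomic(1000)) // 500500
  }
}
```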

non-associative operations

The order in which the function is applied to the elements of the collection can
be arbitrary

scala> val list = (1 to 1000).toList.par

scala> list.reduce(_-_)
res01: Int = -228888

scala> list.reduce(_-_)
res02: Int = -61000

scala> list.reduce(_-_)
res03: Int = -331818
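
Why subtraction misbehaves can be seen without any parallelism: regrouping the operands changes the answer. The sketch below imitates a split-and-combine evaluation on a sequential list; chunkedReduce is a hypothetical helper, not a library method.

```scala
object GroupingSketch {
  // Hypothetical helper: reduces each chunk on its own, then reduces the
  // partial results, mimicking one possible split-and-combine grouping.
  def chunkedReduce(xs: List[Int], chunk: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunk).map(_.reduce(op)).reduce(op)

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)
    println(xs.reduce(_ - _))            // -8: ((1 - 2) - 3) - 4
    println(chunkedReduce(xs, 2)(_ - _)) // 0:  (1 - 2) - (3 - 4)
    println(chunkedReduce(xs, 2)(_ + _)) // 10: addition is associative
  }
}
```

With an associative operator such as +, every grouping gives the same result, which is why it is safe to use with a parallel reduce.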

associative but non-commutative

scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)

scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
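
String concatenation illustrates the distinction: regrouping the operands (associativity) leaves the result unchanged, while swapping them (commutativity) does not. The sketch below shows this on a plain sequential list; ConcatGrouping and its method names are hypothetical.

```scala
object ConcatGrouping {
  val strings = List("abc", "def", "ghi", "jk")

  // ((abc + def) + ghi) + jk
  def leftToRight: String = strings.reduce(_ + _)

  // (abc + def) + (ghi + jk): different grouping, same operand order
  def pairwise: String = strings.grouped(2).map(_.reduce(_ + _)).reduce(_ + _)

  // Operands swapped at every step: a different result
  def swapped: String = strings.reduce((a, b) => b + a)

  def main(args: Array[String]): Unit = {
    println(leftToRight) // abcdefghijk
    println(pairwise)    // abcdefghijk
    println(swapped)     // jkghidefabc
  }
}
```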

out of order?

Operations may be out of order

BUT

Recombination of results would be in order

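[Figure: a collection split into parts A, B and C; the parts may be processed in any order, but the results are recombined as A B C]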

performance

In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where
the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node;
instead, its position in the tree defines the key with which it is associated.
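
To make the definition concrete, here is a minimal immutable trie keyed by strings. It only illustrates the quoted definition; the Trie class and its methods are hypothetical, and this is not how collection.concurrent.TrieMap is implemented (that is a concurrent hash trie).

```scala
object TrieSketch {
  // Hypothetical minimal trie: a node stores no key of its own; the path of
  // characters from the root to the node is the key.
  final case class Trie[V](
      value: Option[V] = None,
      children: Map[Char, Trie[V]] = Map.empty[Char, Trie[V]]) {

    // Returns a new trie with `key` mapped to `v`.
    def updated(key: String, v: V): Trie[V] =
      if (key.isEmpty) copy(value = Some(v))
      else {
        val child = children.getOrElse(key.head, Trie[V]())
        copy(children = children.updated(key.head, child.updated(key.tail, v)))
      }

    // Follows the characters of `key` down the tree.
    def get(key: String): Option[V] =
      if (key.isEmpty) value
      else children.get(key.head).flatMap(_.get(key.tail))
  }

  def main(args: Array[String]): Unit = {
    val t = Trie[Int]().updated("tea", 1).updated("ten", 2)
    println(t.get("tea")) // Some(1)
    println(t.get("ten")) // Some(2)
    println(t.get("te"))  // None: the node exists but holds no value
  }
}
```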

conversions

List is converted to ParVector

Converting parallel to sequential takes constant time

architecture

splitters
Split the collection into non-trivial partitions so that they can be accessed in sequence

combiners
Is a Builder. Combines split lists together.

brickbats

Absence of configuration

Not all algorithms are parallel friendly

unproven

Now, if you want your code to not care whether it receives a
parallel or sequential collection, you should use the types prefixed with
Gen: GenTraversable, GenIterable, GenSeq, etc.
These can be either parallel or sequential.
