Parallel Collections with Scala

Jul 6, 2012 > Vikas Hazrati > vikas@knoldus.com > @vhazrati

Motivation

Multiple-cores

Popular Parallel Programming remains a formidable challenge.

Implicit Parallelism

scala> val list = (1 to 10000).toList

scala> list.map(_ + 42)

scala> list.par.map(_ + 42)

scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)

scala> res0.map(println(_))
1
2
3
4
5
res1: List[Unit] = List((), (), (), (), ())

scala> res0.par.map(println(_))
3
1
4
2
5
res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())

ParArray
ParVector
mutable.ParHashMap
mutable.ParHashSet
immutable.ParHashMap
immutable.ParHashSet
ParRange
ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)

Caution: Performance benefits become visible only at around several thousand elements in the collection.

Depends on:
    Machine architecture
    JVM vendor and version
    Per-element workload
    Specific collection – ParArray, ParTrieMap
    Specific operation – transformer (filter), accessor (foreach)
    Memory management

map, fold and filter
scala> val parArray = (1 to 1000000).toArray.par

scala> parArray.fold(0)(_+_)
res3: Int = 1784293664

scala> val narArray = (1 to 1000000).toArray

scala> narArray.fold(0)(_+_)
res5: Int = 1784293664

scala> parArray.fold(0)(_+_)
res6: Int = 1784293664

I did not notice a difference on my laptop.
(The true sum, 500000500000, overflows Int, which is why both versions print 1784293664.)

creating a parallel collection

With a new:

    import scala.collection.parallel.immutable.ParVector
    val pv = new ParVector[Int]

Taking a sequential collection and converting it:

    val pv = Vector(1,2,3,4,5,6,7,8,9).par

Parallel collections can be converted back to sequential collections with seq.

Some collections are inherently sequential

They are converted to parallel by copying their elements into a similar parallel collection.

An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector. The copy is overhead!

Array, Vector and HashMap do not have this overhead.
how does it work?

     Map reduce ?

 by recursively “splitting” a given collection, applying an operation on each partition
of the collection in parallel, and re-“combining” all of the results that were completed
in parallel.
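
The split, apply and combine steps can be sketched with plain Futures from the standard library. This is only an illustration of the idea, not how the parallel collections library is actually implemented; the names SplitCombineSketch, mapSum and Threshold are made up for this example, and the cutoff of 1000 is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SplitCombineSketch {
  // Hypothetical cutoff: partitions at or below this size are processed sequentially.
  val Threshold = 1000

  // Recursively split `xs`, map-and-sum each leaf partition in its own Future,
  // then combine the partial sums on the way back up.
  def mapSum(xs: Vector[Int])(f: Int => Int): Future[Int] =
    if (xs.length <= Threshold) Future(xs.map(f).sum)
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      val leftSum = mapSum(left)(f)   // both halves are started before either is combined
      val rightSum = mapSum(right)(f)
      for (l <- leftSum; r <- rightSum) yield l + r
    }

  // Block once, at the top level, for the final combined result.
  def run(xs: Vector[Int])(f: Int => Int): Int =
    Await.result(mapSum(xs)(f), Duration.Inf)
}
```

For example, SplitCombineSketch.run((1 to 10000).toVector)(_ + 42) gives the same total as the sequential (1 to 10000).map(_ + 42).sum, because addition is associative and the partial sums can be combined in any grouping.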




Two things to watch for: side-effecting operations and non-associative operations.
side-effecting operation

scala> var sum = 0
sum: Int = 0

scala> val list = (1 to 1000).toList.par

scala> list.foreach(sum += _); sum
res7: Int = 452474

scala> var sum =0
sum: Int = 0

scala> list.foreach(sum += _); sum
res8: Int = 497761

scala> var sum =0
sum: Int = 0

scala> list.foreach(sum += _); sum
res9: Int = 422508
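
The three different results come from several threads updating the same var without synchronization. One possible way around it, sketched here with standard-library Futures rather than a parallel collection, is to make the shared update atomic. SafeSum, atomicSum and the chunk size of 100 are assumptions made for this example.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SafeSum {
  // Add every element of `xs` to one shared counter from several concurrent tasks.
  // AtomicInteger.addAndGet is an atomic read-modify-write, unlike `sum += _` on a var.
  def atomicSum(xs: Seq[Int], chunkSize: Int = 100): Int = {
    val total = new AtomicInteger(0)
    val tasks = xs.grouped(chunkSize).toList.map { chunk =>
      Future(chunk.foreach(x => total.addAndGet(x)))
    }
    // Wait for every task to finish before reading the counter.
    Await.result(Future.sequence(tasks), Duration.Inf)
    total.get
  }
}
```

With this, SafeSum.atomicSum(1 to 1000) is 500500 on every run. On a parallel collection itself, the simpler fix is usually to drop the shared state altogether and use a reduction such as list.sum or list.reduce(_ + _).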
non-associative operations

The order in which the function is applied to the elements of the collection can be arbitrary.
                scala> val list = (1 to 1000).toList.par

                scala> list.reduce(_-_)

                res01: Int = -228888

                scala> list.reduce(_-_)

                res02: Int = -61000

                scala> list.reduce(_-_)

                res03: Int = -331818
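
Why subtraction misbehaves can be seen without any parallelism at all: the result depends on how the operands are grouped. The sketch below compares two fixed groupings on a sequential List; the object and value names are made up for this example.

```scala
object AssociativityCheck {
  val xs: List[Int] = (1 to 1000).toList

  // ((1 - 2) - 3) - ... - 1000 : grouping from the left
  val leftGrouped: Int = xs.reduceLeft(_ - _)

  // 1 - (2 - (3 - ... - 1000)) : grouping from the right
  val rightGrouped: Int = xs.reduceRight(_ - _)

  // Addition is associative, so both groupings agree.
  val sumsAgree: Boolean = xs.reduceLeft(_ + _) == xs.reduceRight(_ + _)
}
```

Here leftGrouped is -500498 and rightGrouped is -500. A parallel reduce picks its grouping at run time depending on how the collection happens to be split, so with a non-associative function each run may produce yet another value.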
associative but non-commutative

scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)



scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
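
The reason this works can be imitated sequentially: reduce each partition on its own, then combine the partial results in their original left-to-right order. The sketch below is an assumption-laden stand-in (fixed-size chunks via grouped, names invented for this example) and not the library's actual splitting strategy.

```scala
object ConcatOrder {
  val strings: Vector[String] =
    Vector("abc", "def", "ghi", "jk", "lmnop", "qrs", "tuv", "wx", "yz")

  // Reduce each partition independently (as parallel workers might),
  // then combine the partial results in their original left-to-right order.
  def chunkedConcat(chunkSize: Int): String =
    strings.grouped(chunkSize).map(_.reduce(_ + _)).reduce(_ + _)

  // Combining the same partial results in reverse order shows why position
  // must be preserved: string concatenation is not commutative.
  def reversedCombine(chunkSize: Int): String =
    strings.grouped(chunkSize).map(_.reduce(_ + _)).toVector.reverse.reduce(_ + _)
}
```

ConcatOrder.chunkedConcat gives "abcdefghijklmnopqrstuvwxyz" for any chunk size, while ConcatOrder.reversedCombine(3) gives "tuvwxyzjklmnopqrsabcdefghi". Associativity makes the grouping irrelevant; keeping the partitions in order takes care of the rest.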
out of order?

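Operations may be out of order

BUT

Recombination of results would be in order

[Diagram: a collection is split into partitions A, B, C; the partitions may be processed in the order C, A, B; the results are recombined as A, B, C.]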
performance




In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
conversions


List is converted to Vector.

Converting parallel to sequential takes constant time.
architecture



splitters
    Split the collection into non-trivial partitions so that they can be accessed in sequence.

combiners
    Is a Builder. Combines split lists together.
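
A rough picture of how the two roles fit together is sketched below. SliceSplitter and VectorCombiner are hypothetical, heavily simplified stand-ins invented for this example; they are not the library's actual Splitter and Combiner traits, and the two halves are processed one after the other here, whereas a real implementation would hand them to parallel tasks.

```scala
object SplitterCombinerSketch {
  // Hypothetical splitter over the slice [from, until) of a Vector.
  final case class SliceSplitter[A](data: Vector[A], from: Int, until: Int) {
    def remaining: Int = until - from

    // Split into two disjoint halves that together cover the same elements.
    def split: (SliceSplitter[A], SliceSplitter[A]) = {
      val mid = from + remaining / 2
      (copy(until = mid), copy(from = mid))
    }

    def elements: Vector[A] = data.slice(from, until)
  }

  // Hypothetical combiner: a builder whose partial results can be merged in order.
  final case class VectorCombiner[A](result: Vector[A]) {
    def combine(that: VectorCombiner[A]): VectorCombiner[A] =
      VectorCombiner(result ++ that.result)
  }

  // A filter written in split/combine style: split until the partition is small,
  // filter each leaf, then merge the leaf results left to right.
  def filter[A](s: SliceSplitter[A], threshold: Int)(p: A => Boolean): VectorCombiner[A] = {
    require(threshold >= 1, "threshold must be at least 1")
    if (s.remaining <= threshold) VectorCombiner(s.elements.filter(p))
    else {
      val (left, right) = s.split
      filter(left, threshold)(p).combine(filter(right, threshold)(p))
    }
  }
}
```

Because the left result is always combined before the right one, the output keeps the original element order, for example when filtering the even numbers out of (1 to 100).toVector with a threshold of 8.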
brickbats

Absence of configuration

Not all algorithms are parallel friendly

Unproven

If you want your code not to care whether it receives a parallel or a sequential collection, declare the collection with a Gen-prefixed type: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.
