Stefan Marr, Daniele Bonetta
2016
Seminar on Parallel and Concurrent Programming
Agenda
1. Modus Operandi
2. Introduction to Concurrent Programming Models
3. Seminar Paper Overview
MODUS OPERANDI
Tasks and Deadlines
• Talk on selected paper (student 1)
  – 30 min with slides (+ 15 min discussion)
    • To be discussed with us 1 week before
  – Summary (max. 500 words)
    • Due 2 days before the seminar, 11:59am
• Questions on assigned paper (student 2)
  – Min. 5 questions
  – Due 2 days before the seminar, 11:59am
Report
Category 1: Theoretical treatment
• Focus on the paper, related work, and the state of the art of the field
• Detailed discussion
Category 2: Practical treatment of the topic, for instance
• Reproduce experiments/results
• Extend experiments
• Experiment with variations
Report
• Paper summary (500 words)
• Outline, content, and experiments to be discussed with us
• Cat. 1: ca. 4000 words (excl. references)
– State of the art, context in the field, and the specific technique from the paper
• Cat. 2: ca. 2000 words (excl. references)
– Discuss experiments, gained insights, discovered limitations, etc.
Deadline: Feb. 6th
Consultations
• For alternative paper proposals
• To prepare the presentation
• To agree on the focus of the report/experiments
– Mandatory for experiments
Grading
• Required attendance: 80% of all meetings
• 50% slides, presentation, and discussion
• 50% write-up/experiments
Timeline
Oct. 5th: Introduction to Concurrent Programming Models
Oct. 10th: Deadline for the ranked list of papers
Oct. 12th: Runtime Techniques for Big Data and Parallelism
Weeks 3-5: Preparations and Consultations
Weeks 6-12: Presentations
Feb. 6th: Deadline for Report
Got Background in Concurrency/Parallelism?
Show of Hands!
Multicore is the Norm
• 8 cores: 200-euro phones
• 24 cores: workstations
• ≥72 cores: embedded systems
Problem: Power Wall at ca. 5 GHz
CPUs Don't Get Faster, But They Multiply
[Figure: clock frequency of Intel processors in GHz, 1990-2015. Single-core frequencies climb (0.2, 1.5, 3.8 GHz), then stay at about 3.33-3.8 GHz while processors move from 1 core to 4, 6, 12, … cores]
$\text{Power} \approx \text{Voltage}^2 \times \text{Frequency}$
Voltage = -15%
Frequency = -15%
Power = 1
Performance ≈ 1.8
Problem: Memory Wall
Memory Wall
[Figure: relative performance (log scale, 1 to 10,000) of CPU frequency vs. DRAM speeds, 1980-2005, showing a widening gap]
Source: Sun World Wide Analyst Conference Feb. 25, 2003
Multicore Transition
Work around physical limitations
Power Wall and Memory Wall
10/5/2016 17
For a brief bit of history:
ENIAC’s recessive gene
Marcus, M. and Akera, A. (March 1996), Penn Printout.
http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf
ENIAC's main control panel, U. S. Army Photo
Decades of Research and Solutions for Everything
… But No Silver Bullet
• CSP
• Locks, Monitors, …
• Fork/Join
• Transactional Memory
• Data Flow
• Actors
A Rough Categorization
[Diagram: the four categories Threads and Locks, Coordinating Threads, Communicating Isolates, and Data Parallelism]
Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel.
THREADS AND LOCKS
Powerful but hard
Uniform Shared Memory
A Model for the Machines We Used to Have
C/C++
Threads
• Sequences of instructions
• Unit of scheduling
– Preemptive and concurrent
– Or parallel
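A minimal Java sketch of the points above, assuming plain `java.lang.Thread`: each thread runs its own sequence of instructions, and the scheduler decides when each one runs, interleaved on one core or in parallel on several. The names `ThreadsDemo` and `runWorkers` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadsDemo {
    // Starts `count` threads. Each runs its own sequence of instructions
    // (here: recording its id in a synchronized list) and is scheduled
    // independently by the JVM and the operating system.
    public static List<Integer> runWorkers(int count) {
        List<Integer> finished = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            Thread t = new Thread(() -> finished.add(id));
            threads.add(t);
            t.start(); // from here on, t runs concurrently with this thread
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until t has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        // The order of the ids can differ from run to run: it depends on scheduling.
        System.out.println(runWorkers(4));
    }
}
```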
A Snake Game
• Multiple players
• Compete for ‘apples’
• Shared board
Race Conditions and Data Races
Race Condition
• Result depends on the timing of operations
Data Race
• Race condition on memory
• Synchronization absent or incomplete
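The difference can be made concrete with a small, hypothetical Java counter standing in for the snake board: `unsafeCount++` is a read-modify-write without synchronization, so concurrent increments can be lost (a data race), while `safeCount` is only updated while holding a lock. How many updates get lost varies per run and per machine.

```java
public class RaceDemo {
    private static int unsafeCount; // shared, no synchronization: data race
    private static int safeCount;   // shared, only updated while holding LOCK
    private static final Object LOCK = new Object();

    // Runs `threads` threads that each increment both counters `perThread` times.
    // Returns {unsafeCount, safeCount}.
    public static int[] run(int threads, int perThread) {
        unsafeCount = 0;
        safeCount = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    unsafeCount++;        // racy: two threads may read the same old value
                    synchronized (LOCK) { // mutual exclusion makes this increment atomic
                        safeCount++;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { unsafeCount, safeCount };
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        // safeCount is always 400000; unsafeCount is often smaller.
        System.out.println("unsafe=" + counts[0] + " safe=" + counts[1]);
    }
}
```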
Locks
synchronized (board) {
  board.moveLeft(snake);
}
Optimized Locking for more Parallelism
synchronized (board[3][3]) {
  synchronized (board[3][2]) {
    board.moveLeft(snake);
  }
}
Strategy: Lock only the cells you need to update
What could go wrong?
Common Issues
• Lack of Progress
– Deadlock
– Livelock
• Race Condition
– Data race
– Atomicity violation
• Performance
– Sequential bottlenecks
– False sharing
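One common remedy for the deadlock risk of nested cell locks is a global lock order. The Java sketch below assumes a hypothetical `Cell` class whose numeric `id` defines that order; `swap(a, b)` always locks the lower id first, so two threads swapping the same pair in opposite argument order cannot each hold one lock while waiting for the other. It addresses deadlock only, not the other issues listed.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    // Hypothetical board cell; `id` defines the global lock order.
    static final class Cell {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        int occupant; // 0 = empty, otherwise a snake id

        Cell(int id) { this.id = id; }
    }

    // Swaps the occupants of two cells while holding both cell locks.
    // Locks are always acquired in ascending id order, regardless of the
    // argument order, which rules out lock-order deadlocks between swaps.
    static void swap(Cell a, Cell b) {
        Cell first = a.id < b.id ? a : b;
        Cell second = a.id < b.id ? b : a;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                int tmp = a.occupant;
                a.occupant = b.occupant;
                b.occupant = tmp;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads swap the same pair of cells in opposite argument order.
    // Returns true if both finished within the timeout and no snake was lost.
    public static boolean runOppositeSwaps(int iterations) {
        Cell x = new Cell(1);
        Cell y = new Cell(2);
        x.occupant = 7;
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) swap(x, y); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) swap(y, x); });
        t1.setDaemon(true); // do not keep the JVM alive if something goes wrong
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(5_000);
            t2.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (t1.isAlive() || t2.isAlive()) return false;
        return x.occupant + y.occupant == 7;
    }

    public static void main(String[] args) {
        System.out.println(runOppositeSwaps(100_000));
    }
}
```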
Basic Concepts
Shared Memory with Threads and Locks
• Threads
• Synchronization
• No safety guarantees
– Data Races
– Deadlocks
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al.
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al.
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al.
Questions?
COORDINATING THREADS
Making Coordination Explicit
Communicating Threads
Shared Memory with Explicit Coordination
Raising the Abstraction Level
Libraries for most languages
Two Main Variants
• Temporal isolation: Transactional Memory
• Explicit communication: channel- or message-based
Transactional Memory
atomic {
  board.moveLeft(snake);
}
Coordinated by the Runtime System
Transactional Memory
Simple Programming Model
• No data races (within transactions)
• No deadlocks
Issues
• Performance overhead
• Still experimental
• Livelocks
• Inter-transactional race conditions
• I/O semantics
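Java has no built-in transactional memory, so the following is only an analogy for the optimistic read, compute, validate-and-commit, retry cycle that a TM runtime manages: a compare-and-set loop over a single `AtomicReference` holding an immutable snapshot. Real TM systems track many memory locations per transaction; the `Position` class and the method names here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticUpdate {
    // Immutable snapshot of a hypothetical snake head position.
    static final class Position {
        final int x, y;
        Position(int x, int y) { this.x = x; this.y = y; }
    }

    // One optimistic "transaction": read a snapshot, compute the new state
    // without holding locks, then try to commit it atomically. If another
    // thread committed in between, compareAndSet fails and the work is redone.
    static void moveLeft(AtomicReference<Position> head) {
        while (true) {
            Position old = head.get();
            Position updated = new Position(old.x - 1, old.y);
            if (head.compareAndSet(old, updated)) {
                return; // commit succeeded
            }
            // conflict detected: retry (under heavy contention, retries pile up)
        }
    }

    // Runs `threads` threads that each call moveLeft `perThread` times and
    // returns the final x coordinate (it starts at 0).
    public static int run(int threads, int perThread) {
        AtomicReference<Position> head = new AtomicReference<>(new Position(0, 0));
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) moveLeft(head);
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return head.get().x;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // -40000
    }
}
```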
Some Issues
atomic {
  dataArray = getData();
  fork { compute(dataArray[0]); }
  compute(dataArray[1]);
}
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al.
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al.
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al.
What happens with the forked thread when the transaction aborts?
Channel-based Communication
Player Thread (send):
coordChannel ! (#moveLeft, snake)
Coordinator Thread (receive):
for i in players():
  coordChannels[i] ? msg
  match msg:
    (#moveLeft, snake):
      board[…,…] = …
[Diagram: several player threads send messages to one coordinator thread]
High-level communication, but no safety guarantees
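A rough Java rendering of the channel pseudocode, assuming `java.util.concurrent.SynchronousQueue` as an unbuffered, rendezvous-style channel similar to CSP (`put` corresponds to `!`, `take` to `?`). One channel per player mirrors `coordChannels[i]`; the `MoveLeft` message class and the per-snake move counter that stands in for the board are hypothetical simplifications. Only the coordinator thread touches that state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class ChannelDemo {
    // Message sent over a channel: the given snake wants to move left.
    static final class MoveLeft {
        final int snakeId;
        MoveLeft(int snakeId) { this.snakeId = snakeId; }
    }

    // Each player sends `rounds` MoveLeft messages on its own channel.
    // The calling thread acts as the coordinator and returns how many
    // moves it applied per snake.
    public static int[] play(int players, int rounds) {
        List<BlockingQueue<MoveLeft>> coordChannels = new ArrayList<>();
        for (int i = 0; i < players; i++) {
            coordChannels.add(new SynchronousQueue<>()); // unbuffered channel
        }
        for (int i = 0; i < players; i++) {
            final int id = i;
            Thread player = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        coordChannels.get(id).put(new MoveLeft(id)); // send: blocks until received
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            player.setDaemon(true);
            player.start();
        }

        int[] moves = new int[players]; // stands in for the board; coordinator-only
        try {
            for (int r = 0; r < rounds; r++) {
                for (int i = 0; i < players; i++) {
                    MoveLeft msg = coordChannels.get(i).take(); // receive
                    moves[msg.snakeId]++;                      // "board[…,…] = …"
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return moves;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(play(3, 5))); // [5, 5, 5]
    }
}
```

As the slide notes, nothing here prevents protocol mistakes: if a player sent fewer messages than the coordinator expects, `take` would block forever.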
Coordinating Threads
Transactional Memory
• Transactions
• Simple Programming Model
• Practical Issues
Channel/Message Communication
• Explicit coordination
– Channels or message sending
– Higher abstraction level
• No safety guarantees
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al.
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
Questions?
COMMUNICATING ISOLATES
Communication is Everything
Explicit Communication Only
Absence of Low-level Data Races
All Interactions Explicit
[Diagram: Actor A and Actor B interacting only through explicit messages]
Actor Principle
Many, Many Variations
• Channel-based
– Communicating Sequential Processes
• Message-based
– Actor models
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.
Communicating Event Loops
[Diagram: Actor A and Actor B]
• One message at a time
• Actors contain objects
• Interacting via messages
Message-based Communication
Main Program:
actors.create(Board)
actors.create(Snake)
actors.create(Snake)
Player Actor, asynchronous send to the Board Actor:
board <- moveLeft(snake)
Board Actor:
class Board {
  private array;
  public moveLeft(snake) {
    array[snake.x][snake.y] = ...
  }
}
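A minimal Java sketch of the communicating event loop idea, assuming a single-threaded `ExecutorService` as the actor's event loop and mailbox. The `BoardActor` class is hypothetical, and for brevity the players are plain threads rather than actors. Because only the board's own loop thread touches its state, messages are processed one at a time, and `position()` shows how a reply can come back as a `CompletableFuture` instead of through a blocking call.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoardActorDemo {
    // Hypothetical board actor. All access to `xs` happens on `eventLoop`,
    // a single thread that processes queued messages one at a time.
    static final class BoardActor {
        private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        private final int[] xs; // x coordinate per snake, owned by the actor

        BoardActor(int snakes, int startX) {
            xs = new int[snakes];
            Arrays.fill(xs, startX);
        }

        // Asynchronous send (like `board <- moveLeft(snake)`): enqueue and return.
        void moveLeft(int snake) {
            eventLoop.execute(() -> xs[snake]--);
        }

        // A query is also just a message; the reply is delivered via a future.
        CompletableFuture<Integer> position(int snake) {
            return CompletableFuture.supplyAsync(() -> xs[snake], eventLoop);
        }

        void stop() {
            eventLoop.shutdown(); // finishes queued messages, then ends the thread
        }
    }

    // Main program: one board, two players each sending `moves` messages.
    public static int[] play(int moves) {
        BoardActor board = new BoardActor(2, 100);
        Thread[] players = new Thread[2];
        for (int s = 0; s < players.length; s++) {
            final int snake = s;
            players[s] = new Thread(() -> {
                for (int i = 0; i < moves; i++) {
                    board.moveLeft(snake);
                }
            });
            players[s].start();
        }
        for (Thread p : players) {
            try {
                p.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Sent after all moves, so the mailbox's FIFO order answers them last.
        int[] result = { board.position(0).join(), board.position(1).join() };
        board.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(play(30))); // [70, 70]
    }
}
```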
Communicating Isolates
Message or Channel Based
• Explicit communication
• No shared memory
• Still potential for
– Behavioral deadlocks
– Livelocks
– Bad message interleavings
– Message protocol violations
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Questions?
DATA PARALLELISM
Parallelism for Structured Problems
DATA PARALLELISM WITH FORK/JOIN
Just One Example
Fork/Join with Work-Stealing
• Recursive divide-and-conquer
• Automatic and efficient parallel scheduling
• Widely available for C++, Java, and .NET
Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995), 'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.
Typical Applications
• Recursive algorithms¹
– Mergesort
– List and tree traversals
• Parallel prefix, pack, and sorting problems²
• Irregular and unbalanced computation
– On directed acyclic graphs (DAGs)
– Ideally tree-shaped
1) More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/
2) Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf
Tiny Example: Summing a large Array
• Simple array with numbers
• Recursively divide
– Every ‘ ’ is a parallel fork
• Then do addition
– Every ‘ ’ is a join
Note: This example is academic, and could be better expressed with a parallel map/reduce
library, such as Scala’s Parallel Collections, Java 8 Streams, or Microsoft’s PLINQ.
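[Figure: summing the array 4 5 7 2 4 9 6 5]
Fork: 4 5 7 2 | 4 9 6 5, then 4 5 | 7 2 | 4 9 | 6 5
Join: 4+5 = 9, 7+2 = 9, 4+9 = 13, 6+5 = 11; then 9+9 = 18, 13+11 = 24; finally 18+24 = 42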
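The array example can be sketched with Java's fork/join framework (`ForkJoinPool` and `RecursiveTask` in `java.util.concurrent`). The sketch assumes a cutoff of 1,000 elements below which summing sequentially is cheaper than forking; that value is an untuned guess. For comparison, `sumWithStreams` shows the parallel-stream formulation suggested in the note above.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.IntStream;

public class ForkJoinSum {
    // Below this size, summing sequentially is cheaper than forking (assumed, untuned).
    static final int THRESHOLD = 1_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final int[] data;
        private final int lo, hi; // half-open range [lo, hi)

        SumTask(int[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                     // fork: an idle worker may steal the left half
            long rightSum = right.compute(); // meanwhile, work on the right half here
            return left.join() + rightSum;   // join: wait for (or run) the left half
        }
    }

    public static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    // The same reduction with Java 8 parallel streams.
    public static long sumWithStreams(int[] data) {
        return IntStream.of(data).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] slideExample = {4, 5, 7, 2, 4, 9, 6, 5};
        System.out.println(sum(slideExample)); // 42

        int[] large = IntStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println(sum(large) == sumWithStreams(large)); // true
    }
}
```

Forking only one half and computing the other in the current thread is a common idiom; it avoids creating a task for work this thread would otherwise just wait on.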
Data Parallelism with Fork/Join
• Parallel programming technique
• Recursive divide-and-conquer
• Automatic and efficient load balancing
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
CONCLUSION: CONCURRENCY MODELS
Four Rough Categories
• Threads and Locks
• Coordinating Threads
• Communicating Isolates
• Data Parallelism
SEMINAR PAPERS
These Are Suggestions
Please feel free to propose papers of your own interest.
(Papers need to be approved by us.)
Topics of Interest
• High-level language concurrency models
– Actors, Communicating Sequential Processes, STM, Stream Processing, ...
• Tooling
– Debugging
– Profiling
• Implementation and runtime systems
– Communication mechanisms
– Data/object representation
– System-level aspects
• Big Data frameworks
– Programming models
– Runtime-level problems
Papers without Artifacts
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16)
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16)
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
Papers without Artifacts
P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'15)
P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14)
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'15)
P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15)
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Papers with Artifacts
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16)
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16)
P2.3 StreamJIT: a commensal compiler for high-performance stream programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14)
P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, EuroPar'12)
P2.5 Parallel parsing made practical, A. Barenghi et al. (runtime, SCP'15)
Papers with Artifacts
P2.6 SparkR: Scaling R Programs with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16)
P2.7 SparkSQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, VLDB'14)
P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15)
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13)
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16)


Seminar on Parallel and Concurrent Programming

  • 1. Stefan Marr, Daniele Bonetta 2016 Seminar on Parallel and Concurrent Programming
  • 2. Agenda 1. Modus Operandi 2. Introduction to Concurrent Programming Models 3. Seminar Paper Overview 2
  • 4. Tasks and Deadlines • Talk on selected paper (student 1) – 30min with slides (+ 15min discussion) • to be discussed with us 1 week before – Summary (max. 500 word) • 2 days before seminar, 11:59am • Questions on assigned paper (student 2) – Min. 5 questions – 2 days before seminar, 11:59am 4
  • 5. Report Category 1: Theoretical treatment • Focus on paper, related work, state of the art of the field • Detailed discussion Category 2: Practical treatment of topic, for instance • Reproduce experiments/results • Extend experiments • Experiment with variations 5
  • 6. Report • paper summary (500 words) • outline, content, and experiments to be discussed with us • Cat. 1: ca. 4000 word (excl. references) – state of the art, context in field, and specific technique from paper • Cat. 2: ca. 2000 word (excl. references) – Discuss experiments, gained insights, found limitations, etc. Deadline: Feb. 6th 6
  • 7. Consultations • For alternative paper proposals • To prepare presentation! • To agree on focus of report/experiments – For experiments mandatory 7
  • 8. Grading • Required attendance: 80% of all meetings • 50% slides, presentation, and discussion • 50% write-up/experiments 8
  • 9. Timeline Oct. 5th Introduction to Concurrent Programming Models Oct. 10th Deadline: List of ranked papers Oct. 12th Runtime Techniques for Big Data and Parallelism Week 3-5 Preparations and Consultations Week 6-12 Presentations Feb. 6th Deadline for Report 9
  • 11. Multicore is the Norm 8 Cores 200 Euro Phones 24 Cores Workstation >=72 Cores Embedded System
  • 12. Problem: Power Wall at ca. 5 GHz
  • 13. CPUs don’t get Faster But Multiply 0.2 1.5 3.8 3.33 3.8 0 1 2 3 4 1990 1995 2000 2005 2010 2015 4, 6, 12, … cores GHz 1 core Based on the Clock Frequency of Intel Processors
  • 14. Power ≈ Voltage2  Frequency Voltage = -15% Frequency = -15% Power = 1 Performance ≈ 1.8
  • 16. Memory Wall 1 10 100 1000 10000 1980 1985 1990 1995 2000 2005 CPU Frequency DRAM Speeds Relative Performance Gap Source: Sun World Wide Analyst Conference Feb. 25, 2003
  • 17. Multicore Transition Work around physical limitations Power Wall and Memory Wall 10/5/2016 17 MemoryMemory MemoryMemory Main Memory Main Memory
  • 18. For a brief bit of history: ENIAC’s recessive gene Marcus Mitch, and Akera Atsushi. Penn Printout (March 1996) http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf ENIAC's main control panel, U. S. Army Photo
  • 19. Decades of Research and Solutions for Everything 10/5/2016 19
  • 20. … But no Silver Bullet CSP Locks, Monitors, … Fork/Join Transactional Memory 20 Data Flow Actors
  • 22. A Rough Categorization 22 Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel. Data Parallelism
  • 24. Uniform Shared Memory A Model for the Machines We Used to Have 24 C/C++
  • 25. Threads • Sequences of instructions • Unit of scheduling – Preemptive and concurrent – Or parallel 25 time
  • 26. A Snake Game • Multiple players • Compete for ‘apples’ • Shared board 10/5/2016 26
  • 27. Race Conditions and Data Races Race Condition • Result depending on timing of operations Data Race • Race condition on memory • Synchronization absent or incomplete 27
  • 29. Optimized Locking for more Parallelism synchronized (board[3][3]) { synchronized (board[3][2]) { board.moveLeft(snake) } } 29 Strategy: Lock only cells you need to update What could go wrong?
  • 30. Common Issues • Lack of Progress – Deadlock – Livelock • Race Condition – Data race – Atomicity violation • Performance – Sequential bottle necks – False sharing 30
  • 31. Basic Concepts Shared Memory with Threads and Locks • Threads • Synchronization • No safety guarantees – Data Races – Deadlocks 31 P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al. P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. Questions?
  • 32. COORDINATING THREADS Making Coordination Explicit 32 Communicating Threads
  • 33. Shared Memory with Explicit Coordination Raising the Abstraction Level Libraries for most languages
  • 34. Two Main Variants Temporal Isolation Transactional Memory Explicit Communication Channel or Message-based 34
  • 36. Transactional Memory Simple Programing Model • No Data Races (within transactions) • No Deadlocks 36 Issues • Performance overhead • Still experimental • Livelocks • Inter-transactional race conditions • I/O semantics
  • 37. Some Issues atomic { dataArray = getData(); fork { compute(dataArray[0]); } compute(dataArray[1]); } 37 P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. What happens with forked thread when transaction aborts?
  • 38. Channel-based Communication coordChannel ! (#moveLeft, snake) 38 for i in players(): msg ? coordChannels[i] match msg: (#moveLeft, snake): board[…,…] = … Player Thread Coordinator Thread Coordinator Thread Player Thread Player Thread send receive High-level communication but no safety guarantees
  • 39. Coordinating Threads Transactional Memory • Transactions • Simple Programming Model • Practical Issues Channel/Message Communication • Explicit coordination – Channels or message sending – Higher abstraction level • No safety guarantees 39 P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc- model, AMP'10) Questions?
  • 41. Explicit Communication Only Absence of Low-level Data Races 41
  • 42. All Interactions Explicit 42 Actor A Actor B Actor Principle
  • 43. Many Many Variations • Channel based – Communicating Sequential Processes • Message based – Actor models 43 P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.
  • 44. Communicating Event Loops 44 Actor A Actor B One Message at a Time
  • 45. Communicating Event Loops 45 Actor A Actor B Actors Contain Objects
  • 46. Communicating Event Loops 46 Actor A Actor B Interacting via Messages
  • 47. Message-based Communication 47 Player 1 Player 1 Board Actor board <- moveLeft(snake) class Board { private array; public moveLeft(snake) { array[snake.x][snake.y] = ... } } Player Actor Board Actor async send actors.create(Board) actors.create(Snake) actors.create(Snake) Main Program
  • 48. Communicating Isolates Message or Channel Based • Explicit communication • No shared memory • Still potential for – Behavioral deadlocks – Livelocks – Bad message inter-leavings – Message protocol violations 48 P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14) Questions?
  • 49. DATA PARALLELISM Parallelism for Structured Problems 49
  • 50. DATA PARALLELISM WITH FORK/JOIN Just one Example 50
  • 51. Fork/Join with Work-Stealing • Recursive divide-and-conquer • Automatic and efficient parallel scheduling • Widely available for C++, Java, and .NET 10/5/2016 51 Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995), 'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.
  • 52. Typical Applications • Recursive Algorithms1 – Mergesort – List and tree traversals • Parallel prefix, pack, and sorting problems2 • Irregular and unbalanced computation – On directed acyclic graphs (DAGs) – Ideally tree-shaped 52 1) More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/ 2) Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf
  • 53. Tiny Example: Summing a large Array • Simple array with numbers • Recursively divide – Every ‘ ’ is a parallel fork • Then do addition – Every ‘ ’ is a join 53 Note: This example is academic, and could be better expressed with a parallel map/reduce library, such as Scala’s Parallel Collections, Java 8 Streams, or Microsoft’s PLINQ. 46 9 42 7 55 45724965 4965 5 6 11 49 13 24 4572 72 45 9 9 18 42
  • 54. Data Parallelism with Fork/Join • Parallel programming technique • Recursive divide-and-conquer • Automatic and efficient load-balancing • P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
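As the note on the summing slide says, a library is often the better tool in practice. This minimal sketch uses two standard Java APIs: a parallel `IntStream` for the sum (map/reduce style) and `Arrays.parallelPrefix` for a parallel prefix sum, one of the typical applications listed earlier. In OpenJDK, both are built on fork/join tasks that use `ForkJoinPool.commonPool()`. Treat that as an implementation detail rather than a guarantee; tiny inputs such as this one may effectively run sequentially. The class and method names are illustrative.

```java
import java.util.Arrays;

public class LibraryDataParallelism {
    // Map/reduce style: the stream library splits the array and combines partial sums.
    static long total(int[] numbers) {
        return Arrays.stream(numbers).parallel().asLongStream().sum();
    }

    // Parallel prefix (scan): result[i] = numbers[0] + ... + numbers[i].
    static int[] prefixSums(int[] numbers) {
        int[] result = numbers.clone(); // parallelPrefix works in place
        Arrays.parallelPrefix(result, Integer::sum);
        return result;
    }

    public static void main(String[] args) {
        int[] numbers = {4, 5, 7, 2, 4, 9, 6, 5};
        System.out.println(total(numbers));                       // 42
        System.out.println(Arrays.toString(prefixSums(numbers))); // [4, 9, 16, 18, 22, 31, 37, 42]
    }
}
```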
  • 56. Four Rough Categories • Communicating Isolates • Threads and Locks • Coordinating Threads • Data Parallelism
  • 58. These are Suggestions: Please feel free to propose papers that interest you (papers need to be approved by us).
  • 59. Topics of Interest • High-level language concurrency models – Actors, Communicating Sequential Processes, STM, Stream Processing, ... • Tooling – Debugging – Profiling • Implementation and runtime systems – Communication mechanisms – Data/object representation – System-level aspects • Big Data Frameworks – Programming models – Runtime level problems
  • 60. Papers without Artifacts P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16) P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16) P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16) P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13) P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00) P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
  • 61. Papers without Artifacts P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'15) P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14) P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'16) P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15) P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
  • 62. Papers with Artifacts P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16) P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16) P2.3 StreamJIT: a commensal compiler for high-performance stream programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14) P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, EuroPar'12) P2.5 Parallel parsing made practical, A. Barenghi et al. (runtime, SCP'15)
  • 63. Papers with Artifacts P2.6 SparkR: Scaling R Programs with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16) P2.7 Spark SQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, SIGMOD'15) P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15) P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13) P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16)

Editor's Notes

  • #2: Talk: 18min + 5min questions
  • #12: Multicore is everywhere. These are just single-processor systems; workstations usually have 2 processors, servers even more. Embedded systems already use manycore processors. Any notebook or computer you buy today is multicore.
  • #13: Higher clock rate (GHz) means more consumed power, which means more produced heat. Cooling becomes too complex, and there is no way to put such chips into portable devices.
  • #14: Why do we need to? So, why manycore then? Unfortunately, CPUs are not becoming faster anymore. Clock rates reached a peak in 2005; now CPUs are actually slower (simplified speaking). Notes: show a graph of GHz over 1990, 2000, 2005, 2010, with the CPUs and a red line for the power wall. Data points: 1989: Intel486™ DX Processor, 50, 33, 25 MHz; November 1, 1995: Intel® Pentium® Pro Processor, 200, 180, 166, 150 MHz; November 20, 2000: Intel® Pentium® 4 Processor, 1.50 GHz, 1.40 GHz; February 2005: Intel® Pentium® 4 Processor Extreme Edition supporting HT Technology, 3.80 GHz (570); 3.33 GHz (with boost to 3.6 GHz): Intel® Core™ i7-980X Processor Extreme Edition.
  • #15: Decreasing the clock rate a bit and putting another core on the chip keeps power consumption stable. The theoretical speedup is 1.8×, but each core has lower sequential performance.
  • #19: ENIAC
  • #27: AI players can consume as much CPU as they like, and the presentation can be done on a different core.
  • #52: - Efficient load balancing
  • #59: Good fit for tree recursion and for computations with irregular complexity.