Example Parallel Overview snow fork Summary
Parallel Computing with R
Péter Sólymos
Edmonton R User Group meeting, April 26, 2013
Ovenbird example from 'detect' package
> str(oven)
'data.frame': 891 obs. of 11 variables:
$ count : int 1 0 0 1 0 0 0 0 0 0 ...
$ route : int 2 2 2 2 2 2 2 2 2 2 ...
$ stop : int 2 4 6 8 10 12 14 16 18 20 ...
$ pforest: num 0.947 0.903 0.814 0.89 0.542 ...
$ pdecid : num 0.575 0.562 0.549 0.679 0.344 ...
$ pagri : num 0 0 0 0 0.414 ...
$ long : num 609343 608556 607738 607680 607944 ...
$ lat : num 5949071 5947735 5946301 5944720 5943088 ...
$ observ : Factor w/ 4 levels "ARS","DW","RDW",..: 4 4 4 4 4 4 4 4 4 4 ...
$ julian : int 181 181 181 181 181 181 181 181 181 181 ...
$ timeday: int 2 4 6 8 10 12 14 16 18 20 ...
NegBin GLM with bootstrap
> library(MASS)
> m <- glm.nb(count ~ pforest, oven)
> fun1 <- function(i) {
+ id <- sample.int(nrow(oven), nrow(oven), replace = TRUE)
+ coef(glm.nb(count ~ pforest, oven[id, ]))
+ }
> B <- 199
> system.time(bm <- sapply(1:B, fun1))
user system elapsed
26.79 0.02 27.11
> bm <- cbind(coef(m), bm)
> cbind(coef(summary(m))[, 1:2], `Boot. SE` = apply(bm, 1, sd))
Estimate Std. Error Boot. SE
(Intercept) -2.177 0.1277 0.1229
pforest 2.674 0.1709 0.1553
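The bootstrap above needs the `oven` data and the MASS package. The code below is a minimal, self-contained sketch of the same resampling logic. It simulates a hypothetical data set `sim` and deliberately swaps `glm.nb()` for a Poisson `glm()` so that only base R is required. The names `sim`, `boot_once`, and `se_table` are illustrative and do not come from the talk.

```r
# Hypothetical stand-in for 'oven': counts that increase with forest cover
set.seed(1)
n <- 200
sim <- data.frame(pforest = runif(n))
sim$count <- rpois(n, lambda = exp(-2 + 2.5 * sim$pforest))

# One bootstrap replicate: resample rows with replacement, refit, return coefficients
boot_once <- function(i, data) {
  id <- sample.int(nrow(data), nrow(data), replace = TRUE)
  coef(glm(count ~ pforest, family = poisson, data = data[id, ]))
}

m0 <- glm(count ~ pforest, family = poisson, data = sim)
B <- 99
bm <- sapply(seq_len(B), boot_once, data = sim)  # 2 x B matrix of coefficients

# Model-based vs. bootstrap standard errors, laid out as on the slide
se_table <- cbind(coef(summary(m0))[, 1:2], `Boot. SE` = apply(bm, 1, sd))
print(se_table)
```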
Parallel bootstrap
> library(parallel)
> (cl <- makePSOCKcluster(3))
socket cluster with 3 nodes on host 'localhost'
> clusterExport(cl, "oven")
> tmp <- clusterEvalQ(cl, library(MASS))
> t0 <- proc.time()
> bm2 <- parSapply(cl, 1:B, fun1)
> proc.time() - t0
user system elapsed
0.00 0.00 11.06
> stopCluster(cl)
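The parallel bootstrap can also be wrapped in a reusable function. The sketch below keeps the same assumptions as a self-contained example: hypothetical simulated data, and a Poisson `glm()` standing in for `glm.nb()`. The hypothetical helper `par_boot()` does three things. It passes the data to the workers through `parSapply()`'s `...` argument instead of calling `clusterExport()`. It seeds the workers with `clusterSetRNGStream()` so reruns are reproducible. It uses `on.exit()` so the cluster is stopped even if a replicate fails.

```r
library(parallel)

# One bootstrap replicate (same logic as fun1, but the data arrive as an argument)
boot_once <- function(i, data) {
  id <- sample.int(nrow(data), nrow(data), replace = TRUE)
  coef(glm(count ~ pforest, family = poisson, data = data[id, ]))
}

# Hypothetical helper: start a PSOCK cluster, run B replicates, always stop the cluster
par_boot <- function(data, B, ncores = 2L, seed = 1234L) {
  cl <- makePSOCKcluster(ncores)
  on.exit(stopCluster(cl), add = TRUE)   # runs even if an error occurs below
  clusterSetRNGStream(cl, iseed = seed)  # reproducible, independent worker streams
  # 'data' reaches the workers through parSapply()'s ... argument,
  # so no clusterExport() call is needed
  parSapply(cl, seq_len(B), boot_once, data = data)
}

# Hypothetical simulated data standing in for 'oven'
set.seed(1)
sim <- data.frame(pforest = runif(200))
sim$count <- rpois(200, lambda = exp(-2 + 2.5 * sim$pforest))

bm2 <- par_boot(sim, B = 50)
apply(bm2, 1, sd)  # bootstrap SEs of the two coefficients
```

Passing data through `...` sends it with every call, once per worker. `clusterExport()`, as used on the slide, sends it only once, so it is the better choice when the same cluster is reused many times.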
High performance computing (HPC)
- parallel computing,
- large memory and out-of-memory data,
- interfaces for compiled code,
- profiling tools,
- batch scheduling.
CRAN Task View: High-Performance and Parallel Computing with R
Parallel computing
Embarrassingly parallel problems:
- bootstrap,
- MCMC (independent chains),
- simulations.
These can be broken down into independent pieces.[1]
[1] Schmidberger et al. (2009), Journal of Statistical Software: State of the Art in Parallel Computing with R.
Parallel computing
- explicit (distributed memory),
- implicit (shared memory),
- grid,
- Hadoop,
- GPUs.
Starting a cluster
> library(snow)
> cl <- makeCluster(3, type = "SOCK")
Cluster types:
- SOCK, multicore,
- PVM, Parallel Virtual Machine,
- MPI, Message Passing Interface,
- NWS, NetWorkSpaces (multicore & grid).
Error: invalid connection
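The `parallel` package has shipped with R since version 2.14.0 and incorporates snow's socket clusters. Its `makeCluster()` takes `type = "PSOCK"`, which starts fresh R worker processes on any platform, or `type = "FORK"`, which works on Unix-alikes only. The following is a minimal sketch assuming two local workers. The name `n_workers` is illustrative, and `detectCores()` can help choose a value.

```r
library(parallel)

n_workers <- 2L                              # illustrative; see detectCores()
cl <- makeCluster(n_workers, type = "PSOCK")
print(cl)                                    # short description of the cluster
# Each worker is a separate R process with its own process ID
pids <- unlist(clusterCall(cl, Sys.getpid))
stopCluster(cl)                              # always release the workers
pids
```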
Distribute stuff, evaluate expressions
> clusterExport(cl, "oven")
> clusterEvalQ(cl, library(MASS))
[[1]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
[[2]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
[[3]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
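By default, `clusterExport()` looks up the names it is given in the global environment. When the objects live inside a function, the `envir` argument has to point at that function's environment. The hypothetical helper `run_on_workers()` below sketches this, assuming a local two-worker `parallel` cluster.

```r
library(parallel)

# Hypothetical helper: export a function-local object, then use it on the workers
run_on_workers <- function(cl) {
  offset <- 100                                      # local object, not in .GlobalEnv
  clusterExport(cl, "offset", envir = environment()) # look it up here, not globally
  # clusterEvalQ() evaluates the expression in each worker's global environment
  unlist(clusterEvalQ(cl, offset + 1))
}

cl <- makePSOCKcluster(2L)
res <- run_on_workers(cl)
stopCluster(cl)
res
```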
Random Number Generation (RNG)
> library(rlecuyer)
> tmp <- clusterEvalQ(cl, set.seed(1234))
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
[[2]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
> snow:::clusterSetupRNG(cl)
[1] "RNGstream"
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.14063 -0.49816 -0.76670 -0.04821 -1.09852
[[2]]
[1] 0.7050 0.4821 -1.2848 0.7198 0.7386
Setting the same seed on every worker makes all workers draw identical numbers. Independent streams (here L'Ecuyer's generator via rlecuyer) are therefore important when generating resampling indices or running simulations.
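With only the `parallel` package, `clusterSetRNGStream()` provides the same kind of per-worker streams. It switches the workers to the L'Ecuyer-CMRG generator and gives each one its own stream, all derived from a single seed. Below is a small sketch assuming a local two-worker cluster.

```r
library(parallel)

cl <- makePSOCKcluster(2L)
# One seed, one independent L'Ecuyer-CMRG stream per worker
clusterSetRNGStream(cl, iseed = 1234)
draws1 <- clusterEvalQ(cl, rnorm(3))
# Re-seeding with the same value reproduces the same per-worker streams
clusterSetRNGStream(cl, iseed = 1234)
draws2 <- clusterEvalQ(cl, rnorm(3))
stopCluster(cl)

identical(draws1, draws2)              # same seed, same draws
!identical(draws1[[1]], draws1[[2]])   # workers do not share a stream
```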
Apply operations: split
> parallel:::parLapply
function (cl = NULL, X, fun, ...)
{
    cl <- defaultCluster(cl)
    do.call(c, clusterApply(cl, x = splitList(X, length(cl)),
        fun = lapply, fun, ...), quote = TRUE)
}
<bytecode: 0x04c1eba8>
<environment: namespace:parallel>
> snow:::splitList(1:10, length(cl))
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 6 7 8 9 10
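`snow:::splitList()` is internal, but `parallel` exports a related helper, `splitIndices(nx, ncl)`. It divides the indices `1:nx` into `ncl` contiguous chunks of nearly equal size, one per worker. The quick check below needs no cluster. The names `chunks2` and `chunks3` are illustrative.

```r
library(parallel)

# Split the indices 1..10 for two and for three workers
chunks2 <- splitIndices(10, 2)
chunks3 <- splitIndices(10, 3)
str(chunks2)
str(chunks3)
lengths(chunks3)  # chunk sizes differ by at most one
```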
Apply operations: evaluate and combine
> f <- function(i) i * 2
> (res <- clusterApply(cl, snow:::splitList(1:10, length(cl)),
+ f))
[[1]]
[1] 2 4 6
[[2]]
[1] 8 10 12 14
[[3]]
[1] 16 18 20
> do.call(c, res)
[1] 2 4 6 8 10 12 14 16 18 20
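The split, evaluate, and combine steps together give a stripped-down version of `parLapply()`. The function `my_parLapply()` below is a hypothetical illustration that mirrors the printed source of `parLapply()`, using the exported `splitIndices()` in place of the internal `splitList()`. It is a teaching sketch, not a replacement for the real function.

```r
library(parallel)

# Hypothetical re-implementation: split X into one chunk per worker,
# lapply() over each chunk on the workers, then concatenate in order
my_parLapply <- function(cl, X, FUN, ...) {
  chunks <- lapply(splitIndices(length(X), length(cl)), function(idx) X[idx])
  res <- clusterApply(cl, x = chunks, fun = lapply, FUN, ...)
  do.call(c, res, quote = TRUE)
}

f <- function(i) i * 2
cl <- makePSOCKcluster(2L)
out <- my_parLapply(cl, 1:10, f)
stopCluster(cl)
unlist(out)
```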
Apply operations: load balancing
> f <- function(i) i * 2
> unlist(parallel:::parLapplyLB(cl, 1:10, f))
[1] 2 4 6 8 10 12 14 16 18 20
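Load balancing pays off when tasks take unequal time. The sketch below uses `Sys.sleep()` as a hypothetical stand-in for uneven work. It compares static scheduling with `parLapply()` against `clusterApplyLB()`, which hands out one element at a time to whichever worker is free. It calls `clusterApplyLB()` directly because the way `parLapplyLB()` groups elements into tasks has changed across R versions (recent versions have a `chunk.size` argument). The names `costs` and `slow_task` are illustrative.

```r
library(parallel)

# Hypothetical uneven workload: tasks 1 and 2 are slow, the rest are quick
costs <- c(0.6, 0.6, rep(0.1, 6))
slow_task <- function(s) {
  Sys.sleep(s)  # stand-in for real work of varying length
  s
}

cl <- makePSOCKcluster(2L)
# Static scheduling: X is cut into one contiguous chunk per worker,
# so both slow tasks end up on the same worker
t_static <- system.time(r_static <- parLapply(cl, costs, slow_task))[["elapsed"]]
# Load balancing: one task at a time, sent to whichever worker is free
t_lb <- system.time(r_lb <- clusterApplyLB(cl, costs, slow_task))[["elapsed"]]
stopCluster(cl)

c(static = t_static, balanced = t_lb)  # timings vary from machine to machine
```

Both calls return results in the order of the input. Only the assignment of tasks to workers, and hence the elapsed time, differs.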
Implicit parallelism
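No need to distribute stuff: forked child processes inherit the parent's workspace, so expressions are only evaluated on the child processes (forking is not available on Windows).
> mclapply(X, FUN, mc.cores)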
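Here is a sketch of the implicit approach. The object `big` below is hypothetical and exists only in the parent session, yet the forked children can read it without any export step. The worker count is set to 1 on Windows, where forking is unavailable and `mclapply()` then runs serially.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single (serial) worker there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

big <- rnorm(1e5)  # hypothetical object that lives only in the parent session
# Each child computes the mean of every 4th element, starting at offset i;
# 'big' is inherited through the fork, so nothing is exported
sub_means <- mclapply(1:4, function(i) mean(big[seq(i, length(big), by = 4)]),
                      mc.cores = n_cores)
unlist(sub_means)
```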
Summary
Parallel computing is not hard on a single computer.
Difficulties arise when using large, shared, and heterogeneous
resources.
> stopCluster(cl)
