Direct QR factorizations for tall-and-skinny 
matrices in MapReduce architectures 
Austin Benson 
ICME, Stanford University 
David Gleich (Purdue) and Jim Demmel (UC-Berkeley) 
Figure: $A = QR$, with $A$ and $Q$ of size $m \times n$ and $R$ of size $n \times n$.
IEEE BigData
October 8, 2013
Contributions 2 
- Numerically stable and scalable algorithm for QR and SVD of tall-and-skinny matrices in MapReduce
- Performance and stability tradeoffs of several methods
- Performance model: prediction within a factor of two
- Code: https://github.com/arbenson/mrtsqr
MapReduce overview 3 
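Two functions that operate on key-value pairs:
$(\text{key}, \text{value}) \xrightarrow{\text{map}} (\text{key}, \text{value})$
$(\text{key}, \langle \text{value}_1, \ldots, \text{value}_n \rangle) \xrightarrow{\text{reduce}} (\text{key}, \text{value})$
A shuffle stage between map and reduce sorts values by key.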
MapReduce overview 4 
The programmer implements: 
- map(key, value)
- reduce(key, $\langle \text{value}_1, \ldots, \text{value}_n \rangle$)
Handled by MapReduce framework, e.g., Hadoop:
- shuffle
- load balancing
- reading and writing data
- data serialization
- fault tolerance
- ...
MapReduce Example: ColorCount 5 
(key, value) input is (image id, image) 
Figure: Map emits a 1 for every pixel, the shuffle groups the 1s by color, and Reduce sums each group into a per-color count.
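def ColorCountMap(key, val):
    # key is the image id, val is the image (an iterable of pixel colors)
    for pixel in val:
        yield (pixel, 1)

def ColorCountReduce(key, vals):
    # key is a color, vals are the 1s emitted for that color
    total = sum(vals)
    yield (key, total)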
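To try the ColorCount functions without a cluster, the map, shuffle, and reduce stages can be imitated in memory. This is only a sketch: `run_mapreduce` is a hypothetical helper (not part of Hadoop or of the mrtsqr code), and the two tiny "images" are made-up lists of color names.

```python
from collections import defaultdict

def color_count_map(key, val):
    # key: image id (unused); val: iterable of pixel colors
    for pixel in val:
        yield (pixel, 1)

def color_count_reduce(key, vals):
    # key: a color; vals: the 1s emitted for that color
    yield (key, sum(vals))

def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, val in records:
        for out_key, out_val in mapper(key, val):
            groups[out_key].append(out_val)      # shuffle: group values by key
    output = {}
    for key in sorted(groups):                   # the shuffle also sorts by key
        for out_key, out_val in reducer(key, groups[key]):
            output[out_key] = out_val
    return output

# Made-up input: (image id, image) pairs with pixels given as color names.
images = [("img1", ["red", "blue", "red"]),
          ("img2", ["blue", "green", "red"])]
counts = run_mapreduce(images, color_count_map, color_count_reduce)
print(counts)
```

A real framework runs the mapper and reducer calls on different machines; the grouping step in the middle is what the shuffle provides.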
Why MapReduce? (for scientists) 6 
MapReduce is restrictive! Why bother?
- Easy
- load balancing
- structured data I/O
- fault tolerance
- cheap clusters with large data storage
Hadoop may not be the best option...
Figure: Generate lots of data on supercomputer, then post-process and analyze on MapReduce cluster.
Tall-and-skinny matrices 7 
What are tall-and-skinny matrices? $m \gg n$
Figure: $A$ is $m \times n$.
Examples: rows are data samples; blocks of A are images from a 
video; Krylov subspaces; unrolled tensors
Matrix representation 8 
We have matrices, so what are the key-value pairs? 
\[
A = \begin{bmatrix} 1.0 & 0.0 \\ 2.4 & 3.7 \\ 0.8 & 4.2 \\ 9.0 & 9.0 \end{bmatrix}
\rightarrow
\begin{bmatrix} (1, [1.0, 0.0]) \\ (2, [2.4, 3.7]) \\ (3, [0.8, 4.2]) \\ (4, [9.0, 9.0]) \end{bmatrix}
\]
(key, value) $\rightarrow$ (row index, row)
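The row-wise encoding can be sketched in a few lines. The function name `matrix_to_records` is hypothetical, and the 1-based keys simply follow the example on this slide.

```python
def matrix_to_records(rows):
    """Yield (row index, row) pairs, using 1-based keys as on the slide."""
    for i, row in enumerate(rows, start=1):
        yield (i, list(row))

A = [[1.0, 0.0],
     [2.4, 3.7],
     [0.8, 4.2],
     [9.0, 9.0]]
records = list(matrix_to_records(A))
for key, value in records:
    print(key, value)
```

Because each record carries a whole row, any map task can work on its rows without knowing where the rest of the matrix is stored.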
Matrix representation: an example 9 
Scientific example: (x, y, z) coordinates and model number:
((47570,103.429767811242,0,-16.525510963787,iDV7), [0.00019924 
-4.706066e-05 2.875293979e-05 2.456653e-05 -8.436627e-06 -1.508808e-05 
3.731976e-06 -1.048795e-05 5.229153e-06 6.323812e-06]) 
Figure: Aircraft simulation data. Aero/Astro Department, Stanford
Tall-and-skinny matrices 10 
Tall-and-skinny: $m \gg n$
Figure: $A$ is $m \times n$.
Slightly more rigorous definition:
It is cheap to pass $O(n^2)$ data to all processors.
Quick QR and SVD review 11 
$A = QR$ and $A = U \Sigma V^T$, where $R$, $\Sigma$, and $V^T$ are $n \times n$.
Figure: Q, U, and V are orthogonal matrices. R is upper triangular and $\Sigma$ is diagonal with decreasing, nonnegative entries.
Tall-and-skinny QR 12 
$A = QR$, with $A$ and $Q$ of size $m \times n$ and $R$ of size $n \times n$.
Tall-and-skinny (TS): $m \gg n$. $Q^T Q = I$.
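TS-QR $\rightarrow$ TS-SVD 13
$A = QR = Q (U_R \Sigma V^T) = (Q U_R) \Sigma V^T = U \Sigma V^T$, where $R = U_R \Sigma V^T$ is the SVD of $R$ and $U = Q U_R$.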
R is small, so computing its SVD is cheap.
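The reduction from a tall-and-skinny SVD to a QR plus a small SVD can be checked numerically. The following is a single-machine NumPy sketch on a random test matrix (an assumption for illustration), not the MapReduce implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))      # tall-and-skinny test matrix: m >> n

Q, R = np.linalg.qr(A)                  # reduced QR: Q is m x n, R is n x n
U_R, sigma, Vt = np.linalg.svd(R)       # SVD of the small n x n factor
U = Q @ U_R                             # left singular vectors of A

recon_err = np.linalg.norm(A - (U * sigma) @ Vt)
orth_err = np.linalg.norm(U.T @ U - np.eye(5))
print(recon_err, orth_err)
```

Only the QR step touches all m rows; the SVD works on n x n data, which is why it is cheap.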
Why Tall-and-skinny QR and SVD? 14 
1. Regression with many samples 
2. Principal Component Analysis (PCA)
3. Model Reduction 
Figure: Dynamic mode decomposition of the screech of a jet (panels: pressure, dilation, jet engine). Joe Nichols, University of Minnesota.
Cholesky QR 15 
Cholesky QR
$A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R$
- Computing $A^T A$ in MapReduce is easy and well-studied.
- $R$ is the upper-triangular Cholesky factor of $A^T A$. We call this Cholesky QR.
Cholesky QR: Getting Q 16 
$Q = A R^{-1}$.
Figure: $R$ is distributed to every map task. Map task $i$ computes $Q_i = A_i R^{-1}$ with a local matrix multiply and emits $Q_i$ ($i = 1, \ldots, 4$).
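The two passes of Cholesky QR can be sketched in NumPy, with a list of row blocks standing in for the data held by each map task. This is a minimal in-memory illustration under that assumption; `cholesky_qr` is a hypothetical name, and the triangular system is solved with a general solver for brevity.

```python
import numpy as np

def cholesky_qr(blocks):
    # Pass 1: each block contributes its n x n Gram matrix; the sum is A^T A.
    gram = sum(Ai.T @ Ai for Ai in blocks)
    # np.linalg.cholesky returns lower-triangular L with gram = L L^T, so R = L^T.
    R = np.linalg.cholesky(gram).T
    # Pass 2: each block forms Q_i = A_i R^{-1} locally (solve R^T X = A_i^T).
    Q_blocks = [np.linalg.solve(R.T, Ai.T).T for Ai in blocks]
    return Q_blocks, R

rng = np.random.default_rng(0)
A = rng.standard_normal((800, 4))       # well-conditioned test matrix
blocks = np.array_split(A, 4)           # stand-ins for A_1, ..., A_4
Q_blocks, R = cholesky_qr(blocks)
Q = np.vstack(Q_blocks)
print(np.linalg.norm(Q.T @ Q - np.eye(4)))
```

Only n x n matrices move between tasks, which is what makes the method attractive in MapReduce.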
Stability problems 17 
- Can get $Q = A R^{-1}$
- Problem: the columns of $Q$ can be far from orthogonal, because forming $A^T A$ squares the condition number (data later)
- Idea: Use a more advanced algorithm.
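Why the condition number is squared follows from the SVD. The derivation below is the standard textbook argument rather than something taken from the slides, and the double-precision threshold is an approximate rule of thumb.

```latex
A = U \Sigma V^T
\;\Longrightarrow\;
A^T A = V \Sigma U^T U \Sigma V^T = V \Sigma^2 V^T ,
\qquad
\kappa_2(A^T A) = \frac{\sigma_{\max}^2}{\sigma_{\min}^2} = \kappa_2(A)^2 .
% With unit roundoff \varepsilon \approx 10^{-16}, A^T A is numerically
% singular once \kappa_2(A) \gtrsim \varepsilon^{-1/2} \approx 10^{8}.
```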
Communication-avoiding TSQR 18 
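\[
A = \underbrace{\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{bmatrix}}_{8n \times n}
= \underbrace{\begin{bmatrix} Q_1 & & & \\ & Q_2 & & \\ & & Q_3 & \\ & & & Q_4 \end{bmatrix}}_{8n \times 4n}
\underbrace{\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \end{bmatrix}}_{4n \times n}
= \overbrace{\underbrace{\begin{bmatrix} Q_1 & & & \\ & Q_2 & & \\ & & Q_3 & \\ & & & Q_4 \end{bmatrix}}_{8n \times 4n}
\underbrace{\tilde{Q}}_{4n \times n}}^{= Q}
\underbrace{R}_{n \times n}
\]
Demmel et al. 2008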
Communication-avoiding TSQR 19 
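\[
A = \underbrace{\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{bmatrix}}_{8n \times n}
= \underbrace{\begin{bmatrix} Q_1 & & & \\ & Q_2 & & \\ & & Q_3 & \\ & & & Q_4 \end{bmatrix}}_{8n \times 4n}
\underbrace{\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \end{bmatrix}}_{4n \times n}
\]
The factorizations $A_i = Q_i R_i$ can be computed in parallel. If we only need $R$, then we can throw out the intermediate $Q_i$ factors.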
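When only R is needed, the scheme amounts to a local QR per block followed by one more QR of the stacked triangular factors. A minimal NumPy sketch, assuming the row blocks fit in memory and using the hypothetical name `tsqr_r`:

```python
import numpy as np

def tsqr_r(blocks):
    # Local QR on each block; mode="r" returns only R_i and discards Q_i.
    Rs = [np.linalg.qr(Ai, mode="r") for Ai in blocks]
    # One more QR on the stacked R_i factors gives the final n x n R.
    return np.linalg.qr(np.vstack(Rs), mode="r")

rng = np.random.default_rng(0)
A = rng.standard_normal((800, 4))
R = tsqr_r(np.array_split(A, 4))
print(np.linalg.norm(R.T @ R - A.T @ A))
```

R is unique only up to the signs of its rows, so it matches a direct QR of A in absolute value rather than entry by entry.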
MapReduce TSQR 20 
Figure: In the first step, each map task runs a local TSQR on its block $A_i$ and emits $R_i$. After the shuffle, each reduce task runs a local TSQR on its share of $S^{(1)}$ and emits $R_{2,j}$. In the second step, an identity map passes the $R_{2,j}$ through a shuffle to a single reduce task, which runs a local TSQR on $S^{(2)}$ and emits $R$. $S^{(1)}$ is the matrix consisting of the rows of all of the $R_i$ factors. Similarly, $S^{(2)}$ consists of all of the rows of the $R_{2,j}$ factors.
MapReduce TSQR: Getting Q 21 
- Again: have $R$, want $Q$: $A = QR \rightarrow Q = A R^{-1}$
- We call this method Indirect TSQR.
- Problem: $Q$ can be far from orthogonal (again).
Indirect TSQR: Iterative Refinement 22
Iterative refinement: repeat TSQR for a more orthogonal Q
Figure: $R$ is distributed to the map tasks, which compute $Q_i = A_i R^{-1}$ with a local matrix multiply and emit $Q_i$. Iterative refinement step: TSQR of the computed $Q$ gives $R_1$, which is distributed so that the map tasks emit $Q_i R_1^{-1}$.
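Indirect TSQR with one refinement step can be sketched as follows. The helper names are hypothetical, the blocks are in-memory stand-ins for map-task inputs, and the test matrix is constructed with a condition number of about $10^6$ purely for illustration.

```python
import numpy as np

def tsqr_r(blocks):
    # R-only TSQR: local QR per block, then QR of the stacked R_i.
    Rs = [np.linalg.qr(B, mode="r") for B in blocks]
    return np.linalg.qr(np.vstack(Rs), mode="r")

def apply_r_inverse(blocks, R):
    # Local B R^{-1}, computed by solving R^T X = B^T.
    return [np.linalg.solve(R.T, B.T).T for B in blocks]

def indirect_tsqr(blocks, refine=True):
    R = tsqr_r(blocks)
    Q_blocks = apply_r_inverse(blocks, R)           # Q = A R^{-1}
    if refine:
        R1 = tsqr_r(Q_blocks)                       # TSQR of the computed Q
        Q_blocks = apply_r_inverse(Q_blocks, R1)    # Q <- Q R1^{-1}
        R = R1 @ R                                  # so that A = Q (R1 R)
    return Q_blocks, R

# Test matrix with singular values from 1 down to 1e-6.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((2000, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = (U * np.logspace(0, -6, 6)) @ V.T

Q_blocks, R = indirect_tsqr(np.array_split(A, 4))
Q = np.vstack(Q_blocks)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(6))
print(orth_err)
```

Calling `indirect_tsqr(..., refine=False)` on the same blocks gives a way to compare the orthogonality error with and without the extra pass.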
Indirect TSQR: a randomized approach 23 
- Idea: Take a small sample of rows of $A$ and form $R_s$
- Refinement step by $Q_s = A R_s^{-1}$, $Q_s \rightarrow R_1$, $Q = Q_s R_1^{-1}$
- $R = R_1 R_s$, $Q^T Q \approx I$ for ill-conditioned $A$
- Theory on why this works, need $\approx 100 n \log n$ rows [Mahoney 2011], [Avron, Maymounkov, and Toledo 2010], [Ipsen and Wentworth 2012]
We call this Pseudo-Iterative Refinement
Pseudo-Iterative Refinement 24
Figure: Form $R_s$: TSQR on the sampled rows (shown as $A_1$) gives $R_s$. Form $Q_s$: $R_s$ is distributed and the map tasks emit $Q_i = A_i R_s^{-1}$ (local matrix multiply). Iterative refinement step: TSQR of $Q_s$ gives $R_1$, which is distributed so that the map tasks emit $Q_i R_1^{-1}$.
(In the implementation, combine $A R_s^{-1}$ and TSQR in one pass)
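The pseudo-iterative refinement steps can be sketched in NumPy as below. Everything here is illustrative: the function names are hypothetical, rows are sampled uniformly without replacement (one possible sampling scheme), the sample size follows the rough $100 n \log n$ figure quoted in the slides, and the blocks are stacked in memory only to make the sampling easy to write.

```python
import numpy as np

def tsqr_r(blocks):
    # R-only TSQR: local QR per block, then QR of the stacked R_i.
    Rs = [np.linalg.qr(B, mode="r") for B in blocks]
    return np.linalg.qr(np.vstack(Rs), mode="r")

def apply_r_inverse(blocks, R):
    # Local B R^{-1}, computed by solving R^T X = B^T.
    return [np.linalg.solve(R.T, B.T).T for B in blocks]

def pseudo_iterative_refinement(blocks, num_samples, rng):
    A = np.vstack(blocks)                       # in-memory shortcut for sampling
    idx = rng.choice(A.shape[0], size=num_samples, replace=False)
    R_s = np.linalg.qr(A[idx], mode="r")        # R factor of the sampled rows
    Qs_blocks = apply_r_inverse(blocks, R_s)    # Q_s = A R_s^{-1}
    R_1 = tsqr_r(Qs_blocks)                     # TSQR of Q_s
    Q_blocks = apply_r_inverse(Qs_blocks, R_1)  # Q = Q_s R_1^{-1}
    return Q_blocks, R_1 @ R_s                  # R = R_1 R_s

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((20000, n))
num_samples = int(np.ceil(100 * n * np.log(n)))
Q_blocks, R = pseudo_iterative_refinement(np.array_split(A, 8), num_samples, rng)
Q = np.vstack(Q_blocks)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(n))
print(orth_err)
```

Compared with ordinary iterative refinement, the first R factor comes from a small sample instead of a full pass over A, which is where the saving comes from.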
Direct TSQR 25 
Why is computing truly orthogonal Q difficult in MapReduce?
- Orthogonality is a global property, but we compute locally.
- Can only label data via keys and file names.
Communication-avoiding TSQR 26 
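\[
A = \underbrace{\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{bmatrix}}_{8n \times n}
= \underbrace{\begin{bmatrix} Q_1 & & & \\ & Q_2 & & \\ & & Q_3 & \\ & & & Q_4 \end{bmatrix}}_{8n \times 4n}
\underbrace{\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \end{bmatrix}}_{4n \times n}
\]
\[
= \underbrace{\begin{bmatrix} Q_1 & & & \\ & Q_2 & & \\ & & Q_3 & \\ & & & Q_4 \end{bmatrix}}_{8n \times 4n}
\underbrace{\begin{bmatrix} Q_{1,2} \\ Q_{2,2} \\ Q_{3,2} \\ Q_{4,2} \end{bmatrix}}_{4n \times n}
\underbrace{R}_{n \times n}
= \underbrace{\begin{bmatrix} Q_1 Q_{1,2} \\ Q_2 Q_{2,2} \\ Q_3 Q_{3,2} \\ Q_4 Q_{4,2} \end{bmatrix}}_{8n \times n}
\underbrace{R}_{n \times n}
= QR
\]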
Gathering Q 27 
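\[
\underbrace{\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \end{bmatrix}}_{n \cdot \#(\text{mappers}) \times n}
= \underbrace{\begin{bmatrix} Q_{1,2} \\ Q_{2,2} \\ Q_{3,2} \\ Q_{4,2} \end{bmatrix}}_{n \cdot \#(\text{mappers}) \times n}
\underbrace{R}_{n \times n}
\]
- Idea: Compute QR ($n \cdot \#(\text{mappers})$ rows) in serial.
- Idea: Pass $Q_{i,2}$ ($n$ rows each) in second pass to reconstruct $Q$.
- We call this Direct TSQR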

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR: Steps 1 and 2 28
Figure: First step: map task $i$ factors $A_i = Q_i R_i$ and emits both $Q_i$ and $R_i$. Second step: after the shuffle, a single reduce task factors the stacked $R_1, \ldots, R_4$ into $Q_{1,2}, \ldots, Q_{4,2}$ and $R$, and emits each of them.
Direct TSQR: Step 3 29
Figure: Third step: the $Q_{i,2}$ factors are distributed to the map tasks. Map task $i$ computes $Q_i Q_{i,2}$ and emits it as the $i$-th block of $Q$.
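The three steps of Direct TSQR can be written out in NumPy, with a list of row blocks standing in for the map-task inputs. This is an in-memory sketch under that assumption, not the mrtsqr code; `direct_tsqr` is a hypothetical name.

```python
import numpy as np

def direct_tsqr(blocks):
    # Step 1 (map): local QR of each block, keeping both Q_i and R_i.
    local = [np.linalg.qr(Ai) for Ai in blocks]
    Q1 = [q for q, _ in local]
    R1 = [r for _, r in local]
    # Step 2 (single reduce): QR of the stacked R_i gives Q_{i,2} blocks and R.
    Q2, R = np.linalg.qr(np.vstack(R1))
    offsets = np.cumsum([r.shape[0] for r in R1])[:-1]
    Q2_blocks = np.split(Q2, offsets)           # one Q_{i,2} per original block
    # Step 3 (map): each task multiplies its Q_i by the matching Q_{i,2}.
    Q_blocks = [q1 @ q2 for q1, q2 in zip(Q1, Q2_blocks)]
    return Q_blocks, R

rng = np.random.default_rng(0)
A = rng.standard_normal((800, 4))
Q_blocks, R = direct_tsqr(np.array_split(A, 4))
Q = np.vstack(Q_blocks)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(4))
print(orth_err)
```

Unlike the indirect variants, Q is built from products of orthogonal factors and never from $R^{-1}$, so no triangular solve with an ill-conditioned R is involved.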
Stability 30
Figure: Numerical stability: 10,000x10 matrices. $\|Q^T Q - I\|_2$ versus condition number $\kappa_2(A)$ on log-log axes, for Indir. TSQR + PIR, Dir. TSQR, Indir. TSQR, Indir. TSQR + IR, Chol., and Chol. + IR.
Performance model 31
- Only count reads and writes
- Streaming benchmark for read and write bandwidth of system
- Within a factor of two of experimental data for all algorithms
- I/O dominates runtime
- Algorithms take same time as a few passes over data
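A model that counts only reads and writes reduces to bytes divided by bandwidth, summed over passes. The sketch below is an assumption-laden illustration: the function name, the per-pass byte counts, and the bandwidth figures are all hypothetical placeholders, not the authors' model parameters or measurements.

```python
def predicted_seconds(passes, read_bw, write_bw):
    """Estimate runtime from I/O alone.

    passes: list of (bytes_read, bytes_written), one entry per MapReduce pass
    read_bw, write_bw: streaming bandwidths in bytes per second
    """
    return sum(r / read_bw + w / write_bw for r, w in passes)

# Hypothetical example: an m x n matrix of 8-byte doubles, processed by a
# two-pass method that reads A twice and writes a Q of the same size once.
m, n = 1_000_000, 10
a_bytes = 8 * m * n
passes = [(a_bytes, 0), (a_bytes, a_bytes)]
estimate = predicted_seconds(passes, read_bw=1e8, write_bw=5e7)
print(estimate)
```

The bandwidth arguments are where a measured streaming benchmark would plug in; key sizes and the small n x n outputs are ignored here for simplicity.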
Performance 32
Figure: Performance of QR algorithms on MapReduce. Time to solution (seconds) by matrix size: 4B x 4 (134.6 GB), 2.5B x 10 (193.1 GB), 600M x 25 (112.0 GB), 500M x 50 (183.6 GB), 150M x 100 (109 GB). Algorithms: Chol, Indir TSQR, Chol + PIR, Indir TSQR + PIR, Chol + IR, Indir TSQR + IR, Direct TSQR.
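Direct TSQR: recursive extension 33
\[
\underbrace{\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \end{bmatrix}}_{n \cdot \#(\text{mappers}) \times n}
\xrightarrow{\text{TSQR}}
\underbrace{\begin{bmatrix} Q_{1,2} \\ Q_{2,2} \\ Q_{3,2} \\ Q_{4,2} \end{bmatrix}}_{n \cdot \#(\text{mappers}) \times n}
\underbrace{R}_{n \times n}
\]
- If $n \cdot \#(\text{mappers})$ rows is too large $\rightarrow$ recurse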
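The recursion can be sketched by letting the second step call the same routine whenever the stacked R factors exceed a row limit. The function name, the `block_rows` and `max_serial_rows` parameters, and the guard that keeps the recursion shrinking are assumptions of this sketch rather than details from the slides.

```python
import numpy as np

def direct_tsqr_recursive(A, block_rows, max_serial_rows):
    n = A.shape[1]
    if block_rows < 2 * n or max_serial_rows < n:
        raise ValueError("need block_rows >= 2n and max_serial_rows >= n")
    # Step 1: local QR on consecutive row blocks.
    blocks = [A[i:i + block_rows] for i in range(0, A.shape[0], block_rows)]
    local = [np.linalg.qr(b) for b in blocks]
    S = np.vstack([r for _, r in local])        # stacked R_i factors
    # Step 2: serial QR if S is small enough, otherwise recurse on S.
    if S.shape[0] <= max_serial_rows:
        Q2, R = np.linalg.qr(S)
    else:
        Q2, R = direct_tsqr_recursive(S, block_rows, max_serial_rows)
    # Step 3: combine each local Q_i with its block of Q2.
    offsets = np.cumsum([r.shape[0] for _, r in local])[:-1]
    Q = np.vstack([q @ q2 for (q, _), q2 in zip(local, np.split(Q2, offsets))])
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((4000, 4))
# 40 blocks give a 160-row stack, above the 50-row limit, so one recursion occurs.
Q, R = direct_tsqr_recursive(A, block_rows=100, max_serial_rows=50)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(4))
print(orth_err)
```

Requiring blocks of at least 2n rows means each level at least roughly halves the number of rows, so the recursion ends after a few levels.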
Direct TSQR: recursive performance 34
Figure: Running time (s) versus number of columns, with and without recursion, in three panels: 150M rows, 100M rows, and 50M rows.
End 35
Contributions:
- Numerically stable and scalable QR
- Performance and stability tradeoffs
- Performance model
- Code: https://github.com/arbenson/mrtsqr
Contact:
- [email protected]