© 2015 IBM Corporation
Accelerating Machine Learning
Applications on Spark Using GPUs
Wei Tan, Liana Fong
Other contributors: Minsik Cho, Rajesh Bordawekar
October 25
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal
without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction
and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or
legal obligation to deliver any material, code or functionality. Information about potential future
products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the
user’s job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated
here.
Background: Apache Spark and MLlib
• Apache Spark
 – An in-memory engine for large-scale data processing
 – Used for database, stream, machine learning, and graph processing
[Figure: Input, iter. 1, iter. 2, ...]
Background: Apache Spark and MLlib
[Figure: MLlib algorithm families: Classification (LR, SVM, …), Trees, Recommendation, Clustering, …]
Background: GPU computing
[Figure: Xeon E5 2687 CPU vs. Tesla K40 GPU]
Compared with the CPU, the GPU has:
• A slower clock and less cache: not optimized for latency
• More transistors devoted to compute
• Higher flops and memory bandwidth
• A design optimized for data-parallel, high-throughput workloads
Background: Apache Spark and MLlib
[Figure: MLlib algorithm families: Classification (LR, SVM, …), Trees, Recommendation, Clustering, …]
+ (GPU) connectors and libs?
Problem: large-scale matrix factorization
• Why
 – Recommendation is important in cognitive applications
 – Digital ads market in US: 37.3 billion*: Spark/Facebook/IBM Commerce
 – Need a fast and scalable solution
Problem: large-scale matrix factorization
• Why
– Factorize the word co-occurrence matrix as a rating matrix
– Obtain word features that embed semantics
man – woman = king – queen = brother – sister = …
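The offset relation on this slide can be illustrated with a tiny sketch. The vectors below are hypothetical, hand-picked toy values (not learned from any co-occurrence matrix and not taken from the slides); they are chosen so the offsets match exactly, whereas features obtained by factorization would satisfy the relation only approximately.

```python
# Toy illustration of "man - woman = king - queen = brother - sister".
# All vectors are hypothetical, hand-picked values for illustration only.
toy_features = {
    "man": [1, 0, 2],
    "woman": [1, 1, 2],
    "king": [5, 0, 3],
    "queen": [5, 1, 3],
    "brother": [2, 0, 7],
    "sister": [2, 1, 7],
}


def offset(word_a, word_b):
    """Return the element-wise difference between two word feature vectors."""
    return [x - y for x, y in zip(toy_features[word_a], toy_features[word_b])]


# Each pair differs by the same offset vector in this toy example.
pairs = [("man", "woman"), ("king", "queen"), ("brother", "sister")]
offsets = [offset(a, b) for a, b in pairs]
print(offsets)
```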
MF: the state of the art
• Many systems are optimized for medium-sized problems; very few target
huge problems.
• Distributed solutions are slow.
 – They do not reach the CPU performance roofline
 – They do not optimize communication
• Distributed solutions need a lot of
resources and are costly.
MF: what we want to achieve
• Scale to problems of any size.
• Fast.
• Cost-efficient.
Solution: cuMF - ALS on a machine with GPUs
• On one GPU
 – GPU (Nvidia K40): memory bandwidth: 288 GB/sec, compute: 5 Tflops
 – Memory is slower than compute → need to optimize memory access!
• The roofline model
 – Higher Gflops → higher operational intensity (more flops per byte) → caching!
[Figure: roofline plot; x-axis: operational intensity (Flops/Byte), y-axis: Gflops/s; marked points at intensity 1 (288G) and intensity 17 (5T)]
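A minimal sketch of the roofline bound, assuming the K40 figures quoted on this slide (288 GB/sec, 5 Tflops). The function and parameter names are illustrative and are not part of cuMF.

```python
def attainable_gflops(intensity, peak_gflops=5000.0, bandwidth_gb_per_s=288.0):
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity is the operational intensity in flops per byte.
    """
    memory_roof = bandwidth_gb_per_s * intensity
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops=5000.0, bandwidth_gb_per_s=288.0):
    """Operational intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gb_per_s


# Below the ridge point (about 17 flops/byte here) a kernel is memory-bound.
for intensity in (1, 4, 17, 32):
    print(intensity, attainable_gflops(intensity))
print(ridge_point())
```

At one flop per byte the bound is the memory bandwidth itself, which is why the slide pushes toward higher operational intensity through caching.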
Solution: cuMF - ALS on a machine with GPUs
• MO-ALS on one GPU: Memory-Optimized ALS
• Access many θ_v columns: irregular due to R’s sparsity
• Aggregate many θ_v θ_v^T products: memory intensive
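The two memory-heavy steps above can be sketched for a single row of R. This is a plain-Python illustration, not the cuMF kernel: it assumes the common ALS normal equations (sum of θ_v θ_v^T + λI) x_u = sum of r_uv θ_v with plain, unweighted regularization. The names x_u, lam, update_user_factor, and solve_linear_system are assumptions introduced here; the slides name only θ_v and R.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def update_user_factor(ratings_for_user, theta, lam):
    """One ALS update for a single user (one row of R).

    ratings_for_user maps item index v to the nonzero rating r_uv.
    theta[v] is the item factor theta_v, a list of f floats.
    """
    f = len(theta[0])
    a = [[lam if i == j else 0.0 for j in range(f)] for i in range(f)]
    b = [0.0] * f
    for v, r_uv in ratings_for_user.items():
        theta_v = theta[v]  # irregular access: v follows R's sparsity pattern
        for i in range(f):
            b[i] += r_uv * theta_v[i]
            for j in range(f):
                a[i][j] += theta_v[i] * theta_v[j]  # aggregate theta_v theta_v^T
    return solve_linear_system(a, b)


theta = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_u = update_user_factor({0: 2.0, 1: 3.0}, theta, lam=1.0)
print(x_u)
```

The inner loops show why the step is memory intensive: each nonzero rating triggers a gather of one θ_v and f × f accumulations.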
Solution: cuMF - ALS on a machine with GPUs
• Use texture memory to smooth non-contiguous, irregular memory accesses
• Use registers to hold frequently accessed (hot) variables
Solution: cuMF - ALS on a machine with GPUs
• On multiple GPUs
• Exploit data & model parallelism
– Data parallelism: solve using a portion of the training data
– Model parallelism: solve a portion of the model
• Exploit connection topology to minimize communication overhead
[Figure: data parallel vs. model parallel]
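One possible reading of the two kinds of parallelism, sketched in plain Python with lists standing in for GPUs. Under data parallelism, each worker aggregates partial sums over its shard of the ratings and the partial results are reduced; under model parallelism, each worker solves a disjoint portion of the rows. The helper names and toy values are hypothetical, and the communication topology is not modeled.

```python
def partial_sums(rating_shard, theta):
    """Partial sums of theta_v theta_v^T and r_uv theta_v over one shard."""
    f = len(theta[0])
    a = [[0.0] * f for _ in range(f)]
    b = [0.0] * f
    for v, r_uv in rating_shard:
        theta_v = theta[v]
        for i in range(f):
            b[i] += r_uv * theta_v[i]
            for j in range(f):
                a[i][j] += theta_v[i] * theta_v[j]
    return a, b


def reduce_sums(partials):
    """Sum the per-worker partial results (the reduction step)."""
    f = len(partials[0][1])
    a = [[0.0] * f for _ in range(f)]
    b = [0.0] * f
    for part_a, part_b in partials:
        for i in range(f):
            b[i] += part_b[i]
            for j in range(f):
                a[i][j] += part_a[i][j]
    return a, b


def split_round_robin(items, num_workers):
    """Assign items to workers in round-robin order."""
    return [items[k::num_workers] for k in range(num_workers)]


# Toy, hypothetical values for one user's ratings and the item factors.
theta = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0], [2.0, 2.0]]
ratings_of_one_user = [(0, 5.0), (1, 3.0), (2, 4.0), (3, 1.0)]

# Data parallelism: each worker sees a portion of the training data.
shards = split_round_robin(ratings_of_one_user, 2)
reduced = reduce_sums([partial_sums(shard, theta) for shard in shards])
single = partial_sums(ratings_of_one_user, theta)

# Model parallelism: each worker solves a portion of the rows of the model.
row_assignment = split_round_robin(list(range(5)), 2)
print(reduced == single, row_assignment)
```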
cuMF Performance
• cuMF: ALS on a single machine with two Nvidia K80 cards (4 GPUs)
 – Compared with state-of-the-art distributed solutions:
   • 6-10x as fast
   • 33-100x as cost-efficient (cuMF costs $2.50 per hour on SoftLayer)
 – Able to factorize the largest matrix ever reported
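How a speedup and an hourly price combine into a cost-efficiency ratio can be sketched as below. Only the $2.50 per hour figure comes from the slide; the baseline price and both run times are hypothetical placeholders chosen to make the arithmetic easy to follow, not measurements.

```python
def run_cost(price_per_hour, hours):
    """Cost of one run: hourly price times run time."""
    return price_per_hour * hours


def cost_efficiency(baseline_price, baseline_hours, cumf_price, cumf_hours):
    """How many times cheaper the cuMF run is than the baseline run."""
    return run_cost(baseline_price, baseline_hours) / run_cost(cumf_price, cumf_hours)


# Hypothetical baseline: a cluster at $25 per hour that needs 10 hours.
# Hypothetical cuMF run: 1 hour at the slide's $2.50 per hour.
speedup = 10.0 / 1.0
ratio = cost_efficiency(25.0, 10.0, 2.5, 1.0)
print(speedup, ratio)
```

The ratio is the product of the speedup and the hourly price ratio, which is how a cost-efficiency gain can exceed the speedup.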
CuMF Performance
• cuMF: ALS on a machine with one GPU
 – 4x speedup when used as an accelerator for Spark ALS
[Figure labels: Spark ALS; Spark run-time; MLlib; cuMF with Spark; cuMF; C]
Roadmap
• Current work
 – Impressive acceleration of MF with GPUs on one machine
 – GPU acceleration techniques with model and data parallelism
 – Illustrated applicability of GPU acceleration to Spark/MLlib
 – Performance evaluations on K40 and K80 GPUs, Intel and Power
• Future work
 – GPU acceleration of other ML algorithms in MLlib or elsewhere
 – Acceleration of algorithms for multiple GPUs on a single machine and
across machines, with and without RDMA across machines
 – Performance evaluation on other hardware, including
   • Other GPUs such as Nvidia Maxwell
   • Forthcoming NVLink connectivity across GPUs within a single
machine
Notices and Disclaimers
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form
without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for
accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to
update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO
EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO,
LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted
according to the terms and conditions of the agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other
results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services
available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or
other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the
identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the
customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will
ensure that the customer is in compliance with any law.
Notices and Disclaimers (con’t)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document
Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM
SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON,
OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ,
Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of
International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at:
www.ibm.com/legal/copytrade.shtml.
Thank You

