GPU Accelerated Libraries for Ruby
Prasun Anand
About me
● SciRuby Contributor
● Google Summer of Code 2016, 2017
● Genenetwork project
● Ruby Grant 2017
● Projects:
○ JRuby port of NMatrix
○ ArrayFire gem
○ RbCUDA
Scientific Computing
Why should Python programmers have all the fun?
Arrays / Matrices
BLAS and LAPACK
GPU computing is not easy!
CUDA and OpenCL
Rubyconfindia2018 - GPU accelerated libraries for Ruby
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
[2] pry(main)> b = a + a
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
2.0000 6.0000
4.0000 8.0000
=> #<ArrayFire::Af_Array:0x000000020625c8>
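The printed layout above follows from column-major storage: the flat list [1,2,3,4] fills the first column before the second, and the strides give the step (in elements) between neighbours along each dimension. A minimal plain-Ruby sketch of that mapping, with no GPU and no ArrayFire gem involved; the helper names `column_major_strides` and `to_rows` are hypothetical and exist only for this illustration:

```ruby
# Strides for a column-major array: the first dimension is contiguous,
# and each later stride is the previous stride times the previous extent.
def column_major_strides(dims)
  strides = [1]
  dims[0...-1].each { |extent| strides << strides.last * extent }
  strides
end

# Rebuild the row-by-row view that the REPL prints from the flat,
# column-major element list.
def to_rows(flat, rows, cols)
  Array.new(rows) { |r| Array.new(cols) { |c| flat[r + c * rows] } }
end

dims    = [2, 2, 1, 1]
flat    = [1, 2, 3, 4]
strides = column_major_strides(dims)
a_rows  = to_rows(flat, 2, 2)
# Element-wise a + a on the CPU, for comparison with the GPU result.
b_rows  = to_rows(flat.map { |x| x + x }, 2, 2)

p strides
p a_rows
p b_rows
```

The three printed values correspond to the Strides line and the two matrices shown in the session.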
[1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10]
No Name Array
[3 3 1 1]
1.0000 4.0000 -5.0000
4.0000 11.0000 8.0000
6.0000 2.0000 10.0000
=> #<ArrayFire::Af_Array:0x000000014e56c8>
[2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8]
No Name Array
[3 2 1 1]
1.0000 10.0000
0.0000 -11.0000
8.0000 8.0000
=> #<ArrayFire::Af_Array:0x00000001591db0>
[3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
No Name Array
[3 2 1 1]
-39.0000 -74.0000
68.0000 -17.0000
86.0000 118.0000
=> #<ArrayFire::Af_Array:0x000000016136f8>
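The matmul output can be checked on the CPU. The sketch below is plain Ruby rather than the ArrayFire gem's API; `from_column_major` and `cpu_matmul` are hypothetical helper names, and the only assumption is that the flat lists are read in column-major order, as the printed matrices suggest:

```ruby
# Turn a flat column-major list into an array of rows.
def from_column_major(flat, rows, cols)
  Array.new(rows) { |r| Array.new(cols) { |c| flat[r + c * rows] } }
end

# Naive CPU matrix product: each entry is the dot product of a row of
# the left operand with a column of the right operand.
def cpu_matmul(left, right)
  columns = right.transpose
  left.map do |row|
    columns.map { |col| row.zip(col).sum { |x, y| x * y } }
  end
end

left   = from_column_major([1, 4, 6, 4, 11, 2, -5, 8, 10], 3, 3)
right  = from_column_major([1, 0, 8, 10, -11, 8], 3, 2)
result = cpu_matmul(left, right)

result.each { |row| puts row.join(" ") }
```

The rows printed here agree with the 3 x 2 result returned by ArrayFire::BLAS.matmul above.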
VALUE arf_init(int argc, VALUE* argv, VALUE self)
{
  afstruct* afarray;
  Data_Get_Struct(self, afstruct, afarray);

  /* argv[0]: number of dimensions, argv[1]: dimensions, argv[2]: elements */
  dim_t ndims = (dim_t)NUM2LONG(argv[0]);
  dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
  dim_t count = 1;
  for (dim_t index = 0; index < ndims; index++) {
    dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
    count *= dimensions[index];
  }

  /* Copy the Ruby numbers into a contiguous host buffer of doubles. */
  double* host_array = (double*)malloc(count * sizeof(double));
  for (dim_t index = 0; index < count; index++) {
    host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index));
  }

  /* af_create_array copies the host data, so the temporary buffers can be freed. */
  af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);
  free(host_array);
  free(dimensions);
  return self;
}
static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val, VALUE left_prop_val, VALUE right_prop_val){
  afstruct* left;
  afstruct* right;
  afstruct* result = ALLOC(afstruct);
  Data_Get_Struct(left_val, afstruct, left);
  Data_Get_Struct(right_val, afstruct, right);
  af_mat_prop left_mat_prop = arf_mat_type_from_rbsymbol(left_prop_val);
  af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val);
  af_matmul(&result->carray, left->carray, right->carray, left_mat_prop, right_mat_prop);
  return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}
BLAS functionalities
● Matmul
● Transpose
LAPACK functionalities
● Det
● Inverse
● Norm
● Qr
● Cholesky
● Svd
● lu
Statistics
● Mean
● Median
● Variance
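When trying the statistics functions, a CPU reference is handy for sanity-checking GPU results. The following is a plain-Ruby sketch, not the gem's API; the method names are hypothetical, and the variance here is the population variance (dividing by n), which is an assumption since the slides do not say which convention the GPU routine uses:

```ruby
# Arithmetic mean of a list of numbers.
def cpu_mean(xs)
  xs.sum.to_f / xs.size
end

# Median: middle element for odd sizes, average of the two middle
# elements for even sizes.
def cpu_median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Population variance (assumption: divide by n, not n - 1).
def cpu_variance(xs)
  m = cpu_mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size
end

data = [1, 2, 3, 4]
puts cpu_mean(data)
puts cpu_median(data)
puts cpu_variance(data)
```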
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
10 X
Faster than NMatrix-Ruby-Lapack
10,000 X
Faster than NMatrix-Ruby
100,000 X
Faster than NMatrix-Ruby-BLAS
RbCUDA
GPU Array
● Generic pointer used to handle an array of elements on the GPU.
● Memory copying from CPU to GPU and vice-versa.
● Interfaced with NMatrix and NArray
vadd_kernel_src = <<-EOS
  extern "C" {
    __global__ void matSum(int *a, int *b, int *c)
    {
      int tid = blockIdx.x;
      if (tid < 100)
        c[tid] = a[tid] + b[tid];
    }
  }
EOS
f = compile(vadd_kernel_src)
RbCUDA::Driver.run_kernel(f.path)
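To make the kernel's indexing concrete: every block handles the one element whose index equals its blockIdx.x, and the guard makes any block beyond element 99 do nothing. The plain-Ruby simulation below runs the same logic sequentially on the CPU; it does not use RbCUDA, and `mat_sum_block` and the grid size of 128 are hypothetical choices made only for this sketch:

```ruby
# CPU stand-in for one block of the matSum kernel: block_idx plays the
# role of blockIdx.x, and n is the bound hard-coded in the kernel guard.
def mat_sum_block(block_idx, a, b, c, n = 100)
  tid = block_idx
  c[tid] = a[tid] + b[tid] if tid < n
end

a = Array.new(100) { |i| i }
b = Array.new(100) { |i| 2 * i }
c = Array.new(100, 0)

# Launch more "blocks" than elements; the guard keeps the extra ones
# from touching memory outside the arrays.
grid_size = 128
grid_size.times { |block_idx| mat_sum_block(block_idx, a, b, c) }

puts c.first(5).inspect
```

On a GPU the blocks run in parallel rather than in a loop, which is safe here because each block writes to a distinct element of c.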
● CuBLAS
● CuSolver
● CuRand
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
1,000,000 X
Faster than NMatrix-Ruby-BLAS
Future Work
● Image Processing APIs and Indexers
● Multiple dtypes
● RbCUDA is under active development.
● Project RbCUDA is being funded by Ruby Association (Ruby Grant 2017)
Hobby Projects
1. Synapse : A Ruby first Deep Learning Library
2. Ninjaplot : A plotting library
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/rbcuda
● https://github.com/prasunanand/synapse
● https://github.com/prasunanand/ninjaplot
Contributions are Welcome!
Acknowledgements
1. Pjotr Prins
2. Pradeep Garigipati
3. Kenta Murata
Thanks!
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com
Questions
