Effective Numerical Computation in NumPy and SciPy 
Kimikazu Kato 
PyCon JP 2014 
September 13, 2014 
1 / 35
About Myself 
Kimikazu Kato 
Chief Scientist at Silver Egg Technology Co., Ltd. 
Ph.D. in Computer Science 
Background in Mathematics, Numerical Computation, Algorithms, etc. 
<2 years' experience in Python 
>10 years' experience in numerical computation 
Now designing algorithms for a recommendation system and doing research 
on machine learning and data analysis. 
2 / 35
This talk... 
is about effective usage of NumPy/SciPy 
is NOT an exhaustive introduction to its capabilities, but shows some case 
studies based on my experience and interest 
3 / 35
Table of Contents 
Introduction 
Basics about NumPy 
Broadcasting 
Indexing 
Sparse matrix 
Usage of scipy.sparse 
Internal structure 
Case studies 
Conclusion 
4 / 35
Numerical Computation 
Differential equations 
Simulations 
Signal processing 
Machine Learning 
etc... 
Why Numerical Computation in Python? 
Productivity 
Easy to write 
Easy to debug 
Connectivity with visualization tools 
Matplotlib 
IPython 
Connectivity with web system 
Many frameworks (Django, Pyramid, Flask, Bottle, etc.) 
5 / 35
But Python is Very Slow! 
Code in C 
#include <stdio.h> 
int main() { 
int i; double s=0; 
for (i=1; i<=100000000; i++) s+=i; 
printf("%.0f\n",s); 
} 
Code in Python 
s=0. 
for i in xrange(1,100000001): 
s+=i 
print s 
Both of the codes compute the sum of integers from 1 to 100,000,000. 
Result of benchmark in a certain environment: 
Above: 0.109 sec (compiled with -O3 option) 
Below: 8.657 sec 
(80+ times slower!!) 
6 / 35
Better code 
import numpy as np 
a=np.arange(1,100000001) 
print a.sum() 
Now it takes 0.188 sec. (Measured by "time" command in Linux, loading time 
included) 
Still slower than C, but sufficiently fast as a scripting language. 
7 / 35
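As a side note, the comparison above can be reproduced from within Python using the standard timeit module. This is just a sketch in Python 3 syntax; n is reduced from the slide's 100,000,000 so the check runs quickly, and absolute times depend on the machine.

```python
import timeit
import numpy as np

n = 1000000  # smaller than the slide's 100,000,000 to keep the run short

def loop_sum():
    # pure-Python loop, analogous to the slow version
    s = 0.
    for i in range(1, n + 1):
        s += i
    return s

def numpy_sum():
    # vectorized version, analogous to the fast one
    return float(np.arange(1, n + 1).sum())

t_loop = timeit.timeit(loop_sum, number=3)
t_np = timeit.timeit(numpy_sum, number=3)
```

Both functions return the same value; only the running time differs.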
Lessons 
Python is very slow when written badly 
Translating C (or Java, C#, etc.) code into Python literally is often a bad idea. 
Python-friendly rewriting sometimes results in a drastic performance 
improvement 
8 / 35
Basic rules for better performance 
Avoid for-loops as far as possible 
Utilize libraries' capabilities instead 
Forget about the cost of copying memory 
A typical C programmer might care about it, but ... 
9 / 35
Basic techniques for NumPy 
Broadcasting 
Indexing 
10 / 35
Broadcasting 
>>> import numpy as np 
>>> a=np.array([0,1,2]) 
>>> a*3 
array([0, 3, 6]) 
>>> b=np.array([1,4,9]) 
>>> np.sqrt(b) 
array([ 1., 2., 3.]) 
A function which is applied to each element when applied to an array is called 
a universal function. 
11 / 35
Broadcasting (2D) 
>>> import numpy as np 
>>> a=np.arange(9).reshape((3,3)) 
>>> b=np.array([1,2,3]) 
>>> a 
array([[0, 1, 2], 
[3, 4, 5], 
[6, 7, 8]]) 
>>> b 
array([1, 2, 3]) 
>>> a*b 
array([[ 0, 2, 6], 
[ 3, 8, 15], 
[ 6, 14, 24]]) 
12 / 35
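The same mechanism also works between arrays of different shapes, as long as the dimensions are compatible. A minimal sketch (Python 3; the variable names are illustrative):

```python
import numpy as np

col = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([10, 20, 30])      # shape (3,)

# The (3, 1) column is stretched along axis 1 and the (3,) row along axis 0,
# giving an outer-product-like (3, 3) result with no explicit loop.
table = col * row
```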
Indexing 
>>> import numpy as np 
>>> a=np.arange(10) 
>>> a 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
>>> indices=np.arange(0,10,2) 
>>> indices 
array([0, 2, 4, 6, 8]) 
>>> a[indices]=0 
>>> a 
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9]) 
>>> b=np.arange(100,600,100) 
>>> b 
array([100, 200, 300, 400, 500]) 
>>> a[indices]=b 
>>> a 
array([100, 1, 200, 3, 300, 5, 400, 7, 500, 9]) 
13 / 35
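Fancy indexing with integer arrays, shown above, has a boolean counterpart that is often convenient. A short sketch (Python 3; the mask is arbitrary):

```python
import numpy as np

a = np.arange(10)
mask = a % 2 == 0        # boolean array: True at even positions
a[mask] = -1             # assign to all selected elements at once
```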
References 
Gabriele Lanaro, "Python High Performance Programming," Packt 
Publishing, 2013. 
Stéfan van der Walt, Numpy Medkit 
14 / 35
Sparse matrix 
Defined as a matrix in which most elements are zero 
A compressed data structure is used to express it, so that it will be... 
Space effective 
Time effective 
15 / 35
scipy.sparse 
The module scipy.sparse provides mainly three classes to express a sparse 
matrix. (There are other types, but they are not mentioned here.) 
lil_matrix : convenient to set data; setting a[i,j] is fast 
csr_matrix : convenient for computation; fast to retrieve a row 
csc_matrix : convenient for computation; fast to retrieve a column 
Usually, set the data into a lil_matrix, and then convert it to a csc_matrix or 
csr_matrix. 
For csr_matrix and csc_matrix, calculation between matrices of the same type is fast, 
but you should avoid calculation between different types. 
16 / 35
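The recommended workflow above can be sketched as follows (Python 3; the shape and values are arbitrary):

```python
import numpy as np
from scipy.sparse import lil_matrix

a = lil_matrix((3, 3))
a[0, 0] = 1.0            # setting individual entries is cheap on lil_matrix
a[2, 1] = 5.0
b = a.tocsr()            # convert once, before doing arithmetic
c = b.dot(b)             # csr x csr: fast, and the result stays csr
```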
Use case 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,0]=1.; a[0,2]=2. 
>>> a=a.tocsr() 
>>> print a 
(0, 0) 1.0 
(0, 2) 2.0 
>>> a.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> b=lil_matrix((3,3)) 
>>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5. 
>>> b=b.tocsr() 
>>> b.todense() 
matrix([[ 0., 0., 0.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]]) 
>>> c=a.dot(b) 
>>> c.todense() 
matrix([[ 8., 0., 10.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> d=a+b 
>>> d.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]]) 17 / 35
Internal structure: csr_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsr() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([1, 2, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 1., 2., 3., 4., 5.]) 
>>> b.indptr 
array([0, 2, 3, 5], dtype=int32) 
18 / 35
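To make the meaning of data, indices, and indptr concrete, here is a sketch that rebuilds the dense matrix by hand from the three arrays of the example above (Python 3):

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 2, 2, 0, 1], dtype=np.int32)
indptr = np.array([0, 2, 3, 5], dtype=np.int32)

dense = np.zeros((3, 3))
for row in range(3):
    # indptr[row]:indptr[row+1] is the slice of data/indices holding this row
    for k in range(indptr[row], indptr[row + 1]):
        dense[row, indices[k]] = data[k]

b = csr_matrix((data, indices, indptr), shape=(3, 3))
```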
Internal structure: csc_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsc() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([2, 0, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 4., 1., 5., 2., 3.]) 
>>> b.indptr 
array([0, 1, 3, 5], dtype=int32) 
19 / 35
Merit of knowing the internal structure 
Setting a csr_matrix or csc_matrix via its internal structure is much faster than 
setting a lil_matrix element by element. 
See the benchmark of setting the following n x n matrix (2 on the diagonal, 
1 on the superdiagonal): 

    ( 2 1         ) 
    (   2 1       ) 
    (     .  .    ) 
    (       2  1  ) 
    (          2  ) 
20 / 35
from scipy.sparse import lil_matrix, csr_matrix 
import numpy as np 
from timeit import timeit 
def set_lil(n): 
    a=lil_matrix((n,n)) 
    for i in xrange(n): 
        a[i,i]=2. 
        if i+1<n: 
            a[i,i+1]=1. 
    return a 
def set_csr(n): 
    data=np.empty(2*n-1) 
    indices=np.empty(2*n-1,dtype=np.int32) 
    indptr=np.empty(n+1,dtype=np.int32) 
    # to be fair, a for-loop is intentionally used 
    # (using the indexing technique is faster) 
    for i in xrange(n): 
        indices[2*i]=i 
        data[2*i]=2. 
        if i<n-1: 
            indices[2*i+1]=i+1 
            data[2*i+1]=1. 
        indptr[i]=2*i 
    indptr[n]=2*n-1 
    a=csr_matrix((data,indices,indptr),shape=(n,n)) 
    return a 
print "lil:",timeit("set_lil(10000)", 
    number=10,setup="from __main__ import set_lil") 
print "csr:",timeit("set_csr(10000)", 
    number=10,setup="from __main__ import set_csr") 
21 / 35
Result: 
lil: 11.6730761528 
csr: 0.0562081336975 
Remark 
When you deal with already sorted data, setting a csr_matrix or csc_matrix 
with data, indices, indptr is much faster than setting a lil_matrix 
But the code tends to be more complicated if you use the internal structure 
of csr_matrix or csc_matrix 
22 / 35
Case Studies 
23 / 35
Case 1: Norms 
If v is dense: 
norm=np.dot(v,v) 
This computes ||v||^2 = sum_i v_i^2, expressed as a product of vectors. (dot 
means matrix product, but you don't have to take the transpose explicitly.) 
When v is sparse, suppose that v is expressed as a 1 x n matrix; then 
||v||^2 = v v^T, and: 
norm=v.multiply(v).sum() 
(multiply() is the element-wise product) 
This is because taking the transpose of a sparse matrix changes the type. 
24 / 35
Frobenius norm: 
norm=a.multiply(a).sum() 
This computes ||A||_F^2 = sum_{i,j} a_{ij}^2. 
25 / 35
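A quick sketch checking that multiply().sum() on a sparse matrix agrees with the dense squared Frobenius norm (Python 3; the matrix values are arbitrary):

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[1., 0., 2.],
                         [0., 3., 0.]]))
sparse_sq = a.multiply(a).sum()                    # sum of squared entries
dense_sq = np.linalg.norm(a.toarray(), 'fro') ** 2  # dense reference value
```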
Case 2: Applying a function to all of the elements of a 
sparse matrix 
A universal function can be applied to a dense matrix: 
>>> import numpy as np 
>>> a=np.arange(9).reshape((3,3)) 
>>> a 
array([[0, 1, 2], 
[3, 4, 5], 
[6, 7, 8]]) 
>>> np.tanh(a) 
array([[ 0. , 0.76159416, 0.96402758], 
[ 0.99505475, 0.9993293 , 0.9999092 ], 
[ 0.99998771, 0.99999834, 0.99999977]]) 
This is convenient and fast. 
However, we cannot do the same thing for a sparse matrix. 
26 / 35
>>> from scipy.sparse import lil_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,0]=1. 
>>> a[1,0]=2. 
>>> b=a.tocsr() 
>>> np.tanh(b) 
<3x3 sparse matrix of type '<type 'numpy.float64'>' 
        with 2 stored elements in Compressed Sparse Row format> 
This is because, for an arbitrary function, its application to a sparse matrix is 
not necessarily sparse. 
However, if a universal function f satisfies f(0) = 0, the density is 
preserved. 
Then, how can we compute it? 
27 / 35
Use the internal structure!! 
The positions of the non-zero elements are not changed after application of 
the function. 
Keep indices and indptr, and just change data. 
Solution: 
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape) 
28 / 35
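The solution above can be checked against the dense computation in a few lines (Python 3; the matrix is arbitrary):

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[1., 0., 0.],
                         [2., 0., 0.],
                         [0., 0., 0.]]))
# tanh(0)=0, so the sparsity pattern is preserved: reuse indices and indptr,
# and apply the function only to the stored values in data.
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
```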
Case 3: Formula which appears in a paper 
In the algorithm for a recommendation system [1], the following formula 
appears: 

    A^T D(d) A 

where A is an n x f dense matrix, and D(d) is a diagonal matrix defined from a 
given array d = (d_1, ..., d_n) as: 

    D(d) = ( d_1              ) 
           (      d_2         ) 
           (           ...    ) 
           (              d_n ) 

Here, n (which corresponds to the number of users or items) is big and f 
(which means the number of latent factors) is small. 
[1] Hu et al., "Collaborative Filtering for Implicit Feedback Datasets," ICDM, 
2008. 
29 / 35
Solution 1: 
There is a special class dia_matrix to deal with a diagonal sparse matrix. 
import scipy.sparse as sparse 
import numpy as np 
def f(a,d): 
    """a: 2d array of shape (n,f), d: 1d array of length n""" 
    dd=sparse.diags([d],[0]) 
    return np.dot(a.T,dd.dot(a)) 
30 / 35
Solution 2: 
Pack a csr_matrix with data, indices, indptr directly: 
data=d 
indices=[0,1,...,n-1] 
indptr=[0,1,...,n] 
def g(a,d): 
    n,f=a.shape 
    data=d 
    indices=np.arange(n) 
    indptr=np.arange(n+1) 
    dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n)) 
    return np.dot(a.T,dd.dot(a)) 
31 / 35
Solution 3: 
Writing out the product, (A^T D(d))_{ji} = a_{ij} d_i; that is, the i-th 
column of A^T is simply multiplied by d_i: 

    A^T D(d) = ( d_1 a_1 | d_2 a_2 | ... | d_n a_n ) 

where a_i denotes the i-th row of A. 
This is equivalent to the broadcasting! 
def h(a,d): 
    return np.dot(a.T*d,a) 
32 / 35
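A sketch verifying that the three solutions compute the same f x f matrix (Python 3; f, g, and h are redefined here so the block is self-contained, and the sizes are kept small for a quick check):

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):
    # Solution 1: dia_matrix via sparse.diags
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):
    # Solution 2: csr_matrix packed from data, indices, indptr
    n, f_ = a.shape
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):
    # Solution 3: pure broadcasting
    return np.dot(a.T * d, a)

rng = np.random.RandomState(0)
a = rng.rand(50, 4)
d = rng.rand(50)
```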
Benchmark 
def datagen(n,f): 
    np.random.seed(0) 
    a=np.random.random((n,f)) 
    d=np.random.random(n) 
    return a,d 
from timeit import timeit 
print "dia_matrix :",timeit("f(a,d)",number=10, 
    setup="from __main__ import f,datagen; a,d=datagen(1000000,10)") 
print "csr_matrix :",timeit("g(a,d)",number=10, 
    setup="from __main__ import g,datagen; a,d=datagen(1000000,10)") 
print "broadcasting :",timeit("h(a,d)",number=10, 
    setup="from __main__ import h,datagen; a,d=datagen(1000000,10)") 
Result: 
dia_matrix : 1.60458707809 
csr_matrix : 1.32580018044 
broadcasting : 1.30032682419 
33 / 35
Conclusion 
Try not to use for-loops, but use libraries' capabilities instead. 
Knowledge about the internal structure of the sparse matrix is useful to 
extract further performance. 
Mathematical derivation is important. The key is to find a mathematically 
equivalent and Python-friendly formula. 
Computational speed is not the only thing that matters. Finding better code in 
a short time is valuable; beyond that, you shouldn't pursue speed too much. 
34 / 35
Acknowledgment 
I would like to thank 
@shima__shima, 
who gave me useful advice on Twitter. 
35 / 35
