Yung-Yu Chen (@yungyuc)
On the necessity and inapplicability of Python
Help us develop numerical software
Who I am
• I am a mechanical engineer by training, focusing on applications of continuum mechanics: a computational scientist/engineer rather than a computer scientist.

• In my day job, I write high-performance code for semiconductor applications of computational geometry and lithography.

• In my spare time, I teach a course, ‘numerical software development’, in the Department of Computer Science at NCTU.
You can contact me through Twitter: https://twitter.com/yungyuc
or LinkedIn: https://www.linkedin.com/in/yungyuc/.
PyHUG
• Python Hsinchu User Group (established in late 2011)

• Formed the first staff team of PyCon Taiwan (2012)

• Weekly meetups at a pub for 3 years, not stopped by COVID-19

• 7+ active user groups in Taiwan

• I attended PyCon JP in 2012, 2013 (APAC), 2015, and 2019

• Last year I led a group visit to PyCon JP (thank you, Terada-san, for sharing the know-how!)

• I hope we can do more
PyCon Taiwan
5-6 Sep 2020, Tainan, Taiwan

• It is planned as an on-site conference (unless something incredibly bad happens again)

• Speakers may choose to speak online

• We still need to wear face masks

• We thank the citizens and government of Taiwan, who work hard to counter COVID-19

• https://g0v.hackmd.io/@kiang/mask-info

• We hope to see you again in Taiwan!

https://tw.pycon.org/2020/
Numerical software
• Numerical software: computer programs that solve scientific or mathematical problems.

• Other names: mathematical software, scientific software, technical software.

• Python is popular with application experts for describing problems and solutions, because it is easy to use.

• Most numerical software systems are designed with a hybrid architecture:

• The computing kernel is written in C++.

• Python provides the user-level API.
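One way to picture this split, assuming numpy's compiled routines stand in for a hand-written C++ kernel: the user-level function accepts ordinary Python sequences, converts and checks them, and hands all element-wise work to a single kernel call. The names `max_abs_diff` and `_max_abs_diff_kernel` are hypothetical, chosen only for this sketch.

```python
import numpy as np


def _max_abs_diff_kernel(a, b):
    # Stand-in for the C++ computing kernel: numpy runs the subtraction,
    # absolute value, and reduction in compiled loops, not Python bytecode.
    return float(np.abs(a - b).max())


def max_abs_diff(a, b):
    """User-level API: accept Python sequences, validate, then delegate."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return _max_abs_diff_kernel(a, b)


print(max_abs_diff([1, 2, 3], [1, 2.5, 2]))  # 1.0
```

The Python layer stays small and easy to change, while the per-element cost is paid only inside the kernel.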
Example: OPC
[Figure: photolithography in semiconductor fabrication. A light source shines through a photomask onto the photoresist (PR) on a silicon substrate. The wavelength is only hundreds of nm, and the features are smaller than the wavelength, so the shape needed on the mask differs from the image to be projected on the PR. Optical proximity correction (OPC) computes that mask shape; we write code to make it happen.]
Example: PDEs
Numerical simulations of conservation laws:

$$\frac{\partial u}{\partial t} + \sum_{k=1}^{3} \frac{\partial F^{(k)}(u)}{\partial x_k} = 0$$

Use case: stress waves in anisotropic solids

Use case: compressible flows
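A minimal one-dimensional sketch of what "simulating a conservation law" means in code, keeping only k = 1. It assumes a hypothetical flux F(u) = u²/2 (inviscid Burgers), periodic boundaries, and the Lax-Friedrichs scheme written in conservative (finite-volume) form; none of these choices come from the slide. Because each cell loses exactly the flux its neighbor gains, the total of u is conserved up to round-off.

```python
import numpy as np


def flux(u):
    # Hypothetical choice for this sketch: inviscid Burgers flux F(u) = u^2 / 2.
    return 0.5 * u * u


def lax_friedrichs_step(u, dt, dx):
    # Right neighbor of every cell, with periodic wraparound.
    u_right = np.roll(u, -1)
    # Lax-Friedrichs numerical flux at each interface i+1/2.
    f_half = 0.5 * (flux(u) + flux(u_right)) - 0.5 * (dx / dt) * (u_right - u)
    # Conservative update: u_i -= dt/dx * (f_{i+1/2} - f_{i-1/2}).
    return u - (dt / dx) * (f_half - np.roll(f_half, 1))


nx = 200
dx = 1.0 / nx
x = (np.arange(nx) + 0.5) * dx        # cell centers on [0, 1)
u = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
dt = 0.4 * dx / np.abs(u).max()       # CFL number 0.4

total_before = u.sum() * dx
for _ in range(100):
    u = lax_friedrichs_step(u, dt, dx)
total_after = u.sum() * dx

print(abs(total_after - total_before))  # round-off level only
```

In three dimensions the same pattern repeats with one flux difference per direction, which is where the amount of computation, and the case for a C++ kernel, grows quickly.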
Example: What others do
• Machine learning, e.g., TensorFlow and PyTorch

• Also computer-aided design and engineering (CAD/CAE), and computer graphics and visualization

• A hybrid architecture provides both speed and flexibility:

• C++ makes the huge amount of calculation feasible, e.g., distributed computing across thousands of computers

• Python helps describe complex problems in mathematics and science
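Crunch real numbers
• Simple example: solve the Laplace equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad (0 < x < 1;\ 0 < y < 1)$$

$$u(0,y) = 0, \quad u(1,y) = \sin(\pi y) \quad (0 \le y \le 1)$$

$$u(x,0) = 0, \quad u(x,1) = 0 \quad (0 \le x \le 1)$$

• The boundary condition u(1,y) = sin(πy) is non-trivial

• Use a two-dimensional array as the spatial grid

• Point-Jacobi method: 3-level nested loop

```python
import numpy as np

# Assumed setup (not shown on the slide, which does not state the grid size
# behind its timings): u[i, j] approximates u(x_i, y_j) on an nx-by-nx grid,
# and uoriginal holds the boundary values.
nx = 51
y = np.linspace(0.0, 1.0, nx)
uoriginal = np.zeros((nx, nx), dtype=np.float64)
uoriginal[nx-1, :] = np.sin(np.pi * y)  # u(1, y) = sin(pi*y)

def solve_python_loop():
    u = uoriginal.copy()
    un = u.copy()
    converged = False
    step = 0
    # Outer loop.
    while not converged:
        step += 1
        # Inner loops. One for x and the other for y.
        for it in range(1, nx-1):
            for jt in range(1, nx-1):
                un[it,jt] = (u[it+1,jt] + u[it-1,jt]
                             + u[it,jt+1] + u[it,jt-1]) / 4
        norm = np.abs(un-u).max()
        u[...] = un[...]
        converged = norm < 1.e-5
    return u, step, norm
```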
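This boundary value problem has a closed-form solution, u(x,y) = sinh(πx) sin(πy) / sinh(π): substituting gives u_xx = π²u and u_yy = -π²u, and all four boundary conditions hold. A minimal sketch, assuming the same grid layout (u[i, j] ≈ u(x_i, y_j)) and a vectorized Jacobi sweep, uses it to check the iterative result. The helper name `jacobi_laplace` and the grid size 21 are hypothetical.

```python
import numpy as np


def jacobi_laplace(nx, tol=1.e-5):
    # Assumed layout: u[i, j] approximates u(x_i, y_j) on the unit square.
    y = np.linspace(0.0, 1.0, nx)
    u = np.zeros((nx, nx))
    u[nx-1, :] = np.sin(np.pi * y)  # u(1, y) = sin(pi*y); other edges stay 0
    un = u.copy()
    step = 0
    while True:
        step += 1
        un[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1]
                          + u[1:-1, 2:] + u[1:-1, :-2]) / 4
        norm = np.abs(un - u).max()
        u[...] = un
        if norm < tol:
            return u, step


nx = 21
u, step = jacobi_laplace(nx)
x = np.linspace(0.0, 1.0, nx)[:, None]
y = np.linspace(0.0, 1.0, nx)[None, :]
exact = np.sinh(np.pi * x) * np.sin(np.pi * y) / np.sinh(np.pi)
print(step, np.abs(u - exact).max())
```

The remaining error mixes the O(h²) discretization error with the gap left by stopping once the per-step change drops below 1e-5, so it is a sanity check rather than a convergence study.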
Power of Numpy and C++

```python
def solve_numpy_array():
    # Uses the same np, nx, and uoriginal as solve_python_loop.
    u = uoriginal.copy()
    un = u.copy()
    converged = False
    step = 0
    while not converged:
        step += 1
        # Update all interior points at once with shifted array slices.
        un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] +
                             u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4
        norm = np.abs(un-u).max()
        u[...] = un[...]
        converged = norm < 1.e-5
    return u, step, norm
```

Numpy array version (solve_numpy_array):
CPU times: user 62.1 ms, sys: 1.6 ms, total: 63.7 ms
Wall time: 63.1 ms: Pretty good!

Pure Python loop version (solve_python_loop):
CPU times: user 5.24 s, sys: 22.5 ms, total: 5.26 s
Wall time: 5280 ms: Poor speed
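The slice expression in `solve_numpy_array` replaces the two inner loops: each slice is the block of one neighbor for all interior points at once. A small sketch on a hypothetical 4x4 array makes the correspondence concrete; the names `xp`, `xm`, `yp`, `ym` ("x plus", "x minus", and so on) are introduced only for illustration.

```python
import numpy as np

nx = 4
u = np.arange(nx * nx, dtype=np.float64).reshape(nx, nx)

# Neighbor blocks for all interior points (it, jt) at once.
xp = u[2:nx, 1:nx-1]    # u[it+1, jt]
xm = u[0:nx-2, 1:nx-1]  # u[it-1, jt]
yp = u[1:nx-1, 2:nx]    # u[it, jt+1]
ym = u[1:nx-1, 0:nx-2]  # u[it, jt-1]

# The same "x plus" neighbors gathered with explicit loops.
loop_xp = np.array([[u[it+1, jt] for jt in range(1, nx-1)]
                    for it in range(1, nx-1)])
print(np.array_equal(xp, loop_xp))  # True
print(np.shares_memory(xp, u))      # True: slicing creates views, not copies

# u[i, j] = 4*i + j is linear, so the 4-point average reproduces it exactly.
interior = (xp + xm + yp + ym) / 4
print(interior)                     # [[ 5.  6.] [ 9. 10.]]
```

The slices themselves are views, but the arithmetic allocates temporary arrays for each intermediate sum; that per-iteration overhead is part of the remaining gap between Numpy and C++.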
C++ version (using xtensor):

```cpp
#include <cstddef>
#include <tuple>
#include <xtensor/xarray.hpp>
#include <xtensor/xmath.hpp>

// Point-Jacobi solver for the same Laplace problem.
std::tuple<xt::xarray<double>, size_t, double>
solve_cpp(xt::xarray<double> u)
{
    const size_t nx = u.shape(0);
    xt::xarray<double> un = u;
    bool converged = false;
    size_t step = 0;
    double norm = 0.0;
    while (!converged)
    {
        ++step;
        for (size_t it=1; it<nx-1; ++it)
        {
            for (size_t jt=1; jt<nx-1; ++jt)
            {
                un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4;
            }
        }
        // Maximum absolute change between two consecutive iterations.
        norm = xt::amax(xt::abs(un-u))();
        if (norm < 1.e-5) { converged = true; }
        u = un;
    }
    return std::make_tuple(u, step, norm);
}
```

CPU times: user 29.7 ms, sys: 506 µs, total: 30.2 ms
Wall time: 29.9 ms: Definitely good!
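Wall-time comparison:

• Pure Python: 5280 ms

• Numpy: 63.1 ms (83.7x faster than pure Python)

• C++: 29.9 ms (2.1x faster than Numpy; 176.6x faster than pure Python)

The speed is the reason: a workload that needs 1000 computers in pure Python needs only about 5.67 computers in C++. That saves a lot of money.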
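The ratios above can be reproduced from the reported timings. A short check, using only the numbers from the measurements, shows that the wall times give about 5.66 machines, while the user CPU times (29.7 ms vs 5.24 s) give the 5.67 quoted; which pair the slide used is an inference, not something it states.

```python
# Timings copied from the measurements above, in milliseconds.
wall = {"python": 5280.0, "numpy": 63.1, "cpp": 29.9}
user_cpu = {"python": 5240.0, "cpp": 29.7}


def speedup(slow_ms, fast_ms):
    return slow_ms / fast_ms


print(round(speedup(wall["python"], wall["numpy"]), 1))  # 83.7
print(round(speedup(wall["numpy"], wall["cpp"]), 1))     # 2.1
print(round(speedup(wall["python"], wall["cpp"]), 1))    # 176.6

# Machines needed in C++ for a job that needs 1000 machines in pure Python.
print(round(1000 * wall["cpp"] / wall["python"], 2))          # 5.66
print(round(1000 * user_cpu["cpp"] / user_cpu["python"], 2))  # 5.67
```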
Recap: Why Python?
• Python is slow, but numpy can be reasonably fast.

• Coding in C++ is time-consuming.

• C++ is only needed in the computing kernel.

• Most code is supporting code, but it must not slow down the computing kernel.

• Python makes it easier to organize and structure the code.

This is why high-performance systems usually use a hybrid architecture (C++ with Python or another scripting language).
Let’s go hybrid, but …
• A dilemma:

• Engineers (domain experts) know the problems but don’t know C++ and software engineering.

• Computer scientists (programmers) know C++ and software engineering but not the problems.

• Either role takes years of practice and study.

• Not many people want to play both roles.
NSD: an attempt to improve
• Numerical software development: a graduate-level course

• Trains computer scientists in the hybrid architecture for numerical software

• https://github.com/yungyuc/nsd

• Runnable Jupyter notebooks
• Part 1: Start with Python
• Lecture 1: Introduction

• Lecture 2: Fundamental engineering practices

• Lecture 3: Python and numpy

• Part 2: Computer architecture for performance
• Lecture 4: C++ and computer architecture
• Lecture 5: Matrix operations

• Lecture 6: Cache optimization

• Lecture 7: SIMD

• Part 3: Resource management
• Lecture 8: Memory management

• Lecture 9: Ownership and smart pointers

• Part 4: How to write C++ for Python
• Lecture 10: Modern C++

• Lecture 11: C++ and C for Python

• Lecture 12: Array code in C++

• Lecture 13: Array-oriented design

• Part 5: Conclude with Python
• Lecture 14: Advanced Python

• Term project presentation
Memory hierarchy
• We go to C++ because it makes it easier to access the hardware

• A modern CPU is much faster than the memory that feeds it

• High performance comes from hiding the memory-access latency

Access latency by level (approximate):
registers (0 cycles)
L1 cache (4 cycles)
L2 cache (10 cycles)
L3 cache (50 cycles)
Main memory (200 cycles)
Disk (storage) (100,000 cycles)
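Python cannot manage the cache directly, but numpy exposes the memory layout that decides how well the cache is used. A minimal sketch, assuming a 1000x1000 float64 array in numpy's default row-major (C) order: moving along a row steps 8 bytes, so neighbors share cache lines, while moving down a column steps 8000 bytes, so each access is likely to land on a different cache line.

```python
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float64)  # C (row-major) order by default
print(a.strides)                  # (8000, 8): bytes to the next row / next column

row = a[0, :]                     # view of one row
col = a[:, 0]                     # view of one column
print(row.flags["C_CONTIGUOUS"])  # True: elements are 8 bytes apart
print(col.flags["C_CONTIGUOUS"])  # False: elements are 8000 bytes apart
print(col.strides)                # (8000,)
```

This is one reason C++ kernels (and numpy code) are usually written so the innermost loop runs over the last, contiguous index.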
Data object
• Numerical software processes huge amounts of data. Copying them is expensive.

• Use a pipeline to process the same block of data

• Use an object to manage the data: a data object

• Data objects may not always be a good idea in other fields.

• Here we do what it takes for uncompromising performance.

[Figure: a pipeline of phases (field initialization, interior time-marching, boundary condition, parallel data sync, finalization), all accessing the same data object: data access at all phases.]
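A minimal sketch of the pipeline idea, assuming a hypothetical `FieldData` class that owns a single numpy buffer and one hypothetical function per phase. Every phase writes in place, so the buffer address never changes and no phase copies the data; the parallel sync is left as an empty placeholder.

```python
import numpy as np


class FieldData:
    """Hypothetical data object: owns one block of solution data."""

    def __init__(self, nx):
        self.u = np.zeros(nx, dtype=np.float64)

    def address(self):
        # Address of the underlying memory buffer.
        return self.u.__array_interface__["data"][0]


def initialize(data):
    data.u[:] = np.linspace(0.0, 1.0, data.u.size)


def march_interior(data):
    # Replace each interior value by the mean of its neighbors, in place.
    data.u[1:-1] = 0.5 * (data.u[:-2] + data.u[2:])


def apply_boundary(data):
    data.u[0], data.u[-1] = 0.0, 1.0


def sync(data):
    # Placeholder: a parallel code would exchange halo values here.
    pass


def finalize(data):
    return float(data.u.sum())


data = FieldData(11)
start = data.address()
initialize(data)
for _ in range(3):
    march_interior(data)
    apply_boundary(data)
    sync(data)
total = finalize(data)
print(data.address() == start)  # True: every phase worked on the same buffer
```

The cost of this design is coupling: every phase must agree on the object's layout, which is the trade-off the slide accepts for performance.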
Zero-copy: do it where it fits
[Figure: a Python app and a C++ app share one memory buffer (a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn) across languages, with an ndarray on the Python side and a C++ container on the C++ side. Top (Python) - down (C++): the ndarray manages the buffer and the C++ container accesses it. Bottom (C++) - up (Python): the C++ container manages the buffer and the ndarray accesses it.]
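The bottom-up direction can be imitated without writing C++, assuming `array.array` stands in for a C++ container: it stores raw C doubles contiguously and exports the buffer protocol. `np.frombuffer` then wraps that memory as an ndarray without copying, so a write through numpy is visible to the owner. In a real extension, pybind11 or the CPython buffer protocol plays this role.

```python
import array

import numpy as np

# Stand-in for a C++ container that owns the buffer (bottom-up direction).
container = array.array("d", [0.0] * 6)

# Wrap the same memory as a 2x3 ndarray: no copy is made.
view = np.frombuffer(container, dtype=np.float64).reshape(2, 3)

view[1, 2] = 42.0             # write through numpy ...
print(container[5])           # ... and the owner sees it: 42.0
print(view.flags["OWNDATA"])  # False: the ndarray does not own the memory
```

The ndarray here only borrows the memory, so the owner must outlive every view of it; getting that ownership rule right is the hard part of zero-copy designs.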
More detail …
Notes about moving from Python to C++ 

• Python frame object

• Building Python extensions using pybind11
and cmake

• Inspecting assembly code

• x86 intrinsics

• PyObject, CPython API and pybind11 API

• Shared pointer, unique pointer, raw pointer,
and ownership

• Template generic programming

https://tw.pycon.org/2020/en-us/events/talk/1164539411870777736/
How to learn
• Work on a real project.

• Keep in mind that pure Python can be 100x slower than C/C++ (176.6x in the Laplace example).

• Always profile (measure the time).

• Don’t treat Python as simply Python.

• View Python as an interpreter library written in C.

• Use tools to call C/C++: Cython, pybind11, etc.
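Profiling can start with the standard `timeit` module. A minimal sketch, assuming a hypothetical 100x100 random array and one Jacobi sweep written both ways (the names `sweep_loop` and `sweep_numpy` are illustrative): it first checks that both versions agree, then times them. The exact ratio depends on the machine.

```python
import timeit

import numpy as np

n = 100
a = np.random.default_rng(0).random((n, n))


def sweep_loop(u):
    # One Jacobi sweep with explicit Python loops.
    un = u.copy()
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            un[i, j] = (u[i+1, j] + u[i-1, j] + u[i, j+1] + u[i, j-1]) / 4
    return un


def sweep_numpy(u):
    # The same sweep with array slices.
    un = u.copy()
    un[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]) / 4
    return un


assert np.array_equal(sweep_loop(a), sweep_numpy(a))  # same result first
t_loop = min(timeit.repeat(lambda: sweep_loop(a), number=1, repeat=3))
t_numpy = min(timeit.repeat(lambda: sweep_numpy(a), number=10, repeat=3)) / 10
print(f"loop {t_loop*1e3:.2f} ms, numpy {t_numpy*1e3:.3f} ms, "
      f"ratio {t_loop/t_numpy:.0f}x")
```

Taking the minimum of several repeats reduces noise from other processes; for finding hot spots in a larger program, `cProfile` is the usual next step.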
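What we want
[Figure: a path from seeing problems, to formulating the problems, to getting something working, then automating and prototyping, and finally reusable software. Question marks mark the uncertain transitions; one-time programs may happen along the way.]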
Thanks!
Questions?
