From NumPy to PyTorch
Mike Ruberry
software engineer @ Facebook
Outline
- NumPy and working with tensors
- PyTorch and hardware accelerators, autograd, and computational graphs
- Adding NumPy operators to PyTorch
- When PyTorch is Different from NumPy
- Lessons learned and future work
NumPy and working
with tensors
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Tensor creation
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Addition
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Matrix multiplication
1 >> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([-3.44509285e-16+1.14423775e-17j,
8.00000000e+00-8.11483250e-16j,
2.33486982e-16+1.22464680e-16j,
0.00000000e+00+1.22464680e-16j,
9.95799250e-17+2.33486982e-16j,
0.00000000e+00+7.66951701e-17j,
1.14423775e-17+1.22464680e-16j,
0.00000000e+00+1.22464680e-16j])
2 >> A = np.array([[1,-2j],[2j,5]])
3 >> np.linalg.cholesky(A)
array([[1.+0.j, 0.+0.j],
[0.+2.j, 1.+0.j]])
More Complicated NumPy Snippets
NumPy
Operators
Composites Primitives
Composites Primitives
1 def sinc(x):
2 x = np.asanyarray(x)
3 y = pi * where(x == 0, 1.0e-20, x)
4 return sin(y)/y
1 double npy_copysign(
double x,
double y)
2 {
3 npy_uint32 hx, hy;
4 GET_HIGH_WORD(hx, x);
5 GET_HIGH_WORD(hy, y);
6 SET_HIGH_WORD(x,
(hx & 0x7fffffff) |
(hy & 0x80000000));
7 return x;
8 }
PyTorch and
hardware accelerators,
autograd, and computational
graphs
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets (Again)
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets to PyTorch Snippets
Tensor creation
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Addition
Simple NumPy Snippets to PyTorch Snippets
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> torch.matmul(a, b)
tensor([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets to PyTorch Snippets
Matrix multiplication
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> torch.matmul(a, b)
tensor([[ -7, -10],
[-15, -22]])
Simple PyTorch Snippets
1 >> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([-3.44509285e-16+1.14423775e-17j,
8.00000000e+00-8.11483250e-16j,
2.33486982e-16+1.22464680e-16j,
0.00000000e+00+1.22464680e-16j,
9.95799250e-17+2.33486982e-16j,
0.00000000e+00+7.66951701e-17j,
1.14423775e-17+1.22464680e-16j,
0.00000000e+00+1.22464680e-16j])
2 >> A = np.array([[1,-2j],[2j,5]])
3 >> np.linalg.cholesky(A)
array([[1.+0.j, 0.+0.j],
[0.+2.j, 1.+0.j]])
More Complicated NumPy Snippets (Again)
1 >> torch.fft.fft(torch.exp(2j * math.pi * torch.arange(8) / 8))
2 tensor([ 3.2584e-07+3.1787e-08j, 8.0000e+00+4.8023e-07j,
3 -3.2584e-07+3.1787e-08j, -1.6859e-07+3.1787e-08j,
4 -3.8941e-07-2.0663e-07j, 1.3691e-07-1.9412e-07j,
5 3.8941e-07-2.0663e-07j, 1.6859e-07+3.1787e-08j])
1 >> A = torch.tensor([[1,-2j],[2j,5]])
2 >> torch.linalg.cholesky(A)
3 tensor([[1.+0.j, 0.+0.j],
4 [0.+2.j, 1.+0.j]])
More Complicated PyTorch Snippets
1 >> t = torch.tensor((1, 2, 3))
2 >> a = t.numpy()
array([1, 2, 3])
3 >> b = np.array((-1, -2, -3))
4 >> result = a + b
array([0, 0, 0])
5 >> torch.from_numpy(result)
tensor([0, 0, 0])
PyTorch and NumPy Interoperability
Does PyTorch have EVERY NumPy operator?
- No!
- NumPy has a lot of operators: A LOT
- Many of them are rarely used, niche, deprecated, or in need of deprecation
- But PyTorch does have hundreds of NumPy operators
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)), device='cuda')
tensor([[1, 2],
[3, 4]], device='cuda:0')
3 >> b = torch.tensor(((-1, -2), (-3, -4)), device='cuda')
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]], device='cuda:0')
5 >> torch.matmul(a.float(), b.float())
tensor([[ -7., -10.],
[-15., -22.]], device='cuda:0')
Simple PyTorch Snippets on CUDA
1 >> a = torch.tensor((1., 2.), requires_grad=True)
2 >> b = torch.tensor((3., 4.))
3 >> result = (a * b).sum()
4 >> result.backward()
5 >> a.grad
tensor([3., 4.])
Autograd in PyTorch
1 def sinc(x):
2 y = math.pi * torch.where(x == 0, 1.0e-20, x)
3 return torch.sin(y)/y
4
5 scripted_sinc = torch.jit.script(sinc)
graph(%x.1 : Tensor):
%1 : float = prim::Constant[value=3.1415926535897931]
%3 : int = prim::Constant[value=0]
%5 : float = prim::Constant[value=9.9999999999999995e-21]
%4 : Tensor = aten::eq(%x.1, %3)
%7 : Tensor = aten::where(%4, %5, %x.1)
%y.1 : Tensor = aten::mul(%7, %1)
%10 : Tensor = aten::sin(%y.1)
%12 : Tensor = aten::div(%10, %y.1)
return (%12)
Computational Graphs in PyTorch
1 >> t = torch.randn(10)
2 >> linear_layer = torch.nn.Linear(10, 5)
3 >> linear_layer(t)
tensor([ 0.0066, 0.2467, -0.0137, -0.4091, -1.1756],
grad_fn=<AddBackward0>)
Deep Learning in PyTorch
PyTorch as NumPy+
- While PyTorch doesn’t have every NumPy operator, for those it supports we
can think of it as NumPy PLUS:
- Support for hardware accelerators, like GPUs and TPUs
- Support for autograd
- Support for computational graphs
- Support for deep learning
- A C++ API
- … and many additional features (visualization, distributed training, …)
- PyTorch also has additional operators that NumPy does not
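As a small illustration of that last point (a sketch added here, not from the slides): torch.topk has no single-call NumPy counterpart, and like most PyTorch operators it composes with autograd.

import torch

t = torch.tensor([3., 1., 4., 1., 5.], requires_grad=True)
values, indices = torch.topk(t, k=2)  # two largest entries and their positions
values.sum().backward()               # topk is differentiable
print(t.grad)                         # gradient flows only to the selected entries: tensor([0., 0., 1., 0., 1.])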
PyTorch Behind the Scenes
- To recap, NumPy had…
- Composite operators (typically implemented in Python)
- Primitive operators (implemented in C++)
- And PyTorch has...
- Composite operators (implemented in C++)
- Primitive operators (implemented in C++, CPU intrinsics, and CUDA)
- Computational graphs (executed by torchscript or XLA)
- Plus autograd formulas for differentiable operations
1 def sinc(x):
2 x = np.asanyarray(x)
3 y = pi * where(x == 0, 1.0e-20, x)
4 return sin(y)/y
Sinc in NumPy (reminder)
1 static void sinc_kernel(TensorIteratorBase& iter) {
2 AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(
kBFloat16, iter.common_dtype(), "sinc_cpu", [&]() {
3 cpu_kernel(
4 iter,
5 [=](scalar_t a) -> scalar_t {
6 if (a == scalar_t(0)) {
7 return scalar_t(1);
8 } else {
9 scalar_t product = c10::pi<scalar_t> * a;
10 return std::sin(product) / product;
11 }
12 });
13 });
14 }
Sinc in PyTorch, CPU kernel
Sinc in PyTorch, Autograd Formula
1 name: sinc(Tensor self) -> Tensor
2 self: grad *
((M_PI * self *
(M_PI * self).cos() - (M_PI * self).sin()) /
(M_PI * self * self)).conj()
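One way to sanity-check a formula like this (my sketch, not part of the deck) is torch.autograd.gradcheck, which compares the analytic gradient against finite differences:

import torch

# double precision keeps the finite-difference comparison numerically stable
x = torch.randn(8, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(torch.sinc, (x,)))  # True if analytic and numeric gradients agree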
Adding NumPy Operators to
PyTorch
Porting an operator from NumPy
- Need to write a C++ implementation
- Possibly a CPU kernel or a CUDA kernel
- Need to write an autograd formula (if the op is differentiable)
- Need to write comprehensive tests (more on this in a moment)
… why do we bother?
Porting an operator from NumPy
- Need to write a C++ implementation
- Possibly a CPU kernel or a CUDA kernel
- Made easier with the C++ “TensorIterator” architecture
- Need to write an autograd formula (if the op is differentiable)
- Simplified by allowing users to write Pythonic YAML formulas
- Need to write comprehensive tests (more on this in a moment)
- Significant coverage automated with PyTorch’s OpInfo metadata and test generation
framework
PyTorch’s test matrix
- Tensor properties:
- Datatype (long, float, complexfloat, etc.)
- Device (CPU, CUDA, TPU, etc.)
- Differentiable operations support autograd
- Operations need to work in computational graphs
- Operations have “function,” “method” and “inplace” variants
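For concreteness, a quick sketch of the three variants mentioned above (an assumed example, not from the slides):

import torch

a = torch.tensor([1., 2.])
b = torch.tensor([3., 4.])

torch.add(a, b)  # function variant
a.add(b)         # method variant, returns a new tensor
a.add_(b)        # inplace variant (trailing underscore), modifies a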
OpInfo for torch.mul
1 OpInfo('mul',
2 aliases=('multiply',),
3 dtypes=all_types_and_complex_and(
torch.float16, torch.bfloat16, torch.bool),
4 sample_inputs_func=sample_inputs_binary_pwise)
OpInfo for torch.sin
1 UnaryUfuncInfo('sin',
2 ref=np.sin,
3 dtypes=all_types_and_complex_and(
torch.bool, torch.bfloat16),
4 dtypesIfCUDA=all_types_and_complex_and(
torch.bool, torch.half),
5 handles_large_floats=False,
6 handles_complex_extremals=False,
7 safe_casts_outputs=True,
8 decorators=(precisionOverride({torch.bfloat16: 1e-2}),))
OpInfo test template
1 @ops(unary_ufuncs)
2 def test_contig_vs_transposed(self, device, dtype, op):
3 contig = make_tensor((789, 357),
device=device, dtype=dtype,
low=op.domain[0], high=op.domain[1])
4 non_contig = contig.T
5 self.assertTrue(contig.is_contiguous())
6 self.assertFalse(non_contig.is_contiguous())
7 torch_kwargs, _ = op.sample_kwargs(device, dtype, contig)
8 self.assertEqual(
op(contig, **torch_kwargs).T,
op(non_contig, **torch_kwargs))
Instantiated tests for torch.sin
@ops(unary_ufuncs)
def test_contig_vs_transposed(self, device, dtype, op):
test_contig_vs_transposed_sin_cuda_complex64
test_contig_vs_transposed_sin_cuda_float16
test_contig_vs_transposed_sin_cuda_float32
test_contig_vs_transposed_sin_cuda_int64
test_contig_vs_transposed_sin_cuda_uint8
test_contig_vs_transposed_sin_cpu_complex64
test_contig_vs_transposed_sin_cpu_float16
test_contig_vs_transposed_sin_cpu_float32
test_contig_vs_transposed_sin_cpu_int64
test_contig_vs_transposed_sin_cpu_uint8
Example properties validated for every operator
- Autograd is implemented correctly
- Tested using finite differences
- The operation works with torchscript and torch.fx
- The operation’s function, method, and inplace variants all compute the same
operation
- One big caveat: can’t automatically test correctness except for special
classes of operators (like unary ufuncs)
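For unary ufuncs that reference check is essentially a comparison against the op's NumPy reference on the same values; a minimal sketch of the idea (assuming default torch.allclose tolerances are appropriate for the dtype):

import numpy as np
import torch

t = torch.randn(1024)
assert torch.allclose(torch.sin(t), torch.from_numpy(np.sin(t.numpy())))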
Features of PyTorch’s test generator
- Works with pytest and unittest
- Dynamically identifies available device types
- Allows for device type-specific logic for setup and teardown
- Extensible by other packages adding new device types (like PyTorch/XLA)
- Provides a central “source of truth” for operator’s functionality
- Makes it easy to test new features with every PyTorch operator
When PyTorch is Different
from NumPy
NumPy PyTorch
1 >> a = np.array((1, 2, 3))
2 >> np.reciprocal(a)
array([1, 0, 0])
np.reciprocal vs torch.reciprocal
1 >> t = torch.tensor((1, 2, 3))
2 >> torch.reciprocal(t)
tensor([1.0000, 0.5000, 0.3333])
NumPy PyTorch
1 >> a = np.diag(
np.array((1., 2, 3)))
2 >> w, v = np.linalg.eig(a)
(array([1., 2., 3.]),
 array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))
np.linalg.eig vs torch.linalg.eig
1 >> t = torch.diag(
torch.tensor((1., 2, 3)))
2 >> w, v = torch.linalg.eig(t)
torch.return_types.linalg_eig(
eigenvalues=tensor(
[1.+0.j, 2.+0.j, 3.+0.j]),
eigenvectors=tensor(
[[1.+0.j, 0.+0.j, 0.+0.j],
[0.+0.j, 1.+0.j, 0.+0.j],
[0.+0.j, 0.+0.j, 1.+0.j]]))
NumPy PyTorch
1 >> a = np.array(
(complex(1, 2),
complex(2, 1)))
2 >> np.amax(a)
(2+1j)
3 >> np.sort(a)
array([1.+2.j, 2.+1.j],
dtype=complex64)
Ordering complex numbers in NumPy vs. PyTorch
1 >> t = torch.tensor(
(complex(1, 2),
complex(2, 1)))
2 >> torch.amax(t)
RUNTIME ERROR
3 >> torch.sort(t)
RUNTIME ERROR
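One of the "other mechanisms" the next slide alludes to: sort complex values by an explicit key, such as magnitude (my sketch, assuming magnitude is the ordering you want):

import torch

t = torch.tensor([3 + 4j, 1 + 1j, 2 + 0j])
_, indices = torch.sort(t.abs())  # order by magnitude: |1+1j| < |2| < |3+4j|
print(t[indices])                 # tensor([1.+1.j, 2.+0.j, 3.+4.j])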
Principled discrepancies
- The PyTorch community seems OK with these principled discrepancies
- Different behavior must be very similar to NumPy’s behavior
- It’s OK to not support some things, as long as there are other mechanisms to do them
- PyTorch also has systematic discrepancies with NumPy that pass without
comment
- Type promotion
- Functions vs. method variants
- Returning scalars vs tensors
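A sketch (assumed example) of one such systematic discrepancy, scalars versus tensors: NumPy reductions return NumPy scalars while PyTorch reductions return 0-dimensional tensors, with .item() bridging the gap.

import numpy as np
import torch

np.sum(np.array([1, 2, 3]))                # 6, a NumPy scalar
torch.sum(torch.tensor([1, 2, 3]))         # tensor(6), a 0-dim tensor
torch.sum(torch.tensor([1, 2, 3])).item()  # 6, a plain Python int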
Lessons Learned
and Future Work
Recap
- NumPy and PyTorch are popular Python packages with operators that manipulate
tensors
- PyTorch implements many of NumPy’s operators, and extends them with support for
hardware accelerators, autograd, and other systems that support modern scientific
computing and deep learning
- The PyTorch community wants both the functionality and familiarity these operators
provide
- But it’s OK with principled differences
- To make implementing all these operators tractable, PyTorch has had to develop
architecture supporting C++ and CUDA implementations, autograd formulas and
testing
Lessons Learned
- Do the work to engage your community and listen carefully to their feedback
- At first it wasn’t clear whether people just wanted the functionality of NumPy operators, but our
community has clarified they also want fidelity
- Focus on developer efficiency
- Be clear about your own principles when implementing operators from
another project
Future Work
- Prioritize deprecating and updating the few PyTorch operators with
significantly different behavior than their NumPy counterparts
- Make success criteria clearer: implementing every NumPy operator is
impractical and inadvisable
- The new Python Array API may solve this problem
- More focus on SciPy functionality, including SciPy’s special module, linear
algebra module, and optimizers
Thank you!