Tensor Comprehensions:
What Are They?
2018/03/11
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Source-code analysis artisan
Announcing Tensor Comprehensions
February 14, 2018
https://research.fb.com/announcing-tensor-comprehensions/
Facebook AI Research
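A tool for writing custom layers.
Only a limited number of people can write custom layers,
so this tool lets ordinary users write them and still get reasonable performance.
Accordingly, it is meant to be usable not only from PyTorch and Caffe2
but also from other ML frameworks.
For now, CUDA is the only target.
The current version is v0.1.1.
What are Tensor Comprehensions?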
https://research.fb.com/announcing-tensor-comprehensions/
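import tensor_comprehensions as tc
import torch
# TC source: kk appears only on the right-hand side, so it is a reduction index;
# "+=!" initializes output to 0 before accumulating.
lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
matmul = tc.define(lang, name="matmul")
# Inputs must live on the GPU (CUDA is the only backend).
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)
Let's get started!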
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
=>
for (int i = 0; i < M; i++) {
    for (int j = 0; j < K; j++) {
        output(i,j) = 0.0f;  // the "!" in "+=!": initialize before reducing
        for (int kk = 0; kk < N; kk++) {
            output(i,j) += A(i,kk) * B(kk,j);
        }
    }
}
Notation example and equivalent code
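To make the `+=!` notation concrete, here is a small pure-Python reference model of the same loop nest. It can be handy for checking a TC kernel's output on tiny inputs. This is a sketch, not TC itself. The helper names are hypothetical, and `matmul_accumulate` encodes our assumption that plain `+=` (without `!`) accumulates onto the output's existing values instead of zeroing it first.

```python
from typing import List

Matrix = List[List[float]]


def matmul_reference(A: Matrix, B: Matrix) -> Matrix:
    """Loop nest for: output(i, j) +=! A(i, kk) * B(kk, j)."""
    M, N = len(A), len(A[0])
    if len(B) != N:
        raise ValueError("A is M x N, so B must have N rows")
    K = len(B[0])
    # "+=!": start from the neutral element of +, i.e. 0.0
    output = [[0.0] * K for _ in range(M)]
    for i in range(M):
        for j in range(K):
            # kk appears only on the right-hand side, so it is summed over
            for kk in range(N):
                output[i][j] += A[i][kk] * B[kk][j]
    return output


def matmul_accumulate(A: Matrix, B: Matrix, output: Matrix) -> Matrix:
    """Assumed reading of plain "+=": add into output's existing values (in place)."""
    N, K = len(B), len(B[0])
    for i in range(len(A)):
        for j in range(K):
            for kk in range(N):
                output[i][j] += A[i][kk] * B[kk][j]
    return output


print(matmul_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19.0, 22.0], [43.0, 50.0]]
```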
def conv(float(B,IP,H,W) input, float(OP,IP,KH,KW) weight)
-> (output) {
    output(b, op, h, w) +=! input(b, ip, h + kh, w + kw)
        * weight(op, ip, kh, kw)
}
Simple 2-D convolution (no stride, no padding)
https://facebookresearch.github.io/TensorComprehensions/introduction.html
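The following is a pure-Python reference for the convolution above. It assumes that TC's range inference gives h and w the largest ranges that keep every `input` access in bounds, so the output is (H - KH + 1) x (W - KW + 1). The function name and the nested-list layout are illustrative only.

```python
def conv_reference(inp, weight):
    """Reference for: output(b, op, h, w) +=! input(b, ip, h + kh, w + kw) * weight(op, ip, kh, kw).

    inp is B x IP x H x W and weight is OP x IP x KH x KW, both as nested lists.
    """
    B, IP, H, W = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    OP, KH, KW = len(weight), len(weight[0][0]), len(weight[0][0][0])
    if len(weight[0]) != IP:
        raise ValueError("input and weight must agree on IP")
    # Largest ranges for h and w that keep h + kh < H and w + kw < W
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[[0.0] * OW for _ in range(OH)] for _ in range(OP)] for _ in range(B)]
    for b in range(B):
        for op in range(OP):
            for h in range(OH):
                for w in range(OW):
                    # ip, kh and kw appear only on the right-hand side, so they are reduced
                    for ip in range(IP):
                        for kh in range(KH):
                            for kw in range(KW):
                                out[b][op][h][w] += inp[b][ip][h + kh][w + kw] * weight[op][ip][kh][kw]
    return out
```

With a 3x3 input and a 2x2 kernel, the output is 2x2, since there is no stride and no padding.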
def maxpool2x2(float(B,C,H,W) input)
-> (output) {
    output(b,c,i,j) max=! input(b,c,2*i + kh, 2*j + kw)
        where kh in 0:2, kw in 0:2
}
Simple 2D max pooling
https://facebookresearch.github.io/TensorComprehensions/introduction.html
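The same idea applies to 2x2 max pooling. `max=!` starts each output element from the neutral element of max, which this sketch takes as -inf. The `where` clause fixes the window offsets to 0 and 1. This is a hypothetical pure-Python sketch, and the output size H // 2 x W // 2 is our reading of how the index ranges are inferred.

```python
def maxpool2x2_reference(inp):
    """Reference for maxpool2x2: each output element is the max of a 2x2 input window."""
    B, C, H, W = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    # Largest ranges keeping 2*i + 1 < H and 2*j + 1 < W
    OH, OW = H // 2, W // 2
    # "max=!": start from the neutral element of max
    out = [[[[float("-inf")] * OW for _ in range(OH)] for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):
            for i in range(OH):
                for j in range(OW):
                    for kh in range(2):
                        for kw in range(2):
                            v = inp[b][c][2 * i + kh][2 * j + kw]
                            if v > out[b][c][i][j]:
                                out[b][c][i][j] = v
    return out
```

With an odd size such as H = 3, the last row is never read, because no full window starts there.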
Pooling Layers (Average pooling / Max pooling)
Convolution layers (Simple Convolution / Strided Convolution / Strided Convolution Gradient / Simple Group Convolution / Group Convolution Strided)
Linear layers (Fully Connected layer)
Non-Linear layers (ReLU / Sigmoid / Softmax / Tanh / Cosine)
Math Operations (TensorDot / Matmul / Matmul Gradient / Batch Matmul / Absolute / Add / Indexing / Lookup Table / Transpose / Concat / Cast / Copy / Scale)
Fused layers (FCRelu / Small MobileNet)
Normalization layers (Batch Normalization / Layer Normalization)
Distance Functions (Cosine Similarity)
Layer database
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/layers_database.html
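This sketch shows what a fused entry such as FCRelu computes. It is pure Python and assumes the usual definition out = max(0, x * W^T + bias), with W stored as N x M. Fusion means the bias add and the ReLU happen in the same loop nest, so no intermediate tensor is written out. The function name and layout are our assumptions, not the layer database's code.

```python
def fc_relu_reference(x, weight, bias):
    """Assumed FCRelu: out[b][n] = max(0, sum_m x[b][m] * weight[n][m] + bias[n]).

    x is B x M, weight is N x M (one row per output feature), bias has length N.
    """
    M = len(x[0])
    if any(len(row) != M for row in weight) or len(bias) != len(weight):
        raise ValueError("shape mismatch between x, weight and bias")
    out = []
    for xb in x:
        row = []
        for wn, bn in zip(weight, bias):
            acc = bn  # bias added in the same loop nest
            for m in range(M):
                acc += xb[m] * wn[m]
            row.append(max(acc, 0.0))  # ReLU fused: no separate pass over the output
        out.append(row)
    return out
```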
Automatic optimization with a genetic algorithm
Parameters:
- Number of generations: The number of tuning generations to run.
- Population size: The number of candidates in each generation.
- Number of elites: The number of best candidates preserved intact between generations (without any mutation).
- Crossover rate: The rate at which new candidates are bred instead of simply surviving across generations.
- Mutation rate: The rate at which candidate options are randomly changed (mutated).
- Number of threads: The number of threads used to compile different candidates in parallel.
- GPUs: A comma-separated list of GPU ids to use for evaluating candidates (e.g., "0,1,2,3").
- RNG state: The state used to seed the tuner's RNG.
- Proto: A protobuf filename used to store (and restore) compilation results and profiling information of the candidate solutions.
- min_launch_total_threads: Prune out kernels mapped to fewer than this many threads in total (threads across all blocks). Set this to 1 to avoid pruning.
Autotuner
https://facebookresearch.github.io/TensorComprehensions/autotuner.html
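The parameters above map onto a standard genetic algorithm. The toy sketch below shows how generations, population size, elites, crossover rate, mutation rate and a seeded RNG interact. It is not TC's autotuner. The search-space keys (`tile`, `block_x`, `unroll`) and `synthetic_cost` are hypothetical stand-ins for TC's mapping options and for compiling and timing a candidate on a GPU. The thread, GPU and protobuf parameters are not modeled.

```python
import random

# Hypothetical search space; TC tunes its own mapping options, not these keys.
SEARCH_SPACE = {
    "tile": [1, 2, 4, 8, 16, 32],
    "block_x": [32, 64, 128, 256],
    "unroll": [1, 2, 4, 8],
}


def synthetic_cost(c):
    """Stand-in for compiling a candidate and timing it on a GPU (lower is better)."""
    return abs(c["tile"] - 8) + abs(c["block_x"] - 128) / 32 + abs(c["unroll"] - 4)


def random_candidate(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each option is taken from one parent at random
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(c, mutation_rate, rng):
    # Each option is re-drawn at random with probability mutation_rate
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < mutation_rate else v)
            for k, v in c.items()}


def tune(generations=10, population_size=20, elites=2,
         crossover_rate=0.8, mutation_rate=0.1, seed=0):
    if population_size < 2 or not 0 < elites < population_size:
        raise ValueError("need population_size >= 2 and 0 < elites < population_size")
    rng = random.Random(seed)  # plays the role of "RNG state"
    population = [random_candidate(rng) for _ in range(population_size)]
    history = []  # best cost at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=synthetic_cost)
        history.append(synthetic_cost(ranked[0]))
        next_gen = [dict(c) for c in ranked[:elites]]  # elites copied unchanged
        parents = ranked[:max(2, population_size // 2)]  # simple truncation selection
        while len(next_gen) < population_size:
            if rng.random() < crossover_rate:
                a, b = rng.sample(parents, 2)
                child = crossover(a, b, rng)  # bred from two parents
            else:
                child = dict(rng.choice(parents))  # survives without breeding
            next_gen.append(mutate(child, mutation_rate, rng))
        population = next_gen
    return min(population, key=synthetic_cost), history


best, history = tune()
print(best, history)
```

The elites are copied without mutation, so the best cost never gets worse from one generation to the next. `history` records that cost so the property can be checked. Fixing the seed makes a tuning run reproducible.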
Tensor Comprehensions in PyTorch
Mar 5, 2018
http://pytorch.org/2018/03/05/tensor-comprehensions.html
PYTORCH
1) Define your TC language and pass it to tc.define
2) Create input torch tensors
3) Run the layer and get the output
import tensor_comprehensions as tc
import torch
# 1) Define the TC language and pass it to tc.define
MATMUL_LANG = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
# the `name` should match the definition name in the `lang`
matmul = tc.define(MATMUL_LANG, name="matmul")
# 2) Create input torch tensors
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
# 3) Run the layer and get the output
out = matmul(mat1, mat2)
How to write a PyTorch layer with TC
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
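The comment about `name` matters because, as we understand it, a single `lang` string can hold several TC definitions (for instance a forward one and a backward one), and `name` selects which one to build. The hypothetical pre-check below is not part of tensor_comprehensions. It can catch a mismatched name before `tc.define` is called.

```python
import re

# "def", an identifier, then "(": the start of a TC definition
_TC_DEF = re.compile(r"\bdef\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def tc_def_names(lang):
    """Return the names of all TC definitions found in a lang string, in order."""
    return _TC_DEF.findall(lang)


def check_tc_name(lang, name):
    """Hypothetical helper: raise ValueError if `name` is not defined in `lang`."""
    names = tc_def_names(lang)
    if name not in names:
        raise ValueError(f"name={name!r} is not defined in lang (found {names})")
    return name


MATMUL_LANG = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
print(tc_def_names(MATMUL_LANG))  # prints ['matmul']
```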
tensor_comprehensions.define(lang, **kwargs_define)
Parameters:
lang (string, required)
name (string, required)
training (bool)
backward (string, optional)
constants (dict, optional)
inject_kernel (string, optional)
cuda_code (string, optional)
Returns:
A TC layer that you can run by passing it the tensors.
Defining a layer
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
class tensor_comprehensions.TcUnit(lang, **kwargs_define)
__call__(*inputs, **kwargs)
Parameters:
*inputs (required)
options (optional)
outputs (optional)
cache (string, optional)
grid (int, 3D list)
block (int, 3D list)
reorder_function (optional)
Returns:
A list of PyTorch tensors/Variables: the outputs of running the TC layer.
Running a layer
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
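`grid` and `block` are CUDA launch dimensions given as 3-element lists [x, y, z]. For a 1-D elementwise layer, a common choice is one thread per element. The hypothetical helper below computes such a configuration. It caps the block size at 1024 threads, the per-block limit on current NVIDIA GPUs. It is an illustration, not part of the tensor_comprehensions API.

```python
def launch_config_1d(n, max_threads_per_block=1024):
    """Hypothetical helper: (grid, block) as 3-element lists, one thread per element."""
    if n <= 0 or max_threads_per_block <= 0:
        raise ValueError("n and max_threads_per_block must be positive")
    threads = min(n, max_threads_per_block)
    blocks = (n + threads - 1) // threads  # ceiling division covers every element
    return [blocks, 1, 1], [threads, 1, 1]


grid, block = launch_config_1d(100)
print(grid, block)  # prints [1, 1, 1] [100, 1, 1]
```

For 100 elements, this helper gives the same grid=[1, 1, 1], block=[100, 1, 1] that the add example passes to its layer.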
import tensor_comprehensions as tc
import torch
lang = """
def add(float(N) A, float(N) B) -> (output) {
    output(i) = A(i) + B(i) + 1
}
"""
add = tc.define(lang, name="add")
a, b = torch.randn(100).cuda(), torch.randn(100).cuda()
# grid and block are CUDA launch dimensions, each given as [x, y, z]
out = add(a, b, grid=[1, 1, 1], block=[100, 1, 1])
Normally, you write the layer in TC
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
cuda_code = """
extern "C"{
__global__ void my_add(float* __restrict__ output, const float*
__restrict__ A, const float* __restrict B)
{
int t = threadIdx.x;
output[t] = A[t] + B[t];
}
}
"""
add = tc.define(lang, name="add",
inject_kernel="my_add", cuda_code=cuda_code)
オプションで、CUDAコードを指定できる
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
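The injected `my_add` indexes with `threadIdx.x` only. It therefore relies on a single block of threads covering every element, as with grid=[1, 1, 1], block=[100, 1, 1] for 100 elements. The Python emulation below illustrates what goes wrong with more than one block; it is not how TC launches kernels. It also shows a hypothetical guarded variant that uses the global index `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def emulate_launch_1d(kernel, grid, block, *args):
    """Call kernel once per (blockIdx.x, threadIdx.x); only the x dimension is used."""
    for block_idx in range(grid[0]):
        for thread_idx in range(block[0]):
            kernel(block_idx, thread_idx, block[0], *args)


def my_add(block_idx, thread_idx, block_dim, output, A, B):
    # Mirrors the injected kernel: t = threadIdx.x, no blockIdx, no bounds check
    t = thread_idx
    output[t] = A[t] + B[t]


def my_add_guarded(block_idx, thread_idx, block_dim, output, A, B):
    # Hypothetical variant: global thread index plus a bounds check
    t = block_idx * block_dim + thread_idx
    if t < len(output):
        output[t] = A[t] + B[t]


n = 300
A = [float(i) for i in range(n)]
B = [1.0] * n
out_plain = [0.0] * n
out_guarded = [0.0] * n
emulate_launch_1d(my_add, [2, 1, 1], [256, 1, 1], out_plain, A, B)
emulate_launch_1d(my_add_guarded, [2, 1, 1], [256, 1, 1], out_guarded, A, B)
print(out_plain[255], out_plain[256])  # prints 256.0 0.0 (elements >= 256 never written)
print(out_guarded[299])  # prints 300.0
```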
Thank you