Tensor comprehensions

What is Tensor Comprehensions?
2018/03/11

Blog (since 2007): Vengineerの戯言
  http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
  https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
  @Vengineer
Source code analysis craftsman

Announcing Tensor Comprehensions
February 14, 2018
https://research.fb.com/announcing-tensor-comprehensions/
Facebook AI Research

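A tool for writing custom layers.
Because only a few people can write custom layers,
it is meant to let ordinary developers get reasonable performance from them.
For that reason, it is supposed to be usable not only from PyTorch and Caffe2
but from other ML frameworks as well.
At present, the only target is CUDA.
The current version is v0.1.1.
What is Tensor Comprehensions?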

https://research.fb.com/announcing-tensor-comprehensions/

import tensor_comprehensions as tc
import torch
lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)
Let's get started!
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html

def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
=>
for (int i = 0; i < M; i++) {
    for (int j = 0; j < K; j++) {
        output(i,j) = 0.0f;
        for (int kk = 0; kk < N; kk++) {
            output(i,j) += A(i,kk) * B(kk,j);
        }
    }
}
Notation example and equivalent code
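
The loop nest on this slide can also be written as a small plain-Python reference, which is handy for checking a TC result on tiny inputs. This is only a sketch: `matmul_reference` is a hypothetical helper name, and nested lists stand in for tensors. It shows the two rules the notation relies on: `!` zero-initializes the output, and an index that appears only on the right-hand side (`kk`) is a reduction index.

```python
def matmul_reference(A, B):
    """Plain-Python equivalent of: output(i, j) +=! A(i, kk) * B(kk, j)"""
    M, N = len(A), len(A[0])
    K = len(B[0])
    assert len(B) == N, "inner dimensions must match"
    # The '!' in '+=!' means the output starts from zero.
    output = [[0.0] * K for _ in range(M)]
    for i in range(M):
        for j in range(K):
            # kk appears only on the right-hand side, so it is reduced over.
            for kk in range(N):
                output[i][j] += A[i][kk] * B[kk][j]
    return output


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
result = matmul_reference(A, B)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```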

def conv(float(B,IP,H,W) input, float(OP,IP,KH,KW) weight) -> (output) {
    output(b, op, h, w) += input(b, ip, h + kh, w + kw) * weight(op, ip, kh, kw)
}
Simple 2-D convolution (no stride, no padding)
https://facebookresearch.github.io/TensorComprehensions/introduction.html
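
The convolution slide leaves the loop bounds implicit. The sketch below spells them out in plain Python under two assumptions that are not stated on the slide: the output starts from zero, and `h` and `w` range over every position where `h + kh` and `w + kw` stay inside the input, giving an output of size `(H - KH + 1, W - KW + 1)`. `conv_reference` is a hypothetical helper name and nested lists stand in for tensors.

```python
def conv_reference(inp, weight):
    """Plain-Python sketch of the 2-D convolution (no stride, no padding)."""
    B, IP = len(inp), len(inp[0])
    H, W = len(inp[0][0]), len(inp[0][0][0])
    OP = len(weight)
    KH, KW = len(weight[0][0]), len(weight[0][0][0])
    # Assumed output extent: every h, w for which h + kh and w + kw are in range.
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[[0.0] * OW for _ in range(OH)] for _ in range(OP)] for _ in range(B)]
    for b in range(B):
        for op in range(OP):
            for h in range(OH):
                for w in range(OW):
                    # ip, kh, kw appear only on the right-hand side: reductions.
                    for ip in range(IP):
                        for kh in range(KH):
                            for kw in range(KW):
                                out[b][op][h][w] += (
                                    inp[b][ip][h + kh][w + kw] * weight[op][ip][kh][kw]
                                )
    return out


# One 3x3 image with one channel, one 2x2 all-ones filter.
image = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]
ones = [[[[1.0, 1.0], [1.0, 1.0]]]]
conv_out = conv_reference(image, ones)
print(conv_out[0][0])  # [[12.0, 16.0], [24.0, 28.0]]
```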

def maxpool2x2(float(B,C,H,W) input) -> (output) {
    output(b,c,i,j) max= input(b,c,2*i + kw, 2*j + kh)
        where kw in 0:2, kh in 0:2
}
Simple 2D max pooling
https://facebookresearch.github.io/TensorComprehensions/introduction.html
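
The `where` clause is what fixes the range of `kw` and `kh` here, since it cannot be read off a tensor shape. A plain-Python sketch of the same pooling follows, assuming that `H` and `W` are even and that the output starts from negative infinity so that `max=` behaves as a max-reduction. `maxpool2x2_reference` is a hypothetical helper name.

```python
def maxpool2x2_reference(inp):
    """Plain-Python sketch of 2x2 max pooling over a B x C x H x W nested list."""
    B, C = len(inp), len(inp[0])
    H, W = len(inp[0][0]), len(inp[0][0][0])
    OH, OW = H // 2, W // 2  # assumes H and W are even
    neg_inf = float("-inf")
    out = [[[[neg_inf] * OW for _ in range(OH)] for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):
            for i in range(OH):
                for j in range(OW):
                    for kw in range(2):      # where kw in 0:2
                        for kh in range(2):  # where kh in 0:2
                            value = inp[b][c][2 * i + kw][2 * j + kh]
                            out[b][c][i][j] = max(out[b][c][i][j], value)
    return out


# One 4x4 image holding the values 0..15 in row-major order.
grid4 = [[[[float(4 * r + col) for col in range(4)] for r in range(4)]]]
pooled = maxpool2x2_reference(grid4)
print(pooled[0][0])  # [[5.0, 7.0], [13.0, 15.0]]
```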

Pooling Layers (Average pooling / Max pooling)
Convolution layers (Simple Convolution / Strided Convolution / Strided Convolution Gradient / Simple Group Convolution / Group Convolution Strided)
Linear layers (Fully Connected layer)
Non-Linear layers (ReLU / Sigmoid / Softmax / Tanh / Cosine)
Math Operations (TensorDot / Matmul / Matmul Gradient / Batch Matmul / Absolute / Add / Indexing / Lookup Table / Transpose / Concat / Cast / Copy / Scale)
Fused layers (FCRelu / Small MobileNet)
Normalization layers (Batch Normalization / Layer Normalization)
Distance Functions (Cosine Similarity)
Layer database
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/layers_database.html

Automatic optimization with a genetic algorithm
Parameters:
  Number of generations: The number of tuning generations to be run.
  Population size: The number of candidates in each generation.
  Number of elites: The number of best candidates that are preserved intact between generations (without any mutations).
  Crossover rate: The rate at which new candidates are bred instead of just surviving across generations.
  Mutation rate: The rate at which candidate options are randomly changed (mutated).
  Number of threads: The number of threads that are used to compile different candidates in parallel.
  GPUs: A comma-separated list of GPUs (ids) to use for evaluating candidates (e.g., "0,1,2,3").
  RNG state: The state used to seed the tuner's RNG.
  Proto: A protobuf filename to (re)store compilation results and profiling information of the candidate solutions.
  min_launch_total_threads: Prune out kernels mapped to fewer than this many threads and blocks. Set this to 1 to avoid pruning.
Autotuner
https://facebookresearch.github.io/TensorComprehensions/autotuner.html
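
To make the roles of these parameters concrete, here is a minimal genetic-algorithm sketch in plain Python. It is not the TC autotuner: the function `tune`, the option list `CHOICES`, and the cost function `toy_cost` are all invented for illustration, and a real tuner would compile and time each candidate kernel on a GPU instead of evaluating a formula. Only the parameter names (generations, population size, elites, crossover rate, mutation rate, RNG seed) follow the list above.

```python
import random

CHOICES = [1, 2, 4, 8, 16, 32]  # hypothetical values for each tunable option


def toy_cost(candidate):
    """Made-up stand-in for 'measured kernel runtime'; lower is better."""
    tile_x, tile_y, unroll = candidate
    return abs(tile_x - 16) + abs(tile_y - 8) + abs(unroll - 4)


def tune(cost, n_options, generations=10, pop_size=20, n_elites=2,
         crossover_rate=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)  # plays the role of "RNG state"
    population = [[rng.choice(CHOICES) for _ in range(n_options)]
                  for _ in range(pop_size)]
    history = []  # best cost seen in each generation
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[:pop_size // 2]
        # Elites survive intact, without mutation.
        next_gen = [list(c) for c in population[:n_elites]]
        while len(next_gen) < pop_size:
            if rng.random() < crossover_rate:
                # Breed a new candidate from two parents (one-point crossover).
                p1, p2 = rng.sample(parents, 2)
                cut = rng.randrange(1, n_options)
                child = p1[:cut] + p2[cut:]
            else:
                # Otherwise a parent simply survives into the next generation.
                child = list(rng.choice(parents))
            # Each option is randomly changed with probability mutation_rate.
            child = [rng.choice(CHOICES) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    population.sort(key=cost)
    history.append(cost(population[0]))
    return population[0], history


best, history = tune(toy_cost, n_options=3)
print(best, history)
```

Because the elites are copied unchanged, the best cost in `history` can never get worse from one generation to the next, which is the point of the "Number of elites" setting.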

Tensor Comprehensions in PyTorch
Mar 5, 2018
http://pytorch.org/2018/03/05/tensor-comprehensions.html
PYTORCH

1) Define your TC language and pass it to tc.define
2) Create input torch tensors
3) Run the layer and get output
import tensor_comprehensions as tc
import torch
MATMUL_LANG = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
    output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
# step 1: the `name` should match the definition name in the `lang`
matmul = tc.define(MATMUL_LANG, name="matmul")
# step 2: create the input tensors on the GPU
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
# step 3: run the layer
out = matmul(mat1, mat2)
How to write a PyTorch layer with TC!
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
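
The definition never states concrete sizes: `M`, `N` and `K` are symbolic and take their values from the tensors passed at call time. The sketch below illustrates that idea in plain Python. It is not how TC is implemented, and `bind_sizes` and `SIGNATURE` are hypothetical names introduced only for this example.

```python
# Size names taken from the signature: float(M,N) A, float(N,K) B
SIGNATURE = [("M", "N"), ("N", "K")]


def bind_sizes(signature, shapes):
    """Bind each symbolic size name to the extent found in the input shapes."""
    sizes = {}
    for names, shape in zip(signature, shapes):
        if len(names) != len(shape):
            raise ValueError(f"expected {len(names)} dimensions, got {len(shape)}")
        for name, extent in zip(names, shape):
            # A name used twice (N here) must see the same extent both times.
            if sizes.setdefault(name, extent) != extent:
                raise ValueError(
                    f"size {name} bound to both {sizes[name]} and {extent}")
    return sizes


# Shapes of mat1 and mat2 from the example: (3, 4) and (4, 5).
sizes = bind_sizes(SIGNATURE, [(3, 4), (4, 5)])
output_shape = (sizes["M"], sizes["K"])
print(sizes, output_shape)  # {'M': 3, 'N': 4, 'K': 5} (3, 5)
```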

tensor_comprehensions.define(lang, **kwargs_define)
Parameters:
  lang (string, required)
  name (string, required)
  training (bool)
  backward (string, optional)
  constants (dict, optional)
  inject_kernel (string, optional)
  cuda_code (string, optional)
Returns:
  TC layer that you can run by passing the tensors.
Defining a layer

class tensor_comprehensions.TcUnit(lang, **kwargs_define)
__call__(*inputs, **kwargs)
Parameters:
  *inputs (required)
  options (optional)
  outputs (optional)
  cache (string, optional)
  grid (int, 3D list)
  block (int, 3D list)
  reorder_function (optional)
Returns:
  List of PyTorch tensors/Variables which is the output of running TC layer.
Running a layer

import tensor_comprehensions as tc
import torch
lang = """
def add(float(N) A, float(N) B) -> (output) {
    output(i) = A(i) + B(i) + 1
}
"""
add = tc.define(lang, name="add")
a, b = torch.randn(100).cuda(), torch.randn(100).cuda()
out = add(a, b, grid=[1, 1, 1], block=[100, 1, 1])
Normally, you specify TC code

cuda_code = """
extern "C" {
__global__ void my_add(float* __restrict__ output,
                       const float* __restrict__ A,
                       const float* __restrict__ B)
{
    int t = threadIdx.x;
    output[t] = A[t] + B[t];
}
}
"""
add = tc.define(lang, name="add",
                inject_kernel="my_add", cuda_code=cuda_code)
Optionally, you can specify CUDA code
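
The injected kernel only works because the launch configuration matches the data: with `grid=[1, 1, 1]` and `block=[100, 1, 1]` there is exactly one thread per element, and each thread uses `threadIdx.x` as its array index. The plain-Python sketch below emulates that launch sequentially to show the mapping. `launch` is a hypothetical helper that handles only the x dimension, and nothing here runs on a GPU.

```python
def launch(kernel, grid, block, *args):
    """Call `kernel` once per (block, thread) pair along x, one after another."""
    for block_idx in range(grid[0]):
        for thread_idx in range(block[0]):
            kernel(block_idx, thread_idx, *args)


def my_add(block_idx, thread_idx, output, A, B):
    t = thread_idx  # mirrors `int t = threadIdx.x;` in the CUDA kernel
    output[t] = A[t] + B[t]


A = [float(i) for i in range(100)]
B = [2.0 * i for i in range(100)]
output = [None] * 100
launch(my_add, [1, 1, 1], [100, 1, 1], output, A, B)
print(output[:3])  # [0.0, 3.0, 6.0]
```

Note that the CUDA kernel on the slide computes `A + B`, while the TC definition of `add` computes `A + B + 1`, so the two are not interchangeable as written.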

Thank you