Web Machine Learning (ML) API POC march update

Web Machine Learning (ML) API POC
March Update
Ningxin Hu <ningxin.hu@intel.com>
3/25/2018, Intel Corporation

Today’s ML on Web
• Emerging web ML apps and JS frameworks (deeplearn.js, WebDNN, keras.js, opencv.js) run on client devices
• ML workloads on the Web are not fully optimized on client devices:
  • Limited CPU parallelism and vectorization (WIP in WASM, but below native, e.g. SIMD128 vs. 512)
  • Limited GPGPU (WIP in WebGL 2.x, early stage in WebGPU)
  • No access to dedicated ML hardware accelerators (more efficient than CPU/GPU)
[Diagram: three-layer stack. Web App: AI-based web app on emerging JS frameworks (deeplearn.js, WebDNN, keras.js, opencv.js). Web Browser: WebAssembly, WebGL/WebGPU. Driver/Hardware: CPU, GPU. Emerging ML HW accelerators (NPU, VPU, FPGA, ASIC) are marked with an X as disconnected from the web stack.]

Proposing WebML: accelerated Web Machine Learning API
• Standards-based Web ML API focused on inference with pre-trained models
• Integrates with other Web APIs, e.g. text, multimedia, sensors and VR/AR, for real-time AI-based apps on client devices
• Web ML workloads run on top of the OS ML API and fully exploit CPU/GPU/accelerator performance on client devices
[Diagram: four-layer stack. Web App: ONNX models, TensorFlow models, other models, JS ML frameworks. Web Browser: WebAssembly, WebGL/WebGPU, WebML. OS ML API: CoreML/BNNS/MPS (MacOS/iOS), WinML/DirectML (Windows), TF-Lite/NN API (Android). Driver/Hardware: CPU, GPU, accelerators. Legend: new, existing. Callout: the OS ML API is fully optimized for CPU/GPU/accelerators.]

WebML Polyfill and POC
1. Polyfill with WebAssembly and WebGL2 backends
2. Android POC with NN API
3. MacOS POC with MPSCNN API
[Diagram: four-layer stack. Web App: ONNX models, TensorFlow models, other models, JS ML frameworks. Web Browser: WebAssembly, WebGL2, WebML. OS ML API: MPSCNN API (MacOS), DirectML API (Windows), NN API (Android). Driver/Hardware: CPU, GPU, accelerators. Legend: new, existing. The numbers 1 to 3 mark the components covered by the three items above.]
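A minimal sketch of how a page might choose between a native WebML implementation and the two polyfill backends named above. The function names `detectBackends` and `chooseBackend`, the returned strings, and the priority order (native, then WebGL2, then WASM) are assumptions made for illustration; the slides do not describe how the polyfill selects a backend.

```javascript
// Probe a global scope (e.g. `window` or `globalThis`) for what is available.
// Hypothetical helper: not part of the WebML polyfill.
function detectBackends(scope) {
  // Native WebML is assumed to be exposed as navigator.ml, as in the POC API.
  const nativeWebML = !!(scope.navigator && scope.navigator.ml);
  const wasm = typeof scope.WebAssembly === "object";
  let webgl2 = false;
  if (scope.document) {
    // Standard way to test for WebGL2: ask a canvas for a "webgl2" context.
    const canvas = scope.document.createElement("canvas");
    webgl2 = !!canvas.getContext("webgl2");
  }
  return { nativeWebML, webgl2, wasm };
}

// Assumed priority: native API first, then GPU polyfill, then CPU polyfill.
function chooseBackend(caps) {
  if (caps.nativeWebML) return "native";
  if (caps.webgl2) return "webgl2";
  if (caps.wasm) return "wasm";
  return null; // nothing usable
}
```

In a browser this could be called as `chooseBackend(detectBackends(window))`.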

WebML Polyfill and Examples
• Run WebML examples in any modern browser
• Current version supports two backends:
  • WebAssembly (WASM) for CPU
  • WebGL2 for GPU
• Examples:
  • MobileNet* Image Demo
  • MobileNet* Camera Demo
* MobileNet 1.0 224 trained on ImageNet, in TensorFlow-Lite format
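The demos classify an image with MobileNet. As a small illustration of the last step of such a demo, the sketch below picks the highest-scoring classes from the model output. It assumes the output is a flat array of per-class scores, which the slides do not state; `topK` is a hypothetical helper name.

```javascript
// Return the k highest-scoring classes as { index, score }, best first.
// `scores` may be a plain array or a typed array such as Float32Array.
function topK(scores, k = 3) {
  return Array.from(scores, (score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The returned indices would then be mapped to label strings by whatever label file ships with the model.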

WebML POC
Prototype WebML API in Chromium M65 on Android and MacOS
• Android: implemented with the NN API of Android 8.1.0+ (screenshot captured on Pixel XL phone)
• MacOS: implemented with the MPSCNN API of MacOS 10.13+ (screenshot captured on MacBook Pro 13)

Performance Summary
[Chart: MobileNet 1.0 224 Float inference time in ms (axis 0 to 1600) for WASM Polyfill, WebML/NNAPI and TensorFlow Lite; annotated speedup ~7X. Data collected on Pixel XL phone with Android 8.1.0.]
[Chart: MobileNet 1.0 224 Float inference time in ms (axis 0 to 120) for WebGL2 Polyfill, WebML/MPS, Native/MPS and CoreML; annotated speedup ~6X. Data collected on MacBook Pro 13 2017 with MacOS 10.13.4 beta.]
• Observed significant speedup on CPU/GPU compared to existing Web APIs
• Can bring close-to-native performance to Web apps
• Will scale with new dedicated ML hardware accelerators
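For readers who want to reproduce this kind of comparison, here is a rough sketch of measuring average inference latency and computing a speedup ratio. It is not the method used to collect the data above, which the slides do not describe; `averageLatencyMs`, `speedup`, and the warm-up count are assumptions.

```javascript
// Millisecond clock: performance.now() where available, Date.now() otherwise.
const nowMs = () =>
  typeof performance !== "undefined" ? performance.now() : Date.now();

// Average wall-clock time of an async inference call, after warm-up runs.
// `runInference` is any function returning a promise (hypothetical callback).
async function averageLatencyMs(runInference, iterations = 10, warmup = 1) {
  for (let i = 0; i < warmup; i++) await runInference();
  const start = nowMs();
  for (let i = 0; i < iterations; i++) await runInference();
  return (nowMs() - start) / iterations;
}

// Speedup of a candidate over a baseline, e.g. polyfill time / native time.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs;
}
```

A speedup of about 7X, for example, corresponds to the baseline taking about seven times as long per inference as the candidate.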

WebML POC API
• JavaScript API implemented in the WebML polyfill and POC
• Modeled on the NN API
• Serves only as a starting point for the WebML API proposal
partial interface Navigator {
  readonly attribute ML ml;
};
interface ML {
  NeuralNetworkContext getNeuralNetworkContext();
};
interface Model {
  void addOperand(OperandOptions options);
  void setOperandValue(unsigned long index,
                       ArrayBufferView data);
  void addOperation(long type,
                    sequence<unsigned long> inputs,
                    sequence<unsigned long> outputs);
  void identifyInputsAndOutputs(
      sequence<unsigned long> inputs,
      sequence<unsigned long> outputs);
  Promise<long> finish();
  Promise<Compilation> createCompilation();
};
interface Compilation {
  void setPreference(long preference);
  Promise<long> finish();
  Promise<Execution> createExecution();
};
interface Execution {
  void setInput(unsigned long index, ArrayBufferView data);
  void setOutput(unsigned long index, ArrayBufferView data);
  Promise<long> startCompute();
};
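The model, compilation and execution steps above can be sketched end to end for a one-operation graph. This is an illustration under assumptions, not the author's example: `OperandOptions` is not defined on the slide, so the `type` and `dimensions` fields are assumed; the third ADD input (a scalar fused-activation code) follows the Android NN API that the slide says this API is modeled on; `addTensors` is a hypothetical name. The constants read from `nn` are those of the `NeuralNetworkContext` interface in this section.

```javascript
// Element-wise ADD of two float32 tensors through the POC API.
// `nn` is a NeuralNetworkContext, e.g. navigator.ml.getNeuralNetworkContext().
async function addTensors(nn, a, b) {
  const model = await nn.createModel();
  // ASSUMPTION: OperandOptions carries `type` and `dimensions`.
  const tensorType = { type: nn.TENSOR_FLOAT32, dimensions: [a.length] };

  // Operands get indices in the order they are added: 0, 1, 2, 3.
  model.addOperand(tensorType);          // 0: input a
  model.addOperand(tensorType);          // 1: input b
  model.addOperand({ type: nn.INT32 });  // 2: fused activation (constant)
  model.addOperand(tensorType);          // 3: output
  model.setOperandValue(2, new Int32Array([nn.FUSED_NONE]));

  // ASSUMPTION: ADD takes [a, b, fusedActivation], as in Android NN API.
  model.addOperation(nn.ADD, [0, 1, 2], [3]);
  model.identifyInputsAndOutputs([0, 1], [3]);
  await model.finish();

  const compilation = await model.createCompilation();
  compilation.setPreference(nn.PREFER_FAST_SINGLE_ANSWER);
  await compilation.finish();

  const execution = await compilation.createExecution();
  const result = new Float32Array(a.length);
  // Indices here refer to positions in the identified input/output lists.
  execution.setInput(0, a);
  execution.setInput(1, b);
  execution.setOutput(0, result);
  await execution.startCompute();
  return result;
}
```

The separation into build, compile and execute phases lets a compiled model be reused across many executions with different input buffers.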
interface NeuralNetworkContext {
  // Operand types.
  const long FLOAT32 = 0;
  const long INT32 = 1;
  const long UINT32 = 2;
  const long TENSOR_FLOAT32 = 3;
  const long TENSOR_INT32 = 4;
  const long TENSOR_QUANT8_ASYMM = 5;
  // Operation types.
  const long ADD = 0;
  const long AVERAGE_POOL_2D = 1;
  const long CONCATENATION = 2;
  const long CONV_2D = 3;
  const long DEPTHWISE_CONV_2D = 4;
  const long DEPTH_TO_SPACE = 5;
  const long DEQUANTIZE = 6;
  const long EMBEDDING_LOOKUP = 7;
  const long FLOOR = 8;
  const long FULLY_CONNECTED = 9;
  const long HASHTABLE_LOOKUP = 10;
  const long L2_NORMALIZATION = 11;
  const long L2_POOL_2D = 12;
  const long LOCAL_RESPONSE_NORMALIZATION = 13;
  const long LOGISTIC = 14;
  const long LSH_PROJECTION = 15;
  const long LSTM = 16;
  const long MAX_POOL_2D = 17;
  const long MUL = 18;
  const long RELU = 19;
  const long RELU1 = 20;
  const long RELU6 = 21;
  const long RESHAPE = 22;
  const long RESIZE_BILINEAR = 23;
  const long RNN = 24;
  const long SOFTMAX = 25;
  const long SPACE_TO_DEPTH = 26;
  const long SVDF = 27;
  const long TANH = 28;
  // Fused activation function types.
  const long FUSED_NONE = 0;
  const long FUSED_RELU = 1;
  const long FUSED_RELU1 = 2;
  const long FUSED_RELU6 = 3;
  // Implicit padding algorithms.
  const long PADDING_SAME = 1;
  const long PADDING_VALID = 2;
  // Execution preferences.
  const long PREFER_LOW_POWER = 0;
  const long PREFER_FAST_SINGLE_ANSWER = 1;
  const long PREFER_SUSTAINED_SPEED = 2;
  Promise<Model> createModel();
};
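The two implicit padding constants determine the spatial output size of convolution and pooling operations. The sketch below assumes they carry the same meaning as the SAME and VALID schemes of the Android NN API, on which this API is modeled; the slides do not spell the formulas out, and `convOutputSize` is a hypothetical helper.

```javascript
// Values copied from the NeuralNetworkContext constants above.
const PADDING_SAME = 1;
const PADDING_VALID = 2;

// Output length along one spatial dimension.
// ASSUMPTION: SAME/VALID semantics as in the Android NN API.
function convOutputSize(inputSize, filterSize, stride, padding) {
  if (padding === PADDING_SAME) {
    // Input is padded so that every stride position yields one output.
    return Math.ceil(inputSize / stride);
  }
  if (padding === PADDING_VALID) {
    // No padding: the filter must fit entirely inside the input.
    return Math.ceil((inputSize - filterSize + 1) / stride);
  }
  throw new RangeError("unknown padding code: " + padding);
}
```

For example, a 224-wide input with a 3-wide filter and stride 2 gives 112 under PADDING_SAME and 111 under PADDING_VALID.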
