Web Machine Learning (ML) API POC
March Update
Ningxin Hu <ningxin.hu@intel.com>
3/25/2018 Intel Corporation 1
Today’s ML on Web
• Emerging web ML apps and frameworks run on client devices
• ML workloads on the Web are not fully optimized on client devices:
  • Limited CPU parallelism and vectorization (work in progress in WASM, but below native, e.g. 128-bit SIMD vs. 512-bit natively)
  • Limited GPGPU support (work in progress in WebGL 2.x, early stage in WebGPU)
  • No access to dedicated ML hardware accelerators (which are more efficient than CPU/GPU)
[Diagram: a web app built on JS frameworks (deeplearn.js, WebDNN, keras.js, opencv.js) runs in the web browser on WebAssembly and WebGL/WebGPU, which reach the CPU and GPU at the driver/hardware layer. Emerging AI-based web apps and JS frameworks are marked with an X as disconnected from emerging ML hardware accelerators (NPU, VPU, FPGA, ASIC).]
Proposing WebML: accelerated Web Machine Learning API
• Standards-based ML Web API focused on inference with pre-trained models
• Integrates with other Web APIs, e.g. text, multimedia, sensors and VR/AR, for real-time AI-based apps on client devices
• Web ML workloads run on top of the OS ML API and fully exploit CPU/GPU/accelerator performance on client devices
[Diagram: a web app loads ONNX, TensorFlow or other models through JS ML frameworks. In the web browser, WebML appears alongside WebAssembly and WebGL/WebGPU. WebML maps to the OS ML API: CoreML/BNNS/MPS on MacOS/iOS, WinML/DirectML on Windows, TF-Lite/NN API on Android. The driver/hardware layer holds the CPU, GPU and accelerators. Callout: "OS ML API is fully optimized by CPU/GPU/Accelerators". Legend: new, existing.]
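WebML Polyfill and POC
[Diagram: a web app loads ONNX, TensorFlow or other models through JS ML frameworks. In the web browser, WebML appears alongside WebAssembly and WebGL2. WebML maps to the OS ML API: MPSCNN API on MacOS, DirectML API on Windows, NN API on Android. The driver/hardware layer holds the CPU, GPU and accelerators. Legend: new, existing. Numbered callouts 1, 2 and 3 mark the following items.]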
1. Polyfill with WebAssembly and WebGL2 backends
2. Android POC with NN API
3. MacOS POC with MPSCNN API
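One way an app might switch between the native POC builds and the polyfill is a small resolver like the sketch below. It is only an illustration: `resolveML` and the `createPolyfill` factory are hypothetical names, not part of the slides, and the check relies only on the `ml` attribute and `getNeuralNetworkContext()` method shown in the POC API slide.

```javascript
// Hypothetical helper: prefer a native WebML implementation (for example the
// Android or MacOS POC builds) and fall back to a polyfill otherwise.
// `createPolyfill` is an assumed caller-supplied factory, not an API from the slides.
function resolveML(nav, createPolyfill) {
  const hasNative =
    nav && nav.ml && typeof nav.ml.getNeuralNetworkContext === 'function';
  if (hasNative) {
    return { ml: nav.ml, source: 'native' };
  }
  // No native implementation exposed: build the polyfill instance instead.
  return { ml: createPolyfill(), source: 'polyfill' };
}
```

In a page this could be called as `resolveML(navigator, () => myPolyfill)`, where `myPolyfill` stands for whatever object the polyfill script provides.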
WebML Polyfill and Examples
• Run WebML examples in any modern browser
• Current version supports two backends:
  • WebAssembly (WASM) for CPU
  • WebGL2 for GPU
• Examples:
  • MobileNet* Image Demo
  • MobileNet* Camera Demo
* MobileNet 1.0 224 trained on ImageNet, in TensorFlow-Lite format
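A polyfill with these two backends needs some way to tell which ones the current browser can run. The sketch below shows one plausible feature check using standard browser features (`WebAssembly.instantiate` and `canvas.getContext('webgl2')`). The function name `detectBackends` and the returned labels are assumptions for illustration; the slides do not say how the polyfill selects a backend.

```javascript
// Hypothetical feature detection for the two polyfill backends.
// `env` defaults to the global object so the function can be exercised
// outside a browser by passing a stand-in object.
function detectBackends(env = globalThis) {
  const backends = [];

  // WebAssembly backend (CPU): requires the WebAssembly JS API.
  const wasm = env.WebAssembly;
  if (wasm && typeof wasm.instantiate === 'function') {
    backends.push('WASM');
  }

  // WebGL2 backend (GPU): getContext('webgl2') returns null when unsupported.
  const doc = env.document;
  if (doc && typeof doc.createElement === 'function') {
    const canvas = doc.createElement('canvas');
    if (typeof canvas.getContext === 'function' && canvas.getContext('webgl2')) {
      backends.push('WebGL2');
    }
  }
  return backends;
}
```

Returning a list rather than a single choice leaves the CPU-versus-GPU decision to the caller.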
WebML POC
Prototype WebML API in Chromium M65 on Android and MacOS
• Android: implemented with the NN API of Android 8.1.0+ (screenshot captured on Pixel XL phone)
• MacOS: implemented with the MPSCNN API of MacOS 10.13+ (screenshot captured on MacBook Pro 13)
Performance Summary
[Chart: Mobilenet 1.0 224 Float inference time in ms (axis 0 to 1600) for WASM Polyfill, WebML/NNAPI and TensorFlow Lite, annotated "~ 7X". Data collected on Pixel XL phone with Android 8.1.0.]
[Chart: Mobilenet 1.0 224 Float inference time in ms (axis 0 to 120) for WebGL2 Polyfill, WebML/MPS, Native/MPS and CoreML, annotated "~ 6X". Data collected on MacBook Pro 13 2017 with MacOS 10.13.4 beta.]
• Observed significant speedup on CPU/GPU compared to existing Web APIs
• Can bring close-to-native performance to Web apps
• Will scale with new dedicated ML hardware accelerators
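The slides do not describe how inference time was measured. As a rough sketch of one common approach, the helper below times an async inference call over several iterations after a warm-up run and derives a speedup ratio. `measureInferenceMs`, `speedup` and their parameters are hypothetical names, and `runOnce` stands for any function that performs a single inference.

```javascript
// Hypothetical timing helper: runs `runOnce` a few times untimed (warm-up),
// then records the wall-clock duration of each timed iteration in ms.
async function measureInferenceMs(runOnce, { warmup = 1, iterations = 10 } = {}) {
  for (let i = 0; i < warmup; i++) {
    await runOnce();
  }
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await runOnce();
    samples.push(performance.now() - start);
  }
  const mean = samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
  return { mean, samples };
}

// Speedup of a candidate over a baseline, as a ratio of mean inference times.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs;
}
```

A comparison such as polyfill versus native would then be `speedup(polyfill.mean, native.mean)`, with both means collected on the same device and model.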
WebML POC API
• JavaScript API implemented in the WebML polyfill and POC
• Modeled on the NN API
• Serves only as a starting point for the WebML API proposal
partial interface Navigator {
  readonly attribute ML ml;
};
interface ML {
  NeuralNetworkContext getNeuralNetworkContext();
};
interface Model {
  void addOperand(OperandOptions options);
  void setOperandValue(unsigned long index,
                       ArrayBufferView data);
  void addOperation(long type,
                    sequence<unsigned long> inputs,
                    sequence<unsigned long> outputs);
  void identifyInputsAndOutputs(
      sequence<unsigned long> inputs,
      sequence<unsigned long> outputs);
  Promise<long> finish();
  Promise<Compilation> createCompilation();
};
interface Compilation {
  void setPreference(long preference);
  Promise<long> finish();
  Promise<Execution> createExecution();
};
interface Execution {
  void setInput(unsigned long index, ArrayBufferView data);
  void setOutput(unsigned long index, ArrayBufferView data);
  Promise<long> startCompute();
};
interface NeuralNetworkContext {
  // Operand types.
  const long FLOAT32 = 0;
  const long INT32 = 1;
  const long UINT32 = 2;
  const long TENSOR_FLOAT32 = 3;
  const long TENSOR_INT32 = 4;
  const long TENSOR_QUANT8_ASYMM = 5;
  // Operation types.
  const long ADD = 0;
  const long AVERAGE_POOL_2D = 1;
  const long CONCATENATION = 2;
  const long CONV_2D = 3;
  const long DEPTHWISE_CONV_2D = 4;
  const long DEPTH_TO_SPACE = 5;
  const long DEQUANTIZE = 6;
  const long EMBEDDING_LOOKUP = 7;
  const long FLOOR = 8;
  const long FULLY_CONNECTED = 9;
  const long HASHTABLE_LOOKUP = 10;
  const long L2_NORMALIZATION = 11;
  const long L2_POOL_2D = 12;
  const long LOCAL_RESPONSE_NORMALIZATION = 13;
  const long LOGISTIC = 14;
  const long LSH_PROJECTION = 15;
  const long LSTM = 16;
  const long MAX_POOL_2D = 17;
  const long MUL = 18;
  const long RELU = 19;
  const long RELU1 = 20;
  const long RELU6 = 21;
  const long RESHAPE = 22;
  const long RESIZE_BILINEAR = 23;
  const long RNN = 24;
  const long SOFTMAX = 25;
  const long SPACE_TO_DEPTH = 26;
  const long SVDF = 27;
  const long TANH = 28;
  // Fused activation function types.
  const long FUSED_NONE = 0;
  const long FUSED_RELU = 1;
  const long FUSED_RELU1 = 2;
  const long FUSED_RELU6 = 3;
  // Implicit padding algorithms.
  const long PADDING_SAME = 1;
  const long PADDING_VALID = 2;
  // Execution preferences.
  const long PREFER_LOW_POWER = 0;
  const long PREFER_FAST_SINGLE_ANSWER = 1;
  const long PREFER_SUSTAINED_SPEED = 2;
  Promise<Model> createModel();
};
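The interfaces above suggest a Model, then Compilation, then Execution flow. The sketch below walks through it for a single element-wise ADD. Several details are assumptions, because the slides do not define them: that `OperandOptions` carries `type` and `dimensions` fields, that operand indices follow the order of `addOperand()` calls, that `setInput`/`setOutput` indices refer to positions in the lists passed to `identifyInputsAndOutputs`, and that ADD takes two tensors plus a fused-activation scalar as in the Android NN API. `addTensors` is a hypothetical helper name.

```javascript
// Hypothetical walk-through of the POC API for out = a + b.
// `nn` is a NeuralNetworkContext, e.g. from navigator.ml.getNeuralNetworkContext().
async function addTensors(nn, a, b) {
  // Assumed OperandOptions shape: { type, dimensions }.
  const tensor = { type: nn.TENSOR_FLOAT32, dimensions: [a.length] };

  // 1. Describe the graph.
  const model = await nn.createModel();
  model.addOperand(tensor);              // operand 0: first input
  model.addOperand(tensor);              // operand 1: second input
  model.addOperand({ type: nn.INT32 });  // operand 2: fused activation code
  model.addOperand(tensor);              // operand 3: output
  model.setOperandValue(2, new Int32Array([nn.FUSED_NONE]));
  model.addOperation(nn.ADD, [0, 1, 2], [3]);
  model.identifyInputsAndOutputs([0, 1], [3]);
  await model.finish();

  // 2. Compile with an execution preference.
  const compilation = await model.createCompilation();
  compilation.setPreference(nn.PREFER_FAST_SINGLE_ANSWER);
  await compilation.finish();

  // 3. Bind input/output buffers and run.
  const execution = await compilation.createExecution();
  const output = new Float32Array(a.length);
  execution.setInput(0, a);
  execution.setInput(1, b);
  execution.setOutput(0, output);
  await execution.startCompute();
  return output;
}
```

Passing the context in as a parameter keeps the helper independent of whether it came from a native implementation or the polyfill.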
