Privacy-first in-browser Generative AI web apps: offline-ready, future-proof, standards-based

Maxim Salnikov
Developer Productivity Lead at Microsoft

I’m Maxim Salnikov
Helping developers succeed with Dev Tools, Cloud & AI at Microsoft
• Building on the web platform since the ’90s
• Organizing developer communities and technical conferences
• Speaking, training, blogging: Webdev, Cloud, Generative AI, Prompt Engineering
• Member of the Web Machine Learning Community Group: making machine learning a first-class web citizen by incubating Web APIs for machine learning inference in the browser and in products using modern web engines

Demo repo!
https://github.com/webmaxru/nextjs-webnn
• AI-capable: Transformers.js (under the hood: ONNX Runtime Web, WebNN)
• WebGPU, WebNN, NPU feature detection
• Smooth UX: AI computation runs in a web worker
• Offline-ready: Workbox
https://github.com/webmaxru/ng-ai
Angular
React + Next.js
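The feature detection mentioned for the demo could look roughly like the sketch below. It is not taken from the demo repos: `detectAiCapabilities` is a hypothetical helper, and the NPU probe assumes the browser build still accepts the `deviceType` hint on `navigator.ml.createContext()`. A successful probe is only a hint, since an implementation may silently fall back to another device.

```javascript
// Sketch: detect WebGPU / WebNN / NPU support from a navigator-like object.
// `detectAiCapabilities` is a hypothetical helper name, not a library API.
async function detectAiCapabilities(nav) {
  const result = { webgpu: false, webnn: false, npu: false };

  // WebGPU: navigator.gpu exists and an adapter can be obtained.
  if (nav && nav.gpu) {
    try {
      result.webgpu = Boolean(await nav.gpu.requestAdapter());
    } catch {
      result.webgpu = false;
    }
  }

  // WebNN: navigator.ml exposes createContext().
  if (nav && nav.ml && typeof nav.ml.createContext === 'function') {
    result.webnn = true;
    try {
      // Assumption: the deviceType hint is still accepted by this build.
      await nav.ml.createContext({ deviceType: 'npu' });
      result.npu = true;
    } catch {
      result.npu = false;
    }
  }

  return result;
}
```

In a page or worker the call would be `await detectAiCapabilities(navigator)`; passing the navigator in as a parameter keeps the function easy to exercise with mock objects.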

Native AI in the browser. Standardized.
• We use the web (61% of PC time)
• We use AI (> 1B people)
• We [will] have AI-capable devices
• We want performance, privacy, offline-readiness. All FREE!
• @Dev: unified codebase
• @Dev: handy abstractions

“Native” means
• Best possible performance: fast and energy-efficient
• Leveraging all relevant hardware capabilities
• Platform-specific implementations
• No trade-offs needed
• For web only: unified codebase

Not in today’s session scope
• Native options (Ollama): local but not web
• AI models/APIs shipped as a part of the browser (Prompt API): native & web but non-standard [yet]
• “First generation” web ML and Gen AI frameworks (TensorFlow.js, WebLLM): limited use cases, not fully native [yet]

Web Neural Network API (WebNN)
• Near-native execution characteristics: both speed and power efficiency
• Heterogeneous hardware execution: CPU, GPU, NPU
• Unified abstraction: W3C API standard
• Model-agnostic: a general computational graph lets you bring your own model (BYOM)
• Compatible with existing ML frameworks
https://www.w3.org/TR/webnn/

It all starts with the use cases
https://www.w3.org/TR/webnn/#usecases
• Person Detection
• Semantic Segmentation
• Skeleton Detection
• Face Recognition
• Facial Landmark Detection
• Style Transfer
• Super Resolution
• Image Captioning
• Text-to-image
• Machine Translation
• Emotion Analysis
• Video Summarization
• Noise Suppression
• Speech Recognition
• Text Generation
• Detecting fake video

Edge AI ecosystem
[Diagram: layers of the Edge AI stack, top to bottom]
• Use cases: Noise Suppression, Image Classification, Background Segmentation, Natural Language, Object Detection
• Frameworks: TensorFlow.js, ONNX Runtime Web, MediaPipe Web, OpenCV.js
• Web API: WebAssembly, WebGPU, WebNN
• Web engines: Web Browser (e.g., Chrome/Edge), JavaScript Runtime (e.g., Electron/Node.js)
• Native ML APIs: CoreML, DirectML, TFLite, other ML OS APIs
• Hardware: CPU, GPU, NPU
• Other labels in the diagram: Windows Studio Effects, API extensions

WebNN “as a frontend” status: native ML frameworks
Check latest on: https://webmachinelearning.github.io/webnn-status/
…

WebNN “as a backend” status: JS ML frameworks
Check latest on: https://webmachinelearning.github.io/webnn-status/
…

Which hardware to choose for AI workloads
• CPU: Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
• GPU: Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
• NPU: Provides power efficiency for sustained workloads across hardware platforms with purpose-built accelerators.

WebNN performance is “near-native”
• WebNN on CPU is about 93% of native XNNPack
• WebNN on GPU is about 83% of native DirectML
• WebNN on NPU is about 80% of native DirectML
Source

WebNN for the users
• Low latency: in-browser inference enables novel use cases with local media sources
• Privacy preserving: user data stays on-device
• High availability: no reliance on the network after the initial asset caching, which covers the offline case
• Low cost: computing on client devices means no server farms are needed
https://webmachinelearning.github.io/webnn-intro/

WebNN for the developers
• Take advantage of the native OS services for machine learning
• Get capabilities from the underlying hardware innovations
• Implement consistent, efficient, and reliable AI experiences on the web
• Benefit web applications and frameworks including ONNX Runtime Web, TensorFlow.js
https://webmachinelearning.github.io/webnn-intro/
enum MLDeviceType {
  "cpu",
  "gpu",
  "npu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};
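As a rough illustration of how these enum values are used, the sketch below tries device types in order of preference and returns the first context that can be created. `createContextWithFallback` is a hypothetical helper, and the pairing of `'npu'` with `'low-power'` is an assumption made for the example. It also assumes `createContext()` accepts both `deviceType` and `powerPreference`, which may change as device selection is revised.

```javascript
// Hypothetical helper: try device types in order of preference and return
// the first WebNN context that can be created, or null if none works.
// `ml` is expected to behave like navigator.ml.
async function createContextWithFallback(ml, deviceTypes = ['npu', 'gpu', 'cpu']) {
  for (const deviceType of deviceTypes) {
    // Assumption for this sketch: prefer low power on the NPU, default elsewhere.
    const powerPreference = deviceType === 'npu' ? 'low-power' : 'default';
    try {
      const context = await ml.createContext({ deviceType, powerPreference });
      return { context, deviceType, powerPreference };
    } catch {
      // This device type is unavailable; try the next one.
    }
  }
  return null;
}
```

Taking `ml` as a parameter, rather than reading `navigator.ml` directly, lets the same function run against a mock in tests.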

Device selection will change
• Algorithmic steps or notes to implementations on how to map power preference to devices?
• Excluding specific device types?
• Query mechanism for supported devices?
• Using device similarity grouping?
• Moving to higher abstraction level?
• Combination of above?
https://github.com/webmachinelearning/webnn/blob/main/device-selection-explainer.md

Pre-requisites
https://microsoft.github.io/webnn-developer-preview/install.html
about://flags#web-machine-learning-neural-network
Canary or Dev version of Edge or Chrome
Enabling NPU: latest drivers for Intel | ARM

Let’s build an app
AI use cases
Web frontend app
• Transformers.js: high-level*, operates task-based pipelines, handles model fetching & caching
• ONNX Runtime Web: mid-level, operates inference sessions, defines the model format (ONNX)
• WebNN API: low-level, operates the execution graph
Platform AI capabilities
* The level distribution is relative

What is ONNX?
https://onnxruntime.ai/
https://onnx.ai/
ONNX is an open format built to represent machine learning models. ONNX defines a common set of
operators - the building blocks of machine learning and deep learning models - and a common file format to
enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
ONNX Runtime is a production-grade AI engine to speed up training and inferencing in your existing
technology stack.

ONNX Runtime Web
const options = {
  executionProviders: [
    {
      name: 'webnn', // wasm | webgpu | webnn | webgl
      deviceType: 'npu', // cpu | gpu | npu
      powerPreference: 'low-power', // default | low-power | high-performance
    },
  ],
};
...
// Pass the options so that the session uses the WebNN execution provider
const session = await ort.InferenceSession.create('./model.onnx', options);
// dataA and dataB are Float32Array inputs with 12 elements each (3x4 and 4x3)
const tensorA = new ort.Tensor('float32', dataA, [3, 4]);
const tensorB = new ort.Tensor('float32', dataB, [4, 3]);
// The keys (a, b) must match the input names defined in the model
const feeds = { a: tensorA, b: tensorB };
const results = await session.run(feeds);
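Since WebNN is not available everywhere, the session options can list several execution providers in priority order. The sketch below builds such a list from a capabilities object. `buildSessionOptions` and the `{ webnn, webgpu, npu }` shape are hypothetical names for this example, and the choice of `'gpu'` when no NPU is present is an assumption.

```javascript
// Hypothetical helper: build ONNX Runtime Web session options with a
// priority-ordered list of execution providers.
function buildSessionOptions(capabilities) {
  const executionProviders = [];

  if (capabilities.webnn) {
    executionProviders.push({
      name: 'webnn',
      // Assumption for this sketch: use the NPU when detected, else the GPU.
      deviceType: capabilities.npu ? 'npu' : 'gpu',
      powerPreference: capabilities.npu ? 'low-power' : 'default',
    });
  }
  if (capabilities.webgpu) {
    executionProviders.push('webgpu');
  }
  // WebAssembly runs on the CPU and serves as the last-resort fallback.
  executionProviders.push('wasm');

  return { executionProviders };
}
```

The result would be passed as the second argument to `ort.InferenceSession.create()`, for example `buildSessionOptions({ webnn: true, webgpu: true, npu: false })`.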

What is Transformers.js?
https://github.com/huggingface/transformers.js
Natural Language Processing: text classification, named entity recognition, question
answering, language modeling, summarization, translation, multiple choice, and text generation
Computer Vision: image classification, object detection, segmentation, and depth estimation
Audio: automatic speech recognition, audio classification, and text-to-speech
Multimodal: embeddings, zero-shot audio classification, zero-shot image classification, and
zero-shot object detection
State-of-the-art Machine Learning for the web. Run Transformers
directly in your browser, with no need for a server!

Plus:
https://github.com/huggingface/transformers.js
Hosted pretrained models (subset of Hugging Face catalog)
https://huggingface.co/models?library=transformers.js
Seamless caching of the models (with the Cache Storage)
Serving your own models (converted to the ONNX format)
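To illustrate the idea behind caching model files with Cache Storage, here is a minimal cache-first sketch. It is not how Transformers.js implements its caching: `fetchModelWithCache` and the cache name `model-cache-v1` are hypothetical, and the storage and fetch function are injected so the logic can run outside a browser.

```javascript
// Hypothetical helper: cache-first loading of a model file.
// `cacheStorage` should behave like the browser's `caches` object and
// `fetchFn` like `fetch`.
async function fetchModelWithCache(url, { cacheStorage, fetchFn, cacheName = 'model-cache-v1' }) {
  const cache = await cacheStorage.open(cacheName);

  // Serve from the cache when the file has been stored already.
  const cached = await cache.match(url);
  if (cached) {
    return cached;
  }

  // Otherwise download it and store a copy for later (offline) use.
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  await cache.put(url, response.clone());
  return response;
}
```

In a browser the call would be `fetchModelWithCache(url, { cacheStorage: caches, fetchFn: fetch })`. After the first successful download, later loads need no network access.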

Task-based pipelines
https://huggingface.co/docs/transformers.js/en/pipelines
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love AI!');
// [{'label': 'POSITIVE', 'score': 0.9998}]
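To keep the UI responsive, a pipeline like the one above can be created once inside a web worker and reused for every message. The sketch below shows only the reuse logic. `createInferenceHandler` is a hypothetical helper, and the worker wiring in the trailing comments is an assumed setup rather than code from the demo repos.

```javascript
// Hypothetical factory: returns a handler that lazily creates one pipeline
// (via the injected loader) and reuses it for every request.
function createInferenceHandler(loadPipeline) {
  let pipelinePromise = null;

  return async function handleRequest(text) {
    // Start loading on the first request only; concurrent requests share
    // the same promise, so the model is never loaded twice.
    if (!pipelinePromise) {
      pipelinePromise = loadPipeline();
    }
    const classifier = await pipelinePromise;
    return classifier(text);
  };
}

// Assumed wiring inside a worker module (not executed in this sketch):
//   import { pipeline } from '@huggingface/transformers';
//   const handle = createInferenceHandler(() => pipeline('sentiment-analysis'));
//   self.onmessage = async (event) => self.postMessage(await handle(event.data));
```

Injecting the loader keeps the handler independent of the library, so it can be checked with a stub before the real pipeline is plugged in.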

Text Vision Audio …
https://huggingface.co/docs/transformers.js/en/pipelines
Text: sentence-similarity, summarization, text-generation, translation, question-answering, fill-mask, ...
Vision: image-classification, image-segmentation, image-to-image, mask-generation, object-detection, ...
Audio: audio-classification, automatic-speech-recognition, text-to-speech, text-to-audio, ...

Summary and call to action:
• A web standard for running ML tasks natively in the browser is here
• It’s the only way to leverage all on-device AI capabilities
• There are still some moving parts in the specification
• Choose your own comfortable abstraction level using higher-level frameworks
• The same frameworks can provide fallback mechanisms when an API or device is unavailable
• User experience first! Offline-readiness, web workers, providing choices

References
• Updates/slides from TPAC 2024 WebML WG meeting
• WebNN Spec: https://www.w3.org/TR/webnn/
• WebNN Explainer: https://github.com/webmachinelearning/webnn/blob/main/explainer.md
• WebNN Implementation Status: https://webmachinelearning.github.io/webnn-status/
• Awesome WebNN: https://github.com/webmachinelearning/awesome-webnn
• WebNN Samples: https://microsoft.github.io/webnn-developer-preview/ & https://webmachinelearning.github.io/webnn-samples/
• WebNN Image Classification: https://webmachinelearning.github.io/webnn-samples/image_classification/
• WebNN Semantic Segmentation: https://webmachinelearning.github.io/webnn-samples/semantic_segmentation/index.html
• ONNX Runtime WebNN Execution Provider: https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/core/providers/webnn

Thank you! I kindly prompt you:
