LiteRT for Web with LiteRT.js

LiteRT.js is Google's high-performance Web AI runtime for production web applications. It is a continuation of the LiteRT stack, ensuring multi-framework support and unifying our core runtime across all platforms.

LiteRT.js supports the following core features:

  1. In-browser support for LiteRT models: Run models with best-in-class performance on CPU, accelerated via XNNPack on WebAssembly (Wasm), and on GPU via the WebGPU API.
  2. Multi-framework compatibility: Use your preferred ML framework: PyTorch, JAX, or TensorFlow.
  3. Build on existing pipelines: Integrate with existing TensorFlow.js pipelines by supporting TensorFlow.js Tensors as inputs and outputs.

Installation

Install the @litertjs/core package from npm:

npm install @litertjs/core

The Wasm files are located in node_modules/@litertjs/core/wasm/. For convenience, copy and serve the entire wasm/ folder. Then, import the package and load the Wasm files:

import {loadLiteRt} from '@litertjs/core';

// Host LiteRT's Wasm files on your server.
await loadLiteRt(`your/path/to/wasm/`);
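As a minimal sketch, one way to copy the Wasm files is a small Node script run as part of your build (the copy-wasm.mjs name and the public/wasm destination are assumptions; adapt them to your setup):

// copy-wasm.mjs: a build-time helper script (run with `node copy-wasm.mjs`).
import {cpSync} from 'node:fs';

// Copy LiteRT's Wasm files into a directory your server exposes.
// 'public/wasm' is an assumed destination; use whatever your setup serves.
cpSync('node_modules/@litertjs/core/wasm', 'public/wasm', {recursive: true});

If the folder is then served at /wasm/, the loadLiteRt call above becomes await loadLiteRt('/wasm/').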

Model conversion

LiteRT.js uses the same .tflite format as Android and iOS, and it supports existing models on Kaggle and Hugging Face. If you have a new PyTorch model, you'll need to convert it.

Convert a PyTorch Model to LiteRT

To convert a PyTorch model to LiteRT, use the ai-edge-torch converter.

import torch
import torchvision

import ai_edge_torch

# Load your torch model. We're using resnet for this example.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)

sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert the model to LiteRT.
edge_model = ai_edge_torch.convert(resnet18.eval(), sample_inputs)

# Export the model.
edge_model.export('resnet.tflite')

Run the Converted Model

After converting the model to a .tflite file, you can run it in the browser.

import {loadAndCompile, Tensor} from '@litertjs/core';

// Load the model hosted from your server. This makes an http(s) request.
const model = await loadAndCompile('/path/to/model.tflite', {
    accelerator: 'webgpu', // or 'wasm' for XNNPack CPU inference
});
// The model can also be loaded from a Uint8Array if you want to fetch it yourself.

// Create image input data
const image = new Float32Array(224 * 224 * 3).fill(0);
const inputTensor =
    await new Tensor(image, /* shape */ [1, 3, 224, 224]).moveTo('webgpu');

// Run the model
const outputs = model(inputTensor);
// You can also use model([inputTensor])
// or model({'input_tensor_name': inputTensor})

// Clean up and get outputs
inputTensor.delete();
const outputTensorCpu = await outputs[0].moveTo('wasm');
const outputData = outputTensorCpu.toTypedArray();
outputTensorCpu.delete();

Integrate into existing TensorFlow.js pipelines

You should consider integrating LiteRT.js into your TensorFlow.js pipelines for the following reasons:

  1. Best-in-class WebGPU performance: Converted models running on LiteRT.js WebGPU are optimized for browser performance, and are especially fast on Chromium-based browsers.
  2. Easier model conversion path: The LiteRT.js conversion path goes directly from PyTorch to LiteRT. The PyTorch to TensorFlow.js conversion path is significantly more complicated, requiring you to go from PyTorch -> ONNX -> TensorFlow -> TensorFlow.js.
  3. Debugging tools: The LiteRT.js conversion path comes with debugging tools.

LiteRT.js is designed to function within TensorFlow.js pipelines, and is compatible with TensorFlow.js pre- and post-processing, so the only thing you need to migrate is the model itself.

Integrate LiteRT.js into TensorFlow.js pipelines with the following steps:

  1. Convert your original TensorFlow, JAX, or PyTorch model to .tflite. For details, see the model conversion section.
  2. Install the @litertjs/core and @litertjs/tfjs-interop NPM packages.
  3. Import and use the TensorFlow.js WebGPU backend. This is required for LiteRT.js to interoperate with TensorFlow.js.
  4. Replace loading the TensorFlow.js model with loading the LiteRT.js model.
  5. Substitute the TensorFlow.js model.predict(inputs) or model.execute(inputs) with runWithTfjsTensors(liteRtModel, inputs). runWithTfjsTensors takes the same input tensors that TensorFlow.js models use and outputs TensorFlow.js tensors, as shown in the sketch after this list.
  6. Test that the model pipeline outputs the results you expect.
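As a rough sketch of steps 2 through 5, assuming a pipeline that previously called model.predict on a TensorFlow.js GraphModel (the model path and input shape below are placeholders):

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu';
import {loadAndCompile} from '@litertjs/core';
import {runWithTfjsTensors} from '@litertjs/tfjs-interop';

// Step 3: the TF.js WebGPU backend is required for LiteRT.js interop.
await tf.setBackend('webgpu');

// Step 4: load the LiteRT.js model instead of the TensorFlow.js model.
// (Assumes loadLiteRt has already been called, as in the Installation section.)
// Before: const model = await tf.loadGraphModel('/path/to/model.json');
const model = await loadAndCompile('/path/to/model.tflite', {accelerator: 'webgpu'});

// Pre-processing stays in TensorFlow.js (placeholder input shape).
const input = tf.zeros([1, 224, 224, 3]);

// Step 5: replace model.predict(input) with runWithTfjsTensors.
const outputs = runWithTfjsTensors(model, input);
// The outputs are TensorFlow.js tensors, so existing post-processing keeps working.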

Using LiteRT.js with runWithTfjsTensors may also require the following changes to the model's inputs and outputs:

  1. Reorder inputs: Depending on how the converter ordered the inputs and outputs of the model, you may need to change their order as you pass them in.
  2. Transpose inputs: The converter may also have changed the layout of the model's inputs and outputs compared to what TensorFlow.js uses. You may need to transpose your inputs to match the model, and the outputs to match the rest of the pipeline (see the example after this list).
  3. Rename inputs: If you're using named inputs, the names may have also changed.
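For example, if the converted model expects NCHW inputs while your TensorFlow.js pre-processing produces NHWC (whether a transpose is needed, and in which direction, depends on your model), a tf.transpose on the way in and out can bridge the gap. This is a sketch with placeholder paths and shapes:

import * as tf from '@tensorflow/tfjs';
import {loadAndCompile} from '@litertjs/core';
import {runWithTfjsTensors} from '@litertjs/tfjs-interop';

const model = await loadAndCompile('/path/to/model.tflite', {accelerator: 'webgpu'});

// A TensorFlow.js pre-processing step that produces an NHWC tensor (placeholder shape).
const nhwcInput = tf.zeros([1, 224, 224, 3]);

// Transpose to NCHW if that is the layout the converted model expects.
const nchwInput = tf.transpose(nhwcInput, [0, 3, 1, 2]);

const outputs = runWithTfjsTensors(model, nchwInput);

// If needed, transpose the outputs back to the layout the rest of the pipeline expects.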

You can get more information about the inputs and outputs of the model with model.getInputDetails() and model.getOutputDetails().
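For example, logging the details while wiring up the pipeline (a minimal sketch; model is the compiled LiteRT.js model from loadAndCompile, and the exact fields returned depend on the model):

// Inspect what the compiled model expects and produces.
console.log(model.getInputDetails());
console.log(model.getOutputDetails());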