Samy Fodil
Founder & CEO of Taubyte
WebAssembly is Key to Better LLM Performance
Taubyte
An Open Source Cloud Platform on Autopilot, where coding in a local environment equals scaling to global production. 🚀
Local Coding
Global Production
Taubyte Locally
Taubyte In Prod
https://tau.how
LLMs
Generative AI
INPUT
Large Language Models
OUTPUT
Proprietary
Open Source
✅Train
✅Host
Large Language Models
LLM Training
LLM Inference
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU resources
🗿Primitive Orchestration
Problems
🪽Lightweight
🪣Sand-boxed
⚡Easy to orchestrate
LLM Inference
WASM to the rescue
🪽 Lightweight
<2MB ~4GB
🪽 Lightweight
⚡Fast to provision
🌱Cheap to cache
Closer to Data (Data Gravity) & User (Edge Computing)
🪣Sand-boxed
🛡️Secure
🕹️Interfaces
Which means no, restricted, virtual, or mocked networking, filesystem, and more.
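The sandboxing point can be illustrated with a minimal sketch in plain Python. It is not WebAssembly and not Taubyte code: `VirtualFS`, `DeniedNetwork`, and `run_guest` are hypothetical names, used only to show a guest that can touch nothing except the interfaces the host hands it.

```python
class VirtualFS:
    """In-memory filesystem handed to the guest instead of the real one (hypothetical)."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]


class DeniedNetwork:
    """A 'no networking' policy: every connection attempt is refused (hypothetical)."""

    def connect(self, host, port):
        raise PermissionError(f"network access denied: {host}:{port}")


def run_guest(guest, fs, net):
    """The guest only sees the interfaces the host passes in."""
    return guest(fs, net)


def guest(fs, net):
    # Reads go to the virtual filesystem, never to the host disk.
    data = fs.read("/config/prompt.txt")
    try:
        net.connect("example.com", 443)
        reached = True
    except PermissionError:
        reached = False
    return data, reached


result = run_guest(guest, VirtualFS({"/config/prompt.txt": b"hello"}), DeniedNetwork())
```

Swapping `DeniedNetwork` for a restricted or mocked implementation changes what the guest can do without changing the guest itself.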
LLM Inference
You can use transformers directly from Python. This pulls in PyTorch and other dependencies.
📦Huge PyTorch Images
🔗Complex Dependencies
🔒Lock GPU resources
LLM Inference
You can, for example, use onnx, llama.cpp, candle (which supports llama.cpp's quantized formats), or TensorRT-LLM and have a lower footprint. But it comes with a few challenges:
🔒Lock GPU resources
🧠Way harder to implement
Scaling LLM Inference (lvl1)
🎹Orchestration
🌱Caching
Inference Engines for LLM
🤹Load balancing?
🚦Routing?
With WASM
If we make LLMs, and AI in general, available through a set of common host calls, we can combine the benefits of:
Inference Engines
🎹Orchestration
🌱Caching
WebAssembly
🪽Lightweight
🪣Sand-boxed
🤹Load balancing?
🚦Routing?
github.com/taubyte/tau
Will provide:
🤹Load-balancing
🚦Routing
🎹Orchestration
It’ll also provide abstractions so what’s built locally will work
in production with no changes.
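The idea
[Diagram: an APP reaching an Inference Engine over HTTP, contrasted with HTTP requests reaching a WASM module that reaches an Inference Host Module through host calls. Two parts of the diagram are labeled 1️⃣ and 2️⃣.]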
Implementation of 1️⃣
tau implements a protocol called `gateway` that determines which host is best suited to serve a request, based on:
- WebAssembly caching and availability of dependency modules (including host modules)
- Host Module resource availability
- Host resource availability
- Other constraints defined by the developer, such as data gravity
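The selection criteria above can be sketched as a small scoring function. This is only an illustration, not the `gateway` protocol itself: the `Host` and `Request` fields, the weights, and the use of a preferred region as a stand-in for data gravity are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cached_modules: set = field(default_factory=set)  # WASM modules already cached
    host_modules: set = field(default_factory=set)    # host modules available (e.g. "llm")
    free_memory_mb: int = 0
    region: str = ""


@dataclass
class Request:
    module: str
    needs_host_modules: set
    min_memory_mb: int
    preferred_region: str = ""  # stand-in for a data-gravity constraint


def score(host, req):
    """Return a score for the host, or None if it cannot serve the request."""
    if not req.needs_host_modules <= host.host_modules:
        return None
    if host.free_memory_mb < req.min_memory_mb:
        return None
    s = 0.0
    if req.module in host.cached_modules:
        s += 2.0  # hypothetical weight: a cache hit avoids provisioning
    if req.preferred_region and host.region == req.preferred_region:
        s += 1.0  # hypothetical weight: stay close to the data
    s += min(host.free_memory_mb / 1024, 1.0)  # mild preference for headroom
    return s


def pick_host(hosts, req):
    """Pick the eligible host with the highest score, or None if there is none."""
    eligible = [(score(h, req), h) for h in hosts]
    eligible = [(s, h) for s, h in eligible if s is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


hosts = [
    Host("a", {"chat"}, {"llm"}, 512, "eu"),
    Host("b", set(), {"llm"}, 4096, "us"),
    Host("c", {"chat"}, set(), 8192, "eu"),  # lacks the required host module
]
req = Request("chat", {"llm"}, 256, "eu")
best = pick_host(hosts, req)
```

Here host `a` wins because the module is cached and the host sits in the preferred region, while host `c` is excluded outright for missing the host module.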
[Diagram: a node running the Gateway Protocol connects over muxed tunnels to three Serving Nodes (Substrate). Each Serving Node connects through the Orbit protocol to a Satellite (i.e. LLM Inference).]
Host Module Resource Availability
This feature, still to be implemented, will ask the Host Module to provide basic metrics to the gateway:
- Caching score. Example: is the particular model loaded?
- Resource availability score. Example: is there enough GPU memory to spin up the model?
- Queue score. Example: if the Host Module uses queues, how full is the queue?
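Since this feature is not implemented yet, the following is only a guess at what such a report could look like. `ModuleState`, `report`, and the 0-to-1 score ranges are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModuleState:
    loaded_models: set
    free_gpu_mem_mb: int
    queue_len: int
    queue_capacity: int  # 0 means the host module does not use a queue


def report(state, model, model_mem_mb):
    """Build the three scores for one model, each between 0.0 and 1.0."""
    queue = 1.0
    if state.queue_capacity > 0:
        filled = min(state.queue_len, state.queue_capacity) / state.queue_capacity
        queue = 1.0 - filled  # an emptier queue scores higher
    return {
        "caching": 1.0 if model in state.loaded_models else 0.0,
        "resources": 1.0 if state.free_gpu_mem_mb >= model_mem_mb else 0.0,
        "queue": queue,
    }


state = ModuleState({"llama"}, 6000, 3, 12)
scores = report(state, "llama", 8000)
```

In this example the model is loaded, there is not enough free GPU memory to spin up another copy, and the queue is one quarter full.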
github.com/taubyte/vm-orbit
Extending WebAssembly Runtime in a secure way.
[Diagram: orbit sits between the WebAssembly runtime and an external process, proxying WASM calls and access to module memory.]
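Proxying access to module memory usually means the host reads and writes the module's linear memory by pointer and length. The sketch below models that pattern in plain Python. It does not use the vm-orbit API: `ModuleMemory` and `host_generate` are hypothetical, and the "inference" is a simple echo.

```python
class ModuleMemory:
    """Stand-in for a WASM module's linear memory: a flat, bounds-checked byte array."""

    def __init__(self, size):
        self._buf = bytearray(size)

    def read(self, ptr, length):
        if ptr < 0 or length < 0 or ptr + length > len(self._buf):
            raise IndexError("read outside module memory")
        return bytes(self._buf[ptr:ptr + length])

    def write(self, ptr, data):
        if ptr < 0 or ptr + len(data) > len(self._buf):
            raise IndexError("write outside module memory")
        self._buf[ptr:ptr + len(data)] = data


def host_generate(mem, prompt_ptr, prompt_len, out_ptr, out_cap):
    """Hypothetical host call: read the prompt, write a reply, return bytes written."""
    prompt = mem.read(prompt_ptr, prompt_len).decode("utf-8")
    reply = ("echo: " + prompt).encode("utf-8")[:out_cap]  # never exceed the guest buffer
    mem.write(out_ptr, reply)
    return len(reply)


mem = ModuleMemory(64)
mem.write(0, b"hi")                    # the guest places its prompt at offset 0
n = host_generate(mem, 0, 2, 16, 32)   # the host writes the reply at offset 16
reply = mem.read(16, n)
```

The bounds checks matter: the host side only ever touches the ranges the module passed in, which is what keeps the extension from reaching outside the module's memory.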
Example
github.com/ollama-cloud
turning ollama into a satellite
Compile llama.cpp
Build plugin
Install dreamland
Start local Cloud
Attach plug-in
Login to Local Cloud
Create a project
Create a function
Call Generate
Stream Tokens
No SDK!
Check my previous project
github.com/samyfodil/taubyte-llama-satellite
Which actually has a nice SDK!
Trigger the function
- Copy ollama plugin to /tb/plugins
- Add it to config
In production
Your Application is Live!
taubyte/dllama
Ready for Cloud with better backends
Always local friendly!
Thanks!
