Deploying Large Language Models on a Raspberry Pi
Pete Warden
CEO
Useful Sensors
© 2024 Useful Sensors 1
• github.com/ee292d/labs/blob/main/lab1/run_llm.py
• 60 lines of Python code, including
comments.
Running an LLM on a Raspberry Pi
2
© 2024 Useful Sensors
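• As a concrete reference, here's a minimal sketch of the same idea using the llama-cpp-python bindings; the lab script linked above is the canonical version, and the model file, thread count, and prompt below are placeholders.

```python
# Minimal sketch of local LLM inference with llama-cpp-python.
# Assumes a quantized GGUF model has already been downloaded; the
# path, thread count, and prompt are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,   # context window size in tokens
    n_threads=4,  # one thread per core on a Raspberry Pi 5
)

result = llm(
    "Q: Why run an LLM on a Raspberry Pi? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents a follow-up question
)
print(result["choices"][0]["text"])
```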
3
© 2024 Useful Sensors
Demo
• What’s the technology behind this code?
• Where can you get models?
• Which models will run efficiently on what hardware?
• How can you customize models?
• What’s coming in the future?
What you need to know
4
© 2024 Useful Sensors
• Llama.cpp was one of the first easy-to-deploy implementations of Meta’s open-weights Llama v1 LLM.
• It didn’t require Python or a lot of dependencies, unlike the
Python code originally released by Meta, and so it became
popular.
• It was also easy to optimize, and so became faster on many
platforms.
• Support was gradually added for other models, and the GGML format emerged to allow models to be exported and imported.
What’s the technology here?
5
© 2024 Useful Sensors
• No! Though Llama.cpp’s scope has expanded over time, it’s still limited in which models
it can support, and is focused on inference rather than training.
• The first generation of ML frameworks tried to be good at everything (TensorFlow more than most), which made them hard to port, optimize, modify, and understand.
• We’re seeing different design goals in this generation. PyTorch is the favorite for prototyping and training, but other tools are used for inference, compression, and fine-tuning.
So it’s like PyTorch or TensorFlow?
6
© 2024 Useful Sensors
• Another library I use a lot is CTransformers. This is similar to GGML, but has more of a focus on quantization and optimization.
• Don’t expect to bring your own model though. A key difference between gen 1
frameworks and these is that they only support a subset of models, and adding new
architectures may involve code changes.
• They also often break compatibility with saved files, requiring reconversion when you
upgrade to a new library version.
Other frameworks
7
© 2024 Useful Sensors
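• For a flavor of the API, here's a hedged sketch of loading a quantized model with CTransformers; the model path and model_type are placeholders, and model_type must name one of the architectures the library supports.

```python
# Sketch of CTransformers usage, assuming the ctransformers package and
# a locally downloaded GGUF/GGML file (the path is a placeholder).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/llama-2-7b-chat.Q4_K_M.gguf",
    model_type="llama",  # must be an architecture the library was built for
)
print(llm("AI on the edge is"))  # returns the generated continuation as a string
```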
Where can you get models?
8
© 2024 Useful Sensors
You can find almost any released model in any format somewhere on Hugging Face; look in the Files section of each model repository.
On Reddit, r/LocalLlama is the
place to find news and advice
on running models, along with
some impressive demos.
• Be aware that most models are “open weights”, but few are “open source”. You can use the pretrained models, but the datasets and training code are usually kept proprietary. The Allen Institute’s OLMo project is a welcome exception.
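• As a sketch, a single quantized file can be fetched programmatically with the huggingface_hub library; the repo and filename below are illustrative examples, so substitute the model you actually want.

```python
# Download one model file from the Hugging Face Hub.
# The repo_id and filename are example placeholders; check a repo's
# Files section for the quantization variants it actually offers.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",  # example community repo
    filename="llama-2-7b-chat.Q4_K_M.gguf",   # pick your quantization level
)
print("Model saved to", path)
```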
• You need a lot of RAM for LLMs, because transformers use dynamic layers constructed in memory. A good rule of thumb is that you need as much RAM as the model file size. For example, a 7-billion-parameter model at eight bits will be 7GB on disk, and you can expect to need at least 7GB of RAM to run it at a decent speed.
• The latency is also usually dominated by the RAM speed, so the faster the better.
• TPUs and other accelerators often don’t help much, since we’re memory bound.
Which models run on what HW?
9
© 2024 Useful Sensors
Rule of thumb: RAM needed = model file size
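• The rule of thumb as a quick back-of-the-envelope calculation; this counts weights only, so treat the result as a floor, since the KV cache and activations add overhead on top.

```python
# Weights-only size estimate behind the "RAM = model file size" rule of thumb.
def estimated_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk size (and minimum RAM) for a model's weights."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

print(estimated_size_gb(7, 8))  # 7B model at 8 bits -> ~7.0 GB
print(estimated_size_gb(7, 4))  # 7B model at 4 bits -> ~3.5 GB
```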
• Running as a regular Android or iOS app is hard because LLMs need a lot more memory and compute than most applications use, and you’ll get throttled or blocked.
• If you have vendor-level access to avoid these limits, Android on a modern SoC is a good
option.
• Otherwise a Raspberry Pi 5 is a good choice: with 8GB of RAM it can handle medium-sized models. Other quad-core A76 SBCs are similar.
• Microcontrollers and DSPs (that is, low-power or low-cost parts) aren’t viable right now because of how RAM-hungry these models are.
What hardware should you use?
10
© 2024 Useful Sensors
• Since all mainstream LLMs are Transformer-based, and Transformer models are memory
bound on batch-size-one inference, the size of the data you pull from memory matters.
• Quantization is an old technique that has become more relevant now that models are memory bound. It takes 32-bit floating point representations of weights and shrinks them down to values that take fewer bits per value. Eight-bit quantization is standard for convolutional image models, but since bandwidth is so critical and unpacking compute can be hidden in memory latency, four-, two-, or even one-bit schemes are now in use.
Quantization
11
© 2024 Useful Sensors
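• To make that concrete, here's a toy symmetric 8-bit quantizer in NumPy; production schemes use per-channel or group-wise scales and pack values below 8 bits, but the size-versus-precision trade is the same.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Toy whole-tensor symmetric quantization of float32 weights to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, "->", q.nbytes, "bytes")  # 4096 -> 1024: 4x less to pull from RAM
print("max error:", np.abs(w - dequantize(q, scale)).max())
```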
• Low-Rank Adaptation (or LoRA) is a technique that’s similar in effect to transfer learning in CNN models. It lets you add small adapter layers to a pretrained model to customize its outputs, with shorter training times and less data than a full training run.
• Here’s an example you can run in a Colab notebook in under an hour:
• http://github.com/ee292d/labs/blob/main/lab6/notebook.ipynb
How can you customize models?
12
© 2024 Useful Sensors
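• For a flavor of what LoRA looks like in code, here's a minimal sketch using Hugging Face's peft library; the base model and target_modules are placeholders, and the lab notebook above is the full worked example with data and a training loop.

```python
# Attach LoRA adapters to a pretrained model with peft (sketch only;
# gpt2 and its "c_attn" projection stand in for whatever model you use).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    target_modules=["c_attn"],  # which weight matrices get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction is trainable
```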
13
© 2024 Useful Sensors
LoRA Training Demo
• The idea is to use conventional search techniques to retrieve factual information and insert it into the prompt as context, so the model’s answer to the user’s question can draw on that knowledge.
• For example, you could notice that a question contains the name of a product, and insert the product description as the context. The model should then be able to use that extra information to give a better answer.
• I hate it!
Retrieval Augmented Generation
14
© 2024 Useful Sensors
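• A toy sketch of the idea; the product table and substring match below stand in for a real search index or embedding lookup, and the assembled prompt would be passed to an LLM like the one in the first sketch.

```python
# Toy RAG: retrieve a vetted snippet, then splice it into the prompt.
PRODUCTS = {
    "widget 3000": "The Widget 3000 is a battery-powered edge AI camera "
                   "with two weeks of standby time.",
}

def retrieve(question: str) -> str:
    """Stand-in retrieval step: match a product name in the question."""
    for name, description in PRODUCTS.items():
        if name in question.lower():
            return description
    return ""

question = "How long does the Widget 3000 last on battery?"
prompt = f"Context: {retrieve(question)}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

• The retrieval-only alternative argued for on the next slide would simply show the retrieved description to the user and skip the generation step entirely.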
• It’s a neat technique, but it’s overkill for most practical situations. The “generation” part means you’re still going to have some situations where the model makes up answers.
• In most cases you can just do a good job on the “retrieval” and show those answers
directly to the user. They’re vetted, relevant, and easy to control. RAG is for when you
need to scale a solution, which isn’t relevant for most applications I encounter.
Why I hate RAG
15
© 2024 Useful Sensors
• Models keep getting smaller and more accurate. Microsoft’s latest Phi-3 is a great example of the trend.
• Transformers are memory-hungry and hard to accelerate. There are lots of alternatives like Mamba and Conformers that offer different tradeoffs; maybe something new will emerge that’s better for the edge.
• Shrinking scope will help us use even smaller models too, especially as I expect retrieval
will be more important than generation long term.
What’s coming next?
16
© 2024 Useful Sensors
• LLMs want to be on the edge!
• Dip your toes in the water with some simple code experiments, and prototype solutions
that make sense to you.
• These models are only going to get faster and more capable, and hardware will emerge
to help with that.
Conclusions
17
© 2024 Useful Sensors
• These slides: usfl.ink/ev_talk
• EE292D Labs: github.com/ee292d
• Intro to GGML: omkar.xyz/intro-ggml
• Hugging Face: huggingface.co
Resources
18
© 2024 Useful Sensors
• We run the latest AI models on edge hardware to solve problems like person detection,
language translation, voice interfaces, LLM querying, and more!
• Come see us at our booth (#806)
Useful Sensors
19
© 2024 Useful Sensors
20
© 2024 Useful Sensors
Thank you