Using Gemma for Flutter apps
First steps with on-device AI in Flutter using the flutter_gemma package
A couple of weeks ago, I had the chance to attend the FlutterNinja 2025 conference in Tokyo as a speaker. As a teacher and staff engineer, I often find myself in situations where I need to speak and teach others, but public speaking still makes me nervous before my talks. However, this time I decided not to get distracted by my nerves and instead spent time in the audience, focusing on learning from the other speakers. It was a great decision because I learned a lot.
Besides all the amazing talks, one speaker's session especially encouraged me to try out and validate the technologies he presented. His talk, titled “Building Flutter Apps with AI Capabilities without Connection, Bills, and Privacy Concerns”, presented a very exciting approach to using AI in mobile apps: running different AI models directly on a user’s device using the flutter_gemma package he published.
This idea motivated me to try his package, and I want to share with you how you can get started as well.
First, what is Gemma?
You have probably heard of Gemini, the powerful AI models that power many Google services. The Gemma family is its sibling, but here’s the key difference: Gemma models are lightweight, open-source, and specifically designed to run locally on mobile devices and laptops.
In this article, we focus on the Gemma 3N model, which offers multimodal AI capabilities (including audio, text, image, and video) for edge devices in 35 languages and supports over 140 languages for text-based tasks.
This on-device capability offers three main benefits for mobile engineers:
- Offline Functionality: AI features in your app work without an internet connection.
- Enhanced Privacy: User data is processed on their device and does not need to be sent to a server.
- No Server Costs: Since the models run locally, you avoid the costs associated with cloud-based API calls.
The flutter_gemma package for on-device AI
The publicly available flutter_gemma package from pub.dev allows you to use these models in your Flutter apps. It integrates Gemma and other popular open-source models, such as DeepSeek, Phi, and Falcon, directly into your iOS, Android, and Web applications.
Accessing the models
Before writing code, we need a model to run. Many models, including Google’s Gemma, are available on platforms like Kaggle or Hugging Face. To download them, you will need an account and a personal access token.
- Create a Hugging Face account: If you don’t already have one, sign up at huggingface.co.
- Generate an access token: Go to your Account Settings -> Access Tokens and create a new token. Make sure to give it read permissions. (The sketch below shows how such a token is typically attached to a model download request.)
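To make the token step concrete, here is a rough sketch of how an access token is typically attached to a model download: it travels in a standard Authorization header, and the large model file is streamed to disk. The downloadModelFile helper and the http / path_provider packages are my own illustrative choices, not part of flutter_gemma.

// Hypothetical helper, not part of flutter_gemma: downloads a (gated) model
// file from Hugging Face using a personal access token.
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

Future<File> downloadModelFile({
  required String modelUrl,
  required String filename,
  required String accessToken,
}) async {
  final client = http.Client();
  try {
    // Gated models require the access token as a standard bearer token.
    final request = http.Request('GET', Uri.parse(modelUrl))
      ..headers['Authorization'] = 'Bearer $accessToken';

    final response = await client.send(request);
    if (response.statusCode != 200) {
      throw Exception('Download failed with HTTP ${response.statusCode}');
    }

    // Model files are large, so stream them straight to disk instead of
    // buffering the whole file in memory.
    final directory = await getApplicationDocumentsDirectory();
    final file = File('${directory.path}/$filename');
    await response.stream.pipe(file.openWrite());
    return file;
  } finally {
    client.close();
  }
}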
You will use this token inside the app to download the models. For instance, we can take a look at the example app flow of the flutter_gemma package:
When you first try to download a model, you might run into an issue. If you see an error in your logs similar to Access to model... is restricted and you are not in the authorized list, the model is "gated", which simply means you need to accept the license terms on the model's webpage to gain access, as shown in the image below:
After successfully downloading the model, we can start interacting with it. In the case of Gemma 3N, beyond the basic text-to-text chat interface, we can also provide images within our conversation. I gave the model an image of a Japanese road sign and asked for a translation. As the screenshots below show, it correctly analyzed the image and explained the sign's meaning.
Did you notice something interesting about this flow? Taking a closer look at the status bar in the screenshots, the entire conversation, including image analysis and text generation, happened completely on-device, without an internet connection. This is the core advantage of using Gemma.
Building an offline menu translator
My trip to Japan gave me the use case for this app. One evening, my eSIM data had just expired, and the restaurant where I wanted to have dinner had no WiFi. I was looking at a menu written only in Japanese, with no way to use an online translation tool. Of course, the staff were incredibly kind and helped explain the options, which is so common in Japan. But it highlighted a different need: what if you’re more introverted, or want the comfort of understanding the menu yourself before asking for help?
This is a perfect problem for on-device AI, which lets you discreetly get a private translation of the menu at your table even if you are offline. So, let’s take this inspiration and build our offline translator with flutter_gemma.
Adding the package to your app
Having explored the example app with our downloaded model, let’s now add flutter_gemma to our own project. In your pubspec.yaml file, add the dependency:
dependencies:
  flutter:
    sdk: flutter
  flutter_gemma: ^0.9.0
Then, run flutter pub get in your terminal to install it. With the setup complete, let’s build our app. The implementation breaks down into three main steps:
1. Download the model
The first step is to get the AI model onto the user’s device. For our translator, which needs to understand both text and images, we use the multimodal-capable Gemma 3N model.
// Gemma 3N model we download in lib/data/gemma_downloader_datasource.dart
DownloadModel(
  modelUrl: 'https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/resolve/main/gemma-3n-E4B-it-int4.task',
  modelFilename: 'gemma-3n-E4B-it-int4.task',
)
When the app opens, it checks whether the model has already been downloaded. If not, it calls downloadModel(), which also reports the download progress, handy for updating a loading indicator in the UI.
// lib/ui/translator_screen.dart
Future<void> downloadModelIfNeeded() async {
  // Check if the model is already on the device.
  final isModelInstalled = await _downloaderDataSource.checkModelExistence();
  if (!isModelInstalled) {
    // If not, start the download and listen to the progress stream.
    await _downloaderDataSource.downloadModel(
      token: accessToken, // Access token for Hugging Face
      onProgress: (progress) {
        setState(() {
          _downloadProgress = progress;
        });
      },
    );
  }
}
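In my sample, this check is triggered as soon as the screen appears. A rough sketch of the wiring (the actual code in the repository may differ slightly):

@override
void initState() {
  super.initState();
  // Check for (and, if needed, download) the model first,
  // then prepare the chat once it is available.
  downloadModelIfNeeded().then((_) => initializeChat());
}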
Important: For production apps, you should always download models from the network at runtime; the documentation warns against embedding model files directly in your app’s assets for production use. It is also worth considering the ModelFileManager provided by flutter_gemma, which can replace the custom download logic I wrote for this sample.
2. Create a chat instance
Once the Gemma 3N model is on the device, we can prepare it for conversation. This involves creating an InferenceModel from the downloaded file and then creating a Chat instance from that model. The Chat object is what will handle our conversational context.
// lib/ui/translator_screen.dart
Future<void> initializeChat() async {
  final gemma = FlutterGemmaPlugin.instance;
  // Load the downloaded Gemma 3N model.
  _inferenceModel = await gemma.createModel(
    modelType: ModelType.gemmaIt,
    supportImage: true, // Enable image support.
    maxTokens: 2048,
  );
  // Create a chat session from the loaded model.
  _chat = await _inferenceModel!.createChat(supportImage: true);
}
3. Generate a response
With the chat instance ready, the app can now send prompts and receive responses. When the user submits an image of a menu, we package it into a Message object, send it to the chat, and listen to a stream for Gemma’s reply. This creates a real-time "typing" effect.
// Simplified implementation in lib/ui/translator_screen.dart
Future<void> _sendMessage() async {
  // Package the text and image into a single Message object.
  final userMessage = Message.withImage(
    text: text.isNotEmpty ? text : "Please translate this menu into English.",
    imageBytes: _selectedImageBytes,
    isUser: true,
  );
  // Add the user's message to the UI.
  setState(() {
    _messages.add(userMessage);
  });
  try {
    // Send the user's message to the chat.
    await _chat!.addQueryChunk(userMessage);
    // Listen for Gemma's response as a stream of text tokens.
    final responseStream = _chat!.generateChatResponseAsync();
    await for (final token in responseStream) {
      // Append the incoming token to the message in the UI.
    }
  } catch (e) {
    // Handle any errors during response generation.
  }
}
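The body of that await for loop is where the "typing" effect comes from. Below is a minimal sketch of one way to fill it in, assuming _messages is a plain List<Message> held in the widget's state and that the package's Message class can also be constructed from plain text (mirroring the Message.withImage constructor used above):

// Sketch: accumulate streamed tokens into a single model message.
var responseText = '';
// Add an empty placeholder message for Gemma's reply.
setState(() => _messages.add(Message(text: '', isUser: false)));

await for (final token in responseStream) {
  responseText += token;
  setState(() {
    // Replace the placeholder with the text received so far.
    _messages[_messages.length - 1] =
        Message(text: responseText, isUser: false);
  });
}

Rebuilding on every token is fine for a proof of concept; for longer responses you may want to throttle how often you call setState.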
Menu translator flow:
Next steps
Our translator app is a great POC, but it’s just the starting point. If we want to take this project further and make it production-ready, we can consider the following next steps:
- Add proper state management: The current implementation purposely uses a basic StatefulWidget, but to scale it we will need to introduce either BLoC or Riverpod to separate the UI from the business logic.
- Secure our API key: We should store our Hugging Face access token securely, for example via compile-time environment variables (--dart-define) or a dedicated secrets solution, rather than hardcoding it (see the sketch after this list).
- Simplify the model downloading: To make our code even cleaner, we can remove our custom GemmaDownloaderDataSource. The flutter_gemma package has a great built-in ModelFileManager that handles downloading perfectly.
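For the access token specifically, a compile-time define is a simple first step. A minimal sketch (the HF_ACCESS_TOKEN name is just an example, not something the package defines):

// Run the app with the token supplied at build time:
//   flutter run --dart-define=HF_ACCESS_TOKEN=hf_your_token_here
// Then read it in Dart instead of hardcoding the value:
const accessToken = String.fromEnvironment('HF_ACCESS_TOKEN');

Keep in mind that values passed via --dart-define still end up in the compiled app, so this mainly protects against committing the token to version control rather than against determined extraction.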
Additional advanced capabilities
For those who want to dive deeper, we could explore the package’s support for:
- LoRA (Low-Rank Adaptation) allows for efficient customization of the model’s behavior without needing to retrain the entire model. Although this might be beyond the scope of a “getting started” article, it’s useful to know it’s available when you need more specialized behavior, such as translating culinary terms or adopting a specific formal or informal tone.
Important: the flutter_gemma package is under active development. As of now, the multimodal features we've used (combining text and image input) are fully supported on both iOS and Android. Support for other modalities, like audio and video, and enhanced multimodal support for the Web platform are on the roadmap. As this is an open-source project, all contributions are welcome to help bring these exciting features to our users' devices even faster.
Conclusion
This journey started with a common travel problem — being stuck in a restaurant without a translation — and ended with a fully functional, offline AI-powered solution. We saw how the flutter_gemma package provides a surprisingly easy way to interact with Gemma and solve that problem.
With the Gemma 3N model running locally on our device, we created a private, offline-capable tool that would have been incredibly complex to build just a few years ago. The tools for on-device AI are more accessible than ever, and I’m excited to see what real-world problems you will solve with them.
As this is my first Medium article, your feedback is incredibly valuable. If you found it helpful, please consider sharing it with your teammates or leaving a comment. Thank you for reading! 🙌
📚Resources and further reading
- Repository of the Offline Menu Translator: https://github.com/gerfalcon/offline_menu_translator
- Gemma Official Site: https://deepmind.google/models/gemma/
- Gemma technical blog post: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
- Gemma 3N Model Overview: https://ai.google.dev/gemma/docs/gemma-3n
- Download the Gemma models on Hugging Face and Kaggle.