Using Gemma for Flutter apps
First steps with on-device AI in Flutter using the flutter_gemma package
A couple of weeks ago, I had the chance to attend the FlutterNinja 2025 conference in Tokyo as a speaker. As a teacher and staff engineer, I often find myself in situations where I need to speak and teach others, but public speaking still makes me nervous before my talks. However, this time I decided not to get distracted by my nerves and instead spent time in the audience, focusing on learning from the other speakers. It was a great decision because I learned a lot.
Besides all the amazing talks, one speaker's session especially encouraged me to try out and validate the technologies he presented. His talk, titled “Building Flutter Apps with AI Capabilities without Connection, Bills, and Privacy Concerns”, presented a very exciting approach to using AI in mobile apps: running different AI models directly on a user’s device using the flutter_gemma package he published.
This idea motivated me to try his package, and I want to share with you how you can get started as well.
First, what is Gemma?
You have probably heard of Gemini, the powerful AI models that power many Google services. The Gemma family is its sibling, but here’s the key difference: Gemma models are lightweight, open-source, and specifically designed to run locally on mobile devices and laptops.
In this article, we focus on the Gemma 3N model, which offers multimodal AI capabilities (including audio, text, image, and video) for edge devices in 35 languages and supports over 140 languages for text-based tasks.
This on-device capability offers three main benefits for mobile engineers:
- Offline Functionality: AI features in your app work without an internet connection.
- Enhanced Privacy: User data is processed on their device and does not need to be sent to a server.
- No Server Costs: Since the models run locally, you avoid the costs associated with cloud-based API calls.
The flutter_gemma package for on-device AI
The publicly available flutter_gemma package from pub.dev allows you to use these models in your Flutter apps. It integrates Gemma and other popular open-source models, such as DeepSeek, Phi, and Falcon, directly into your iOS, Android, and Web applications.
Accessing the models
Before writing code, we need a model to run. Many models, including Google’s Gemma, are available on platforms like Kaggle or Hugging Face. To download them, you will need an account and a personal access token.
- Create a Hugging Face account: If you don’t already have one, sign up at huggingface.co.
- Generate an access token: Go to your Account Settings -> Access Tokens and create a new token. Make sure to give it read permissions. (The sketch below shows how such a token is typically attached to a model download request.)
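To make the token step concrete, here is a rough sketch of how an access token is typically attached to a model download: it travels in a standard Authorization header, and the large model file is streamed to disk. The downloadModelFile helper and the http / path_provider packages are my own illustrative choices, not part of flutter_gemma.

// Hypothetical helper, not part of flutter_gemma: downloads a (gated) model
// file from Hugging Face using a personal access token.
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

Future<File> downloadModelFile({
  required String modelUrl,
  required String filename,
  required String accessToken,
}) async {
  final client = http.Client();
  try {
    // Gated models require the access token as a standard bearer token.
    final request = http.Request('GET', Uri.parse(modelUrl))
      ..headers['Authorization'] = 'Bearer $accessToken';

    final response = await client.send(request);
    if (response.statusCode != 200) {
      throw Exception('Download failed with HTTP ${response.statusCode}');
    }

    // Model files are large, so stream them straight to disk instead of
    // buffering the whole file in memory.
    final directory = await getApplicationDocumentsDirectory();
    final file = File('${directory.path}/$filename');
    await response.stream.pipe(file.openWrite());
    return file;
  } finally {
    client.close();
  }
}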
You will use this token inside the app to download the models. For instance, we can take a look at the example app flow of the flutter_gemma package:
When you first try to download a model, you might run into an issue. If you see an error in your logs similar to Access to model... is restricted and you are not in the authorized list, the model is "gated", which simply means you need to accept the license terms on the model's webpage to gain access, as shown in the image below:
After successfully downloading the model, we can start interacting with it. In the case of Gemma 3N, beyond the basic text-to-text chat interface, we can also provide images within our conversation. I gave the model an image of a Japanese road sign and asked for a translation. As the screenshots below show, it correctly analyzed the image and explained the sign's meaning.
Did you notice something interesting about this flow? Taking a closer look at the status bar in the screenshots, the entire conversation, including image analysis and text generation, happened completely on-device, without an internet connection. This is the core advantage of using Gemma.
Building an offline menu translator
My trip to Japan gave me the use case for this app. One evening, my eSIM data had just expired, and the restaurant where I wanted to have dinner had no WiFi. I was looking at a menu written only in Japanese, with no way to use an online translation tool. Of course, the staff were incredibly kind and helped explain the options, which is so common in Japan. But it highlighted a different need: what if you’re more introverted, or want the comfort of understanding the menu yourself before asking for help?
This is a perfect problem for on-device AI, which lets you discreetly get a private translation of the menu at your table even if you are offline. So, let’s take this inspiration and build our offline translator with flutter_gemma.
Adding the package to your app
Having explored the example app with our downloaded model, let’s now add flutter_gemma to our own project. In your pubspec.yaml file, add the dependency:
dependencies:
  flutter:
    sdk: flutter
  flutter_gemma: ^0.9.0
Then, run flutter pub get in your terminal to install it. With the setup complete, let’s build our app. The implementation breaks down into three main steps:
1. Download the model
The first step is to get the AI model onto the user’s device. For our translator, which needs to understand both text and images, we use the multimodal-capable Gemma 3N model.
// Gemma 3N model we download in lib/data/gemma_downloader_datasource.dart
DownloadModel(
  modelUrl: 'https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/resolve/main/gemma-3n-E4B-it-int4.task',
  modelFilename: 'gemma-3n-E4B-it-int4.task',
)
When the app opens, it checks whether the model has already been downloaded. If not, it calls downloadModel(), which also reports the download progress, handy for updating a loading indicator in the UI.
// lib/ui/translator_screen.dart
Future<void> downloadModelIfNeeded() async {
  // Check if the model is already on the device.
  final isModelInstalled = await _downloaderDataSource.checkModelExistence();
  if (!isModelInstalled) {
    // If not, start the download and listen to the progress stream.
    await _downloaderDataSource.downloadModel(
      token: accessToken, // Access token for Hugging Face
      onProgress: (progress) {
        setState(() {
          _downloadProgress = progress;
        });
      },
    );
  }
}
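In my sample, this check is triggered as soon as the screen appears. A rough sketch of the wiring (the actual code in the repository may differ slightly):

@override
void initState() {
  super.initState();
  // Check for (and, if needed, download) the model first,
  // then prepare the chat once it is available.
  downloadModelIfNeeded().then((_) => initializeChat());
}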
Important: For production apps, you should always download models from the network at runtime; the documentation warns against embedding model files directly in your app’s assets for production use. It is also worth considering the ModelFileManager provided by flutter_gemma, which can replace the custom download logic I wrote for this sample.
2. Create a chat instance
Once the Gemma 3N model is on the device, we can prepare it for conversation. This involves creating an InferenceModel from the downloaded file and then creating a Chat instance from that model. The Chat object is what will handle our conversational context.
// lib/ui/translator_screen.dart
Future<void> initializeChat() async {
  final gemma = FlutterGemmaPlugin.instance;
  // Load the downloaded Gemma 3N model.
  _inferenceModel = await gemma.createModel(
    modelType: ModelType.gemmaIt,
    supportImage: true, // Enable image support.
    maxTokens: 2048,
  );
  // Create a chat session from the loaded model.
  _chat = await _inferenceModel!.createChat(supportImage: true);
}
3. Generate a response
With the chat instance ready, the app can now send prompts and receive responses. When the user submits an image of a menu, we package it into a Message object, send it to the chat, and listen to a stream for Gemma’s reply. This creates a real-time "typing" effect.
// Simplified implementation in lib/ui/translator_screen.dart
Future<void> _sendMessage() async {
  // Package the text and image into a single Message object.
  final userMessage = Message.withImage(
    text: text.isNotEmpty ? text : "Please translate this menu into English.",
    imageBytes: _selectedImageBytes,
    isUser: true,
  );
  // Add the user's message to the UI.
  setState(() {
    _messages.add(userMessage);
  });
  try {
    // Send the user's message to the chat.
    await _chat!.addQueryChunk(userMessage);
    // Listen for Gemma's response as a stream of text tokens.
    final responseStream = _chat!.generateChatResponseAsync();
    await for (final token in responseStream) {
      // Append the incoming token to the message in the UI.
    }
  } catch (e) {
    // Handle any errors during response generation.
  }
}
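The body of that await for loop is where the "typing" effect comes from. Below is a minimal sketch of one way to fill it in, assuming _messages is a plain List<Message> held in the widget's state and that the package's Message class can also be constructed from plain text (mirroring the Message.withImage constructor used above):

// Sketch: accumulate streamed tokens into a single model message.
var responseText = '';
// Add an empty placeholder message for Gemma's reply.
setState(() => _messages.add(Message(text: '', isUser: false)));

await for (final token in responseStream) {
  responseText += token;
  setState(() {
    // Replace the placeholder with the text received so far.
    _messages[_messages.length - 1] =
        Message(text: responseText, isUser: false);
  });
}

Rebuilding on every token is fine for a proof of concept; for longer responses you may want to throttle how often you call setState.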
Menu translator flow:
Next steps
Our translator app is a great POC, but it’s just the starting point. If we want to take this project further and make it production-ready, we can consider the following next steps:
- Add proper state management: The current implementation purposely uses a basic StatefulWidget, but to scale it we will need to introduce either BLoC or Riverpod to separate the UI from the business logic.
- Secure our API key: We should store our Hugging Face access token securely, for example via compile-time environment variables (--dart-define) or a dedicated secrets solution, rather than hardcoding it (see the sketch after this list).
- Simplify the model downloading: To make our code even cleaner, we can remove our custom GemmaDownloaderDataSource. The flutter_gemma package has a great built-in ModelFileManager that handles downloading perfectly.
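For the access token specifically, a compile-time define is a simple first step. A minimal sketch (the HF_ACCESS_TOKEN name is just an example, not something the package defines):

// Run the app with the token supplied at build time:
//   flutter run --dart-define=HF_ACCESS_TOKEN=hf_your_token_here
// Then read it in Dart instead of hardcoding the value:
const accessToken = String.fromEnvironment('HF_ACCESS_TOKEN');

Keep in mind that values passed via --dart-define still end up in the compiled app, so this mainly protects against committing the token to version control rather than against determined extraction.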
Additional advanced capabilities
For those who want to dive deeper, we could explore the package’s support for:
- LoRA (Low-Rank Adaptation) allows for efficient customization of the model’s behavior without needing to retrain the entire model. Although this might be beyond the scope of a “getting started” article, it’s useful to know it’s available when you need more specialized behavior, such as translating culinary terms or adopting a specific formal or informal tone.
Important: the flutter_gemma package is under active development. As of now, the multimodal features we've used (combining text and image input) are fully supported on both iOS and Android. Support for other modalities, like audio and video, and enhanced multimodal support for the Web platform are on the roadmap. As this is an open-source project, all contributions are welcome to help bring these exciting features to our users' devices even faster.
Conclusion
This journey started with a common travel problem — being stuck in a restaurant without a translation — and ended with a fully functional, offline AI-powered solution. We saw how the flutter_gemma package provides a surprisingly easy way to interact with Gemma and solve that problem.
With the Gemma 3N model running locally on our device, we created a private, offline-capable tool that would have been incredibly complex to build just a few years ago. The tools for on-device AI are more accessible than ever, and I’m excited to see what real-world problems you will solve with them.
As this is my first Medium article, your feedback is incredibly valuable. If you found it helpful, please consider sharing it with your teammates or leaving a comment. Thank you for reading! 🙌
📚Resources and further reading
- Repository of the Offline Menu Translator: https://github.com/gerfalcon/offline_menu_translator
- Gemma Official Site: https://deepmind.google/models/gemma/
- Gemma technical blog post: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
- Gemma 3N Model Overview: https://ai.google.dev/gemma/docs/gemma-3n
- Download the Gemma models on Hugging Face and Kaggle.