You can ask a Gemini model to generate and edit images using both text-only prompts and text-and-image prompts. When you use Firebase AI Logic, you can make this request directly from your app.
With this capability, you can do things like:
- Iteratively generate images through natural-language conversation, adjusting images while maintaining consistency and context.
- Generate images with high-quality text rendering, including long strings of text.
- Generate interleaved text and images, for example a blog post with text and images in a single turn. Previously, this required stringing together multiple models.
- Generate images using Gemini's world knowledge and reasoning capabilities.
You can find a complete list of supported modalities and capabilities (along with example prompts) later on this page.
For image output, you must use the Gemini model gemini-2.0-flash-preview-image-generation and include responseModalities: ["TEXT", "IMAGE"] in your model configuration.
See other guides for more options for working with images: Analyze images, Analyze images on-device, Generate structured output.
Choose between Gemini and Imagen models
The Firebase AI Logic SDKs support image generation using either a Gemini model or an Imagen model. For most use cases, start with Gemini, and then choose Imagen for specialized tasks where image quality is critical.
Note that the Firebase AI Logic SDKs do not yet support image input (for example, for editing) with Imagen models. So, if you want to work with input images, use a Gemini model instead.
Choose Gemini when you want to:
- use world knowledge and reasoning to generate contextually relevant images.
- seamlessly blend text and images.
- embed accurate visuals within long text sequences.
- edit images conversationally while maintaining context.
Choose Imagen when you want to:
- prioritize image quality, photorealism, artistic detail, or specific styles (for example, impressionism or anime).
- explicitly specify the format of generated images.
Before you begin
Click your Gemini API provider to view provider-specific content and code on this page.
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.
For testing and iterating on your prompts, and even getting a generated code snippet, we recommend using Google AI Studio.
Models that support this capability
Image output from Gemini is supported only by gemini-2.0-flash-preview-image-generation (not by gemini-2.0-flash).
Note that the SDKs also support image generation using Imagen models.
Generate and edit images
You can generate and edit images using a Gemini model.
Generate images (text-only input)
Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can ask a Gemini model to generate images by prompting with text.
Create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.
Swift
import FirebaseAI
import UIKit
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [.text, .image])
)
// Provide a text prompt instructing the model to generate an image
let prompt = "Generate an image of the Eiffel tower with fireworks in the background."
// To generate an image, call `generateContent` with the text input
let response = try await model.generateContent(prompt)
// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
fatalError("Failed to convert data to UIImage.")
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
modelName = "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)
// Provide a text prompt instructing the model to generate an image
val prompt = "Generate an image of the Eiffel tower with fireworks in the background."
// To generate image output, call `generateContent` with the text input,
// then take the first image part of the first candidate as a Bitmap
val generatedImageAsBitmap = model.generateContent(prompt)
.candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }
Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
"gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
new GenerationConfig.Builder()
.setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
.build()
);
GenerativeModelFutures model = GenerativeModelFutures.from(ai);
// Provide a text prompt instructing the model to generate an image
Content prompt = new Content.Builder()
.addText("Generate an image of the Eiffel Tower with fireworks in the background.")
.build();
// To generate an image, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
@Override
public void onSuccess(GenerateContentResponse result) {
// iterate over all the parts in the first candidate in the result object
for (Part part : result.getCandidates().get(0).getContent().getParts()) {
if (part instanceof ImagePart) {
ImagePart imagePart = (ImagePart) part;
// The returned image as a bitmap
Bitmap generatedImageAsBitmap = imagePart.getImage();
break;
}
}
}
@Override
public void onFailure(Throwable t) {
t.printStackTrace();
}
}, executor);
Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
// ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
model: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: {
responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
},
});
// Provide a text prompt instructing the model to generate an image
const prompt = 'Generate an image of the Eiffel Tower with fireworks in the background.';
// To generate an image, call `generateContent` with the text input
const result = await model.generateContent(prompt);
// Handle the generated image
try {
const inlineDataParts = result.response.inlineDataParts();
if (inlineDataParts?.[0]) {
const image = inlineDataParts[0].inlineData;
console.log(image.mimeType, image.data);
}
} catch (err) {
console.error('Prompt or candidate was blocked:', err);
}
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
await Firebase.initializeApp(
options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.0-flash-preview-image-generation',
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);
// Provide a text prompt instructing the model to generate an image
final prompt = [Content.text('Generate an image of the Eiffel Tower with fireworks in the background.')];
// To generate an image, call `generateContent` with the text input
final response = await model.generateContent(prompt);
if (response.inlineDataParts.isNotEmpty) {
final imageBytes = response.inlineDataParts[0].bytes;
// Process the image
} else {
// Handle the case where no images were generated
print('Error: No images were generated.');
}
Unity
using System.Linq;
using Firebase;
using Firebase.AI;
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: new GenerationConfig(
responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);
// Provide a text prompt instructing the model to generate an image
var prompt = "Generate an image of the Eiffel Tower with fireworks in the background.";
// To generate an image, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);
var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
// Do something with the text
}
// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
.OfType<ModelContent.InlineDataPart>()
.Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
// Load the Image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imagePart.Data.ToArray())) {
// Do something with the image
}
}
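The Web sample above only logs `image.mimeType` and `image.data`. To show or store the result, you can convert that inline data. The sketch below is plain JavaScript with no SDK calls; the helper names are hypothetical, and it assumes `inlineData.data` is a base64-encoded string (the same encoding the Web editing sample uses for input images). It accepts any `image/*` MIME type rather than hard-coding `image/png`.

```javascript
// Hypothetical helpers (not SDK functions) for inline image data shaped like
// { mimeType, data }, where `data` is assumed to be base64-encoded.

// Collect the inline data of every image part in the first candidate.
function extractImageParts(response) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => part.inlineData?.mimeType?.startsWith("image/"))
    .map((part) => part.inlineData);
}

// Build a data URL that can be assigned to an <img> element's src attribute.
function toDataUrl(inlineData) {
  return `data:${inlineData.mimeType};base64,${inlineData.data}`;
}

// Decode the base64 payload into raw bytes, for example to upload or save the image.
function toBytes(inlineData) {
  const binary = atob(inlineData.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

For example, `const [image] = extractImageParts(result.response); if (image) imgElement.src = toDataUrl(image);`, where `imgElement` is a placeholder for an `<img>` element in your page.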
Generate interleaved text and images
You can ask a Gemini model to generate images interleaved with its text responses. For example, you can generate images of what each step of a generated recipe might look like, along with that step's instructions, without making separate requests to the model or to different models.
Create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.
Swift
import FirebaseAI
import UIKit
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [.text, .image])
)
// Provide a text prompt instructing the model to generate interleaved text and images
let prompt = """
Generate an illustrated recipe for a paella.
Create images to go alongside the text as you generate the recipe
"""
// To generate interleaved text and images, call `generateContent` with the text input
let response = try await model.generateContent(prompt)
// Handle the generated text and image
guard let candidate = response.candidates.first else {
fatalError("No candidates in response.")
}
for part in candidate.content.parts {
switch part {
case let textPart as TextPart:
// Do something with the generated text
let text = textPart.text
case let inlineDataPart as InlineDataPart:
// Do something with the generated image
guard let uiImage = UIImage(data: inlineDataPart.data) else {
fatalError("Failed to convert data to UIImage.")
}
default:
fatalError("Unsupported part type: \(part)")
}
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
modelName = "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)
// Provide a text prompt instructing the model to generate interleaved text and images
val prompt = """
Generate an illustrated recipe for a paella.
Create images to go alongside the text as you generate the recipe
""".trimIndent()
// To generate interleaved text and images, call `generateContent` with the text input
val responseContent = model.generateContent(prompt).candidates.first().content
// The response will contain image and text parts interleaved
for (part in responseContent.parts) {
when (part) {
is ImagePart -> {
// ImagePart as a bitmap
val generatedImageAsBitmap: Bitmap? = part.asImageOrNull()
}
is TextPart -> {
// Text content from the TextPart
val text = part.text
}
}
}
Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
"gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
new GenerationConfig.Builder()
.setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
.build()
);
GenerativeModelFutures model = GenerativeModelFutures.from(ai);
// Provide a text prompt instructing the model to generate interleaved text and images
Content prompt = new Content.Builder()
.addText("Generate an illustrated recipe for a paella.\n" +
"Create images to go alongside the text as you generate the recipe")
.build();
// To generate interleaved text and images, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
@Override
public void onSuccess(GenerateContentResponse result) {
Content responseContent = result.getCandidates().get(0).getContent();
// The response will contain image and text parts interleaved
for (Part part : responseContent.getParts()) {
if (part instanceof ImagePart) {
// ImagePart as a bitmap
Bitmap generatedImageAsBitmap = ((ImagePart) part).getImage();
} else if (part instanceof TextPart){
// Text content from the TextPart
String text = ((TextPart) part).getText();
}
}
}
@Override
public void onFailure(Throwable t) {
System.err.println(t);
}
}, executor);
Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
// ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
model: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: {
responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
},
});
// Provide a text prompt instructing the model to generate interleaved text and images
const prompt = 'Generate an illustrated recipe for a paella.\n' +
'Create images to go alongside the text as you generate the recipe';
// To generate interleaved text and images, call `generateContent` with the text input
const result = await model.generateContent(prompt);
// Handle the generated text and image
try {
const response = result.response;
if (response.candidates?.[0].content?.parts) {
for (const part of response.candidates?.[0].content?.parts) {
if (part.text) {
// Do something with the text
console.log(part.text)
}
if (part.inlineData) {
// Do something with the image
const image = part.inlineData;
console.log(image.mimeType, image.data);
}
}
}
} catch (err) {
console.error('Prompt or candidate was blocked:', err);
}
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
await Firebase.initializeApp(
options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.0-flash-preview-image-generation',
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);
// Provide a text prompt instructing the model to generate interleaved text and images
final prompt = [Content.text(
'Generate an illustrated recipe for a paella.\n' +
'Create images to go alongside the text as you generate the recipe'
)];
// To generate interleaved text and images, call `generateContent` with the text input
final response = await model.generateContent(prompt);
// Handle the generated text and image
final parts = response.candidates.firstOrNull?.content.parts ?? [];
if (parts.isNotEmpty) {
for (final part in parts) {
if (part is TextPart) {
// Do something with text part
final text = part.text;
}
if (part is InlineDataPart) {
// Process image
final imageBytes = part.bytes;
}
}
} else {
// Handle the case where no images were generated
print('Error: No images were generated.');
}
Unity
using System.Linq;
using Firebase;
using Firebase.AI;
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: new GenerationConfig(
responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);
// Provide a text prompt instructing the model to generate interleaved text and images
var prompt = "Generate an illustrated recipe for a paella \n" +
"Create images to go alongside the text as you generate the recipe";
// To generate interleaved text and images, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);
// Handle the generated text and image
foreach (var part in response.Candidates.First().Content.Parts) {
if (part is ModelContent.TextPart textPart) {
if (!string.IsNullOrWhiteSpace(textPart.Text)) {
// Do something with the text
}
} else if (part is ModelContent.InlineDataPart dataPart) {
if (dataPart.MimeType == "image/png") {
// Load the Image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(dataPart.Data.ToArray())) {
// Do something with the image
}
}
}
}
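Because interleaved output arrives as an ordered list of text and image parts, a common next step is to turn it into render-ready segments, for example the sections of the illustrated recipe. A minimal sketch in JavaScript, assuming each part carries either `text` or `inlineData` (`{ mimeType, data }` with base64 `data`) as in the Web sample above; `toSegments` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: converts the interleaved parts of one response into an
// ordered list of { kind: "text", text } and { kind: "image", src } segments.
function toSegments(response) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const segments = [];
  for (const part of parts) {
    if (typeof part.text === "string") {
      const last = segments[segments.length - 1];
      // Merge consecutive text parts so a paragraph split across parts renders as one block.
      if (last?.kind === "text") {
        last.text += part.text;
      } else {
        segments.push({ kind: "text", text: part.text });
      }
    } else if (part.inlineData?.mimeType?.startsWith("image/")) {
      // Keep the image at the position where the model placed it.
      segments.push({
        kind: "image",
        src: `data:${part.inlineData.mimeType};base64,${part.inlineData.data}`,
      });
    }
  }
  return segments;
}
```

Merging adjacent text parts is a design choice: it keeps a paragraph together if the model splits it across parts, while each image stays between the pieces of text it was generated with.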
Edit images (text-and-image input)
You can ask a Gemini model to edit images by prompting with text and one or more images.
Create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.
Swift
import FirebaseAI
import UIKit
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [.text, .image])
)
// Provide an image for the model to edit
guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }
// Provide a text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"
// To edit the image, call `generateContent` with the image and text input
let response = try await model.generateContent(image, prompt)
// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
fatalError("Failed to convert data to UIImage.")
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
modelName = "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)
// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)
// Provide a text prompt instructing the model to edit the image
val prompt = content {
image(bitmap)
text("Edit this image to make it look like a cartoon")
}
// To edit the image, call `generateContent` with the prompt (image and text input),
// then take the first image part of the first candidate as a Bitmap
val generatedImageAsBitmap = model.generateContent(prompt)
.candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }
Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
"gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
new GenerationConfig.Builder()
.setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
.build()
);
GenerativeModelFutures model = GenerativeModelFutures.from(ai);
// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);
// Provide a text prompt instructing the model to edit the image
Content promptcontent = new Content.Builder()
.addImage(bitmap)
.addText("Edit this image to make it look like a cartoon")
.build();
// To edit the image, call `generateContent` with the prompt (image and text input)
ListenableFuture<GenerateContentResponse> response = model.generateContent(promptcontent);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
@Override
public void onSuccess(GenerateContentResponse result) {
// iterate over all the parts in the first candidate in the result object
for (Part part : result.getCandidates().get(0).getContent().getParts()) {
if (part instanceof ImagePart) {
ImagePart imagePart = (ImagePart) part;
Bitmap generatedImageAsBitmap = imagePart.getImage();
break;
}
}
}
@Override
public void onFailure(Throwable t) {
t.printStackTrace();
}
}, executor);
Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
// ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
model: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: {
responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
},
});
// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
const base64EncodedDataPromise = new Promise((resolve) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result.split(',')[1]);
reader.readAsDataURL(file);
});
return {
inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
};
}
// Provide a text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";
const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);
// To edit the image, call `generateContent` with the image and text input
const result = await model.generateContent([prompt, imagePart]);
// Handle the generated image
try {
const inlineDataParts = result.response.inlineDataParts();
if (inlineDataParts?.[0]) {
const image = inlineDataParts[0].inlineData;
console.log(image.mimeType, image.data);
}
} catch (err) {
console.error('Prompt or candidate was blocked:', err);
}
Dart
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
await Firebase.initializeApp(
options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.0-flash-preview-image-generation',
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);
// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);
// Provide a text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");
// To edit the image, call `generateContent` with the image and text input
final response = await model.generateContent([
Content.multi([prompt,imagePart])
]);
// Handle the generated image
if (response.inlineDataParts.isNotEmpty) {
final imageBytes = response.inlineDataParts[0].bytes;
// Process the image
} else {
// Handle the case where no images were generated
print('Error: No images were generated.');
}
Unity
using System.Linq;
using Firebase;
using Firebase.AI;
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: new GenerationConfig(
responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);
// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);
// Provide a text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");
// To edit the image, call `GenerateContentAsync` with the image and text input
var response = await model.GenerateContentAsync(new [] { prompt, image });
var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
// Do something with the text
}
// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
.OfType<ModelContent.InlineDataPart>()
.Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
// Load the Image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imagePart.Data.ToArray())) {
// Do something with the image
}
}
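The Web editing sample reads the input image with `FileReader`, which exists only in browsers. If you already have the image as bytes (for example, from `fetch()`), you can build the same `{ inlineData: { data, mimeType } }` part directly. This is a sketch under that assumption; `bytesToGenerativePart` is a hypothetical helper name, and it relies on `btoa` being available (it is in browsers and in recent Node.js versions).

```javascript
// Hypothetical helper: builds an image part with the same shape that
// fileToGenerativePart returns in the Web sample, starting from a Uint8Array.
function bytesToGenerativePart(bytes, mimeType) {
  // Build the binary string in chunks so String.fromCharCode is never called
  // with more arguments than the engine allows for large images.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  // Base64-encode the whole binary string once.
  return { inlineData: { data: btoa(binary), mimeType } };
}
```

For example, `const bytes = new Uint8Array(await (await fetch(imageUrl)).arrayBuffer()); const imagePart = bytesToGenerativePart(bytes, "image/jpeg"); const result = await model.generateContent([prompt, imagePart]);`, where `imageUrl` is a placeholder.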
Iterate and edit images using multi-turn chat
Using multi-turn chat, you can iterate with a Gemini model on the images that it generates or that you supply.
Create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call startChat() and sendMessage() to send new user messages.
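Follow-up messages can omit the image because the chat keeps the conversation history and sends it along with each new message. The SDK's chat object manages this for you; the JavaScript sketch below only illustrates the idea with plain objects and is not the SDK's implementation. `ChatHistorySketch`, `buildRequest`, and `record` are hypothetical names.

```javascript
// Conceptual sketch (hypothetical, not the SDK's chat implementation):
// every request carries all earlier turns plus the new user message.
class ChatHistorySketch {
  constructor() {
    this.history = [];
  }

  // Contents that would be sent for a new user message.
  buildRequest(parts) {
    return [...this.history, { role: "user", parts }];
  }

  // After the model replies, keep both turns so later requests include them.
  record(userParts, modelParts) {
    this.history.push({ role: "user", parts: userParts }, { role: "model", parts: modelParts });
  }
}
```

With this model of a chat, a follow-up such as "But make it old-school line drawing style" still reaches the model together with the earlier turn that contained the original image.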
Swift
import FirebaseAI
import UIKit
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [.text, .image])
)
// Initialize the chat
let chat = model.startChat()
guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }
// Provide an initial text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"
// To generate an initial response, send a user message with the image and text prompt
let response = try await chat.sendMessage(image, prompt)
// Inspect the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
fatalError("Failed to convert data to UIImage.")
}
// Follow up requests do not need to specify the image again
let followUpResponse = try await chat.sendMessage("But make it old-school line drawing style")
// Inspect the edited image after the follow up request
guard let followUpInlineDataPart = followUpResponse.inlineDataParts.first else {
fatalError("No image data in response.")
}
guard let followUpUIImage = UIImage(data: followUpInlineDataPart.data) else {
fatalError("Failed to convert data to UIImage.")
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
modelName = "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig = generationConfig {
responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE) }
)
// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)
// Create the initial prompt instructing the model to edit the image
val prompt = content {
image(bitmap)
text("Edit this image to make it look like a cartoon")
}
// Initialize the chat
val chat = model.startChat()
// To generate an initial response, send a user message with the image and text prompt
var response = chat.sendMessage(prompt)
// Inspect the returned image
var generatedImageAsBitmap = response
.candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }
// Follow up requests do not need to specify the image again
response = chat.sendMessage("But make it old-school line drawing style")
generatedImageAsBitmap = response
.candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }
Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
"gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
new GenerationConfig.Builder()
.setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
.build()
);
GenerativeModelFutures model = GenerativeModelFutures.from(ai);
// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);
// Initialize the chat
ChatFutures chat = model.startChat();
// Create the initial prompt instructing the model to edit the image
Content prompt = new Content.Builder()
.setRole("user")
.addImage(bitmap)
.addText("Edit this image to make it look like a cartoon")
.build();
// To generate an initial response, send a user message with the image and text prompt
ListenableFuture<GenerateContentResponse> response = chat.sendMessage(prompt);
// Extract the image from the initial response
ListenableFuture<@Nullable Bitmap> initialRequest = Futures.transform(response, result -> {
for (Part part : result.getCandidates().get(0).getContent().getParts()) {
if (part instanceof ImagePart) {
ImagePart imagePart = (ImagePart) part;
return imagePart.getImage();
}
}
return null;
}, executor);
// Follow up requests do not need to specify the image again
ListenableFuture<GenerateContentResponse> modelResponseFuture = Futures.transformAsync(
initialRequest,
generatedImage -> {
Content followUpPrompt = new Content.Builder()
.addText("But make it old-school line drawing style")
.build();
return chat.sendMessage(followUpPrompt);
},
executor);
// Add a final callback to check the reworked image
Futures.addCallback(modelResponseFuture, new FutureCallback<GenerateContentResponse>() {
@Override
public void onSuccess(GenerateContentResponse result) {
for (Part part : result.getCandidates().get(0).getContent().getParts()) {
if (part instanceof ImagePart) {
ImagePart imagePart = (ImagePart) part;
Bitmap generatedImageAsBitmap = imagePart.getImage();
break;
}
}
}
@Override
public void onFailure(Throwable t) {
t.printStackTrace();
}
}, executor);
Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
// ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
model: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: {
responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
},
});
// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
const base64EncodedDataPromise = new Promise((resolve) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result.split(',')[1]);
reader.readAsDataURL(file);
});
return {
inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
};
}
const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);
// Provide an initial text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";
// Initialize the chat
const chat = model.startChat();
// To generate an initial response, send a user message with the image and text prompt
const result = await chat.sendMessage([prompt, imagePart]);
// Inspect the image returned in the initial response (inlineDataParts() throws if the prompt or candidate was blocked)
try {
const inlineDataParts = result.response.inlineDataParts();
if (inlineDataParts?.[0]) {
// Inspect the generated image
const image = inlineDataParts[0].inlineData;
console.log(image.mimeType, image.data);
}
} catch (err) {
console.error('Prompt or candidate was blocked:', err);
}
// Follow up requests do not need to specify the image again
const followUpResult = await chat.sendMessage("But make it old-school line drawing style");
// Inspect the image returned after the follow-up request
try {
const followUpInlineDataParts = followUpResult.response.inlineDataParts();
if (followUpInlineDataParts?.[0]) {
// Inspect the generated image
const followUpImage = followUpInlineDataParts[0].inlineData;
console.log(followUpImage.mimeType, followUpImage.data);
}
} catch (err) {
console.error('Prompt or candidate was blocked:', err);
}
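The Web snippet repeats the same extraction logic after each turn. A minimal sketch of a helper that factors it out, assuming `result.response.inlineDataParts()` behaves as in the snippet above (returning an array of `{ inlineData: { mimeType, data } }` or `undefined`, and throwing when the prompt or candidate was blocked). The name `getFirstImage` is hypothetical and not part of the Firebase SDK.

```javascript
// Hypothetical helper (not part of the Firebase SDK): returns the first image
// part of a result as { mimeType, data }, or null when there is none.
// Assumes result.response.inlineDataParts() behaves as in the Web snippet:
// it returns an array of { inlineData: { mimeType, data } } or undefined, and
// throws if the prompt or candidate was blocked.
function getFirstImage(result) {
  try {
    const parts = result.response.inlineDataParts();
    // Skip any inline data that is not an image
    const imagePart = parts?.find((part) =>
      part.inlineData?.mimeType?.startsWith("image/")
    );
    return imagePart ? imagePart.inlineData : null;
  } catch (err) {
    console.error("Prompt or candidate was blocked:", err);
    return null;
  }
}
```

With it, each turn reduces to `const image = getFirstImage(await chat.sendMessage("But make it old-school line drawing style"));` followed by a null check. Returning `null` for a blocked prompt treats "blocked" and "no image" the same; rethrow in the `catch` instead if your app needs to tell them apart.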
Dart
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
await Firebase.initializeApp(
options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.0-flash-preview-image-generation',
// Configure the model to respond with text and images
generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);
// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);
// Provide an initial text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");
// Initialize the chat
final chat = model.startChat();
// To generate an initial response, send a user message with the image and text prompt
final response = await chat.sendMessage(
Content.multi([prompt, imagePart]),
);
// Inspect the returned image
if (response.inlineDataParts.isNotEmpty) {
final imageBytes = response.inlineDataParts.first.bytes;
// Process the image
} else {
// Handle the case where no images were generated
print('Error: No images were generated.');
}
// Follow up requests do not need to specify the image again
final followUpResponse = await chat.sendMessage(
Content.text("But make it old-school line drawing style"),
);
// Inspect the returned image
if (followUpResponse.inlineDataParts.isNotEmpty) {
final followUpImageBytes = followUpResponse.inlineDataParts.first.bytes;
// Process the image
} else {
// Handle the case where no images were generated
print('Error: No images were generated.');
}
Unity
using System.Linq;
using Firebase;
using Firebase.AI;
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
modelName: "gemini-2.0-flash-preview-image-generation",
// Configure the model to respond with text and images
generationConfig: new GenerationConfig(
responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);
// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);
// Provide an initial text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");
// Initialize the chat
var chat = model.StartChat();
// To generate an initial response, send a user message with the image and text prompt
var response = await chat.SendMessageAsync(new [] { prompt, image });
// Inspect the returned image
var imageParts = response.Candidates.First().Content.Parts
.OfType<ModelContent.InlineDataPart>()
.Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imageParts.First().Data.ToArray())) {
// Do something with the image
}
// Follow up requests do not need to specify the image again
var followUpResponse = await chat.SendMessageAsync("But make it old-school line drawing style");
// Inspect the returned image
var followUpImageParts = followUpResponse.Candidates.First().Content.Parts
.OfType<ModelContent.InlineDataPart>()
.Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D followUpTexture2D = new(2, 2);
if (followUpTexture2D.LoadImage(followUpImageParts.First().Data.ToArray())) {
// Do something with the image
}
Supported features, limitations, and best practices
Supported modalities and capabilities
The following are the modalities and capabilities that Gemini supports for image output. Each capability shows an example prompt; code samples for these use cases appear earlier on this page.
Text to image (text only to image)
- Generate an image of the Eiffel Tower with fireworks in the background.
Text to image (text rendering)
- Generate a cinematic photo of a large building with this giant text projection-mapped onto the front of the building.
Text to image(s) and text (interleaved)
- Generate an illustrated recipe for a paella. Create images alongside the text as you generate the recipe.
- Generate a story about a dog in a 3D cartoon animation style. For each scene, generate an image.
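For interleaved output such as the recipe and story prompts, the parts of a response arrive in order, mixing text and inline image data. A minimal sketch, assuming the Gemini response shape `candidates[0].content.parts` (available on `result.response` in the Web SDK), where each part carries either `text` or `inlineData: { mimeType, data }` with base64 data. The function `splitInterleaved` is hypothetical; it turns the parts into an ordered list of blocks a UI could render, for example a recipe step followed by its illustration.

```javascript
// Hypothetical helper: converts the parts of the first candidate into an
// ordered list of render blocks, preserving the text/image interleaving.
// Assumes each part has either a `text` string or an `inlineData` object
// with `mimeType` and base64 `data`, as in Gemini API responses.
function splitInterleaved(response) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const blocks = [];
  for (const part of parts) {
    if (typeof part.text === "string" && part.text.trim() !== "") {
      blocks.push({ kind: "text", value: part.text });
    } else if (part.inlineData?.mimeType?.startsWith("image/")) {
      // A data URL can be assigned directly to an <img> element's src
      blocks.push({
        kind: "image",
        value: `data:${part.inlineData.mimeType};base64,${part.inlineData.data}`,
      });
    }
  }
  return blocks;
}
```

Keeping the blocks in response order is what preserves the "image next to its paragraph" layout; rendering all text first and all images afterwards would lose it.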
Image(s) and text to image(s) and text (interleaved)
- [image of a furnished room] + What other color sofas would work in my space? Can you update the image?
Image editing (text and image to image)
- [image of scones] + Edit this image to make it look like a cartoon
- [image of a cat] + [image of a pillow] + Create a cross stitch of my cat on this pillow.
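The cat-and-pillow prompt combines two input images with one instruction. A minimal sketch of assembling such a message for the Web SDK's `chat.sendMessage()` or `model.generateContent()`, both of which accept an array mixing strings and parts. It assumes each image is already base64-encoded without the `data:...;base64,` prefix (for example by the `fileToGenerativePart` approach in the Web snippet above). `buildMultiImagePrompt` is a hypothetical name.

```javascript
// Hypothetical helper: builds a single user message that contains several
// images followed by a text instruction. Each image is { data, mimeType },
// where `data` is base64 without the "data:...;base64," prefix.
function buildMultiImagePrompt(instruction, images) {
  if (images.length === 0) {
    throw new Error("At least one input image is required for editing.");
  }
  const imageParts = images.map(({ data, mimeType }) => ({
    inlineData: { data, mimeType },
  }));
  // Images first, then the instruction that refers to them
  return [...imageParts, instruction];
}
```

Usage would look like `await model.generateContent(buildMultiImagePrompt("Create a cross stitch of my cat on this pillow.", [catImage, pillowImage]))`, where `catImage` and `pillowImage` are placeholders for your own encoded images.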
Multi-turn image editing (chat)
- [image of a blue car] + Turn this car into a convertible. Then, in a follow-up turn: Now change the color to yellow.
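Multi-turn editing works because the chat session keeps the conversation history and resends it with each turn, which is why the follow-up prompts in the code samples above do not repeat the image. The SDKs manage this history for you; the sketch below only illustrates the idea, assuming the Gemini `contents` shape of `{ role, parts }` entries with the roles `user` and `model`. The class `ChatHistory` is hypothetical.

```javascript
// Hypothetical illustration of how a chat session accumulates turns.
// The Firebase SDKs' chat sessions do this internally; this class only
// shows what the `contents` array looks like after each turn.
class ChatHistory {
  constructor() {
    this.contents = [];
  }

  // Records the user's turn: an array of parts such as
  // { text } or { inlineData: { mimeType, data } }.
  addUserTurn(parts) {
    this.contents.push({ role: "user", parts });
  }

  // Records the model's reply, including any generated image parts.
  addModelTurn(parts) {
    this.contents.push({ role: "model", parts });
  }

  // The request for the next turn carries the full history, so the
  // original image is still available to the model. Does not mutate history.
  requestFor(nextUserParts) {
    return { contents: [...this.contents, { role: "user", parts: nextUserParts }] };
  }
}
```

A side effect of this design is that every turn resends earlier images, so long editing sessions grow the request size.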
Limitations and best practices
The following are limitations and best practices for image output from a Gemini model.
In this public experimental release, Gemini supports:
- Generating PNG images with a maximum dimension of 1024 px.
- Generating and editing images of people.
- Safety filters that provide a flexible, less restrictive user experience.
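Because generated images are PNG files with a maximum dimension of 1024 px, an app can read the actual size from the returned bytes before laying them out. A minimal sketch that checks the PNG signature and reads the `IHDR` header, where width and height are big-endian 32-bit integers at byte offsets 16 and 20 per the PNG specification. It takes a `Uint8Array`, which you could obtain by base64-decoding `inlineData.data`. All function names here are hypothetical.

```javascript
// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: reads width and height from a PNG's IHDR chunk.
// Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
// width (4, big-endian), height (4, big-endian), ...
function readPngSize(bytes) {
  if (bytes.length < 24) throw new Error("Data too short to be a PNG.");
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("Not a PNG file.");
  }
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (type !== "IHDR") throw new Error("Missing IHDR chunk.");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Hypothetical helper: checks the documented limit (max dimension 1024 px).
function withinDocumentedLimit(bytes, maxDimension = 1024) {
  const { width, height } = readPngSize(bytes);
  return Math.max(width, height) <= maxDimension;
}

// Hypothetical helper: decodes base64 (e.g. inlineData.data) to bytes;
// atob is available in browsers and in Node.js 16+.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}
```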
For best results, use one of the following languages: en, es-mx, ja-jp, zh-cn, hi-in.
Image generation does not support audio or video inputs.
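Since image generation accepts only text and image inputs and performs best in a handful of languages, a client could check a request before sending it. A minimal sketch, assuming parts in the `{ text }` / `{ inlineData: { mimeType, data } }` shape and a BCP 47-style language tag supplied by your app. `preflightImageRequest` is hypothetical; the language check only warns, because the languages above are listed as recommended rather than required.

```javascript
// Languages listed as giving the best results for image output.
const RECOMMENDED_LANGUAGES = new Set(["en", "es-mx", "ja-jp", "zh-cn", "hi-in"]);

// Hypothetical pre-flight check for an image-generation request.
// Returns blocking problems (errors) and non-blocking hints (warnings).
function preflightImageRequest(parts, languageTag) {
  const errors = [];
  const warnings = [];

  for (const part of parts) {
    const mimeType = part.inlineData?.mimeType ?? "";
    // Audio and video inputs are not supported for image generation
    if (mimeType.startsWith("audio/") || mimeType.startsWith("video/")) {
      errors.push(`Unsupported input type for image generation: ${mimeType}`);
    }
  }

  // Case-insensitive match; as an assumption, "en-US" is treated as "en"
  const tag = languageTag.toLowerCase();
  const base = tag.split("-")[0];
  if (!RECOMMENDED_LANGUAGES.has(tag) && !RECOMMENDED_LANGUAGES.has(base)) {
    warnings.push(`Language "${languageTag}" is not among the recommended languages.`);
  }
  return { errors, warnings };
}
```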
Image generation may not always trigger. Known issues include:
- The model may output text only. Try asking for image outputs explicitly (for example, "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through. Try again or try a different prompt.
- The model may generate text as an image. Try asking for text outputs explicitly, for example, "generate narrative text along with illustrations."
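The first two workarounds (ask explicitly for images, or try again) can be automated. A minimal sketch, assuming a chat object whose `sendMessage()` resolves to `{ response }` with an `inlineDataParts()` method, as in the Web snippet. `sendExpectingImage`, the attempt limit, and the appended wording are hypothetical choices; note that in a chat each retry adds another turn to the history.

```javascript
// Hypothetical helper: sends a prompt and, if the reply contains no image,
// retries with a prompt that explicitly asks for one.
// Errors thrown for blocked prompts propagate to the caller.
async function sendExpectingImage(chat, prompt, maxAttempts = 2) {
  let currentPrompt = prompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await chat.sendMessage(currentPrompt);
    const imagePart = result.response
      .inlineDataParts()
      ?.find((p) => p.inlineData?.mimeType?.startsWith("image/"));
    if (imagePart) {
      return { image: imagePart.inlineData, attempts: attempt };
    }
    // The model answered with text only: ask for the image explicitly
    currentPrompt = `${prompt} Generate an image.`;
  }
  return { image: null, attempts: maxAttempts };
}
```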
When generating text for an image, Gemini works best if you first generate the text and then ask for an image that contains that text.
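The two-step approach for text in images maps onto two chat turns: the first asks only for the wording, the second asks for an image that uses exactly that wording. A minimal sketch, assuming a chat object like the Web SDK's, where `sendMessage()` resolves to `{ response }` with `text()` and `inlineDataParts()` methods. `textThenImage` and the prompt wording are hypothetical.

```javascript
// Hypothetical two-step flow: generate the text first, then request an image
// that renders exactly that text.
async function textThenImage(chat, textRequest, imageRequest) {
  // Step 1: ask only for the text that should appear in the image
  const textResult = await chat.sendMessage(textRequest);
  const text = textResult.response.text().trim();

  // Step 2: ask for the image, quoting the generated text verbatim
  const imageResult = await chat.sendMessage(
    `${imageRequest} Render this text exactly: "${text}"`
  );
  const parts = imageResult.response.inlineDataParts() ?? [];
  const imagePart = parts.find((p) => p.inlineData?.mimeType?.startsWith("image/"));
  return { text, image: imagePart ? imagePart.inlineData : null };
}
```

Returning the text alongside the image lets the app compare what was requested with what appears in the rendered image.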