Conversational AI-Powered Image Editing with Gemini 2.0 Flash

Image editing has traditionally required specialized software and technical expertise. With Google's Gemini 2.0 Flash, we can now edit images through natural language conversations. Let's explore how to build a modern image editing application that feels like chatting with a creative assistant.
For our implementation, we'll be using Node.js and TypeScript. With these two, we can quickly create type-safe and efficient image processing logic. Let's dive in!
Setting Up the Project
Let's get our development environment ready! The setup process is straightforward and takes about 5 minutes.
First, install the required packages:
npm init -y
npm install @google/generative-ai dotenv
Before running the code, set up your API key:
- Visit Google AI Studio
- Create a new API key
- Add it to your environment:
export GEMINI_API_KEY=your-api-key-here
Or create a .env file:
GEMINI_API_KEY=your-api-key-here
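Note that a .env file is only read if the program loads it. Here is a minimal sketch of that step, assuming the dotenv package installed above. The helper name `requireApiKey` is hypothetical and not part of either library:

```javascript
// Load variables from a local .env file into process.env.
// Wrapped in try/catch so the sketch still runs where dotenv is not installed.
try {
  require("dotenv").config();
} catch (err) {
  // dotenv is unavailable; rely on variables exported in the shell instead
}

// Hypothetical helper: return the key, or fail early with a readable message.
function requireApiKey(env = process.env) {
  const key = (env.GEMINI_API_KEY || "").trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set; export it or add it to .env");
  }
  return key;
}
```

Failing at startup like this tends to be easier to diagnose than an authentication error returned later by the API.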
Complete Code
Here's the implementation:
// Load GEMINI_API_KEY from a .env file, if one exists
require("dotenv").config();

const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
  throw new Error("GEMINI_API_KEY is not set");
}
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

// Upload a local file so it can be referenced by URI in a prompt
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}
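
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash-exp-image-generation",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 40,
  maxOutputTokens: 8192,
  // Request image output as well as text; without this the reply may be text only
  responseModalities: ["Text", "Image"],
  responseMimeType: "text/plain",
};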

async function editImage(imagePath, editInstructions) {
  // The MIME type is hard-coded; change it if the input is not a JPEG
  const file = await uploadToGemini(imagePath, "image/jpeg");

  // Start with an empty history and send the image together with the
  // instruction, so the instruction is not submitted twice
  const chatSession = model.startChat({ generationConfig, history: [] });
  const result = await chatSession.sendMessage([
    {
      fileData: {
        mimeType: file.mimeType,
        fileUri: file.uri,
      },
    },
    { text: editInstructions },
  ]);
  return result.response;
}
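
// Example usage
async function run() {
  try {
    const response = await editImage(
      "croissant.jpg",
      "Add some honey to the croissants"
    );
    // The edited image, if any, arrives as one of the response parts
    const parts = response.candidates?.[0]?.content?.parts ?? [];
    console.log(`Received a response with ${parts.length} part(s)`);
  } catch (error) {
    console.error("Error editing image:", error);
  }
}

run();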
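The example above returns the response but never writes the edited image to disk. Here is a minimal sketch of that last step, assuming the model returns generated images as inline base64 parts (`inlineData` with `mimeType` and `data`) under `response.candidates[0].content.parts`. The helper name `saveImageParts` and the output file naming are hypothetical:

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write every inline image part of a response to outDir
// and return the list of file paths that were written.
function saveImageParts(response, outDir = ".", baseName = "edited") {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  const written = [];
  for (const part of parts) {
    if (!part.inlineData) continue; // skip text parts
    const { mimeType, data } = part.inlineData;
    // Derive a file extension from the MIME type, e.g. "image/png" -> "png"
    const ext = (mimeType.split("/")[1] || "bin").replace("jpeg", "jpg");
    const filePath = path.join(outDir, `${baseName}_${written.length + 1}.${ext}`);
    fs.writeFileSync(filePath, Buffer.from(data, "base64"));
    written.push(filePath);
  }
  return written;
}
```

Inside `run()`, this could be called on the value returned by `editImage`. An empty return value would indicate that the model replied with text only.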
Understanding Chat History
The magic of conversational image editing lies in how Gemini maintains context through the chat history. Let's break down how it works:
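const chatSession = model.startChat({
  generationConfig,
  history: [
    // Turn 1 (user): the original image plus the first instruction
    {
      role: "user",
      parts: [
        {
          fileData: {
            mimeType: file.mimeType,
            fileUri: file.uri,
          },
        },
        { text: editInstructions },
      ],
    },
    // Turn 1 (model): the first edited image (the URI is an illustrative placeholder)
    {
      role: "model",
      parts: [
        {
          fileData: {
            mimeType: "image/jpeg",
            fileUri: "generated_image_1.jpg",
          },
        },
      ],
    },
    // Turn 2 (user): a follow-up that relies on the earlier context
    {
      role: "user",
      parts: [{ text: "add some honey too" }],
    },
    // Turn 2 (model): the second edited image (also a placeholder URI)
    {
      role: "model",
      parts: [
        {
          fileData: {
            mimeType: "image/jpeg",
            fileUri: "generated_image_2.jpg",
          },
        },
      ],
    },
  ],
});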
The history array is crucial for maintaining context:
- Initial Request: The first entry combines the original image and user's instruction
- Model Response: Each model response includes the generated image
- Follow-up Edits: Subsequent user messages reference previous changes
- Contextual Understanding: The model remembers all previous edits and applies new changes while maintaining earlier modifications
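The alternating shape of that array can be sketched with a small builder. This is an illustration only: `buildHistory` and its input format are hypothetical, and the image references are placeholders like the ones in the snippet above.

```javascript
// Hypothetical helper: build an alternating user/model history from a source
// image reference and a list of completed turns.
function buildHistory(sourceImage, turns) {
  const history = [];
  turns.forEach((turn, i) => {
    const userParts = [];
    // Only the first user entry carries the original image
    if (i === 0) {
      userParts.push({
        fileData: { mimeType: sourceImage.mimeType, fileUri: sourceImage.uri },
      });
    }
    userParts.push({ text: turn.instruction });
    history.push({ role: "user", parts: userParts });
    // Each model entry carries the image produced for that instruction
    history.push({
      role: "model",
      parts: [
        { fileData: { mimeType: turn.result.mimeType, fileUri: turn.result.uri } },
      ],
    });
  });
  return history;
}

const history = buildHistory(
  { mimeType: "image/jpeg", uri: "croissant.jpg" },
  [
    {
      instruction: "add a chocolate drizzle",
      result: { mimeType: "image/jpeg", uri: "generated_image_1.jpg" },
    },
    {
      instruction: "add some honey too",
      result: { mimeType: "image/jpeg", uri: "generated_image_2.jpg" },
    },
  ]
);
console.log(history.map((entry) => entry.role).join(" -> "));
// prints "user -> model -> user -> model"
```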
This structure allows the model to:
- Remember previous edits (e.g., a chocolate drizzle added in an earlier turn)
- Apply new changes while preserving existing ones
- Understand relative changes ("make it warmer")
- Maintain visual consistency across multiple edits
For example, if an earlier edit added a chocolate drizzle, then when you say "add some honey too", the model knows to:
- Keep the chocolate drizzle from the previous edit
- Add honey as a new element
- Maintain the overall composition and style
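In practice the history does not have to be assembled by hand, because a chat session created with `startChat` records each exchange as messages are sent. Here is a minimal sketch of sending follow-up edits one at a time on the same session, assuming `chatSession` is the object returned by `model.startChat(...)`. The helper `applyEdits` is hypothetical, and the stub session exists only so the example can run without an API key:

```javascript
// Hypothetical helper: send edit instructions one at a time on the same chat
// session, so each request is evaluated with the earlier turns as context.
async function applyEdits(chatSession, instructions) {
  const responses = [];
  for (const instruction of instructions) {
    // Await each call before sending the next to keep the turns in order
    const result = await chatSession.sendMessage(instruction);
    responses.push(result.response);
  }
  return responses;
}

// Stand-in for a real chat session: it only records what it was sent.
function createStubSession() {
  const sent = [];
  return {
    sent,
    async sendMessage(text) {
      sent.push(text);
      return { response: { turn: sent.length } };
    },
  };
}
```

With a real session, a call such as `await applyEdits(chatSession, ["add a chocolate drizzle", "add some honey too"])` would issue the two edits in sequence.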
Real-World Applications
This latest addition to the Flash model enables a range of use cases:
- Product Photography - Quickly edit product shots for e-commerce
- Content Creation - Generate variations of marketing materials
- Design Iteration - Rapidly prototype visual ideas
- Photo Enhancement - Fix and improve personal photos
- Creative Exploration - Experiment with different styles and effects
Best Practices
To get the best results:
- Be Specific - Clear descriptions yield better outcomes
- Iterate Gradually - Make changes in small steps
- Maintain Context - Reference previous edits when needed
- Use Natural Language - Describe changes as you would to a person
- Consider Composition - Think about the overall image impact
Conclusion
Gemini 2.0 Flash is transforming how we think about image editing. Whether you're a professional designer or just someone who wants to make their photos look better, this technology makes it accessible and fun.