Conversational AI-Powered Image Editing with Gemini 2.0 Flash

AI Development

Mahmud · 2025-03-14 · 5 min read

Image editing has traditionally required specialized software and technical expertise. With Google's Gemini 2.0 Flash, we can now edit images through natural language conversations. Let's explore how to build a modern image editing application that feels like chatting with a creative assistant.

For our implementation, we'll use Node.js. The examples below are plain CommonJS JavaScript, but they drop straight into a TypeScript project if you want fully type-safe image processing logic. Let's dive in! 🚀

Setting Up the Project

Let's get our development environment ready! The setup process is straightforward and takes about 5 minutes.

First, install the required packages:

npm init -y
npm install @google/generative-ai dotenv

Before running the code, set up your API key:

  1. Visit Google AI Studio
  2. Create a new API key
  3. Add it to your environment:
export GEMINI_API_KEY=your-api-key-here

Or create a .env file:

GEMINI_API_KEY=your-api-key-here
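
The dotenv package we installed earlier loads this file for you. Assuming the default .env location, one line at the top of your script makes the key available through process.env:

require("dotenv").config();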

Complete Code

Here's the implementation:

// Load GEMINI_API_KEY from .env (dotenv was installed during setup).
require("dotenv").config();

const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

// Upload a local file to the Gemini Files API and return its metadata,
// including the URI we reference in chat messages.
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash-exp-image-generation",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 40,
  maxOutputTokens: 8192,
  // Request both image and text output from the image-generation model.
  responseModalities: ["image", "text"],
  responseMimeType: "text/plain",
};

async function editImage(imagePath, editInstructions) {
  const file = await uploadToGemini(imagePath, "image/jpeg");

  const chatSession = model.startChat({ generationConfig });

  // Send the image and the edit instruction together as a single user
  // turn, instead of seeding the history with the instruction and then
  // sending it a second time.
  const result = await chatSession.sendMessage([
    {
      fileData: {
        mimeType: file.mimeType,
        fileUri: file.uri,
      },
    },
    { text: editInstructions },
  ]);
  return result.response;
}

// Example usage
async function run() {
  try {
    const result = await editImage(
      "croissant.jpg",
      "Add a chocolate drizzle to the croissants"
    );
    console.log("Edited image generated successfully");
  } catch (error) {
    console.error("Error editing image:", error);
  }
}

run();
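
The run example above only logs a success message. The edited image comes back inside the response parts, typically as base64-encoded inlineData; here's a minimal sketch for extracting it and writing it to disk (saveGeneratedImage and output.jpg are our own names, not part of the SDK):

const fs = require("fs");

// Hypothetical helper: scan the response parts for inline image data
// and write the first match to disk.
function saveGeneratedImage(response, outputPath) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    if (part.inlineData) {
      // inlineData.data holds the image bytes as a base64 string.
      fs.writeFileSync(outputPath, Buffer.from(part.inlineData.data, "base64"));
      console.log(`Saved edited image to ${outputPath}`);
      return true;
    }
  }
  return false;
}

// Inside run(), after editImage resolves:
// saveGeneratedImage(result, "output.jpg");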

Understanding Chat History

The magic of conversational image editing lies in how Gemini maintains context through the chat history. Let's break down how it works:

const chatSession = model.startChat({
  generationConfig,
  history: [
    {
      role: "user",
      parts: [
        {
          fileData: {
            mimeType: file.mimeType,
            fileUri: file.uri,
          },
        },
          { text: "Add a chocolate drizzle" },
      ],
    },
    {
      role: "model",
      parts: [
        {
          fileData: {
            mimeType: "image/jpeg",
            // Placeholder URI standing in for the model's first output.
            fileUri: "generated_image_1.jpg",
          },
        },
      ],
    },
    {
      role: "user",
      parts: [{ text: "add some honey too" }],
    },
    {
      role: "model",
      parts: [
        {
          fileData: {
            mimeType: "image/jpeg",
            // Placeholder URI standing in for the model's second output.
            fileUri: "generated_image_2.jpg",
          },
        },
      ],
    },
  ],
});

The history array is crucial for maintaining context:

  1. Initial Request: The first entry combines the original image and the user's instruction
  2. Model Response: Each model response includes the generated image
  3. Follow-up Edits: Subsequent user messages reference previous changes
  4. Contextual Understanding: The model remembers all previous edits and applies new changes while maintaining earlier modifications

This structure allows the model to:

  • Remember previous edits (e.g., the chocolate drizzle)
  • Apply new changes while preserving existing ones
  • Understand relative changes ("make it warmer")
  • Maintain visual consistency across multiple edits
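
In practice you rarely assemble this history by hand: the SDK appends each exchange to the session automatically, so follow-up edits are just further sendMessage calls (a sketch, assuming the chatSession from editImage is still in scope):

// Each call sees the full conversation so far, generated images included.
const second = await chatSession.sendMessage("add some honey too");
const third = await chatSession.sendMessage("make the lighting warmer");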

For example, when you say "add some honey too", the model knows to:

  1. Keep the chocolate drizzle from the previous edit
  2. Add honey as a new element
  3. Maintain the overall composition and style

[Image: chocolate-drizzled croissants with honey]

Real-World Applications

This new image generation capability in the Flash model enables a variety of use cases:

  • Product Photography - Quickly edit product shots for e-commerce
  • Content Creation - Generate variations of marketing materials
  • Design Iteration - Rapidly prototype visual ideas
  • Photo Enhancement - Fix and improve personal photos
  • Creative Exploration - Experiment with different styles and effects

Best Practices

To get the best results:

  1. Be Specific - Clear descriptions yield better outcomes
  2. Iterate Gradually - Make changes in small steps
  3. Maintain Context - Reference previous edits when needed
  4. Use Natural Language - Describe changes as you would to a person
  5. Consider Composition - Think about the overall image impact
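
To make the first two tips concrete, compare a vague prompt with a specific, incremental one (both hypothetical, sent on the same chatSession):

// Vague: leaves the element, amount, and placement to chance.
await chatSession.sendMessage("make it look nicer");

// Specific and gradual: one clearly described change at a time.
await chatSession.sendMessage(
  "Drizzle a thin line of dark chocolate across the top croissant only"
);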

Conclusion

Gemini 2.0 Flash is transforming how we think about image editing. Whether you're a professional designer or just someone who wants to make their photos look better, this technology makes it accessible and fun.

📫 DM me for consulting inquiries and professional work.

#Gemini #ImageGeneration #Node.js #TypeScript