Hi @s_dimaio,
The flow you created here is exactly how I set it up as well. The problem you run into is the following: the camera image (photo 1) is sent to Gemini together with the prompt. A few seconds later Gemini responds, and only then is a new snapshot taken (photo 2, the one shown in the notification card). Gemini’s response is added to it, and everything is sent as the notification.
Because there is a time difference between photo 1 and photo 2, sometimes a few seconds, the analyzed image (photo 1) doesn’t match the image in the notification (photo 2). Example: photo 1 shows a person; a few seconds later photo 2 is taken for the notification card, and by then the person is already out of the frame.
That’s why the image sent to Gemini should be retrievable as a tag, so that exactly that same image can be included in the notification together with Gemini’s response.
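To illustrate the idea, here is a minimal Python sketch. It is not Homey or Gemini code: `take_snapshot`, `analyze`, `Notification` and `analyze_and_notify` are hypothetical names made up for this example. It only shows the principle of capturing the image once and reusing that same image for both the analysis and the notification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Notification:
    image: bytes  # the picture shown in the notification card
    text: str     # the model's response


def analyze_and_notify(
    take_snapshot: Callable[[], bytes],   # hypothetical camera call
    analyze: Callable[[bytes, str], str], # hypothetical image-analysis call
    prompt: str,
) -> Notification:
    # Capture exactly once and keep the result (this is "photo 1").
    snapshot = take_snapshot()

    # The analysis can take a few seconds; the scene may change meanwhile.
    response = analyze(snapshot, prompt)

    # Reuse the stored snapshot instead of taking a second one,
    # so the notification shows the image that was actually analyzed.
    return Notification(image=snapshot, text=response)
```

In a flow, the stored snapshot would play the role of the image tag: the notification card uses that tag instead of triggering a fresh snapshot.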