[APP][Pro] Gemini AI - Control your Smart Home with Google's Gemini

Hi @DimitriEbben, @J273!

Are you referring to the “send a prompt with image” action card:

or to the action card “send a command to smart home” with a command like “check if the entrance cam frames a person”?

Hi yeah that’s right.

For me, I have an advanced flow which sends a notification to my phone telling me who’s at my front door, using my doorbell camera (“send a prompt with image”). The image doesn’t match the analyzed answer.

Hi @s_dimaio,

I’m referring to the “send a prompt with image” action card.

Dimitri

Tried the app. With my first question to Gemini via a flow I tried to get a device listing, but got more than fifty notifications in a row. When I read them, the notifications formed a flash-fiction story that made no sense. I must have done something wrong :thinking:

Hi @DimitriEbben, it’s certainly possible to modify the ‘send a prompt with image’ action card to return a second token with the image parsed by Gemini. However, I don’t think it’s strictly necessary. Wouldn’t a Flow like this already solve the problem? Or have I misunderstood?

Hi @J273, this is strange. Try using a more advanced model like the Gemini 3.1 Flash-Lite, although the 2.5 Flash-Lite should be more than adequate for analyzing images. When you’re done, send me a diagnostic report and I’ll try to take a look.

Hi @Jari_Peuhkurinen , to get a list of devices connected to Homey (or more generally for any smart home-related command), you must use the “send a command to smart home” action card:

Furthermore, managing your smart home is more complex than simply generating text or images, so (i) you must use a more sophisticated model for it to respond properly (I recommend Gemini 3.1 Flash-Lite, which offers the best value for money) and (ii) you must associate billing with your API Key for extended use of this feature.

Hi @s_dimaio,

The flow you created here is exactly how I set it up as well. The problem you run into is the following: the camera image (photo 1) is sent with the prompt; a few seconds later Gemini responds; then a new snapshot is taken (photo 2, in the notification card), Gemini’s response is added, and everything is sent via the notification.

Because there is a time difference between photo 1 and photo 2, sometimes a few seconds, the analyzed image (photo 1) doesn’t match the second photo (photo 2). Example: photo 1 shows a person; a few seconds later, photo 2 is taken for the notification card, and by then the person is already out of the frame.

That’s why the image sent to Gemini should be retrievable as a tag, so that exactly that same image can be included in the notification together with Gemini’s response.

In my case, every time the cam’s action card “request a new camera snapshot” is called, it generates an image, and this same image is passed to Gemini and to the notification (so the image is generated once at the start of the flow and then reused within the flow as many times as needed). I checked https://tools.developer.homey.app/ in the images section.

Anyway, if you think it might be useful, I’ll work on it. I’ll publish the test version this week!

Bye!

@s_dimaio

That would be great. I think this is caused by the ONVIF camera app taking a new snapshot. Maybe you could also look into the possibility of adding an “if” card (see my earlier response), so we can trigger other flows once Gemini has given a response, with the response tag and photo tag included in it.

I’m going to test it when it’s ready and will also make a donation for your work. Thanks again!

Hi @s_dimaio

Yeah what @DimitriEbben describes here is what happens for me as well.

@DimitriEbben @J273
Here’s the promised test version with the image token. Verify that everything now works properly for your configuration. Let me know!

Hi @s_dimaio,

Thank you for the quick update.

I’ve been testing with the analyzed image tag, but there still seems to be a difference of a few seconds. This is visible in the photo I attached. I asked Gemini what time it sees in the image it receives. I then added its response, together with the analyzed image, to the notification card, but there is still a 5-second difference.

For example:

09:44:13 on the image sent to Gemini and 09:44:18 sent in the notification.

I don’t know if this helps, but AI gives the following tips:

Create a temporary Homey image token in the trigger (or at the photo input).

const imageToken = await this.homey.images.createImage();
imageToken.setStream(() => bufferToStream(imgBuffer));

Pass it to the action card via the trigger:

trigger.trigger({ imageToken: imageToken });

Important: do not rely on args.droptoken.

The imageToken object remains available for the duration of the flow (~3 minutes).

In the action card, use the token:

const imageStream = await args.imageToken.getStream();
const imageBuffer = await GeminiClient.streamToBuffer(imageStream);

Optionally, unregister after 3 minutes or after use:

setTimeout(() => imageToken.unregister(), 180000);

This keeps memory clean while still making the token temporarily reusable within the flow.

Result

  • The photo can be used directly in the flow.

  • Multiple cards can use the same token object.

  • The photo remains available for up to 3 minutes (by which time the notification, including the image, has already been sent).

Hi, try the test version 2.9.1 I just released. This should fix the problem. Unfortunately, I don’t have a cam to test it on. Let me know if everything works properly now.

Hi @s_dimaio,

In version 2.9.1 it works perfectly! This is very powerful. Thanks so much for the adjustment!

What do you think about my other proposal regarding the IF card? I can also send you some examples showing why the IF card with Gemini providing both an answer and an analyzed image is so valuable.

Ciao Dimitri, I’m glad it works now. For the if card, give me some examples because I actually don’t understand how it’s supposed to work.

Ciao @s_dimaio,

Attached is an example; don’t pay attention to the linked tags, they are only meant as an example:
In situation 1 (the current way it works), you have to link separate THEN cards for each camera. This is a lot of work and also error-prone. In my case, for example, there are 16 cameras, so I have to create this for each one individually. This makes the flow extremely large.

In situation 2, I only send the photo with the prompt per camera. In one place in my flow, I receive the trigger that a response has been received. Based on that trigger, with the response and analyzed image tags, I only need to link the THEN cards once. This keeps the flow small and well-organized. What do you think about this?

Situation 1, separate THEN cards for every camera:

Situation 2 with IF card:

Gemini has a response –> Logic card –> THEN cards (saves a lot of THEN cards)
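To make the single-trigger layout concrete, here is a hypothetical stub (not Homey’s actual Flow API, just an illustration): one shared “Gemini has a response” trigger card carries the response and image tokens, so the THEN cards are wired once and every camera fires the same card:

```javascript
// Hypothetical stand-in for a flow trigger card; names and token
// layout are assumptions, not the app's real API.
class TriggerCardStub {
  constructor(id) {
    this.id = id;
    this.listeners = [];
  }
  // Register a flow that starts when this card fires (the WHEN column).
  onTrigger(listener) {
    this.listeners.push(listener);
  }
  // Fire the card with its tokens (what the app would do after Gemini answers).
  trigger(tokens) {
    this.listeners.forEach((fn) => fn(tokens));
  }
}

const geminiHasResponse = new TriggerCardStub('gemini_has_a_response');

// One set of THEN cards, linked once for all cameras.
const seen = [];
geminiHasResponse.onTrigger(({ camera, response }) => {
  seen.push(`${camera}: ${response}`);
});

// Any of the 16 cameras fires the same card; the camera name
// travels in the prompt/response, as described below.
geminiHasResponse.trigger({ camera: 'Entrance', response: 'true', image: '<image token>' });
geminiHasResponse.trigger({ camera: 'Garden', response: 'false', image: '<image token>' });
```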

I suggest you use a Gemini prompt like “Is there a person in this image? Answer only with true or false” and then a logic card like this (converting the [string/number] tag to yes/no):
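Since Gemini may return the answer with extra whitespace, punctuation, or different casing, a small normalizer makes the logic-card comparison robust. This is a hypothetical helper sketched for illustration, not part of the app:

```javascript
// Normalize a Gemini text reply like " True. " or "false\n" to a strict boolean.
// Returns null when the reply is neither, so the flow can handle unexpected answers.
function parseYesNo(response) {
  const cleaned = response.trim().toLowerCase().replace(/[.!]/g, '');
  if (cleaned === 'true' || cleaned === 'yes') return true;
  if (cleaned === 'false' || cleaned === 'no') return false;
  return null;
}
```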

Returning to your suggestion: so you’re thinking of a trigger card, right (screenshot in Italian)?

But if I understand correctly, this way your flow is much simpler, but you wouldn’t have the possibility of knowing which of the 16 cameras recorded a person. Am I wrong?

That’s correct: it should be the trigger card like in your screenshot (with the response and analyzed image tags).

Normally, my prompt looks a bit different, and I also include the name of the camera in the prompt so it appears in Gemini’s response along with the analyzed image. Based on that, I know which camera has detected a person. The prompt in my screenshots was just simplified to keep things clear.

The idea of an async trigger makes sense. I’m working on it this weekend. Have a good evening!