Loss of Zigbee connectivity after some time, reboot of SHS restores connectivity

TL;DR: The Zigbee network becomes unresponsive at times and an SHS reboot fixes it. Zigbee is on channel 21 and runs from a Homey Bridge.

Hi, I recently switched from SmartThings to Homey Self-Hosted Server (SHS) with a Homey Bridge. I have 20 Zigbee devices, and the network is usually fairly stable on Zigbee channel 21. I tried changing the channel several times in the hope that it would move to a “better” channel like 24, but after several attempts 21 is the highest I could get. My Hue lights are already on channel 25.

Every so often my Zigbee devices just stop responding. (Yes, my neighbours' 2.4 GHz Wi-Fi is busy on most channels already, but I did a survey and the upper channels were “not as crowded”.) This morning, when the lights did not turn on, I looked at the Developer Tools Zigbee dashboard and saw that it had “lost communication” last night: the Last Seen column for the Zigbee devices showed many hours ago. Instead of rebooting the Homey Bridge like I did the previous times, I rebooted only the SHS, and Zigbee communication resumed. Now everything is back to normal. Will I have to start rebooting the SHS daily in the hope of “fixing” this unreliable communication? This seems rather odd.

My SHS runs with 2 cores and 8 GB of RAM assigned on a QNAP TS-673A with 32 GB of RAM; the only other container is Pi-hole. The NAS is not busy, and CPU and RAM averages are pretty low. The SHS dashboard doesn’t report any issues. Any ideas what the issue could be and how to debug it? See the image for a list of all 20 devices from the Zigbee panel. SHS version is 13.1.5.

In your Developer Tools screenshot, I see that only the End Devices were last seen many hours ago. This is completely normal for Zigbee, since those battery-powered devices enter a power-saving mode, which means they disconnect from the Zigbee network.

Have you followed the Zigbee best practices for Homey Bridge?

https://support.homey.app/hc/en-us/articles/4414431713682-Creating-a-stable-Zigbee-network-with-Homey-Bridge

In short: pair routers (mains-powered devices) first, then pair end devices after all the routers are paired.

My screenshot was taken after I rebooted the SHS. I know the leak sensors don’t report very often, and the same goes for the IKEA Symfonisk sound controller, which only communicates when a button is pressed. My problem was that ALL mains-powered devices were also last seen many hours earlier: the whole Zigbee network was without communication, and an SHS reboot fixed it.

This is the second time I have built the whole Zigbee network following the Zigbee best practices: I paired the devices plugged into a wall socket first, in the location where they will stay permanently, and the battery-powered devices such as motion and leak sensors last, so that the network meshes properly and the Homey Bridge is not overwhelmed with direct connections. When I first moved from ST to SHS, my initial build was not optimal. I still see some TX errors on devices at the far ends of the apartment, but I had more TX errors before because of noisy Wi-Fi neighbours on 2.4 GHz (most are on or around Wi-Fi channels 1 and 6, with many on 11 but with a weaker signal). From what I understand of Zigbee vs. Wi-Fi frequencies, my Zigbee channel 21 sits around Wi-Fi channel 9. See the attached picture for the Wi-Fi survey.

My Zigbee channels to wifi quick reference (from Metageek)
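For anyone who wants to cross-check the channel mapping without the chart, here is a minimal Python sketch based on the standard 2.4 GHz channel plans: Zigbee channels 11 to 26 are spaced 5 MHz apart starting at 2405 MHz, and Wi-Fi channels 1 to 13 are centred at 2407 + 5n MHz. The overlap test is a simplification I am assuming here (a 20 MHz wide Wi-Fi channel against a 2 MHz wide Zigbee channel), and the function names are just illustrative.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11..26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel, 1..13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel: int, wifi_width_mhz: float = 20.0) -> list[int]:
    """Wi-Fi channels whose band overlaps the given Zigbee channel.

    Simplified model (an assumption): both signals are treated as flat
    bands, Wi-Fi wifi_width_mhz wide and Zigbee 2 MHz wide.
    """
    zigbee_mhz = zigbee_center_mhz(zigbee_channel)
    reach = wifi_width_mhz / 2 + 1.0  # half Wi-Fi width plus half Zigbee width
    return [w for w in range(1, 14) if abs(wifi_center_mhz(w) - zigbee_mhz) < reach]


for zb in (21, 24, 25):
    print(zb, zigbee_center_mhz(zb), overlapping_wifi_channels(zb))
```

With these formulas Zigbee channel 21 sits at 2455 MHz, between Wi-Fi channel 9 (2452 MHz) and channel 10 (2457 MHz), so "around Wi-Fi channel 9" is about right.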

That’s not a lot of WiFi interference really (at least not compared to my area)

Do you have an ESP32C6? With this Arduino sketch, you can check nearby Zigbee networks/interference:

How would not having done this explain that the Zigbee network comes back to life after an SHS reboot? It sounds more like an issue with SHS losing connection to the Bridge.

Is the SHS logging this (loss of connection)? Where can I check the logs to see if that is the case?

No, logs are not accessible to end users. Only Athom can view logs.

Do you have a Mesh WiFi system like TP-Link Deco?

Yes actually I have an Asus ZenWiFi Pro ET12 with a secondary AP meshed.

On SHS? Surely everything is accessible to end users there?

In that case, the Bridge might hop between APs, causing a short disconnect. Does the ASUS system have a feature to “lock” the device to a specific node like Deco?

My bad, I didn’t notice this post was about SHS

Yes, the ASUS ZenWiFi can lock a device to a specific AP. I will do that right away and see if it gets better.

Any suggestions on where to look for the log files on the SHS container that could help get more information?

For instance
https://support.homey.app/hc/en-us/articles/24010537261980-How-to-install-Homey-Self-Hosted-Server-with-Docker-on-Linux

Or

Strange: the container doesn’t run systemd, so there is no journalctl there. The AI responses I get are different.

I’m running SHS in a container on a QNAP NAS, so I already looked at the QNAP install article. Besides pointing to this forum at the end, it mentions no troubleshooting steps for SHS running in a QNAP container: https://support.homey.app/hc/en-us/articles/24010241545756-How-to-install-Homey-Self-Hosted-Server-on-QNAP

I also searched the online help for “self hosted log” and looked through all the results. Unfortunately, none of them mention where to read or access logs for the SHS.
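One thing that may be worth trying from the NAS host rather than from inside the container: Docker keeps whatever a container writes to stdout/stderr, and `docker logs` can show it with timestamps. Whether SHS writes anything useful there is an assumption on my part, and the container name `homey-shs` below is hypothetical (use whatever name Container Station shows). A minimal Python sketch that wraps the command:

```python
import subprocess


def docker_logs_command(container: str, since: str = "12h") -> list[str]:
    """Build a `docker logs` call limited to a recent time window.

    --timestamps and --since are standard `docker logs` options;
    the container name is whatever your setup uses.
    """
    return ["docker", "logs", "--timestamps", "--since", since, container]


def fetch_container_logs(container: str, since: str = "12h") -> str:
    """Run `docker logs` on the host and return stdout and stderr combined."""
    result = subprocess.run(
        docker_logs_command(container, since),
        capture_output=True,
        text=True,
        check=False,  # a missing container should not raise, just return the error text
    )
    return result.stdout + result.stderr
```

Calling `fetch_container_logs("homey-shs", since="24h")` right after the lights stop responding would at least show whether the container printed anything around the time the Zigbee devices were last seen.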

Possibly in user/log/, and/or in var/log/

Those are the two log directories on the actual hardware. I don’t run SHS, so I’m not sure whether they also exist inside containers/VMs.

The container is really stripped down; even /usr/log doesn’t exist:

But there is a /homey/user/logs directory, which I will start looking through.
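To make that search less tedious, here is a rough Python sketch that walks a log directory and lists every line mentioning a few keywords. The keywords are guesses, since I don't know what the SHS log format looks like, and the script assumes the files are readable as text. It would probably have to run on the NAS host against the mounted volume, as the stripped-down container may not ship Python.

```python
from pathlib import Path

# Guessed search terms; adjust once the real log format is known.
KEYWORDS = ("zigbee", "bridge", "disconnect", "timeout")


def find_matching_lines(log_dir, keywords=KEYWORDS):
    """Return (file, line number, line) for each line containing a keyword."""
    matches = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps going on binary or oddly encoded files
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(keyword in lowered for keyword in keywords):
                matches.append((str(path), number, line.strip()))
    return matches
```

For example, `find_matching_lines("/homey/user/logs")` returns the hits as a list, which can then be compared against the time the devices were last seen in the Developer Tools.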