I got my hub repaired and returned to service last night, after it had been down for almost 3 days. What I learned is IMPORTANT evidence about the nature of this problem. I have purchased two HP23s, and both have failed in the same way. With the first hub, I manually added every one of the approximately 200 devices, which included crawling around under the house, in the attic, in the outdoor shed, and so forth. I was very happy once the first hub was set up and coded; it was running smoothly, with great performance and no indication of problems. Then one morning, after a reboot for an automatically installed firmware update (it’s a bad idea to allow auto updates), it failed and went into an unending reboot cycle. I tried a factory reset followed by restoring a cloud backup, a factory reset followed by restoring a local backup, and dozens of other options, but nothing worked. I could factory reset the hub and run the initial setup, but any attempt to restore a backup brought the restart loop back. I worked with technical support, but they never reached a solution, and after a week of trying to recover, my patience ran out.
This experience led me to believe that my user data had somehow damaged the firmware/storage environment. With no other options for recovery, I purchased a second HP23, and to avoid potentially damaging it with my suspect user data, I manually reinstalled every single device, replaced old devices with new ones, and once again completed a lengthy setup and coding process. The result was excellent, with no indication of a potential issue. I had faithfully created frequent PC backups during the setup of this second HP23, but I didn’t expect the issue to return. Then last Friday morning, right after I did a routine local backup to my PC, the restart loop started again.
I tried MANY different recovery techniques that I had learned during the previous experience, and this second HP23 behaved EXACTLY like the first one. Yes, I could factory reset it, but no cloud or local backup could be restored, not even a very old one. I was extremely disappointed after spending hundreds of hours setting up my smart home and failing twice.
However, I formulated some theories about what had happened and how to test them. The theories were that (a) the CM4 module, where the processor and memory live, was defective, OR (b) the factory reset process is NOT actually returning the storage to its original state and/or is corrupting the file system. Both of these theories could be tested by replacing the CM4 module with a new module and restoring a local backup that was taken AFTER the failure. That is what I did, and it brought my system back online. Everything is running smoothly, with no evidence of a user data problem.
Now, I am happy to be back online, but I don’t feel confident that this issue will not return. If it was a faulty CM4 module, then there are hubs that need to have the CM4 replaced before they fail, and we need to know which hubs are at risk based on the modules that were used. If it is a bug in the firmware or in the factory reset code, then it needs to be found quickly. I lean towards this being a firmware/reset coding issue. I hope the development team will do something that end users cannot do: investigate the logs, the files, and the partitions on a failed CM4 module and diagnose the root cause of the problem. I believe a software engineer would likely see the issue in plain sight within a few minutes of research.
I have offered my two failed CM4 modules to the development team, and hopefully they will accept my offer and research the problem. I believe they can easily plug a module into an IO board or another Homey Pro to study the issue. I still believe the HP23 is by far the BEST option among the sophisticated smart home platforms, but this failure mode has the potential to impact developer acceptance.