[Solved] Status monitoring/hang at boot

miguelangel.nubla · Wed Dec 18, 2024 8:12 pm

Today, my Unimus server failed to start because the database server was down during its initialization.

I assumed Unimus would periodically retry connecting to the database until it became available, but that’s not how it behaves. Instead, it halts at the /#!boot page with the message: "Failed to start Unimus."

To monitor this issue, I attempted to set up basic endpoint monitoring based on the HTTP response code. Unfortunately, the response returns a 200 status even when the server is in a failed state. Adding to the challenge, the error message isn’t visible when JavaScript is disabled, so I can’t even search the page body for keywords like "failed".

It would make much more sense for the web server to return a 4xx or other non-successful status code in this scenario.
Ideally, there would also be a metrics endpoint with per-job monitoring, which would improve visibility and make troubleshooting easier.

Wed Dec 18, 2024 8:23 pm

Hi,

Here is how Unimus behaves during DB connection issues:

1) At application cold start, if the DB is unavailable, the internal Boot component fails and Unimus aborts the startup. A message is written to the log file, and the reason for the startup failure is displayed in the web interface. We designed it this way on the assumption that after a cold boot event, the administrator should check whether the application came up.

2) At runtime, if the DB connection is lost, Unimus will try to reconnect periodically (off the top of my head, every 10 seconds). Each connection attempt and its result are written to the log file. The web interface also reports on the retries. If the DB comes back, the application continues working.
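
To illustrate the pattern described in 2), here is a minimal Python sketch of a fixed-interval retry loop. This is not Unimus code, and the function name, parameters, and log wording are all made up for the example. The only detail taken from the description above is the interval of roughly 10 seconds.

```python
import time
from typing import Callable, Optional


def retry_until_connected(
    connect: Callable[[], None],
    interval_s: float = 10.0,           # "every 10 seconds", per the description above
    max_attempts: Optional[int] = None,  # None means retry forever
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Call connect() until it succeeds and return the number of attempts made.

    connect() is a hypothetical callable that raises an exception on failure.
    Raises ConnectionError if max_attempts is reached without success.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            connect()
        except Exception as exc:
            # Log every failed attempt, then wait before trying again.
            log(f"DB connection attempt {attempt} failed: {exc}")
            if max_attempts is not None and attempt >= max_attempts:
                raise ConnectionError(f"gave up after {attempt} attempts") from exc
            sleep(interval_s)
        else:
            log(f"DB connection attempt {attempt} succeeded")
            return attempt
```

The same loop could be used outside Unimus, for example in a wrapper script that waits for the database before starting the service, with connect() replaced by whatever probe fits your database.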

As for monitoring, there is a "health" API endpoint you can hit to get the actual state of the application: https://wiki.unimus.net/display/UNPUB/F ... ealthcheck

Unless you get "OK" as the status inside the JSON payload of the response, something is wrong.
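
A monitoring check along these lines could look like the following Python sketch. Treat the details as assumptions and verify them against the wiki page linked above: the URL is whatever the healthcheck documentation gives for your instance, the Bearer token header is only needed if your setup requires API authentication, and the sketch assumes the state is reported in a field named "status" somewhere in the JSON.

```python
import json
import urllib.request
from typing import Any, Optional


def contains_ok_status(payload: Any) -> bool:
    """Return True if any "status" field in the parsed JSON equals "OK".

    The field name "status" is an assumption; the search is recursive so
    the exact nesting of the payload does not matter.
    """
    if isinstance(payload, dict):
        if payload.get("status") == "OK":
            return True
        return any(contains_ok_status(value) for value in payload.values())
    if isinstance(payload, list):
        return any(contains_ok_status(item) for item in payload)
    return False


def check_health(url: str, token: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Query the health endpoint and report whether it answered "OK".

    Any network error, HTTP error status, or non-JSON body counts as unhealthy.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if token:
        # Assumed auth scheme; drop this if the endpoint needs no token.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return contains_ok_status(json.load(response))
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; bad JSON raises ValueError.
        return False
```

A monitoring job can call check_health() with the endpoint URL and alert whenever it returns False, which also covers the case where the web server still answers 200 on the boot page.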

miguelangel.nubla · Wed Jan 08, 2025 11:36 pm

Is there any reason for not retrying on a cold start? It is really inconvenient after a power loss.