[Solved] Status monitoring/hang at boot
Posted: Wed Dec 18, 2024 8:12 pm
Today, my Unimus server failed to start because the database server was down during its initialization.
I assumed Unimus would periodically retry connecting to the database until it became available, but that’s not how it behaves. Instead, it halts at the /#!boot page with the message: "Failed to start Unimus."
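As a possible workaround until Unimus retries on its own, the retry could be done outside Unimus: wait for the database port to accept connections, and only then start the service. The following is a minimal sketch, not anything Unimus ships. The function name `wait_for_port`, the host `db.example.internal` and the port `5432` are hypothetical and need to be adapted to the actual setup.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=2.0):
    """Retry a TCP connection until it succeeds or timeout_s elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful TCP connect only shows that the port is open,
            # not that the database is ready to serve queries.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical database host and port; replace with real values.
    ok = wait_for_port("db.example.internal", 5432, timeout_s=300.0)
    raise SystemExit(0 if ok else 1)
```

Run as a pre-start step (for example from a wrapper script), the non-zero exit code on timeout can be used to keep the service from starting against an unreachable database.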
To monitor this issue, I attempted to set up basic endpoint monitoring based on the HTTP response code. Unfortunately, the response returns a 200 status even when the server is in a failed state. Adding to the challenge, the error message isn’t visible when JavaScript is disabled, so I can’t even search the page body for keywords like "failed".
It would make much more sense for the web server to return a 5xx status (such as 503 Service Unavailable), or at least some non-successful status code, in this scenario.
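For reference, this is roughly what my status-code check boils down to. It is a simplified sketch using only the Python standard library, and the function name `is_healthy` and the URL are placeholders. It shows why the failed boot goes unnoticed: any 2xx response counts as healthy, so a 200 served alongside the "Failed to start Unimus" page passes the check.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout_s=5.0):
    """Return True only if the URL answers with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    # Placeholder URL; point this at the real Unimus address.
    print(is_healthy("http://unimus.example.internal:8085/"))
```

With a non-successful status code during a failed boot, the same check would report the server as down without any changes.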
Ideally, Unimus would also expose a metrics endpoint with per-job monitoring, which would improve visibility and make troubleshooting easier.