I used Home Assistant’s built-in command_line integration and dylandoamaral’s uptime-card to set up the monitoring graphs below for the rclone OneDrive bidirectional sync jobs, which run once every 15 minutes.
Although the rclone monitoring is the focus of this note, I also show the simpler curl return code probing (for web services) and systemctl examples.
The Linux machine in question runs rclone to sync subsets of my OneDrive, syncthing to sync some of those (my notes database to be specific) with my iPhone, and the amazing docker-nginx-webdav-nononsense for the synchronization of my Zotero attachment database.
rclone sync scripts
The script below is invoked by cron every 15 minutes, for different subsets of my OneDrive.
The --resilient --recover --max-lock 2m --conflict-resolve newer switches were required (as warned about by the rclone bisync docs) to get the regular sync running reliably.
Another important detail is that I store an ISO 8601 formatted timestamp and the rclone return code in a .status file.
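To make the status-file idea concrete, here is a minimal sketch in Python rather than the cron script itself. It assumes the status file holds a single line of the form `<timestamp> <return code>`. The function names, the one-line layout, and the example remote and paths are all hypothetical. Only the rclone flags come from the text above.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Flags named in the text above; everything else here is illustrative.
BISYNC_FLAGS = ["--resilient", "--recover", "--max-lock", "2m",
                "--conflict-resolve", "newer"]


def build_command(remote: str, local: str) -> list[str]:
    """Assemble the rclone bisync command line for one OneDrive subset."""
    return ["rclone", "bisync", remote, local, *BISYNC_FLAGS]


def write_status(status_file: Path, return_code: int) -> str:
    """Record '<ISO 8601 UTC timestamp> <return code>' and return that line."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} {return_code}"
    status_file.write_text(line + "\n")
    return line


def sync_once(remote: str, local: str, status_file: Path) -> int:
    """Run one bisync pass and log its outcome to the status file."""
    result = subprocess.run(build_command(remote, local))
    write_status(status_file, result.returncode)
    return result.returncode
```

A cron entry could then call something like `sync_once("onedrive:Notes", "/home/me/Notes", Path("/home/me/notes.status"))` for each subset, where the remote name and paths are placeholders.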
Next we have the Python script that is used by Home Assistant to read the status file. The reason for the timestamp is to detect the case where the last status message was OK but was logged more than 75 minutes (4500 seconds) ago, meaning that the service probably died.
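The staleness check can be sketched roughly as follows. This is an illustration, not the script used in this setup. It assumes a status line of the form `<ISO 8601 timestamp with UTC offset> <return code>`. The mapping of return code 1 to TEMP FAIL and any other non-zero code to FAIL is also an assumption, based on rclone bisync documenting 1 as a non-critical failure where a rerun may succeed and 2 as a critical abort. The names `classify` and `read_state` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(seconds=4500)  # 75 minutes, as described above


def classify(line: str, now: Optional[datetime] = None) -> str:
    """Map one '<ISO 8601 timestamp> <return code>' line to a state string."""
    now = now or datetime.now(timezone.utc)
    stamp_text, code_text = line.split()
    stamp = datetime.fromisoformat(stamp_text)  # must carry a UTC offset
    code = int(code_text)
    if code == 0:
        # A stale OK means the cron job has stopped reporting.
        return "OK" if now - stamp <= MAX_AGE else "STOPPED"
    # Assumption: 1 is the retryable bisync error, anything else is critical.
    return "TEMP FAIL" if code == 1 else "FAIL"


def read_state(path: str) -> str:
    """Classify the first line of a status file (hypothetical layout)."""
    with open(path) as f:
        return classify(f.readline().strip())
```

Passing `now` explicitly keeps the age logic easy to check by hand: an OK logged two hours before `now` comes back as STOPPED, while the same line logged ten minutes before comes back as OK.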
Home Assistant configuration
I’ve set it up so that the Home Assistant docker container can ssh to the host via key-based authentication.
For the rclone services, it simply invokes the Python script above which returns OK, TEMP FAIL, FAIL or STOPPED.
For the webdav service, it uses a neat little curl trick to get the HTTP response code, which should be 401 (Unauthorized). You can use the same approach for any other response code that you expect.
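For reference, the usual way to make curl print only the status code is `-s -o /dev/null -w "%{http_code}"`. The sketch below wraps that call in Python to show the comparison against an expected code. The helper names and the OK/FAIL mapping are hypothetical, and the actual Home Assistant sensor may simply report the raw code.

```python
import subprocess


def curl_command(url: str) -> list[str]:
    """Build a curl call that prints only the HTTP status code."""
    # -s: silent, -o /dev/null: discard the body, -w: print the status code
    return ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", url]


def judge(http_code: str, expected: str = "401") -> str:
    """Compare the code curl printed with the one we expect (assumed mapping)."""
    return "OK" if http_code.strip() == expected else "FAIL"


def probe(url: str, expected: str = "401") -> str:
    """Run curl against url and judge the status code it reports."""
    result = subprocess.run(curl_command(url), capture_output=True, text=True)
    return judge(result.stdout, expected)
```

Changing `expected` to "200" would cover a service that answers unauthenticated requests normally.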