I used Home Assistant’s built-in command_line integration and dylandoamaral’s uptime-card to set up the monitoring graphs below for the rclone OneDrive bidirectional sync jobs, which run every 15 minutes.

Although the rclone monitoring is the focus of this note, I also show two simpler examples: probing a web service's HTTP response code with curl, and checking a service with systemctl.

The Linux machine in question runs rclone to sync subsets of my OneDrive, syncthing to sync some of those (my notes database to be specific) with my iPhone, and the amazing docker-nginx-webdav-nononsense for the synchronization of my Zotero attachment database.

rclone sync scripts

The script below is invoked by cron every 15 minutes, for different subsets of my OneDrive.

The --resilient --recover --max-lock 2m --conflict-resolve newer switches were required (as warned about by the rclone bisync docs) to get the regular sync running reliably.

Another important detail is that, after every run, I append an ISO 8601-formatted timestamp and the rclone exit code to a .status file:

#!/bin/bash
 
# first argument must be relative path e.g. notes/pkb4000, which we append to /home/cpbotha/sync/
rel_path=$1
abs_path="/home/cpbotha/sync/$rel_path/"
# get last part of path
# ARGH: make sure that basename is also on cron's path, e.g. with PATH=... at top of crontab
last_part=$(basename "$abs_path")
 
# we default to OneDrive as Path1
# with --resync, Path1 will take pref over Path2 which is what we usually want, because we mostly receive updates
# resilient, recover, max-lock, conflict-resolve as per https://rclone.org/bisync/#max-lock for robust syncing setup
rclone bisync --resilient --recover --max-lock 2m --conflict-resolve newer --verbose "OneDrive:/$rel_path/" "$abs_path" >> "/home/cpbotha/.local/logs/rclone-$last_part.log" 2>&1
RET=$?
 
# append iso8601 timestamp and exit code to status file
# exit code: 0 OK, 1 TEMP FAIL, 2 FAIL
echo "$(date --iso-8601=seconds) $RET" >> "/home/cpbotha/.local/logs/rclone-$last_part.status"

Next is the Python script that Home Assistant uses to read the status file. The timestamp lets it detect the case where the last status was OK but was logged more than 75 minutes (4500 seconds) ago, which means that the sync job has probably stopped running.

#!/usr/bin/env python3
 
# it started with this oneliner:
# import sys,datetime; l=open("rclone-configs.status").readlines()[-1].strip().split(); ts=datetime.datetime.fromisoformat(l[0]).timestamp(); print("STOPPED" if (datetime.datetime.now().timestamp()-ts) > 4500 else {0:"OK",1:"TEMP FAIL",2:"FAIL"}[int(l[1])])
 
import datetime
import sys
from pathlib import Path
 
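# .name mirrors the basename used by the sync script (.stem would drop anything after a dot)
status = Path(f"/home/cpbotha/.local/logs/rclone-{Path(sys.argv[1]).name}.status")

# only the last line matters: "<iso8601 timestamp> <exit code>"
status_line = status.read_text().splitlines()[-1].split()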
ts = datetime.datetime.fromisoformat(status_line[0]).timestamp()
print(
    "STOPPED"
    if (datetime.datetime.now().timestamp() - ts) > 4500
    else {0: "OK", 1: "TEMP FAIL", 2: "FAIL"}[int(status_line[1])]
)
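The staleness check is easier to reason about when it is pulled out into a pure function that takes the current time as an argument. The following is only a sketch of that idea, not the script I actually run: the names classify, STALE_AFTER_SECONDS and LABELS are made up for illustration, and mapping unexpected exit codes to FAIL is an assumption on my part.

```python
import datetime

STALE_AFTER_SECONDS = 4500  # 75 minutes, i.e. five 15-minute cron intervals
LABELS = {0: "OK", 1: "TEMP FAIL", 2: "FAIL"}


def classify(status_line: str, now: datetime.datetime) -> str:
    """Map one '<iso8601 timestamp> <exit code>' line to a sensor state.

    `now` must be timezone-aware, because `date --iso-8601=seconds`
    writes timestamps that include a UTC offset.
    """
    stamp, code = status_line.split()
    age = (now - datetime.datetime.fromisoformat(stamp)).total_seconds()
    if age > STALE_AFTER_SECONDS:
        # The last run is too old, whatever its exit code was
        return "STOPPED"
    # Assumption: any exit code other than 0, 1 or 2 is reported as FAIL
    return LABELS.get(int(code), "FAIL")


if __name__ == "__main__":
    # Hypothetical status line, logged 10 minutes ago with exit code 0
    now = datetime.datetime.now(datetime.timezone.utc)
    stamp = (now - datetime.timedelta(minutes=10)).isoformat(timespec="seconds")
    print(classify(f"{stamp} 0", now))  # prints OK
```

Passing the clock in makes the 4500-second threshold easy to test without waiting, and comparing timezone-aware datetimes directly avoids any confusion about local time on the host.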

Home Assistant configuration

I’ve set it up so that the Home Assistant docker container can ssh to the host via key-based authentication.

For the rclone services, it simply invokes the Python script above, which prints OK, TEMP FAIL, FAIL or STOPPED.

For the WebDAV service, it uses a neat little curl trick to get the HTTP response code, which should be 401 Unauthorized because the request carries no credentials. You can use the same trick with any other response code that you expect.

# after activating the command_line for the first time, you have to actually restart home assistant
command_line:
  - binary_sensor:
      name: syncthing
      command: 'ssh cpbotha@host.docker.internal systemctl is-active syncthing@cpbotha.service'
      # payload is the string the command returns
      payload_on: "active"
      payload_off: "inactive"
  - sensor:
      name: rclone-pkb4000
      command: 'ssh cpbotha@host.docker.internal /home/cpbotha/sync/configs/rclone_cpbotha_to_ha_sensor.py pkb4000'
      scan_interval: 300
  - sensor:
      name: rclone-configs
      command: 'ssh cpbotha@host.docker.internal /home/cpbotha/sync/configs/rclone_cpbotha_to_ha_sensor.py configs'
      scan_interval: 300
  - binary_sensor:
      name: dav.vxlabs.com
      command: "curl -Li https://dav.vxlabs.com/ -o /dev/null -w '%{http_code}\n' -s"
      device_class: connectivity
      payload_on: "401"
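To make the logic of the curl probe explicit, here is a rough Python equivalent using only the standard library. It is a sketch for illustration and not part of my setup: http_status and sensor_state are hypothetical names, and the on/off strings are placeholders rather than anything Home Assistant requires.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url.

    urlopen follows redirects by default, similar to curl -L.
    Connection-level failures (DNS, refused connection) still raise URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception
        return err.code


def sensor_state(url: str, expected: int = 401) -> str:
    """Return 'on' when the service answers with the expected code, else 'off'."""
    return "on" if http_status(url) == expected else "off"


# Example call (not executed here, since it needs network access):
# sensor_state("https://dav.vxlabs.com/", expected=401)
```

The point is the same as with curl: a 401 from a server that demands authentication proves that the web server is up and answering, without the probe needing any credentials.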