Watchdog timer - a hardware mechanism for detecting system hangs. It is a timer that the monitored system must reset periodically. If it is not reset within a set period, the system is forced to reboot. In some cases the watchdog signals the system to reboot itself (a "soft" reboot); in others the reboot is done in hardware (by asserting the RST signal line or similar).
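
The reset-or-reboot rule can be illustrated with a small simulation. This is only a sketch: it touches no real watchdog device, and the function name watchdog_expired and the timestamps are made up for the example.

```shell
# Pure simulation: no real watchdog device is opened or written.
# watchdog_expired LAST_RESET NOW TIMEOUT   (all values in seconds)
# Succeeds (exit status 0) when the timer was NOT reset within TIMEOUT.
watchdog_expired() {
    last_reset=$1
    now=$2
    timeout=$3
    [ $(( now - last_reset )) -gt "$timeout" ]
}

# Timer last reset at t=100 s, timeout 60 s.
if watchdog_expired 100 130 60; then echo "reboot"; else echo "ok"; fi   # prints ok (30 s elapsed)
if watchdog_expired 100 170 60; then echo "reboot"; else echo "ok"; fi   # prints reboot (70 s elapsed)
```

In real hardware the comparison is done by the timer itself, so it keeps working even when the operating system is completely hung.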

Installation in Linux Ubuntu/Debian:

sudo apt-get install watchdog  

A list of some of the files that will be installed on the system:

  • /etc/init.d/watchdog
  • /etc/init.d/wd_keepalive
  • /etc/watchdog.conf
  • /etc/default/watchdog
  • /dev/watchdog
  • /usr/sbin/watchdog
  • /usr/sbin/wd_identify
  • /usr/sbin/wd_keepalive
  • /usr/share/doc/watchdog/
  • /usr/share/man/man5/watchdog.conf.5.gz
  • /usr/share/man/man8/watchdog.8.gz
  • /usr/share/man/man8/wd_identify.8.gz
  • /usr/share/man/man8/wd_keepalive.8.gz

Possible config parameters for /etc/watchdog.conf:

interval =  

Interval, in seconds, between two writes to the watchdog device. The default value is 10 seconds. An interval longer than a minute can only be used with the -f command-line option.

logtick =  

If logging is enabled, an entry is written only once every logtick intervals instead of on every interval. For example, with logtick = 60 and interval = 10 the period is 60 x 10 = 600 seconds, so the log file receives at most one entry every 10 minutes.
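
The arithmetic is simply interval multiplied by logtick. A minimal helper for trying other combinations (log_period is a hypothetical name, not part of the watchdog package):

```shell
# log_period INTERVAL LOGTICK
# Prints the minimum spacing, in seconds, between two log entries.
log_period() {
    echo $(( $1 * $2 ))
}

log_period 10 60   # prints 600
```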

max-load-1 =  

Maximum allowed 1-minute load average. If it is exceeded, the system is rebooted. 0 disables the check.

max-load-5 =  

Maximum allowed 5-minute load average. If it is exceeded, the system is rebooted. 0 disables the check.

max-load-15 =  

Maximum allowed 15-minute load average. If it is exceeded, the system is rebooted. 0 disables the check.
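
To get a feel for the three load limits before enabling them, the comparison can be approximated in a few lines of shell. This is a sketch of the idea, not the daemon's own code; load_exceeds is a hypothetical helper, and the limits 24, 18 and 12 are illustrative values only.

```shell
# load_exceeds LOADAVG_LINE MAX1 MAX5 MAX15
# LOADAVG_LINE uses the /proc/loadavg layout:
#   "1min 5min 15min running/total lastpid"
# A limit of 0 disables that check.
# Succeeds (exit status 0) if any enabled limit is exceeded.
load_exceeds() {
    echo "$1" | awk -v m1="$2" -v m5="$3" -v m15="$4" '
        {
            over = 0
            if (m1  > 0 && $1 > m1)  over = 1
            if (m5  > 0 && $2 > m5)  over = 1
            if (m15 > 0 && $3 > m15) over = 1
            exit !over
        }'
}

# Sample line: the 1-minute load 25.10 is above the limit of 24.
load_exceeds "25.10 3.20 1.05 2/345 6789" 24 18 12 && echo "limit exceeded"
```

On a live system the first argument can be taken from the kernel, for example load_exceeds "$(cat /proc/loadavg)" 24 18 12.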

min-memory =  

Minimum amount of virtual memory, in memory pages, that must remain free. 0 disables the check.
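
Assuming the value is counted in memory pages, as the watchdog.conf man page describes, a size in MiB has to be converted using the system page size. A rough sketch (pages_for_mib is a hypothetical helper):

```shell
# pages_for_mib MIB
# Prints how many memory pages correspond to MIB mebibytes on this system.
pages_for_mib() {
    echo $(( $1 * 1024 * 1024 / $(getconf PAGESIZE) ))
}

# With 4096-byte pages, 100 MiB corresponds to 25600 pages.
pages_for_mib 100
```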

max-temperature =  

Set the maximum temperature allowed.

watchdog-device =  

Name of the watchdog device, for example /dev/watchdog.

temperature-device =  

Name of the temperature device.

file =  

File mode: name of a file to check. The option can be used more than once.

change =  

Change interval for file mode, in seconds. It applies to the most recently specified file.

pidfile =  

The name of a pid file. This adds a monitored process, for example "pidfile = /var/run/apache2.pid". Note that if the process cannot be started at all, watchdog will reboot the system again and again.
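
Before adding a pidfile line, it is worth confirming that the file exists and points to a live process, otherwise the machine may end up in a reboot loop. A minimal sketch, assuming liveness is tested with signal 0 (pid_alive is a hypothetical helper, not a tool shipped with watchdog):

```shell
# pid_alive PIDFILE
# Succeeds if PIDFILE is readable and names a process that can be signalled.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    # Signal 0 only checks that the process exists; nothing is delivered.
    kill -0 "$pid" 2>/dev/null
}

# Example with the pid file mentioned above (the path depends on the system).
if pid_alive /var/run/apache2.pid; then
    echo "process is running"
else
    echo "process is not running"
fi
```

Checking a process owned by another user may require root, because kill -0 also fails when permission is denied.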

ping =  

Ping mode, to check network connections. The option can be used more than once.

interface =  

Set the name of the network interface.

test-binary =  

Path to a user-supplied test program to run.

test-timeout =  

Maximum time, in seconds, that the test program may run. 0 means unlimited.

repair-binary =  

Program run when a check fails, to try to repair the problem before the system is rebooted.
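
A test program only has to report health through its exit status: zero means healthy, non-zero means failure. The script below is a hypothetical example of such a check (the function name and the probe file are invented for illustration); it succeeds only if a file can be created in the given directory.

```shell
#!/bin/sh
# Hypothetical health check for use as a test program.
# check_writable DIR: succeed only if a file can be created in DIR.
check_writable() {
    dir=$1
    probe="$dir/.wd_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# The exit status of the last command becomes the script's exit status.
check_writable "${TMPDIR:-/tmp}"
```

Saved as an executable file, the script can be run by hand first and its result inspected with echo $? before it is referenced from test-binary.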

admin =  

Email address for notifications. Leave the value blank to disable them.

realtime =  

Set to yes to lock watchdog into RAM so that it cannot be swapped out.

priority =  

Set the priority for realtime mode.

Example setup with the Intel TCO Watchdog Timer.
Load the module:

sudo modprobe iTCO_wdt  
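
After loading, it can be useful to confirm that the module is actually listed by the kernel. A small sketch (module_listed is a hypothetical helper; the optional second argument exists only so the function can be tried against a saved copy of /proc/modules):

```shell
# module_listed NAME [FILE]
# Succeeds if NAME appears as a loaded module in FILE (default: /proc/modules).
module_listed() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

if module_listed iTCO_wdt; then
    echo "iTCO_wdt is loaded"
else
    echo "iTCO_wdt is not loaded"
fi
```

The same information is available from lsmod, which formats the contents of /proc/modules.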

In /etc/watchdog.conf, edit or add the following lines:

watchdog-device = /dev/watchdog  
interval = 10  

In /etc/default/watchdog, specify the module name:

watchdog_module="iTCO_wdt"  

You can add the verbose option so that debugging information is written to syslog:

watchdog_options="-v"  

Restart watchdog:

sudo /etc/init.d/watchdog restart  

You can watch syslog entries in real time with the command:

tail -f /var/log/syslog  
Updated July 30, 2018