Installing and configuring the Watchdog

Instructions for installing and configuring Watchdog, the software that monitors your server and reboots it if it hangs

A watchdog timer is a hardware mechanism that detects system hangs and recovers from them. It is commonly used to avoid prolonged crashes and freezes on dedicated servers and VPS. A watchdog is essentially a timer that the monitored system must reset periodically. If the timer is not reset within a specified interval, it triggers a forced system reboot. In some cases this is a “soft” reboot via an OS signal, while in others it is a hardware-level reset, for example by shorting the RST signal line or a similar mechanism.
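The reset-or-reboot logic described above can be illustrated without any hardware. The following is a minimal sketch in shell, not a description of how a kernel driver works internally. watchdog_expired is a hypothetical helper that takes the time of the last reset, the current time, and the timeout (all in seconds) and reports whether the timer would have fired.

```shell
# Hypothetical helper: succeeds (exit 0) if the timer would have fired,
# i.e. more than "timeout" seconds have passed since the last reset.
watchdog_expired() {
    last_reset=$1
    now=$2
    timeout=$3
    [ $((now - last_reset)) -gt "$timeout" ]
}

# Demo with fixed timestamps: last reset at t=100, timeout of 60 seconds.
if watchdog_expired 100 150 60; then echo "t=150: reboot"; else echo "t=150: ok"; fi
if watchdog_expired 100 170 60; then echo "t=170: reboot"; else echo "t=170: ok"; fi
```

As long as the monitored system keeps moving the last reset time forward, the difference never exceeds the timeout and no reboot is triggered.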

Installation on Linux Ubuntu/Debian:

sudo apt-get install watchdog  

After installation, the following files and directories are added to the system:

  • /etc/init.d/watchdog
  • /etc/init.d/wd_keepalive
  • /etc/watchdog.conf
  • /etc/default/watchdog
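  • /dev/watchdog (device node; created by the kernel driver once a watchdog module is loaded, not by the package itself)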
  • /usr/sbin/watchdog
  • /usr/sbin/wd_identify
  • /usr/sbin/wd_keepalive
  • /usr/share/doc/watchdog/
  • /usr/share/man/man5/watchdog.conf.5.gz
  • /usr/share/man/man8/watchdog.8.gz
  • /usr/share/man/man8/wd_identify.8.gz
  • /usr/share/man/man8/wd_keepalive.8.gz

Main configuration options in /etc/watchdog.conf:

interval =  

The interval, in seconds, between two write operations to the watchdog device. The default is 1 second. Intervals longer than one minute can only be used with the -f command-line option.

logtick =  

If logging is enabled, this parameter specifies how many intervals to skip between log entries. For example, with logtick = 60 and interval = 10, events will be logged no more than once every 10 minutes.
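The arithmetic behind interval and logtick can be checked with a short script. This is only a sketch: log_period_seconds and interval_needs_force are hypothetical helper names, not part of the watchdog package.

```shell
# Hypothetical helpers for reasoning about interval and logtick values.

# Seconds between log entries: one entry every "logtick" intervals.
log_period_seconds() {
    interval=$1
    logtick=$2
    echo $((interval * logtick))
}

# Succeeds if the interval is longer than one minute (needs the -f option).
interval_needs_force() {
    [ "$1" -gt 60 ]
}

# interval = 10 and logtick = 60 give 10 * 60 = 600 seconds, i.e. 10 minutes.
echo "log period: $(log_period_seconds 10 60) seconds"

if interval_needs_force 90; then echo "interval = 90 requires -f"; fi
```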

max-load-1 =  
max-load-5 =  
max-load-15 =  

The maximum allowed system load over 1, 5, and 15 minutes, respectively. If exceeded, the system will reboot. Setting 0 disables the check.
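Before choosing limits, it helps to compare the current load averages with the values you plan to set. The sketch below assumes a Linux system with /proc/loadavg. load_exceeds is a hypothetical helper, and the limits 24, 18 and 12 are example values only, not recommendations.

```shell
# Hypothetical helper: succeeds if the load value exceeds the limit.
# A limit of 0 means the check is disabled, so it never succeeds.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(max > 0 && load > max) }'
}

# Read the 1-, 5- and 15-minute load averages from /proc/loadavg (Linux).
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 rest < /proc/loadavg
    if load_exceeds "$load1" 24; then echo "1-minute load $load1 is above 24"; fi
    if load_exceeds "$load5" 18; then echo "5-minute load $load5 is above 18"; fi
    if load_exceeds "$load15" 12; then echo "15-minute load $load15 is above 12"; fi
fi
```

If the script prints nothing under normal operation, the chosen limits leave some headroom and should not cause reboots during ordinary peaks.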

min-memory =  

Minimum amount of free virtual memory, measured in memory pages. Setting 0 disables the check.
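Assuming min-memory is expressed in memory pages, as the watchdog.conf(5) manual page describes, a value in megabytes has to be converted first. The sketch below does that conversion. mb_to_pages is a hypothetical helper, and 64 MB is an arbitrary example.

```shell
# Hypothetical helper: convert megabytes to memory pages.
mb_to_pages() {
    mb=$1
    bytes_per_page=$2
    echo $((mb * 1024 * 1024 / bytes_per_page))
}

# getconf PAGESIZE reports the page size of the running system in bytes.
page_size=$(getconf PAGESIZE)
echo "min-memory = $(mb_to_pages 64 "$page_size")"
```

The printed line can be reviewed and then copied into /etc/watchdog.conf.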

max-temperature =  

Maximum allowed system temperature.

watchdog-device =  
temperature-device =  

The name of the watchdog device and the temperature sensor device.

file =  
change =  

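File monitoring mode. file names a file to monitor, and change sets the maximum time in seconds that the file may go without being modified. A change line always applies to the most recent file line before it.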
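The idea of a file check can be reproduced by hand: compare the file's modification time with the current time. This is a rough sketch and not the daemon's own code. file_stale is a hypothetical helper, and stat -c %Y assumes GNU coreutils, as shipped with Ubuntu and Debian.

```shell
# Hypothetical helper: succeeds if the file was last modified more than
# max_age seconds before the reference time "now" (seconds since the epoch).
# Returns 2 if the file cannot be examined at all.
file_stale() {
    path=$1
    max_age=$2
    now=$3
    mtime=$(stat -c %Y "$path" 2>/dev/null) || return 2   # GNU stat: mtime in epoch seconds
    [ $((now - mtime)) -gt "$max_age" ]
}

# Demo on a scratch file that was just created, so it is not stale.
tmp=$(mktemp)
if file_stale "$tmp" 300 "$(date +%s)"; then echo "stale"; else echo "fresh"; fi
rm -f "$tmp"
```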

pidfile =  

The PID file of a process to monitor. For example, pidfile = /var/run/apache2.pid. If the process is not running, the watchdog will trigger a reboot.
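To see whether a PID file points at a live process before adding it to the configuration, a check along these lines can be used. It is a sketch only. pid_alive is a hypothetical helper, and the demo uses a scratch PID file rather than a real service.

```shell
# Hypothetical helper: succeeds if the PID stored in the file belongs to
# a process that currently exists.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # (it also fails if we lack permission to signal that process).
    kill -0 "$pid" 2>/dev/null
}

# Demo: a scratch PID file that holds the PID of this shell.
tmp=$(mktemp)
echo $$ > "$tmp"
if pid_alive "$tmp"; then echo "process is running"; else echo "process is gone"; fi
rm -f "$tmp"
```

A PID file that is missing or stale at the moment the watchdog starts would lead to a reboot, so it is worth running such a check first.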

ping =  
interface =  

Network checks. ping sets an IP address that must answer ping requests, and interface names a network interface that is monitored for incoming traffic.

test-binary =  
test-timeout =  
repair-binary =  

Parameters for running custom tests or repair programs. test-timeout sets the maximum duration of the test in seconds (0 means unlimited).
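A custom test can be any executable that reports its result through its exit status. The sketch below assumes the convention described in the watchdog(8) manual page, where an exit status of 0 means healthy and a non-zero status means the check failed. The disk-usage check, the 95% limit, and the helper names usage_ok and root_usage are all hypothetical examples.

```shell
# Hypothetical check: the root filesystem must stay below a usage limit.
# usage_ok succeeds if used_percent is at or below limit_percent.
usage_ok() {
    used_percent=$1
    limit_percent=$2
    [ "$used_percent" -le "$limit_percent" ]
}

# Current usage of / as a bare number; df -P gives stable POSIX columns.
root_usage() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# A real test script would use "exit 1" in the failing branch instead of
# echo, so that the non-zero status reports the failure to the daemon.
if usage_ok "$(root_usage)" 95; then
    echo "root filesystem ok"
else
    echo "root filesystem above 95% (or usage could not be read)"
fi
```

Keep such scripts fast and free of side effects, and set test-timeout so that a hanging test does not go unnoticed.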

admin =  

Email address for notifications. Leave blank to disable notifications.

realtime =  
priority = 

Real-time mode settings. realtime = yes locks the watchdog daemon into memory so that it is never swapped out, and priority sets its scheduling priority in real-time mode.

Example setup with Intel TCO Watchdog Timer:

Load the module:

sudo modprobe iTCO_wdt
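After loading the module, it is worth confirming that the kernel actually lists it. The sketch below reads /proc/modules, which is what lsmod displays. module_listed is a hypothetical helper, and its optional second argument exists only so that the check can be pointed at another file.

```shell
# Hypothetical helper: succeeds if the named module appears in the module
# list. The list file defaults to /proc/modules and can be overridden.
module_listed() {
    name=$1
    list=${2:-/proc/modules}
    awk -v name="$name" '$1 == name { found = 1 } END { exit !found }' "$list" 2>/dev/null
}

if module_listed iTCO_wdt; then
    echo "iTCO_wdt is loaded"
else
    echo "iTCO_wdt is not loaded (or is built into the kernel)"
fi
```

If the module is listed, ls -l /dev/watchdog should also show the device node.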

In /etc/watchdog.conf, uncomment or add the following lines:

watchdog-device = /dev/watchdog  
interval = 10  

In /etc/default/watchdog, specify the module name:

watchdog_module="iTCO_wdt"  

To enable debugging and detailed logging to syslog, set the following in the same file:

watchdog_options="-v"  

Restart the watchdog service:

sudo /etc/init.d/watchdog restart

Monitor logs in real time:

tail -f /var/log/syslog  