Installing and configuring the Watchdog

How to protect your server from hangs and freezes with a watchdog timer.

A watchdog timer is a hardware-backed failsafe that keeps your server from getting stuck indefinitely. The idea is simple: your system periodically sends a "heartbeat" signal to reset the timer. If that signal stops coming — because the system hung, crashed, or got stuck in a loop — the timer fires and triggers a reboot. Depending on the setup, that reboot can be a clean software-level restart or a hard reset at the hardware level (e.g. pulling the RST line).

On dedicated servers and VPS instances that you can't physically reach, this automatic recovery brings a hung machine back without anyone having to press the reset button.
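
The heartbeat mechanism described above can be sketched in a few lines of shell. This is only an illustration, not how the watchdog daemon is implemented. The function name pet_watchdog and its arguments are hypothetical, and the final 'V' write relies on the Linux kernel's "magic close" convention, which disarms the timer only on drivers that support it.

```shell
# Hypothetical helper: send COUNT heartbeats to DEVICE, one every INTERVAL seconds.
pet_watchdog() {
    dev=$1
    count=$2
    interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3        # any write resets the countdown
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3            # "magic close": asks the driver to disarm on close
    } 3>"$dev"                    # opening the device starts the timer
}

# Example (needs root and a real watchdog device; do not run it while the
# watchdog service itself holds the device):
# pet_watchdog /dev/watchdog 6 10
```

If the loop stopped before the final write, for instance because the shell hung, a real device would reset the machine once its timeout expired.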

Installing on Ubuntu / Debian

sudo apt-get install watchdog

The package installs the following key files:

  • /etc/init.d/watchdog — service init script
  • /etc/watchdog.conf — main configuration file
  • /etc/default/watchdog — startup options
  • /dev/watchdog — the watchdog device (created by the kernel watchdog driver once it is loaded, not by the package itself)
  • /usr/sbin/watchdog — the watchdog binary

Key parameters in /etc/watchdog.conf

Timing and logging:

  • interval — how often the watchdog writes to the device, in seconds. Defaults to 1 second. Values over 60 seconds require the -f flag at startup.
  • logtick — how many intervals pass between log messages when verbose logging is enabled. With logtick = 60 and interval = 10, a message is written once every 600 seconds (10 minutes).
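
The relationship between the two settings is plain multiplication, shown here as a small sketch. The function name log_gap_seconds is hypothetical and exists only for this illustration.

```shell
# Seconds between log messages = interval (seconds) * logtick (intervals).
log_gap_seconds() {
    echo $(( $1 * $2 ))
}

log_gap_seconds 10 60   # prints 600, i.e. 10 minutes
```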

System load:

  • max-load-1, max-load-5, max-load-15 — maximum acceptable system load averages over 1, 5, and 15 minutes. If any threshold is exceeded, watchdog triggers a reboot. Set to 0 to disable a check.
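
To get a feel for a threshold before putting it into the config, the comparison can be approximated by hand. The sketch below is an assumption-laden illustration rather than the daemon's own logic: load_exceeds is a hypothetical helper, and the limit of 24 is an arbitrary example value.

```shell
# Hypothetical helper: succeed (exit 0) when LOAD exceeds THRESHOLD.
# A THRESHOLD of 0 means the check is disabled, so it never trips.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(max > 0 && load > max) }'
}

# Compare the current 1-minute load average against an example limit of 24.
if [ -r /proc/loadavg ]; then
    read -r load1 rest < /proc/loadavg
    if load_exceeds "$load1" 24; then
        echo "1-minute load $load1 is above the limit"
    else
        echo "1-minute load $load1 is within the limit"
    fi
fi
```

Running this for a while under normal traffic helps in choosing a limit that real load spikes won't cross by accident.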

Memory, temperature, and devices:

  • min-memory — minimum amount of virtual memory that must stay free, given in memory pages. Set to 0 to disable.
  • max-temperature — maximum acceptable temperature before a reboot is triggered.
  • watchdog-device — path to the watchdog device (typically /dev/watchdog).
  • temperature-device — path to the temperature sensor device.
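
Assuming min-memory is expressed in memory pages, as the watchdog.conf man page describes, a limit chosen in megabytes has to be converted first. The helper name mb_to_pages is hypothetical, and 64 MB is just an example value.

```shell
# Hypothetical helper: convert megabytes to memory pages.
# The second argument (page size in bytes) defaults to the running system's value.
mb_to_pages() {
    mb=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( mb * 1024 * 1024 / page_size ))
}

mb_to_pages 64 4096   # prints 16384: 64 MB expressed in 4 KiB pages
```

With 4 KiB pages, keeping 64 MB free would then be written as min-memory = 16384.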

File and process monitoring:

  • file and change — monitor a file. change is the maximum time in seconds the file may go without being modified; without change, watchdog only checks that the file exists.
  • pidfile — path to the PID file of a process you want to keep alive. Example: pidfile = /var/run/apache2.pid. If the process isn't running, watchdog will reboot the system.
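
Before pointing pidfile at a service, it is worth confirming that the PID file really tracks a live process, since a stale or missing file means a reboot. The following sketch approximates the idea of the check; the daemon's actual implementation may differ, and pid_alive is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the process named in PIDFILE is alive.
# Run it as root: without privileges, kill -0 also fails for processes
# owned by other users.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1           # no PID file: treat as not running
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;            # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null              # signal 0 only tests for existence
}

# Example: pid_alive /var/run/apache2.pid && echo "apache2 is running"
```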

Network:

  • ping and interface — network checks. ping names an IPv4 address that must answer; interface names a network interface that must keep receiving traffic.

Custom tests:

  • test-binary — path to a custom test script or program.
  • test-timeout — maximum execution time for the test in seconds (0 for no limit).
  • repair-binary — a program to run automatically when a problem is detected, before resorting to a reboot.
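
A custom test is an ordinary program whose exit code tells watchdog the result: zero means healthy, anything else is treated as a failure. As a hedged sketch, the check below fails once a filesystem fills up. The function name check_disk_usage, the 95 percent limit, and the script path mentioned afterwards are all hypothetical.

```shell
# Hypothetical check: succeed while MOUNT is at most MAX_PERCENT full.
check_disk_usage() {
    mount=$1
    max_percent=$2
    # df -P prints a header line, then one line whose 5th column is "NN%".
    used=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    [ -n "$used" ] || return 1              # df failed: report a problem
    [ "$used" -le "$max_percent" ]
}
```

Saved as an executable script, for example under the made-up path /usr/local/sbin/wd-disk-check, with a final line such as check_disk_usage / 95, it could be referenced as test-binary = /usr/local/sbin/wd-disk-check. Keep such checks fast and conservative, because a false failure reboots the server.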

Notifications and priority:

  • admin — email address for event notifications. Leave blank to disable.
  • realtime = yes — locks the watchdog daemon in memory so it can't be swapped out.
  • priority — real-time scheduling priority for the watchdog process.

Example setup with Intel TCO Watchdog

Load the kernel module:

sudo modprobe iTCO_wdt
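
After loading the module, it helps to confirm that the driver is present and that the device node appeared. The helpers below are a hypothetical sketch; they rely only on /proc/modules and a standard file test, and the optional second argument of module_loaded exists purely so the function can be pointed at a different file.

```shell
# Hypothetical helpers for checking the result of modprobe.
module_loaded() {
    # /proc/modules lists one loaded module per line, name first.
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

is_char_device() {
    [ -c "$1" ]
}

if module_loaded iTCO_wdt && is_char_device /dev/watchdog; then
    echo "iTCO_wdt is loaded and /dev/watchdog exists"
else
    echo "watchdog driver or device is missing"
fi
```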

In /etc/watchdog.conf, uncomment or add:

watchdog-device = /dev/watchdog
interval = 10

In /etc/default/watchdog, specify the module name:

watchdog_module="iTCO_wdt"

To enable verbose logging to syslog for debugging:

watchdog_options="-v"

Restart the service:

sudo /etc/init.d/watchdog restart

Watch the logs in real time to confirm everything's working:

tail -f /var/log/syslog
