Watchdog timer - a hardware-implemented mechanism for detecting system hangs. It is a timer that the monitored system must reset periodically. If it is not reset within a set period of time, the system is forced to reboot. In some cases the watchdog signals the system to reboot itself (a "soft" reboot); in other cases the reboot is performed in hardware, for example by asserting the RST signal line.
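
The timeout rule described above can be sketched in a few lines of shell. This is only an illustration of the logic, not how any real watchdog hardware or driver is implemented; the function name `watchdog_expired` and the timestamps are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeeds if the timer has expired,
# i.e. more than $3 seconds have passed between the last reset ($1) and now ($2).
watchdog_expired() {
    last_reset=$1
    now=$2
    timeout=$3
    [ $((now - last_reset)) -gt "$timeout" ]
}

# Last reset at t=100 with a 60-second timeout.
if watchdog_expired 100 130 60; then echo "reboot"; else echo "ok"; fi   # 30 s elapsed: prints "ok"
if watchdog_expired 100 170 60; then echo "reboot"; else echo "ok"; fi   # 70 s elapsed: prints "reboot"
```

As long as the monitored system keeps resetting the timer before the timeout elapses, the check never fires.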

    Installation on Ubuntu/Debian Linux:

    sudo apt-get install watchdog  
    

    A list of some of the files that will be installed on the system:

    • /etc/init.d/watchdog
    • /etc/init.d/wd_keepalive
    • /etc/watchdog.conf
    • /etc/default/watchdog
    • /dev/watchdog
    • /usr/sbin/watchdog
    • /usr/sbin/wd_identify
    • /usr/sbin/wd_keepalive
    • /usr/share/doc/watchdog/
    • /usr/share/man/man5/watchdog.conf.5.gz
    • /usr/share/man/man8/watchdog.8.gz
    • /usr/share/man/man8/wd_identify.8.gz
    • /usr/share/man/man8/wd_keepalive.8.gz

    Configuration parameters available in /etc/watchdog.conf:

    interval =  
    

    Interval, in seconds, between two writes to the watchdog device. The default value is 1 second. An interval longer than a minute can only be used with the -f command-line option.

    logtick =  
    

    When verbose logging is enabled, a log entry is written only once every logtick intervals rather than on every interval. For example, with logtick = 60 and interval = 10, the period is 600 seconds, so at most one entry is written to the log every 10 minutes.
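
The arithmetic from this example can be checked directly in the shell. The variable names below are chosen for illustration only and are not read by watchdog itself.

```shell
#!/bin/sh
interval=10   # seconds between two writes to the watchdog device
logtick=60    # write a log entry once every 60 intervals

log_period=$((interval * logtick))
echo "at most one log entry every ${log_period} seconds ($((log_period / 60)) minutes)"
# prints: at most one log entry every 600 seconds (10 minutes)
```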

    max-load-1 =  
    

    The maximum allowed 1-minute load average; if it is exceeded, the system is rebooted. A value of 0 disables the check.

    max-load-5 =  
    

    The maximum allowed 5-minute load average; if it is exceeded, the system is rebooted. A value of 0 disables the check.

    max-load-15 =  
    

    The maximum allowed 15-minute load average; if it is exceeded, the system is rebooted. A value of 0 disables the check.
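
To get a feel for suitable thresholds, the comparison can be reproduced by hand. The sketch below is an approximation of the check, not watchdog's own code: `load_exceeds` is a hypothetical helper, the thresholds 24, 18 and 12 are illustrative values only, and the load averages are read from /proc/loadavg (Linux only).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if load average $1 exceeds threshold $2.
# A threshold of 0 means the check is disabled, as with the max-load-* options.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(max > 0 && load > max) }'
}

# The first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages.
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 _ < /proc/loadavg
    if load_exceeds "$load1" 24 || load_exceeds "$load5" 18 || load_exceeds "$load15" 12; then
        echo "load too high: $load1 $load5 $load15"
    else
        echo "load ok: $load1 $load5 $load15"
    fi
fi
```

Running this on a busy but healthy machine shows how close normal load comes to the limits before they are written into the configuration.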

    min-memory =  
    

    The minimum amount of virtual memory, measured in pages, that must remain free. A value of 0 disables the check.

    max-temperature =  
    

    The maximum allowed temperature. Once it is reached, the system is halted.

    watchdog-device =  
    

    The path to the watchdog device, for example /dev/watchdog.

    temperature-device =  
    

    The path to the device from which the temperature is read.

    file =  
    

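    File mode: the path of a file whose accessibility is checked. The option can be used more than once to check several files.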

    change =  
    

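    The change time for file mode, in seconds: the file named by the preceding file option must have been modified within this interval.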
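
The file/change pair amounts to a modification-time test, which can be sketched as follows. This is a rough shell equivalent, not watchdog's implementation: `file_is_fresh` is a hypothetical helper, it assumes GNU stat, and the value 1407 is only an example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file $1 exists and was modified
# within the last $2 seconds. Assumes GNU stat (-c %Y prints mtime as epoch seconds).
file_is_fresh() {
    [ -e "$1" ] || return 1
    mtime=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    [ $((now - mtime)) -le "$2" ]
}

# Example with a freshly created temporary file and a limit of 1407 seconds.
tmpfile=$(mktemp)
if file_is_fresh "$tmpfile" 1407; then
    echo "file was modified recently"
else
    echo "file is stale or missing"
fi
rm -f "$tmpfile"
```

A log file that a healthy service writes to regularly is a typical candidate for this kind of check.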

    pidfile =  
    

    The path to the PID file of a process to monitor, for example "pidfile = /var/run/apache2.pid". Watchdog checks that the process is still running. Note that if the process cannot be started at all, watchdog will reboot the system over and over.
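
Before adding a pidfile entry, it is worth confirming that the PID file really points at a live process. The sketch below does this with `kill -0`; `pid_is_alive` is a hypothetical helper, and the example uses the shell's own PID rather than a real service.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PID stored in file $1 belongs to a
# process that can be signalled. Note: kill -0 also fails for processes owned
# by another user unless run as root.
pid_is_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example: write this shell's PID to a temporary file and test it.
pidfile=$(mktemp)
echo "$$" > "$pidfile"
if pid_is_alive "$pidfile"; then
    echo "process is running"
else
    echo "process is not running"
fi
rm -f "$pidfile"
```

If this check fails right after boot for the service you intend to monitor, enabling the pidfile option would lead to the reboot loop described above.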

    ping =  
    

    Ping mode: an IP address to ping in order to check the network connection. The option can be used more than once.

    interface =  
    

    The name of a network interface to monitor.

    test-binary =  
    

    The path to a user-defined test program that watchdog runs as an additional check.

    test-timeout =  
    

    The maximum number of seconds the user-defined test may run. A value of 0 means unlimited.
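
A user test is simply an executable that reports health through its exit status: zero for success, non-zero for failure. The script below is a minimal sketch of such a test; the check itself (whether a directory is writable) and the name `check_writable` are hypothetical examples, not something watchdog ships.

```shell
#!/bin/sh
# Hypothetical health check: succeeds if directory $1 is writable.
check_writable() {
    probe="$1/.wd_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# The script's exit status is that of its last command:
# 0 if the directory is writable, non-zero otherwise.
check_writable "${TMPDIR:-/tmp}"
```

Keep such a test fast so that it finishes well within test-timeout, and run it by hand first to make sure it exits with 0 on a healthy system.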

    repair-binary =  
    

    The path to a repair program that is run when a check fails, in an attempt to fix the problem instead of rebooting the system.

    admin =  
    

    The email address for notifications. Leave the value empty to disable them.

    realtime =  
    

    Set to yes to lock watchdog into memory so that it is never swapped out.

    priority =  
    

    The scheduling priority used in realtime mode.

    Example setup with the Intel TCO Watchdog Timer.
    Load the module:

    sudo modprobe iTCO_wdt  
    

    In /etc/watchdog.conf, edit or add the following:

    watchdog-device = /dev/watchdog  
    interval = 10  
    

    In /etc/default/watchdog specify the module name:

    watchdog_module="iTCO_wdt"  
    

    You can add the verbose option so that debugging information is written to syslog:

    watchdog_options="-v"  
    

    Restart watchdog:

    sudo /etc/init.d/watchdog restart  
    

    You can follow the syslog entries in real time with the command:

    tail -f /var/log/syslog