Watchdog

Introduction

The watchdog helps the system recover from malfunctions that leave it unable to operate correctly. Anything from a hardware fault to a program error may cause the watchdog to trigger.

The watchdog is essentially a timer that resets the system if it runs out. To prevent the watchdog from triggering, the system periodically “kicks” the watchdog to inform it that the system is still operating within established parameters.
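
The countdown-and-kick behavior described above can be illustrated with a minimal Python sketch. The class WatchdogTimer and its methods are hypothetical names chosen for illustration and do not correspond to any WeOS interface. Time is passed in explicitly so the behavior is easy to follow.

```python
class WatchdogTimer:
    """Toy countdown watchdog driven by an explicit clock value (illustration only)."""

    def __init__(self, timeout_sec, now=0.0):
        self.timeout_sec = timeout_sec
        self.deadline = now + timeout_sec

    def kick(self, now):
        """Restart the countdown from the given time."""
        self.deadline = now + self.timeout_sec

    def expired(self, now):
        """Return True once the countdown has run out (a real system would reset)."""
        return now >= self.deadline


wdt = WatchdogTimer(timeout_sec=60)
wdt.kick(now=20)             # a healthy system kicks at t=20, deadline moves to t=80
print(wdt.expired(now=70))   # False: the deadline has not been reached
print(wdt.expired(now=80))   # True: no further kicks arrived in time
```

As long as kicks keep arriving before the deadline, the timer never expires; a hung system stops kicking and the timer runs out.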

Overview

WeOS allows users to adjust the watchdog timeout and the kick interval. The watchdog timeout cannot be set to a value equal to or smaller than the kick interval.
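
The relationship between the two settings can be expressed as a simple check. The following Python sketch is illustrative only; the function name settings_are_valid is an assumption and not part of WeOS.

```python
def settings_are_valid(timeout_sec, interval_sec):
    """Return True if the timeout is strictly greater than the kick interval."""
    return timeout_sec > interval_sec


print(settings_are_valid(60, 20))  # True
print(settings_are_valid(20, 20))  # False: timeout equal to the interval is rejected
```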

The watchdog can also be disabled, but this is not recommended. If the watchdog needs to be adjusted for any reason, tweak the timers instead.

Reset Information

The watchdog can also provide useful information to the user, most prominently about the last reset of the system. This information consists of a reset cause and a reset reason. The possible reset causes and reasons are:

Reset Cause    Reset Reason
0              None
1              System OK
2              Failed subscription
3              Failed kick
4              Failed unsubscription
5              Failed to meet deadline
6              Forced reset
8              Descriptor leak
9              Memory leak
10             CPU overload
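
When processing reset information in a script, the list above maps naturally onto a lookup table. The Python sketch below is an illustration under that assumption; the names RESET_REASONS and reset_reason are hypothetical. Cause 7 is not listed in this document and is therefore left out rather than guessed.

```python
# Reset causes and reasons exactly as listed in this section.
RESET_REASONS = {
    0: "None",
    1: "System OK",
    2: "Failed subscription",
    3: "Failed kick",
    4: "Failed unsubscription",
    5: "Failed to meet deadline",
    6: "Forced reset",
    8: "Descriptor leak",
    9: "Memory leak",
    10: "CPU overload",
}


def reset_reason(cause):
    """Return the documented reason for a cause, or a placeholder if undocumented."""
    return RESET_REASONS.get(cause, f"Unknown cause {cause}")


print(reset_reason(6))  # Forced reset
print(reset_reason(7))  # Unknown cause 7
```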

Monitors

The watchdog also provides three types of monitors, all of which are enabled by default. These monitors log information at configured intervals. The following three monitors are supported by the watchdog:

  • loadavg: Monitoring the average load on the system.

  • meminfo: Monitoring the memory in order to detect memory leaks.

  • filenr: Monitoring file descriptors in order to detect file descriptor leaks.

Currently only these three default monitors are configurable for the watchdog.
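
To give a feel for the kind of data such monitors work with, the Python sketch below parses text in the format of the Linux files /proc/loadavg, /proc/meminfo and /proc/sys/fs/file-nr. That the WeOS monitors read exactly these files is an assumption based on the monitor names, and the function names and sample values are hypothetical.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg-style text into the 1, 5 and 15 minute load averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr-style text: allocated, unused, maximum handles."""
    allocated, unused, maximum = (int(x) for x in text.split()[:3])
    return allocated, unused, maximum


# Made-up sample values, used only to show the shape of the data.
print(parse_loadavg("0.42 0.30 0.25 1/123 4567"))
print(parse_meminfo("MemTotal: 1024000 kB\nMemAvailable: 512000 kB\n"))
print(parse_file_nr("1216\t0\t98304\n"))
```

A leak detector would compare successive samples over time, for example a steadily growing allocated file handle count or steadily shrinking available memory.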

Configuration

Watchdog Settings

The watchdog can be configured from the top-level configuration context in the CLI.

example:/#> configure
example:/config/#> watchdog
example:/config/watchdog/#>

Syntax

[no] enable

Enable or disable the watchdog daemon.

Default: enabled.

no
    Disable the watchdog.

[no] interval [SEC]

Set the watchdog kick interval in seconds.

The time between kicks from watchdogd to the kernel wdt driver. It is recommended to set this to 1/3 of the kernel wdt timeout.

Min: 1 sec, Max: 128 sec, Default: 20 sec.

no
    Reset to the default value: 20.
SEC
    Time in seconds within the allowed range: 1-128.

[no] timeout [SEC]

Set the kernel wdt timeout in seconds.

The time before the kernel watchdog timer expires. Depending on the device, this can be an external hardware watchdog. It is recommended to set this to 10 seconds or greater.

Min: 1 sec, Max: 128 sec, Default: 60 sec.

no
    Reset to the default value: 60.
SEC
    Time in seconds within the allowed range: 1-128.

[show] monitor [loadavg | meminfo | filenr]

Configure or show watchdog monitors.

Note

Enters a sub-configuration context.

show
    Show the configured monitors.
loadavg
    Monitor for average CPU load.
meminfo
    Monitor for memory usage.
filenr
    Monitor for file descriptor count.
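
The documented range of 1-128 seconds and the recommendation to use a kick interval of 1/3 of the timeout can be combined into a small helper. The Python sketch below is illustrative; the function name suggest_interval is hypothetical, and rounding down to whole seconds is an assumption.

```python
MIN_SEC, MAX_SEC = 1, 128  # range documented for both interval and timeout


def suggest_interval(timeout_sec):
    """Suggest a kick interval of roughly 1/3 of the given timeout."""
    if not MIN_SEC <= timeout_sec <= MAX_SEC:
        raise ValueError(f"timeout must be within {MIN_SEC}-{MAX_SEC} seconds")
    if timeout_sec <= MIN_SEC:
        # The interval must be smaller than the timeout, and cannot go below 1.
        raise ValueError("no valid kick interval is smaller than this timeout")
    # Integer division rounds down; never go below the minimum interval.
    return max(MIN_SEC, timeout_sec // 3)


print(suggest_interval(60))  # 20, matching the documented defaults
print(suggest_interval(90))  # 30
```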

Monitor Settings

The monitors can be configured from the watchdog configuration context in the CLI. When entering the monitor context, the name of an existing monitor must be provided.

example:/#> configure
example:/config/#> watchdog
example:/config/watchdog/#> monitor NAME
example:/config/watchdog/monitor-NAME/#>

Syntax

[no] enable

Enable or disable the monitor.

Default: Enabled.

no
    Disable the monitor.

[no] interval [SEC]

Set the monitor interval.

Default: 3600 seconds.

no
    Reset to the default value.
SEC
    Time value in seconds.

[no] description [STRING]

Free form description of this watchdog monitor.

no
    Remove any defined description.
STRING
    Free form string.

[no] logmark

Enable or disable logging at the configured interval.

Default: Enabled.

no
    Disable logging.

[no] warning [PERCENTAGE]

Set the percentage at which a warning is triggered.

If this threshold is exceeded, a warning will be logged at the next interval invocation.

no
    Disable the warning.
PERCENTAGE
    Percentage value in the range 1-100.

Note

For a loadavg monitor, the maximum percentage value depends on the number of CPU cores in the product. For example, two cores give a maximum value of 200%.
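
The relationship between core count and the maximum loadavg percentage can be sketched in Python as follows. The function names are hypothetical, and treating a load average of 1.0 as 100% (one fully busy core) is an assumption consistent with the two-core example in the note.

```python
def max_load_percentage(cores):
    """Maximum percentage for a loadavg monitor: 100% per CPU core."""
    return cores * 100


def load_percentage(load_avg):
    """Express a load average as a percentage (assumes 1.0 equals 100%)."""
    return load_avg * 100.0


def warning_exceeded(value_pct, threshold_pct):
    """Return True if the measured value exceeds the warning threshold."""
    return value_pct > threshold_pct


print(max_load_percentage(2))                        # 200
print(warning_exceeded(load_percentage(1.5), 120))   # True: 150% exceeds 120%
```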

Status

The current status of the watchdog can be observed from the top level in the CLI:

example:/#> show watchdog
Status              : Enabled
Timeout (sec)       : 60
Kick interval       : 20
Reset counter       : 821
Reset date          : 2019-06-18T09:05:12Z
PID                 : 1
Watchdog ID         : 0
Label               : init
Reset cause         : 6
Reset reason        : Forced reset
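
If this output is captured by a script, for example over a remote CLI session, it can be turned into key/value pairs. The Python sketch below is one possible approach; the function name parse_status is hypothetical, and it assumes every status line uses the "key : value" layout shown above.

```python
def parse_status(text):
    """Split 'key : value' status lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first " : " only, so timestamps containing ':' stay intact.
        key, sep, value = line.partition(" : ")
        if sep:
            status[key.strip()] = value.strip()
    return status


sample = """\
Status              : Enabled
Timeout (sec)       : 60
Kick interval       : 20
Reset date          : 2019-06-18T09:05:12Z
Reset cause         : 6
Reset reason        : Forced reset
"""

status = parse_status(sample)
print(status["Reset reason"])  # Forced reset
print(status["Reset date"])    # 2019-06-18T09:05:12Z
```

Lines without the " : " separator, such as the CLI prompt line, are skipped.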