Watchdog
Introduction
The watchdog is a function that helps the system recover from malfunctions that leave it unable to operate correctly. Anything from a hardware fault to a program error may cause the watchdog to trigger.
The watchdog is essentially a timer that resets the system if it runs out. To prevent the watchdog from triggering, the system periodically “kicks” it to signal that the system is still operating within established parameters.
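To make the timer behavior concrete, here is a minimal Python model of a watchdog. It is a hypothetical illustration of the concept only, not WeOS code: the class name `WatchdogTimer` and its `kick()` and `expired()` methods are invented for this sketch, and a real watchdog resets the hardware instead of returning a flag. The clock is injectable so the behavior can be shown without actually waiting.

```python
import time
from typing import Callable


class WatchdogTimer:
    """Toy countdown that must be kicked before it runs out (illustration, not WeOS code)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout              # seconds allowed between kicks
        self._clock = clock                 # injectable time source, e.g. for tests
        self._deadline = clock() + timeout  # the timer starts running immediately

    def kick(self) -> None:
        """Signal that the system is healthy by pushing the deadline forward."""
        self._deadline = self._clock() + self.timeout

    def expired(self) -> bool:
        """True once the timeout passed without a kick; real hardware would reset here."""
        return self._clock() >= self._deadline


# Simulated clock: kick every 20 s with a 60 s timeout, then stop kicking.
now = 0.0
wdt = WatchdogTimer(timeout=60, clock=lambda: now)
for t in (20.0, 40.0, 60.0):
    now = t
    wdt.kick()
now = 110.0
print(wdt.expired())  # False: last kick at 60 s, so the deadline is 120 s
now = 120.0
print(wdt.expired())  # True: no kick arrived before the deadline
```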
Overview
WeOS allows users to adjust both the watchdog timeout and the kick interval. The watchdog timeout must be greater than the kick interval; a timeout equal to or smaller than the kick interval is not allowed.
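The rule above, combined with the 1-128 second range given for both settings under Watchdog Settings, can be checked as in the following sketch. The function name `validate_watchdog_settings` is hypothetical and not part of WeOS; it only encodes the constraints stated in this chapter.

```python
MIN_SEC, MAX_SEC = 1, 128  # allowed range for both timeout and interval


def validate_watchdog_settings(timeout: int, interval: int) -> None:
    """Raise ValueError if the pair breaks the rules stated in this chapter (sketch)."""
    for name, value in (("timeout", timeout), ("interval", interval)):
        if not MIN_SEC <= value <= MAX_SEC:
            raise ValueError(f"{name} {value} s is outside {MIN_SEC}-{MAX_SEC} s")
    if timeout <= interval:  # an equal value is rejected as well
        raise ValueError(f"timeout ({timeout} s) must be greater than interval ({interval} s)")


validate_watchdog_settings(60, 20)      # the defaults pass
try:
    validate_watchdog_settings(20, 20)  # timeout equal to interval
except ValueError as err:
    print(err)
```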
The watchdog can also be disabled, but this is not recommended. If the watchdog behavior needs to be adjusted for any reason, tune the timers instead.
Reset Information
The watchdog also provides useful diagnostic information. The most prominent is information about the last system reset, reported as a numeric reset cause together with a reset reason. The possible reset causes and reasons are:
Reset Cause | Reset Reason
---|---
0 | None
1 | System OK
2 | Failed subscription
3 | Failed kick
4 | Failed unsubscription
5 | Failed to meet deadline
6 | Forced reset
7 | Unknown failure
8 | Descriptor leak
9 | Memory leak
10 | CPU overload
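For scripts that post-process `show watchdog` output, the table can be expressed as a lookup. This sketch assumes the `Reset reason` field has the `CODE - TEXT` form seen in the Status example; code 7 (Unknown failure) is included because the Status section reports a real watchdog reset with it. `RESET_REASONS` and `parse_reset_reason` are hypothetical names.

```python
RESET_REASONS = {
    0: "None",
    1: "System OK",
    2: "Failed subscription",
    3: "Failed kick",
    4: "Failed unsubscription",
    5: "Failed to meet deadline",
    6: "Forced reset",
    7: "Unknown failure",
    8: "Descriptor leak",
    9: "Memory leak",
    10: "CPU overload",
}


def parse_reset_reason(field: str) -> tuple[int, str]:
    """Split a value such as '6 - Forced reset' into (code, reason text).

    Codes missing from the table fall back to the text printed by the device.
    """
    code_str, _, text = field.partition(" - ")
    code = int(code_str.strip())
    return code, RESET_REASONS.get(code, text.strip())


code, reason = parse_reset_reason("6 - Forced reset")
print(code, reason)  # 6 Forced reset
```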
Monitors
The watchdog also provides three monitors, all enabled by default. Each monitor logs information at its configured interval. The three monitors supported by the watchdog are:
- loadavg: Monitors the average load on the system.
- meminfo: Monitors memory usage to detect memory leaks.
- filenr: Monitors file descriptors to detect file descriptor leaks.
Currently only these three default monitors are configurable for the watchdog.
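This chapter does not document how WeOS computes the monitored values. As a hypothetical illustration only, the sketch below derives comparable percentages from the standard Linux files `/proc/loadavg`, `/proc/meminfo` and `/proc/sys/fs/file-nr`. The formulas (1-minute load × 100, used/total memory, allocated/maximum file handles) are assumptions; the load scaling is chosen so that a fully loaded two-core system reads 200%, in line with the note under Monitor Settings. Each function takes the file contents as a string, so it can be tried without a device.

```python
def loadavg_percent(proc_loadavg: str) -> float:
    """1-minute load average from /proc/loadavg as a percentage (load 1.0 -> 100%)."""
    return float(proc_loadavg.split()[0]) * 100


def meminfo_percent(proc_meminfo: str) -> float:
    """Share of RAM in use from /proc/meminfo: (MemTotal - MemAvailable) / MemTotal."""
    fields = {}
    for line in proc_meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])  # values are reported in kB
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return (total - available) * 100 / total


def filenr_percent(proc_file_nr: str) -> float:
    """Allocated file handles relative to the system maximum, from /proc/sys/fs/file-nr."""
    allocated, _unused, maximum = (int(x) for x in proc_file_nr.split())
    return allocated * 100 / maximum


print(filenr_percent("512\t0\t2048"))  # 25.0
```

On a Linux host, `meminfo_percent(open("/proc/meminfo").read())` returns the current memory usage under these assumptions.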
Configuration
Watchdog Settings
The watchdog can be configured from the top-level configuration context in the CLI.
example:/#> configure
example:/config/#> watchdog
example:/config/watchdog/#>
Syntax
[no] enable
Enable or disable the watchdog daemon. Default: enabled.
- no: Disable the watchdog.
[no] interval [SEC]
Set the watchdog kick interval, in seconds. This is the time between kicks from watchdogd to the kernel wdt driver. It is recommended to set it to 1/3 of the kernel wdt timeout.
Min: 1 sec, Max: 128 sec, Default: 20 sec.
- no: Reset to the default value, 20.
- SEC: Time in seconds within the allowed range 1-128.
[no] timeout [SEC]
Set the kernel wdt timeout, in seconds. This is the time before the kernel watchdog timer elapses; depending on the device, this can be an external hardware watchdog. The timeout must be greater than the kick interval, and a value of 10 seconds or greater is recommended.
Min: 1 sec, Max: 128 sec, Default: 60 sec.
- no: Reset to the default value, 60.
- SEC: Time in seconds within the allowed range 1-128.
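The 1/3 recommendation from the interval description can be computed as below. `recommended_interval` is a hypothetical helper, not a WeOS command. Rounding down keeps the interval strictly below the timeout, as required, for any timeout of 2 seconds or more.

```python
def recommended_interval(timeout: int) -> int:
    """Kick interval of about 1/3 of the kernel wdt timeout (hypothetical helper).

    Integer division rounds down, so the result stays below the timeout;
    max(1, ...) keeps it at or above the 1 s minimum for very small timeouts.
    """
    return max(1, timeout // 3)


for timeout in (10, 60, 128):
    print(timeout, recommended_interval(timeout))
```

With the default 60-second timeout this yields 20 seconds, which matches the default kick interval.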
[show] monitor [loadavg | meminfo | filenr]
Configure or show the watchdog monitors.
Note: Specifying a monitor name enters that monitor's sub-configuration context.
- show: Show the configured monitors.
- loadavg: Monitor for average CPU load.
- meminfo: Monitor for memory usage.
- filenr: Monitor for file descriptor count.
Monitor Settings
The monitors can be configured from the watchdog configuration context in the CLI. When entering the monitor context, an existing monitor name must be provided.
example:/#> configure
example:/config/#> watchdog
example:/config/watchdog/#> monitor NAME
example:/config/watchdog/monitor-NAME/#>
Syntax
[no] enable
Enable or disable the monitor. Default: enabled.
- no: Disable the monitor.
[no] interval [SEC]
Set the monitor interval, in seconds. Default: 3600 seconds.
- no: Reset to the default value, 3600 seconds.
- SEC: Time value in seconds.
[no] description [STRING]
Free-form description of this watchdog monitor.
- no: Remove any defined description.
- STRING: Free-form string.
[no] logmark
Enable or disable logging at the configured interval. Default: enabled.
- no: Disable logging.
[no] warning [PERCENTAGE]
Set the percentage at which a warning is triggered. If this threshold is exceeded, a warning is logged at the next interval invocation.
- no: Disable the warning.
- PERCENTAGE: Percentage value in the range 1-100.
Note: For a loadavg monitor, the maximum percentage value depends on the number of CPU cores in the product. For example, two cores give a maximum value of 200%.
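The threshold rules can be sketched as follows. Two assumptions are made here: a warning fires only when the measured value is strictly greater than the threshold (the text says "exceeded"), and the loadavg maximum is 100% per CPU core, generalized from the two-core example. The names `max_warning_percent` and `should_warn` are hypothetical.

```python
MONITORS = ("loadavg", "meminfo", "filenr")


def max_warning_percent(monitor: str, cpu_cores: int = 1) -> int:
    """Highest accepted warning PERCENTAGE: 100, or 100 per CPU core for loadavg (assumed)."""
    if monitor not in MONITORS:
        raise ValueError(f"unknown monitor {monitor!r}")
    return 100 * cpu_cores if monitor == "loadavg" else 100


def should_warn(value_percent: float, warning_percent: int) -> bool:
    """Assumed rule: warn only when the measured value is strictly above the threshold."""
    return value_percent > warning_percent


print(max_warning_percent("loadavg", cpu_cores=2))  # 200, as in the two-core example
print(should_warn(150.0, 120))                       # True
```

When experimenting on a host, `os.cpu_count()` from the Python standard library can supply the core count.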
Status
The current status of the watchdog can be observed from the top level in the CLI:
example:/#> show watchdog
Status         : Enabled
Timeout (sec)  : 60
Kick interval  : 20
Watchdog ID    : 0
Boot counter   : 821
Reset counter  : 1
Reset date     : 2019-06-18T09:05:12Z
Reset reason   : 6 - Forced reset
PID            : 1
Label          : init
The boot counter is incremented on every boot, both after a Power on Reset (cold boot) and after a watchdog reset. The reset counter is incremented only on a watchdog reset and is set to 0 on a Power on Reset. A reboot from the CLI also increments the reset counter, since the reboot uses the watchdog to power-cycle the system, but the reset reason differs: a CLI reboot is reported as “6 - Forced reset”, while a real watchdog reset is reported as “7 - Unknown failure”. See Reset Information for the reset reasons reported by watchdogd.
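The counter rules above can be summarized as a small state model. It is a hypothetical sketch: the class and method names are invented, and because this section does not say what reset reason a Power on Reset reports, the model leaves the reason unchanged on a cold boot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootCounters:
    """Hypothetical model of the Boot counter / Reset counter rules."""
    boot: int = 0
    reset: int = 0
    reason: Optional[str] = None  # last reset reason; not changed by a cold boot here

    def power_on_reset(self) -> None:
        self.boot += 1  # every boot counts
        self.reset = 0  # a cold boot clears the reset counter

    def watchdog_reset(self) -> None:
        self.boot += 1
        self.reset += 1
        self.reason = "7 - Unknown failure"

    def cli_reboot(self) -> None:
        # A CLI reboot power-cycles through the watchdog, so it counts as a reset too.
        self.boot += 1
        self.reset += 1
        self.reason = "6 - Forced reset"


c = BootCounters()
c.power_on_reset()
c.cli_reboot()
print(c)  # BootCounters(boot=2, reset=1, reason='6 - Forced reset')
```

Under this model, the sample output in this section (Reset counter 1, reason 6) is consistent with a single CLI reboot since the last Power on Reset.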