Watchdog Reboot Debug Report - 2026-03-26
Problem
Cerbo GX (einstein) triggered watchdog reboot on Mar 24 20:13:43 due to sustained high load average (11/9/7) exceeding thresholds (0/10/6).
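The comparison above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and it assumes that a threshold of 0 means "check disabled" and that the three values are the 1-, 5- and 15-minute averages in that order. Under that reading, only the 15-minute average (7 vs 6) was over its limit.

```python
def exceeded_limits(load, limits):
    """Return indices (0 = 1 min, 1 = 5 min, 2 = 15 min) whose load is over its limit.

    Assumption: a limit of 0 disables the check for that window.
    """
    return [
        i
        for i, (value, limit) in enumerate(zip(load, limits))
        if limit > 0 and value > limit
    ]


# Figures from the report: load 11/9/7 against thresholds 0/10/6.
load_at_reboot = (11.0, 9.0, 7.0)
thresholds = (0, 10, 6)
print(exceeded_limits(load_at_reboot, thresholds))
```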
Root Cause
Cumulative CPU pressure from 7 custom Python D-Bus services plus dbus-raymarine-publisher (multicast decoder) running simultaneously on a dual-core ARM Cortex-A7.
Services Running at Time of Reboot
- dbus-anchor-alarm - 1 Hz, 10 D-Bus reads/sec, circle fitting on 1800 points, JSON serializing 5000 track points every 5s
- dbus-generator-ramp - 2 Hz (500ms), multiple D-Bus reads + regression math
- dbus-tides - 1 Hz, SQLite writes + harmonic calculations
- dbus-meteoblue-forecast - periodic HTTP API calls
- dbus-no-foreign-land - periodic GPS uploads
- dbus-windy-station - periodic sensor uploads
- dbus-raymarine-publisher - continuous multicast protobuf decoding (12% CPU sustained)
Key Findings
- dbus-daemon: 13% CPU (bottleneck from ~20 services making synchronous GetValue() calls)
- dbus-raymarine-publisher: 12% CPU (multiple threads continuously parsing multicast packets)
- Total Python CPU: ~50% aggregate across all custom services
- Memory: OK (609MB available of 1GB)
- No crash loops: All services had 152K+ second uptimes
Optimizations Applied (v2.1.0)
1. dbus-generator-ramp
- Changed: Main loop from 500ms → 1000ms (2Hz → 1Hz)
- File: dbus-generator-ramp/config.py line 257
- Impact: 50% reduction in D-Bus polling and math operations
- Version: 2.0.0 → 2.1.0
2. dbus-anchor-alarm
- Changed: JSON update interval from 5s → 20s
- File: dbus-anchor-alarm/anchor_alarm.py line 78
- Impact: 75% reduction in large JSON serializations
- Changed: Track buffer from 5000 → 2000 points
- File: dbus-anchor-alarm/track_buffer.py line 16
- Impact: 60% less data to serialize and transmit over MQTT
- Version: 2.0.0 → 2.1.0
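The impact percentages quoted for these changes follow directly from the before/after values. A small check, with a hypothetical helper name:

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Main loop rate: 2 Hz -> 1 Hz
loop_reduction = percent_reduction(2.0, 1.0)

# JSON serializations per minute: every 5 s (12/min) -> every 20 s (3/min)
json_reduction = percent_reduction(60 / 5, 60 / 20)

# Track buffer size: 5000 -> 2000 points
buffer_reduction = percent_reduction(5000, 2000)

print(loop_reduction, json_reduction, buffer_reduction)
```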
Load Average Results
Before optimizations:
14:40:30 load average: 1.04, 2.30, 2.71
14:44:10 load average: 3.91, 2.50, 2.65 (after anchor-alarm restarted)
14:52:37 load average: 1.29, 3.93, 3.77
14:55:10 load average: 7.04, 6.21, 4.76 (critical)
After optimizations (v2.1.0):
15:05:21 load average: 1.69, 3.95, 4.24
15:06:01 load average: 0.99, 3.48, 4.07
15:06:41 load average: 0.64, 3.08, 3.91
15:07:42 load average: 1.35, 2.87, 3.78 (trending down)
Status: 15-minute load declining from 4.76 → 3.78; it should keep dropping and stay below the watchdog threshold (6.0) over the next 15 minutes.
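The trend claim can be checked mechanically from the post-optimization samples. The sketch below assumes the "HH:MM:SS load average: a, b, c" line format used in this report; the regex and function name are illustrative.

```python
import re

# Matches lines such as "15:05:21 load average: 1.69, 3.95, 4.24"
LOAD_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\s+load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)"
)


def parse_samples(text):
    """Return a list of (time, load1, load5, load15) tuples found in `text`."""
    return [
        (m.group(1), float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in LOAD_RE.finditer(text)
    ]


AFTER_OPTIMIZATIONS = """
15:05:21 load average: 1.69, 3.95, 4.24
15:06:01 load average: 0.99, 3.48, 4.07
15:06:41 load average: 0.64, 3.08, 3.91
15:07:42 load average: 1.35, 2.87, 3.78 (trending down)
"""

samples = parse_samples(AFTER_OPTIMIZATIONS)
fifteen_min = [s[3] for s in samples]

# True when every 15-minute sample is lower than the one before it.
strictly_falling = all(b < a for a, b in zip(fifteen_min, fifteen_min[1:]))
# True when every sample is under the 15-minute watchdog threshold from the report.
below_threshold = all(v < 6.0 for v in fifteen_min)
print(strictly_falling, below_threshold)
```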
Remaining Concerns
High-Risk Service: dbus-raymarine-publisher (12% sustained CPU)
- Continuous multicast parsing with multiple threads
- Running at 1Hz D-Bus update but packet decoding is continuous
- Recommendation: Monitor this service closely; consider adding --update-interval 2000 (2Hz → 0.5Hz) if load remains elevated
System-Wide D-Bus Pressure
- dbus-daemon at 13% CPU indicates bus saturation
- 20+ services making synchronous calls
- Future optimization: Implement D-Bus signal subscriptions instead of polling where possible
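A minimal sketch of the signal-subscription idea: keep a local cache that is updated by change notifications, so reads come from memory instead of a synchronous GetValue() round trip. The class and paths are hypothetical. The wiring shown in the trailing comment assumes dbus-python and a PropertiesChanged signal on the com.victronenergy.BusItem interface carrying a dict with a "Value" key; verify both against the services actually in use.

```python
class ValueCache:
    """Holds the latest value per D-Bus object path, fed by change signals."""

    def __init__(self):
        self._values = {}

    def on_properties_changed(self, changes, path=None):
        # Assumption: `changes` is a dict that contains the new value under "Value".
        if path is not None and "Value" in changes:
            self._values[path] = changes["Value"]

    def get(self, path, default=None):
        # Local lookup only; no bus traffic is generated by a read.
        return self._values.get(path, default)


cache = ValueCache()
# Simulated signal delivery (illustrative path and value):
cache.on_properties_changed({"Value": 12.6, "Text": "12.6 V"}, path="/Dc/0/Voltage")
print(cache.get("/Dc/0/Voltage"))

# Possible wiring with dbus-python (not executed here; service name is a placeholder):
#   bus.add_signal_receiver(
#       cache.on_properties_changed,
#       signal_name="PropertiesChanged",
#       dbus_interface="com.victronenergy.BusItem",
#       bus_name="com.victronenergy.example",
#       path_keyword="path",
#   )
```

A cache like this still needs one initial read per path at startup, since signals only report changes.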
Monitoring Commands
Check load every minute:
ssh cerbo "watch -n 60 uptime"
Monitor Python service CPU:
ssh cerbo "while true; do top -b -n 1 | grep python3 | head -n 10; sleep 30; done"
Check service health:
ssh cerbo "svstat /service/dbus-* 2>/dev/null | grep -v 'up.*seconds'"
Next Steps if Load Remains High
- Reduce raymarine publisher update rate to 2000ms
- Consider disabling debug logging on anchor-alarm (SQLite writes every 15s)
- Evaluate if all 7 services need to run continuously (some could be on-demand)
- Long-term: consolidate low-frequency services (meteoblue, windy, nfl) into a single process
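One way the consolidation idea could look: a single process owning several low-frequency periodic tasks, so only one Python interpreter is resident. This is a sketch under assumptions, not the services' actual design; the class, intervals and task names are hypothetical, and the clock is passed in so the logic can be exercised without sleeping.

```python
import heapq


class PeriodicScheduler:
    """Runs registered callbacks at fixed intervals from one loop."""

    def __init__(self):
        self._queue = []  # heap of (next_due, seq, interval_s, callback)
        self._seq = 0     # tie-breaker so callbacks are never compared

    def add(self, interval_s, callback, start=0.0):
        heapq.heappush(
            self._queue, (start + interval_s, self._seq, interval_s, callback)
        )
        self._seq += 1

    def run_due(self, now):
        """Run every task whose due time has passed; return how many ran."""
        ran = 0
        while self._queue and self._queue[0][0] <= now:
            _, seq, interval_s, callback = heapq.heappop(self._queue)
            callback()
            # Reschedule relative to `now` so a stalled loop does not burst-catch-up.
            heapq.heappush(self._queue, (now + interval_s, seq, interval_s, callback))
            ran += 1
        return ran


calls = []
scheduler = PeriodicScheduler()
scheduler.add(10, lambda: calls.append("gps_upload"))        # placeholder task
scheduler.add(30, lambda: calls.append("forecast_refresh"))  # placeholder task

for t in (5, 10, 30):
    scheduler.run_due(t)
print(calls)
```

In a real service the loop would call run_due(time.monotonic()) and sleep until the next due time.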
Files Modified
- dbus-generator-ramp/config.py (main_loop_interval_ms: 500 → 1000)
- dbus-generator-ramp/dbus-generator-ramp.py (VERSION: 2.0.0 → 2.1.0)
- dbus-generator-ramp/build-package.sh (VERSION: 1.0.0 → 2.1.0)
- dbus-anchor-alarm/config.py (VERSION: 2.0.0 → 2.1.0)
- dbus-anchor-alarm/anchor_alarm.py (_JSON_UPDATE_INTERVAL_SEC: 5.0 → 20.0)
- dbus-anchor-alarm/track_buffer.py (MAX_POINTS: 5000 → 2000)
- dbus-anchor-alarm/build-package.sh (VERSION: 2.0.0 → 2.1.0)
Deployed Packages
- dbus-generator-ramp-2.1.0.tar.gz (installed and running)
- dbus-anchor-alarm-2.1.0.tar.gz (installed and running)