# Watchdog Reboot Debug Report - 2026-03-26

## Problem

Cerbo GX (einstein) triggered a watchdog reboot on Mar 24 at 20:13:43 due to a sustained high load average (11/9/7) exceeding the watchdog thresholds (0/10/6).

## Root Cause

Cumulative CPU pressure from **7 custom Python D-Bus services** plus **dbus-raymarine-publisher** (multicast decoder) running simultaneously on a dual-core ARM Cortex-A7.

### Services Running at Time of Reboot

1. **dbus-anchor-alarm** - 1 Hz, 10 D-Bus reads/sec, circle fitting on 1800 points, JSON serializing 5000 track points every 5s
2. **dbus-generator-ramp** - 2 Hz (500ms), multiple D-Bus reads + regression math
3. **dbus-tides** - 1 Hz, SQLite writes + harmonic calculations
4. **dbus-meteoblue-forecast** - periodic HTTP API calls
5. **dbus-no-foreign-land** - periodic GPS uploads
6. **dbus-windy-station** - periodic sensor uploads
7. **dbus-raymarine-publisher** - continuous multicast protobuf decoding (12% CPU sustained)

### Key Findings

- **dbus-daemon**: 13% CPU (bottleneck from ~20 services making synchronous GetValue() calls)
- **dbus-raymarine-publisher**: 12% CPU (multiple threads continuously parsing multicast packets)
- **Total Python CPU**: ~50% aggregate across all custom services
- **Memory**: OK (609MB available of 1GB)
- **No crash loops**: All services had 152K+ second uptimes

## Optimizations Applied (v2.1.0)

### 1. dbus-generator-ramp

- **Changed**: Main loop from 500ms → 1000ms (2Hz → 1Hz)
- **File**: `dbus-generator-ramp/config.py` line 257
- **Impact**: 50% reduction in D-Bus polling and math operations
- **Version**: 2.0.0 → 2.1.0

### 2. dbus-anchor-alarm

- **Changed**: JSON update interval from 5s → 20s
- **File**: `dbus-anchor-alarm/anchor_alarm.py` line 78
- **Impact**: 75% reduction in large JSON serializations
- **Changed**: Track buffer from 5000 → 2000 points
- **File**: `dbus-anchor-alarm/track_buffer.py` line 16
- **Impact**: 60% less data to serialize and transmit over MQTT
- **Version**: 2.0.0 → 2.1.0

## Load Average Results

**Before optimizations:**

```
14:40:30 load average: 1.04, 2.30, 2.71
14:44:10 load average: 3.91, 2.50, 2.65 (after anchor-alarm restarted)
14:52:37 load average: 1.29, 3.93, 3.77
14:55:10 load average: 7.04, 6.21, 4.76 (critical)
```

**After optimizations (v2.1.0):**

```
15:05:21 load average: 1.69, 3.95, 4.24
15:06:01 load average: 0.99, 3.48, 4.07
15:06:41 load average: 0.64, 3.08, 3.91
15:07:42 load average: 1.35, 2.87, 3.78 (trending down)
```

**Status**: 15-minute load declining from 4.76 → 3.78. It is below the watchdog threshold (6.0) and should continue dropping over the next 15 minutes.

## Remaining Concerns

### High-Risk Service: dbus-raymarine-publisher (12% sustained CPU)

- Continuous multicast parsing with multiple threads
- D-Bus updates run at 1Hz, but packet decoding is continuous
- **Recommendation**: Monitor this service closely; consider adding `--update-interval 2000` (1Hz → 0.5Hz) if load remains elevated

### System-Wide D-Bus Pressure

- `dbus-daemon` at 13% CPU indicates bus saturation
- 20+ services making synchronous calls
- **Future optimization**: Implement D-Bus signal subscriptions instead of polling where possible

## Monitoring Commands

Check load every minute:

```bash
ssh cerbo "watch -n 60 uptime"
```

Monitor Python service CPU:

```bash
ssh cerbo "while true; do top -b -n 1 | grep python3 | head -n 10; sleep 30; done"
```

Check service health:

```bash
ssh cerbo "svstat /service/dbus-* 2>/dev/null | grep -v 'up.*seconds'"
```

## Next Steps if Load Remains High

1. Reduce raymarine publisher update rate to 2000ms
2. Consider disabling debug logging on anchor-alarm (SQLite writes every 15s)
3. Evaluate if all 7 services need to run continuously (some could be on-demand)
4. Long-term: consolidate low-frequency services (meteoblue, windy, nfl) into a single process

## Files Modified

- `dbus-generator-ramp/config.py` (main_loop_interval_ms: 500 → 1000)
- `dbus-generator-ramp/dbus-generator-ramp.py` (VERSION: 2.0.0 → 2.1.0)
- `dbus-generator-ramp/build-package.sh` (VERSION: 1.0.0 → 2.1.0)
- `dbus-anchor-alarm/config.py` (VERSION: 2.0.0 → 2.1.0)
- `dbus-anchor-alarm/anchor_alarm.py` (_JSON_UPDATE_INTERVAL_SEC: 5.0 → 20.0)
- `dbus-anchor-alarm/track_buffer.py` (MAX_POINTS: 5000 → 2000)
- `dbus-anchor-alarm/build-package.sh` (VERSION: 2.0.0 → 2.1.0)

## Deployed Packages

- `dbus-generator-ramp-2.1.0.tar.gz` (installed and running)
- `dbus-anchor-alarm-2.1.0.tar.gz` (installed and running)
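
## Appendix: Watchdog Threshold Check

The load averages and thresholds quoted in the Problem section (11/9/7 against 0/10/6) can be compared with a small helper. This is a minimal sketch, not the watchdog's actual logic. It assumes the three numbers are the 1/5/15-minute windows, that a threshold of 0 disables the check for that window, and that a window trips only when the load is strictly above its limit. The names `LoadAverage`, `exceeded_windows`, and `read_load_average` are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple


class LoadAverage(NamedTuple):
    """1-, 5-, and 15-minute values (used for both loads and limits)."""
    one_min: float
    five_min: float
    fifteen_min: float


def exceeded_windows(load: LoadAverage, limits: LoadAverage) -> List[str]:
    """Return the windows whose load is strictly above its limit.

    Assumption: a limit of 0 means the check is disabled for that window.
    """
    names = ("1min", "5min", "15min")
    return [
        name
        for name, value, limit in zip(names, load, limits)
        if limit > 0 and value > limit
    ]


def read_load_average(path: str = "/proc/loadavg") -> LoadAverage:
    """Read the current load averages on a Linux system."""
    with open(path) as f:
        one, five, fifteen = f.read().split()[:3]
    return LoadAverage(float(one), float(five), float(fifteen))


# Values taken from this report.
limits = LoadAverage(0, 10, 6)
at_reboot = LoadAverage(11, 9, 7)
after_fix = LoadAverage(1.35, 2.87, 3.78)

print(exceeded_windows(at_reboot, limits))  # ['15min']
print(exceeded_windows(after_fix, limits))  # []
```

Under these assumptions, only the 15-minute window (7 against a limit of 6) was over its threshold at the time of the reboot, which is why the Status note tracks the 15-minute figure. On the device, `exceeded_windows(read_load_average(), limits)` would give the same check against live values.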