Network down! Help!
I have an issue with one of my Raspberry Pis. It will not survive if the Wi-Fi network is down for an extended period of time.
I have not been able to figure out the root cause, or how long the network needs to be unreachable before the Raspberry Pi just drops the network connection and never reconnects. The outages happen occasionally because the Pi is located just a little too far from my Wi-Fi access point. Because of fire prevention regulations in the building, I do not have a cable route available, so wireless is the way to go.
This wouldn’t be such a big issue if the Raspberry Pi weren’t headless (i.e. it has no keyboard or monitor) and connected only via Wi-Fi. I have some sensors connected to this Pi, which in turn update a web page. Losing network connectivity leaves the web page data out of date, which is quite inconvenient.
Today after work I noticed that the Pi had lost network connectivity (again) at 04:40. I had to make my way to the garage (again) and reboot the device (which has no keyboard or network connectivity), and my frustration level rose to the “I’ll fix this now!” level.
Right. After a few half-assed attempts at ping + reboot scripts, I figured there must be a more elegant solution, and I was sure I wasn’t the first one with this problem.
Enter Watchdog.
Searching the Internet, I found a couple of examples, none of which was immediately copy/paste ready for my needs.
They did put me on the right track, though, and after reading some man pages, forums and gist.github.com code snippets (and getting my Pi into a constant reboot loop, among other things), I finally came up with what seems to be a working solution.
The solution
First things first: I used a Raspberry Pi 2 with a USB Wi-Fi module and Raspbian Wheezy (4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux).
Start by installing watchdog:
sudo apt-get install watchdog
Make a backup copy of the /etc/watchdog.conf file, just in case:
sudo cp /etc/watchdog.conf /etc/watchdog.conf.backup
Edit the /etc/watchdog.conf file (e.g. sudo nano /etc/watchdog.conf) so that it contains the following settings. Each setting has a short comment explaining what it does:
# Watchdog ping: if unresponsive, reboot:
# monitor interface wlan0
interface = wlan0
# ping 5 times
ping-count = 5
# ping test destination IP address
ping = 192.168.1.1
# Change default interval from 1 second to 20:
# perform watchdog checks every 20 seconds
interval = 20
Then reboot (e.g. sudo reboot).
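One way to avoid the constant reboot loop I ran into is to check that every configured ping target actually answers over wlan0 before relying on it. Below is a minimal shell sketch, assuming the config lives at /etc/watchdog.conf; the function names watchdog_ping_targets and check_watchdog_targets are hypothetical, and the check only approximates watchdog's own test (ping -c 5 exits successfully if at least one of the five replies arrives).

```shell
#!/bin/sh
# Print the ping targets watchdog will use: the values of uncommented
# "ping = ..." lines. "ping-count" does not match because "=" must
# follow "ping" (optionally after whitespace).
watchdog_ping_targets() {
    conf="${1:-/etc/watchdog.conf}"
    sed -n 's/^[[:space:]]*ping[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf"
}

# Ping each target the way the config does (5 echo requests via the
# given interface). ping exits 0 if at least one reply arrives, which
# roughly mirrors watchdog's "no response to any ping" failure rule.
check_watchdog_targets() {
    conf="${1:-/etc/watchdog.conf}"
    iface="${2:-wlan0}"
    targets=$(watchdog_ping_targets "$conf")
    if [ -z "$targets" ]; then
        echo "No active ping lines in $conf"
        return 1
    fi
    status=0
    for target in $targets; do
        if ping -c 5 -I "$iface" "$target" >/dev/null 2>&1; then
            echo "OK:   $target answers via $iface"
        else
            echo "FAIL: $target does not answer via $iface"
            status=1
        fi
    done
    return "$status"
}
```

For example, after sourcing the functions, check_watchdog_targets /etc/watchdog.conf wlan0 should print OK for 192.168.1.1. If it prints FAIL, fix the ping line first, ideally before the reboot.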
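The above pings the destination address 192.168.1.1 five (5) times every 20 seconds. I’m not sure whether the interface setting is actually needed (according to the watchdog.conf man page, it enables a separate test that watches wlan0 for incoming traffic rather than choosing which interface the pings use), but it did not do any harm, so I left it there.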
192.168.1.1 is my default gateway, and I really do want to test connectivity against this address instead of some random host on the Internet, since I don’t want my Pi to reboot when only the Internet connection is down. If you insist on pinging a host on the Internet (not recommended), Google’s public DNS servers (8.8.8.8 and 8.8.4.4) might be a good choice, or any other host of your choice.
I did use a Google DNS server for testing, since it was easier to cut the Internet connection while keeping local area network (LAN) connectivity for management.
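If you are not sure which address your default gateway has (192.168.1.1 in my network, but yours may differ), it can be read from the routing table. A small sketch that picks the first default route from ip route output; default_gateway is a hypothetical helper name, and the sample line in the comment shows the output format it assumes:

```shell
# Print the gateway address of the first default route.
# Expects `ip route` output on stdin, e.g.
#   default via 192.168.1.1 dev wlan0
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

Usage: ip route | default_gateway. The printed address is what goes on the ping = line.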
Watchdog logs to syslog (/var/log/syslog). When the ping test fails, the log looks like this (note that the target here is Google DNS 8.8.8.8, not my internal network’s default gateway):
Oct 24 20:35:42 localhost watchdog[2640]: ping: 8.8.8.8
Oct 24 20:35:42 localhost watchdog[2640]: no response from ping (target: 8.8.8.8)
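To see how often the ping test has failed, the "no response from ping" lines can be pulled out of syslog. A minimal sketch, assuming the log format shown above; watchdog_ping_failures is a hypothetical helper name:

```shell
# List watchdog ping failures found in a syslog stream:
# one line per failure, "<month> <day> <time> <target>".
watchdog_ping_failures() {
    awk '/watchdog\[[0-9]+\]: no response from ping/ {
        target = $0
        sub(/.*\(target: /, "", target)   # drop everything before the address
        sub(/\).*$/, "", target)          # drop the closing parenthesis
        print $1, $2, $3, target
    }'
}
```

Usage: watchdog_ping_failures < /var/log/syslog (use sudo if your user can't read the log), or pipe the result to wc -l for a plain count.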
When there is no response to any of the five (5) ping echo requests, the Raspberry Pi will reboot. I know I will forget this, so I added the following to /home/pi/.profile:
echo "Warning: If network is down, this system will reboot in 20 seconds. Comment out ping = 192.168.1.1 line from /etc/watchdog.conf to avoid reboots."
That prints a warning message, plus which file to edit if needed, every time I log in. Even I should now be able to remember where Watchdog is configured 🙂
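The echo line above prints the warning even after the ping line has been commented out. A possible variant, sketched under the assumption that the config is at /etc/watchdog.conf, only warns while an active ping line exists; watchdog_reboot_warning is a hypothetical name:

```shell
# Hypothetical replacement for the echo line in /home/pi/.profile:
# warn only while an uncommented "ping = ..." line exists in the config.
watchdog_reboot_warning() {
    wd_conf="${1:-/etc/watchdog.conf}"
    # -E: extended regex; -q: no output, exit status only.
    # stderr is discarded in case the file is missing or unreadable.
    if grep -Eq '^[[:space:]]*ping[[:space:]]*=' "$wd_conf" 2>/dev/null; then
        echo "Warning: If network is down, this system will reboot in 20 seconds. Comment out the ping line(s) in $wd_conf to avoid reboots."
    fi
}

watchdog_reboot_warning
```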
Further reading
Below is a list of resources that helped me with the solution:
- Watchdog man page: https://www.systutorials.com/docs/linux/man/8-watchdog/
- Watchdog.conf man page: https://www.systutorials.com/docs/linux/man/5-watchdog.conf/
- Good explanation of watchdog.conf ping variable: http://www.sat.dundee.ac.uk/psc/watchdog/watchdog-configure.html#Network_ping
- Reddit, a solution which almost worked: https://www.reddit.com/r/raspberry_pi/comments/4ih9xo/id_like_my_routeronastick_vpn_to_autorestart/d2y3yj4/?context=3
- Non-watchdog script to solve the same issue (not mine): https://gist.github.com/SandroMachado/87e591fc42f368636b251b566485ae46
- Another non-watchdog script (again, not mine): http://weworkweplay.com/play/rebooting-the-raspberry-pi-when-it-loses-wireless-connection-wifi/
Improvements
I have only had this setup running for a couple of hours, so it’s unclear whether it will really work over time. Hopefully it will; I’ll update this article if needed.
What else could I do with Watchdog? Well, I could certainly extend this to check that my Python programs do not die, or at least to react and restart them automatically when they do. It could also react if the system bogs down and stays unresponsive for extended periods of time.
What do you think? If you have suggestions, improvements or indeed more experience than I do, please do leave a comment below.
Follow-up
Update 30.3.2018: Reminded by Will in the comments (thanks!), I noticed I had not followed up on my promise to update this article.
I have not had any problems with performance, reboot loops or anything else with this setup. It just works as intended. The 20-second interval is a bit tight, but it’s entirely doable to stop watchdog within that time if needed.
How would I improve this? Well, I’d create a separate reboot log instead of relying on syslog, but it’s really not necessary. It would help with collecting statistics over a longer time period but, as I said, it’s not necessary.
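A separate reboot log could be as simple as one timestamped line appended at every boot. A minimal sketch, assuming it is called from a boot script such as /etc/rc.local (before its final exit 0 line) and that /home/pi/reboot.log is an acceptable location; both function names are hypothetical. Keep in mind the Pi has no real-time clock, so a timestamp written early in boot may be off until the clock has been synced over the network.

```shell
# Hypothetical boot logger: append one timestamped line per boot,
# e.g. by calling log_boot from /etc/rc.local.
log_boot() {
    logfile="${1:-/home/pi/reboot.log}"
    printf '%s boot\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
}

# Summarise such a log as "YYYY-MM-DD <number of boots>", oldest first.
boots_per_day() {
    awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' "${1:-/home/pi/reboot.log}" | sort
}
```

Over a longer period, boots_per_day gives the kind of statistics mentioned above without digging through rotated syslog files.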