Computer networking. Sometimes I think I understand it all, then I realize there’s yet another thing I hadn’t learned. This was one of those times.
I was swapping out somebody’s router with a brand-new, shiny one – specifically, the Ubiquiti Edgerouter X.
I had a number of missteps while setting up this router, mostly because I wanted to be da smahtest and did it all manually (not using the setup wizards), because I figured I would learn the most that way. And I did learn the most that way, I just underestimated how many ways there are to misconfigure a router.
So, several hours into the process, I had it all set up. I was done. Except one last thing – I wanted to set up OpenVPN, so I could remote into the network later on.
Fortunately, there is a guide for that which makes it straightforward, though admittedly still time-consuming.
I tested out the remote connection over a cellular hotspot, and I could ping all the IPs. Great, it worked. So I was packing up to leave.
I got a notification on my phone. Wifi has no internet access, it said.
This was strange, because wifi was certainly working before, and I tested it pretty thoroughly. But now, suddenly, it wasn’t working.
I tested the connection on my laptop. I could ping one of Google’s IP addresses directly and get consistent replies, so I did have an actual internet connection. As it turned out, the actual problem my phone was detecting was that DNS resolution wasn’t working.
For a while I thought this was specific to the guest network SSID, as it is on a different subnet and has firewalling in place. However, I eventually realized that the secure LAN SSID also had no DNS resolution. If I tested DNS a different way – either using nslookup, or setting static DNS servers for my laptop – then DNS resolution worked fine. So for some reason DNS lookups sent to the router (the DNS server for this network) weren’t working.
More specifically, a few domains resolved – google.com in particular – but most others timed out.
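For anyone who wants to reproduce that comparison without nslookup, here is a minimal sketch in Python that sends the same DNS query to a server of your choosing and reports whether anything comes back before a timeout. The function names are my own, and it only checks that a reply arrived (it does not inspect the answer or the response code), so treat it as a rough probe rather than a real resolver.

```python
import socket
import struct

TXID = 0x1234  # arbitrary transaction ID used to match the reply to our query


def build_dns_query(hostname, txid=TXID):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def answers_dns(server_ip, hostname, timeout=2.0):
    """Return True if server_ip sends back any DNS response before the timeout."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    if len(reply) < 4:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    # Same transaction ID and the "this is a response" bit set
    return reply_id == TXID and bool(flags & 0x8000)
```

Calling `answers_dns("10.0.0.1", "example.com")` against the router (10.0.0.1 is just a placeholder address here) and then against a public resolver would show the same split I was seeing: one times out, the other answers.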
I had a problem like this earlier, but it was only with the guest SSID. This was due to the firewall being too restrictive. On the Edgerouter, two settings determine where each firewall ruleset applies:
Interface: The physical (or virtual) interface a packet comes in on
Direction: Whether the packet is coming in or going out of that interface.
Direction actually has three options: in, out, and local. Local is for traffic headed to the router itself. For example, let’s say we have a router at 10.0.0.1. If a computer at 10.0.0.5 tried accessing the web interface on that router, that would fall under the local rules. However, if it tried pinging, say, 10.0.1.3 (some device on a different subnet), that traffic would pass through the router, hitting the in and out rules but not the local rule, because the local rule only applies to traffic directed at the router itself.
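Here is a rough sketch of that distinction in Python. The second router address (10.0.1.1) is made up for illustration, and this is only a toy model of the idea, not how the Edgerouter actually evaluates packets internally.

```python
import ipaddress

# Addresses owned by the router, one per subnet.
# 10.0.0.1 is the example from the text; 10.0.1.1 is an assumption.
ROUTER_ADDRESSES = {
    ipaddress.ip_address("10.0.0.1"),
    ipaddress.ip_address("10.0.1.1"),
}


def directions_hit(dst_ip):
    """List the ruleset directions a packet arriving at the router is checked against."""
    if ipaddress.ip_address(dst_ip) in ROUTER_ADDRESSES:
        # Addressed to the router itself: the local ruleset applies.
        return ["local"]
    # Forwarded traffic: "in" on the arrival interface,
    # then "out" on the interface it leaves through.
    return ["in", "out"]


print(directions_hit("10.0.0.1"))  # web interface on the router
print(directions_hit("10.0.1.3"))  # device on a different subnet
```

The first call lands on the local rules only, while the second is forwarded traffic and never touches them.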
The guest wireless network was set up so guests could access the internet and nothing else. To do this, traffic from that subnet to any of the other local subnets gets dropped by the firewall without exception, so only traffic with a destination address on the WAN side will get through. That particular rule was applied on the in direction for the guest wireless network interface on the router.
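As a minimal sketch of that in rule, assuming two internal subnets that I have made up for the example (the real ones are not listed here), the decision boils down to a subnet membership check:

```python
import ipaddress

# Hypothetical internal subnets behind the router.
LOCAL_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/24"),  # secure LAN (assumed)
    ipaddress.ip_network("10.0.1.0/24"),  # another internal subnet (assumed)
]


def guest_in_action(dst_ip):
    """Decide what the guest interface's 'in' ruleset does with a forwarded packet."""
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in net for net in LOCAL_SUBNETS):
        return "drop"    # guests may not reach other local subnets
    return "accept"      # anything else is headed for the WAN side
```

With these assumed subnets, `guest_in_action("10.0.0.5")` is dropped, while a destination outside them, such as the documentation address 203.0.113.10, is accepted.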
I didn’t want guest network users to access the router’s web interface either, so I also dropped all traffic from the guest subnet on the local direction.
However, as it turns out, a DNS request from a client on the guest network also goes directly to the router, which was acting as the DNS server – and so that rule blocks DNS requests. Similarly, DHCP requests are blocked. That isn’t going to work.
So I added two exceptions to that rule, one for DNS and one for DHCP. After that it worked as expected – the clients could get DNS and DHCP from the router and couldn’t touch anything else on the local network.
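To make the shape of that ruleset concrete, here is a small first-match-wins model in Python. The `Rule` class and the ruleset name are my own, and the ports are the standard ones (UDP 53 for DNS, UDP 67 for the DHCP server), which I am assuming match what the guide uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str
    protocol: Optional[str] = None   # None matches any protocol
    dst_port: Optional[int] = None   # None matches any port

    def matches(self, protocol, dst_port):
        return (self.protocol in (None, protocol)
                and self.dst_port in (None, dst_port))


# Rules are checked in order; the first match wins.
GUEST_LOCAL = [
    Rule("accept", "udp", 53),   # DNS queries to the router
    Rule("accept", "udp", 67),   # DHCP requests to the router
]
DEFAULT_ACTION = "drop"          # everything else aimed at the router


def evaluate(rules, protocol, dst_port):
    """Return the action of the first matching rule, or the default action."""
    for rule in rules:
        if rule.matches(protocol, dst_port):
            return rule.action
    return DEFAULT_ACTION
```

Here `evaluate(GUEST_LOCAL, "udp", 53)` is accepted, while `evaluate(GUEST_LOCAL, "tcp", 443)` (the web interface) falls through to the default drop.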
And when I say I figured this out, what I really mean is I followed this guide to a T.
So, later on when I was getting the same symptoms again, I assumed I had done something with the firewall config. But it all seemed fine.
I decided to see what this looked like on the router itself. I SSHed into it, and planned to use nslookup from the router to make sure things worked fine from that point on. Apparently the command to do this on the Edgerouter is host.
So I do this, and the DNS lookup on the router itself times out. Strange.
I try pinging the same Google IP from the router. And… get no response.
So, to recap, I could ping Google’s IP from my laptop going through the router, but the router itself couldn’t ping that same address.
The problem, as it turns out, was related to my firewall config – though it took me a while to find it. As I mentioned before, there are special firewall rules that can target the router interface itself. The problem was with one of those rules.
If you follow the OpenVPN guide linked above, it says to add several rules to WAN_LOCAL so that OpenVPN can connect. Without these rules in place, the incoming OpenVPN connections would get blocked by the firewall and therefore not work. However, I didn’t see a firewall ruleset named WAN_LOCAL. Maybe it didn’t generate these default rules because I didn’t follow the setup wizard, so it wouldn’t know which interface was WAN. That’s my guess, anyway. Regardless, I opted to create a new ruleset on the WAN interface for the local direction. I added the rules in the guide.
The OpenVPN connection worked fine, so the firewall wasn’t blocking it. What I didn’t realize is that there is another default rule that should normally exist in WAN_LOCAL that I didn’t have.
The way I had configured the policy, it was supposed to drop any traffic that wasn’t OpenVPN that was headed to the router’s WAN interface directly. One small problem. That means that any other packet sent from the router out the WAN interface that expects a response – ping, DNS, and so on – would get rejected by the firewall on the return trip because I didn’t specifically allow it.
Most firewalls are stateful firewalls. For example, on a home router, the way it works is that the router performs NAT on your packets as they go out, and it keeps track of what you connected to so that when you get a return packet, it knows it is in response to an internal request and allows it in. If it doesn’t recognize a packet as a response to one that came from the inside, the packet gets blocked by the firewall.
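A toy version of that bookkeeping might look like the following. This is only a sketch of the idea: it ignores NAT, timeouts, and everything else a real connection tracker handles, and the class and method names are invented for the example.

```python
class ConnectionTracker:
    """Toy connection table: remembers outbound flows so replies can be recognized."""

    def __init__(self):
        self._flows = set()

    def note_outbound(self, src, sport, dst, dport, proto):
        # Record the flow as seen from the inside.
        self._flows.add((proto, src, sport, dst, dport))

    def state_of_inbound(self, src, sport, dst, dport, proto):
        # A reply has source and destination swapped relative to the recorded flow.
        if (proto, dst, dport, src, sport) in self._flows:
            return "established"
        return "new"


tracker = ConnectionTracker()
# Outbound DNS query (addresses are from the documentation ranges, not real hosts)
tracker.note_outbound("192.0.2.10", 40000, "198.51.100.53", 53, "udp")
reply_state = tracker.state_of_inbound("198.51.100.53", 53, "192.0.2.10", 40000, "udp")
stray_state = tracker.state_of_inbound("203.0.113.7", 53, "192.0.2.10", 40000, "udp")
```

The reply to the recorded query comes back as established, while the packet from an address we never talked to is classified as new, which is what lets a firewall rule treat the two differently.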
However, I, in my big brain play, didn’t have any firewall rules that did that. The Edgerouter gives you a lot of fine-grained control compared to a home router, which includes the freedom to turn off that feature. Or, in my case, to not accept any defaults that would include it.
The fix for this problem, then, is to add a firewall rule that allows packets that are part of established connections to reenter the network. Once I added this firewall rule, the router once again could ping and get DNS responses, and so did everything else using the router for DNS.
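The before and after can be modeled in a few lines of Python. This is a simplified stand-in for the ruleset rather than the actual router configuration, and UDP port 1194 is OpenVPN's default port, which I am assuming for the sketch.

```python
def wan_local(packet, allow_established):
    """Return the action for a packet addressed to the router's WAN interface."""
    if allow_established and packet["state"] in ("established", "related"):
        return "accept"   # the rule that was missing
    if packet["proto"] == "udp" and packet["dport"] == 1194:
        return "accept"   # incoming OpenVPN (assumed default port)
    return "drop"         # default action for everything else


# Reply to a DNS query the router itself sent out
dns_reply = {"proto": "udp", "dport": 40000, "state": "established"}
# Fresh incoming OpenVPN connection
vpn_hello = {"proto": "udp", "dport": 1194, "state": "new"}

before = [wan_local(p, allow_established=False) for p in (dns_reply, vpn_hello)]
after = [wan_local(p, allow_established=True) for p in (dns_reply, vpn_hello)]
```

In the `before` case the VPN packet is accepted but the DNS reply is dropped, which matches the symptoms: OpenVPN worked while the router could not resolve or ping anything. In the `after` case both get through.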
Bam, I fixed the internet again.
Overall, I learned a lot by setting up the Edgerouter manually. It made me realize that a lot of what I assumed about networking is based on how home routers work by default. At the end of the day, networking is a bunch of primitive features thrown together that happen to look cohesive from the outside, and when you have individual control of those features, you can break stuff.
5/5 would break the internet again.