I have always had a mental block when networking computers, due to a lack of
knowledge of how computer networks operate in general. This includes
understanding how the TCP/IP stack works, as well as the services that operate
on top of it, like DHCP, DNS, firewalls etc. My first experience of setting up a
“proper” network arose when I was put in charge of setting up the LAN for a
local college Counter-Strike tournament, including the dedicated server and the
10 clients connecting to it, and it was not a pleasant experience. All I had was
an off-the-shelf el-cheapo router, some rather loosely crimped CAT-5 cables and
11 computers that needed to talk to each other while maintaining a low latency.
But that was 12 years ago.
My curiosity about networking only grew with the passage of time. Fast-forward
to the present day, I have begun to understand things a little bit better,
thanks to interacting with people like Tod, Cherry (cherry@) and Trouble
(philip@FreeBSD.org).
Armed with new knowledge and better equipment like the UniFi[1] enterprise WiFi
systems, things got easier and more interesting. I occasionally volunteer for
HasGeek[2], where I get to play with these enterprise WiFi systems when setting
them up for their conferences. Unlike the Counter-Strike LAN party, this is a
proper network where hundreds of conference attendees connect their devices to
HasGeek’s network to access the internet. In addition to this, HasGeek live
streams its talks on the internet, so maintaining connection quality and uptime
is important. Up until RootConf 2017, I was mostly shadowing the experienced
people to see how the network was being set up.
Then, with The Fifth Elephant 2017, I was handed the responsibility of
setting up the conference network. What better way to learn networking than
actually doing it?
List of equipment
- UniFi USG Router x 1
- UniFi Switch 8 150W x 2
- UniFi Switch 8 60W x 1
- UniFi AP-AC-PRO x 9
- UniFi AP-AC-LITE x 1
- UniFi AP-PRO x 2
- UniFi AP x 3
- UniFi AP Outdoor5 x 1
- UniFi AP Outdoor x 1
- EdgeRouter PoE x 2
- ASUS Eee PC X101H (UniFi Controller) x 1
Links to the various items mentioned above can be found in the references
section.
The Setup
- The Network Operations Center (NOC) usually houses the AC-LITE, the UniFi
  Controller, the USG and the two Switch 8 150W units.
- The upstream connections from the ISPs terminate at the USG router, which
  load balances across them.
- The two Switch 8 150W units are where all the connections between the various
  devices are made.
- The EdgeRouter PoE units (in switch mode) and the Switch 8 60W are used for
  extending connectivity to areas which are far away from the NOC.
- Almost every AP is powered over PoE; in some rare exceptions we have to use a
  PoE injector.
- The order of preference for deploying APs is AC-PRO > AP-PRO > AP >
  Outdoor.
- The AP and Outdoor models are generally avoided since they only support a
  2.4GHz radio, and the 5GHz Outdoor is not dual band.
- The UniFi Controller software is set up on the Eee PC, which runs FreeBSD
  (10.4-RELEASE).
- The Eee PC also acts as the DHCP (dhcpd(8)) and DNS (unbound(8)) server. The
  USG’s functionality for this has been disabled.
The network setup usually happens at the venue a day before the actual start of
the event. The networking team coordinates with the Audio / Video team early on
so that they can test their streaming capabilities and make sure the network is
not choking the connection. On the last day of the conference the network is
torn down and packed up until the next conference.
The trick to setting up a good network is to place your APs in non-overlapping
areas to get maximum coverage. UniFi APs are generally meant to be ceiling
mounted, but since we do not have the luxury of doing permanent mounts, most of
the time the APs hang by their cables, which makes their coverage slightly less
omnidirectional than designed. Most of the other optimizations are taken care of
by the UniFi Controller software.
To distinguish between the 5GHz and 2.4GHz networks we provide two SSIDs,
“HasGeek” and “HasGeek Legacy”, which are again configured via the settings in
the UniFi Controller.
The available SSIDs for the WLAN groups.
The AP configuration for the WLAN groups.
Network diagram
This basically shows the logical diagram of how the various components connect
together.
Plug and play pray
Typically, once the network is up and running it is pretty much plug’n play(tm),
and it ran smoothly on the setup day. Then the crowd hit on Day 1. The Fifth
Elephant is one of the biggest HasGeek conferences, with an expected footfall of
1000+ attendees. As the crowd grew, the number of devices connecting to the
network increased, and then suddenly random devices stopped getting access to
the internet.
- My initial instinct was to check if the connectivity had gone down, but with
  redundant connections that was not the point of failure.
- Then I went on to check if the devices were being authenticated correctly, and
  the sample devices I picked did not show any issues connecting to the
  network.
- After this I checked if DHCP leases were being granted to the devices. Running
  tail -f /var/db/dhcpd.leases showed that leases were being granted, and a
  simple ping of the IP showed that the device was responding on the network.
- Interestingly, not all devices were experiencing this issue: some devices on
  the network did and others did not, and the same was true for new devices.
- The only thing left to check was the DNS. Even though I could not find any
  obvious issue with unbound(8), a restart of the service fixed the issue for
  many of the devices.
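The checks above boil down to two independent questions: does an IP address answer, and does a name resolve? The following is a minimal Python sketch of that split, not something that was run at the conference. The function names are hypothetical, `can_resolve` relies on whatever resolver the client was handed by DHCP, and `can_ping` assumes a `ping` binary that accepts `-c 1`.

```python
import socket
import subprocess


def can_resolve(hostname):
    """Return True if the system resolver returns an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_ping(ip, timeout_s=2):
    """Return True if a single ICMP echo to ip is answered (shells out to ping)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def diagnose(hostname, ip, resolve=can_resolve, ping=can_ping):
    """Classify a failure as DNS-side, connectivity-side, or both."""
    reachable = ping(ip)
    resolved = resolve(hostname)
    if reachable and resolved:
        return "ok"
    if reachable:
        return "dns"           # the IP answers but the name does not resolve
    if resolved:
        return "connectivity"  # the name resolves but the IP does not answer
    return "both"
```

Calling `diagnose` with a hostname and an IP you expect to be reachable returns `"dns"` when the address answers but the name lookup fails, which is the symptom described here. The `resolve` and `ping` parameters exist only so the classification can be exercised without touching the network.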
But wait…
It was not long before things came to a grinding halt. Restarting unbound(8)
to fix the problem temporarily worked, but this was annoying for most of the
people using the network.
So I decided to seek the help of the one individual who initially helped to
establish this setup: Trouble. Here is where my inexperience in setting up and
understanding networks really started to show. It is not that I feel ashamed
of asking someone knowledgeable about things I do not know; the difficulty is
in communicating the problem with the right words.
Now computery people are highly pedantic when it comes to definitions and
descriptions of problems. Here is an example illustrating this behavior.
- The EdgeRouter PoE has 5 Ethernet ports and one console port. The console port
  is an RJ45 port which basically carries RS-232 (serial) over RJ45.
  Accidentally calling this an “Ethernet” port can result in one of 3
  reactions:
  1. Rage / anger / hate / explosion, with the person saying that they do not
     understand the problem or that your problem statement is incorrect. There
     is a good chance most newbies will get intimidated by this and have no clue
     what to do next.
  2. The typical troll behavior: an interjecting correction that leaves you
     hanging in mid air, after which it sometimes takes a bit of time to get
     your orientation back. These people are “mostly harmless”, but the
     dangling nature of the conversation can get on your nerves. For example,
     you start with “What if I connect a cable in this ether…” and they
     interject, “That is not an ethernet port.”
  3. The patient one, who waits for the question to finish, corrects the
     mistake, and explains why.
- Most of my learning in programming came from writing code and then finding
  errors at compile time. If you are one of the handful who do not need a
  compiler to tell you what is wrong, and can eyeball the assembly and see the
  execution states of the program, the following reasoning may not make sense
  to you.
  If the compiler just decided to delete your code or insult you every time a
  syntax error occurred, your motivation to ever learn to program might just die
  off. At least mine would have. Having learned these things the hard way, I do
  not intentionally resort to a type 1 or type 2 reaction when someone asks me a
  stupid question about them. And I hope this statement is true about myself.
Describing the problem as a layman to someone who is experienced in the field
can have rather interesting consequences. Thinking this was a DNS issue, I
poked Trouble with a description of the issue, saying “I cannot seem to
access a website by its address and it looks like it is timing out. But
restarting unbound(8) seems to cause it to work.”
Naturally, this was the dumbest way for me to explain a DNS issue. I should have
said “I cannot drill(1) a given hostname but I am able to ping(1) the IP”. Now
that is a precise description of the problem at hand: it specifically points
at DNS as the thing that may be having issues. But alas, my vocabulary was not
that developed.
Trouble’s natural response was “Why would an http(1) issue cause DNS to fail?”
After a bit of back and forth, almost at the brink of Trouble’s patience, I
managed to put drill(1) and ping(1) together into one coherent sentence, after
which he finally acknowledged that it could be an issue with the unbound(8)
states being kept by pf(4).
Looking at /var/log/messages I saw the message
[pf states] PF states limit reached.
After a bit of poking around with pfctl -sm and asking for advice, I figured out
that the states limit was being reached because of all the DNS queries that were
being generated. I added the following line to /etc/pf.conf:
set limit { states 50000, frags 25000, src-nodes 50000 }
This made the problem go away for good. Despite the constant nagging from
the HasGeek Team to shift the DHCP and DNS services back to the USG to solve
the issues with the network, I am glad that I did not succumb to the pressure
and was able to identify the root cause of the problem and fix it.
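To see why the table filled up, a back-of-the-envelope estimate helps. With stateful rules, pf keeps an entry for each tracked connection until it expires, so a burst of DNS lookups from many devices adds up quickly. The sketch below is only arithmetic under loud assumptions: one device per expected attendee, a guessed 20 concurrent states per device, and a previous limit of 10000 (assumed here; `pfctl -sm` shows the real value on your system).

```python
def states_needed(devices, states_per_device):
    """Rough upper bound on concurrent pf state entries."""
    return devices * states_per_device


def fits(limit, devices, states_per_device):
    """True if the estimated demand stays within the configured states limit."""
    return states_needed(devices, states_per_device) <= limit


DEVICES = 1000           # expected footfall from the post, one device each (assumption)
STATES_PER_DEVICE = 20   # illustrative guess, not a measurement
OLD_LIMIT = 10000        # assumed previous limit; verify with pfctl -sm
NEW_LIMIT = 50000        # the states value set in /etc/pf.conf

print(states_needed(DEVICES, STATES_PER_DEVICE))     # 20000
print(fits(OLD_LIMIT, DEVICES, STATES_PER_DEVICE))   # False
print(fits(NEW_LIMIT, DEVICES, STATES_PER_DEVICE))   # True
```

Under these made-up numbers the estimated demand overshoots the assumed old limit and sits comfortably inside the new one, which matches the symptom of lookups failing only once the crowd arrived.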
Feature request
Once the network was up and running I asked the Audio / Video team about the
network connectivity. Everything seemed fine, but they wanted a separate
“Network” to keep their streaming PCs and Chromecast on. It made sense, since a
Chromecast device on a network can be easily accessed by anyone who knows the
PIN of the device (which is usually advertised on the screen).
So back to the drawing board: I was no longer just maintaining the network, I
needed to tweak settings to accomplish this task.
Segregating the network
- Configured the UniFi Controller to create a network 192.168.30.1/24.
- Assigned the VLAN number 10 to this new network.
- Created a wireless SSID “HasGeek Chromecast” and assigned VLAN number 10 to
  it. This took care of putting the Chromecast devices on their own separate
  wireless LAN, which is only accessed by the HasGeek Team.
- The UniFi Controller host also takes care of DHCP and DNS, so to handle this
  I edited /etc/rc.conf and added a network configuration:
# Chromecast network
vlans_alc0="10"
ifconfig_alc0_10="inet 192.168.30.2/24"
- Next was to configure dhcpd(8) to serve IPs for two different networks. After
  some tweaking and clean up, I managed to add a new configuration block
  allocating IPs to this new network segment.
shared-network chromecast-network {
  subnet 192.168.30.0 netmask 255.255.255.0 {
    option domain-name "chromecast.hasgeek.com";
    option domain-name-servers 192.168.30.2;
    option routers 192.168.30.1;
    # Regular users get assigned this pool.
    pool {
      range 192.168.30.11 192.168.30.100;
      deny members of "ubnt";
    }
    # Access points get assigned this pool.
    pool {
      allow members of "ubnt";
      range 192.168.30.101 192.168.30.200;
      # Point access points at the controller.
      option ubnt.unifi-ip-address 192.168.30.2;
    }
  }
}
- After that I had to configure unbound(8) to allow DNS lookups through this
  new IP. This was by far the easiest.
interface: 192.168.30.2
access-control: 192.168.30.0/24 allow
- Finally, I added the new IP of the VLAN to /etc/pf.conf so that the traffic
  gets through.
The new configuration being propagated over the access points.
Restarted the services and voilà, we had a working separate network for the Audio
/ Video Team.
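With addresses spread across the controller, rc.conf, dhcpd(8) and unbound(8), it is easy to mistype one of them. Below is a small sketch, using Python's standard ipaddress module, that checks the address plan from the steps above for consistency. The pool labels ("clients", "access-points") and the `check_plan` helper are names made up for this example; the addresses are the ones from the configuration.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.30.0/24")
POOLS = {
    "clients": ("192.168.30.11", "192.168.30.100"),
    "access-points": ("192.168.30.101", "192.168.30.200"),
}
STATIC = {
    "router": "192.168.30.1",
    "dhcp-dns-server": "192.168.30.2",
}


def check_plan(subnet, pools, static):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    ranges = {
        name: (ipaddress.ip_address(first), ipaddress.ip_address(last))
        for name, (first, last) in pools.items()
    }
    # Every pool must be well ordered and stay inside the subnet.
    for name, (first, last) in ranges.items():
        if first > last:
            problems.append(f"{name}: range is reversed")
        if first not in subnet or last not in subnet:
            problems.append(f"{name}: range leaves {subnet}")
    # No two pools may share an address.
    names = sorted(ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if ranges[a][0] <= ranges[b][1] and ranges[b][0] <= ranges[a][1]:
                problems.append(f"{a} overlaps {b}")
    # Statically assigned hosts must be in the subnet but outside every pool.
    for label, addr in static.items():
        ip = ipaddress.ip_address(addr)
        if ip not in subnet:
            problems.append(f"{label}: {ip} is outside {subnet}")
        for name, (first, last) in ranges.items():
            if first <= ip <= last:
                problems.append(f"{label}: {ip} falls inside pool {name}")
    return problems


print(check_plan(SUBNET, POOLS, STATIC))  # [] for the values used in this post
```

An empty list means the two pools do not collide with each other or with the router and server addresses. It says nothing about whether the VLAN tagging or the pf(4) rules are right; those still need testing on the live network.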
Dragons of NOC keep watch
… of the high bandwidth consumers. There are always devices which misbehave and
start leeching all of the bandwidth due to some updates or a forgotten torrent. To
keep such leeching devices at bay we now have a new “User Group” into which a
device is put when it does huge data uploads / downloads.
This should keep the HBC devices at bay.
Conclusion
The next couple of days went fine: the coffee was great, I had some great food
and attended some nice talks, and the tear down was smooth (except for one LAN
cable on which a rat managed to kill itself).
Special thanks to all the volunteers who helped with pulling the cables and
placing them in the most awkward positions so that the APs point in the
right direction, to Trouble for giving help and support during the
troubleshooting (no pun intended), and of course to the HasGeek Team for
entrusting their network responsibilities to someone they did not know a lot
about.
All in all, EVERYTHING IS FINE!!!
References
- UniFi Homepage - https://unifi-sdn.ubnt.com/
- HasGeek Homepage - https://hasgeek.com/
- UniFi AP-AC-PRO - https://www.ubnt.com/unifi/unifi-ap-ac-pro/
- UniFi AP / AP-PRO - https://www.ubnt.com/unifi/unifi-ap/
- UniFi Security Gateway - https://www.ubnt.com/unifi-routing/usg/
- UniFi Switch 8 60W - https://www.ubnt.com/unifi-switching/unifi-switch-8/
- UniFi Switch 8 150W - https://www.ubnt.com/unifi-switching/unifi-switch-8-150w/
- UniFi AP Outdoor - https://www.ubnt.com/unifi/unifi-ap-outdoor/
- EdgeRouter PoE - https://www.ubnt.com/edgemax/edgerouter-poe/
- ASUS Eee PC X101H - https://www.asus.com/Laptops/Eee_PC_X101H/