Automating 802.1x part I
or: small steps to automation bliss
We’d like to be able to automate our network deployment and management from a single source of truth, but to get there from a running (enterprise, campus!) network, we’ll have to take some small steps first.
The scope of these posts is not 802.1x per se, but it’ll serve as a use case in which we’ll show you how automation can save time and bring some consistency and uniformity to the network (device) configuration. This might not be the sexiest side of automation, but it gets the job done and prepares your environment for more automation coolness later. And let’s face it: if you need to reconfigure hundreds of switches and tens of thousands of interfaces, automation literally saves (the) day(s).
Implementing 802.1x, and talking to all parties involved, enables us to convert a switch, location and interface specific configuration into a more generic configuration where specific items get pushed to the switch from a central RADIUS server (e.g. Cisco ISE). That server then also becomes part of a kind of single source of truth, if you like.
We’ve uploaded the Ansible playbooks involved to GitLab. The 802.1x playbooks were used on Cisco IOS switches, but shouldn’t be too difficult to adapt for other vendors/models.
The playbooks follow a similar setup:
- Gather information (from the source of truth or lacking that, the switch).
- Apply changes.
- Validate changes: does the current active state match the desired state (or intent if you like).
First off: configure vlan group names
Unless you’re lucky and every switch is already a small L2 domain with L3 boundaries, and the same VLAN numbers are re-used everywhere, you’re going to need named vlan groups. With these, the RADIUS server will be able to push a name to the switch instead of an ID. The switch in turn decides which VLAN ID(s) will be used based on the vlan group name it received.
Chances are these vlan groups are not yet configured on all your switches. Our first step will then be to automate configuring the required vlan groups everywhere.
Because most switches are not quite there yet when it comes to exchanging well-formed structured data, quite a large part of network automation still centers around building the right CLI commands and parsing command output that was pretty-printed for humans. Using Ansible to create and verify vlan groups is no exception:
- Gather information:
  pull a list of VLANs from the switch by parsing the output of the ‘show vlan’ command (lacking a single source of truth, we depend on the configuration of every single switch) and filter out the IDs of the VLAN names in which we’re interested.
- Build configuration commands and apply changes:
  build vlan group configuration commands to apply the changes.
- Validate:
  parse the output of the ‘show vlan group’ command to see if our changes have been applied the way we intended.
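To make the gather and build steps a bit more tangible, here is a minimal sketch in plain Python. It is not the playbook itself (that uses parse_cli() inside Ansible). The sample output, the `VLAN_ROW` regex and all function names are made up for illustration, and the `vlan group ... vlan-list` command form is an assumption you should verify against your own IOS version. Port lists that wrap onto continuation lines are simply ignored here.

```python
import re

# Illustrative 'show vlan'-style table; real output differs per platform and IOS version.
SAMPLE_SHOW_VLAN = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/1, Gi1/0/2
100  b22a-data-100                    active    Gi1/0/3, Gi1/0/4
555  b22a-voip-555                    active    Gi1/0/5
556  b22-VOIP-556                     active
1002 fddi-default                     act/unsup
"""

# One VLAN row: ID, name, status and an optional comma-separated port list.
# Wrapped continuation lines start with whitespace and therefore never match.
VLAN_ROW = re.compile(r"^(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<status>\S+)\s*(?P<ports>.*)$")


def parse_show_vlan(output):
    """Turn the human-oriented table into a list of dicts."""
    vlans = []
    for line in output.splitlines():
        match = VLAN_ROW.match(line)
        if not match:
            continue  # header, separator or continuation line
        ports = [p.strip() for p in match.group("ports").split(",") if p.strip()]
        vlans.append({
            "id": int(match.group("id")),
            "name": match.group("name"),
            "status": match.group("status"),
            "ports": ports,
        })
    return vlans


def matching_vlan_ids(vlans, name_pattern, only_with_ports=True):
    """Sorted IDs of VLANs whose name matches, optionally only those with ports."""
    return sorted(
        v["id"] for v in vlans
        if re.search(name_pattern, v["name"]) and (v["ports"] or not only_with_ports)
    )


def vlan_group_commands(group_name, vlan_ids):
    """One command per VLAN ID, so no comma or range syntax is needed (assumed syntax)."""
    return [f"vlan group {group_name} vlan-list {vlan_id}" for vlan_id in vlan_ids]


vlans = parse_show_vlan(SAMPLE_SHOW_VLAN)
ids = matching_vlan_ids(vlans, r"(?i)^([a-z]{1,2})?\d+\w*-voip-\d+")
commands = vlan_group_commands("DATA", ids)
print(ids)
print(commands)
```

The `only_with_ports` argument is a hypothetical stand-in for the idea of only considering VLANs that actually have ports configured on the switch.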
Some points of possible interest:
- I tend to use a which_hosts variable with a sensible default in the hosts line. This enables me to specify a different Ansible inventory group from the command line using e.g. -e "which_hosts='other_group'"
- Likely -e arguments are defined and documented in a vars file with the same filename as the playbook.
- parse_cli() is used to hammer the show command output into structured data. And very useful it is too. If you haven’t used it yet, be sure to give it a try.
- The Cisco CLI command syntax isn’t always consistent. Unfortunately for vlan groups, there are no ‘add’ or ‘remove’ commands like e.g. for 802.1q trunk VLAN lists. This means we have to:
  - first remove all VLANs from a possible previously defined vlan group. As not all Cisco switches accept a 1-4095 range, we need to try two versions of the ‘no’ command.
    Mind you, as a consequence of classic Cisco IOS not supporting transactions, there is a possible small race condition here: if an interface is to be configured with the VLAN ID from a vlan group right at the same time as we’re in the middle of removing and re-adding a (previously present) vlan group, the VLAN assignment for that interface might fail. A possible workaround is to first check which switches are missing the vlan group in question (use --check with this very same playbook), then only apply the changes to these switches.
  - then build a list of individual VLAN ID vlan group commands, as it seems that using the Cisco number range syntax in Ansible confuses the Cisco IOS versions I applied the playbooks on. I’ve not yet discovered exactly why, but it seems a comma in the range causes the configuration command sent to fail.
- Speaking of the number range syntax (e.g. 1,2,5-8), to validate the show vlan group output, we need to convert the list of IDs into a number range. I searched the interwebs for a suitable filter, didn’t find anything, wrote my own Ansible filter, then for something unrelated browsed through the sources of Ansible network engine and discovered they include a vlan range filter called vlan_compress. Turns out I tried out various queries, but apparently neither Google nor I thought of the term compress. Although the network engine GitHub page states they like you to use the engine as a foundation for building your own role, and you might have done, it’s not a problem to specify the core network engine as a role to be able to use vlan_compress().
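For the curious, converting a list of IDs into the number range syntax is not much code. The sketch below is my own illustration in Python, not the vlan_compress implementation from Ansible network engine. The function names and the `min_run` parameter are hypothetical. I’m assuming that a run of only two consecutive IDs is written as 1,2 rather than 1-2, to match the example above, so check what your switches actually print.

```python
def compress_vlan_ids(vlan_ids, min_run=3):
    """Collapse VLAN IDs into range notation, e.g. [1, 2, 5, 6, 7, 8] -> '1,2,5-8'.

    Runs shorter than min_run are written out as individual IDs.
    """
    ids = sorted(set(vlan_ids))
    parts = []
    start = 0
    while start < len(ids):
        end = start
        # Extend the run for as long as the next ID is consecutive.
        while end + 1 < len(ids) and ids[end + 1] == ids[end] + 1:
            end += 1
        run = ids[start:end + 1]
        if len(run) >= min_run:
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(vlan_id) for vlan_id in run)
        start = end + 1
    return ",".join(parts)


def expand_vlan_range(text):
    """Inverse operation: '1,2,5-8' -> [1, 2, 5, 6, 7, 8]."""
    ids = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(part))
    return ids


print(compress_vlan_ids([1, 2, 5, 6, 7, 8]))
print(expand_vlan_range("1,2,5-8"))
```

For validation it might be more robust to expand the range the switch reports and compare sets of IDs, instead of comparing strings, since that sidesteps any difference in how a run of two is printed.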
And finally, how to use the playbook
- Compliance check or dry-run, to discover which switches need changes.
  The vlan group named vlan_group_name should contain the IDs of VLANs that are actually configured on interfaces and whose name matches the regular expression search_vlan_pattern:
  case-insensitive names optionally starting with one or two letters, followed by one or more digits, followed by zero or more word characters (letters, digits or underscores), a dash, the string ‘voip’, again a dash and one or more digits
  (e.g. b22a-voip-555 or 22-VoIP-555 or b22-VOIP-555).
  ansible-playbook vlan_group.yaml --check -v \
    -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
    -e "vlan_group_name='DATA'"
- After possibly having filtered out only the switches that need changes, and having added these to the inventory as a new group, apply changes to switches in the inventory group change_these:
  ansible-playbook vlan_group.yaml -e "which_hosts='change_these'" \
    -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
    -e "vlan_group_name='DATA'"
  If the list of switches to fix is small, it’s fine to use --limit of course.
- If you don’t want to match a VLAN name, but want to configure a vlan group with a fixed VLAN ID everywhere, and if you don’t care if switches might not (yet) have ports in said VLAN:
  ansible-playbook vlan_group.yaml \
    -e "force_vlan_id=705" \
    -e "vlan_group_name='VG_705'" \
    -e "only_vlans_with_configured_ports=false"
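Before handing a search_vlan_pattern to the playbook, it can’t hurt to try it out on a few VLAN names first. Ansible’s regex handling is built on Python’s re module, so a quick check like the sketch below should behave the same way. The non-matching names in the list are made up for illustration.

```python
import re

# The pattern used in the commands above.
SEARCH_VLAN_PATTERN = r"(?i)^([a-z]{1,2})?\d+\w*-voip-\d+"

candidates = [
    "b22a-voip-555",   # letter, digits, trailing letter
    "22-VoIP-555",     # no leading letters, mixed case
    "b22-VOIP-555",    # upper case
    "b22a-data-100",   # wrong middle part
    "voip-555",        # no digits before '-voip-'
    "default",
]

matches = [name for name in candidates if re.search(SEARCH_VLAN_PATTERN, name)]
print(matches)
```

Only the first three names end up in `matches`. Note that \w also matches digits and underscores, so the part between the first digits and ‘-voip-’ is a bit more permissive than letters only.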
Bonus round
As an added bonus here’s a bash script to clean up the Ansible playbook output.
The script assumes you’re using the yaml stdout callback. Either specify this in your ansible.cfg:
[defaults]
stdout_callback = yaml
bin_ansible_callbacks = True
or use the environment variables:
export ANSIBLE_STDOUT_CALLBACK=yaml
export ANSIBLE_BIN_ANSIBLE_CALLBACKS=1
If you | tee vlan_group_check.log your ansible-playbook output or use ANSIBLE_LOG_PATH, you can use this script to show the relevant portions of the output:
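#!/bin/bash
# Summarize the relevant parts of every vlan group check log in this directory.
for log in *vlan_group_*check*.log; do
  # Print the log file name, underlined with '=' characters.
  echo "$log"
  printf "=%.0s" $(seq 1 ${#log})
  echo
  # Hosts that failed the 'Check if VLANs found' task, with the reason.
  awk '/TASK \[Check if VLANs found\]/,/^$/' "$log" \
    | grep -F -A 2 fatal | grep -E -o '\[.*\]| No ports in vlans.*' \
    | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'
  # Config commands to be executed: only print 'ok:' records of more than four lines.
  awk '/TASK \[Show config commands to be executed\]/,/^$/' "$log" \
    | grep -E -v '^[[:space:]]*$' \
    | awk 'BEGIN { RS="ok:"; FS="\n" } { if (NF > 4) print $0 }'
  # Hosts that failed compliance check #1 (missing vlan group).
  awk '/TASK \[.*compliance check #1\]/,/^$/' "$log" \
    | grep -F -A 2 fatal | grep -E -o '\[.*\]| Missing vlan group.*' \
    | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'
  # Hosts that failed compliance check #2, with the list items that were reported.
  awk '/TASK \[.*compliance check #2\]/,/^$/' "$log" \
    | grep -F -A 4 fatal | grep -E -o '\[.*\]|- .*' \
    | sed ':a; N; $!b a; s/\n-\s\{1,\}/ /g'
  echo
done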
Finishing note
Although not quite an issue with this playbook, parse_cli() might miss some information because essentially we’re screen scraping pretty-printed command output. The default line width on most switches is 80 characters. Ansible tries to expand this on switches that support it, but parse_cli() will still miss items if the line length exceeds 512 characters, which isn’t all that uncommon if you have e.g. more than 60 interfaces in one VLAN. I’ve submitted a fix which the Ansible team merged. If your Ansible version doesn’t include this fix yet, you might want to check it out.