Automating 802.1x part I
or: small steps to automation bliss

We’d like to be able to automate our network deployment and management from a single source of truth, but to get there from a running (enterprise, campus!) network, we’ll have to take some small steps first.

The scope of these posts is not 802.1x per se, but it’ll serve as a use case in which we’ll show you how automation can save time and bring some consistency and uniformity to the network (device) configuration. This might not be the sexiest side of automation, but it gets the job done and prepares your environment for more automation coolness later. And let’s face it: if you need to reconfigure hundreds of switches and tens of thousands of interfaces, automation literally saves (the) day(s).

Implementing 802.1x, and talking to all parties involved, enables us to convert a switch-, location- and interface-specific configuration into a more generic configuration where specific items get pushed to the switch from a central RADIUS server (e.g. Cisco ISE). That server then becomes part of a kind of single source of truth, if you like.

We’ve uploaded the Ansible playbooks involved to GitLab. The 802.1x playbooks were used on Cisco IOS switches, but shouldn’t be too difficult to adapt for other vendors/models.

The playbooks follow a similar setup:

Gather information (from the source of truth or lacking that, the switch).
Apply changes.
Validate changes: does the current active state match the desired state (or intent if you like).

First off: configure vlan group names

Unless you’re lucky and every switch is already a small L2 domain with L3 boundaries, and the same VLAN numbers are re-used everywhere, you’re going to need named vlan groups. With these, the RADIUS server will be able to push a name to the switch instead of an ID. The switch in turn decides which VLAN ID(s) will be used based on the vlan group name it received.
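To make the idea concrete, here is a minimal Python sketch of the lookup a switch effectively performs when RADIUS hands it a name. The switch names, the group name VOICE and the VLAN IDs are all made up for illustration; on a real network the mapping lives in each switch’s own configuration.

```python
# Hypothetical per-switch vlan group tables: same name, different local IDs.
VLAN_GROUPS = {
    "switch-a": {"VOICE": [120]},
    "switch-b": {"VOICE": [220, 221]},
}


def resolve_vlan_group(switch, group_name):
    """Return the local VLAN IDs a switch maps to a RADIUS-supplied name."""
    try:
        return VLAN_GROUPS[switch][group_name]
    except KeyError:
        # A switch without the group cannot honour the name it was sent.
        raise LookupError(f"{switch} has no vlan group {group_name!r}") from None


if __name__ == "__main__":
    for switch in VLAN_GROUPS:
        print(switch, resolve_vlan_group(switch, "VOICE"))
```

The LookupError branch is the reason the vlan groups have to exist everywhere before RADIUS starts sending names.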

Chances are these vlan groups are not yet configured on all your switches. Our first step will then be to automate configuring the required vlan groups everywhere.

Because most switches are not quite there yet with regard to being able to exchange well formed structured data, quite a large part of network automation still centers around properly building the right CLI commands and parsing command output pretty printed for humans. Using Ansible to create and verify vlan groups is no exception:

Gather information:
pull a list of VLANs from the switch by parsing the output of the ‘show vlan’ command (lacking a single source of truth, we depend on the configuration of every single switch) and pick out the IDs of the VLANs whose names we’re interested in.
Build configuration commands and apply changes:
build vlan group configuration commands to apply the changes.
Validate:
parse the output of the ‘show vlan group’ command to see if our changes have been applied the way we intended.
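The gather step above can be sketched in plain Python. This is not parse_cli() and not the playbook’s own parser spec, just an illustration of the same screen-scraping idea. The sample output is invented and only resembles ‘show vlan’ output; the function names and the only_with_ports parameter are hypothetical.

```python
import re

# Invented sample that resembles 'show vlan' output; real output varies.
SAMPLE_OUTPUT = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/1, Gi1/0/2
120  b22a-voip-120                    active    Gi1/0/3, Gi1/0/4,
                                                Gi1/0/5
130  b22a-data-130                    active
"""

VLAN_LINE = re.compile(
    r"^(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<status>\S+)\s*(?P<ports>.*)$"
)


def split_ports(text):
    """Split a comma-separated port column into a list, dropping empties."""
    return [port for port in re.split(r",\s*", text.strip()) if port]


def parse_show_vlan(text):
    """Turn the human-oriented table into a list of dicts."""
    vlans = []
    for line in text.splitlines():
        match = VLAN_LINE.match(line)
        if match:
            vlans.append({
                "id": int(match["id"]),
                "name": match["name"],
                "ports": split_ports(match["ports"]),
            })
        elif vlans and line[:1].isspace() and line.strip():
            # Indented line: the port list of the previous VLAN continues.
            vlans[-1]["ports"] += split_ports(line)
    return vlans


def select_vlan_ids(vlans, name_pattern, only_with_ports=True):
    """Return sorted IDs of VLANs whose name matches the pattern."""
    return sorted(
        vlan["id"]
        for vlan in vlans
        if re.search(name_pattern, vlan["name"])
        and (vlan["ports"] or not only_with_ports)
    )


if __name__ == "__main__":
    parsed = parse_show_vlan(SAMPLE_OUTPUT)
    print(select_vlan_ids(parsed, r"(?i)-voip-\d+$"))
```

Note how much of the code deals with layout (header lines, wrapped port lists) rather than with data, which is exactly the cost of parsing output meant for humans.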

Some points of possible interest:

I tend to use a which_hosts variable with a sensible default in the hosts line. This enables me to specify a different Ansible inventory group from the command line using e.g. -e "which_hosts='other_group'"
The -e arguments you’re likely to need are defined and documented in a vars file with the same filename as the playbook.
parse_cli() is used to hammer the show command output into structured data. And very useful it is too. If you haven’t used it yet, be sure to give it a try.
The Cisco CLI command syntax isn’t always consistent. Unfortunately for vlan groups, there are no ‘add’ or ‘remove’ commands like e.g. for 802.1q trunk VLAN lists. This means we have to:
- first remove all VLANs from a possible previously defined vlan group. As not all Cisco switches accept a 1-4095 range, we need to try two versions of the ‘no’ command.
  
  Mind you, as a consequence of classic Cisco IOS not supporting transactions, there is a possible small race condition here: if an interface is to be configured with the VLAN ID from a vlan group right at the same time as we’re in the middle of removing and re-adding a (previously present) vlan group, the vlan assignment for that interface might fail. A possible workaround is to first check which switches are missing the vlan group in question (use --check with this very same playbook), then only apply the changes to these switches.
- then build a list of individual VLAN ID vlan group commands as it seems that using the Cisco number range syntax in Ansible confuses the Cisco IOS versions I applied the playbooks on.
  
  I’ve not yet discovered exactly why, but it seems a comma in the range causes the configuration command sent to fail.
Speaking of the number range syntax (e.g. 1,2,5-8), to validate the show vlan group output, we need to convert the list of IDs into a number range. I searched the interwebs for a suitable filter, didn’t find anything, wrote my own Ansible filter, then for something unrelated browsed through the sources of Ansible network engine and discovered they include a vlan range filter called vlan_compress. Turns out I tried out various queries, but apparently neither I nor Google thought of the term compress. Although the network engine GitHub page states they like you to use the engine as a foundation for building your own role, and you might have done, it’s not a problem to specify the core network engine as a role to be able to use vlan_compress().
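The two chores from the points above, building one command per VLAN ID and turning a list of IDs back into a range string for validation, can be sketched in Python as follows. This is neither the playbook’s code nor the vlan_compress implementation. Both function names are hypothetical, and the second removal range (1-4094) is an assumption on my part about what the alternative ‘no’ command looks like.

```python
def build_vlan_group_commands(group_name, vlan_ids):
    """Return CLI lines that reset a vlan group and re-add one ID per line."""
    commands = [
        # Assumed removal variants: which range a platform accepts is a
        # guess here, hence two attempts. One of them may be rejected.
        f"no vlan group {group_name} vlan-list 1-4095",
        f"no vlan group {group_name} vlan-list 1-4094",
    ]
    # One command per ID avoids sending a comma-separated range.
    commands += [
        f"vlan group {group_name} vlan-list {vlan_id}"
        for vlan_id in sorted(set(vlan_ids))
    ]
    return commands


def compress_vlan_ids(vlan_ids):
    """Collapse VLAN IDs into a range string such as '1,3,5-8'."""
    ranges = []
    start = prev = None
    for vlan_id in sorted(set(vlan_ids)):
        if start is None:
            start = prev = vlan_id
        elif vlan_id == prev + 1:
            prev = vlan_id  # still inside the current run
        else:
            ranges.append((start, prev))
            start = prev = vlan_id
    if start is not None:
        ranges.append((start, prev))
    return ",".join(
        str(low) if low == high else f"{low}-{high}" for low, high in ranges
    )


if __name__ == "__main__":
    ids = [705, 5, 6, 7, 8, 1, 3]
    for command in build_vlan_group_commands("DATA", ids):
        print(command)
    print(compress_vlan_ids(ids))
```

The sketch only builds strings; how a rejected removal variant is tolerated is up to whatever sends the commands. It also writes two neighbouring IDs as a range (1-2), which may or may not be how a given platform prints them, so compare against real ‘show vlan group’ output before relying on it.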

And finally, how to use the playbook

Compliance check or dry-run, discover which switches need changes
The vlan group named vlan_group_name should contain the IDs of the VLANs that are actually configured on interfaces and whose name matches the regular expression search_vlan_pattern:
case insensitive names optionally starting with one or two letters, followed by one or more digits, followed by zero or more word characters (letters, digits or underscores), a dash, the string ‘voip’, again a dash and one or more digits.
(e.g. b22a-voip-555 or 22-VoIP-555 or b22-VOIP-555)
```
  ansible-playbook vlan_group.yaml --check -v \
    -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
    -e "vlan_group_name='DATA'"
```
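Before letting a pattern loose on hundreds of switches, it’s worth trying it against a few names first. Here is a small Python check of the pattern used above, assuming the playbook ends up handing it to Python’s re module. Apart from the three examples mentioned earlier, the candidate names are made up.

```python
import re

SEARCH_VLAN_PATTERN = r"(?i)^([a-z]{1,2})?\d+\w*-voip-\d+"

# The first three names come from the text, the rest are invented.
CANDIDATES = [
    "b22a-voip-555",
    "22-VoIP-555",
    "b22-VOIP-555",
    "22_x9-voip-1",   # \w also accepts digits and underscores
    "voip-555",       # no digits before the first dash
    "abc22-voip-1",   # three leading letters
    "b22a-data-555",  # wrong middle part
]


def matching_names(pattern, names):
    """Return the names the pattern matches, in their original order."""
    return [name for name in names if re.search(pattern, name)]


if __name__ == "__main__":
    for name in matching_names(SEARCH_VLAN_PATTERN, CANDIDATES):
        print(name)
```

Only the first four candidates match. The fourth one is a reminder that \w* is broader than just letters.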
After possibly having filtered out only the switches that need changes, and having added these to the inventory as a new group, apply changes to switches in the inventory group change_these :
```
  ansible-playbook vlan_group.yaml -e "which_hosts='change_these'" \
    -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
    -e "vlan_group_name='DATA'"
```
If the list of switches to fix is small, it’s fine to use --limit of course.
If you don’t want to match a VLAN name, but want to configure a vlan group with a fixed VLAN ID everywhere, and if you don’t care if switches might not (yet) have ports in said VLAN :
```
  ansible-playbook vlan_group.yaml \
    -e "force_vlan_id=705" \
    -e "vlan_group_name='VG_705'" \
    -e "only_vlans_with_configured_ports=false"
```

Bonus round

As an added bonus here’s a bash script to clean up the Ansible playbook output. The script assumes you’re using the yaml stdout callback. Either specify this in your ansible.cfg :

[defaults]
stdout_callback = yaml
bin_ansible_callbacks = True

or use the environment variables:

export ANSIBLE_STDOUT_CALLBACK=yaml
export ANSIBLE_BIN_ANSIBLE_CALLBACKS=1

If you | tee vlan_group_check.log your ansible-playbook output or use ANSIBLE_LOG_PATH, you can use this script to show the relevant portions of the output:

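#!/bin/bash
# Summarise ansible-playbook check logs written with the yaml stdout callback.

for log in *vlan_group_*check*.log; do
  # Print the log file name, underlined with '=' characters.
  echo "$log"
  printf "=%.0s" $(seq 1 ${#log})
  echo
  # Hosts that failed the 'Check if VLANs found' task.
  awk '/TASK \[Check if VLANs found\]/,/^$/' "$log" \
   | grep -F -A 2 fatal | grep -E -o '\[.*\]| No ports in vlans.*' \
   | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'

  # Config commands to be executed; keep only records longer than four lines.
  awk '/TASK \[Show config commands to be executed\]/,/^$/' "$log" \
    | grep -E -v '^[[:space:]]*$' \
    | awk 'BEGIN { RS="ok:"; FS="\n" } { if (NF > 4) print $0 }'

  # Hosts that failed compliance check #1 (missing vlan group).
  awk '/TASK \[.*compliance check #1\]/,/^$/' "$log" \
   | grep -F -A 2 fatal | grep -E -o '\[.*\]| Missing vlan group.*' \
   | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'

  # Hosts that failed compliance check #2, with the list items reported.
  awk '/TASK \[.*compliance check #2\]/,/^$/' "$log" \
   | grep -F -A 4 fatal | grep -E -o '\[.*\]|- .*' \
   | sed ':a; N; $!b a; s/\n-\s\{1,\}/ /g'

  echo
done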
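The workhorse in that script is the awk range pattern, which prints everything from a TASK header up to and including the next empty line. If that idiom is new to you, here is roughly the same thing in Python. The function name is hypothetical and the sample log is invented; it only resembles yaml callback output.

```python
import re

# Invented log excerpt that resembles ansible-playbook yaml callback output.
SAMPLE_LOG = """\
TASK [Check if VLANs found] ****************************************************
fatal: [sw1]: FAILED! => changed=false
  msg: No ports in vlans matching pattern
ok: [sw2]

TASK [Show config commands to be executed] *************************************
ok: [sw2] =>
  msg: []

"""


def extract_task_section(lines, task_regex):
    """Mimic awk '/TASK \\[name\\]/,/^$/': header through next empty line."""
    header = re.compile(r"TASK \[" + task_regex + r"\]")
    in_section = False
    section = []
    for line in lines:
        if not in_section and header.search(line):
            in_section = True
        if in_section:
            section.append(line)
            if line.rstrip("\n") == "":
                in_section = False  # the empty line closes the range
    return section


if __name__ == "__main__":
    for line in extract_task_section(SAMPLE_LOG.splitlines(), r"Check if VLANs found"):
        print(line)
```

Like the awk version, this relies on tasks being separated by an empty line in the log.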

Finishing note

Although not quite an issue with this playbook, parse_cli() might miss some information because essentially we’re screen scraping pretty printed command output. The default line width on most switches is 80. Ansible tries to widen this on switches that support it, but parse_cli() will still miss items if the line length exceeds 512 characters, which isn’t all that uncommon if you have e.g. more than 60 interfaces in one VLAN. I’ve submitted a fix which the Ansible team merged. If your Ansible version doesn’t include it yet, you might want to check this out.

Git repo

thefriendlynet/ansible_automation/8021x
