We’d like to be able to automate our network deployment and management from a single source of truth, but before we get there from a running (enterprise, campus!) network, we’ll have to take some small steps first.

The scope of these posts is not 802.1x per se, but it’ll serve as a use case in which we’ll show you how automation can save time and bring some consistency and uniformity to the network (device) configuration. This might not be the sexiest side of automation, but it gets the job done and prepares your environment for more automation coolness later. And let’s face it: if you need to reconfigure hundreds of switches and tens of thousands of interfaces, automation literally saves (the) day(s).

Implementing 802.1x, and talking to all parties involved, enables us to convert a switch-, location- and interface-specific configuration into a more generic configuration, where specific items get pushed to the switch from a central RADIUS server (e.g. Cisco ISE). That server then also becomes part of a kind of single source of truth, if you like.

We’ve uploaded the Ansible playbooks involved to GitLab. The 802.1x playbooks were used on Cisco IOS switches, but shouldn’t be too difficult to adapt for other vendors/models.

The playbooks follow a similar setup:

  • Gather information (from the source of truth or lacking that, the switch).
  • Apply changes.
  • Validate changes: does the current active state match the desired state (or intent if you like).

 

First off: configure vlan group names

Unless you’re lucky and every switch is already a small L2 domain with L3 boundaries, and the same VLAN numbers are re-used everywhere, you’re going to need named vlan groups. With these, the RADIUS server will be able to push a name to the switch instead of an ID. The switch in turn decides which VLAN ID(s) will be used based on the vlan group name it received.

Chances are these vlan groups are not yet configured on all your switches. Our first step will then be to automate configuring the required vlan groups everywhere.

Because most switches are not quite there yet with regard to being able to exchange well formed structured data, quite a large part of network automation still centers around properly building the right CLI commands and parsing command output pretty printed for humans. Using Ansible to create and verify vlan groups is no exception:

  • Gather information:
    pull a list of vlans from the switch by parsing the output of the ‘show vlan’ command (lacking a single source of truth, we depend on the configuration of every single switch) and filter out the IDs of the vlan names in which we’re interested.
  • Build configuration commands and apply changes:
    build vlan group configuration commands to apply the changes.
  • Validate:
    parse the output of the ‘show vlan group’ command to see if our changes have been applied the way we intended.
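
The gather and build steps above can also be sketched outside Ansible. The following Python sketch is not the playbook itself: the sample ‘show vlan’ output, the function names and the row regular expression are made up for illustration, and it assumes the Cisco IOS `vlan group <name> vlan-list <id>` configuration syntax.

```python
import re

# Made-up sample of 'show vlan' output, for illustration only.
SAMPLE_SHOW_VLAN = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/1, Gi1/0/2
555  b22a-voip-555                    active    Gi1/0/3, Gi1/0/4
556  b22b-voip-556                    active
700  printers                         active    Gi1/0/5
"""

# One table row: VLAN ID, name, status and an optional port list.
ROW = re.compile(r"^(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<status>\S+)[ \t]*(?P<ports>.*)$")


def parse_show_vlan(text):
    """Gather: turn the pretty printed table into a list of dicts."""
    vlans = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ports = [p.strip() for p in match.group("ports").split(",") if p.strip()]
            vlans.append({"id": int(match.group("id")),
                          "name": match.group("name"),
                          "ports": ports})
    return vlans


def build_vlan_group_commands(vlans, pattern, group, only_with_ports=True):
    """Build: one 'vlan group' command per VLAN ID whose name matches pattern."""
    ids = sorted(v["id"] for v in vlans
                 if re.search(pattern, v["name"])
                 and (v["ports"] or not only_with_ports))
    return ["vlan group {} vlan-list {}".format(group, vid) for vid in ids]
```

Calling `build_vlan_group_commands(parse_show_vlan(SAMPLE_SHOW_VLAN), r"(?i)-voip-\d+$", "DATA")` on the sample only yields a command for VLAN 555, because VLAN 556 has no ports in this made-up output.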

Some points of possible interest:

  • I tend to use a which_hosts variable with a sensible default in the hosts line. This enables me to specify a different Ansible inventory group from the commandline using e.g. -e "which_hosts='other_group'"

  • Likely -e arguments are defined and documented in a vars file with the same filename as the playbook.

  • parse_cli() is used to hammer the show command output into structured data. And very useful it is too. If you haven’t used it yet, be sure to give it a try.

  • The Cisco CLI command syntax isn’t always consistent. Unfortunately for vlan groups, there are no ‘add’ or ‘remove’ commands like e.g. for 802.1q trunk VLAN lists. This means we have to:
    • first remove all VLANs from a possible previously defined vlan group. As not all Cisco switches accept a 1-4095 range, we need to try two versions of the ‘no’ command.

      Mind you, as a consequence of classic Cisco IOS not supporting transactions, there is a possible small race condition here: if an interface is to be configured with the VLAN ID from a vlan group right at the same time as we’re in the middle of removing and re-adding a (previously present) vlan group, the vlan assignment for that interface might fail. A possible workaround is to first check which switches are missing the vlan group in question (use --check with this very same playbook), then only apply the changes to these switches.

    • then build a list of individual VLAN ID vlan group commands as it seems that using the Cisco number range syntax in Ansible confuses the Cisco IOS versions I applied the playbooks on.

      I’ve not yet discovered exactly why, but it seems a comma in the range causes the configuration command sent to fail.

  • Speaking of the number range syntax (e.g. 1,2,5-8): to validate the show vlan group output, we need to convert the list of IDs into a number range. I searched the interwebs for a suitable filter, didn’t find anything, wrote my own Ansible filter, and then, while browsing through the sources of Ansible network engine for something unrelated, discovered they include a vlan range filter called vlan_compress. I had tried various queries, but apparently neither Google nor I thought of the term compress. Although the network engine GitHub page states they’d like you to use the engine as a foundation for building your own role, and you might have done so, it’s not a problem to specify the core network engine as a role just to be able to use vlan_compress().
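
To make the list-to-range conversion concrete, here is a minimal Python sketch of the idea. It is neither the code of vlan_compress nor the filter mentioned above. The function name is hypothetical, and writing a run of two consecutive IDs as a-b is an assumption that may differ from what a given switch prints.

```python
def compress_vlan_ids(ids):
    """Turn a list of VLAN IDs into a range string such as '1,3,5-8'."""
    parts = []
    start = prev = None
    for vid in sorted(set(ids)):
        if start is None:
            # First ID opens the first run.
            start = prev = vid
        elif vid == prev + 1:
            # Consecutive ID: extend the current run.
            prev = vid
        else:
            # Gap found: close the current run and open a new one.
            parts.append(str(start) if start == prev else "{}-{}".format(start, prev))
            start = prev = vid
    if start is not None:
        parts.append(str(start) if start == prev else "{}-{}".format(start, prev))
    return ",".join(parts)
```

The resulting string can then be compared with the range found in the parsed show vlan group output.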

 

And finally, how to use the playbook

  • Compliance check or dry-run, discover which switches need changes
    vlan group named vlan_group_name should contain a list of IDs for VLANs actually configured on interfaces whose name matches regular expression search_vlan_pattern :
    case insensitive names optionally starting with one or two letters, followed by one or more digits, followed by zero or more word characters (letters, digits or underscores), a dash, the string ‘voip’, again a dash and one or more digits.
    (e.g. b22a-voip-555 or 22-VoIP-555 or b22-VOIP-555)
      ansible-playbook vlan_group.yaml --check -v \
        -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
        -e "vlan_group_name='DATA'"
    
  • After possibly having filtered out only the switches that need changes, and having added these to the inventory as a new group, apply changes to switches in the inventory group change_these :
      ansible-playbook vlan_group.yaml -e "which_hosts='change_these'" \
        -e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
        -e "vlan_group_name='DATA'"
    

    If the list of switches to fix is small, it’s fine to use --limit of course.

  • If you don’t want to match a VLAN name, but want to configure a vlan group with a fixed VLAN ID everywhere, and if you don’t care if switches might not (yet) have ports in said VLAN :
      ansible-playbook vlan_group.yaml \
        -e "force_vlan_id=705" \
        -e "vlan_group_name='VG_705'" \
        -e "only_vlans_with_configured_ports=false"
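
Before running the playbook against real switches, the search_vlan_pattern from the first two invocations above can be tried out on a few names. A quick Python check follows. The helper name and the non-matching names are made up, and it assumes Python's re module treats this particular pattern the same way Ansible's regular expression search does.

```python
import re

# The pattern passed as search_vlan_pattern in the examples above.
SEARCH_VLAN_PATTERN = r"(?i)^([a-z]{1,2})?\d+\w*-voip-\d+"


def matching_names(names, pattern=SEARCH_VLAN_PATTERN):
    """Return the VLAN names the pattern would select."""
    return [name for name in names if re.search(pattern, name)]


candidates = ["b22a-voip-555", "22-VoIP-555", "b22-VOIP-555",
              "printers", "abc22-voip-1"]
selected = matching_names(candidates)
```

Here `selected` holds the first three names. ‘printers’ lacks the voip part and ‘abc22-voip-1’ starts with three letters, so both are rejected.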
    

 

Bonus round

As an added bonus here’s a bash script to clean up the Ansible playbook output. The script assumes you’re using the yaml stdout callback. Either specify this in your ansible.cfg :

[defaults]
stdout_callback = yaml
bin_ansible_callbacks = True

or use the environment variables:

export ANSIBLE_STDOUT_CALLBACK=yaml
export ANSIBLE_BIN_ANSIBLE_CALLBACKS=1

If you | tee vlan_group_check.log your ansible-playbook output or use ANSIBLE_LOG_PATH, you can use this script to show the relevant portions of the output:

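#!/bin/bash
# Summarize ansible-playbook logs written with the yaml stdout callback.

for log in *vlan_group_*check*.log; do
  [ -e "$log" ] || continue   # the glob matched no files
  # Print the file name, underlined with '=' characters.
  echo "$log"
  printf "=%.0s" $(seq 1 ${#log})
  echo

  # Hosts that failed the "VLANs found" check, with the message on the same line.
  awk '/TASK \[Check if VLANs found\]/,/^$/' "$log" \
   | grep -F -A 2 fatal | grep -E -o '\[.*\]| No ports in vlans.*' \
   | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'

  # Config commands to be executed: keep only host records longer than four lines.
  awk '/TASK \[Show config commands to be executed\]/,/^$/' "$log" \
    | grep -E -v '^[[:space:]]*$' \
    | awk 'BEGIN { RS="ok:"; FS="\n" } { if (NF > 4) print $0 }'

  # Hosts that failed compliance check #1 (missing vlan group).
  awk '/TASK \[.*compliance check #1\]/,/^$/' "$log" \
   | grep -F -A 2 fatal | grep -E -o '\[.*\]| Missing vlan group.*' \
   | sed ':a; N; $!b a; s/\n\s\{1,\}/ /g'

  # Hosts that failed compliance check #2, list items joined onto one line.
  awk '/TASK \[.*compliance check #2\]/,/^$/' "$log" \
   | grep -F -A 4 fatal | grep -E -o '\[.*\]|- .*' \
   | sed ':a; N; $!b a; s/\n-\s\{1,\}/ /g'

  echo
done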

 

Finishing note

Although not quite an issue with this playbook, parse_cli() might miss some information because essentially we’re screen scraping pretty printed command output. The default line width on most switches is 80. Ansible’s fix, which tries to expand this on switches that support it, will still cause e.g. parse_cli() to miss items if the line length exceeds 512 characters, which isn’t all that uncommon if you have e.g. more than 60 interfaces in one vlan. I’ve submitted a fix which the Ansible team merged. If your Ansible version hasn’t merged this from upstream yet, you might want to check this out.

 

Git repo

thefriendlynet/ansible_automation/8021x

by Albert


In which we select items from a dict (instead of a list)

and will touch upon using loop versus when to keep stdout clean

Instead of using the vendor specific *_facts modules, I’d recommend looking into Ansible Napalm, on which more in a future post. Napalm returns a similar interfaces dict, i.e. this applies to Napalm as well.

nxos_facts, if asked nicely, returns the list of interfaces on a switch in a dict with items like this:

{
  "Ethernet1/1": {
      "bandwidth": "10000000",
      "description": "Server1",
      "duplex": "full",
      "macaddress": "1234.5678.9abc",
      "mode": "trunk",
      "mtu": "1500",
      "speed": "10 Gb/s",
      "state": "up",
      "type": "100/1000/10000 Ethernet"
  },

Ansible’s selectattr() and friends have been written with lists in mind, hence in order to search a dict we first convert it into a list by using dict2items :

- hosts: datacenter
  gather_facts: no
  serial: 1

  vars:
    re_switch_description: "regular expression match on this"
    newline: "\n "

  tasks:

    - name: "Get interface facts"
      nxos_facts:
        gather_subset: interfaces
      register: facts_nxos
      when: facts_nxos is not defined

    - name: "Find interfaces"
      set_fact:
        prompt_msg: "{{ prompt_msg|default([]) + [
            item.key ~ ' (' ~item.value.mode ~ '/' ~ item.value.state ~ ')'
            ~ ' : ' ~ item.value.description
          ] }}"
        l2_aggr: "{{ l2_aggr | default([]) + [{'name': item.key}] }}"
      loop: "{{ facts_nxos.ansible_facts.ansible_net_interfaces
        | dict2items
        | selectattr('value.description', 'defined')
        | rejectattr('value.description', 'none')
        | selectattr('value.description', 'search', re_switch_description)
        | list
       }}"

    - name: "Continue with these interfaces ?"
      pause:
        prompt: "{{ (prompt_msg + ['Enter to continue, ctrl-C to stop ']) | join(newline) }}"

    - name: "Now you can use aggregated list"
      debug:
        msg: "{{ l2_aggr }}"

Some notes on the playbook above:

  • prompt_msg: our dict has been converted into a key/value list (see dict2items in the loop). The interface name can now be found in item.key and the interface properties in item.value.<property>.

  • l2_aggr: some nxos_/ios_ modules are able to work on an aggregate of interfaces, which speeds up your playbook, so let’s build that list, shall we?

  • loop keeps the stdout output as clear as possible by selecting only the relevant interfaces. If possible, use loop instead of when, as the latter will clutter stdout with skip messages.

  • dict2items: we convert the dictionary into a list. This is what makes item.key and item.value available in the set_fact above.

  • selectattr('value.description', 'defined') and rejectattr('value.description', 'none'): luckily selectattr() supports multi-dimensional addressing by separating the levels with dots. As we only want interfaces with an actual description, we select defined and reject none.

  • Finally, we can use the regular expression search test on value.description.
For a complete list of all available tests in e.g. selectattr()/rejectattr() see: Jinja builtin tests and Ansible tests
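
For readers more at home in Python than in Jinja, the filter chain in the loop can be pictured roughly as follows. This is only a sketch of the idea, not Ansible's implementation. The interface data is made up, and dict2items here is a simplified stand-in for the real filter.

```python
import re

# Made-up interface facts, shaped like the nxos_facts dict shown earlier.
interfaces = {
    "Ethernet1/1": {"description": "Server1", "mode": "trunk", "state": "up"},
    "Ethernet1/2": {"description": "FREE", "mode": "access", "state": "down"},
    "Ethernet1/3": {"description": None, "mode": "access", "state": "down"},
    "Ethernet1/4": {"mode": "access", "state": "down"},
}


def dict2items(data):
    """Simplified stand-in: a dict becomes a list of key/value items."""
    return [{"key": key, "value": value} for key, value in data.items()]


def find_interfaces(data, pattern):
    items = dict2items(data)
    # selectattr('value.description', 'defined')
    items = [i for i in items if "description" in i["value"]]
    # rejectattr('value.description', 'none')
    items = [i for i in items if i["value"]["description"] is not None]
    # selectattr('value.description', 'search', pattern)
    return [i for i in items if re.search(pattern, i["value"]["description"])]


free_ports = [item["key"] for item in find_interfaces(interfaces, "FREE")]
```

In this sample only Ethernet1/2 survives all three steps, which is why the defined and none checks have to come before the regular expression search.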

And here’s how to, for instance, select all interfaces whose description contains FREE:

ansible-playbook find_ports_by.yaml -e "re_switch_description='FREE'"



Let’s say you don’t want to trawl through all the skipped tasks in your Ansible output. The skippy stdout callback comes to mind. However, if you’re already using another stdout callback you’re out of luck as the callbacks can’t be chained or stacked.

awk can help you out here. mawk and gawk both did the job when used on various Debian and Ubuntu Ansible hosts, although with mawk I had to wait until the playbook played out in its entirety :

ANSIBLE_FORCE_COLOR=1 ansible-playbook my_playbook.yaml \
  | tee my.log \
  | awk 'BEGIN { RS="\n\n"; ORS="\n\n"; } !/skipping|skip_reason/ { print $0; fflush(); }'

We’re basically telling awk to divide the input into sections separated by double newlines, and then to filter out the sections which contain either the text skipping or skip_reason.
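
The same section-based filtering can be written in a few lines of Python, which may help to see what the awk one-liner does. This is a sketch with a hypothetical function name and made-up log text. Unlike the awk version it works on a complete string instead of a stream.

```python
def drop_skipped(output):
    """Remove blank-line separated sections that mention skipped tasks."""
    sections = output.split("\n\n")
    kept = [section for section in sections
            if "skipping" not in section and "skip_reason" not in section]
    return "\n\n".join(kept)


# Made-up playbook output, for illustration only.
sample = (
    "TASK [one] ****\nok: [sw1]\n\n"
    "TASK [two] ****\nskipping: [sw1]\n\n"
    "TASK [three] ****\nchanged: [sw1]"
)
filtered = drop_skipped(sample)
```

After filtering, the section for TASK [two] is gone and the other two sections are still separated by an empty line.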

You’ll end up with a colorized unfiltered my.log file and stdout without the skipped tasks.

In case you’re wondering about the fflush(): -W interactive doesn’t work as it forces RS to a single line, and stdbuf -o0 doesn’t seem to work with (m)awk.

If you want to strip the coloring from the logfile: sed 's/\x1B\[[0-9;]*m//g' my.log
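
If the log is post-processed in Python anyway, the same regular expression works there too. A small sketch, with a hypothetical function name and a made-up sample line:

```python
import re

# Same pattern as in the sed command: ESC [ digits-and-semicolons m
ANSI_COLOR = re.compile(r"\x1B\[[0-9;]*m")


def strip_colors(text):
    """Remove ANSI color escape sequences from text."""
    return ANSI_COLOR.sub("", text)


colored = "\x1b[0;36mskipping: [sw1]\x1b[0m"
plain = strip_colors(colored)
```

Here `plain` holds the line without the color codes around it.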

I think the awk version is (way) easier to grok than a sed version, but I’ll let you decide :

ansible-playbook my_playbook.yaml -v \
 | tee my.log \
 | stdbuf -o0 sed -nr '/^TASK/{h;n;/^skipping:/{n;b};H;x};p' | sed 'N;/^\n$/D;P;D;'

And to add some color, we need to change the sed version into:

ANSIBLE_FORCE_COLOR=1 ansible-playbook my_playbook.yaml -v \
 | sed -nr "/^TASK/{h;n;/^\x1B\[0;36mskipping:/{n;b};H;x};p"



Ansible ad-hoc mode is your friend if you need to gather, well, ad-hoc information from your networking infrastructure, or servers for that matter.

Of course, if you have a (single) source of truth, you’d be better off searching there :-)

Let’s say you want to know where to find a device with a certain MAC address :

ansible -c network_cli \
  -m nxos_command -a "commands='show mac address-table address 0011.0022.0033'" \
  datacenter_switches \
  | egrep -B 3 'Eth'

If you’re not (yet) using a centralized dynamic inventory, this assumes you’re in the directory where your inventory file lives and what’s more it’s rather a lot to type innit ?

To make life a little less unpleasant, let’s add a Unix/Linux shell ‘shortcut’ by defining a couple of (bash) functions in either system wide /etc/profile.d or your ~/.bash_profile :

function net_cli() {
  local _inventory_directory="/home/ops/ansible"
  limit=${4:+--limit "$4"}
  pushd . &> /dev/null
  cd "$_inventory_directory" && \
  ansible "$3" -c network_cli -m "$1" -a "commands='$2'" $limit
  popd &> /dev/null
}
function nxos_cmd() { net_cli "nxos_command" "$1" "$2" "$3"; }
function ios_cmd() { net_cli "ios_command" "$1" "$2" "$3"; }
function datacenter() { nxos_cmd "$1" 'datacenter_switches'${2:+,"$2"} "${@:3}"; }

 

After sourcing this (or logging out/in), you can enter: datacenter 'show ver' , or include more host patterns: datacenter 'show ver' 'pattern1,pattern2' , or limit to certain host(s): datacenter ',' 'my_inventory_host'

If you’ve changed the Ansible stdout callback in ansible.cfg or your environment, you may need to add -v to show the stdout results. Either that, or change the call to Ansible in net_cli() into:

  cd "$_inventory_directory" && \
   ANSIBLE_STDOUT_CALLBACK='default' ANSIBLE_BIN_ANSIBLE_CALLBACKS='1' \
    ansible -i switches.yaml "$3" -c network_cli -m "$1" -a "commands='$2'" $limit



Want to quickly see the effects of various tests, comparisons, logic, conditionals, filters or other Jinja constructs ? Use Ansible’s ad-hoc mode:

function ta() { ansible localhost -m debug -a "msg=\"$1\""; }
ta "{{ [1,2,3] | zip(['a','b','c','d']) | list }}"

Or quickly experiment with the various tests and ‘hidden’ Python methods:

ta "{{ 'Hello World' == 'Hello World' }}"

ta "{{ ('Hello' ~ ' ' ~ 'World') is eq('Hello World') }}"

ta "{{ 'World' in 'Hello World' }}"

ta "{{ 'Hello World' is search('World') }}"

ta "{{ 'Hello World'.find('World') }}"

ta "{{ 'Hello World'.split() }}"

ta '{{ "\n".join(["hello", "world"]).capitalize() }}'

ta "{{ 'Hello World'.endswith('World') }}"
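
The ‘hidden’ methods in these examples are ordinary Python str methods, so their results can also be checked in a plain Python shell without Ansible. A small sketch follows. Python's built-in zip is used here as a stand-in for the zip filter and, as far as I know, stops at the shortest input just like the filter does.

```python
greeting = "Hello World"

position = greeting.find("World")      # index of the first match, -1 if absent
words = greeting.split()               # split on whitespace
joined = "\n".join(["hello", "world"]).capitalize()
ends_with_world = greeting.endswith("World")

# Pair up two lists; the surplus 'd' is dropped.
pairs = list(zip([1, 2, 3], ["a", "b", "c", "d"]))
```

Note that capitalize() only upper-cases the very first character of the joined string, so the second line stays lower case.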

Quick recap

  • tests are used to test values without changing them. There are simple Jinja tests like testing for equality == / is eq() and even tests like is version('18.04', '>='). Tests in turn can be used in certain filters, e.g. selectattr('defined')

  • Ansible filters and Jinja filters change values. Values are ‘piped’ through filters and multiple filters can be chained, like the example at the top of this post.

  • conditionals are expressions, which enable more dynamic task execution, e.g. conditionally running a task by using when or conditionally including roles and playbooks. Conditionals in turn use tests and optionally filters.
