Always check your arguments

Quite a while back, RS wrote a comprehensive ansible role for handling Let's Encrypt certificate issuance and renewal. We both use this role extensively, which is why it was a significant issue when it suddenly started throwing type errors deep inside the dnspython library during an nsupdate call in a critical part of the script.

A cursory examination of the component parts indicated that the most likely cause was a change to the dnspython library, which had recently been upgraded from 1.16 to 2.0. We couldn't find anything online indicating that other people had suffered this breakage (which should have been a clue), but 2.0 hadn't been out very long, the crash was in a module that appeared to be checking something related to IPv6, we use a lot of IPv6 on our systems while many people use none, and, well, we hadn't changed anything...

This was an annoyance, but relatively easy to avoid in one of the following ways:

  • Pin the dnspython libraries to <2.0 in pip
  • On the Mac, use brew's ansible and manually roll back the dnspython libraries in the installed version[1]

I used both, as we ran ansible on both SmartOS and macOS.
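The pip side of the workaround is a single requirement specifier, `dnspython<2.0`. Because brew-installed tools can each carry their own site-packages, it is easy to pin one environment and forget another. Below is a minimal sketch of a guard you could run with the same interpreter ansible uses, to confirm the pin actually took effect there. The function names are hypothetical; only `importlib.metadata` (standard library, Python 3.8+) is a real API.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer component of a version string such as '1.16.0'."""
    return int(version_string.split(".")[0])


def dnspython_pin_ok(version_string: str, below_major: int = 2) -> bool:
    """True if the given dnspython version satisfies a '<2.0'-style pin."""
    return major_version(version_string) < below_major


def check_installed_dnspython() -> str:
    """Look up dnspython in *this* interpreter's environment and fail loudly if the pin slipped."""
    try:
        installed = version("dnspython")
    except PackageNotFoundError:
        return "dnspython is not installed in this environment"
    if not dnspython_pin_ok(installed):
        raise RuntimeError(f"dnspython {installed} found; expected <2.0")
    return f"dnspython {installed} satisfies the <2.0 pin"
```

Run it with the interpreter inside the brew Cellar (or the SmartOS virtualenv) rather than whatever `python3` happens to be first on the PATH, since that is the environment that matters.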

Taking brew hackery up a notch

After maintaining this for a while, I needed to upgrade some modules in ansible and to keep my CI environment (running on Macs under Jenkins) in sync with what we were running on my desktop, laptop, and servers. That led me to create my own tap in homebrew by cloning the standard ansible formula into my own repository.

The addition of this tap meant that I could configure and test it once, then deploy it on all of my Macs (and anyone else with access to the tap on my private git server could do the same).

One thing leads to another

After a few more months of using this tap on my Macs (and slowly moving ahead the ansible version on the SmartOS machines, while keeping dnspython pinned), I needed to upgrade the version of ansible at home (due to a project that I'll likely write about later, using ansible to configure my Jenkins agents). The driver here was the need to execute homebrew commands on an M1 Mac, something that didn't work out of the box with ansible 2.9, which is what I was pinned to.
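Without claiming this was the specific problem with 2.9, one difference any tool that drives homebrew has to handle on an M1 Mac is that Homebrew's default prefix is `/opt/homebrew` on Apple Silicon rather than the `/usr/local` used on Intel Macs. Here is a minimal sketch of picking the expected `brew` path by architecture. The function names are hypothetical, and custom install prefixes are deliberately ignored.

```python
import platform
from pathlib import Path
from typing import Optional


def default_homebrew_prefix(machine: str) -> Path:
    """Return Homebrew's default install prefix for a CPU architecture.

    Apple Silicon ('arm64') defaults to /opt/homebrew; Intel ('x86_64') to /usr/local.
    Custom prefixes are not handled.
    """
    return Path("/opt/homebrew") if machine == "arm64" else Path("/usr/local")


def default_brew_executable(machine: Optional[str] = None) -> Path:
    """Return the expected path of the brew binary, defaulting to this interpreter's architecture."""
    # Caveat: an Intel build of Python running under Rosetta 2 reports 'x86_64'
    # here, even on an M1 Mac.
    arch = machine or platform.machine()
    return default_homebrew_prefix(arch) / "bin" / "brew"
```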

Ever-hopeful, I first decided to see if my aforementioned problem was "fixed" by unlinking[2] my private tap's version of ansible, and installing homebrew's version.

Sadly, running the ansible playbook just resulted in the familiar crash. I looked at it for a few minutes, decided the bug that was introduced in summer 2020 was still there and set about building a new tap for version 3.2.0 of ansible. This went smoothly, but after updating my formula, installing took a long time, on the order of a few minutes. Why was the standard homebrew install so much faster?

A bottle for monsieur?

A quick investigation revealed that most brew formulae are installed these days using bottles: pre-built archives of the entire subdirectory that ends up in the Cellar. That seemed like a significant win, especially since I was going to install this at least 5 times for each update, so I decided to figure out how to create custom bottles for my custom tap.

Thanks to a good article, Custom Tap and Bottles with Homebrew by Yehowshua Immanuel, I was quickly on my way; all it took was rebuilding from my tap formula once for each Mac platform I run (Intel Catalina, Intel Big Sur, and ARM Big Sur at this time).
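Each bottle is built for a specific platform tag, which is why one rebuild per platform was needed. Below is a minimal sketch of mapping a Mac to the tag its bottle would carry. Tags such as `catalina`, `big_sur`, and `arm64_big_sur` are Homebrew's own, but the lookup table and function names are hypothetical and cover only the three platforms listed above.

```python
from typing import Tuple

# Hypothetical lookup covering only Intel Catalina, Intel Big Sur and ARM Big Sur.
CODENAMES = {
    (10, 15): "catalina",
    (10, 16): "big_sur",  # Big Sur may report itself as 10.16 in compatibility mode
    (11,): "big_sur",
}


def macos_key(release: str) -> Tuple[int, ...]:
    """Reduce a release string like '10.15.7' or '11.2.3' to the part that names the OS."""
    parts = [int(p) for p in release.split(".")]
    # From Big Sur on, the major number alone identifies the release.
    return (parts[0],) if parts[0] >= 11 else tuple(parts[:2])


def bottle_tag(release: str, machine: str) -> str:
    """Return the bottle tag for a macOS release and CPU architecture.

    Raises KeyError for releases outside the table, on purpose.
    """
    codename = CODENAMES[macos_key(release)]
    # Apple Silicon bottles carry an 'arm64_' prefix; Intel bottles have none.
    return f"arm64_{codename}" if machine == "arm64" else codename
```

On a Mac, `bottle_tag(platform.mac_ver()[0], platform.machine())` gives the tag for the current machine; off macOS `mac_ver()` returns an empty release string, so the call fails rather than guessing.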

The final verdict

After all this work, and with a great workaround for the perceived bug in dnspython in place, I took another quick look at the bug that was popping up in our role. I'd contributed to random Python projects in the past, and to ansible directly, so I was familiar with the process and figured I could track the problem down. I fired up PyCharm to get a better perspective on the particular bug and settled in to build a minimal reproduction of the problem with the nsupdate module in ansible.

A few minutes (literally) into the investigation, I found myself looking at what seemed like completely reasonable arguments to the dns.query.tcp function, which were raising exceptions because it couldn't determine whether my hostname was an IPv4 or IPv6 address. I immediately checked the current docs for nsupdate in ansible and, indeed, the server argument is now documented as an IP address (v4 or v6). To check whether we'd just been lucky while ignoring this all along, I went back to the ansible 2.9 documentation and verified that it was silent on what the string argument should contain.

At some point between ansible 2.9 and 3.0, the documentation was updated to reflect the change caused by the underlying library, and I missed it.
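Checking the argument up front is cheap. Here is a minimal sketch, assuming you want to keep a hostname in your inventory but hand nsupdate's `server` parameter an IP address. The helper names are hypothetical, and the sketch uses only the standard library (`ipaddress`, `socket.getaddrinfo`), not anything inside ansible or dnspython.

```python
import ipaddress
import socket
from typing import List


def is_ip_literal(server: str) -> bool:
    """True if `server` is an IPv4 or IPv6 address literal rather than a hostname."""
    try:
        ipaddress.ip_address(server)
    except ValueError:
        return False
    return True


def server_addresses(server: str) -> List[str]:
    """Return IP literals for `server`, resolving it first if it is a hostname.

    IP literals pass through unchanged; hostnames go through the system
    resolver via socket.getaddrinfo (port 53, TCP, matching dns.query.tcp).
    """
    if is_ip_literal(server):
        return [server]
    infos = socket.getaddrinfo(server, 53, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses: List[str] = []
    for info in infos:
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Which address to pass on when a name has both A and AAAA records is a policy choice the sketch leaves to the caller; with plenty of IPv6 in play, preferring one family explicitly beats taking whatever the resolver lists first.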

A few take-aways:

  1. Once again, a reminder that checking your arguments against current documentation is often time well spent.
  2. Assuming that behavior contrary to your expectations is a bug, when nobody else is complaining about it, is often a recipe for a lot of work.
  3. Homebrew is a really well-thought-out package manager, and if you need to maintain your own tools, private taps and bottles may be well worth using; they're easy to create and super-easy to use.

Every once in a while, it's good to have your own assumptions challenged. I made a point of commenting on the ansible bug report about this that someone else had filed. Hopefully they'll find my information useful.

  1. This experience led me to a nifty thing about brew: many installations have every dependency installed in the Cellar directory for that specific package, including, for most Python tools, their own copy of site-packages. This makes it very easy to pin specific versions of dependencies and to run a number of Python tools with different libraries and even interpreters. ↩︎

  2. Everyone who uses Homebrew should be familiar with the link and unlink commands, which allow you to keep a version of a command installed while switching to another one. In my case, I was using a tap that had named versions (the best example of this I can think of is PostgreSQL, which has separate versions for current, 12, 11, 10, 9.6 and even some of the deprecated versions; use at your own peril). So I could brew unlink ansible@2.9.13 and brew install ansible to move my private copy out of the way and use the brew-standard version for testing. ↩︎