Separating Ansible roles for fun and profit


At ClueTrust, we use a lot of automation to run our systems. It's mostly how just a couple of us can manage hundreds of virtual servers and keep them up-to-date and operational.

A few years back, I moved from using Puppet to Ansible, mostly at the suggestion of RS, who was finding Ansible a solid choice for both network and server automation.

At a high level, my problems with Puppet were that my code tended to rot pretty quickly due to changing dependencies, and that the Ruby-based DSLs were overly complex and hard to debug a few months after I'd last used them.

Ansible is based on Python (a language I'm very comfortable with), and although there's a bit less elegance in an ordered series of steps in a configuration file (a playbook, in Ansible terms), there's a whole lot less difficulty in figuring out what the system is doing.

After years of working with Puppet, I wanted to move to Ansible deliberately, in a way that would maximize ease of reuse in similar situations. This led me to the following guiding principles:

  1. The scope and general nature of Ansible code should get more specialized and opinionated as it moves from commands to roles to plays to playbooks to environments.

  2. At every level, use generalized code from the previous level when reasonable.

  3. As you move to more specialized code, the code becomes less useful to other groups. By the time you reach a playbook, the code should mostly be tying together reusable lower-level modules (especially roles and commands) and applying inventory data to them.

  4. Wherever possible, parameterize items that are easy to parameterize. Especially when writing roles, parameters that can be defaulted or automatically set within the role make it much easier to expand the role's use across operating systems. In our case, although most of our roles are aimed at SmartOS, a growing number can also be applied to Linux environments when necessary.

  5. Discrete units of code should be treated as separate and therefore stored in their own repositories. For example, since roles are expected to be highly reusable, each is stored in its own individual git repository and included using Galaxy-style requirements.yml files. (Note: ansible-galaxy can point at arbitrary git and hg repositories as well as the marketplace. This includes private git repos, which is exactly what we do. The format is - src: git+user@server.fqdn/repo; see the sketch after this list.) Similarly, separating environments containing playbooks into their own repos provides advantages as well.
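
For example, a requirements.yml pulling roles from private git repositories might look something like this (a minimal sketch; the repository URLs, role names, and version tags are hypothetical):

    # requirements.yml -- each role lives in its own repo and is pinned independently
    - src: git+ssh://git@git.example.com/ansible-roles/nginx.git
      name: nginx
      version: "1.4.0"
    - src: git+ssh://git@git.example.com/ansible-roles/base.git
      name: base
      version: "2.0.1"

Roles listed this way are installed with ansible-galaxy install -r requirements.yml, and because each role is versioned on its own, one environment can pin an older tag while another moves ahead.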

Why so many repos?

Looking at number 5 above might make you think that we have a lot of repositories for our Ansible roles, and you'd be right. Not only does this promote reuse across the organization, but it also provides an easy point at which to make a role public if appropriate. Further, you may eventually need to make a breaking change in a role. If you do this in a separate repo, you can simply pin the specific version in the requirements.yml file for playbooks that don't need, or can't yet use, the newer version, and you can do so on a role-by-role basis. In contrast, a single omnibus repository for all roles makes this difficult if you need different versions of two roles in a single playbook.

Ansible environments

Similarly, when we build playbooks, we tend to put them in what I call environments. An Ansible environment is a location that contains playbooks, inventories, and requirements. Because they contain inventories, they also frequently contain customization data related to similar hosts. Although environments can map to administrative domains, they're commonly functional groupings as well. And because roles are so reusable, environments can be split in whatever way makes sense. At a minimum, an environment contains:

  • at least one inventory
  • at least one playbook
  • a requirements.yml file
  • various vars directories (group_vars and host_vars)
  • various files directories (files and templates are well known, but we also use host_files for common host-specific files)
  • a README.md file
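
Put together, a typical environment repository might be laid out something like this (a sketch; the playbook name and host name are hypothetical):

    environment/
    ├── README.md
    ├── requirements.yml
    ├── prod                 # inventories (see Stage vs Prod below)
    ├── stage
    ├── site.yml             # one or more playbooks
    ├── group_vars/
    │   ├── all.yml
    │   ├── prod.yml
    │   └── stage.yml
    ├── host_vars/
    ├── files/
    ├── templates/
    └── host_files/
        └── web01.stage.example.com/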

It's not uncommon for us to begin with an environment that contains one or more simple playbooks for related systems, and then evolve those playbooks over time into roles; this is effectively the next obvious step in refactoring Ansible for us. If, inside an environment, a series of steps becomes common enough to appear across many playbooks, it's likely moved to an included play. From there, if it's usable across other environments, it's turned into a role and gets its own repository. In this way, we can control the complexity of the playbooks and environments and standardize our roles to minimize configuration drift between similar systems.
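
In practice, the intermediate step of an included play usually just means pulling the shared steps into their own file and importing it from each playbook that needs it (a minimal sketch; the file and group names are hypothetical):

    # site.yml
    - import_playbook: common-baseline.yml   # shared play factored out of several playbooks

    - hosts: webservers
      roles:
        - nginx

Once that shared play proves useful outside the environment, it graduates to a role in its own repository, as described above.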

Stage vs Prod

Everyone has their own way of doing inventories, but since I'm discussing our environments here, I figured I'd also touch on how I do inventories to manage stage vs production. For the most part, I tend to work with 1:1 stage and production systems. This way, whenever we need to validate a specific system, we have a way to do so without having to cobble something together by hand. Not that I necessarily test every individual host every time, but by keeping the mechanism standardized, it's easy to do so when necessary.

Generally speaking, I have two inventories in each environment, prod and stage. By specifying these on the command line explicitly (-i prod or -i stage), it is always clear whether or not you're about to affect crucial systems. At the base of each inventory, I include every host in a group named for the inventory. As such, we can use stage.yml and prod.yml in the group_vars directory to specify items that are specific to the two environments.
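
For instance, the stage inventory might put every host under a top-level stage group (a sketch in Ansible's INI inventory format; the hostnames are hypothetical):

    # stage -- every host here ultimately belongs to the "stage" group
    [webservers]
    web01.stage.example.com

    [dbservers]
    db01.stage.example.com

    [stage:children]
    webservers
    dbservers

Running ansible-playbook -i stage site.yml then makes it unambiguous which systems you're touching; the prod inventory is identical in shape, with every host under a prod group instead.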

Since all.yml in group_vars sits at the bottom of the variable precedence order, it's easy to do things like temporarily change the location or version of a binary in stage by putting the stage value in stage.yml. However, it's generally safer to define all of the vars that might be shared in all.yml. The obvious exceptions are items that, if shared, would cause problems, such as database server addresses. In those cases, put the values explicitly in stage.yml and prod.yml only, so that they're undefined if left out.
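
As a sketch of that split (the variable names are hypothetical):

    # group_vars/all.yml -- shared defaults
    app_version: "2.4.1"

    # group_vars/stage.yml -- stage-only overrides and stage-only values
    app_version: "2.5.0-rc1"             # temporarily try a newer build in stage
    db_server: db01.stage.example.com    # deliberately absent from all.yml

    # group_vars/prod.yml
    db_server: db01.prod.example.com

If db_server were accidentally left out of one of the inventory-specific files, a play that uses it would fail on an undefined variable rather than quietly pointing at the wrong database.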

Host-specific files

For the most part, it makes sense that an environment's key configuration files would be in files or templates, and that items like nginx configurations for specific hosts or classes of system would be called out at the top level and configured using variables.

However, there are cases where we either use roles to auto-generate files, or have files in a particular environment that are always different for each machine and would be too cumbersome to put in the configuration. Especially in the first case, we need to be able to check quickly whether a file exists, and to create or modify the data as necessary, without affecting any other configuration parameters. For this, we use our host-specific files directory, host_files. It's accessed through the host_specific_files variable, defined in all.yml as follows:

host_specific_files: "{{ inventory_dir }}/host_files/{{ inventory_hostname }}"

Thus, for every inventory host, it points at a particular directory in the environment (relative to the inventory). Files placed inside this directory (or its sub-directories) are created the first time a host is provisioned and saved when the environment is committed to the repository. Obviously, these should only be public files unless they're encrypted using ansible-vault. However, for some persistent private keys for machines or services, we will frequently encrypt them and store them in the repo, with the encryption keys kept locally on the user's machine.
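
A role might use this directory to check for and restore previously generated material without touching anything else (a sketch; the task names and file names are hypothetical, with ssh host keys as the example):

    # inside a role -- reuse a saved key if the environment already has one
    - name: Check for a previously saved host key
      delegate_to: localhost
      ansible.builtin.stat:
        path: "{{ host_specific_files }}/ssh_host_ed25519_key.pub"
      register: saved_host_key

    - name: Restore the saved host key
      ansible.builtin.copy:
        src: "{{ host_specific_files }}/ssh_host_ed25519_key.pub"
        dest: /etc/ssh/ssh_host_ed25519_key.pub
      when: saved_host_key.stat.exists

The companion step, saving newly generated files back into host_specific_files on first provisioning (for example with ansible.builtin.fetch), follows the same pattern.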

Note: this isn't for high-security items; obviously, those should be much more limited in access. However, this mechanism does provide a good way to limit re-provisioning of certain keys (ssh host keys, for example) that might otherwise cause people to become complacent about seeing them change frequently.