Deploying with GitLab


In June, I mentioned in an article about Docker on SmartOS that we are doing some work with GitLab these days as a replacement for my venerable Gitolite server (and, to an increasing extent, Jenkins).

Deploying from Pelican

I'm likely going to write more on GitLab in the near future, but for now, I'd like to document some things I've learned about deploying with GitLab.

This blog is deployed in a semi-automated fashion. As mentioned previously, it is compiled using Pelican and served as static pages using nginx.

As such, once modifications are made, I'm ready to verify that they look OK and work correctly on the stage server; once I'm happy with that deployment, it's time to push to production.

Historically, I started out by doing a complete rebuild of the server serving up the pages. That got tedious if I was writing a lot of posts (or, at least, if I was writing posts more frequently than OS and nginx releases came out). Eventually, I modified my ansible scripts so that they had a publish tag which would skip the re-provisioning process and the process of building new certs, etc., and just deploy the latest Pelican, build the pages, and reset the cache. In fact, it would do so in a separate directory, so that it could flash-cut the web pages.
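
To give a flavor of the flash-cut, here's a minimal sketch of what the publish-tagged tasks might look like (the task names, paths, and build_id variable are hypothetical, not my actual playbook); nginx serves the symlinked directory, so swapping the link is what makes the cut atomic:

# Hypothetical sketch of publish-tagged tasks: build into a fresh
# directory, then swap a symlink so the web root flips atomically.
- name: Build the site into a versioned directory
  command: pelican content -o /var/www/blog-{{ build_id }} -s publishconf.py
  tags: [publish]

- name: Flash-cut the web root to the new build
  file:
    src: "/var/www/blog-{{ build_id }}"
    dest: /var/www/blog-current
    state: link
    force: true
  tags: [publish]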

While rolling out GitLab, I started playing with the CI tools and realized there was a lot I could do with it, much of it more easily than I could with Jenkins. As such, an automatic build to stage followed by a manually-triggered build to production was simple to configure.

So, I set out on my next automation journey with GitLab...

Access control

One nice thing about running the CI under the rubric of the SCM is that you can grant permissions to do source-related things just from the SCM. This makes it simple to pull from multiple repositories and perform other SCM-specific tasks.

However, this doesn't extend beyond the CI and SCM and into the deployment itself. So, my next question was how to control access to the hosts so that I could manage them and retrieve the code without trouble.

Further, I wanted to re-use the ansible playbooks that I used to deploy the systems (albeit with tags to reduce the plays), while limiting access to the stage and production servers (not the SmartOS global zones they're deployed from). Since I was reusing these mechanisms, I wanted to leave the existing ssh-based access controls in place.
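
As a rough sketch of what that scoping looks like (the hostname and group name here are invented for illustration), the stage inventory lists only the zones that serve the pages, never the global zones they run on:

# Hypothetical stage inventory -- only the zone serving the pages
[web_pages]
stage-www.example.net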

As an aside, I could now switch my deployment method for git repositories to using deployment or personal access tokens, but I'd rather not right now.

SSH solution

My existing deployment pattern automatically deals with what I refer to as ssh_access_keys, which are SSH keys that are used for root access to the servers. These are generally used infrequently (there are separate deployment keys that are multi-server), but when accessing only the VM, the ssh_access_keys are precisely the right tool.
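
For context, distributing those keys is an ordinary Ansible task; here's a sketch using the ansible.posix.authorized_key module (the list structure of ssh_access_keys is an assumption for illustration):

# Hypothetical task: install the access keys for root on the target VM
- name: Install ssh_access_keys for root
  ansible.posix.authorized_key:
    user: root
    key: "{{ item }}"
    state: present
  loop: "{{ ssh_access_keys }}"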

When running on the CI server, I need to load the ssh key as part of the CI process, and I use ssh-agent to do that (one agent per running CI process, segregated by the socket/pid combination). It's simple to start the agent by using:

eval $(ssh-agent -s)

This creates the agent and sets the shell variables so the agent is reachable.
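
The output that eval consumes is just a couple of export statements, along these lines:

SSH_AUTH_SOCK=/tmp/ssh-XXXXXXXXXX/agent.12345; export SSH_AUTH_SOCK;
SSH_AGENT_PID=12346; export SSH_AGENT_PID;
echo Agent pid 12346;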

Then comes the real trick: loading the ssh key into the agent. I had a vague recollection that it was possible to load a key from a shell variable, and here's how to do it:

echo "$HOST_DEPLOY_KEY" | tr -d '\r' | ssh-add -

Ansible ssh control paths

While getting this put together, I ran across an issue with the length of the path for ANSIBLE_SSH_CONTROL_PATH, which is used by SSH to persist connections (in our configuration). Especially on Solaris (and derivatives, like SmartOS), there's a limit on the length of the control socket path, and the relatively deep nesting that GitLab runners use for their paths ran afoul of it. The solution was to define a bespoke path:

export ANSIBLE_SSH_CONTROL_PATH_DIR=/tmp/${CI_JOB_ID}-${CI_COMMIT_SHORT_SHA}/.ansible/cp

Note that this path is in /tmp, not in ~, and certainly not in the build directory; however, it does change for every job and repo.
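
For reference, the environment variable maps onto the [ssh_connection] section of ansible.cfg; I use the variable form because it lets the path vary per job, which a static config file can't do:

# Equivalent ansible.cfg setting (static path shown purely for reference)
[ssh_connection]
control_path_dir = /tmp/ansible-cp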

Final GitLab script

  script:
    - eval $(ssh-agent -s)
    - echo "$HOST_DEPLOY_KEY" | tr -d '\r' | ssh-add -
    - export HOME=$(pwd)
    - export ANSIBLE_SSH_CONTROL_PATH_DIR=/tmp/${CI_JOB_ID}-${CI_COMMIT_SHORT_SHA}/.ansible/cp
    - 'git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@your.git.server/".insteadOf git@your.git.server:'
    - git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@your.git.server/playbooks/ansible-web.git
    - cd ansible-web
    - ansible-galaxy install -r requirements.yml -f
    - ansible-playbook -i stage -t publish --vault-password-file $VAULT_SECRET -e cert_renew_days=0 pelican.yml

Putting it all together:

  • set up SSH (lines 1 & 2)
  • set HOME so that we're not stomping on another cache; this may not be necessary if you can guarantee that only one runner will be running at a time in each account (line 3)
  • set the ansible control path (line 4)
  • rewrite our git URLs globally (line 5)
  • check out our ansible playbook repository (line 6)
  • change into the playbook repository and install the ansible galaxy requirements (lines 7 & 8)
  • run the playbook against our stage server (line 9)

You might be wondering about line 5, where we use an interesting feature of git to rewrite the URLs. This might not be strictly necessary if I were to give the ssh key that I use for deployment access to all of the dependencies in my git repositories. However, I've left it that way for future compatibility and because it confines this particular script to being run by the CI server.
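
To make the rewrite concrete: with that insteadOf mapping in place, any SSH-style URL for that host (including ones referenced by submodules or requirements files) is transparently fetched over HTTPS with the job token. For example (the repository name below is just an illustration):

# With the insteadOf rule set, this...
git clone git@your.git.server:playbooks/ansible-common.git
# ...is actually fetched as:
#   https://gitlab-ci-token:${CI_JOB_TOKEN}@your.git.server/playbooks/ansible-common.git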

So, for those keeping score: the GitLab server runs the GitLab script on a SmartOS host running the GitLab runner, and thus Ansible runs on SmartOS. Theoretically, Ansible could run on basically anything (my Jenkins versions of this ran on macOS Jenkins nodes), but our provisioning is done from SmartOS these days, so keeping things the same is a good thing.

Manually-triggered releases

I mentioned at the beginning that I was going to be manually triggering the release to production. This is done using a rule in the GitLab CI configuration:

deploy-prod:
  tags: [ansible]
  stage: prod
  environment:
    name: production
    url: https://${SERVER}
  script:
    - "... see above ..."
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual

This job definition requires that the runner be tagged ansible, names the stage prod, sets up a production environment with the URL pointing at our final server, includes the script above, and then, only on the default branch, holds for a manual release.
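
For completeness, the automatic stage deployment is essentially the same job without the manual gate; a sketch (the stage and environment names here are assumptions, mirroring the production job):

deploy-stage:
  tags: [ansible]
  stage: stage
  environment:
    name: staging
  script:
    - "... see above ..."
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH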

Script locations

One additional note I'll make is that I made some potentially interesting decisions on where to place the GitLab scripts. Since I tend to have multiple hosts (or groups) using the same ansible plays, I knew I wanted a place to share the scripts calling them, and their requirements tend to be more aligned with the ansible playbooks than with the code that is deployed. As such, I placed the GitLab CI jobs as templates in my ansible playbook repositories, in a gitlab-deploy directory, with names aligned to the playbooks.
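
Concretely, the layout in the playbook repository looks something like this (only the files mentioned above are shown):

ansible-web/
  pelican.yml              # the playbook itself
  requirements.yml         # galaxy requirements
  gitlab-deploy/
    pelican.yml            # the CI job template that runs the playbook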

To call these, I use the include directive in the .gitlab-ci.yml files for the repositories I'm deploying:

# This will work, but not on the python runner (yet)
include:
  - project: 'playbooks/ansible-web'
    file: 'gitlab-deploy/pelican.yml'

variables:
  SERVER_GROUP: gaiges_pages
  SERVER: www.gaige.net

Note the additional variables. Since this deployment script is used by both the Gaige's Pages and Cartographica blogs, I needed a way to pass in the server and group names.
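
Inside the shared template, those variables are what parameterize the otherwise-generic jobs. Roughly (these excerpts are an illustration of the idea, not the literal template):

# Hypothetical excerpts from the shared template showing where the variables land
environment:
  url: https://${SERVER}
script:
  - ansible-playbook -i stage -l "${SERVER_GROUP}" -t publish pelican.yml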