Moving Selenium tests in-house


Ed. Note: I started this article nearly a year ago, but got stuck on the Kubernetes piece. Now that I've resolved that, I'm publishing it.

I've been a very happy SauceLabs user for many years, but I don't make heavy use of it, and recently I've been trying to figure out how to cut down on my dependence on external SaaS services (and, for that matter, some external paid-for software). As part of this move, I decided to look at how I could run my Selenium tests without any third-party services.

I've been running the Selenium tests locally as part of development testing for years, which has worked reasonably well (although be careful doing automated tests with Safari: it's not well suited to testing on a machine that's also in active use). Chrome, however, has been fine, and works great when I need to run a quick re-test before committing code to the repo and letting the CI run.

SauceLabs as a workhorse

However, for CI, I've had to put together some pretty complex setups. Generally, I don't like exposing test systems to the internet when it isn't necessary (at least prior to the staging phase). As such, I ran the Django test jig on one of my systems (frequently in Docker) and used SauceLabs' Sauce Connect Proxy to tunnel back to my Docker container and reach the server running on localhost. Despite the network delays involved in round-tripping to California for every test command and going through a bespoke VPN, the process worked well and, save occasional communication errors, was a reliable testing environment.

Now, SauceLabs has a lot to recommend it, including a really nice UI for teams (or, for that matter, individuals), but the pricing model has changed over the years. When I first signed up, I was paying $50/month for access, and that's remained the same because I'm on a legacy plan. I've been hesitant to move off of it because the legacy plans are no longer available and the remaining plans are a bit pricey for my use case, starting at $149/month. To be fair, the new model offers unlimited minutes for each parallel test; I just don't need unlimited minutes. I looked around and noticed there are other providers that offer per-minute pricing, and originally I considered those. However, the problem came back to having to expose my system to the internet, unless I wanted to stand up a proxy and authentication, which seemed unnecessarily annoying.

Enter the Selenium Docker container

I hadn't really looked at this before because I wasn't running much Docker, nor any Kubernetes. Generally, the only Docker I ran was locally on my Mac, as a useful way to fire up a Linux environment when necessary.

This all changed last summer when I moved to GitLab, which led me not only to run Docker containers directly on my SmartOS infrastructure, but also to open the door to Debian-based Docker environments for testing.

With Docker already running and GitLab already spinning up containers (including containers in Kubernetes these days), I had a chance to look at the Selenium Docker containers as a tool once again.

All told, it took a bit of experimentation, but I managed to get it functioning in the Docker environment with some small tweaks. In particular, I needed to add:

--docker-shm-size 2000000000

to my runner (allotting 2GB of shared memory to the job containers), and to set a feature flag during the run to keep networking happy (below). Once that was in place, things worked very well.
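
If you manage the runner by editing its config.toml rather than passing flags at registration time, the equivalent, as I read the runner docs, is the shm_size key in the [runners.docker] section:

  [runners.docker]
    # shared memory for job containers, in bytes (2GB)
    shm_size = 2000000000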

I also tried getting it running in my k8s runner environment, with less success, until I finally set out to finish this article (nearly a year later). In the intervening time, posts had appeared about the Chrome shared-memory exhaustion that was plaguing my runs.

I still need to keep tweaking this, but from this issue on docker-selenium I got the pointer I needed to a Stack Overflow answer, which recommended:

spec:
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
  containers:
  - image: gcr.io/project/image
    volumeMounts:
      - mountPath: /dev/shm
        name: dshm

This effectively translates into the following runner requirement in GitLab:

  [[runners.kubernetes.volumes.empty_dir]]
    name = "empty-dir"
    mount_path = "/dev/shm"
    medium = "Memory"

In the end, once I was past the shared-memory issue and had adjusted for pod networking (which requires localhost rather than the bridged network with local DNS), the tests ran. And, it turns out, they're a bit faster than the pure Docker environment tests.

Even so, the Kubernetes runs remained flaky enough that I won't use them in a production environment for now.

The CI/CD configuration is similar to the Docker configuration, with a few changes detailed in the next section.

Selenium-test configuration

selenium-test:
  tags: [docker,selenium]
  stage: test
  interruptible: true
  image: python:3.11
  variables:
    FF_NETWORK_PER_BUILD: 1
    GRID_URL: "http://selenium__standalone-chrome:4444"
  services:
    - name: selenium/standalone-chrome:4
#      alias: selenium
  script:
    # following is used for internal/sauceconnect use
    - apt-get update
    - apt-get install -y --no-install-recommends curl
    - curl -sSL https://install.python-poetry.org | python3 -
    - export PATH="$HOME/.local/bin:$PATH"
    - poetry install
    - poetry run coverage erase
    - mkdir -p output
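    # quick sanity check that the grid sidecar is up and answering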
    - curl $GRID_URL
    - poetry run coverage run --branch ./manage.py test selenium_tests
    - mv .coverage .coverage-selenium-${CI_JOB_NAME}
  artifacts:
    when: always
    paths:
      - .coverage-selenium-${CI_JOB_NAME}
      - output
    reports:
      junit: reports/junit.xml

I use the python:3.11 container to run the tests, along with a service container using the selenium/standalone-chrome image as a sidecar in the Docker environment to run the browser.

The FF_NETWORK_PER_BUILD setting provides a private network per build, which prevents interference between multiple builds that could, in theory, run on the same machine (in my case, that would only happen in Kubernetes).

The GRID_URL hostname is generated automatically (the runner derives a service alias from the image name, with slashes becoming double underscores), so the sidecar answers as selenium__standalone-chrome on port 4444.
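
For reference, the test-side wiring is small. Here's a minimal sketch of how a Django test case can pick up GRID_URL and drive the remote browser; the class name and fallback URL are illustrative, not lifted from my actual suite:

import os

from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

class SeleniumTestCase(StaticLiveServerTestCase):
    # The browser lives in the sidecar, so the live server must be
    # reachable from outside this container; bind on all interfaces.
    # Note: live_server_url embeds this host, so depending on the
    # executor you may need to point the browser at a different name.
    host = "0.0.0.0"

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # GRID_URL comes from the CI variables above; the fallback is
        # for running against a local grid on a development machine.
        cls.driver = webdriver.Remote(
            command_executor=os.environ.get("GRID_URL", "http://localhost:4444"),
            options=Options(),
        )

    @classmethod
    def tearDownClass(cls):
        cls.driver.quit()
        super().tearDownClass()

Subclasses then drive the browser with self.driver.get() against the live server, the same as in any LiveServerTestCase-based suite.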

Beyond the variables, the rest is bootstrapping and running the tests that require Selenium. Since I'd already been running these tests against SauceLabs, there wasn't much to modify beyond making sure they use only standard testing code (no need to send telemetry information about builds to SauceLabs nowadays).

In this case, I fetch Poetry from the standard location, install it, set the path, install my app with poetry install, and follow that with poetry run coverage erase to make sure there's no stale coverage data lying around.

Finally, I run the tests through coverage and rename the coverage results so that they don't overwrite the results from my unit-test step and can be combined later.

Once the run is complete (regardless of the results), the coverage, output, and testing reports are uploaded.

Kubernetes modifications

Since the Kubernetes environment is a bit different, it uses a slightly modified configuration, directly extending the Docker configuration above:

kube-test:
  extends: selenium-test
  tags: [docker,kubernetes]
  variables:
    FF_NETWORK_PER_BUILD: 1
    KUBERNETES_SERVICE_CPU_REQUEST: 2
    KUBERNETES_SERVICE_MEMORY_REQUEST: 4Gi
    GRID_URL: "http://localhost:4444"

The key configuration change here is GRID_URL: in the Kubernetes executor, service containers run in the same pod as the build container, so the grid is reached over localhost rather than via a service hostname on a per-build network.

Bonus round: combined coverage

Since I've now got my unit tests and my UI tests running in the same pipeline, I wanted to combine coverage, since GitLab's method of averaging the jobs' percentages doesn't really represent full test coverage. Averaging can mislead in both directions: if the two suites exercise mostly the same code, it gives credit twice for the same lines, and if each suite covers code the other misses, it understates the combined figure. For example, if unit tests cover 60% of lines and UI tests cover 50%, the average reports 55%, but the true combined coverage is anywhere from 60% (complete overlap) to as much as 100% (no overlap); only merging the underlying data tells you which. So what we really need to do is pull both sets of coverage information and merge them. I do this using the coverage combine command, which is designed to fold multiple coverage data files into a single report. That way, the actual coverage of the code is represented.

combine-coverage:
  tags: [docker]
  stage: report
  interruptible: true
  image: python:3.11
  needs:
    - selenium-test
    - django-test
  script:
    - python3 -m venv venv
    - source venv/bin/activate
    - pip3 install coverage
    - coverage combine .coverage-*
    - mkdir -p reports
    - coverage xml -o reports/coverage.xml
    - coverage html -d public
    - >
       grep '^<coverage' reports/coverage.xml
       | sed -n -e 's/.*line-rate="\([0-9.]*\)".*/\1/p'
       | awk '{printf "CodeCoverageOverall =%.2f\n", $1*100}'
       || true
  coverage: '/^CodeCoverageOverall =(\d+\.\d+)$/'
  artifacts:
    when: always
    paths:
      - public
    reports:
      coverage_report:
        coverage_format: cobertura
        path: reports/coverage.xml
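
Incidentally, the same merge can be reproduced locally on downloaded artifacts; a quick sketch (paths illustrative):

python3 -m venv venv && source venv/bin/activate
pip3 install coverage
coverage combine .coverage-*    # merges (and deletes) the data files
coverage report                 # print the merged summary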

Overall results

Reliability running in pure Docker has been about as good as I've seen with SauceLabs over the years, with much better control and substantially lower costs. If your environment allows it, I'd encourage making use of it.