I’ve been trying some light contributions to OSM’s Chef repository. In the OpenAddresses project we learned early that reliable and responsive continuous testing and integration make it easier for contributors to approach our project, and I’m hoping to build similar tests for OSM Chef. We already do a basic syntax lint, but these new tests would run each complete cookbook on a clean disposable host and notify Github of the results:

kitchen test --parallel --destroy=always all && notify-passed || notify-failed

Contributors would see an additional green check-mark in their pull requests, and OSM admins would be able to accept contributions confident that they’ve been fully tested.
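As a rough sketch of what the notify-passed and notify-failed steps could look like, the helpers below post a commit status through Github's commit status API. The function names, the GITHUB_TOKEN, GITHUB_REPO, COMMIT_SHA and STATUS_CONTEXT variables, and the "test-kitchen" context label are all assumptions made for illustration, not anything that exists in OSM's Chef repository today. The if/else wrapper also sidesteps a quirk of the one-liner above, where the failure notice would fire as well if the success notice itself failed.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a Github commit status.
# $1 is the state (error, failure, pending or success), $2 a short description.
# The description is not escaped, so it must not contain double quotes.
status_payload() {
    printf '{"state":"%s","description":"%s","context":"%s"}\n' \
        "$1" "$2" "${STATUS_CONTEXT:-test-kitchen}"
}

# Hypothetical helper: post the status for a single commit.
# Assumes GITHUB_TOKEN, GITHUB_REPO (owner/name) and COMMIT_SHA are set.
notify() {
    status_payload "$1" "$2" | curl --silent --fail \
        --request POST \
        --header "Authorization: token ${GITHUB_TOKEN}" \
        --header "Accept: application/vnd.github+json" \
        --data @- \
        "https://api.github.com/repos/${GITHUB_REPO}/statuses/${COMMIT_SHA}"
}

# Run any command and report exactly one result for it.
run_and_notify() {
    if "$@"; then
        notify success "test-kitchen passed"
    else
        notify failure "test-kitchen failed"
    fi
}
```

With those variables exported, the test run would become: run_and_notify kitchen test --parallel --destroy=always all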

Why Mess With Chef?

I’ve been in a long conversation with Andy Allan about small ways to help with OSM’s operational infrastructure. He nudged me in the direction of OSM’s Chef configuration, which shouldn’t be a surprise: Chef is how OSM manages the configuration of all the servers run by the OpenStreetMap Foundation’s Operations Working Group. Contributions to Chef are specifically cited in Andy’s Getting Involved post and mentioned in policies for both the Operations Working Group (OWG) and the Sysadmins group.

Andy recommended that I pay special attention to the Wiki cookbook: it’s the system that has had the most outside interest from non-sysadmins over the last three years. For people who would like to change its configuration, a working cookbook would make it easier to test locally with test-kitchen and offer contributions that are known to work prior to deployment. Today, “we only find out if the changes actually work when we run them on the live servers.”

This PR is my initial pass at fixing some bugs in Chef, and there’s feedback there from Andy, Tom Hughes, and Grant Slater about OSM’s Chef expectations. My hope is to see this ultimately become functional, trusted, and automated enough that OWG repo admins are comfortable defaulting to “Yes” and accepting any change with community support that passes tests without technical debate. Chef is not a widely-understood technology and I think a lot of the casual DevOps world has moved on to container-based approaches, so it’s critical for OSM’s Chef to work in an automated and well-understood way to welcome new contributors.

Getting To Headless

Contributions to open projects are encouraged by providing a smooth entry path, and headless continuous integration is an excellent way to make this happen. Github provides strong support for automated status updates that we’ve used for the OpenAddresses project throughout the past four years. We use it to automate feedback to users and generate screenshots of their contributions so they know they’re doing the right thing and we can safely rubber-stamp their input:

Screenshot showing Github user experience in OpenAddresses

For OSM Chef, we could run all cookbooks under test-kitchen. I’ve tested the PR above in a few environments that might potentially be used as part of an automated flow: on my local Mac where a developer might check their work prior to publishing, under the current long-term supported Ubuntu 18.04 that might be part of an AWS EC2 setup, and under the previous Ubuntu 16.04 that might be part of a Github Actions setup.

Results So Far

Host OS | Test Driver | Result
Mac OS | Vagrant | 🔶 cookbooks/mediawiki/resources/site.rb line 528: Column 'cuc_user' cannot be null
Ubuntu 16.04 | Vagrant | 🚫 Timed out while waiting for the machine to boot
Ubuntu 18.04 | Vagrant | 🚫 Timed out while waiting for the machine to boot
Ubuntu 16.04 | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status
Ubuntu 18.04 | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status
Mac OS | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status

Vagrant under Mac OS gets furthest, then ultimately gets stuck on problems in the cookbook itself. This is potentially fixable in OSM/Chef by updating the Mediawiki and Wiki cookbooks.

There is no meaningful difference between Ubuntu 16.04 and 18.04. The older version needs Ruby 2.4+ to be installed explicitly, but both run test-kitchen with minimal fuss. Although OSM is deployed to Ubuntu 18.04, the host OS for these tests varies.

Here’s the approximate script I’m using to run the tests from the Ubuntu 18.04 host OS:

git clone osm-chef && cd osm-chef
sudo apt-get update -y && sudo apt-get upgrade -y
sudo apt-get install -y build-essential ruby ruby-dev vagrant virtualbox
sudo gem install test-kitchen kitchen-docker kitchen-vagrant
kitchen test --destroy=always wiki-ubuntu-1804

Vagrant on Ubuntu fails immediately. Googling for error strings does not turn up any obvious mistakes. The guest OS running under Virtualbox is unresponsive to input from Vagrant. I’m not experienced with either Vagrant or Virtualbox on Linux, so I’m unsure how to dig up problems here.

Docker on Mac OS and Ubuntu gets further and looks more promising, but bumps into a very different problem. Most of the cookbook runs happily until the MySQL cookbook tries to start the database and immediately fails to find /sbin/status, a script that no longer exists under systemd. The guest OS under Docker is identical to that under Vagrant, so I am confused and stumped by this error.
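One check that might help narrow this down is to ask each guest which init system it actually has running. The sketch below is only a diagnostic under stated assumptions: the detect_init name and its optional root-directory argument are hypothetical conveniences, it relies on /run/systemd/system existing only when systemd is the running init, and it treats the presence of /sbin/status as a sign that upstart's tools are installed rather than proof that upstart is running.

```shell
#!/bin/sh
# Hypothetical diagnostic: report which init system appears to be in use.
# $1 is an optional root directory to inspect (defaults to the real root).
detect_init() {
    root="${1:-}"
    if [ -d "${root}/run/systemd/system" ]; then
        # This directory is created by systemd when it is the running init.
        echo "systemd"
    elif [ -x "${root}/sbin/status" ]; then
        # /sbin/status ships with upstart; it may be installed but not running.
        echo "upstart-tools"
    else
        # Neither marker found: possibly no init at all in this guest.
        echo "unknown"
    fi
}
```

Running detect_init inside both the Vagrant and Docker guests, alongside cat /proc/1/comm to see what process 1 really is, would show whether the two environments differ in more than the base image.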

Next Steps

  • The OSM operations group prefers Vagrant to Docker for test configuration, but Vagrant fails inscrutably. Figure out why Vagrant guest OSes under Ubuntu are unreachable.
  • Docker guest OS gets quite a bit further but runs into the /sbin/status issues. Figure out why this happens and advocate for switching from Vagrant to Docker in OSM test-kitchen configuration.
  • Return to OSM/Chef repository and get the Wiki cookbook working all the way through to the end.

Comment from TomH on 19 January 2020 at 18:04

The reason docker fails is that there is no init at all when using kitchen-docker, so chef (having found there is no systemd running) assumes upstart is the init and then fails to find /sbin/status and errors.

As I explained on the github ticket, this is why we use vagrant, so that we have full VMs with a running init where we can manage services.

I am working on trying to use containers, but with a running init, and I have many of our tests working now, though with podman and I haven’t tried with real docker yet.

Comment from migurski on 19 January 2020 at 18:30

Thanks Tom. I’m still picking at Vagrant/Virtualbox to see if I can figure anything out there but the logfiles aren’t yielding anything I recognize as a clue.
