Initial Results When Testing OSM Chef With CI Tools
Posted by migurski on 19 January 2020 in English.
I’ve been trying some light contributions to OSM’s Chef repository. In the OpenAddresses project we learned early that reliable and responsive continuous testing and integration make it easier for contributors to approach our project, and I’m hoping to build similar tests for OSM Chef. We already do a basic syntax lint, but these new tests would run each complete cookbook on a clean disposable host and notify GitHub of the results:
kitchen test --parallel --destroy=always all && notify-passed || notify-failed
Contributors would see an additional green check-mark in their pull requests, and OSM admins would be able to accept contributions confident that they’ve been fully tested.
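One subtlety of the `&& … ||` chain above: if `notify-passed` itself fails, `notify-failed` runs as well. Here is a minimal sketch of the same flow with an explicit if/else. The `notify-passed` and `notify-failed` commands are placeholders, so this sketch swaps in hypothetical `status_payload`, `notify`, and `run_and_notify` functions that post to GitHub’s commit status API. The `GITHUB_TOKEN`, `REPO`, and `SHA` variables and the `test-kitchen` context label are assumptions, not part of any existing OSM setup.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for GitHub's commit status API.
# $1 is the state ("success" or "failure"), $2 a short description.
# Assumes the description contains no double quotes or backslashes.
status_payload() {
    printf '{"state":"%s","description":"%s","context":"test-kitchen"}' "$1" "$2"
}

# Hypothetical helper: POST a status for commit $SHA in repository $REPO
# (written as "owner/name"). GITHUB_TOKEN, REPO, and SHA are assumed to be
# provided by the CI host.
notify() {
    curl -sS -X POST \
        -H "Authorization: token ${GITHUB_TOKEN}" \
        -H "Accept: application/vnd.github+json" \
        -d "$(status_payload "$1" "$2")" \
        "https://api.github.com/repos/${REPO}/statuses/${SHA}"
}

# Run the full suite, then report exactly one outcome.
run_and_notify() {
    if kitchen test --parallel --destroy=always all; then
        notify success "All cookbooks converged"
    else
        notify failure "At least one cookbook failed"
    fi
}
```

Calling `run_and_notify` at the end of a CI job would then report a single result per commit, whichever branch is taken.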
Why Mess With Chef?
I’ve been in a long conversation with Andy Allan about small ways to help with OSM’s operational infrastructure. He nudged me in the direction of OSM’s Chef configuration, which shouldn’t be a surprise: Chef is how OSM manages the configuration of all the servers run by the OpenStreetMap Foundation’s Operations Working Group. Contributions to Chef are specifically cited in Andy’s Getting Involved post and mentioned in policies for both the Operations Working Group (OWG) and the Sysadmins group.
Andy recommended that I pay special attention to the Wiki cookbook: it’s the system that has attracted the most outside interest from non-sysadmins over the last three years. For people who would like to change the configuration of wiki.openstreetmap.org, a working cookbook would make it easier to test locally with test-kitchen and offer contributions that are known to work prior to deployment. Today, “we only find out if the changes actually work when we run them on the live servers.”
This PR is my initial pass at fixing some bugs in Chef, and there’s feedback there from Andy, Tom Hughes, and Grant Slater about OSM’s Chef expectations. My hope is to see this ultimately become functional, trusted, and automated enough that OWG repo admins are comfortable defaulting to “Yes” and accepting any change with community support that passes tests without technical debate. Chef is not a widely-understood technology and I think a lot of the casual DevOps world has moved on to container-based approaches, so it’s critical for OSM’s Chef to work in an automated and well-understood way to welcome new contributors.
Getting To Headless
Contributions to open projects are encouraged by providing a smooth entry path, and headless continuous integration is an excellent way to make this happen. GitHub provides strong support for automated status updates that we’ve used for the OpenAddresses project throughout the past four years. We use it to automate feedback to users and generate screenshots of their contributions, so they know they’re doing the right thing and we can safely rubber-stamp their input.
For OSM Chef, we could run all cookbooks under test-kitchen. I’ve tested the PR above in a few environments that might be used as part of an automated flow: on my local Mac, where a developer might check their work prior to publishing; under the current long-term supported Ubuntu 18.04, which might be part of an AWS EC2 setup; and under the previous Ubuntu 16.04, which might be part of a GitHub Actions setup.
Results So Far
Host OS | Test Driver | Result |
---|---|---|
Mac OS | Vagrant | 🔶 cookbooks/mediawiki/resources/site.rb line 528: Column 'cuc_user' cannot be null |
Ubuntu 16.04 | Vagrant | 🚫 Timed out while waiting for the machine to boot |
Ubuntu 18.04 | Vagrant | 🚫 Timed out while waiting for the machine to boot |
Ubuntu 16.04 | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status |
Ubuntu 18.04 | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status |
Mac OS | Docker | 🔶 cookbooks/mysql/recipes/default.rb line 23: No such file or directory - /sbin/status |
Vagrant under Mac OS gets furthest, then ultimately gets stuck on problems in the cookbook itself. This is potentially fixable in OSM/Chef by updating the Mediawiki and Wiki cookbooks.
There is no meaningful difference between Ubuntu 16.04 and 18.04. The older version needs Ruby 2.4+ to be installed explicitly, but both run test-kitchen with minimal fuss. Although OSM is deployed to Ubuntu 18.04, the host OS for these tests varies.
Here’s the approximate script I’m using to run the tests from the Ubuntu 18.04 host OS:
git clone https://github.com/migurski/chef.git osm-chef && cd osm-chef
sudo apt-get update -y && sudo apt-get upgrade -y
sudo apt-get install -y build-essential ruby ruby-dev docker.io vagrant virtualbox
sudo gem install test-kitchen kitchen-docker kitchen-vagrant
kitchen test --destroy=always wiki-ubuntu-1804
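The last command uses whichever driver the repository’s `.kitchen.yml` names. As a sketch of one way to try a different driver without editing the checked-in file: test-kitchen reads an optional override file named by the `KITCHEN_LOCAL_YAML` environment variable. The file name `.kitchen.docker.yml` and the function name `write_docker_override` are made up for illustration, and the sketch assumes the override merges cleanly with the repository’s existing platform settings, which may not hold.

```shell
#!/bin/sh
# Hypothetical helper: write a small override file that selects the
# kitchen-docker driver. $1 is the output path (defaults to an
# illustrative file name, not one that exists in OSM's repository).
write_docker_override() {
    cat > "${1:-.kitchen.docker.yml}" <<'EOF'
---
driver:
  name: docker
EOF
}
```

With the file written, a run might look like `KITCHEN_LOCAL_YAML=.kitchen.docker.yml kitchen test --destroy=always wiki-ubuntu-1804`.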
Vagrant on Ubuntu fails immediately. Googling for error strings does not turn up any obvious mistakes. The guest OS running under Virtualbox is unresponsive to input from Vagrant. I’m not experienced with either Vagrant or Virtualbox on Linux, so I’m unsure how to dig up problems here.
Docker on Mac OS and Ubuntu gets further and looks more promising, but bumps into a very different problem. Most of the cookbook runs happily until the MySQL cookbook tries to start the database and immediately fails to find /sbin/status, a script that no longer exists under systemd. The guest OS under Docker is identical to that under Vagrant, so I am confused and stumped by this error.
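One way to check whether the two guests really behave alike is to look at what each is running as PID 1. The following is a diagnostic sketch only, assuming a Linux guest with /proc mounted. The function names `init_name` and `systemd_booted` are hypothetical, the optional root-prefix argument exists mainly so the functions can be pointed at a copied filesystem, and nothing here claims to mirror how Chef itself picks a service provider.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the process running as PID 1.
# $1 is an optional root prefix (empty means the live system).
init_name() {
    root="${1:-}"
    if [ -r "$root/proc/1/comm" ]; then
        cat "$root/proc/1/comm"
    else
        echo unknown
    fi
}

# Hypothetical helper: succeed if systemd booted this system. systemd
# documents the /run/systemd/system directory as its "booted" marker.
systemd_booted() {
    [ -d "${1:-}/run/systemd/system" ]
}
```

Running both functions after `kitchen login` on a Vagrant guest and on a Docker guest would show whether each has a real init behind it.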
Next Steps
- OSM operations group prefers Vagrant to Docker for test configuration, but it fails inscrutably. Figure out why Vagrant guest OSes under Ubuntu are unreachable.
- Docker guest OS gets quite a bit further but runs into the /sbin/status issue. Figure out why this happens and advocate for switching from Vagrant to Docker in OSM test-kitchen configuration.
- Return to OSM/Chef repository and get the Wiki cookbook working all the way through to the end.
Discussion
Comment from TomH on 19 January 2020 at 18:04
The reason docker fails is that there is no init at all when using kitchen-docker, so chef (having found there is no systemd running) assumes upstart is the init and then fails to find /sbin/status and errors.
As I explained on the GitHub ticket, this is why we use vagrant, so that we have full VMs with a running init where we can manage services.
I am working on trying to use containers, but with a running init, and I have many of our tests working now, though with podman and I haven’t tried with real docker yet.
Comment from migurski on 19 January 2020 at 18:30
Thanks Tom. I’m still picking at Vagrant/Virtualbox to see if I can figure anything out there but the logfiles aren’t yielding anything I recognize as a clue.