Automated visual checking of deployments with ImageMagick

We’re big fans of continuous deployment here at REA. Merging a pull request and seeing the changes flow all the way to production in a matter of minutes is really awesome. Unfortunately, even with a large number of automated tests, it also means an uncaught bug can flow all the way through to production just as quickly.

We recently experienced this when some new cache-busting code was mistakenly committed and caused our landing page to use a non-existent CSS file. Fortunately we noticed quickly and so the user impact was minimal, but it highlighted that the tests in our deployment pipeline were not as effective as we would like them to be.

Inspired by the visual diffs provided by SpeedCurve, we set out to see if we could produce something similar that would stop the deployment if the new version of the page looked significantly different from the previous one. It turns out the tech behind these visual diffs is ImageMagick, which makes this process really simple.

The first step is to capture some screenshots to compare. To do this we use the render method in PhantomJS and capture an image of the new version of the page after it has been deployed to our staging environment and another one of the current version from our production environment.
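
As a rough sketch of the capture step, the shell function below writes a small PhantomJS script that opens a URL and renders it to a PNG. The file name capture.js, the function name and the URLs in the usage comment are placeholders we made up for illustration, not the names used in our pipeline.

```shell
#!/bin/sh
# Write a minimal PhantomJS capture script to the path given as $1.
# The generated script takes two arguments: the URL to open and the
# PNG file to render the page into.
write_capture_script() {
    cat > "$1" <<'EOF'
var system = require('system');
var page = require('webpage').create();

page.open(system.args[1], function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + system.args[1]);
        phantom.exit(1);
    } else {
        page.render(system.args[2]);
        phantom.exit(0);
    }
});
EOF
}

# Usage (needs PhantomJS on the PATH; the URLs are hypothetical):
#   write_capture_script capture.js
#   phantomjs capture.js https://staging.example.com/ staging.png
#   phantomjs capture.js https://www.example.com/ production.png
```

Exiting with a non-zero status when the page fails to load lets the pipeline stop before it compares a blank or partial capture.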

Here is an example of the two image captures side by side (spot the differences):

[Image: 363-364-sidebyside]

There are two issues. The first is that the images are slightly different sizes, which causes trouble for the compare feature in ImageMagick. The second is that the content at the bottom of the page is independent of the build, so it may trigger false alarms. We can get around both of those easily enough by cropping the images.

convert staging.png -crop 1030x1750+0+0 staging-cropped.png

Which gives:


However, the skyscraper advertisements on the right can also change, so let’s draw a rectangle over them to hide any differences. We chose blue to make it obvious in the resulting images that the area has been purposely blocked out.

convert staging.png -crop 1030x1750+0+0 -fill blue -draw "rectangle 698,448 1198,1337" staging-cropped.png
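
The production capture needs exactly the same treatment, otherwise the comparison is not like for like. A minimal sketch of a helper that does this is below; the function name is our own invention, and it assumes the captures are named staging.png and production.png in the current directory. It reads the input image first and sets -fill before -draw, since ImageMagick applies settings to the operators that follow them.

```shell
#!/bin/sh
# Crop a capture to the stable region and mask the advertisement column.
# $1 is the capture name without extension: "staging" reads staging.png
# and writes staging-cropped.png.
prepare_capture() {
    name=$1
    convert "${name}.png" \
        -crop 1030x1750+0+0 \
        -fill blue -draw "rectangle 698,448 1198,1337" \
        "${name}-cropped.png"
}

# Usage (needs ImageMagick):
#   prepare_capture staging && prepare_capture production
```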


So now we have a good base to compare, and with a little more ImageMagick we can get both the number of pixels that differ (with the -metric AE flag) and a new image showing the differences visually:

compare -metric AE staging-cropped.png production-cropped.png staging-production-diff.png
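
One thing to watch when scripting this: compare prints the metric on stderr rather than stdout, and in the ImageMagick versions we are aware of it exits with status 1 when the images differ and 2 on a real error. The sketch below captures the count with that in mind. The function name is hypothetical, and trimming everything after the first space is a defensive assumption in case a version appends extra text after the count.

```shell
#!/bin/sh
# Print the number of differing pixels between images $1 and $2,
# writing the visual diff to $3.
count_diff_pixels() {
    rc=0
    # The metric arrives on stderr, so redirect it into the capture.
    pixels=$(compare -metric AE "$1" "$2" "$3" 2>&1) || rc=$?
    # Status 1 only means "the images differ"; 2 or more is an error.
    if [ "$rc" -ge 2 ]; then
        echo "compare failed: $pixels" >&2
        return "$rc"
    fi
    # Keep only the leading count.
    echo "${pixels%% *}"
}

# Usage (needs ImageMagick):
#   count_diff_pixels staging-cropped.png production-cropped.png \
#       staging-production-diff.png
```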


In this example only one article has changed, a difference of 4956 pixels. That is less than 0.3% of the total image, which suggests the deployment is probably fine.
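
The percentage is simply the differing pixel count divided by the total number of pixels in the cropped image (1030 × 1750). A small sketch of that arithmetic, using awk because plain sh has no floating point; the function name is ours:

```shell
#!/bin/sh
# diff_percent PIXELS WIDTH HEIGHT
# Print the share of the image that differs, as a percentage.
diff_percent() {
    awk -v d="$1" -v w="$2" -v h="$3" \
        'BEGIN { printf "%.3f\n", d * 100 / (w * h) }'
}

diff_percent 4956 1030 1750   # prints 0.275
```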

Happy with this result, we set it up in our build pipeline, but even with the page cropped and the advertisements blocked we were still getting some false positives.

So we decided to crop the image to just the top of the page. This should (almost) never change, and even though we’re only checking a small part, it still requires the page to load the core HTML, CSS and JavaScript and render correctly.

A standard deployment gives us identical images:

[Image: 363-364-small]


But if the CSS is missing the images differ by more than 75%:

[Image: 364-nocss-small]
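
With a normal deployment producing identical images and a broken one differing by more than 75%, there is plenty of room to pick a threshold for failing the build. The sketch below shows one way such a gate could look. The function name and the 5% threshold in the usage comment are illustrative assumptions, not the values from our pipeline.

```shell
#!/bin/sh
# within_threshold DIFF_PIXELS TOTAL_PIXELS MAX_PERCENT
# Succeed (status 0) when the differing pixels make up no more than
# MAX_PERCENT of the image, fail (status 1) otherwise.
within_threshold() {
    awk -v d="$1" -v t="$2" -v m="$3" \
        'BEGIN { exit !(d * 100 / t <= m) }'
}

# Usage in a pipeline step, with hypothetical variables holding the
# pixel count from compare and the size of the cropped image:
#   within_threshold "$diff_pixels" "$total_pixels" 5 || {
#       echo "Visual check failed, stopping deployment" >&2
#       exit 1
#   }
```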


So we now have much greater confidence that each continuous deployment of our home page is issue-free.