The dreaded CREATE_FAILED message is an all-too-common source of frustration when deploying new stacks with CloudFormation. The AWS Console shows you which component in your stack has failed, but if you rely heavily on metadata and userdata components, more often than not you’ll only get a wait condition timeout error, which gives no indication of what has actually gone wrong under the covers.
The good news is that there are some tips and tricks for troubleshooting CloudFormation stack failures. Some revolve around CLI switches, some around knowing a bit more about the CF internals, and others around knowing where specific scripts live on your typical EC2 instance. This post documents a few approaches to troubleshooting CloudFormation stack errors and aims to help the reader take a (somewhat) structured approach to troubleshooting wait condition timeouts.
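As a rough illustration of a structured first step, the sketch below picks the failed events out of a stack's event list and orders them oldest-first, since the earliest failure is usually the most informative one. It assumes events shaped like the `StackEvents` entries returned by the CloudFormation DescribeStackEvents API (`LogicalResourceId`, `ResourceStatus`, `ResourceStatusReason`, `Timestamp`); the function name and the sample data are hypothetical and hand-written for illustration, not real output.

```python
from datetime import datetime, timezone


def failed_events(events):
    """Return events whose status ends in _FAILED, oldest first.

    `events` is assumed to be a list of dicts shaped like the
    StackEvents entries from DescribeStackEvents.
    """
    failed = [e for e in events if e["ResourceStatus"].endswith("_FAILED")]
    return sorted(failed, key=lambda e: e["Timestamp"])


# Hand-written sample data (hypothetical), newest event first.
sample_events = [
    {
        "LogicalResourceId": "MyStack",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "The following resource(s) failed to create: [WaitCondition].",
        "Timestamp": datetime(2014, 1, 1, 10, 31, tzinfo=timezone.utc),
    },
    {
        "LogicalResourceId": "WaitCondition",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "WaitCondition timed out.",
        "Timestamp": datetime(2014, 1, 1, 10, 30, tzinfo=timezone.utc),
    },
    {
        "LogicalResourceId": "WebServer",
        "ResourceStatus": "CREATE_COMPLETE",
        "ResourceStatusReason": None,
        "Timestamp": datetime(2014, 1, 1, 10, 5, tzinfo=timezone.utc),
    },
]

for event in failed_events(sample_events):
    print(event["Timestamp"].isoformat(),
          event["LogicalResourceId"],
          event["ResourceStatusReason"])
```

A timeout on the wait condition only says that the instance never signalled success, so the next step is typically to look at the logs on the instance itself.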
A few months ago we were catching up with the guys from Puppet Labs here in the REA offices in Melbourne and they asked us this question:
PL: “Configuration management, what are you doing about it?”
J: “Well, that’s a long story…”
We spent the rest of the morning sketching the evolution of configuration management at REA on the whiteboard, along with the different stages we went through. A couple of weeks later my colleague David Lutz asked me if I wanted to present at the Melbourne Infrastructure Coders meetup that he co-hosts, and I thought I could share the story with a wider audience. After receiving some positive feedback about the presentation, I sent a proposal to linux.conf.au to repeat the talk there at the Sysadmin miniconf. A couple of weeks ago I presented it in Auckland.
If you want to review the journey we’ve been through regarding configuration management at REA, and get a good peek into our DevOps culture, check out the attached video:
If you are interested in the slides, you can find them on SlideShare.
Introducing the latest addition to the Technology Services team – The Walkupinator – a device which simplifies the way we log our tickets from people just dropping by.
The Technology Services team at REA Group is extremely proud of the walk up service we provide to our staff, however logging tickets for our walk ups has become problematic.
After a busy morning on the service desk in the Innovation Hub, it’s often hard to recall who we’ve assisted or what the issue was. With over 550 people at REA HQ, things get busy. To solve an issue that has consistently plagued our team, I’ve created a system that uses existing technology to let users simply swipe a card to log a ticket. This system, which we’ve named “The Walkupinator”, can save the person manning the service desk up to an hour a day, as well as saving time for our internal colleagues – or as we like to think of them, our customers.
At REA, Amazon Web Services (AWS) is our major development and production environment, and CloudFormation (CF) is one of the best tools we’ve found to manage deployments in AWS. At the time of writing, JSON is still the only template format supported by CloudFormation; but if you search for “Programming in JSON” in your favorite search engine, the results may be very disappointing. Some developers find writing JSON templates hard and have trouble with the data format, especially when the templates are big (JSON doesn’t allow comments, its syntax is strict, and so on).
At REA, we encourage people to explore and find new technologies to solve problems, improve product quality and speed up deployment cycles; this freedom to explore has given us a few choices for addressing this problem.
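One general way around JSON’s limitations, offered here only as a minimal sketch and not necessarily one of the options REA settled on, is to build the template as a data structure in a general-purpose language and serialise it to JSON at the end. The comments then live in the source code rather than in the template. The resource name `WebServer` and the AMI ID below are hypothetical placeholders.

```python
import json

# Build the template as a plain Python dict so it can carry comments
# and be assembled programmatically before being serialised.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Single web server (illustrative sketch)",
    "Resources": {
        # Hypothetical logical resource name.
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                # Placeholder AMI ID; substitute a real one for your region.
                "ImageId": "ami-12345678",
                "InstanceType": "t1.micro",
            },
        },
    },
}

# json.dumps guarantees syntactically valid JSON, which removes a whole
# class of hand-editing mistakes such as trailing commas.
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

The resulting string is ordinary JSON and can be saved to a file and passed to CloudFormation like any hand-written template.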
As previously discussed, we’re pretty keen on micro services at REA. Our delivery teams are organised around small, autonomous “squads” that get to choose pretty much any language and technology stack they wish to implement their solutions.
This inevitably leads to a fairly broad church of language use. 🙂