Meet our CIO – Captain Agile!

Nigel’s Minecraft avatar – Captain Agile!

Nigel Dalton, our Chief Information Officer, is a great champion of the engineering and innovation culture at REA. Here’s an interview Nigel recently did for CIO Magazine, published in full to give you an insight into his psyche.

What’s your name and title?
Nigel Dalton, REA Group’s Chief Information Officer. I also have a role as an executive coach for our team that runs the Commercial real estate line of business within REA Group in Melbourne.

What’s your professional background? How did you get to where you are today?
Social scientist, not an engineer – but I have a passion for machines that has proven powerful when combined with an innate passion for people. I have worked globally in both IT and business roles (marketing, sales, product and service), and often in the twilight zone between those more traditionally defined professions – in modern digital companies, they are the same thing.

Continue reading

Relaunching the REA tech blog

Hello World! It is the REA tech blog here!

Yes, we know you’ve been thinking that there’s nothing happening here anymore. We hear you say, “You’ve not written, you’ve not tweeted, there were cracks in your home page HTML and we thought you’d left without telling anyone”.

Well, we’re excited to announce that we have not left. We have just been deeply immersed in our tech caves, indulging ourselves in what we love doing – playing with new technologies, experimenting with methodologies and carving out new user experiences.

Until somebody said the other day, “Hey, we forgot to blog about all this”. So we’re making a renewed effort to share back to the community that we get so much from.

Continue reading

Are you responsive to your users?

Responsive web design has been pretty hot for a few years now. As online products and services jostle for our attention, it’s imperative that your digital wares are available whenever and wherever your customers want them.

But is responsive the golden hammer? The utopian solution for every single website?

There is no single perfect solution; it ultimately comes down to your particular requirements.

REA has an award-winning iOS app. We have an incredible mobile-optimised website. We have one of Australia’s most visited websites in our flagship ‘desktop’ site.

We have also embraced responsive on a few sites, including our careers portal, the all-new retirement living section, and the share accommodation sites, which now provide a multi-screen experience in a single application.

There are a few key things to evaluate when deciding between a dedicated mobile experience versus a responsive, single application.

Continue reading

Loading Google Maps with RequireJS

Recently at REA, we’ve started to use RequireJS on a few projects to help modularize our JavaScript. I came across a fairly subtle issue when trying to use the Google Maps JS API with RequireJS which I thought was interesting.

Here’s our situation:

  1. We want to load Google Maps and let our code treat it as an AMD module.
  2. We have several third-party libraries that depend on Google Maps and aren’t AMD modules; we of course want to treat them as AMD modules, so we’ve shimmed them.
  3. We want to use r.js to build a single JS file for production.

Because Google Maps loads asynchronously, we can’t simply shim it. Miller Medeiros’ async plugin seems to be a common (and good) solution to this problem. His blog post describes the technique, but it doesn’t mention a couple of potential gotchas to do with shimmed modules and single-file optimized builds.

With the async plugin, Google Maps is a “real” AMD module from RequireJS’ perspective. As explained in the RequireJS docs, shimmed modules can’t depend on real modules, because RequireJS has no way to ensure in the built file that the shimmed module executes after the real modules it depends on.
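
To make that concrete, here’s a minimal sketch of the kind of shim config we mean (‘map-widgets’ and its MapWidgets global are hypothetical stand-ins for our third-party libraries):

require.config({
    shim: {
        // hypothetical non-AMD library that expects Google Maps to be available
        'map-widgets': {
            deps: ['gmaps'],   // 'gmaps' is the module defined in gmaps.js below
            exports: 'MapWidgets'
        }
    }
});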

This implies that in the built version of the code, we’ll need to load the Google Maps API separately, before all our RequireJS modules (what the RequireJS docs refer to as “CDN loading”). But we’ll also need a way to load Google Maps via the async plugin, only in the non-built environment (otherwise we’d get two copies of Google Maps after the build).
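
In practice, the “CDN loading” side of this is nothing fancier than a couple of plain script tags in the production page. A rough sketch (the bundle name is just an example of an r.js build output):

<!-- load Google Maps before RequireJS and the optimised bundle -->
<script src="http://maps.google.com/maps/api/js?v=3&sensor=false"></script>
<script src="require.js"></script>
<script src="main-built.js"></script>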

So our complete solution consists of a gmaps.js containing:

define(['async!http://maps.google.com/maps/api/js?v=3&sensor=false&client=gme-nsp&channel=new-homes-app'], function () {
    return google.maps;
});

as in the blog post, but for the built version of the code, we created a gmaps-stub.js containing:

define(function () {
    return google.maps;
});

and in our build config, loaded the stub instead:

paths: {
    gmaps: 'gmaps-stub'
}

This lets us meet all three of the above requirements.
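
As a final illustration, consuming code simply depends on ‘gmaps’ and doesn’t care how the API was loaded. A hypothetical example (the element ID and coordinates are made up):

define(['gmaps'], function (gmaps) {
    // gmaps is google.maps, whether it arrived via the async plugin or the CDN script tag
    return function createMap() {
        return new gmaps.Map(document.getElementById('map'), {
            center: new gmaps.LatLng(-37.8136, 144.9631),
            zoom: 15
        });
    };
});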

Bork Night – A Series of Successful Failures!

On Thursday 23rd of February, the Site Operations team held their first Bork Night. This was an exercise in resilience engineering: introducing faults into our production systems in a controlled manner and responding to them. The senior engineers designed a number of faults, and challenged the rest of the team to identify and fix them. This ended up being a lot of fun, and we came away with a good set of learnings which we can apply back into our systems and processes to do better next time.

Bork Night - The team at work fixing failures

The format of the challenge was:

  1. Teams were assembled (we had two teams of two, and one of three).
  2. The senior engineers set up their faults, and introduced them into the production environment.
  3. A fault was assigned to each team, who then had 10 minutes to evaluate their problem. No attempts to fix were permitted at this time.
  4. The problems were then handed over from one team to the next; two minutes were given to do this.
  5. The next team then had 10 minutes to continue evaluating the problem, building upon what the first team to look at the problem had learned.
  6. There was one more phase of hand-over and evaluation. We then let all the teams try to agree with each other on what each fault was about.
  7. We then let the teams prioritize the faults and form new teams however they saw fit to fix each problem. This started the Reaction phase. (Originally we were planning to rotate this Reaction phase around each team every 10 minutes, but changed our approach on the fly.)
  8. Later, we had a debrief over pizza and beer.

Trent running the bork night and setting up the failures

The challenges presented were:

  • Duplication of a server’s MAC address, causing it to not respond. Normally, every server on the network has a unique address so that information can be routed to it. A new virtual machine image was created with a duplicated MAC address. This confuses the network, as it can no longer route packets of information to the correct server, causing anything that depends on that server to start failing. We picked on a key database server for this one. Kudos to Gianluca for discovering the cause of this, enabling a quick recovery by shutting down the offending duplicate machine.
  • A failure of a web server to reboot. After its boot configuration was deleted, the web server (normally used for realestate.com.au) was made to shut down. Because the boot information was (deliberately) deleted, it would not restart. The machine had to be fixed through a management interface by copying the boot config from another machine. Congrats to Daniel for correctly speculating the cause of this.
  • Forcing several property listing search servers to slow down by becoming I/O bound. This fault did not hamper us as badly as we thought it might. We slowed down several of the FAST query node servers, which normally power the property searches on REA and RCA, by saturating their ability to read information off the disks. On one hand, it was a reassuring surprise that our systems were resilient enough against this kind of problem; we also later realized better ways we could introduce this sort of fault in future, such as ensuring the search service did not have anything cached in memory first.
  • And as an extra bonus complication during the event, we deliberately crashed the Nagios monitoring service, so that the teams had to re-prioritize their incident response partway through. Kudos to Patrick for figuring out the full extent of what was broken and getting Nagios up and running again.

Working through the failures on Bork Night

Several things worked well; some things we can do better. Our learnings included:

  • Our emergency console and DRAC access to our servers is not smooth, with confusion over which passwords to use and the limitation of a single user at a time.
  • In future, the scenario design should avoid overlaps where multiple scenarios affect the same services.
  • Some scenarios need better fault-description information.
  • We need a venue where the monitoring dashboards can be shown, such as naglite.
  • Wifi access continued to plague us.
  • Outsider insight was a fantastic asset. We had developer participation, and while they might not have had technical familiarity with the siteops details, they offered great insights as to where to focus the troubleshooting. The next Bork Night really needs to welcome non-siteops participation.

Finally, a big thank you to Tammy for arranging the pizza and drinks!

(Reposting Aaron Wigley’s post from the realestate.com.au internal community site)

Kanban in Operations – Virtual Card Wall

Three months ago I joined the Site Operations team at realestate.com.au and I was pleased to see that the team were using a card wall for work.

Card Wall

Although the physical card wall proved to be a great place to have stand ups and manage work, it had its problems:

  • We have a distributed team. With operations teams in Italy (casa.it) and Luxembourg (athome.lu), people on devops rotations, and people occasionally working from home, it is hard for everyone to participate during stand up.
  • Data associated with cards, such as creation timestamps and creators, is dependent on users writing it on the cards.
  • Limited external visibility into the Site Ops workload. If anyone wanted to know what we were currently working on, they would have to head up to the Site Ops area and have a look.

After a discussion with the team, we decided to trial a virtual card wall.

Scope

The trial would run for two weeks, replicating the cards on our physical card wall, with a retrospective and decision to continue at the end.

The trial would not include capturing incidents or deployments and would be kept as light as possible.

Setup

To get the trial up and running as soon as possible, we utilised our existing Jira installation with Greenhopper. The project setup and configuration was kept to a bare minimum.

We created five new issue types, based on the cards on our physical wall – Service Requests, Deployment, Provisioning, Housekeeping and Faults.

Card Types

A week before the trial commenced, we manually imported the cards into Jira and wrote the Jira issue number on the cards. During that week we also duplicated any new physical cards into Jira. This allowed us to start tracking behaviour before we started the trial.

Card

Our virtual card wall is tactile. Stand ups would now be conducted in front of a Smart Board, which allowed us to interact with Greenhopper using our fingers as the mouse.

The Trial

The trial kicked off on Friday 8th July at 0900; we had our regular stand up, the only change being the new virtual card wall.

Stand up

In addition to Greenhopper, we started trialling weekly iterations (versions) in Jira – Thursday to Thursday.

Although we weren’t planning the iterations, the option is there for participants to put cards into a later iteration if the card won’t be actioned for a few weeks.

What works and what doesn’t?

The trial of Greenhopper has been great, and it has identified a few things that work well and some that don’t.

  • It’s difficult to raise new cards at stand up. It’s a change to our regular process, as we now have to create and edit cards before or after stand up. However, this has minimised interruptions, allowing the team to focus during stand up.
  • We are able to raise cards wherever we have access to a web browser and we are not constrained to being in the office.
  • For a few of the stand ups we didn’t have access to the Smart Board and used a projector instead. It felt awkward. Having physical interaction with the card wall definitely enhances the experience. It feels natural for the team to huddle around the card wall, rather than a computer.

What’s next?

So what’s next for the Site Operations Greenhopper integration?

  • First up is to trial the system with the global operations teams, with a possible change of our stand up time to a more sensible hour for our European colleagues.
  • Next is to increase transparency into Site Operations’ current workload. To achieve this we will look into publishing a read-only card wall to the wider company.
  • Start planning work for iterations. We didn’t plan beyond one week during the trial, but we are collecting data on how long cards are taking to cycle through our system.
  • Estimate card sizes again. Based on the data collected, we should be able to reliably estimate work and compare that to the actual durations.
  • Customise Jira to suit the workflow in Site Operations, including incident management and deployments. This will be an evolutionary process, with an aim to keep the workflow as light as possible.
  • The final goal is to investigate integration with other operations systems, such as ZenDesk and Nagios. This would minimise the amount of duplicated work and streamline our workflow.

(cross-posted on geekle.id.au)