Meet our CIO – Captain Agile!

Nigel’s Minecraft avatar – Captain Agile!

Nigel Dalton, our Chief Information Officer, is a great champion for the engineering and innovation culture at REA. Here’s an interview Nigel recently did for CIO Magazine, published in full to give you an insight into his psyche.

What’s your name and title?
Nigel Dalton, REA Group’s Chief Information Officer. I also have a role as an executive coach for our team that runs the Commercial real estate line of business within REA Group in Melbourne.

What’s your professional background? How did you get to where you are today?
Social scientist, not an engineer – but I have a passion for machines that has proven powerful when combined with an innate passion for people. I have worked globally in both IT and business roles (marketing, sales, product and service), and often in the twilight zone between those more traditionally defined professions – in modern digital companies, they are the same thing.

Relaunching the REA tech blog

Hello World! It is the REA tech blog here!

Yes, we know you’ve been thinking that there’s nothing happening here anymore. We hear you say, “You’ve not written, you’ve not tweeted, there were cracks in your home page HTML and we thought you’d left without telling anyone”.

Well, we’re excited to announce we have not left. We have just been deeply immersed in our tech caves, indulging ourselves in what we love doing – playing with new technologies, experimenting with methodologies and carving out new user experiences.

Until somebody said the other day, “Hey, we forgot to blog about all this”. So we’re making a renewed effort to share back with the community we get so much from.

Are you responsive to your users?

Responsive web design has been pretty hot for a few years now. As online products and services jostle for our attention, it’s imperative that your digital wares are available whenever and wherever your customers want them.

But is responsive the golden hammer? The utopian solution for every single website?

There is no single perfect solution; it ultimately comes down to your particular requirements.

REA has an award-winning iOS app. We have an incredible mobile-optimised website. We have one of Australia’s most visited websites in our flagship ‘desktop’ site.

We have also embraced responsive design on a few sites, including our careers portal, the all-new retirement living section and the share accommodation sites, which now provide a multi-screen experience in a single application.

There are a few key things to evaluate when deciding between a dedicated mobile experience and a responsive, single application.

Loading Google Maps with RequireJS

Recently at REA, we’ve started to use RequireJS on a few projects to help modularize our JavaScript. I came across a fairly subtle issue when trying to use the Google Maps JS API with RequireJS, which I thought was interesting.

Here’s our situation:

  1. We want to load Google Maps and let our code treat it as an AMD module.
  2. We have several third-party libraries that depend on Google Maps and aren’t AMD modules; we of course want to treat them as AMD modules, so we’ve shimmed them (see the config sketch just below this list).
  3. We want to use r.js to build a single JS file for production.
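
To make that setup concrete, here is a minimal require.config sketch along those lines. The plugin path and the third-party library name (‘hypothetical-maps-widget’) are illustrative only, not our actual project layout:

require.config({
    paths: {
        // Miller Medeiros' async loader plugin (assumed location in our project)
        async: 'lib/async',
        // our Google Maps wrapper module, shown further down
        gmaps: 'gmaps'
    },
    shim: {
        // a non-AMD library that expects the google.maps global to exist
        'hypothetical-maps-widget': {
            deps: ['gmaps'],
            exports: 'MapsWidget'
        }
    }
});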

Because Google Maps loads asynchronously, we can’t simply shim it. Miller Medeiros’ async plugin seems to be a common (and good) solution to this problem. His blog post describes the technique, but it doesn’t mention a couple of potential gotchas to do with shimmed modules and single-file optimized builds.

With the async plugin, Google Maps is a “real” AMD module from RequireJS’ perspective. As explained in the RequireJS docs, shimmed modules can’t depend on real modules, because RequireJS has no way to ensure in the built file that the shimmed module executes after the real modules it depends on.

This implies that in the built version of the code, we’ll need to load the Google Maps API separately, before all our RequireJS modules (what the RequireJS docs refer to as “CDN loading”). But we’ll also need a way to load Google Maps via the async plugin, only in the non-built environment (otherwise we’d get two copies of Google Maps after the build).

So our complete solution consists of a gmaps.js containing:

// gmaps.js – loads the Maps API via the async plugin and exposes the google.maps namespace
define(['async!http://maps.google.com/maps/api/js?v=3&sensor=false&client=gme-nsp&channel=new-homes-app'], function () {
    return google.maps;
});

as in the blog post, but for the built version of the code, we created a gmaps-stub.js containing:

// gmaps-stub.js – used in the built version, where Google Maps has already been loaded separately
define(function () {
    return google.maps;
});

and in our build config, loaded the stub instead:

paths: {
    gmaps: 'gmaps-stub'
}

This lets us meet all three of the above requirements.
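
To round things out, here is a sketch of a consuming module; the module name and the map options are made up for illustration, but the pattern is just an ordinary AMD dependency on gmaps:

// map-view.js (hypothetical consumer): works unchanged in development, where the
// async plugin fetches the API, and in the optimized build, where Google Maps has
// already been loaded by a separate script tag before the bundle.
define(['gmaps'], function (maps) {
    return {
        render: function (el) {
            return new maps.Map(el, {
                center: new maps.LatLng(-37.8136, 144.9631),  // Melbourne
                zoom: 12,
                mapTypeId: maps.MapTypeId.ROADMAP
            });
        }
    };
});

In the built page, the only extra step is including the Google Maps script tag before the optimized bundle, as per the “CDN loading” approach mentioned above.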

Bork Night – A Series of Successful Failures!

On Thursday 23rd of February, the Site Operations team held their first Bork Night.  This was an exercise in resilience engineering by introducing faults into our production systems in a controlled manner, and responding to them. The senior engineers designed a number of faults, and challenged the rest of the team to identify and fix them.  This ended up being a lot of fun, and we came away with a good set of learnings which we can apply back into our systems and processes to do better next time.

Bork Night - The team at work fixing failures

The format of the challenge was:

  1. Teams were assembled (we had two teams of two, and one of three).
  2. The senior engineers set up their faults, and introduced them into the production environment.
  3. A fault was assigned to each team, who then had 10 minutes to evaluate their problem. No attempts to fix were permitted at this time.
  4. The problems were then handed over from one team to the next; two minutes were given to do this.
  5. The next team then had 10 minutes to continue evaluating the problem, building upon what the first team to look at the problem had learned.
  6. There was one more phase of hand-over and evaluation. We then let all the teams try to agree with each other on what each fault was about.
  7. We then let the teams prioritize the faults, and form new teams however they saw fit to fix each problem. This started the Reaction phase. (Originally we were planning to rotate this Reaction phase around each team after 10 minutes, but we changed our approach on the fly.)
  8. Later, we had a debrief over pizza and beer.

Trent running the bork night and setting up the failures

The challenges presented were:

  • Duplication of a server’s MAC address, causing it to not respond. Normally, every server on the network has a unique address so that information can be routed to it. A new virtual machine image was created with a duplicated MAC address. This confuses the network, as it can no longer route packets of information to the correct server, causing anything that depends on that server to start failing. We picked on a key database server for this one. Kudos to Gianluca for discovering the cause of this, enabling a quick recovery by shutting down the offending duplicate machine.
  • A failure of a web server to reboot. After deletion of its boot configuration, the web server (normally used for realestate.com.au) was made to shut down. Because the boot information was (deliberately) deleted, it would not restart. The machine had to be fixed via a management interface by copying the boot config from another machine. Congrats to Daniel for correctly speculating the cause of this.
  • Forcing several property listing search servers to slow down by making them I/O bound. This fault did not hamper us as badly as we thought it might. On several of the FAST query node servers, which normally power the property searches on REA and RCA, we caused a slowdown by saturating their ability to read information off the disks. On one hand, it was a reassuring surprise that our systems were resilient enough against this kind of problem; on the other, we later realized better ways to introduce this sort of fault in future, by ensuring the search service did not have anything cached in memory first.
  • And as an extra bonus complication during the event, we deliberately crashed the Nagios monitoring service, so that the teams had to re-prioritize their incident response partway through. Kudos to Patrick for figuring out the full extent of what was broken and getting Nagios up and running again.

Working through the failures on Bork Night

Several things worked well; some things we can do better. Our learnings included:

  • Our emergency console and DRAC access to our servers is not smooth, with confusion over which passwords to use and the limitation of only a single user at a time.
  • In future, the scenario design should try to avoid overlaps, where one scenario affects the same services as another.
  • Some scenarios need better fault-description information.
  • We need a venue where the monitoring dashboards can be shown, such as naglite.
  • Wifi access continued to plague us.
  • Outsider insight was a fantastic asset. We had developer participation, and while they might not have had technical familiarity with the siteops details, there were great insights coming from them as to where to focus the troubleshooting. The next Bork Night really needs to welcome non-siteops participation.

Finally, a big thank you to Tammy for arranging the pizza and drinks!

(Reposting Aaron Wigley’s post from the realestate.com.au internal community site)