Not Zoolander: The other kind of modelling…

The Behavioural Communications & Analytics, Media & Developer, and IT Delivery teams have been collaborating closely with ThoughtWorks on an exciting behavioural-targeting project. This work was recently presented at the Big Data & Analytics Innovation Summit in Sydney, where it showed REA Group to be at the forefront of analytics in this space.

Presentation: More Than Meets The Eye

At REA, we have a wealth of data at our disposal about visitor behaviour on site, such as:

  • section(s) visited
  • time on site and in each section
  • traffic source
  • myREA status
  • return visits
  • agent interaction
  • saving open-for-inspection (OFI) times
  • saving searches
  • saving properties
  • getting directions
  • social engagement
  • types of suburbs searched
  • search refinements (price, bedrooms, bathrooms, car spaces, land size)
  • number of properties viewed
  • property types viewed
  • attributes of properties viewed

…and the list goes on.

Where it gets exciting is when we start to think about how we can use this information to predict something we don't yet know about our visitors, such as their demographics, their likelihood of purchasing a particular product or responding to a particular message, or their likelihood of obtaining a desired home loan. Something REA Group is particularly interested in understanding right now is whether visitors belong to any of our key consumer groups, such as first home buyers, investors, renovators or vendors. First home buyers are the first cab off the rank to trial this approach.
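The excerpt doesn't say which modelling technique the team used. As a minimal sketch of the general idea, assuming logistic regression (swapped in here purely for illustration) and three hypothetical behavioural features, the Python below turns a visitor's on-site behaviour into a probability of belonging to a consumer group such as first home buyers. The feature names, the training data and the functions `train` and `predict` are all made up for this example.

```python
import math
from typing import List, Sequence, Tuple

# (features, label) where label 1 means "first home buyer".
# Hypothetical features, for illustration only:
#   [read_first_home_buyer_guides, saved_searches, share_of_views_under_price_cap]
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Logistic function: squashes any score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Estimated probability that a visitor belongs to the target group."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(data: List[Example], lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(data)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in data:
            # Derivative of the log loss with respect to the linear score.
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Step against the average gradient over the whole batch.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up visitors, purely for illustration.
VISITORS: List[Example] = [
    ([1.0, 2.0, 0.9], 1),
    ([1.0, 1.0, 0.8], 1),
    ([0.0, 0.0, 0.1], 0),
    ([0.0, 1.0, 0.2], 0),
]

weights, bias = train(VISITORS)
# Probability that a new (hypothetical) visitor is a first home buyer.
print(f"{predict(weights, bias, [1.0, 1.0, 0.85]):.2f}")
```

Logistic regression is often used as a baseline for this kind of propensity score because each weight reads directly as how much a behaviour shifts the odds; a real model would need far more data, feature scaling and evaluation on held-out visitors.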

Pomodoro technique as a collaboration tool

We recently started using the Pomodoro Technique in our development team. The Pomodoro Technique is a time-management method in which you work in 25-minute blocks separated by short breaks; each 25-minute block is called a pomodoro.

We have adapted it a little for our purposes. We work as a team in synchronised pomodoros and hold a mini-standup after each one. Each week we assign a pomodoro master who is responsible for managing the process: starting pomodoros, keeping time, counting completed pomodoros and so on.

Testing interactions with web services without integration tests in Ruby

Our team decided to move to a micro-service architecture, and we started wondering how we would test all of our integration points with lots of little services without having to rely on integration tests. We felt that testing the interactions between these services could quickly become a major headache.

Integration tests are typically slow and brittle, and they require each component to have its own environment in which to run. With a micro-service architecture, this problem only multiplies. They also have to be ‘all-knowing’ about every service they touch, which makes them hard to keep from breaking.

After seeing J. B. Rainsberger's talk “Integrated Tests Are A Scam”, we have been thinking about how to get the confidence we need to deploy our software to production without a tiresome integration test suite that doesn't give us all the coverage we think it does.

Automated Schema Migration in a MySQL cluster

The PSeeker Database

REA stores listing and agency data for Australia in a MySQL database named PSeeker. This large, complex database plays a central role in REA’s business:

  • About 95 tables in use
  • Largest table has 38 million rows
  • 24 tables have over 1 million rows
  • Near 100% uptime required

In production, PSeeker runs as a loose cluster of about 10 MySQL database servers that play a variety of roles:

  • A single active, writeable master instance that runs in our primary data center
  • Several replica slaves in the primary data center used for read-only application load
  • A replica in our secondary data center for disaster recovery
  • A replica used for investigation by support staff
  • A replica that feeds into our data warehouse

We use MySQL statement-based replication between the master and its replicas. This essentially runs the same statements on the replicas as have been run on the master.

Typically, replicas in the same data center as the master lag behind it by less than a second. More distant replicas can be up to a minute behind, depending on the rate of updates and the bandwidth between the servers.
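To make statement-based replication and replica lag concrete, here is a toy sketch in Python that uses the standard-library `sqlite3` module as a stand-in for MySQL. It is not how MySQL implements replication: the `Primary` and `Replica` classes and the in-memory `binlog` list are hypothetical simplifications showing a replica replaying the same statements, in the same order, at its own pace.

```python
import sqlite3
from typing import List, Optional


class Primary:
    """Toy 'master': executes writes and appends each statement to a log."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.binlog: List[str] = []  # statements in commit order

    def execute(self, statement: str) -> None:
        self.conn.execute(statement)
        self.conn.commit()
        self.binlog.append(statement)


class Replica:
    """Toy replica: replays logged statements it has not yet applied."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.position = 0  # index of the next log entry to apply

    def catch_up(self, binlog: List[str], max_statements: Optional[int] = None) -> None:
        pending = binlog[self.position:]
        if max_statements is not None:
            pending = pending[:max_statements]  # simulate a slow link
        for statement in pending:
            self.conn.execute(statement)  # same statement the primary ran
            self.position += 1
        self.conn.commit()

    def lag(self, binlog: List[str]) -> int:
        """How many statements this replica is behind the primary."""
        return len(binlog) - self.position


primary = Primary()
replica = Replica()
primary.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, price INTEGER)")
primary.execute("INSERT INTO listing (id, price) VALUES (1, 500000)")
primary.execute("UPDATE listing SET price = price + 10000 WHERE id = 1")

replica.catch_up(primary.binlog, max_statements=2)
print(replica.lag(primary.binlog))  # 1
replica.catch_up(primary.binlog)
print(replica.conn.execute("SELECT price FROM listing WHERE id = 1").fetchone())  # (510000,)
```

Because replicas re-run statements rather than copy changed rows, statements with non-deterministic results (for example, ones calling `UUID()`) can diverge between servers; MySQL's documentation treats such statements as unsafe for statement-based logging.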

Manual Schema Management

Schema changes, such as adding columns or new tables, occur as new products are built or legacy systems are upgraded. They originate with the development teams involved in building or upgrading applications, at an average rate of 5-8 per month.
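The excerpt stops before describing how this process was automated. As a hedged illustration of one widely used pattern (not necessarily REA's), the Python sketch below records each applied change in a hypothetical `schema_migrations` table, so that a runner can apply only the pending migrations, in version order, to any given database. It uses `sqlite3` so it runs without a MySQL server; the migration contents and function names are made up.

```python
import sqlite3
from typing import List, Set, Tuple

# Hypothetical migrations as (version, SQL) pairs, written by development teams.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE agency ADD COLUMN phone TEXT"),
]


def applied_versions(conn: sqlite3.Connection) -> Set[int]:
    """Versions already recorded as applied (creates the tracking table if needed)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    return {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}


def migrate(conn: sqlite3.Connection, migrations: List[Tuple[int, str]]) -> List[int]:
    """Apply pending migrations in version order; return the versions applied."""
    done = applied_versions(conn)
    newly_applied: List[int] = []
    for version, sql in sorted(migrations):
        if version in done:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn, MIGRATIONS))  # [1, 2]
print(migrate(conn, MIGRATIONS))  # [] because nothing is pending
```

One difference worth noting: MySQL commits DDL statements such as `ALTER TABLE` implicitly, so on MySQL the schema change and its version record cannot share a single transaction.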

Our journey from Ruby 1.8.7 to 1.9.3

REA recently moved our listing administration tool, a large Rails application used by real estate agents to manage their listings, from Ruby 1.8.7 to Ruby 1.9.3. The endeavour was ultimately successful, but not without significant challenges.

The most notable of these was a nasty segmentation fault. At one stage it caused us so much pain that we felt we had to share our discoveries, in the hope of easing the pain for someone else.

Background

Like all projects, the upgrade came with inherent challenges and restrictions. We had to fit it in around major project priorities. We also had to make sure that one of our shared libraries, which describes common domain objects for several other internal Rails applications, remained compatible with 1.8.7. In effect, the build pipeline needed to create artifacts for both 1.8.7 and 1.9.3, and the source had to be compatible with both.

Git as a hiring tool

The greatest resource of any company is the people who work for it, so the process by which you hire the people who work for or with you is extremely important. At REA Tech we have always been keen on trialling new ways of hiring. For years we have experimented with tools such as codility.com, sample algorithm questions, and even one-hour pairing sessions in which candidates work on a business feature with tech leads and other software developers. In the Media and Developer team, we decided to try using Git to screen the initial set of candidates before bringing them in for a face-to-face interview. Git is a wonderful version control system created by Linus Torvalds, and one of its key features is the history of changes it keeps. That history is the feature we chose to use as a candidate sieve.

The purpose of the initial test is to confirm that the applicant is indeed a skilled software developer. Previously we used Codility for that, but in my opinion the information it conveys is limited. You see only a very narrow window of the person's technical expertise, and, what's worse, you limit them in terms of time. You want candidates to perform to the best of their abilities, and for that reason you need to give them some leeway, both in how much time they have to do the test and in the format of the question. With that in mind, we decided to restructure our interview test.
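The excerpt doesn't spell out how reviewers read a candidate's history. As a hypothetical sketch, a reviewer could export a submission's log with `git log --pretty=format:%h%x09%an%x09%s` (abbreviated hash, author name and subject, separated by tabs) and step through the commits oldest first to see how the solution evolved. The parser, the summary fields and the sample log below are illustrative assumptions, not our actual screening process.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commit:
    sha: str
    author: str
    subject: str


def parse_git_log(output: str) -> List[Commit]:
    """Parse tab-separated `git log` output (hash, author, subject), oldest first."""
    commits: List[Commit] = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sha, author, subject = line.split("\t", 2)
        commits.append(Commit(sha, author, subject))
    commits.reverse()  # git log lists newest commits first by default
    return commits


def summarise(commits: List[Commit]) -> dict:
    """Hypothetical at-a-glance signals before reading each commit in detail."""
    first: Optional[str] = commits[0].subject if commits else None
    last: Optional[str] = commits[-1].subject if commits else None
    return {
        "commits": len(commits),
        "authors": sorted({c.author for c in commits}),
        "first": first,
        "last": last,
    }


# Made-up log for a fictional candidate, newest commit first as git prints it.
SAMPLE_LOG = (
    "c3d4e5f\tAlex Candidate\tRefactor price parser\n"
    "b2c3d4e\tAlex Candidate\tMake price parser test pass\n"
    "a1b2c3d\tAlex Candidate\tAdd failing test for price parser\n"
)

print(summarise(parse_git_log(SAMPLE_LOG)))
```

Reading the commits in order, rather than only the final code, is what lets a reviewer see whether a candidate works in small, well-described steps.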
