Over the last year at realestate.com.au (REA), I worked on two integration projects that involved synchronising data between large, third-party applications. We implemented the synchronisation functionality using microservices. Our team, along with many others at REA, chose a microservice architecture to avoid the problems associated with the “tightly coupled monolith” anti-pattern, and to build services that are easy to maintain, reuse and even rewrite.
Our design used microservices in 3 different roles:
- Stable interfaces – in front of each application we put a service that exposed a RESTful API for the underlying domain objects. This minimised the amount of coupling between the internals of the application and the other services.
- Event feeds – each “change” to the domain objects that we cared about within the third-party applications was exposed by an event feed service.
- Synchronisers – the “sync” services ran at regular intervals, reading from the event feeds, then using the stable interfaces to make the appropriate data updates.
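To make the three roles concrete, here is a minimal sketch of how they might fit together. It is not our actual code: the class names (`EventFeed`, `StableInterface`, `Synchroniser`), the event fields and the "listing" data are all hypothetical, and in-memory objects stand in for what were really separate HTTP services.

```ruby
# Hypothetical event shape: an increasing id, the resource it concerns,
# and the changes to apply.
Event = Struct.new(:id, :resource_id, :changes)

# Stand-in for an event feed service: returns events newer than a given id.
class EventFeed
  def initialize(events)
    @events = events
  end

  def events_since(last_id)
    @events.select { |event| event.id > last_id }
  end
end

# Stand-in for a stable interface in front of the target application.
class StableInterface
  attr_reader :records

  def initialize
    @records = {}
  end

  def update(resource_id, changes)
    @records[resource_id] = (@records[resource_id] || {}).merge(changes)
  end
end

# One sync run: read new events, apply them, remember the last one processed.
class Synchroniser
  attr_reader :last_processed_id

  def initialize(feed, target)
    @feed = feed
    @target = target
    @last_processed_id = 0
  end

  def run
    @feed.events_since(@last_processed_id).each do |event|
      @target.update(event.resource_id, event.changes)
      @last_processed_id = event.id
    end
  end
end

feed = EventFeed.new([
  Event.new(1, "listing-1", { status: "active" }),
  Event.new(2, "listing-1", { price: 500_000 })
])
target = StableInterface.new
sync = Synchroniser.new(feed, target)
sync.run
sync.run # a second run finds no new events and changes nothing
```

The point of the sketch is the direction of the dependencies: the synchroniser knows about the feed and the interface, but neither application knows about the other.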
Things that worked well
- Using a template project to get started
At REA we maintain a template project called “Stencil” which has the bare bones of a microservice, with optional features such as a database connection or a triggered task. It is immediately deployable, so a simple new service can be created and deployed within a few hours.
- Making our services resilient
We started “lean” with synchronous tasks that were triggered by hitting an endpoint on the service. One of the downsides of splitting code out into separate services is an increased likelihood of errors due to network gremlins, timeouts and third-party systems going down. Failure is always an option. In our synchronous, single-try world, the number of unnecessary alerts that required manual intervention just to kick off the process again was a drain on our time. So, we changed all our services to use background jobs with retries, and revelled in the relative calm.
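The retry part can be sketched in a few lines of plain Ruby. This is an illustration only: the helper name `with_retries`, its parameters and the backoff schedule are assumptions, and the background job machinery itself is left out.

```ruby
# Runs the block, retrying on any StandardError up to max_attempts times,
# sleeping a little longer between each attempt (exponential backoff).
def with_retries(max_attempts: 3, base_delay: 0.01)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    # Give up and re-raise the original error once we run out of attempts.
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Simulate a dependency that times out twice and then recovers.
calls = 0
result = with_retries(max_attempts: 3) do |attempt|
  calls += 1
  raise IOError, "simulated timeout" if attempt < 3
  "synced"
end
```

Only when every attempt fails does the error surface, which is the point at which an alert is actually worth a human's attention.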
- Making calls idempotent
Given that we had built in retries, each part of our retry-able code needed to be idempotent so that a retry would not corrupt our data. Using PUT (and PATCH, when the patch sets absolute values rather than relative ones) is great for this, but sometimes we had to do a GET and make a check before making the next request.
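The “GET and check first” approach might look something like the sketch below. The `NotesApi` class is a hypothetical in-memory stand-in for a third-party API that only offers a non-idempotent create operation, and tagging each record with a `source_event_id` is an assumption made for the example.

```ruby
# Hypothetical stand-in for an API with a GET-like list and a POST-like create.
class NotesApi
  attr_reader :notes

  def initialize
    @notes = []
  end

  # GET-like: all notes for a contact.
  def list(contact_id)
    @notes.select { |note| note[:contact_id] == contact_id }
  end

  # POST-like: calling this twice creates two notes.
  def create(contact_id, event_id, text)
    @notes << { contact_id: contact_id, source_event_id: event_id, text: text }
  end
end

# Only create the note if this event has not already produced one,
# so that a retried job does not create a duplicate.
def add_note_once(api, contact_id, event_id, text)
  already_there = api.list(contact_id).any? { |note| note[:source_event_id] == event_id }
  api.create(contact_id, event_id, text) unless already_there
end

api = NotesApi.new
add_note_once(api, "contact-1", 42, "Called back")
add_note_once(api, "contact-1", 42, "Called back") # retry: no duplicate
```

A check followed by a write is not atomic, so this protects against sequential retries rather than two jobs running at the same moment.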
- Using consumer driven contract testing
Testing data flows involving four microservices, two third-party applications and triggered jobs using traditional integration tests would have been a nightmare. We used Pact, an open source “consumer driven contracts” gem developed by one of REA’s own teams, to test the interactions between our services. This gave us the confidence to deploy to production knowing our services would talk to each other correctly, without the overhead of integration test maintenance.
- Where possible, exposing meaningful business events, not raw data changes
It took a while to really grasp the meaning of this, but once we “got” it, it made sense. It is probably easiest to explain by example. One domain object had a “probability” percentage field that could be changed directly by a user. Instead of exposing “probability field changed” as an event, we exposed “escalations”. This meant we were hiding the actual implementation of how the “rise in probability” was executed in the system, and not asking every other system that inspected the event feed to re-implement the logic of “the new value of probability is greater than the old value, therefore the likelihood has increased”.
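As a rough sketch, the translation from raw change to business event lives in one place inside the event feed service. The method name and the event fields here are hypothetical; only the “escalation” idea comes from our system.

```ruby
# Turns a raw change to the probability field into a business event.
# Returns nil when the change is not an escalation, so nothing is published.
def business_event_for(old_probability, new_probability)
  return nil unless new_probability > old_probability

  { type: "escalation", from: old_probability, to: new_probability }
end

escalation = business_event_for(40, 75)
no_event = business_event_for(75, 40)
```

Consumers of the feed only ever see the escalation event, so if the rule for what counts as an escalation changes, it changes in one service rather than in every consumer.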
- Automating all the things
We are lucky enough to be able to use AWS for all our development, test and production environments. We used continuous deployment to our development environment, and we had a script to deploy the entire suite of microservices to the test environment at one click. This made the workflow painless, and helped counteract the overhead of having so many codebases.
- Using HAL and the HAL Browser
HAL is a lightweight JSON (and XML if you are so inclined) standard for exposing and navigating links between HTTP resources. The “Stencil” app comes with Mike Kelly’s HAL browser already included (this is just an HTML page that lets you navigate through the HAL responses like a web browser). As well as the resources for the business functionality, we created simple endpoints that exposed debugging and diagnostic data such as “status of connection to dependencies” or “last processed event” and included links in the index resource so that finding information about the state of the service was trivially easy, even for someone who didn’t have much prior knowledge of the application.
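For illustration, an index resource of this kind could be built as below. The link relation names and paths are made up for the example; the `_links`, `self` and `href` keys are the parts that come from HAL itself, and a response like this would be served as `application/hal+json`.

```ruby
require "json"

# Builds a HAL index resource that links to the diagnostic endpoints.
# Relation names and hrefs are hypothetical.
def index_resource
  {
    "_links" => {
      "self" => { "href" => "/" },
      "dependencies-status" => {
        "href" => "/diagnostic/dependencies",
        "title" => "Status of connection to dependencies"
      },
      "last-processed-event" => {
        "href" => "/diagnostic/last-processed-event",
        "title" => "Last processed event"
      }
    }
  }
end

puts JSON.pretty_generate(index_resource)
```

Because every diagnostic endpoint is reachable from the index, someone opening the HAL browser at the root of the service can click through to all of them without knowing any URLs in advance.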
Things that didn’t work well
- Coming up with a way to easily share code between projects
Our first microservices implementation used the strict rule of “one service has one endpoint”. This made services that could be easily deployed separately without affecting or holding up other development work. However, it also increased the maintenance overhead, as each new service was made from a copy of the previous one and then modified to suit. When a problem was found in the design of one of them, or we wanted to add a new feature, we had to go and change the same code (with just enough variations to be annoying) in each of the other projects. The common code was more structural than business logic (e.g. Rakefile, config.ru, configuration, logging), and it was not written in a way that made it easy to put in a gem for sharing.
Things we have questions about
- What is the “right size” for a microservice?
Soon after having completed the first microservices integration project, we had an opportunity to do a second. This time, instead of making many different “event feed” services that each exposed a single type of event, we made one event service that had an endpoint for each different type of event. Some might argue that we were stretching the definition of “microservice”; however, there was still a tight cohesion between the endpoints, as they were all exposing events for objects in the same underlying aggregate root. For us, the payoff of having fewer codebases and less code to maintain made the trade-off worth it, as the turnaround for exposing a new type of event was a matter of hours, instead of a matter of days.
I suspect that the “right size” is going to vary between projects, languages, companies and developers. I’m actually glad we made our services what I now consider to be “too small” in the first project, just as an experiment to work out where the line was for us. I now think of the “micro” as pertaining more to the “purpose” of a service than the size. Perhaps “single purpose service” would be a better term – but it just ain’t as catchy!