How We Optimise Apache Spark Jobs

Here in Consumer Insights we have been running Big Data processing jobs on Apache Spark for more than two years. Spark powers our daily batch jobs, which extract insights from the behavior of the tens of millions of users who visit our sites. This post covers how we use Spark and shares some insights, drawn from our experience, for optimizing Spark applications.

Game of Lambdas

Recently we launched a recommendation engine built using AWS Serverless technology. The journey of implementing this solution turned out to be an interesting one on a number of levels. Now that it is running in production, we thought it would be a good idea to share some of the lessons we learned.

Bucket of Data

Essentially, the system transforms a very large dataset into smaller ones, from which we create the audiences or data segments used for hyper-targeted EDMs.

To get from the initial state to the final state, the data is transformed over several stages using 8 Lambdas.

Static assets in an eventually consistent webapp deployment

The Problem

Deploying a high traffic website with zero downtime is a challenge – there’s a natural tradeoff between:

  • Performance and cacheability.
  • Getting updated versions of the application live.

The approach you use to manage your static assets plays a big role in this.

This post explains how we dealt with these challenges in our move from the data centre to a multi-region, highly available, cloud-based architecture.

A Journey into Extensible Effects in Scala

This article is an introduction to using the Scala Eff library, which is an implementation of Extensible Effects. This library is now under the Typelevel umbrella, which means it integrates well with popular functional programming libraries in Scala like Cats and Monix. I will not touch on the theoretical side of the concept in this post. Instead, I will use code snippets to show how you would introduce the library to an existing Scala code base, which should improve the extensibility and maintainability of the code. As part of this, I will demonstrate how to build a purely functional program in Scala using concepts such as Either, Writer and Reader.


Lean QA (aka QA Ops)

Developers have – with the advent of DevOps – been working more and more in Operations and Infrastructure. Testers, however, have not.

Thus far, testers have been mostly or wholly assigned to application testing work. As SOFTWARE testers, we have only worked on software, and then mostly only on application software.

I pose the questions: What about infrastructure as code? Should that not be explicitly tested?

And: if Testers are meant to be testing the system, why then have they not explicitly been testing the whole system, infrastructure included?

I am going to make a case here for including QA in Operations and Infrastructure, by clarifying how I see QA fitting into the DevOps world.

AWS API Gateway, Lambda and Swagger

TL;DR

With this article you will be able to build

With

Building upon
