At REA we happily use a variety of programming languages. Teams are given the freedom to choose a fitting language for a given project; mostly this ends up being one of Ruby, Java, or Scala. However, there are some languages that we as developers and ops people get excited about, but whose viability as mainstream REA languages hasn't yet been established.
For me personally, Haskell is what I code on the weekends and I've been looking for a way to shoehorn it into my regular work 😉
Recently I learnt that REA does in fact have some Haskell in prod. Who owns it? What does it do? No one will ever know. However, as the story goes, it was automating a manual task, and as such it was simply a value-add that became a useful tool, avoiding questions like 'just what is a monad anyway!?'.
Jim Gaylard and I, Haskell acolytes, attempted something similar on hack day.
The REA customer experience team (CXP) have a process they do each morning, Monday to Saturday. It is a lovely manual process involving our core listings database, Excel macros, manual massaging of data, secure FTP, and goat sacrifices. Ultimately this allows the agents to get emails about their failed listing uploads. If the process is not successfully completed by 9:30am no emails go out for that day.
The Proposed Solution
- Scheduled via some mechanism.
- Connect to the database and run a query.
- Massage the data.
- Produce a CSV file.
- Email (via Amazon SES) the results to CXP.
- FTP the results to a third party.
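Stitched together, the steps above amount to a small IO pipeline with one pure step in the middle. Here's a minimal, self-contained sketch; every name is illustrative, and the real integrations (the listings database, Amazon SES, the third-party FTP) are stubbed out with placeholders:

```haskell
-- Sketch of the end-to-end flow. All names here are illustrative;
-- the real job queries the listings database and uses amazonka for SES.
module Main where

import Data.List (intercalate)

-- A failed listing upload as it might come back from the database query.
data FailedUpload = FailedUpload
  { agentEmail :: String
  , listingId  :: String
  , reason     :: String
  }

-- Pure "massage the data" step: turn query results into CSV text.
toCsv :: [FailedUpload] -> String
toCsv rows = unlines (header : map row rows)
  where
    header = "agent_email,listing_id,reason"
    row (FailedUpload e l r) = intercalate "," [e, l, r]

-- Stubbed-out integrations; in the real job these hit the database,
-- Amazon SES, and the third party's FTP server.
runQuery :: IO [FailedUpload]
runQuery = pure [FailedUpload "agent@example.com" "123" "bad image"]

emailCsv :: String -> IO ()
emailCsv csv = putStrLn ("would email via SES:\n" ++ csv)

ftpCsv :: String -> IO ()
ftpCsv _ = putStrLn "would FTP to third party"

main :: IO ()
main = do
  rows <- runQuery
  let csv = toCsv rows
  emailCsv csv
  ftpCsv csv
```

Keeping the massage step pure means it can be tested without any of the surrounding plumbing.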
Challenges: Adapting To Our Patterns
The patterns in question were Docker, Shipper (an REA internal tool for deploying apps to AWS), and Buildkite (our CI tool of choice). Ultimately this wasn't too challenging. I had previously gotten a Docker solution for Haskell going using the official Docker images published for Haskell Stack. In prod we simply use a cut-down Ubuntu container with the compiled binary loaded in.
Shipper was trivial, thanks to it being agnostic about what lives inside your Docker container. Buildkite again was relatively uninteresting; the main workaround required was to precompile our dependencies and bake them into a base Docker image, because builds were taking ~60 minutes on the small agents.
Challenges: AWS Integration
The AWS integration proved the most challenging. We chose the amazonka library over the older AWS for Haskell. amazonka is the more ambitious solution, as it is largely auto-generated from the AWS service definitions, giving it comprehensive and up-to-date coverage of AWS services; AWS for Haskell is a more manual effort. We wanted to try out amazonka to see if it was the dream it portrayed.
Beyond a lot of type Tetris, writing the code for this wasn't too difficult. This included having to authenticate with AWS in multiple environments (host machine, development Docker, EC2). The one hitch was that on EC2 the `Discover` data constructor, which is supposed to authenticate automatically in any environment, wasn't working. It turns out it has issues resolving the metadata endpoint with certain VPC settings. Adding a hosts entry of `169.254.169.254 instance-data` in the container fixes this issue. :/
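For reference, `Discover` is the amazonka credentials strategy that walks the whole authentication chain at runtime. A minimal sketch of setting up an environment with it (the actual SES request is elided, and this assumes the pre-2.0 amazonka API):

```haskell
-- Assumes the amazonka package discussed above.
import Network.AWS (newEnv, Credentials (Discover))

main :: IO ()
main = do
  -- Discover tries environment variables, the ~/.aws config files,
  -- and finally an IAM instance profile via the EC2 metadata endpoint
  -- (169.254.169.254) -- the step that broke for us on EC2 until the
  -- hosts entry was added.
  env <- newEnv Discover
  -- ... runResourceT . runAWS env $ send <your SES request>
  pure ()
```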
Challenges: A Lot Of IO Logic
When you generate the standard stack template it creates an `App` and a `Lib` package. My general assumption has been that IO-type logic goes in `App` and `Lib` has your pure code. This works nicely for most apps, which tend to have a small amount of IO and a large amount of business logic.
In this case we had a large amount of IO and a small amount of business logic. Without thinking too hard about it, this led to a fat `Main` module with a convoluted main entry point.
The solution was to split `Main` out into many modules. Rather than make these pure, which forces part of their logic back into `Main`, we decided to just give everything a type of `IO`, allowing them to perform IO as necessary. The result was quite a clean separation of concerns and a simple main entry point.
See the before and after.
For the pure code it was easy to write tests. We used a mix of BDD and property-based testing. This was all well and good; however, ultimately we didn't come up with a good solution for testing the bulk of the IO-heavy code, either with unit or integration tests.
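As an example of the property-based style used on the pure code, here's a round-trip property over a naive CSV splitter. The functions are illustrative, and the check is hand-rolled over sample inputs to stay self-contained; the real tests used a proper property-testing library to generate cases:

```haskell
import Data.List (intercalate)

-- Naive CSV field splitter for the property below (no quoting support).
splitComma :: String -> [String]
splitComma s = case break (== ',') s of
  (field, [])     -> [field]
  (field, _:rest) -> field : splitComma rest

-- Round-trip property: joining fields with commas and splitting again
-- recovers the original fields, provided no field contains a comma.
prop_roundTrip :: [String] -> Bool
prop_roundTrip fields =
  null fields || splitComma (intercalate "," fields) == fields

main :: IO ()
main = print (all prop_roundTrip samples)
  where
    samples =
      [ ["agent@example.com", "123", "bad image"]
      , ["one"]
      , ["", "two", ""]
      ]
```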
I think the solution for this is to use a higher-level abstraction for the IO-related code: either monad transformers or some variant of free monads. I think Free would have been especially good here, as it seems suited to IO-heavy code.
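To sketch what the Free approach means: the job becomes a value built out of instruction data, and an interpreter decides whether those instructions hit real IO or a pure test harness. Free is hand-rolled here to stay self-contained (the free package provides it), and the operation names are illustrative:

```haskell
-- A minimal hand-rolled free monad.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- Instructions for the job (illustrative subset).
data JobF next
  = RunQuery ([String] -> next)
  | SendEmail String next

instance Functor JobF where
  fmap g (RunQuery k)       = RunQuery (g . k)
  fmap g (SendEmail s next) = SendEmail s (g next)

runQuery :: Free JobF [String]
runQuery = Free (RunQuery Pure)

sendEmail :: String -> Free JobF ()
sendEmail s = Free (SendEmail s (Pure ()))

-- The job as pure data: no IO anywhere.
job :: Free JobF ()
job = do
  rows <- runQuery
  sendEmail (unlines rows)

-- Pure test interpreter: feeds canned rows, records the emails "sent".
interpretPure :: [String] -> Free JobF a -> (a, [String])
interpretPure _    (Pure a)                  = (a, [])
interpretPure rows (Free (RunQuery k))       = interpretPure rows (k rows)
interpretPure rows (Free (SendEmail s next)) =
  let (a, sent) = interpretPure rows next in (a, s : sent)
```

A second interpreter into `IO` would run the same `job` against the real database and SES, which is exactly what makes the IO-heavy flow unit-testable.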
Having said that, it probably wouldn't have been worth going down this path. Given how IO-heavy the app is, I suspect better bang for buck would be spending the time making full integration tests possible, with a dev database and AWS SES Docker containers. That tests all the code and a lot of the infrastructure in one go.
Challenges: Scheduling
We considered the Haskell cron library, but that involves running an instance all day for a five-second job.
In the end we went with auto scaling group abuse: the instances spin up at 9am every morning, a schedule which needs updating whenever daylight saving changes. I think this is less a Haskell problem than a generally unsolved problem at REA; we are working on a sustainable solution for orchestrating scheduled jobs.
Was coding in Haskell like getting a back massage by a space unicorn?
Well... sort of.
The solution, as mentioned, required a lot of integration code that would have been trivial in a more popular language. Added to that, this was the first real thing Jim or I had done in Haskell, so there was a learning curve to push through. Plus there were lots of boring but still challenging Docker/Shipper/Buildkite bits to get done as well.
However, it was still incredibly enjoyable using the language, which is just so bloody lovely. It's really nice learning more of it knowing there is so much power to it, and lots of juicy new ways of coding to be mastered. Personally I like being a bit in over my head, winging it but learning a lot as I go, which is what working in Haskell feels like.
As of this week the end to end process is fully automated. CXP are overjoyed and the number of fists through walls has decreased by 27%!