Using our everyday dev tools for effective Load and Performance testing

Previously at REA we had specialised tools for Load and Performance (L&P) testing that were expensive and richly featured, but completely disconnected from our everyday development tools. The main outcome was that we ended up with a couple of engineers who were quite good at L&P testing with our enterprise tools, while the majority of engineers found the barriers too great. We have since moved to an approach that is far more inclusive and utilises many of the tools our engineers are working with on a daily basis. I’ll talk about how we did this for the most recent project I worked on.

Developing an application simulation model

Our project was for a brand new application, so we didn’t have hard numbers we could use for simulating expected traffic, but we were able to look at similar public-facing apps and use these as our basis. Closely simulating actual production traffic is important for a number of reasons: it allows us to better tailor and tweak our application stack, gives us confidence it will hold up under peak loads, and means we don’t have to over-resource it. There are two main metrics we need to gather to create an application simulation model, and we can get both through our regular tools:

  1. Transaction rate (requests per minute). We need to work out the transaction rate for our different web pages during peak load. In the past I’d used New Relic for this, but at times found it problematic matching individual requests to the controllers shown there. Using Splunk proved far more fruitful, though other ways of analysing your access log files can work just as well. There is something very nice about dealing with the raw requests and being able to query them directly.
  2. Concurrency. It’s all well and good knowing how many individual transactions we need to simulate, but it’s also important to know how many concurrent users are required to generate that load. Matching the expected concurrent user levels means we accurately simulate things like open sessions on our servers and TCP ports on our network devices. We have end-user stats collected for us by Omniture, and using these we could establish our peak hourly unique visitors and average session duration. Using this simple equation we can work out our peak concurrency (see the worked example below): hourly unique visitors / (60 minutes / average session duration)
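
As a quick illustration of that equation, here it is in Ruby. The numbers are made up for the example rather than being our production figures:

# Hypothetical inputs -- substitute your own analytics figures.
peak_hourly_unique_visitors = 30_000 # unique visitors during the peak hour
average_session_duration    = 6.0    # average visit length in minutes

# peak concurrency = hourly unique visitors / (60 minutes / average session duration)
peak_concurrency = peak_hourly_unique_visitors / (60.0 / average_session_duration)

puts peak_concurrency # => 3000.0 concurrent users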

Writing a user-friendly L&P script

We used the DSL provided by the ruby-jmeter gem to capture a representative user flow rather succinctly. Throughput percentages were again calculated from Splunk data. It would be possible to script this up directly in JMeter itself, but having the script written in this DSL is beneficial for these reasons:

  1. It goes nicely into source control, unlike JMeter’s JMX (XML) files.
  2. I find it far cleaner and easier to understand than JMeter itself. The DSL presumes various common sense defaults and keeps you away from some of the more arcane elements of JMeter.
  3. I find the L&P knowledge easier to share in this format; snippets of script logic can be copied around and discussed really easily.
  4. Our developers are generally familiar with Ruby, but not necessarily with JMeter. Being in Ruby, the script can also do smart things outside the L&P test plan itself, such as setting up test data (see the sketch after this list).
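
For example, a few lines of plain Ruby ahead of the test plan could generate the agents.csv feed the script reads from. This is only an illustrative sketch: the agent IDs are made up, and in practice you might pull them from a fixtures file or an internal service.

require 'csv'

# Hypothetical agent ID pairs used to build the agents.csv consumed by
# csv_data_set_config in the load script below.
agent_pairs = [
  %w[101 202],
  %w[103 204],
  %w[105 206]
]

CSV.open('agents.csv', 'w') do |csv|
  agent_pairs.each { |primary, secondary| csv << [primary, secondary] }
end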

Here is a slightly cut down version of the script we used.

test do

  defaults domain: 'www.realestate.com.au',
    image_parser: false
  with_user_agent :ie9

  header [
    { name: 'Accept-Encoding', value: 'gzip,deflate,sdch' },
    { name: 'Accept', value: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' }
  ]

  # Feed agent IDs from agents.csv into the ${primary_agent} and ${secondary_agent} variables
  csv_data_set_config filename: 'agents.csv',
    variableNames: 'primary_agent,secondary_agent'

  # 1,000 threads ramped up over 600 seconds, looping until the test is stopped
  threads count: 1000, rampup: 600, scheduler: false, continue_forever: true do
    # randomised think time between iterations (values in milliseconds)
    random_timer 60_000, 40_000
    head name: 'Check Primary Agent Is Published', url: '/agent/${primary_agent}', protocol: 'https'

    Throughput percent: 24 do
      head name: 'Check Secondary Agent Is Published', url: '/agent/${secondary_agent}', protocol: 'https'
    end

    Throughput percent: 5 do
      get name: 'Get Agent Profile Page', url: '/agent/${primary_agent}', protocol: 'https' do
        assert substring: 'Share this agent', scope: 'main'
      end
    end
  end

end.flood
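
The trailing .flood call submits the generated plan to flood.io for execution (more on that in the next section). While iterating on the script it can be handy to inspect or run the plan locally first; if I recall the ruby-jmeter helpers correctly, a minimal local sketch looks something like this (the agent ID, file name and local JMeter path are illustrative assumptions):

require 'ruby-jmeter'

test do
  threads count: 1 do
    get name: 'Get Agent Profile Page', url: 'https://www.realestate.com.au/agent/12345'
  end
end.jmx(file: 'agent_profile_smoke.jmx') # write the plan out as a JMX file for inspection
# ...or swap .jmx for .run(path: '/usr/local/jmeter/bin/', gui: false) to run it through a local JMeter install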

Running the script and diagnostics

Generally we’ll generate load from nodes in the cloud, and we use a service from flood.io to make this easier for us. The service takes care of provisioning load generators (in our AWS account or theirs) and aggregating results, and it provides pretty handy reporting and statistics. These reports are great at letting us know if something is going wrong, but we’ll generally also use other tools such as New Relic and CloudWatch to monitor what’s happening on our servers at the same time. Here is an example of a failed L&P test:

Failed Test Run

Generally we’d expect the transaction rate to track the ramping concurrency, but in this case the servers couldn’t keep up and as a result the response time blew out. We were able to align these blips with garbage collection events identified using New Relic.

Ultimately we were able to establish that the default JVM thread count used by our Scala application was far too low and limited the load our servers should have been able to handle. By changing this and rerunning the test we were able to prove that our servers would handle the expected peak load fine. This was a far better solution than simply provisioning extra servers.

Reporting and Source Control

We try to keep everything, from the load script (and accompanying test data) to the test plan and reports, in source control. By writing the plan and reports in Markdown they can easily be tracked in git and presented on our GitHub appliance.

When it’s time for a fresh L&P testing session I’ll create a new directory (with a date or other meaningful string in the name), so the test plan, report and script can adapt over time. A typical layout is sketched below.
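
The names here are purely illustrative, but a session’s directory might end up looking something like this:

load-and-performance/
  2014-08-agent-profiles/
    plan.md                  # test plan written in Markdown
    report.md                # results and findings
    agent_profile_test.rb    # the ruby-jmeter script
    agents.csv               # accompanying test data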

About Andrew Midgley

I have worked at REA since 2006, primarily in the software testing space. I am currently working as the Software Testing Lead for the residential line of business. I try to spend time at all points in the deployment pipeline, looking for ways to improve overall quality and delivery agility. Over the years I've spent much time developing specialist testing skills, particularly in the L&P and automated testing areas. It's been a real pleasure seeing REA, and the wider IT community, progress towards more agile practices. Probably even more exciting has been the evolution of cloud technologies and how it's changing our world.
  • Ross Simpson

    Hey Midge,

    Great post! From the description of your results and the graph you shared, it sounds like your test may be susceptible to coordinated omission[1]. Do you have any thoughts on avoiding it?

    While it sounds like the test was effective in helping diagnose a problem with JVM tuning, the actual numbers that came out of it may be untrustworthy due to the effects of CO.

    Curious to hear your take on it.

    Cheers,
    Ross

    1: http://www.infoq.com/presentations/latency-pitfalls

    • Andrew Midgley

      Great link Ross, very revealing in terms of how important it is to come up with proper requirements and how regular tools/reporting can really hide what is going on.

      As Gil mentions, the most popular tools, such as JMeter, will suffer from this phenomenon and indeed so would our test runs. The one thing I will say is that our testing strategy is at least robust enough to pick up the most significant hiccups, and allow us to rectify them, even if the way it characterizes the blip is untrustworthy.

      The main technique we use that helps mitigate this risk is watching the results live as the test is running. A clear giveaway of a hiccup is going to be throughput dropping, which we’ll be able to see in flood.io or from within New Relic. We’d also be testing the target web application in our browsers to see what’s going on for the end user. But you are correct that a test run with such hiccups is going to give misleading percentiles: the bigger the hiccup, the more misleading. So I guess for me it’s about being vigilant in looking for when throughput drops off. To this point we haven’t been caught out when taking an app to production, but it’s something we definitely need to be aware of.

  • Rob Pocklington

    We’re using ‘ruby-jmeter’ at my current workplace for ad-hoc and CI-based performance / load tests. It’s a really nice DSL for JMeter. Combine this with New Relic and you get some really deep metrics and knowledge of your system behaviour.

    Would love to hear if you guys have any best practices around scaling rails apps.