To Kill a Mockingtest

Please don’t use mocks or stubs in tests.  While they are seemingly ubiquitous in enterprise development, they have serious drawbacks, and typically mask easily fixable deficiencies in the underlying code.

Even their most ardent defenders concede that mocks and stubs have flaws. These include:

Dependence on fragile implementation details

Mocks and stubs require intimate knowledge of how code interacts with other modules.  Even if the implementation is correctly refactored without altering public contracts, these tests will tend to break, and draw your attention away from more productive tasks.
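To make this concrete, here is a minimal sketch. It assumes a hypothetical `Adder` collaborator and uses a hand-rolled recording double rather than any particular mocking library. Two implementations of a hypothetical `total` function return the same result. An interaction-style check that counts calls to `add` passes for one and fails for the other, while a check on the output passes for both.

```scala
// Hypothetical collaborator, used only for this illustration.
trait Adder {
  def add(a: Int, b: Int): Int
}

object RealAdder extends Adder {
  def add(a: Int, b: Int): Int = a + b
}

// A hand-rolled recording double: it adds correctly but also counts calls,
// which is roughly what a mocking library's call verification inspects.
class CountingAdder extends Adder {
  var calls = 0
  def add(a: Int, b: Int): Int = { calls += 1; a + b }
}

// Version 1: folds over the prices, calling add once per price.
def totalV1(adder: Adder, prices: Seq[Int]): Int =
  prices.foldLeft(0)((acc, p) => adder.add(acc, p))

// Version 2: a behaviour-preserving refactoring that no longer calls add.
def totalV2(adder: Adder, prices: Seq[Int]): Int =
  prices.sum

val prices = Seq(100, 250, 5)

// Interaction-style check: "add was called once per price".
def calledOncePerPrice(total: (Adder, Seq[Int]) => Int): Boolean = {
  val spy = new CountingAdder
  total(spy, prices)
  spy.calls == prices.size
}
```

Both versions return 355 for these prices, so an output-based test passes before and after the refactoring. The call-counting check passes for `totalV1` and fails for `totalV2`, even though nothing a caller can observe has changed.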

Testing incidental properties with no bearing on correctness

What is the point of this code?  This is an essential question to ask if we want to understand it.  Tests have a story to tell here, and mocks invariably tell the wrong one.  Is the point of makeCoffee() that we made a coffee, or that we opened the fridge to get the milk?  When we payShopkeeper(), do we care that we completed a transaction, or that we rummaged through our wallet for change?  When mocking tests fail, the poor maintainer is left to reconstruct the real intent from a trail of indirect clues and anecdotes.

Web of lies

It is good practice to write data structures that are correct-by-construction; any constructor or sequence of method calls is guaranteed to leave the data in a meaningful state.  Stubs introduce test-only fictions that are stripped of any of the safety latches and guarantees that may have been built in; they introduce fresh sources of error that are not present in the codebase.  There is no value in detecting any failure that arises in such a way.

As time goes on, lies beget more lies.  It is not unusual for a stubbed input in one place to result in another here, and another there; the fiction leaks and spreads into some kind of evil facsimile of the original code, but with more bulk, complexity and defects.
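As a small illustration of a correct-by-construction type, here is a sketch built around a hypothetical `Percentage` that must lie between 0 and 100. The only way to obtain one is through a validating factory, so every instance the codebase ever sees is meaningful. A stub that simply returns, say, 150 from `value` would skip that check entirely and hand `applyDiscount` (also hypothetical) a state the real code can never produce.

```scala
// Hypothetical example type: the constructor is private, so the only way
// to build a Percentage is through Percentage.from, which validates its input.
final class Percentage private (val value: Int) {
  override def toString: String = s"$value%"
}

object Percentage {
  // Returns None for anything outside 0..100, so no invalid instance can exist.
  def from(n: Int): Option[Percentage] =
    if (n >= 0 && n <= 100) Some(new Percentage(n)) else None
}

// Code that consumes a Percentage can rely on the invariant without re-checking it.
def applyDiscount(priceCents: Int, discount: Percentage): Int =
  priceCents - priceCents * discount.value / 100
```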

Common Scenarios

Sometimes mocking and stubbing appears to be the most appropriate way to test a class, given the buttons and levers it affords.  Usually though, this tells us that there is a better way to write it, with better separation of concerns, modularity and reusability.  Let’s look at some common examples:

Exhibit A: Fat dependencies

//-------- Code -----------
case class Config(numBees: Int, numSharks: Int, /* 50 other things */)

class VillainHideout(config: Config) {
  private val bees: Seq[Bee] = generateBees(config.numBees)
  private val sharks: Seq[Shark] = generateSharks(config.numSharks)

  def unleashTheBeesAndSharks(): BeesAndSharks = ??? // details elided
}

//-------- Test -----------
def testVillainHideout = {
  // Stub only the two fields VillainHideout reads; the other 50 are left as mock defaults
  val config = mock[Config]
  when(config.numBees).thenReturn(33)
  when(config.numSharks).thenReturn(44)

  val hideout = new VillainHideout(config)
  hideout.unleashTheBeesAndSharks() should equal (expectedResult)
}

Here we have a class, VillainHideout, which depends on Config, a sprawling data structure of 52 fields.  We only care about two: the number of bees and sharks to release.  Because Config is so large, it is stubbed; it is too hard to construct otherwise.  Apart from the ugliness of the sock-puppetry, we can already see some avoidable problems:

  • VillainHideout is less useful and reusable than it might be, because it knows too much about things that are of no use to it.
  • We have introduced a new source of error: how can we know what an acceptable state for Config is?  By re-implementing it piecemeal, we are undermining efforts that it might have taken to establish guarantees, and arrogating construction knowledge that has no place in a foreign test.

There are several ways to address this.  We can ask of Config: “Who could possibly want to know all of the information you hold?”  If the answer is “nobody”, Config could be broken into several smaller structures of more specific interest: VillainConfig, HeroConfig, DamselInDistressConfig and the like.
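One way such a split might look, as a minimal sketch: the names `VillainConfig`, `HeroConfig` and `AppConfig` are hypothetical, and the hideout's elided behaviour is replaced by a stand-in `headcount` method so that the block compiles on its own. The hideout now depends only on the slice of configuration it uses, so a test can build a real `VillainConfig` in one line instead of stubbing 52 fields.

```scala
// Smaller, purpose-specific configuration types (hypothetical names).
final case class VillainConfig(numBees: Int, numSharks: Int)
final case class HeroConfig(numCapes: Int)

// The application-wide config becomes a composition of the smaller ones.
final case class AppConfig(villain: VillainConfig, hero: HeroConfig)

// VillainHideout asks only for the configuration it actually needs.
class VillainHideout(config: VillainConfig) {
  // Stand-in for the real (elided) behaviour.
  def headcount: Int = config.numBees + config.numSharks
}

// App wiring picks out the relevant slice.
val appConfig = AppConfig(VillainConfig(numBees = 33, numSharks = 44), HeroConfig(numCapes = 1))
val hideout = new VillainHideout(appConfig.villain)
```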

We can also ask of VillainHideout: “What do you need to do your job?  Do you really care where it comes from?” The answer to the first question is simply the number of bees and sharks; the answer to the second is probably “no”.  None of the interesting functionality in the class that we might want to test would depend on the specific source of the configuration items. The top level of the application might care, but that is a matter of wiring, rather than the nefarious misdeeds in VillainHideout that we care about.

Here is one way to improve it:

//-------- App wiring somewhere else -----------
val theAppConfig: Config = readFromFile(appConfigFile)
val theVillainHideout = new VillainHideout(theAppConfig.numBees, theAppConfig.numSharks)

//-------- Code -----------
class VillainHideout(numBees: Int, numSharks: Int) {
  private val bees: Seq[Bee] = generateBees(numBees)
  private val sharks: Seq[Shark] = generateSharks(numSharks)

  def unleashTheBeesAndSharks(): BeesAndSharks = ??? // details elided
}

//-------- Test -----------
def testVillainHideout = {
  // Plain values in, no stubs required
  val hideout = new VillainHideout(33, 44)
  hideout.unleashTheBeesAndSharks() should equal (expectedResult)
}

The code is cleaner, simpler, has fewer dependencies and is more reusable.  The separate concern of application wiring has been moved elsewhere; the test is shorter, clearer, and has no mocks or stubs.  It’s all win so far.

Exhibit B: Mutable domain

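//-------- Code -----------
import scala.collection.mutable

class CreditCard(initialCents: Int) {
  private var cents: Int = initialCents

  def amount() = cents
  def overdrawn(): Boolean = cents < 0

  def charge(amt: Int): Unit =
    cents -= amt
}

case class Item(name: String, price: Int)

class ShoppingBasket {
  // ArrayBuffer supports in-place += and -=; mutable.Seq does not
  private val items = mutable.ArrayBuffer[Item]()

  def addItem(item: Item): Unit =
    items += item

  def removeItem(item: Item): Unit =
    items -= item

  def price() = items.map(_.price).sum
}

class Customer(basketFactory: () => ShoppingBasket) {
  private val basket = basketFactory()
  private var paid = false

  def hasPaid() = paid

  def addItem(item: Item): Unit =
    basket.addItem(item)

  def removeItem(item: Item): Unit =
    basket.removeItem(item)

  def pay(card: CreditCard): Unit = {
    if (!paid && !card.overdrawn()) {
      card.charge(basket.price())
      paid = true
    }
  }
}

//-------- Test -----------
class CustomerTest {
  def testCustomerPayment = {
    // Stub the basket's input...
    val basket = mock[ShoppingBasket]
    when(basket.price()).thenReturn(555)

    // ...and spy on the card's output
    val card = mock[CreditCard]
    when(card.overdrawn()).thenReturn(false)

    val customer = new Customer(() => basket)
    customer.pay(card)

    verify(card).charge(555)
    customer.hasPaid() should be (true)
  }
}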

This is a typical OO domain model; we have classes that are metaphors for real-world objects, that change in-place. Obeying the “Tell, Don’t Ask” principle, there are a handful of actions that drive the behaviour, with most state hidden.  A Customer holds a ShoppingBasket, can add Items to it, and can pay() for it with a CreditCard.

In our test, we (correctly) assess that we cannot locally reason about Customer while it talks to mutable collaborators, so we stub them out, providing fixed input from the basket, and detecting the charge() action on the card.  In order to stub the ShoppingBasket, we can’t allow Customer to create its own, so we pass in a factory for that extra layer of indirection.

I’m reminded of a quote I read recently about testing:

“(Testing is) to create a tiny universe where the software exists to do one thing and do it well”.

The obvious insight is damning: why wouldn’t you write the software like this in the first place?

Mutable state is far more complex than the alternative.  An immutable data structure is simply a value; like an integer, or a point. A mutable data structure, on the other hand, holds many values over time; it intrinsically represents an identity that strings together this series of facts. It is far harder to reason about; we are irretrievably entangled with the passage of time, and we cannot use equational reasoning or substitute calculations with their results. Encapsulation cannot save us; these properties are transitive, and will virally leak into anything that interacts with the mutable structure.
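A small sketch of the difference, using two hypothetical counter types. With the immutable version, an expression always equals its result and can be substituted freely. With the mutable version, the same expression evaluated twice gives two different answers, so the result depends on when we ask.

```scala
// Immutable: a Counter is simply a value.
final case class Counter(n: Int) {
  def incremented: Counter = Counter(n + 1)
}

// Mutable: a MutableCounter is an identity whose state changes over time.
final class MutableCounter(private var n: Int) {
  def increment(): Int = { n += 1; n }
}

val c = Counter(0)
// The same expression always gives the same answer, so c.incremented
// can be replaced by Counter(1) wherever it appears.
val sameEachTime = c.incremented == c.incremented

val m = new MutableCounter(0)
// The "same" call gives a different answer each time: it depends on history.
val first = m.increment()
val second = m.increment()
```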

We should also be suspicious of appeals to familiarity, and especially appeals of similarity to the “real world”.  Familiarity is no friend of simplicity. The world we experience is shackled to the arrow of passing time, is limited by what can squeeze into three dimensions, and is built from unreliable or expensive materials.  Software can effortlessly cast these aside; we can do better.

The interesting behaviour in this example can easily be represented by pure functions and immutable values:

//-------- Code -----------
case class CreditCard(amount: Int) {
  def charge(amt: Int) = CreditCard(amount - amt)
  def overdrawn: Boolean = amount < 0
}

case class Item(name: String, price: Int)

case class ShoppingBasket(items: Seq[Item]) {
  def addItem(item: Item) = ShoppingBasket(items :+ item)
  // diff removes one occurrence of item, matching the mutable version
  def removeItem(item: Item) = ShoppingBasket(items.diff(Seq(item)))
  def price = items.map(_.price).sum
}

object Customer {
  val NewCustomer = Customer(ShoppingBasket(Nil), false)
}

case class Customer(basket: ShoppingBasket, paid: Boolean) {
  def addItem(item: Item) = Customer(basket.addItem(item), paid)
  def removeItem(item: Item) = Customer(basket.removeItem(item), paid)

  // Returns the updated customer and card rather than mutating either
  def pay(card: CreditCard): (Customer, CreditCard) = {
    if (!paid && !card.overdrawn) (Customer(basket, true), card.charge(basket.price))
    else (this, card)
  }
}

//-------- Test -----------
import Customer.NewCustomer

class CustomerTest {
  def testCustomerPayment = {

    val customer = NewCustomer addItem Item("Banana", 555)
    val (customer2, card2) = customer pay CreditCard(655)

    customer2.paid should be (true)
    card2.amount should equal (100)
  }
}

The code is now already composed of things that “do one thing and do it well”.  There are no moving parts; everything is immutable. There are no “identities” that vary over time. The factory is gone. The methods are all pure functions, representing a straight mapping from inputs to outputs.  The test, naturally, provides an input, and checks an output.  This is clearly a better assessment of correctness than the earlier mock, which beats around the bush and sniffs for evidence.

“But then it’s an integration test!”

This shouldn’t concern us when we are dealing with pure functions. Compose two functions, and you still have a function. Compose ten, or a hundred, and it is still a function mapping values to values. In an important sense, the code is no more complex.
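For instance, the steps of a checkout built from the immutable types above can be chained with `andThen`, and the result is still an ordinary function from input values to output values. This sketch repeats cut-down versions of those types so that it stands alone; `addBanana`, `addApple`, `payWith` and `checkout` are hypothetical names.

```scala
final case class Item(name: String, price: Int)

final case class CreditCard(amount: Int) {
  def charge(amt: Int): CreditCard = CreditCard(amount - amt)
  def overdrawn: Boolean = amount < 0
}

final case class ShoppingBasket(items: Seq[Item]) {
  def addItem(item: Item): ShoppingBasket = ShoppingBasket(items :+ item)
  def price: Int = items.map(_.price).sum
}

final case class Customer(basket: ShoppingBasket, paid: Boolean) {
  def addItem(item: Item): Customer = Customer(basket.addItem(item), paid)
  def pay(card: CreditCard): (Customer, CreditCard) =
    if (!paid && !card.overdrawn) (Customer(basket, paid = true), card.charge(basket.price))
    else (this, card)
}

// Each step is a plain function...
val addBanana: Customer => Customer = _.addItem(Item("Banana", 555))
val addApple: Customer => Customer = _.addItem(Item("Apple", 45))
def payWith(card: CreditCard): Customer => (Customer, CreditCard) = _.pay(card)

// ...and so is their composition: values in, values out.
val checkout: Customer => (Customer, CreditCard) =
  addBanana andThen addApple andThen payWith(CreditCard(1000))

val (done, card) = checkout(Customer(ShoppingBasket(Nil), paid = false))
```

Testing `checkout` looks exactly like testing any single step: supply an input, inspect the output.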

By contrast, by chaining even two mutable collaborators together, we have lost the ability to easily reason about the system; the answer to any question we might ask is “it depends”.  It depends on when we ask; it depends on where they’ve been; it may even depend on the order in which we ask them.

You can see why OO test-writers are tempted to use intrusive shims to stem the bloodflow of complexity.  Not only do they fail, but the problem they are trying to solve need not even exist.

A further example

If you’re still not sure, consider the following:

def add(a: Int, b: Int): Int = a + b

// For non-negative a...
def multiply(a: Int, b: Int): Int = {
  if (a == 0) 0
  else add(b, multiply(a-1, b))
}

Do you think we should test multiply() by checking, say, multiply(3, 9) == 27, or mocking the add() call and seeing if it gets called 3 times?  Should we stub the Ints we pass in?

Mocking and stubbing is plainly ridiculous here, but not because the example is so simple.  Int is a value, but essentially any well-defined immutable structure is a value too.  Mocking the add() call provides no value whatsoever, not because it is trivial, but because it is testing something that has no bearing on a correct result.  The code takes responsibility for its own correctness, and can pick whatever tools it pleases; the test has no business peeking further than that.
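A test of `multiply` in that spirit only looks at inputs and outputs. This sketch repeats the two definitions above and compares them with Scala's built-in `*` over a small grid of inputs; the grid bounds are an arbitrary choice, and a property-based testing library could generate the inputs instead.

```scala
def add(a: Int, b: Int): Int = a + b

// Assumes a >= 0; recursion is on a, so b may be any Int.
def multiply(a: Int, b: Int): Int =
  if (a == 0) 0
  else add(b, multiply(a - 1, b))

// Collect every input pair where multiply disagrees with the built-in operator.
val mismatches = for {
  a <- 0 to 20
  b <- -20 to 20
  if multiply(a, b) != a * b
} yield (a, b)
```

Nothing here cares how many times `add` is called, only whether the answers are right.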

Exhibit C: Side effects and I/O

//-------- Code -----------
case class EmailAddress(address: String)
case class Email(content: String)
case class Customer(name: String, emailAddress: EmailAddress, unsubscribed: Boolean)

trait EmailSender {
  def send(addr: EmailAddress, email: Email): Unit
}

class SpecialOffersService(sender: EmailSender) {

  def sendSpecialOffersEmail(cust: Customer): Unit = {
    if (!cust.unsubscribed) {
      val content = s"Hi, ${cust.name}! Boy, have we got a deal for you!"
      sender.send(cust.emailAddress, Email(content))
    }
  }
}

//-------- Test -----------
class SpecialOffersServiceTest {
  def testSpecialOffersSend(): Unit = {
    val sender = mock[EmailSender]
    val customer = mock[Customer]
    when(customer.name).thenReturn("Bob")
    when(customer.emailAddress).thenReturn(EmailAddress(""))
    when(customer.unsubscribed).thenReturn(false)

    val service = new SpecialOffersService(sender)
    service.sendSpecialOffersEmail(customer)

    verify(sender).send(EmailAddress(""),
      Email("Hi, Bob! Boy, have we got a deal for you!"))
  }
}

Here we have a service that performs some logic and sends an email.  We don’t want to actually send emails in our test, so we mock the sending mechanism.

This is similar to the previous example in some ways; our code contains side effects, and we cannot treat it as a pure function.  However, this time we cannot make the effect vanish in a puff of smoke; the decision to send the email has to happen one way or another.  If we mock the call though, we have still lost the benefits of purity and referential transparency; mutable state might get all the attention, but other side effects are just as bad.

Let’s consider: how much of this scenario can we represent without the side effect at all?  Checking unsubscribe status is pure, generating the email content is pure, and importantly, the decision to send the email is pure.  Perhaps we can rewrite it as a pure function of Customer => Option[SendEmail], and let something else pull the trigger?

//-------- Code -----------
// LogLevel and Logger are assumed to come from whatever logging library the app uses

sealed trait AppEffect
case class SendEmail(addr: EmailAddress, content: Email) extends AppEffect
case class Log(level: LogLevel, message: String) extends AppEffect
// etc

object SpecialOffersService {
  def generateContent(custName: String): String =
    s"Hi, ${custName}! Boy, have we got a deal for you!"

  // Pure: decides whether to send, and what, without sending anything
  def prepareSpecialOffersEmail(cust: Customer): Option[SendEmail] = {
    if (!cust.unsubscribed) {
      val email = Email(generateContent(cust.name))
      Some(SendEmail(cust.emailAddress, email))
    } else None
  }
}

// Impure: performs each effect; the only place that touches the outside world
class EffectInterpreter(logger: Logger, sender: EmailSender) {
  def apply(effect: AppEffect): Unit = effect match {
    case SendEmail(addr, content) => sender.send(addr, content)
    case Log(level, msg) => logger.log(level, msg)
  }
}

//-------- App wiring somewhere else -----------
class App(interpret: EffectInterpreter) {

  def sendSpecialOffersEmail(cust: Customer): Unit =
    SpecialOffersService.prepareSpecialOffersEmail(cust).foreach(interpret(_))
}

//-------- Test -----------
class SpecialOffersServiceTest {
  def testPrepareSpecialOffersEmail(): Unit = {
    val customer = Customer("Bob", EmailAddress(""), false)

    val result = SpecialOffersService.prepareSpecialOffersEmail(customer)

    result should equal (Some(SendEmail(EmailAddress(""),
      Email("Hi, Bob! Boy, have we got a deal for you!"))))
  }
}

We have called out the EmailSender call as a new data type, representing the decision to send; all of the interesting behaviour is now in a pure function.  We didn’t need the stubs in the first place of course; the immutable Customer can be considered a plain value, rather than some kind of foreign collaborator.  The mocks have all but evaporated.

We have pushed the actual effect out into an interpreter.  Do we want to use mocks to test this?  Maybe; but perhaps it’s not so interesting to test anymore.  It’s often easier to check that last inch of inter-system I/O manually.

Lifting mocked calls into messages

Mocking a method call makes a statement that the purpose of the code under test is to call the next thing.  In a sense, the method call is the logical output of the function.  When this is the case, we can always represent the call as a returned message object, like we did above.  This has several advantages:

  • The message is a true output of your function, and can be tested more easily.
  • The code need not know about the “next thing”; application wiring can be handled separately.
  • The functionality is now far more reusable and recombinable; anything can consume the message and continue the flow.  In the mocked version, only a single specific type is allowed to consume the message, and how to propagate it is hard-wired.
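To illustrate the reusability point, here is a minimal sketch in which one list of effect messages is handed to two different consumers, neither of which sends anything. The `AppEffect` and `SendEmail` shapes follow the example above. The `LogLevel` stand-in, the dry-run `describe` function and the `emailsToSend` filter are assumptions made so that the block stands alone.

```scala
final case class EmailAddress(address: String)
final case class Email(content: String)

// Stand-in for a real log-level type (assumed for this sketch).
sealed trait LogLevel
case object Info extends LogLevel

sealed trait AppEffect
final case class SendEmail(addr: EmailAddress, content: Email) extends AppEffect
final case class Log(level: LogLevel, message: String) extends AppEffect

// Consumer 1: a dry run that describes each effect instead of performing it.
def describe(effect: AppEffect): String = effect match {
  case SendEmail(addr, email) => s"would send '${email.content}' to ${addr.address}"
  case Log(level, msg)        => s"would log [$level] $msg"
}

// Consumer 2: a pure filter that picks out just the emails, e.g. for batching.
def emailsToSend(effects: List[AppEffect]): List[SendEmail] =
  effects.collect { case s: SendEmail => s }

val effects: List[AppEffect] = List(
  SendEmail(EmailAddress("bob@example.com"), Email("Hi, Bob!")),
  Log(Info, "special offer prepared")
)

val plan = effects.map(describe)
val outbox = emailsToSend(effects)
```

The producer of these messages knows nothing about either consumer; a real interpreter like `EffectInterpreter` is just one more.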

It seems far more often the case, though, that message-sending is not the intent, and the tests would be better served by simply looking at the broader input and output.

Controlling side effects is the real battle

We have seen that mocks and stubs have to compensate for serious flaws before they can hope to provide value:

  • They are linked to fragile implementation details, and will constantly break under routine refactoring.
  • What they test is almost always beside the point of what’s actually required.
  • They bypass in-built guarantees and safeguards, introducing new, spurious errors.

The real battle for clean tests, and code for that matter, is about controlling side-effects.  Mocks and stubs attempt to provide some level of testing in the face of entangled effects in output or input, respectively.  However, in practice, almost every usage can be obviated by simply writing better code; the entanglement is the problem, and mocks only allow the developer to ignore it.   Most common usages fall under the categories described here:

  • Huge dependencies that are too hard to create, which can be broken down or pushed further out.
  • Fragile mutable domain modelling that can be made simpler and more robust, by replacing with equivalent immutable values and functions.
  • Essential side effects or I/O coupled with interesting pure behaviour, that can be separated and pushed out.
  • Code that provides concrete results entwined with application wiring and message sending.

How many of the mocks in your codebase fit into these categories?  I’m interested to hear if they don’t — but otherwise, given some spare afternoons, perhaps you can improve the code and wipe them out!