Nomadic cattle rustler and inventor of the electric lasso.
Company Website
Follow me on twitter
Contact me for frontend answers.

Automation testing is broken and here is why

November 10, 2019

I originally wrote this post for LogRocket.

Before I start, I want to point out that I am not referring to one particular project or any particular individuals. Having spoken to others, I believe these problems are industry-wide. Nearly all the automation testers I have worked with have busted a gut to make this faulty machine work. I am hating the game, not the player.

If I am not mistaken, I appear to have awoken in an alternate reality where vast sums of money, time and resources are allocated to both the writing and the continual maintenance of end-to-end tests. We have a new breed of developer known as the automation tester, whose primary reason for being is not only to find bugs but also to write a regression test that removes the need to re-run the initial manual testing. Automated regression tests sound great in theory, and anybody starting a new job could not fail to be impressed on finding out that every story in every sprint would have an accompanying end-to-end test written in Selenium WebDriver.

I have heard numerous tales of end-to-end tests, usually written in Selenium WebDriver, being deleted because they are so brittle. Test automation seems to result only in CI build sabotage, with non-deterministic tests making change and progress next to impossible. We have test automation engineers who are too busy or unwilling to carry out manual tests and who instead stoke the flames of hell with these underperforming, time- and resource-grabbing non-deterministic tests. Re-running tests on failure is standard practice and is even built into some test runners. Some of the most challenging code to write is being written and maintained by the least experienced developers. Test code does not have the same spotlight of scrutiny shone on it as production code. We never stop to ask ourselves whether this insane effort is worth it. We don’t track metrics, and we only ever add more tests.

It is like a bizarre version of Groundhog Day, only it is a broken build and not a new day that starts the same series of events. I am now going to list the recurring problems that I see on a project laden with the burden of carrying a massive end-to-end test suite.

Wrong expectations that automated tests will find new defects

At the time of writing, nearly all tests assert their expectations on a fixed set of inputs. Below is a simple login feature file:

Feature: Login Action

Scenario: Successful Login with Valid Credentials

  Given User is on Home Page
  When User Navigate to LogIn Page
  And User enters UserName and Password
  Then Message displayed Login Successfully

The feature file executes the following Java code in what is known as a step definition:

@When("^User enters UserName and Password$")
public void user_enters_UserName_and_Password() throws Throwable {
    driver.findElement(By.id("log")).sendKeys("testuser_1");
    driver.findElement(By.id("pwd")).sendKeys("Test@123");
    driver.findElement(By.id("login")).click();
}

This test will only ever find bugs if this finite set of inputs triggers them. A new user entering anything other than testuser_1 and Test@123 won’t be covered by this end-to-end test. We can increase the number of inputs by using a Cucumber data table, as in this sign-up example:

Given I open Facebook URL
And fill up the new account form with the following data
  | First Name | Last Name | Phone No   | Password | DOB Day | DOB Month | DOB Year | Gender |
  | Test FN    | Test LN   | 0123123123 | Pass1234 | 01      | Jan       | 1990     | Male   |

The most likely time for these tests to find bugs is the first time they run. For as long as these tests exist, we will have to maintain them. If they use Selenium WebDriver, then we might also run into latency problems on our continuous integration pipeline.

These tests can be pushed down the test pyramid into unit tests or integration tests.

Don’t drive all testing through the user interface

I am not saying we should do away with end-to-end tests, but if we want to avoid the maintenance of these often brittle tests, then we should only test the happy path. I want a smoke test that lets me know the most crucial functionality is working. Exceptional paths should be handled at a more granular level in the developer unit tests or integration tests.

The most common reason for a bug in the login example is user input. We should not be spinning up Selenium to test user input. We can write inexpensive unit tests that check user input without the maintenance overhead of an end-to-end test. We still need one end-to-end test for the happy path just to check it all hangs together, but we don’t need end-to-end tests for the exceptional paths.
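As a minimal sketch of pushing input checks down the pyramid, assume a hypothetical `CredentialsValidator` with made-up rules (a username of 3 to 20 letters, digits or underscores; a password of at least 8 characters containing a letter and a digit). None of these rules come from the login example; the point is only that a table of input cases runs in milliseconds with no browser involved.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator. These rules are illustrative assumptions,
// not the rules of any real login form.
final class CredentialsValidator {
    // 3 to 20 letters, digits or underscores
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{3,20}");

    static boolean isValidUsername(String username) {
        return username != null && USERNAME.matcher(username).matches();
    }

    // At least 8 characters, with at least one letter and one digit
    static boolean isValidPassword(String password) {
        if (password == null || password.length() < 8) {
            return false;
        }
        boolean hasLetter = password.chars().anyMatch(Character::isLetter);
        boolean hasDigit = password.chars().anyMatch(Character::isDigit);
        return hasLetter && hasDigit;
    }
}

// Table-driven check: each row is one input and its expected outcome.
// No browser and no waits; adding a case costs one line.
final class CredentialsValidatorCheck {
    static final class Case {
        final String username;
        final String password;
        final boolean expectedValid;

        Case(String username, String password, boolean expectedValid) {
            this.username = username;
            this.password = password;
            this.expectedValid = expectedValid;
        }
    }

    static final List<Case> CASES = List.of(
        new Case("testuser_1", "Test@123", true),       // inputs from the step definition
        new Case("ab", "Test@123", false),              // username too short
        new Case("test user", "Test@123", false),       // space in username
        new Case("testuser_1", "short1", false),        // password too short
        new Case("testuser_1", "NoDigitsHere", false),  // no digit
        new Case("testuser_1", "12345678", false),      // no letter
        new Case(null, "Test@123", false));             // missing username

    /** Returns how many rows disagree with their expected outcome. */
    static int countFailures() {
        int failures = 0;
        for (Case c : CASES) {
            boolean actual = CredentialsValidator.isValidUsername(c.username)
                && CredentialsValidator.isValidPassword(c.password);
            if (actual != c.expectedValid) {
                System.out.println("FAIL: " + c.username + " / " + c.password);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures() + " failing case(s) out of " + CASES.size());
    }
}
```

The single happy-path row mirrors the end-to-end test; every other row is an exceptional path that no longer needs Selenium.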

Testing can and should be broken up, with most of the burden carried by unit tests and integration tests.

Has everyone forgotten the test pyramid?

Selenium WebDriver is not fit for purpose

I have blogged about this previously in my post “Cypress the Selenium killer”. It is nearly impossible not to write non-deterministic Selenium tests, because you have to wait for the DOM and the four corners of the cosmos to be perfectly aligned before running your tests.

If you are testing a static web page with no dynamic content, then Selenium is excellent. If, however, your website meets one or more of the following conditions, then you are going to have to contend with flaky, non-deterministic tests:

  • it reads from and writes to a database
  • JavaScript/Ajax is used to update the page dynamically
  • JavaScript/CSS is loaded from a remote server
  • CSS or JavaScript is used for animations
  • JavaScript or a framework such as React/Angular/Vue renders the HTML

An automation tester faced with any of the above conditions will litter their tests with a series of waits, polling waits, checks for Ajax completion, checks for JavaScript completion, checks for animations finishing, etc.

The tests turn into an absolute mess and a complete maintenance nightmare. Before you know it, you have test code like this:

click(selector) {
  const el = this.$(selector)
  // make sure element is displayed first
  waitFor(el.waitForDisplayed(2000))
  // this bit waits for element to stop moving (i.e. x/y position is same).
  // Note: I'd probably check width/height in WebdriverIO but not necessary in my use case
  waitFor(
    this.client.executeAsync(function(selector, done) {
      const el = document.querySelector(selector)

      if (!el)
        throw new Error(
          `Couldn't find element even though we .waitForDisplayed it`
        )
      let prevRect
      function checkFinishedAnimating() {
        const nextRect = el.getBoundingClientRect()
        // if it's not the first run (i.e. prevRect is already set) and the position
        // is the same, the animation is finished so call done()
        if (
          prevRect != null &&
          prevRect.x === nextRect.x &&
          prevRect.y === nextRect.y
        ) {
          done()
        } else {
          // Otherwise, set the prevRect and wait 100ms to do another check.
          // We can play with what amount of wait works best. Probably less than 100ms.
          prevRect = nextRect
          setTimeout(checkFinishedAnimating, 100)
        }
      }
      checkFinishedAnimating()
    }, selector)
  )
  // then click
  waitFor(el.click())
  return this;
}

My eyes water looking at this code. How can this be anything but one big flake and a time sink for everybody keeping this monster alive?

Cypress.io gets around this by embedding itself in the browser and running in the same event loop as the application under test, so test code reads as if it executes synchronously. Taking away the asynchronicity, and not having to resort to polling, sleeping and wait-for helpers, is hugely empowering.

The effectiveness of tests is not tracked, and we don’t delete bad tests

Test automation engineers are very possessive about their tests, and in my experience, we don’t do any work to identify whether a test is paying its way.

We need tooling that monitors the flakiness of tests, and if the flakiness is too high, it automatically quarantines the test. Quarantining removes the test from the critical path and files a bug for developers to reduce the flakiness.
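The post does not name a specific tool, so what follows is only a minimal sketch of the idea, assuming a hypothetical history of CI results (test name, commit, pass or fail). A test counts as flaky on a commit when the same code produced both a pass and a fail; if the share of such commits exceeds a threshold (0.1 here, an arbitrary assumption), the test is flagged for quarantine.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// One CI result: which test ran, against which commit, and whether it passed.
final class TestRun {
    final String testName;
    final String commit;
    final boolean passed;

    TestRun(String testName, String commit, boolean passed) {
        this.testName = testName;
        this.commit = commit;
        this.passed = passed;
    }
}

final class FlakinessMonitor {
    /**
     * Fraction of commits on which this test both passed and failed.
     * Same code with a different outcome is non-determinism, not a regression.
     */
    static double flakiness(List<TestRun> runs, String testName) {
        Map<String, Set<Boolean>> outcomesByCommit = new HashMap<>();
        for (TestRun run : runs) {
            if (run.testName.equals(testName)) {
                outcomesByCommit.computeIfAbsent(run.commit, c -> new HashSet<>()).add(run.passed);
            }
        }
        if (outcomesByCommit.isEmpty()) {
            return 0.0;
        }
        long flakyCommits = outcomesByCommit.values().stream()
            .filter(outcomes -> outcomes.size() > 1)
            .count();
        return (double) flakyCommits / outcomesByCommit.size();
    }

    /** Tests whose flakiness exceeds the threshold, sorted by name. */
    static List<String> toQuarantine(List<TestRun> runs, double threshold) {
        return runs.stream()
            .map(run -> run.testName)
            .distinct()
            .filter(name -> flakiness(runs, name) > threshold)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TestRun> history = List.of(
            new TestRun("login", "a1", true),
            new TestRun("login", "b2", true),
            new TestRun("checkout", "a1", false),
            new TestRun("checkout", "a1", true),  // re-run on the same commit passed
            new TestRun("checkout", "b2", true));
        for (String name : toQuarantine(history, 0.1)) {
            // A real pipeline would move the test off the critical path and
            // file a bug here; this sketch only reports it.
            System.out.println("Quarantine: " + name);
        }
    }
}
```

Running `main` prints `Quarantine: checkout`: that test failed and passed on commit a1, so half of its commits show mixed results, well above the assumed threshold.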

Eradicate all non-deterministic tests from the face of the planet

If re-running the build is the solution to fixing a test, then that test needs to be deleted. Once developers get into the habit of pressing the build-again button, all faith in the test suite is gone.

Re-running the tests on failure is a sign of utter failure

The Courgette test runner can be disgracefully configured to re-run failed scenarios:

@RunWith(Courgette.class)
@CourgetteOptions(
    threads = 1,
    runLevel = CourgetteRunLevel.FEATURE,
    rerunFailedScenarios = true,
    showTestOutput = true
)
public class TestRunner {
}

What rerunFailedScenarios = true is saying is that our tests are non-deterministic, but we don’t care; we are just going to re-run them in the hope that they will pass next time. I take this as an admission of guilt. Current test automation thinking has deemed this acceptable behaviour.
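Some back-of-the-envelope arithmetic shows why a rerun hides the problem rather than fixing it. Assuming each attempt fails independently with probability p (a simplification; real flakiness is often correlated), a test allowed r reruns only turns the build red when every attempt fails, i.e. with probability p^(r+1). A test that fails spuriously one run in five therefore shows up red in only one build in 25 after a single rerun. The class below is a hypothetical illustration of that formula, not part of any test runner.

```java
final class RetryMasking {
    /**
     * Probability that a flaky test is still reported red when the runner
     * allows `reruns` extra attempts, assuming each attempt fails
     * independently with probability p.
     */
    static double reportedFailureRate(double p, int reruns) {
        if (p < 0.0 || p > 1.0 || reruns < 0) {
            throw new IllegalArgumentException("p must be in [0, 1] and reruns >= 0");
        }
        // The build only goes red if the first attempt AND every rerun fail.
        return Math.pow(p, reruns + 1);
    }

    public static void main(String[] args) {
        double p = 0.2; // assumed: the test fails spuriously one run in five
        for (int reruns = 0; reruns <= 2; reruns++) {
            System.out.printf("reruns=%d -> red %.1f%% of the time%n",
                reruns, 100 * reportedFailureRate(p, reruns));
        }
    }
}
```

With p = 0.2 the reported failure rate drops from 20% to 4% to 0.8% as reruns are added, while the underlying defect is exactly as frequent as before.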

If your test is non-deterministic, i.e. it behaves differently when run with the same inputs, then delete it. Non-deterministic tests can drain all confidence in your project. If your developers are pressing the magic button without thinking, then you have reached this point. Delete these tests and start again.
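The definition above (same inputs, different behaviour) suggests a cheap gate before a test is allowed into the suite: run it several times against the same inputs and reject it if the outcomes disagree. A minimal sketch, assuming each test can be wrapped as a `BooleanSupplier` that returns true on a pass; `DeterminismCheck` is a hypothetical name, and the "leaky" example simulates a test that depends on shared state left behind by an earlier run.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

final class DeterminismCheck {
    /**
     * Runs the test `runs` times with identical inputs.
     * Returns true only if every run produced the same outcome.
     */
    static boolean isDeterministic(BooleanSupplier test, int runs) {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least two runs to compare");
        }
        boolean first = test.getAsBoolean();
        for (int i = 1; i < runs; i++) {
            if (test.getAsBoolean() != first) {
                return false; // same inputs, different result: non-deterministic
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a well-behaved test: same input, same answer every time.
        BooleanSupplier pure = () -> "Test@123".length() == 8;

        // Stand-in for a test that leaks state: it "registers" testuser_1 in
        // shared storage and expects the name to be free, so only the first run passes.
        Set<String> sharedUsers = new HashSet<>();
        BooleanSupplier leaky = () -> sharedUsers.add("testuser_1");

        System.out.println("pure: " + isDeterministic(pure, 20));   // true
        System.out.println("leaky: " + isDeterministic(leaky, 20)); // false
    }
}
```

A clean result is not proof of determinism: a test that fails one run in a hundred will usually survive 20 runs, so this is a filter rather than a guarantee.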

Maintenance of end-to-end tests comes at a high price

Test maintenance has been the death of many test automation initiatives. When it takes more effort to update the tests than it would take to re-run them manually, test automation will be abandoned. Your test automation initiative should not fall victim to high maintenance costs.

There’s a lot more to testing than simply executing tests and reporting results. Environment setup, test design, test strategy and test data are often forgotten. You can watch your monthly invoice from your cloud provider of choice skyrocket as the resources required to run this ever-expanding test suite grow.

Automation test code should be treated as production code

Automation testers are often new to development and are suddenly tasked with writing complicated end-to-end tests in Selenium WebDriver. As such, they need to follow these rules:

  • Don’t copy and paste code. Copied-and-pasted code takes on a life of its own and must never happen. I see this a lot.
  • Don’t set up test code through the user interface. I have seen this many times, and you end up with bloated tests that re-run the same setup code many times just to reach the point where a new scenario can be tested. Tests need to be independent and repeatable. The seeding or initialisation of each new feature should take place through scripting or outside of the test.
  • Don’t use Thread.sleep and other hacks. A puppy dies in heaven every time an automation tester uses Thread.sleep with some arbitrary number in the futile hope that after x milliseconds the world will be as they expect. Failure is the only result of using Thread.sleep.
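Where a wait genuinely cannot be avoided, the usual alternative to an arbitrary Thread.sleep is to wait for an explicit condition with a deadline: the test continues as soon as the condition holds and fails with a message naming what it was waiting for when it does not. Selenium's Java bindings ship this pattern as `WebDriverWait`; the `waitUntil` helper below is a hypothetical plain-Java stand-in so the sketch runs without a browser. It does not make a non-deterministic test deterministic; it only replaces a guessed number with a stated condition.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class Waits {
    /**
     * Checks `condition` every `pollInterval` until it holds or `timeout` elapses.
     * Unlike Thread.sleep(n), it continues as soon as the condition is true,
     * and on timeout it fails with a message naming what it was waiting for.
     */
    static void waitUntil(BooleanSupplier condition, Duration timeout,
                          Duration pollInterval, String description) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "Timed out after " + timeout.toMillis() + " ms waiting for: " + description);
            }
            try {
                // A short poll interval, not a guess at how long the page needs.
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for: " + description, e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in for "the element has stopped animating": true after about 50 ms.
        waitUntil(() -> System.nanoTime() - start >= Duration.ofMillis(50).toNanos(),
                  Duration.ofSeconds(2), Duration.ofMillis(10), "simulated animation to finish");
        System.out.println("condition met");
    }
}
```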

Automation test code needs to come under the same scrutiny as production code. These difficult-to-write test scenarios should not be a sea of copy-and-paste hacks to reach the finish line.

Testers no longer want to test

I have some sympathy with this point: manual testing is not as compelling as writing code, so it is perceived as outdated and boring. Automation tests should be written after the manual testing, to catch regressions. A lot of the automation testers that I have worked with do not like manual testing any more, and it is falling by the wayside. Manual testing will catch many more bugs than writing one test with one fixed set of inputs.

It is now commonplace to write Gherkin syntax on a brand new ticket or story and go straight into writing the feature file and step definitions. If this happens, then manual testing is bypassed, and a regression test is written before the actual regression has happened. We are writing a test for a bug that will probably never happen.

Conclusion

In my estimation, we are spending vast sums of money and resources on something that is just not working. The only good result that I have seen from automated testing is an insanely long build, and we have made change exceptionally difficult.

We are not sensible about automated testing. It sounds great in principle, but there are so many bear traps that we can quickly end up in a dead end where change is excruciating and difficult-to-maintain tests are kept alive for no good reason.

I will leave you with these questions that I think need to be answered:

  • Why is nobody questioning whether the payback is worth the effort?
  • Why are we allowing flaky tests to be the norm, not the exception?
  • Why is re-running a test with the same inputs and getting a different result excusable, to the point where we have runners such as Courgette that do this automatically?
  • Why is Selenium the norm when it is not fit for purpose?
  • Why are we letting less experienced developers write a sea of waits, polling waits and, at worst, Thread.sleep code in their rush to complete the task?

Paul Cowan
