Effective Automated Testing

I'll be giving a talk at work about improving our test automation. Feedback on the talk's content and general approach is welcome, along with any automated testing anecdotes you'd like to share. Possible talk sections are listed below.

Automation Benefits

Reduce cost.
Improve testing accuracy/efficiency.
Regression tests ensure new features don't break old ones. Essential for continuous delivery.
Automation is essential for tests that cannot be done manually: performance, reliability, stress/load testing, for example.
Psychological. More challenging/rewarding. Less tedious. Robots never get tired or bored.

Automation Drawbacks

Opportunity cost of not finding bugs had you done more manual testing.
Automated test suite needs ongoing maintenance. So test code should be well-designed and maintainable; that is, you should avoid the common pitfall of "oh, it's only test code, so I'll just quickly cut n paste this code".
Cost of investigating spurious failures. It is wasteful to spend hours investigating a test failure only to find out the code is fine, the tests are fine, it's just that someone kicked out a cable. This has been a chronic nuisance for us, so ideas are especially welcome on techniques that reduce the cost of investigating test failures.
May give a false sense of security.
Still need manual testing. Humans notice flickering screens and a white form on a white background.

When and Where Should You Automate?

Testing is essentially an economic activity. There are an infinite number of tests you could write. You test until you cannot afford to test any more. Look for value for money in your automated tests.
Tests have a finite lifetime. The longer the lifetime, the better the value.
The more bugs a test finds, the better the value.
Stable interfaces provide better value because it is cheaper to maintain the tests. Testing a stable API is cheaper than testing an unstable user interface, for instance.
Automated tests give great value when porting to new platforms and when upgrading existing ones.
Writing a test for customer bugs is good because it helps focus your testing effort around things that cost you real money and may further reduce future support call costs.

Adding New Tests

Add new tests whenever you find a bug.
Around code hot spots and areas known to be complex, fragile or risky.
Where you fear a bug. A test that never finds a bug is poor value.
Customer focus. Add new tests based on what is important to the customer. For example, if your new release is correct but requires the customer to upgrade the hardware of 1000 nodes, they will not be happy.
Documentation-driven tests. Go through the user manual and write a test for each example given there.
Add tests (and refactor code if appropriate) whenever you add a new feature.
Boundary conditions.
Stress tests.
Big ones, but not too big. A test that takes too long to run is a barrier to running it often.
Tools. Code coverage tools tell you which sections of the code have not been tested. Other tools, such as static (e.g. lint, Perl::Critic) and dynamic (e.g. valgrind) code analyzers, are also useful.
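
As an illustration of the "Boundary conditions" item above, here is a minimal sketch in Python. The function parse_port and its 1..65535 range are hypothetical, made up for this example; the point is only that the tests sit right on, and just outside, each edge of the valid range.

```python
import unittest


def parse_port(text):
    """Hypothetical function under test: parse a TCP port number (1..65535)."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortBoundaryTest(unittest.TestCase):
    def test_accepts_values_on_the_boundary(self):
        # Lowest and highest legal values.
        for text, expected in [("1", 1), ("65535", 65535)]:
            with self.subTest(text=text):
                self.assertEqual(parse_port(text), expected)

    def test_rejects_values_just_outside_the_boundary(self):
        # One step below the minimum, one step above the maximum, and a negative.
        for text in ["0", "65536", "-1"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_port(text)


if __name__ == "__main__":
    unittest.main()
```

Using subTest means every boundary value is reported separately when one fails, rather than the loop stopping at the first failure.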

Test Infrastructure and Tools

Single step, automated build and test. Aim for continuous integration.
Clear and timely build/test reporting is essential.
Keep metrics (via test metadata, say) on the test suite itself. Is a test providing "value"? How often does it fail validly? How often does it fail spuriously? How long does it take to run?
Aim for around 80% code coverage (for most applications 100% code coverage is not worth it).
It's vital to quarantine intermittently failing tests quickly and to fix them quickly ... only returning them to the main build when reliable (if you don't do that, people start ignoring test failures!). No broken windows.
Make it easy to find and categorize tests. Use test metadata.
Integrate automated tests with revision control, bug tracking, and other systems, as required.
Divide test suite into components that can be run separately and in parallel. Quick test turnaround time is crucial.
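
The metrics and quarantine items above could be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the RunRecord layout, the three outcome labels, and the 5% spurious-failure threshold are invented, and in practice someone has to classify each failure as valid or spurious before it can be counted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One execution of one test (hypothetical record layout)."""
    name: str
    outcome: str    # "pass", "valid_fail" or "spurious_fail"
    seconds: float


def summarize(runs):
    """Aggregate per-test counts and total run time from a list of RunRecords."""
    stats = defaultdict(
        lambda: {"runs": 0, "valid_fail": 0, "spurious_fail": 0, "seconds": 0.0}
    )
    for run in runs:
        entry = stats[run.name]
        entry["runs"] += 1
        entry["seconds"] += run.seconds
        if run.outcome in ("valid_fail", "spurious_fail"):
            entry[run.outcome] += 1
    return dict(stats)


def quarantine_candidates(stats, max_spurious_rate=0.05):
    """Names of tests whose spurious failure rate exceeds the threshold."""
    return sorted(
        name
        for name, entry in stats.items()
        if entry["spurious_fail"] / entry["runs"] > max_spurious_rate
    )
```

A test with many valid failures is earning its keep; a test with many spurious failures is a candidate for quarantine until it is fixed.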

Design for Testability

It is easier/cheaper to write automated tests for systems that were designed with testability in mind in the first place.
Interfaces Matter. Make them: consistent, easy to use correctly, hard to use incorrectly, easy to read/maintain/extend, clearly documented, appropriate to audience, testable in isolation.
Dependency Injection is perhaps the most important design pattern in making code easier to test.
Mock Objects are frequently useful and are broader than unit tests - for example, a mock server written in Perl (e.g. a mock SMTP server) to simulate errors, delays, and so on.
Consider ease of support and diagnosing test failures during design.
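
To make the Dependency Injection point above concrete, here is a small Python sketch. The ReportGenerator class and fixed_clock function are hypothetical names invented for this example. The class takes its clock as a constructor argument instead of calling datetime.now() internally, so a test can pass in a fixed clock and get a deterministic result.

```python
import datetime


class ReportGenerator:
    """Hypothetical class whose output depends on the current date."""

    def __init__(self, clock=datetime.datetime.now):
        # The clock is injected; production code uses the default.
        self._clock = clock

    def header(self):
        return f"Report generated {self._clock():%Y-%m-%d}"


def fixed_clock():
    """Test double for the clock: always returns the same moment."""
    return datetime.datetime(2024, 1, 31, 12, 0, 0)


# In a test, inject the fixed clock so the expected string never changes.
assert ReportGenerator(clock=fixed_clock).header() == "Report generated 2024-01-31"
```

Without the injected clock, the test would have to either patch the datetime module or compute the expected string from the current date, which tends to duplicate the logic being tested.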

Test Driven Development (TDD)

Improved interfaces and design. Especially beneficial when writing new code. Writing a test first forces you to focus on interface - from the point of view of the user. Hard to test code is often hard to use. Simpler interfaces are easier to test. Functions that are encapsulated and easy to test are easy to reuse. Components that are easy to mock are usually more flexible/extensible. Testing components in isolation ensures they can be understood in isolation and promotes low coupling/high cohesion. Implementing only what is required to pass your tests helps prevent over-engineering.
Easier Maintenance. Regression tests are a safety net when making bug fixes. A tested component is much less likely to break unnoticed, and a fixed bug that has a test cannot silently recur. Essential when refactoring.
Improved Technical Documentation. Well-written tests are a precise, up-to-date form of technical documentation. Especially beneficial to new developers familiarising themselves with a codebase.
Debugging. Spend less time in crack-pipe debugging sessions. When you find a bug, add a new test before you start debugging (see practice no. 9 at Ten Essential Development Practices).
Automation. Easy to test code is easy to script.
Improved Reliability and Security. How does the code handle bad input?
Easier to verify the component with memory checking and other tools.
Improved Estimation. You've finished when all your tests pass. Your true rate of progress is more visible to others.
Improved Bug Reports. When a bug comes in, write a new test for it and refer to the test from the bug report.
Improved test coverage. If tests aren't written early, they tend never to get written. Without the discipline of TDD, developers tend to move on to the next task before completing the tests for the current one.
Psychological. Instant and positive feedback; especially important during long development projects.
Reduce time spent in System Testing. The cost of investigating a test failure is much lower for unit tests than for complex black box system tests. Compared to end-to-end tests, unit tests are fast and reliable, and they isolate failures (making the root cause easy to find). See also Test Pyramid.
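
The "add a new test before you start debugging" advice above might look like this in Python. The bug itself is hypothetical, invented for this sketch: a word_count function that reported one word for an empty string because it split on a single space. The failing test is written first to reproduce the report, then the code is fixed, and the test stays in the suite as a regression test.

```python
import unittest


def word_count(text):
    """Count whitespace-separated words.

    The (hypothetical) buggy version was len(text.split(" ")), which
    returns 1 for an empty string and over-counts repeated spaces.
    """
    return len(text.split())


class WordCountRegressionTest(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        # Written first, to reproduce the hypothetical bug report.
        self.assertEqual(word_count(""), 0)

    def test_repeated_spaces_are_not_counted_as_words(self):
        # A neighbouring case found while thinking about the bug.
        self.assertEqual(word_count("a  b"), 2)


if __name__ == "__main__":
    unittest.main()
```

The bug report can then refer to the test by name, which gives the next person a one-line way to check that the problem is still fixed.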

Test Doubles

Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an InMemoryTestDatabase for example).
Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed for the test.
Spies are stubs that also record some information based on how they were called; for example an email service that records how many messages were sent.
Mocks are pre-programmed with expectations which form a specification of the calls they are expected to receive; they can throw an exception if they receive a call they don't expect and are checked during verification to ensure they got all the calls they were expecting. Note that only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification. Mocks behave like other doubles during the exercise phase because they need to make the SUT (System Under Test) believe it's talking with its real collaborators - but mocks differ in the setup and the verification phases. While mocks are valuable when testing side-effects, protocols and interactions between objects, note that overuse of mocks inhibits refactoring due to tight coupling between the tests and the implementation (instead of just the interface contract).
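
A rough Python sketch of three of the doubles described above: a stub, a spy, and a mock. The Mailer, StubLedger and notify_overdue names are hypothetical, invented for this example. Note one assumption about the tooling: Python's unittest.mock records calls and lets you assert on them afterwards, rather than being pre-programmed with expectations up front, so it only approximates the strict definition of a mock given above.

```python
from unittest import mock


class Mailer:
    """Hypothetical collaborator; a real one would talk to an SMTP server."""

    def send(self, to, subject):
        raise NotImplementedError


class StubLedger:
    """Stub: returns a canned answer, just enough for the test."""

    def balances(self):
        return {"alice": -5, "bob": 10}


class SpyMailer(Mailer):
    """Spy: a stub that also records how it was called."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))


def notify_overdue(ledger, mailer):
    """Hypothetical system under test: mail every account in the red."""
    for name, balance in ledger.balances().items():
        if balance < 0:
            mailer.send(name, "Account overdue")


def check_with_spy():
    # State verification: inspect what the spy recorded.
    spy = SpyMailer()
    notify_overdue(StubLedger(), spy)
    assert spy.sent == [("alice", "Account overdue")]


def check_with_mock():
    # Behavior verification: assert on the exact call that was made.
    mock_mailer = mock.create_autospec(Mailer, instance=True)
    notify_overdue(StubLedger(), mock_mailer)
    mock_mailer.send.assert_called_once_with("alice", "Account overdue")


check_with_spy()
check_with_mock()
```

The mock version fails if notify_overdue starts batching messages through a different method, even when the right people still get mailed, which is the coupling to implementation that the last sentence above warns about.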

See also:

Mocks Aren't Stubs article by Martin Fowler (mockists vs classicists). Classicist: use real objects if possible and a double if it's awkward to use the real thing; use state verification. Mockist: always use a mock for any object with interesting behavior; use behavior verification (mocks are pre-programmed with expectations, a specification of the calls they are expected to receive, and verification ensures they got all the calls they were expecting).
Concise version of Fowler's Mocks Aren't Stubs
Visual Studio 11 Fakes, Stubs and Shims (run-time method interceptors). Stub: state-based verification ("Arrange, Act, Assert"). Mock: behavior-based verification; a mock provides not only a fake implementation but also logic for verifying how calls were made on the fake. Mocks are extremely valuable when you are testing side-effects, protocols and interactions between objects, but some folks fall into behavior verification when none is needed.
Test double flavours (Test Stub, Test Spy, Mock Object, Fake Object, ...)
What is the difference between a mock and a stub (stack overflow)
verified fakes in python

Testing Memory and Threads

Re: Perl Memory problem ... (Memory Tools References) (long list of nodes on memory checking and other code analysis tools)
Re: Threads or no Threads (Threading, Forking, Signals, Event Loop and Concurrency References) -- see "Unit Testing Concurrent Code" section

Race condition (wikipedia)
Heisenbug (wikipedia)
lsof (wikipedia)

Testing Tools

Google Test (wikipedia)
Google Test and Google Mock (github)
Catch2 Testing Framework (github)
Rosetta Test: Long List is Long (2023) - includes an example C++ unit test using Catch2
Re: Rosetta Test: Long List is Long - Abseil (2023) - includes C++ example code and building and using Google's Abseil library
Clang and LLVM (contain many useful tools such as AddressSanitizer and ThreadSanitizer)

Test Anything Protocol (TAP)

Test Anything Protocol (wikipedia)
Test Anything Protocol (testanything.org)
Re^3: proving something other than perl by petdance - The reason I wrote prove was so that I could test PHP scripts
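
Part of TAP's appeal is that the output format is simple enough for any language to produce, which is what makes testing "something other than perl" practical. As a minimal sketch, here is a Python function that emits the basic TAP shape: a plan line ("1..N") followed by one "ok" or "not ok" line per test. The tap_lines helper and its input format are hypothetical, and the sketch ignores TAP features such as diagnostics, skip and todo directives.

```python
def tap_lines(results):
    """Turn a list of (description, passed) pairs into basic TAP output lines."""
    lines = [f"1..{len(results)}"]          # the plan: how many tests to expect
    for number, (description, passed) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return lines


if __name__ == "__main__":
    # Hypothetical results, just to show the shape of the output.
    for line in tap_lines([("config file parses", True), ("server responds", False)]):
        print(line)
```

Any TAP-aware harness can then read this output and report the results alongside tests written in other languages.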

Types of Testing

Static testing. Code review by humans and static code analysers (e.g. lint, Perl::Critic).
Passive testing. Contrary to active testing, testers do not provide any test data, just examine system logs and traces.
Dynamic testing. Unit tests, Integration tests, System tests, Acceptance tests, ...
Dynamic program analysis. e.g. Purify, Valgrind, ThreadSanitizer, ...
Exploratory testing. Simultaneous learning, test design and test execution.
Performance testing. Stress testing. Load testing.
Usability testing.
Regression testing.
Acceptance testing.
End-to-end testing.
Security testing.
Equivalence partitioning.
Critical path testing.
Failover testing.
Internationalization testing.
Smoke testing.
Alpha, Beta testing.
... and many more :)

References Added Later

Re: Winning people over to better development practises (TDD)
Re^3: [RFC] Review of module code and POD (TDD) (2021 response to Bod)
Re: What to test in a new module (TDD) (2023 response to Bod)
Re^3: STDIN typeglob (2023 response to Bod)
Re^4: Rogue character(s) at start of JSON file by cavac (2023) - aircraft crash caused by Confirmation bias (example of the danger of ignoring computer warnings)

Re: Overcoming 5.10.0 vs 5.38.2 incompatibilities by SankoR (2024) - endorses Test2::V0 (moving large code base with existing test suite from perl v5.10.0 to v5.38.2)
CPAN test suites with SQL by LanX (2024) - seeking advice on using SQLite in his module's test suite

What to test in a new module by Bod (2023)
Testing scripts under Test::More (with Test::Script) by bliako (2023)
Re^3: What to test in a new module by stonecolddevin (2023) - argues against strict adherence to TDD
Re^14: What to test in a new module by choroba (2023) - testing anecdote from early in his career, see also Re: Effective Automated Testing
Re: STDIN typeglob by hv (2023) - if something is difficult to write tests for, maybe it's the interface that should change

Rosetta Test: Long List is Long by me (2023) (Matchers are essential when unit testing statically typed languages)
Re^3: How to test for empty hash? by me (2021) (prefer cmp_ok to ok because you get clearer diagnostics when a test fails)
Re^7: Introspecting function signatures (Dependency Injection and Monkey Patching) by LanX and me (2021)

Error handling in a module (discusses testing vs remotely supporting a module) by Bod (2023)
Re^5: Using the perl debugger to look at a renaming files function (Remote Support) by me (2021) (example of remotely supporting products that run on many customer machines)

RFC: Basic Testing Tutorial by hippo (2019)
Test Driven Development, for software and for pancakes by talexb (2017)
[RFC] Discipulus's step by step tutorial on module creation with tests and git by Discipulus (2018)
Testing in real life by nbezzala (2011)

Perl CPAN test metadata
Re: Using the perl debugger to look at a renaming files function (on Debuggers References) (on debuggers)
Re: Re: Re: Are debuggers good? by merlyn (aggressive approach to rewriting unmaintainable code)

Re: How to write testable command line script? by davido (2018)
Re: Multiple consecutive connections to a socket - example event-driven server using IO::Select (2016) - an example of a Perl syslog server (using IO::Select) used during automated smoke testing

How to structure tests that span several modules by talexb (2005)
Re: How to structure tests that span several modules by xdg (2005 - what to put in each .t file, e.g. a .t file might cover a Use case)
Re: How to structure tests that span several modules by dragonchild (2005 - testing with mock objects)
Re: Why a taint flag on test files? by me (2005 - I like to run all my tests both in normal mode and taint mode ... plus test in "persistent" environments, such as mod_perl)
A danger of test driven development. by Perl_Mouse (2005)
proving something other than perl by Tanktalus (2007)

CPAN Testing Tools

Basic Testing Tutorial by hippo

Test::More - classic Perl testing framework
Test::Harness
Test::Class
Test::Deep
Test::Most

Test2::Suite - the most recent and modern set of tools for testing
Test::Script
Test2::V0
Test2::API

Perl Testing in 2023 by tobyink (blog)
Getting started with Test2 (perlmaven)
Anyone using the new Test2 framework? by stevieb (2017) - no replies
Re: arithmetic in a regex ? by choroba (2023) - examples using Test2::V0
Re^3: OO Pattern Container x Elements and Method Chaining by choroba (2022) - ditto
Mite: an OO compiler for Perl by tobyink (2022) - ditto

Outlier test fail by Bod (2024) - see response from choroba (modern Perl testing: Test::Deep, Test2::V0, ...)

General References

Software testing (wikipedia)
Software quality (wikipedia)
Static program analysis (wikipedia)
Dynamic program analysis (wikipedia)
Dynamic testing (wikipedia)
Unit testing (wikipedia)
Exploratory testing (wikipedia)
White box testing (wikipedia)
Black box testing (wikipedia)
GUI testing (wikipedia)
Equivalence partitioning (wikipedia)
Test double (wikipedia)

List of Unit Testing Frameworks (wikipedia)
Hamcrest (wikipedia) (pioneered assertion matchers - Hamcrest is an anagram of matchers)

Security testing (wikipedia)
Penetration test (wikipedia)
Dynamic application security testing (wikipedia)
Static application security testing (wikipedia)

Test Automation (wikipedia)
Regression testing (wikipedia)
Data-driven testing (wikipedia)
Keyword-driven testing (wikipedia)
Broken window theory (wikipedia)

When should a test be automated? by Brian Marick
Atlassian test automation
Atlassian types of testing
Why Automated Testing? (Smartbear)

Related References

On Interfaces and APIs
Ten Essential Development Practices by Damian Conway

Re: I need perl coding standards (Coding Standards References) (long list of references on coding standards)
Re: Strategies for maintenance of horrible code? (Legacy Code References) (long list of references on dealing with legacy code)
Re: Security techniques every programmer should know (Security References) (long list of Security references)
Re: "Magic tools" that take the fun away (Releng/DevOps/Cloud/Server References)
Re^2: What's Perl good at or better than Python (Game of Life, LLiL, Rosetta and Performance References)

Updated: many extra references were added long after the original node was written. 2019: Added Test Doubles section. 2021: Added Types of Testing section. 2023: Added links to C++ examples using Catch2 and Google Abseil library.
