After the first pains of writing tests - Taking back control of your tests

3/16/20

Being confronted with the need to write tests during a project sometimes comes out of nowhere: A project starts fast, stays fast and then slows down after a while because the code base grows. Classes become harder to read, and the framework you usually love seems to punish you for being inflexible. "That's totally normal, because it's getting more complex" is what you might think at first, but then you and your team hear magical stories of projects that successfully keep up their pace from sprint to sprint. "They write tests" is one of the reasons given for their progress, and it becomes clear that you need to write tests too. Together the team decides on a testing library and starts writing tests for the project. The end.

Well, this is not yet a happily ever after, and the story doesn't end here. This is just the beginning. Many developers tend to throw technology at a problem and believe that alone will solve the issues they run into, because most of the time it works at first. They bend and twist the adopted technology to make it fit their use case. "That's what it is for" is your answer, and you're right. But every technology was created for a use case and is intended to be used in a certain way. This isn't any different with tests. After a while it feels like writing tests is actually slowing down development and making sprints fail. What went wrong? You started using the technology, you followed the documentation, and you even read about your framework's best testing practices.

How you write your code and how your architecture looks also make a difference when testing and maintaining it. Dependency injection is one half of what's important for the tests you have to write, and the design of your application will affect how much effort your tests require. But before we cover topics like structuring your code, there's the obstacle of being new to testing, so let's talk about some basics that will help you understand where you can start to take control of your code and tests.

Covering requirements


When developers start to write tests, they usually have an existing project with a big code base. This often leads to the approach of writing code first and testing it later. There are methodologies that promote writing tests first and the implementation afterwards, such as Test Driven Development (TDD) or Test First, but despite these strategies helping to improve your project's code, they can be hard to adopt in a team that only recently started writing tests. If you write your feature or bugfix code first, you have to be sure that your tests cover the requirements that your classes and methods fulfill.
public function foo(?int $num): int
{
    // implementation
}
This method has an implementation, but I reduced it to the relevant parts: the chosen API. This, together with the requirements of what the method is supposed to do, is everything you need to create the test cases that have to be covered. This view is most helpful for unit tests but can also help with other test types. The point is to focus on requirements rather than on implementation details. If the requirements stay the same but someone changes how they are accomplished, a.k.a. refactoring, your tests should still be green without any changes to them. You will realize that this can sometimes be tricky, especially when test doubles like mock objects are involved, but over time your tests will become more stable, and this view of your methods will prepare you for testing approaches like TDD.
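To make this concrete, here is a minimal sketch in plain PHP, assuming a purely hypothetical requirement for foo(): null is treated as 0, and the method returns the absolute value of the given number. The test cases are derived only from the API and that requirement, never from how foo() works internally:

```php
<?php
// Hypothetical requirement for foo(): treat null as 0 and
// return the absolute value of the given number.
function foo(?int $num): int
{
    return abs($num ?? 0);
}

// The assertions only check input/output pairs from the requirement,
// so a refactoring of foo()'s body keeps them green:
assert(foo(null) === 0);   // null is treated as 0
assert(foo(-5) === 5);     // negative input becomes positive
assert(foo(7) === 7);      // positive input stays unchanged
```

If foo() is later rewritten, for example with an if-statement instead of abs(), these assertions don't have to change at all.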

There are also benefits for code coverage when you focus less on the implementation and more on the requirements.

100% code coverage


Even code coverage can follow a strategy. From a domain perspective, one approach is that only the core domain needs to be covered with many tests, while less important subdomains don't need a high coverage, or any tests at all. And having 100% coverage doesn't mean that your code is completely covered and free of bugs.
public function qux(int $num)
{
    if ($num > 0) {
        // do stuff
    }
    
    // in between code
    
    if ($num < 100) {
        // do more stuff
    }
    
    // rest of the code and return value
}
A test for this code example that uses a number between 0 and 100 would result in a code coverage of 100% in the metrics. But there are paths that aren't tested yet: the paths where $num is 0 or below, and 100 or higher. I left the rest of the method's implementation hidden in the form of comments to conceal how important or different the code in those if-blocks is. There are at least two uncovered paths that might result in a wrong return value, depending on the previously mentioned requirements for the method. In TDD you add requirements and paths step by step to get as close to 100% coverage as possible, which automatically leads to tests covering the different paths that might be hidden in code from your pre-test era. For an existing implementation, you have to be careful not to test the implementation itself instead of the requirements. With that caveat, the goal of 100% code coverage for your existing code can help you

  • find a forgotten test case from the requirements
  • realize a new use case during development
  • recognize dead code

100% code coverage is a state in which a bug might still exist in a project, while uncovered code has a hidden bug until you prove otherwise. The strategy by which you covered your code makes the difference between one 100% and another. There's also the benefit that mutation testing becomes more valuable with a higher coverage.
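As a sketch of the difference between line coverage and path coverage, assume a hypothetical implementation of qux() (the arithmetic is invented for illustration, standing in for the hidden comments above):

```php
<?php
// Invented implementation of qux() to make the paths visible.
function qux(int $num): int
{
    if ($num > 0) {
        $num *= 2;      // "do stuff"
    }

    if ($num < 100) {
        $num += 1;      // "do more stuff"
    }

    return $num;
}

// A single test with a value between 0 and 100 executes every line,
// so the metric reports 100% line coverage:
assert(qux(10) === 21);

// ... but only these additional cases exercise the remaining paths:
assert(qux(0) === 1);     // first condition is false
assert(qux(60) === 120);  // second condition is false
```

The first assertion alone already yields "100%" in the report, yet two of the four possible branch combinations stay completely untested.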

Double trouble - The Test Doubles


Your code usually has dependencies that get injected into your classes, methods or functions, and depending on what kind of tests you are going to write (and need), you will have to use Test Doubles instead of the original dependencies. One Test Double you might already know at this point is the Mock object, created based on a real dependency like a class from your own code or from one of the many third-party libraries your project depends on. Mocks and other Test Doubles can stand in for more than just classes, but since you're about to test your own code, this will be about creating doubles for classes.
The scope of Unit Tests and their need for isolation make them predestined for Test Doubles like Mock objects, which are used both by developers who are new to testing and by "test veterans".

"I know Mock objects. They are well described in the documentation of testing library X." Yes, they are. Most documentation covers at least the basics well, but how Mocks should be used also depends on the previously mentioned requirements of the tested code. Mocks have the ability to check how often a method was called in the context of a test and to fail the test if the number of invocations differs from what was expected: Was it called only once or more than once? Exactly three times? Never? These checks are assertions too, and a test counts as complete even if it leaves out, say, an assertEquals() on a return value. If your code works as expected, the mocked method will be invoked only a certain number of times in your test. This is what the documentation teaches us, and this is how people tend to code in their first steps with unit tests. But have you ever considered that it sometimes doesn't matter how often some methods are called on a dependency? Or that you don't need to mock a dependency at all?
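Most testing libraries generate such Mocks for you; to show what the invocation check boils down to, here is a hand-rolled sketch in plain PHP, with an invented Mailer dependency and Registration service:

```php
<?php
// Invented dependency: a mailer the service under test relies on.
interface Mailer
{
    public function send(string $to): void;
}

// A hand-rolled "mock": it records how often send() was called.
class CountingMailer implements Mailer
{
    public int $calls = 0;

    public function send(string $to): void
    {
        $this->calls++;
    }
}

// Invented class under test with an injected dependency.
class Registration
{
    public function __construct(private Mailer $mailer) {}

    public function register(string $email): void
    {
        // requirement: exactly one confirmation mail per registration
        $this->mailer->send($email);
    }
}

$mailer = new CountingMailer();
(new Registration($mailer))->register('foo@example.com');

// The invocation count itself is the assertion here, even though
// nothing is asserted on a return value:
assert($mailer->calls === 1);
```

A mocking library wraps exactly this bookkeeping behind a declarative API, but the test passes or fails for the same reason: the counted invocations.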

Stub! Hammertime!


Well, actually it's stub time, because unlike Mock objects, stubs don't need to check how often methods were called during a test, and that can be important for refactoring or extending existing code. There are more differences between Mock objects and stubs, but let's stay with this one, because I want to focus on one simple fact: You don't write tests just to cover your newly created feature for now, you write them to guarantee that it will work correctly for as long as the code exists. The tests have to work even a year after you created them. And we all know that code never stays the same during a project's lifetime. If your code gets extended, the previously implemented requirements for a feature are still valid, and your old tests have to pass even after the new code was added. Your old tests are only allowed to fail when the newly added functionality causes different results in their assertions. If tests fail even though your code still works, they might contain unneeded assertions. It isn't easy to prevent this over the lifetime of a project, and even experienced developers will have trouble from time to time. Sometimes a refactoring is in reality a rewrite, and the existing tests won't be able to handle that.
public function qux(ValueObject $vo)
{
    if ($vo->getNumber() > 0) {
        // do stuff
    }
    
    // in between code
    
    // Newly added if-block
    if ($vo->getNumber() < 100) {
        // do more stuff
    }
    
    // rest of the code
}
If there was previously only one invocation of $vo->getNumber() and the new if-block with the condition $vo->getNumber() < 100 was added, a test covering the path of the first condition with a Mock object that expects exactly one invocation would now fail, unless the first if-block contains an early exit. Yes, you can argue that value objects don't need to be mocked, but not every value-containing object is free of dependencies; ValueObject is a deceiving naming choice in the code example. A stub would prevent other, older tests from failing, because if $vo->getNumber() only returns an injected value, it usually doesn't matter how often it is called in your code.
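A hand-rolled stub for this scenario could look like the following sketch; the string building inside qux() is invented to make the executed paths observable:

```php
<?php
// A stub for the value object: it only returns an injected value and
// deliberately doesn't count how often getNumber() is called.
class ValueObjectStub // in a real test this would extend ValueObject
{
    public function __construct(private int $number) {}

    public function getNumber(): int
    {
        return $this->number;
    }
}

function qux(ValueObjectStub $vo): string
{
    $result = '';

    if ($vo->getNumber() > 0) {
        $result .= 'stuff ';
    }

    // The newly added if-block calls getNumber() a second time,
    // which would break a mock expecting exactly one invocation.
    if ($vo->getNumber() < 100) {
        $result .= 'more stuff';
    }

    return $result;
}

// The old test for the first path still passes with the stub,
// because the stub doesn't care about the extra invocation:
assert(qux(new ValueObjectStub(50)) === 'stuff more stuff');
```

The stub keeps the old test tied to the requirement (what qux() produces for a given number) instead of to an implementation detail (how often the number is read).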

There are always exceptions, but keep in mind that tests need to fail when something is wrong with your implemented code, not with your usage of Test Doubles.

Integration Test works, Unit Test fails


There are some funny animations online about two passing Unit Tests and one failing Integration Test, and it's true that a project shouldn't depend on just one type of test. Even though Unit Tests are a great way to test your code in isolation, they are just the foundation of your project's tests when it comes to bigger projects or projects that involve frameworks. The different forms of integration testing, which execute more or less of the original dependencies, can help you test the interaction of the different components and check whether they play nicely with each other. Some prefer them over Unit Tests because they sometimes seem easier to set up, especially with the help of frameworks; however, they can still give you a false feeling of security, just like the funny animations do with Unit Tests. Let's take a look at two different testing types and how they differ from each other.
public function fetchBar()
{
    // some code
    
    $bars = [];
    foreach ($this->dbTableFoo->findAllBar() as $bar) {
        // whatever you wanna do with the bars of Foo
    }
    
    // more code
    
    return $bars;
}
What is happening in that piece of code? The irrelevant parts are kept as code comments, and the focus lies on one foreach loop that accesses a database with the help of the injected dependency in $this->dbTableFoo. Now let someone refactor that loop and change it to the following code:
public function fetchBar()
{
    // some code
    
    $bars = [];
    for ($i = 0; $i < count($result = $this->dbTableFoo->findAllBar()); $i++) {
        // whatever you wanna do with the bars of Foo
    }
    
    // more code
    
    return $bars;
}
I wrote "refactoring", not "improvement". I chose this example of a change because its disadvantage is well known in the PHP world. If you don't know what is wrong with that change, let me explain.

First of all: the Integration Test that accesses a real database doesn't fail in this scenario. But if this change is released, you might run into performance issues that cause problems in your production environment. While the foreach code calls findAllBar() only one time, the for approach calls it on every iteration. Each evaluation of the for-loop's condition executes the SQL queries inside findAllBar(), so the method is called as many times as there are rows returned by its query, plus once for the final comparison that ends the loop. A testing environment usually works with data fixtures that contain few table rows for the use case, which leads to developers realizing this issue only when it's too late. Having the same amount of rows as the production environment isn't a solution, because tests are supposed to run fast and need to break if something doesn't work as expected. We need to break the build in this use case.
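A hand-rolled spy can make the problem visible. The spy class and its three rows below are invented for illustration; the counter shows how the refactored for-loop multiplies the calls:

```php
<?php
// Invented spy for the table gateway: it counts every query execution.
class DbTableFooSpy
{
    public int $calls = 0;

    public function findAllBar(): array
    {
        $this->calls++; // each call stands for a full SQL execution
        return [['id' => 1], ['id' => 2], ['id' => 3]];
    }
}

// The refactored version: findAllBar() runs on every loop condition check.
function fetchBar(DbTableFooSpy $dbTableFoo): array
{
    $bars = [];
    for ($i = 0; $i < count($result = $dbTableFoo->findAllBar()); $i++) {
        $bars[] = $result[$i];
    }
    return $bars;
}

$spy = new DbTableFooSpy();
fetchBar($spy);

// A Unit Test asserting a single query ($spy->calls === 1) would break
// the build here; the actual count is one call per iteration plus the
// final comparison that terminates the loop:
assert($spy->calls === 4); // 3 rows + 1 terminating condition check
```

With the original foreach version, the same spy would report exactly one call, which is what a Unit Test for this method should demand.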

This is a fitting example for the usage of Unit Tests with Mock objects, where a method is expected to be called only once during its test case. We expect a method that accesses a database to be called only once, to prevent performance issues caused by the application itself. If the method is called multiple times, as in the for-loop example, the Unit Test fails and breaks the build or deployment. The query behind findAllBar() can get its own tests to ensure fast SQL execution and fitting indices for the use case. Does this mean that the Integration Test is useless in our scenario? If you compare both loops, you will see that there is an addition in the for-loop that can't be found in the foreach-loop: the use of keys. The variable $i was introduced and is intended to be used as a key to access the query result in $result, but does the method even return consecutive numeric keys? An Integration Test would be able to break in case the keys differ from what the author expected, and you can see that the code now depends on other aspects of the data structure returned by findAllBar(). Even though this change would be a no-go and should usually be rejected by reviewers, we learned that it's sometimes favourable to have more than one type of test to be on the safe side. Every test type has its advantages and disadvantages and can serve as a supplement to the other types.

Be sure that a failing test is expressive in its error message and doesn't leave any doubt about why it breaks and where the culprit can be found. Tests should save you from debugging whole parts of your code. This is why End-To-End Tests shouldn't be your primary test type. A failing End-To-End Test is most of the time like a customer's statement: "Something is broken". Without more specific information you will spend too much time finding the bug.

What about static code analysis tools?


Isn't the previous example also catchable with good static analysis tools? It's an example to point out the variety of test types and that no test type is superior to the others by default. Static analysis tools aren't a replacement for tests but another addition for maintaining code quality. They don't exist to make testing obsolete, but they save the developer from creating unnecessary test cases that check, for example, wrong type hints. Tests and code analysis are a worthwhile combination, just like their human equivalents of reviewing and manual functional testing. Tests are for the functionality, while static code analysis reviews your code. Static code analysis still can't replace human interaction or intervention, but it will reduce them where they aren't necessary. It's similar to tests that fail because of a BC break: a feature that isn't working properly or breaks another part of the code shouldn't be ready for review by people who need their time for their own tasks. It's an early feedback loop within a development task, so remember to have a fitting and helpful configuration for your static code analysis too.

Tests - The handy feedback


These aren't the only ways to take control of your tests and your development processes. If it is hard or complex to write tests for a class, you probably have overly complicated code, and you need to tame that complexity through a change of design or by splitting up functionality. This makes writing tests a just-in-time metric and a feedback channel for rethinking your current development. You are experiencing the maintainability of your code first hand. Solving a problem comes first, and then you have to decide how to fit your solution into your development. If tests don't influence and change how you write code over a longer period of time, you are probably still trying to bend your tests and code instead of reassessing your structure, design and tasks. Your tests will be the judge of the development part of your solution.

This is the first part of (maybe?) follow-up blog posts about writing tests, for those who struggle with them during their first steps. There are more interesting topics when it comes to creating code, and if you'd like to read a deep dive into one of the mentioned strategies or test types, or about maintaining complex project code, you can leave a comment with your suggestions and feedback.


About Claudio Zizza
I've been working as a software developer for more than 15 years and have used many languages and frameworks during that time, like PHP (of course), ASP, .NET, JavaScript, Symfony, Zend Framework and more. I also contribute to open source communities and am part of the PHP Usergroup in Karlsruhe, Germany.
Follow me on Twitter (@SenseException)