The Harmful Misuse of Code Coverage

What's code coverage?

Code coverage is a metric used in software development that measures how much of our code is exercised by tests. It breaks down into four components: Statements, Branches, Functions, and Lines. Each component gets its own score, usually a percentage, that is meant to indicate how thoroughly your code is tested.

You can think of code coverage like a car inspection. Imagine you take your car in for an inspection, and the mechanic checks the tires, brakes, and lights. If they don't check the engine, you get a 75% coverage score, since three of the four parts were checked (don't test me on my car manufacturing skills, please!).
In large codebases, it's common to set a coverage threshold that must be met before new code can be merged into the main branch. So if you want to add a new feature to your car, let's say a multimedia system, without testing it, your score drops to 60% (three of five parts checked). If that is below the threshold, you won't be able to add the feature.

Let's explore the pros of code coverage tools and why people use them.

The benefits

Enforces Testing: If your team isn’t accustomed to writing tests, setting a coverage threshold that blocks pull requests (PRs) if not met can enforce the practice of writing tests for every piece of code.
Compliance: Some organizations use coverage metrics to ensure that applications are sufficiently tested before reaching production.
Boosts Confidence: High code coverage can create a sense of assurance that the application is well-tested.
Catching Bugs: In some cases, code coverage tools can act as "training wheels" for beginners - a way to verify that each line of code is covered and therefore catch bugs.

After seeing the pros of code coverage, let me ask you a question. What coverage score do you think this test produces?

test("renders App component", () => {
  render(<App />);

  expect(true).toBeTruthy();
});

Is it 10? 20? 50? 100?

Well, in this case, the coverage score is 100%. The reason lies in the way coverage tools calculate the score.
To calculate it, code coverage tools identify which lines of code run while the test block executes. They don't look at what the test actually asserts.
Going back to our car analogy, one mechanic might do a good job and actually test that the brakes work, while another might just glance to see that they are there. Both result in the same coverage score. In the example above, since our App component contains only JSX, 100% of its code runs when we call render. But does that give you any confidence that the App component behaves as it should?

Returning to our car inspection analogy, imagine we visit a new garage where the mechanics lack proper training. They give the brakes only a cursory look, failing to test them thoroughly. Although they mark the brakes as inspected and in good condition, the brakes could still fail at a crucial moment. Similarly, high code coverage might show that many parts of your code are executed during tests, but it doesn’t ensure that the tests are thorough or that critical functionality is adequately verified.
With that in mind, let's try to understand why I think the benefits listed above aren't as beneficial as they seem.

The dark side of the pros

I believe code coverage tools are generally not worth the effort, and here’s why the supposed benefits might not be as valuable as you think:

Enforces Testing: Enforcing tests is more about fostering a strong development culture. A coverage metric might lead to a counterproductive mindset where developers aim to “satisfy” the threshold without seeing the real value in testing, causing them to dislike the process.
Compliance: While numbers can drive decisions, the metrics provided by code coverage tools can be misleading. For example, what does 90% coverage really mean? If the remaining 10% could cause significant bugs, is it acceptable to leave it untested?
Boosts Confidence: True confidence comes from testing the application the way users interact with it. As demonstrated earlier, simply rendering a component can achieve 100% coverage without actually verifying any user-facing behavior or interactions.
Catching Bugs: While it's true that code coverage tools can help catch bugs, they can also lead to a false sense of security. Believing that 90% coverage is “enough” might make us overlook critical issues. Returning to the "training wheels" analogy, even though they can help you find issues, using "training wheels" will probably mean you're going slower or working harder to reach the same velocity.

If confidence is what we're looking for, instead of relying on code coverage metrics, consider focusing on test quality - write meaningful tests that reflect how users interact with your application.

Summary

In summary, I do believe that code coverage can be useful if you use it correctly, and setting a coverage threshold is not the way to do it.
Using it to see whether our tests have missed a part of the code can be beneficial. Thresholds are a different story: they can create a false sense of confidence and might not provide the value you expect, which is why I wouldn't recommend relying on them in a project. Instead, focus on writing high-quality tests, figuring out the purpose of your tests, understanding user behavior, ignoring implementation details, and adopting best practices and a good dev culture that ensure code reliability.
If your project enforces code coverage, I'd love to hear your thoughts and experiences, whether you find it beneficial or not. I’m here and also on X.
Until next time 👋
Thanks, Matan.