Achieving Coverage
Overview
We’re excited to share our first post about The Fuzzing Book! In this post, we’ll explain what code coverage is and how it helps us check how well our tests work. Code coverage shows us which parts of a program are actually used when a test runs. It helps us to assess how effectively a test suite exercises a program, especially when bugs are scarce. By measuring which parts of the code are executed during a test run, we can gauge the thoroughness of our tests and identify potential areas that may require additional scrutiny. This can be especially useful when there aren’t many bugs, helping us see how much of the program is being tested. We’ll also talk about different ways to measure this and make sure our tests are properly working.
Summary
We’ll be exploring two main approaches to measuring coverage: black-box testing and white-box testing. These methods differ in how they create tests. Black-box testing focuses on the specifications of the program (what the program is supposed to do), without looking at the internal code. On the other hand, white-box testing examines the program’s implementation (how the code is written) to guide the tests.
Understanding these types of testing is especially important for test generators, which aim to cover as much code as possible. But what exactly do black-box and white-box testing involve? Let’s break it down further using a CGI decoder. CGI encoding is used in URLs (i.e., Web addresses) to encode characters that would be invalid in a URL, such as blanks and certain punctuation: In CGI encoding, the string CGI Decoder
+
%xx
, where xx
is the two-digit hexadecimal equivalent.What do you think
"Hello, world!"
would look like if it was a CGI-encoded string?Click to Expand for the Answer
"Hello, world!"
would thus become "Hello%2c+world%21"
where 2c
and 21
are the hexadecimal equivalents of ‘,’ and ‘!’, respectively.If we had a function that could take an encoded string and decode it to its original form, what would that function look like?
Expand cell to see the
cgi_decode()
function:def cgi_decode(s: str) -> str:
"""Decode the CGI-encoded string `s`:
* replace '+' by ' '
* replace "%xx" by the character with hex number xx.
Return the decoded string. Raise `ValueError` for invalid inputs."""
# Mapping of hex digits to their integer values
= {
hex_values '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
'5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,
'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15,
}
= ""
t = 0
i while i < len(s):
= s[i]
c if c == '+':
+= ' '
t elif c == '%':
= s[i + 1], s[i + 2]
digit_high, digit_low += 2
i if digit_high in hex_values and digit_low in hex_values:
= hex_values[digit_high] * 16 + hex_values[digit_low]
v += chr(v)
t else:
raise ValueError("Invalid encoding")
else:
+= c
t += 1
i return t
How can we run tests?
As mentioned before, we can test the In the above case, we thus would have to test Here are four assertions (tests) that cover these four features. We can see that they all pass: Black-box testing has the advantage of detecting errors in a program’s expected behavior. Since it doesn’t rely on the actual code, tests can be designed even before the program is built. However, the downside is that black-box testing focuses only on what the program is supposed to do, not how it does it. As a result, it may miss certain details in the implementation, leaving parts of the code untested because the tests are only based on the specifications and not the full scope of the code. White-box testing is closely tied to the concept of covering structural features of the code. If a statement in the code is not executed during testing, for instance, this means that an error in this statement cannot be triggered either. White-box testing thus introduces a number of coverage criteria that have to be fulfilled before the test can be said to be sufficient. The most frequently used coverage criteria are: Let us consider This creates a situation similar to black-box testing, where the tests focus on specified behaviors. This happens often because programmers usually implement different behaviors in separate parts of the code. By testing those specific locations, the tests naturally end up covering the various behaviors outlined in the program’s specification. So, even though the tests are based on the program’s behavior, they still manage to cover much of the underlying code. The downside is that it may miss non-implemented behavior: If some specified functionality is missing, white-box testing will not find it. A great advantage of white-box testing is that it allows automatic tracking of whether certain parts of the code were executed during testing. This is done by adding special functionality to monitor the program’s execution. As the program runs, this functionality records which lines of code were executed, and afterward, this data is provided to the programmer. The programmer can then focus on creating tests to cover any uncovered parts of the code. An important part of white-box testing is “tracing.” In Python, tracing can be automated with the cgi_decode()
function with two different approaches to testing: black-box testing and white-box testing.Black-box testing
cgi_decode()
by the features specified and documented, including:'+'
;"%xx"
;assert cgi_decode('+') == ' '
assert cgi_decode('%20') == ' '
assert cgi_decode('abc') == 'abc'
try:
'%?a')
cgi_decode(assert False
except ValueError:
pass
White-box testing
if
and while
decision once being true, and once being false.) Besides these, there are far more coverage criteria, including sequences of branches taken, loop iterations taken (zero, one, many), data flows between variable definitions and usages, and many more.cgi_decode()
, above, and reason what we have to do such that each statement of the code is executed at least once. We’d have to cover:if c == '+'
if c == '%'
(one for valid input, one for invalid)else
case for all other characters.Tracing executions
sys.settrace(f)
command, where f
is a function that the interpreter calls every time a line of code is executed. By analyzing this tracing data, you can determine the code coverage — the percentage of code that has been tested. Higher coverage is better, as it indicates that more of the code has been tested, reducing the likelihood of undetected errors.
Reflection
This post helped us to understand how black-box and white-box testing play complementary roles in improving test coverage. Black-box testing checks if a program behaves as expected based on its specifications, while white-box testing ensures that all code paths are executed by examining the internal structure. Together, they provide a more thorough approach to testing, ensuring that both expected behaviors and edge cases are accounted for. Code coverage tracking, especially with tools like fuzzing, helps identify untested areas, improving overall test effectiveness and ensuring a more robust program.
Action Items
As we progress with fuzzing and code coverage analysis, it’s crucial to prioritize writing tests that not only improve coverage but are sustainable and well-documented. We must ensure that our fuzz tests are clear and adaptable, making it easier for others to understand the test logic and the areas of code they target. By emphasizing maintainability in our coverage tracking and testing processes, we’ll create a more resilient tool that can be expanded and enhanced long-term, serving the Allegheny College community and beyond.