Efficient Grammar Fuzzing
Overview
This article discusses the Efficient Grammar Fuzzing chapter from The Fuzzing Book, exploring how its content could potentially be useful to the development of our tool, chasten. This article builds on the Fuzzing with Grammars, aiming for a better implementation. Let’s learn more!
Summary
This chapter opens with outlining the issues with the simple_grammar_fuzzer
introduced previously, which are that it is inefficient and hard to control. It can continue adding parentheses indefinitely, for example, and has a complexity of \(O(n^2)\) as part of this. Therefore, derivation trees are now introduced, which are a form of visualization for the steps of a grammar. Below is an example of a tree built from the expression 2 + 2
.
This representation works well for a higher-level understanding, but is not a feasible way to show in code as that stands. The code representation is possible through the use of a combination of tuples and lists, using the formula (SYMBOL_NAME, CHILDREN)
. The following source code example shows how to combine tuples and lists to express a derivation tree:
= ("<start>",
DerivationTree "<expr>",
[("<expr>", None),
[(" + ", []),
("<term>", None)]
( )])
Each node has its own sub-nodes, until it reaches the outermost “leaves” of the derivation tree. Using this, the chapter then outlines how to add functionality for traversing and expanding nodes/whole trees using the methods it defines for a GrammarFuzzer
class. What follows is an example of a method for expanding a node randomly for fuzzing purposes:
class GrammarFuzzer(GrammarFuzzer):
def choose_node_expansion(self, node: DerivationTree,
-> int:
children_alternatives: List[List[DerivationTree]]) """Return index of expansion in `children_alternatives` to be selected.
'children_alternatives`: a list of possible children for `node`.
Defaults to random. To be overloaded in subclasses."""
return random.randrange(0, len(children_alternatives))
From there, limits are defined for maximum and minimum costs as another manner of expanding that is less random. The minimum costs help with closing the tree, to avoid some of the concerns about inefficiency surrounding the simple_grammar_fuzzer
. An example of expanding by minimum costs is presented in the following source code segment:
class GrammarFuzzer(GrammarFuzzer):
def expand_node_min_cost(self, node: DerivationTree) -> DerivationTree:
if self.log:
print("Expanding", all_terminals(node), "at minimum cost")
return self.expand_node_by_cost(node, min)
And then the expanding by maximum costs, good for using to start the tree, works as follows:
class GrammarFuzzer(GrammarFuzzer):
def expand_node_max_cost(self, node: DerivationTree) -> DerivationTree:
if self.log:
print("Expanding", all_terminals(node), "at maximum cost")
return self.expand_node_by_cost(node, max)
These three forms of expansion can be applied together in order to achieve a more efficient form of fuzzing with grammars. By using the GrammarFuzzer
class and the methods created for it, tests can be generated much more quickly, with the additional benefit of creating smaller inputs over which we have increased control.
Reflection
This chapter helped to grow our understanding of grammar-based fuzzing. Before, we would not have known that there were more ways to perform grammar-based fuzzing. Grammar Fuzzing focuses on the concept of grammar-based fuzzing and highlights some of its initial flaws as well as how to address and debug them. There are also many drawbacks associated with grammar-based fuzzing, such as efficiency, infinite loops, and bugs that can occur by not paying careful attention to the grammar rules set in place. Here are two concepts to remember:
Derivation Trees: Deviation trees are a visual representation of the grammar of a string. They create a tree-like structure that breaks down the grammar into smaller and smaller components until you reach the end of the tree. Deviation trees are useful for understanding how a string conforms to its grammar and identifying deviations or errors.
Grammar Implementation: The chapter discusses how to implement grammars in data structures such as tuples and lists. This is a crucial step in grammar-based fuzzing, as it defines the rules and structures that the fuzzer will use to generate test cases.
The use of grammar-based fuzzing can be a powerful tool for testing software, but this also has many trade-offs. Simple grammar fuzzing can be really inefficient, especially when dealing with large grammars and complex language structures. These inefficiencies become apparent as the function has to iterate over the generated string, searching for matching symbols.
Use Cases
The implementation of efficient grammar fuzzing, as described in the chapter, offers a good opportunity for improving the testing of our code in the development of Chasten. Since our tool checks XPath expressions, using the three-phase expansion approach would create a highly efficient testing tool. Here is an example of an XPath, //FunctionDef/element[position() = 3]
this selects the third element node within a FunctionDef node. This specific pattern narrows down the selection to a particular position within the document structure, exploring different variations or alterations of this pattern is essential in testing to cover diverse scenarios. An efficient fuzzer can generate a multitude of XPath expressions with various modifications, like changing the positional value, using different node types, or incorporating more conditions.