Parsing Inputs
Overview
This article covers the “Parsing Inputs” chapter from Fuzzing Book. This chapter builds on the concepts from the “Fuzzing with Grammars” chapter and the “Efficient Grammar Fuzzing” chapter. Let’s dive in to learn more about how input parsing can help us as software engineers!
Summary
This chapter uses grammars to parse a set of valid inputs and create derivation trees that can be used to fuzz, generate new valid inputs for testing. Two specific parsers are covered by the chapter: PEGParser
and EarleyParser
.
The PEGParser
utilizes predicate expression grammars (PEGs), which are very similar to context-free grammars (CFGs). The one key difference between the two types of grammars is that PEGs will stop at the first rule that matches, while CFGs will match will all possible rules. Building the derivation tree with only the first matches is the PEGParser’s way of resolving ambiguities. Below is an example of the PEGParser
:
class PEGParser(Parser):
def parse_prefix(self, text):
= self.unify_key(self.start_symbol(), text, 0)
cursor, tree return cursor, [tree]
We can use this parser to create a derivation tree:
= "1 + (2 * 3)"
mystring = PEGParser(EXPR_GRAMMAR)
peg for tree in peg.parse(mystring):
assert tree_to_string(tree) == mystring
display(display_tree(tree))
For the purpose of fuzzing, generation of strings, the use of the PEGParser is very effective. This is because a PEG cannot be re-interpreted as a CFG; therefore, a parser that uses CFGs is the best option. The EarleyParser
can use any CFG to parse. This parser will return all the possible derivation tree as a way to resolve ambiguities. These trees are returned in a list. An example of the EarleyParser
is below:
class EarleyParser(EarleyParser):
def parse(self, text):
= self.parse_prefix(text)
cursor, states = next((s for s in states if s.finished()), None)
start
if cursor < len(text) or not start:
raise SyntaxError("at " + repr(text[cursor:]))
= self.parse_forest(self.table, start)
forest for tree in self.extract_trees(forest):
yield self.prune_tree(tree)
We can use this parser to create derivation trees:
= "1 + (2 * 3)"
mystring = EarleyParser(EXPR_GRAMMAR)
earley for tree in earley.parse(mystring):
assert tree_to_string(tree) == mystring
display(display_tree(tree))
Reflection
In the chapter about how we use rules to create languages, we learned that these rules, which we call grammars, are really powerful. They are not just for making up sentences in a language. They can also take apart a sentence to show us how it was built.
When we look at a sentence, we can break it down into different parts that match up with the grammar rules. It is similar to a puzzle! This means that where each piece is a part of the sentence, and when you put them all together, you see the whole picture.
This time, we focused on taking sentences that follow the rules and breaking them down into these puzzle pieces, which we call derivation trees. This is helpful because once we know how the pieces fit together, we can switch them around to make new sentences. By changing the pieces just a little bit, we can create new sentences that still make sense according to the rules.
Use Cases
Utilization and implementation of PEGParser and EarleyParser can be an effective way to improve the Chasten tool. Both of these functions are easy to import and utilize, and because of the fact that these parsers write simple branched grammars that end once a run is established in a list, this results in our tool having faster grammar times.
Combinations of more complicated parsers such as an Ad-Hoc parser and formal parsers can also benefit our tools. By utilizing these nodes we can “compose” grammars, or introduce finer and finer details to our grammars to influence their significance in our programs without affecting their external structures or significance.