GNU Bison is a general-purpose compiler-compiler (parser generator) that converts an annotated context-free grammar into an LALR(1) or GLR parser for that grammar. Developing a parser can be a challenging task, requiring extensive amounts of debugging and tracing. This debugging is impossible without a description of the produced automaton, that contains detailed information about automaton states and tokens that trigger state transitions. Such a description is generated by Bison when invoked with the `--report' or `-v' option. It is stored in a plain text file named *.output and contains the following information:
The plain text format traditionally used in parser output reports makes navigation in these files difficult, especially in case of large and complex parsers. Producing reports in an XML format may be a solution to this problem. Files in XML are machine-readable while remaining relatively human-legible. The XML language allows for performing a wide range of transformations, thus an XML report can easily be converted into another format (including the traditional plain text one). In particular, it can be converted into an HTML page with rich text formatting and appropriate hyperlinks, facilitating the navigation through it. This would increase efficiency of reading, analyzing, and navigating through the report.
This document describes format for XML automaton reports.
This format and its implementation is now part of the Bison
mainline.
Let's take as an example the calc.y grammar used in the GNU Bison Manual:
%token NUM STR %left '+' '-' %left '*' %% exp: exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | NUM ; useless: STR; %%
Running Bison with `--report=all' creates a full automaton report, whose content has been outlined in the Introduction (the detailed description is available in the manual.) An online copy of the produced report is available for your reference.
This section describes the approach used to transform this sample .output file into an XML format.
The XML report starts with the <bison-xml-report> element containing the version attribute.
<?xml version="1.0"?> <bison-xml-report version="2.3a+"> ... </bison-xml-report>
The following elements are allowed inside <bison-xml-report>:
The <reductions> element contains information about unused terminals, useless nonterminals and rules. It allows the following elements inside:
<reductions> <useless> <nonterminals> <nonterminal>useless</nonterminal> </nonterminals> <rules> <rule number="6"> <lhs>useless</lhs> <rhs> <symbol class="terminal">STR</symbol> </rhs> </rule> </rules> </useless> <unused> <terminals> <terminal>STR</terminal> </terminals> </unused> </reductions>
The <useless> element contains <nonterminals> and <rules>. Both of them may contain zero or more child elements. The <unused> element contains <terminals> with zero or more child elements.
The <nonterminal> and <terminal> elements contain text, which is a symbol name.
The <rule> element, with a rule number written as the value of the number attribute, contains two children: <lhs> (a left-hand side of the production) and <rhs> (the right-hand side of the production). The former contains the name of a nonterminal symbol this rule reduces to, and the latter contains zero or more <symbol> and <empty> elements.
The <symbol> element contains the name of a symbol used in RHS of the production. The class attribute tells about the class of this symbol, which can be terminal or nonterminal.
<conflicts> <conflict state="8" num="1" type="shift/reduce"/> <conflict state="9" num="1" type="shift/reduce"/> <conflict state="10" num="1" type="shift/reduce"/> <conflict state="11" num="4" type="shift/reduce"/> </conflicts>
The <conflicts> element has zero or more <conflict> elements each of which describes a state that has non-solved conflicts along with their number and type.
The <grammar> element contains a summary of the grammar (<rules>) and information about the symbols usage (<terminals>, <nonterminals>).
<grammar> <rules> <rule number="0"> <lhs>$accept</lhs> <rhs> <symbol class="nonterminal">exp</symbol> <symbol class="terminal">$end</symbol> </rhs> </rule> ... </rules> <terminals> <terminal number="0" name="$end"> <rule>0</rule> </terminal> ... </terminals> <nonterminals> <nonterminal number="9" name="$accept"> <left> <rule>0</rule> </left> </nonterminal> ... </nonterminals> </grammar>
The <automaton> element contains a set of items also known as pointed rules (<item>) with look-aheads (<lookaheads> - when `--report=all' or `--report=lookahead' was specified). It also contains <actions> element with possible <transitions>, <errors>, <reductions> and <solved-conflicts> enclosed elements.
<automaton> ... <state number="8"> <itemset> <item rule-number="1" point="1"/> <item rule-number="1" point="3"> <lookaheads> <symbol class="terminal">$end</symbol> <symbol class="terminal">'+'</symbol> <symbol class="terminal">'-'</symbol> <symbol class="terminal">'/'</symbol> </lookaheads> </item> <item rule-number="2" point="1"/> <item rule-number="3" point="1"/> <item rule-number="4" point="1"/> </itemset> <actions> <transitions> <transition type="shift" symbol="'*'" state="6"/> <transition type="shift" symbol="'/'" state="7"/> </transitions> <errors/> <reductions> <reduction symbol="'/'" rule="1" enabled="false"/> <reduction symbol="$default" rule="1" enabled="true"/> </reductions> </actions> <solved-conflicts> <resolution rule="1" symbol="'+'" type="reduce">%left '+'</resolution> <resolution rule="1" symbol="'-'" type="reduce">%left '-'</resolution> <resolution rule="1" symbol="'*'" type="shift">'+' < '*'</resolution> </solved-conflicts> </state> ... </automaton>
XSLT is the best known XML transformation language. It transforms a given XML into other XML or any other human-readable format, as, for instance, HTML or plain text. XSLT is heavily influenced by functional languages.
Three complete XSLT transformations are available, one for converting an XML report to Bison's traditonal *.output file format (including all text formatting and indentation!), another one, for converting the report to an HTML page, and last one, for converting the automaton to Dot graph:
Example usage of xml2text.xsl:
$ xsltproc xslt/xml2text.xsl calc.xml >calc.output
Example usage of xml2xhtml.xsl:
$ xsltproc xslt/xml2xhtml.xsl calc.xml >calc.html
Example usage of xml2dot.xsl:
$ xsltproc xslt/xml2dot.xsl calc.xml >calc.dot
Grammar | Original .output | New XML Report | Processed with xml2text.xsl | Processed with xml2xhtml.xsl |
---|---|---|---|---|
anubis.y | anubis.output | anubis.xml | anubis.output | anubis.xml.html |
awk.y | awk.output | awk.xml | awk.output | awk.xml.html |
bison.y | bison.output | bison.xml | bison.output | bison.xml.html |
calc.y | calc.output | calc.xml | calc.output | calc.xml.html |
c.y | c.output | c.xml | c.output | c.xml.html |
errors.y | errors.output | errors.xml | errors.output | errors.xml.html |
pascal.y | pascal.output | pascal.xml | pascal.output | pascal.xml.html |
rewrite.y | rewrite.output | rewrite.xml | rewrite.output | rewrite.xml.html |
sieve.y | sieve.output | sieve.xml | sieve.output | sieve.xml.html |