GNU Bison XML Automaton Report

Table of Contents

Introduction

GNU Bison is a general-purpose compiler-compiler (parser generator) that converts an annotated context-free grammar into an LALR(1) or GLR parser for that grammar. Developing a parser can be a challenging task, requiring extensive amounts of debugging and tracing. This debugging is impossible without a description of the produced automaton, that contains detailed information about automaton states and tokens that trigger state transitions. Such a description is generated by Bison when invoked with the `--report' or `-v' option. It is stored in a plain text file named *.output and contains the following information:

The plain text format traditionally used in parser output reports makes navigation in these files difficult, especially in case of large and complex parsers. Producing reports in an XML format may be a solution to this problem. Files in XML are machine-readable while remaining relatively human-legible. The XML language allows for performing a wide range of transformations, thus an XML report can easily be converted into another format (including the traditional plain text one). In particular, it can be converted into an HTML page with rich text formatting and appropriate hyperlinks, facilitating the navigation through it. This would increase efficiency of reading, analyzing, and navigating through the report.

This document describes format for XML automaton reports.
This format and its implementation is now part of the Bison mainline.

XML grammar

Let's take as an example the calc.y grammar used in the GNU Bison Manual:

  %token NUM STR
  %left '+' '-'
  %left '*'
  %%
  exp: exp '+' exp
     | exp '-' exp
     | exp '*' exp
     | exp '/' exp
     | NUM
     ;
  useless: STR;
  %%

Running Bison with `--report=all' creates a full automaton report, whose content has been outlined in the Introduction (the detailed description is available in the manual.) An online copy of the produced report is available for your reference.

This section describes the approach used to transform this sample .output file into an XML format.

The XML report starts with the <bison-xml-report> element containing the version attribute.

  <?xml version="1.0"?>
  <bison-xml-report version="2.3a+">
  ...
  </bison-xml-report>

The following elements are allowed inside <bison-xml-report>:

Reductions

The <reductions> element contains information about unused terminals, useless nonterminals and rules. It allows the following elements inside:

  <reductions>

    <useless>
      <nonterminals>
        <nonterminal>useless</nonterminal>
      </nonterminals>
      <rules>
        <rule number="6">
          <lhs>useless</lhs>
          <rhs>
            <symbol class="terminal">STR</symbol>
          </rhs>
        </rule>
      </rules>
    </useless>

    <unused>
      <terminals>
        <terminal>STR</terminal>
      </terminals>
    </unused>

  </reductions>

The <useless> element contains <nonterminals> and <rules>. Both of them may contain zero or more child elements. The <unused> element contains <terminals> with zero or more child elements.

The <nonterminal> and <terminal> elements contain text, which is a symbol name.

The <rule> element, with a rule number written as the value of the number attribute, contains two children: <lhs> (a left-hand side of the production) and <rhs> (the right-hand side of the production). The former contains the name of a nonterminal symbol this rule reduces to, and the latter contains zero or more <symbol> and <empty> elements.

The <symbol> element contains the name of a symbol used in RHS of the production. The class attribute tells about the class of this symbol, which can be terminal or nonterminal.

Conflicts

  <conflicts>
    <conflict state="8" num="1" type="shift/reduce"/>
    <conflict state="9" num="1" type="shift/reduce"/>
    <conflict state="10" num="1" type="shift/reduce"/>
    <conflict state="11" num="4" type="shift/reduce"/>
  </conflicts>

The <conflicts> element has zero or more <conflict> elements each of which describes a state that has non-solved conflicts along with their number and type.

Grammar

The <grammar> element contains a summary of the grammar (<rules>) and information about the symbols usage (<terminals>, <nonterminals>).

  <grammar>

    <rules>
      <rule number="0">
        <lhs>$accept</lhs>
        <rhs>
          <symbol class="nonterminal">exp</symbol>
          <symbol class="terminal">$end</symbol>
        </rhs>
      </rule>
      ...
    </rules>

    <terminals>
      <terminal number="0" name="$end">
        <rule>0</rule>
      </terminal>
      ...
    </terminals>

    <nonterminals>
      <nonterminal number="9" name="$accept">
        <left>
          <rule>0</rule>
        </left>
      </nonterminal>
      ...
    </nonterminals>

  </grammar>

Automaton

The <automaton> element contains a set of items also known as pointed rules (<item>) with look-aheads (<lookaheads> - when `--report=all' or `--report=lookahead' was specified). It also contains <actions> element with possible <transitions>, <errors>, <reductions> and <solved-conflicts> enclosed elements.

 <automaton>
   ...
   <state number="8">
     <itemset>
       <item rule-number="1" point="1"/>
       <item rule-number="1" point="3">
         <lookaheads>
           <symbol class="terminal">$end</symbol>
           <symbol class="terminal">'+'</symbol>
           <symbol class="terminal">'-'</symbol>
           <symbol class="terminal">'/'</symbol>
         </lookaheads>
       </item>
       <item rule-number="2" point="1"/>
       <item rule-number="3" point="1"/>
       <item rule-number="4" point="1"/>
     </itemset>

     <actions>
       <transitions>
         <transition type="shift" symbol="'*'" state="6"/>
         <transition type="shift" symbol="'/'" state="7"/>
       </transitions>
       <errors/>
       <reductions>
         <reduction symbol="'/'" rule="1" enabled="false"/>
         <reduction symbol="$default" rule="1" enabled="true"/>
       </reductions>
     </actions>

     <solved-conflicts>
       <resolution rule="1" symbol="'+'" type="reduce">%left '+'</resolution>
       <resolution rule="1" symbol="'-'" type="reduce">%left '-'</resolution>
       <resolution rule="1" symbol="'*'" type="shift">'+' &lt; '*'</resolution>
     </solved-conflicts>

   </state>
   ...
 </automaton>

XML Transformation

XSLT is the best known XML transformation language. It transforms a given XML into other XML or any other human-readable format, as, for instance, HTML or plain text. XSLT is heavily influenced by functional languages.

Three complete XSLT transformations are available, one for converting an XML report to Bison's traditonal *.output file format (including all text formatting and indentation!), another one, for converting the report to an HTML page, and last one, for converting the automaton to Dot graph:

Example usage of xml2text.xsl:

  $ xsltproc xslt/xml2text.xsl calc.xml >calc.output

Example usage of xml2xhtml.xsl:

  $ xsltproc xslt/xml2xhtml.xsl calc.xml >calc.html

Example usage of xml2dot.xsl:

  $ xsltproc xslt/xml2dot.xsl calc.xml >calc.dot

Samples

Grammar Original .output New XML Report Processed with xml2text.xsl Processed with xml2xhtml.xsl
anubis.y anubis.output anubis.xml anubis.output anubis.xml.html
awk.y awk.output awk.xml awk.output awk.xml.html
bison.y bison.output bison.xml bison.output bison.xml.html
calc.y calc.output calc.xml calc.output calc.xml.html
c.y c.output c.xml c.output c.xml.html
errors.y errors.output errors.xml errors.output errors.xml.html
pascal.y pascal.output pascal.xml pascal.output pascal.xml.html
rewrite.y rewrite.output rewrite.xml rewrite.output rewrite.xml.html
sieve.y sieve.output sieve.xml sieve.output sieve.xml.html

Other Articles