Schluessel: 3.45 Generating parsers by formal grammar

3.45 Generating parsers by formal grammar

Procedures for generating parsers by formal grammar.
These procedures can generate a LALR(1) parser.
These procedures also can generate a tokenizer using regular expression.
In the parser which is generated by these procedures, any procedures which return two values, the first value is a symbol expressing a terminal symbol and the second value is an semantic object, can be a lexical analyser.
Regular expression which is available in these procedures is shown as follows. Java standard regular expression can’t use in these procedures.

concatenate r and s

r|s

r or s

repeat r more than 0 times

repeat r more than 1 time

optional

(r)

r itself

any character except newline

[a-zA]

character set

[^a-zA]

complement character set

Function: make-grammar grammar

creates a format grammar object which can use by procedure parse-grammar.
Syntax of grammar is shown as follows.

((left-value-of-grammar-rule (right-value-of-grammar-rule …) procedure-when-the-rule-is-reduced
 …)

Terminal symbols and nonterminal symbols must be Scheme symbols. The start symbol must be the symbol upper case S.
"Procedure-when-the-rule-is-reduced" will be called with the results of right-value-of-grammar-rule. For example, when there is a rules shown as follows (these are a rules defining usual arithmetic operation),

S → S + T
S → S - T
S → T
T → T * F
T → T / F
T → F
F → lparen S rparen
F → num

definition of calculating obeying the grammar is shown as follows.

  (make-grammar
    `((S (S + T)     ,(lambda (t1 x t2) (+ t1 t2)))
      (S (S - T)     ,(lambda (t1 x t2) (- t1 t2)))
      (S (T)         ,(lambda (t1) t1))
      (T (T * F)     ,(lambda (f1 x f2) (* f1 f2)))
      (T (T / F)     ,(lambda (f1 x f2) (* f1 f2)))
      (T (F)         ,(lambda (f1) f1))
      (F (lparen S rparen)
         ,(lambda (x s1 y) s1))
      (F (num)
         ,(lambda (v1) (string->number v1)))))

Symbols lparen, rparen and num will be given by lexical analyser.
If the rule has a conflicts, this procedure doesn’t report any errors.
To get the conflicts you should use procedure get-conflicts.

Function: parse-grammar parser-object lexer

parses objects with the given grammar and lexical analyser.
"Lexical analyser" is a procedure which returns two values, the first value is a symbol expressing a terminal symbol and the second value is an semantic object.

Function: get-conflicts parser-object

gets conflicts about the grammar rule. Syntax of the result is shown as follows.

(…
 (shift-reduce  terminal symbol to shift
              terminal symbol to reduce)   ;shift-reduce conflict
 (reduce-reduce terminal symbol to reduce
              terminal symbol to reduce)   ;reduce-reduce conflict
 …)

Function: make-regexp-tokenize-pattern lexer-patterns

generates a tokenizer by regular expression. Syntax of lexer-patterns is shown as follows.

((string representing a regulae expression . termial symbol)
 …)

Example 1: a tokenizer which tokenize arithmetic operators and numbers.

(make-regexp-tokenize-pattern
    '(("[0-9]+(\\.[0-9]+)?" . num
      ("[ 	
      ]+" . whitespace)  ; space+tab+newline
      ("\\(" . lparen
      ("\\)" . rparen
      ("\\+" . +
      ("-" . -
      ("\\*" . *
      ("\\/" . /))

Function: make-regexp-tokenizer-string tokenizer-pattern string

creates a tokenizer. string is a string to be tokenized.

Function: make-regexp-tokenizer-port tokenizer-pattern iport

creates a tokenizer. iport is an input port to be tokenized.

Function: tokenize tokenizer

tokenizes the input and returns a termial symbol and matched string.

This document was generated on August 9, 2012 using texi2html 5.0.