Linux Manual Pages

Free Software * Books

Source Code

Free Media

Linux

leg(1)

NAME

   peg, leg - parser generators

SYNOPSIS

   peg [-hvV -ooutput] [filename ...]
   leg [-hvV -ooutput] [filename ...]

DESCRIPTION

   peg  and  leg  are  tools  for  generating  recursive-descent  parsers:
   programs that perform pattern matching on text.  They process a Parsing
   Expression  Grammar  (PEG)  [Ford  2004]  to  produce  a  program  that
   recognises legal sentences of that grammar.  peg processes PEGs written
   using the original syntax described by Ford; leg processes PEGs written
   using slightly different syntax and conventions that  are  intended  to
   make  it  an  attractive  replacement for parsers built with lex(1) and
   yacc(1).   Unlike  lex  and  yacc,  peg  and  leg   support   unlimited
   backtracking, provide ordered choice as a means for disambiguation, and
   can  combine  scanning  (lexical  analysis)  and   parsing   (syntactic
   analysis) into a single activity.

   peg  reads  the  specified filenames, or standard input if no filenames
   are given, for a grammar describing the parser to generate.   peg  then
   generates  a  C  source file that defines a function yyparse().  This C
   source file can be included in, or compiled and  then  linked  with,  a
   client  program.   Each  time  the  client  program calls yyparse() the
   parser consumes input text according to  the  parsing  rules,  starting
   from  the first rule in the grammar.  yyparse() returns non-zero if the
   input could be parsed according to the grammar; it returns zero if  the
   input could not be parsed.

   The  prefix 'yy' or 'YY' is prepended to all externally-visible symbols
   in the generated parser.  This  is  intended  to  reduce  the  risk  of
   namespace  pollution  in  client  programs.   (The  choice  of  'yy' is
   historical; see lex(1) and yacc(1), for example.)

OPTIONS

   peg and leg provide the following options:

   -h     prints a summary of available options and then exits.

   -ooutput
          writes the generated parser to the file output  instead  of  the
          standard output.

   -P     suppresses #line directives in the output.

   -v     writes verbose information to standard error while working.

   -V     writes version information to standard error then exits.

A SIMPLE EXAMPLE

   The  following peg input specifies a grammar with a single rule (called
   'start')  that  is  satisfied  when  the  input  contains  the   string
   "username".

       start <- "username"

   (The  quotation  marks  are not part of the matched text; they serve to
   indicate a literal string to be matched.)  In other words, yyparse() in
   the  generated  C  source  will  return non-zero only if the next eight
   characters read from the input spell the word "username".  If the input
   contains  anything  else, yyparse() returns zero and no input will have
   been consumed.  (Subsequent calls to yyparse() will also  return  zero,
   since  the  parser  is  effectively  blocked  looking  for  the  string
   "username".)  To ensure progress we can add an  alternative  clause  to
   the  'start' rule that will match any single character if "username" is
   not found.

       start <- "username"
              / .

   yyparse() now always returns non-zero (except at the very  end  of  the
   input).  To do something useful we can add actions to the rules.  These
   actions are performed after a complete match is  found  (starting  from
   the  first  rule)  and are chosen according to the 'path' taken through
   the grammar to match the input.  (Linguists  would  call  this  path  a
   'phrase marker'.)

       start <- "username"    { printf("%s\n", getlogin()); }
              / < . >         { putchar(yytext[0]); }

   The  first  line  instructs  the  parser to print the user's login name
   whenever it sees "username" in the input.  If  that  match  fails,  the
   second  line  tells  the parser to echo the next character on the input
   the standard output.  Our parser is now performing useful work: it will
   copy  the  input to the output, replacing all occurrences of "username"
   with the user's account name.

   Note the angle brackets ('<' and '>') that were  added  to  the  second
   alternative.   These  have  no  effect  on the meaning of the rule, but
   serve to delimit the text made available to the following action in the
   variable yytext.

   If  the  above  grammar is placed in the file username.peg, running the
   command

       peg -o username.c username.peg

   will save the corresponding parser in the file username.c.  To create a
   complete  program  this  parser  could  be  included  by a C program as
   follows.

       #include <stdio.h>      /* printf(), putchar() */
       #include <unistd.h>     /* getlogin() */

       #include "username.c"   /* yyparse() */

       int main()
       {
         while (yyparse())     /* repeat until EOF */
           ;
         return 0;
       }

PEG GRAMMARS

   A grammar consists of a set of named rules.

       name <- pattern

   The pattern contains one or more of the following elements.

   name   The element stands for the entire pattern in the rule  with  the
          given name.

   "characters"
          A  character  or  string  enclosed  in  double quotes is matched
          literally.  The ANSI C escape sequences  are  recognised  within
          the characters.

   'characters'
          A  character  or  string  enclosed  in  single quotes is matched
          literally, as above.

   [characters]
          A set of characters enclosed  in  square  brackets  matches  any
          single character from the set, with escape characters recognised
          as above.  If the set begins with an uparrow (^) then the set is
          negated (the element matches any character not in the set).  Any
          pair of characters separated with  a  dash  (-)  represents  the
          range  of characters from the first to the second, inclusive.  A
          single alphabetic character or  underscore  is  matched  by  the
          following set.

              [a-zA-Z_]

          Similarly,   the   following   matches    any  single  non-digit
          character.

              [^0-9]

   .      A dot matches any character.  Note that the only time this fails
          is at the end of file, where there is no character to match.

   ( pattern )
          Parentheses  are  used for grouping (modifying the precedence of
          the operators described below).

   { action }
          Curly braces surround actions.  The action is arbitrary C source
          code  to  be executed at the end of matching.  Any braces within
          the action must be properly nested.  Any  input  text  that  was
          matched  before  the action and delimited by angle brackets (see
          below) is made available within the action as  the  contents  of
          the character array yytext.  The length of (number of characters
          in) yytext is available in the variable yyleng.  (These variable
          names are historical; see lex(1).)

   <      An opening angle bracket always matches (consuming no input) and
          causes the parser to begin accumulating matched text.  This text
          will be made available to actions in the variable yytext.

   >      A  closing angle bracket always matches (consuming no input) and
          causes the parser to stop accumulating text for yytext.

   The above elements can be made  optional  and/or  repeatable  with  the
   following suffixes:

   element ?
          The  element  is  optional.   If  present  on  the  input, it is
          consumed and the match succeeds.  If not present on  the  input,
          no text is consumed and the match succeeds anyway.

   element +
          The element is repeatable.  If present on the input, one or more
          occurrences of element are consumed and the match succeeds.   If
          no  occurrences  of  element are present on the input, the match
          fails.

   element *
          The element is optional  and  repeatable.   If  present  on  the
          input,  one  or more occurrences of element are consumed and the
          match succeeds.  If no occurrences of element are present on the
          input, the match succeeds anyway.

   The  above elements and suffixes can be converted into predicates (that
   match arbitrary input text and subsequently  succeed  or  fail  without
   consuming that input) with the following prefixes:

   & element
          The  predicate  succeeds  only if element can be matched.  Input
          text scanned while matching element is  not  consumed  from  the
          input and remains available for subsequent matching.

   ! element
          The predicate succeeds only if element cannot be matched.  Input
          text scanned while matching element is  not  consumed  from  the
          input  and remains available for subsequent matching.  A popular
          idiom is

              !.

          which matches the end of file, after the last character  of  the
          input has already been consumed.

   A special form of the '&' predicate is provided:

   &{ expression }
          In  this  predicate  the  simple C expression (not statement) is
          evaluated immediately when the parser reaches the predicate.  If
          the  expression  yields non-zero (true) the 'match' succeeds and
          the parser continues with the next element in the  pattern.   If
          the  expression  yields  zero  (false) the 'match' fails and the
          parser backs up to look for an alternative parse of the input.

   Several elements  (with  or  without  prefixes  and  suffixes)  can  be
   combined  into  a  sequence  by  writing them one after the other.  The
   entire sequence matches only  if  each  individual  element  within  it
   matches, from left to right.

   Sequences   can   be   separated  into  disjoint  alternatives  by  the
   alternation operator '/'.

   sequence-1 / sequence-2 / ... / sequence-N
          Each sequence is tried in turn until one  of  them  matches,  at
          which  time  matching for the overall pattern succeeds.  If none
          of the sequences matches then the match of the  overall  pattern
          fails.

   Finally,  the  pound  sign  (#)  introduces  a comment (discarded) that
   continues until the end of the line.

   To summarise the above, the  parser  tries  to  match  the  input  text
   against  a  pattern  containing  literals,  names  (representing  other
   rules),  and  various  operators  (written   as   prefixes,   suffixes,
   juxtaposition  for  sequencing and and infix alternation operator) that
   modify how the elements within the pattern are  matched.   Matches  are
   made  from left to right, 'descending' into named sub-rules as they are
   encountered.  If the matching process fails, the parser  'back  tracks'
   ('rewinding'  the  input  appropriately  in  the  process)  to find the
   nearest alternative 'path' through the grammar.   In  other  words  the
   parser  performs  a  depth-first,  left-to-right  search  for the first
   successfully-matching path through the rules.  If  found,  the  actions
   along  the  successful  path  are  executed  (in  the  order  they were
   encountered).

   Note that predicates are evaluated immediately during the search for  a
   successful  match,  since  they contribute to the success or failure of
   the search.  Actions, however, are evaluated only  after  a  successful
   match has been found.

PEG GRAMMAR FOR PEG GRAMMARS

   The grammar for peg grammars is shown below.  This will both illustrate
   and formalise the above description.

       Grammar         <- Spacing Definition+ EndOfFile

       Definition      <- Identifier LEFTARROW Expression
       Expression      <- Sequence ( SLASH Sequence )*
       Sequence        <- Prefix*
       Prefix          <- AND Action
                        / ( AND | NOT )? Suffix
       Suffix          <- Primary ( QUERY / STAR / PLUS )?
       Primary         <- Identifier !LEFTARROW
                        / OPEN Expression CLOSE
                        / Literal
                        / Class
                        / DOT
                        / Action
                        / BEGIN
                        / END

       Identifier      <- < IdentStart IdentCont* > Spacing
       IdentStart      <- [a-zA-Z_]
       IdentCont       <- IdentStart / [0-9]
       Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                        / ["] < ( !["] Char  )* > ["] Spacing
       Class           <- '[' < ( !']' Range )* > ']' Spacing
       Range           <- Char '-' Char / Char
       Char            <- '\\' [abefnrtv'"\[\]\\]
                        / '\\' [0-3][0-7][0-7]
                        / '\\' [0-7][0-7]?
                        / '\\' '-'
                        / !'\\' .
       LEFTARROW       <- '<-' Spacing
       SLASH           <- '/' Spacing
       AND             <- '&' Spacing
       NOT             <- '!' Spacing
       QUERY           <- '?' Spacing
       STAR            <- '*' Spacing
       PLUS            <- '+' Spacing
       OPEN            <- '(' Spacing
       CLOSE           <- ')' Spacing
       DOT             <- '.' Spacing
       Spacing         <- ( Space / Comment )*
       Comment         <- '#' ( !EndOfLine . )* EndOfLine
       Space           <- ' ' / '\t' / EndOfLine
       EndOfLine       <- '\r\n' / '\n' / '\r'
       EndOfFile       <- !.
       Action          <- '{' < [^}]* > '}' Spacing
       BEGIN           <- '<' Spacing
       END             <- '>' Spacing

LEG GRAMMARS

   leg is a variant of peg that adds some features of lex(1) and  yacc(1).
   It differs from peg in the following ways.

   %{ text... %}
          A declaration section can appear anywhere that a rule definition
          is expected.  The text between the delimiters '%{' and  '%}'  is
          copied  verbatim  to the generated C parser code before the code
          that implements the parser itself.

   name = pattern
          The 'assignment' operator replaces the left arrow operator '<-'.

   rule-name
          Hyphens can appear as letters  in  the  names  of  rules.   Each
          hyphen is converted into an underscore in the generated C source
          code.  A single hyphen '-' is a legal rule name.

              -       = [ \t\n\r]*
              number  = [0-9]+                 -
              name    = [a-zA-Z_][a-zA_Z_0-9]* -
              l-paren = '('                    -
              r-paren = ')'                    -

          This example shows how ignored whitespace can  be  obvious  when
          reading the grammar and yet unobtrusive when placed liberally at
          the end of every rule associated with a lexical element.

   seq-1 | seq-2
          The alternation operator is vertical bar '|' rather than forward
          slash '/'.  The peg rule

              name <- sequence-1
                    / sequence-2
                    / sequence-3

          is therefore written

              name = sequence-1
                   | sequence-2
                   | sequence-3
                   ;

          in  leg  (with  the final semicolon being optional, as described
          next).

   @{ action }
          Actions prefixed with an 'at' symbol will  be  performed  during
          parsing,  at  the  time  they are encountered while matching the
          input text with a rule.  Because of  back-tracking  in  the  PEG
          parsing  algorithm, actions prefixed with '@' might be performed
          multiple times for the same input text.  (The usual behviour  of
          actions  is  that  they are saved up until matching is complete,
          and then those  that  are  part  of  the  final  derivation  are
          performed  in  left-to-right  order.)   The  variable  yytext is
          available within these actions.

   exp ~ { action }
          A  postfix  operator  ~{ action }  can  be  placed   after   any
          expression  and  will  behave  like a normal action (arbitrary C
          code) except that it is invoked only when exp fails.   It  binds
          less  tightly  than  any  other  operator except alternation and
          sequencing, and is intended to make error handling and  recovery
          code  easier  to  write.   Note  that  yytext and yyleng are not
          available inside these actions, but the pointer variable  yy  is
          available to give the code access to any user-defined members of
          the parser state (see "CUSTOMISING  THE  PARSER"  below).   Note
          also  that exp is always a single expression; to invoke an error
          action for any failure within a sequence,  parentheses  must  be
          used to group the sequence into a single expression.

              rule = e1 e2 e3 ~{ error("e[12] ok; e3 has failed"); }
                   | ...

              rule = (e1 e2 e3) ~{ error("one of e[123] has failed"); }
                   | ...

   pattern ;
          A semicolon punctuator can optionally terminate a pattern.

   %% text...
          A  double  percent  '%%' terminates the rules (and declarations)
          section of the grammar.   All  text  following  '%%'  is  copied
          verbatim  to  the  generated  C  parser  code  after  the parser
          implementation code.

   $$ = value
          A sub-rule can  return  a  semantic  value  from  an  action  by
          assigning  it  to the pseudo-variable '$$'.  All semantic values
          must have the same type (which defaults to  'int').   This  type
          can be changed by defining YYSTYPE in a declaration section.

   identifier:name
          The  semantic  value  returned  (by  assigning to '$$') from the
          sub-rule name is associated  with  the  identifier  and  can  be
          referred to in subsequent actions.

   The desk calculator example below illustrates the use of '$$' and ':'.

LEG EXAMPLE: A DESK CALCULATOR

   The  extensions  in  leg  described  above  allow  useful  parsers  and
   evaluators (including declarations, grammar  rules,  and  supporting  C
   functions  such  as 'main') to be kept within a single source file.  To
   illustrate this we show a simple desk calculator  supporting  the  four
   common  arithmetic  operators  and  named  variables.  The intermediate
   results of arithmetic evaluation will be  accumulated  on  an  implicit
   stack by returning them as semantic values from sub-rules.

       %{
       #include <stdio.h>     /* printf() */
       #include <stdlib.h>    /* atoi() */
       int vars[26];
       %}

       Stmt    = - e:Expr EOL                  { printf("%d\n", e); }
               | ( !EOL . )* EOL               { printf("error\n"); }

       Expr    = i:ID ASSIGN s:Sum             { $$ = vars[i] = s; }
               | s:Sum                         { $$ = s; }

       Sum     = l:Product
                       ( PLUS  r:Product       { l += r; }
                       | MINUS r:Product       { l -= r; }
                       )*                      { $$ = l; }

       Product = l:Value
                       ( TIMES  r:Value        { l *= r; }
                       | DIVIDE r:Value        { l /= r; }
                       )*                      { $$ = l; }

       Value   = i:NUMBER                      { $$ = atoi(yytext); }
               | i:ID !ASSIGN                  { $$ = vars[i]; }
               | OPEN i:Expr CLOSE             { $$ = i; }

       NUMBER  = < [0-9]+ >    -               { $$ = atoi(yytext); }
       ID      = < [a-z]  >    -               { $$ = yytext[0] - 'a'; }
       ASSIGN  = '='           -
       PLUS    = '+'           -
       MINUS   = '-'           -
       TIMES   = '*'           -
       DIVIDE  = '/'           -
       OPEN    = '('           -
       CLOSE   = ')'           -

       -       = [ \t]*
       EOL     = '\n' | '\r\n' | '\r' | ';'

       %%

       int main()
       {
         while (yyparse())
           ;
         return 0;
       }

LEG GRAMMAR FOR LEG GRAMMARS

   The grammar for leg grammars is shown below.  This will both illustrate
   and formalise the above description.

       grammar =       -
                       ( declaration | definition )+
                       trailer? end-of-file

       declaration =   '%{' < ( !'%}' . )* > RPERCENT

       trailer =       '%%' < .* >

       definition =    identifier EQUAL expression SEMICOLON?

       expression =    sequence ( BAR sequence )*

       sequence =      error+

       error =         prefix ( TILDE action )?

       prefix =        AND action
       |               ( AND | NOT )? suffix

       suffix =        primary ( QUERY | STAR | PLUS )?

       primary =       identifier COLON identifier !EQUAL
       |               identifier !EQUAL
       |               OPEN expression CLOSE
       |               literal
       |               class
       |               DOT
       |               action
       |               BEGIN
       |               END

       identifier =    < [-a-zA-Z_][-a-zA-Z_0-9]* > -

       literal =       ['] < ( !['] char )* > ['] -
       |               ["] < ( !["] char )* > ["] -

       class =         '[' < ( !']' range )* > ']' -

       range =         char '-' char | char

       char =          '\\' [abefnrtv'"\[\]\\]
       |               '\\' [0-3][0-7][0-7]
       |               '\\' [0-7][0-7]?
       |               !'\\' .

       action =        '{' < braces* > '}' -

       braces =        '{' braces* '}'
       |               !'}' .

       EQUAL =         '=' -
       COLON =         ':' -
       SEMICOLON =     ';' -
       BAR =           '|' -
       AND =           '&' -
       NOT =           '!' -
       QUERY =         '?' -
       STAR =          '*' -
       PLUS =          '+' -
       OPEN =          '(' -
       CLOSE =         ')' -
       DOT =           '.' -
       BEGIN =         '<' -
       END =           '>' -
       TILDE =         '~' -
       RPERCENT =      '%}' -

       - =             ( space | comment )*
       space =         ' ' | '\t' | end-of-line
       comment =       '#' ( !end-of-line . )* end-of-line
       end-of-line =   '\r\n' | '\n' | '\r'
       end-of-file =   !.

CUSTOMISING THE PARSER

   The following symbols can  be  redefined  in  declaration  sections  to
   modify the generated parser code.

   YYSTYPE
          The  semantic  value  type.   The  pseudo-variable  '$$' and the
          identifiers 'bound' to rule results with the colon operator  ':'
          should  all  be  considered as being declared to have this type.
          The default value is 'int'.

   YYPARSE
          The name of the main entry point to  the  parser.   The  default
          value is 'yyparse'.

   YYPARSEFROM
          The  name  of  an  alternative  entry point to the parser.  This
          function expects one argument: the function corresponding to the
          rule  from  which  the  search  for  a  match should begin.  The
          default is 'yyparsefrom'.  Note that yyparse() is defined as

              int yyparse() { return yyparsefrom(yy_foo); }

          where 'foo' is the name of the first rule in the grammar.

   YY_INPUT(buf, result, max_size)
          This macro is invoked by the parser to obtain more  input  text.
          buf  points  to an area of memory that can hold at most max_size
          characters.  The macro should copy input text to  buf  and  then
          assign  the  integer  variable  result to indicate the number of
          characters copied.  If no more input  is  available,  the  macro
          should  assign  0  to result.  By default, the YY_INPUT macro is
          defined as follows.

              #define YY_INPUT(buf, result, max_size)        \
              {                                              \
                int yyc= getchar();                          \
                result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \
              }

          Note that  if  YY_CTX_LOCAL  is  defined  (see  below)  then  an
          additional  first  argument,  containing  the parser context, is
          passed to YY_INPUT.

   YY_DEBUG
          If this symbols is defined then additional code will be included
          in  the parser that prints vast quantities of arcane information
          to the standard error while the parser is running.

   YY_BEGIN
          This macro is invoked to mark the start of input text that  will
          be  made  available in actions as 'yytext'.  This corresponds to
          occurrences of '<' in the grammar.   These  are  converted  into
          predicates that are expected to succeed.  The default definition

              #define YY_BEGIN (yybegin= yypos, 1)

          therefore  saves  the  current  input  position  and  returns  1
          ('true') as the result of the predicate.

   YY_END This macros corresponds to '>' in the grammar.  Again, it  is  a
          predicate  so  the  default  definition saves the input position
          before 'succeeding'.

              #define YY_END (yyend= yypos, 1)

   YY_PARSE(T)
          This  macro  declares  the  parser  entry  points  (yyparse  and
          yyparsefrom) to be of type T.  The default definition

              #define YY_PARSE(T) T

          leaves  yyparse()  and yyparsefrom() with global visibility.  If
          they should not be externally visible  in  other  source  files,
          this macro can be redefined to declare them 'static'.

              #define YY_PARSE(T) static T

   YY_CTX_LOCAL
          If  this  symbol  is  defined  during compilation of a generated
          parser then global parser state will be kept in a  structure  of
          type  'yycontext'  which  can  be  declared as a local variable.
          This allows multiple instances of parsers to coexist and  to  be
          thread-safe.  The parsing function yyparse() will be declared to
          expect a first argument of type 'yycontext *',  an  instance  of
          the  structure  holding  the  global state for the parser.  This
          instance must be  allocated  and  initialised  to  zero  by  the
          client.  A trivial but complete example is as follows.

              #include <stdio.h>

              #define YY_CTX_LOCAL

              #include "the-generated-parser.peg.c"

              int main()
              {
                yycontext ctx;
                memset(&ctx, 0, sizeof(yycontext));
                while (yyparse(&ctx));
                return 0;
              }

          Note  that  if this symbol is undefined then the compiled parser
          will statically allocate its global state and  will  be  neither
          reentrant  nor thread-safe.  Note also that the parser yycontext
          structure is initialised automatically the first time  yyparse()
          is called; this structure must therefore be properly initialised
          to zero before the first call to yyparse().

   YY_CTX_MEMBERS
          If  YY_CTX_LOCAL  is  defined  (see  above)   then   the   macro
          YY_CTX_MEMBERS can be defined to expand to any additional member
          field declarations that the client would like  included  in  the
          declaration of the 'yycontext' structure type.  These additional
          members are otherwise ignored  by  the  generated  parser.   The
          instance  of  'yycontext'  associated  with the currently-active
          parser is available within actions as the pointer variable yy.

   YY_BUFFER_SIZE
          The initial size of the text buffer, in bytes.  The  default  is
          1024  and  the  buffer size is doubled whenever required to meet
          demand during parsing.  An  application  that  typically  parses
          much  longer  strings  could  increase this to avoid unnecessary
          buffer reallocation.

   YY_STACK_SIZE
          The initial size of the variable and action stacks.  The default
          is 128, which is doubled whenever required to meet demand during
          parsing.  Applications that have  deep  call  stacks  with  many
          local  variables,  or  that  perform many actions after a single
          successful match,  could  increase  this  to  avoid  unnecessary
          buffer reallocation.

   YY_MALLOC(YY, SIZE)
          The  memory  allocator  for  all  parser-related  storage.   The
          parameters are the current yycontext structure and the number of
          bytes to allocate.  The default definition is: malloc(SIZE)

   YY_REALLOC(YY, PTR, SIZE)
          The  memory  reallocator  for dynamically-grown storage (such as
          text buffers and  variable  stacks).   The  parameters  are  the
          current  yycontext  structure, the previously-allocated storage,
          and the number of bytes to which that storage should  be  grown.
          The default definition is: realloc(PTR, SIZE)

   YY_FREE(YY, PTR)
          The   memory   deallocator.   The  parameters  are  the  current
          yycontext structure and the storage to deallocate.  The  default
          definition is: free(PTR)

   YYRELEASE
          The  name  of the function that releases all resources held by a
          yycontext structure.  The default value is 'yyrelease'.

   The following variables can be referred to within actions.

   char *yybuf
          This variable points to the parser's input buffer used to  store
          input text that has not yet been matched.

   int yypos
          This  is  the  offset  (in  yybuf)  of  the next character to be
          matched and consumed.

   char *yytext
          The most recent matched text delimited by '<' and '>' is  stored
          in this variable.

   int yyleng
          This variable indicates the number of characters in 'yytext'.

   yycontext *yy
          This  variable  points to the instance of 'yycontext' associated
          with the currently-active parser.

   Programs that wish to release  all  the  resources  associated  with  a
   parser can use the following function.

   yyrelease(yycontext*yy)
          Returns  all  parser-allocated storage associated with yy to the
          system.  The storage will be reallocated on  the  next  call  to
          yyparse().

   Note  that  the  storage  for  the  yycontext structure itself is never
   allocated or reclaimed implicitly.  The application must allocate these
   structures  in  automatic storage, or use calloc() and free() to manage
   them explicitly.  The example in the following section demonstrates one
   approach to resource management.

LEG EXAMPLE: EXTENDING THE PARSER'S CONTEXT

   The yy variable passed to actions contains the state of the parser plus
   any additional fields defined by YY_CTX_MEMBERS.  Theses fields can  be
   used  to  store  application-specific  information  that is global to a
   particular call of yyparse().   A  trivial  but  complete  leg  example
   follows  in  which  the yycontext structure is extended with a count of
   the number of newline characters seen in the input so far (the  grammar
   otherwise  consumes  and  ignores  the  entire  input).   The caller of
   yyparse() uses count to print the number of lines of  input  that  were
   read.

       %{
       #define YY_CTX_LOCAL 1
       #define YY_CTX_MEMBERS \
         int count;
       %}

       Char    = ('\n' | '\r\n' | '\r')        { yy->count++ }
               | .

       %%

       #include <stdio.h>
       #include <string.h>

       int main()
       {
           /* create a local parser context in automatic storage */
           yycontext yy;
           /* the context *must* be initialised to zero before first use*/
           memset(&yy, 0, sizeof(yy));

           while (yyparse(&yy))
               ;
           printf("%d newlines\n", yy.count);

           /* release all resources associated with the context */
           yyrelease(&yy);

           return 0;
       }

DIAGNOSTICS

   peg  and  leg  warn  about  the following conditions while converting a
   grammar into a parser.

   syntax error
          The input grammar was malformed in some way.  The error  message
          will  include  the  text  about to be matched (often backed up a
          huge amount from the actual location of the error) and the  line
          number of the most recently considered character (which is often
          the real location of the problem).

   rule 'foo' used but not defined
          The grammar referred to a rule named 'foo' but no definition for
          it  was  given.   Attempting  to  use  the generated parser will
          likely result in errors from the linker due to undefined symbols
          associated with the missing rule.

   rule 'foo' defined but not used
          The grammar defined a rule named 'foo' and then ignored it.  The
          code associated with the  rule  is  included  in  the  generated
          parser which will in all other respects be healthy.

   possible infinite left recursion in rule 'foo'
          There  exists  at  least one path through the grammar that leads
          from the rule 'foo' back to (a recursive invocation of) the same
          rule without consuming any input.

   Left  recursion, especially that found in standards documents, is often
   'direct' and implies trivial repetition.

       # (6.7.6)
       direct-abstract-declarator =
           LPAREN abstract-declarator RPAREN
       |   direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
       |   direct-abstract-declarator? LBRACKET STAR RBRACKET
       |   direct-abstract-declarator? LPAREN param-type-list? RPAREN

   The recursion can easily be eliminated by converting the parts  of  the
   pattern following the recursion into a repeatable suffix.

       # (6.7.6)
       direct-abstract-declarator =
           direct-abstract-declarator-head?
           direct-abstract-declarator-tail*

       direct-abstract-declarator-head =
           LPAREN abstract-declarator RPAREN

       direct-abstract-declarator-tail =
           LBRACKET assign-expr? RBRACKET
       |   LBRACKET STAR RBRACKET
       |   LPAREN param-type-list? RPAREN

CAVEATS

   A  parser  that  accepts empty input will always succeed.  Consider the
   following example, not atypical of a first attempt to write a PEG-based
   parser:

       Program = Expression*
       Expression = "whatever"
       %%
       int main() {
         while (yyparse())
           puts("success!");
         return 0;
       }

   This  program  loops forever, no matter what (if any) input is provided
   on stdin.  Many fixes are possible, the easiest being  to  insist  that
   the  parser  always  consumes some non-empty input.  Changing the first
   line to

       Program = Expression+

   accomplishes this.  If the parser is expected  to  consume  the  entire
   input,  then  explicitly  requiring  the  end-of-file  is  also  highly
   recommended:

       Program = Expression+ !.

   This works because the parser will only fail to match  ("!"  predicate)
   any  character  at all ("." expression) when it attempts to read beyond
   the end of the input.

BUGS

   You have to type 'man peg' to read the manual page for leg(1).

   The 'yy' and 'YY' prefixes cannot be changed.

   Left recursion is detected in the input  grammar  but  is  not  handled
   correctly in the generated parser.

   Diagnostics  for  errors  in  the  input  grammar  are  obscure and not
   particularly helpful.

   The operators ! and ~ should really be named the other way around.

   Several  commonly-used  lex(1)  features  (yywrap(),  yyin,  etc.)  are
   completely absent.

   The  generated  parser  does not contain '#line' directives to direct C
   compiler errors back to the grammar description when appropriate.

AUTHOR

   peg,  leg and this manual page were written by Ian Piumarta (first-name
   at last-name dot com) while investigating the viability of regular  and
   parsing-expression   grammars   for  efficiently  extracting  type  and
   signature information from C header files.

   Please send bug reports and suggestions for improvements to the  author
   at the above address.

Opportunity

Personal Opportunity - Free software gives you access to billions of dollars of software at no cost. Use this software for your business, personal use or to develop a profitable skill. Access to source code provides access to a level of capabilities/information that companies protect though copyrights. Open source is a core component of the Internet and it is available to you. Leverage the billions of dollars in resources and capabilities to build a career, establish a business or change the world. The potential is endless for those who understand the opportunity.

Business Opportunity - Goldman Sachs, IBM and countless large corporations are leveraging open source to reduce costs, develop products and increase their bottom lines. Learn what these companies know about open source and how open source can give you the advantage.

Free Software

Free Software provides computer programs and capabilities at no cost but more importantly, it provides the freedom to run, edit, contribute to, and share the software. The importance of free software is a matter of access, not price. Software at no cost is a benefit but ownership rights to the software and source code is far more significant.

Free Office Software - The Libre Office suite provides top desktop productivity tools for free. This includes, a word processor, spreadsheet, presentation engine, drawing and flowcharting, database and math applications. Libre Office is available for Linux or Windows.

Free Books

The Free Books Library is a collection of thousands of the most popular public domain books in an online readable format. The collection includes great classical literature and more recent works where the U.S. copyright has expired. These books are yours to read and use without restrictions.

Source Code - Want to change a program or know how it works? Open Source provides the source code for its programs so that anyone can use, modify or learn how to write those programs themselves. Visit the GNU source code repositories to download the source.

Education

Study at Harvard, Stanford or MIT - Open edX provides free online courses from Harvard, MIT, Columbia, UC Berkeley and other top Universities. Hundreds of courses for almost all major subjects and course levels. Open edx also offers some paid courses and selected certifications.

Linux Manual Pages - A man or manual page is a form of software documentation found on Linux/Unix operating systems. Topics covered include computer programs (including library and system calls), formal standards and conventions, and even abstract concepts.