re2c(1)

NAME

   re2c - convert regular expressions to C/C++ code

SYNOPSIS

   re2c [OPTIONS] FILE

DESCRIPTION

   re2c  is  a  lexer  generator  for  C/C++.  It finds regular expression
   specifications inside of  C/C++  comments  and  replaces  them  with  a
   hard-coded  DFA.  The  user must supply some interface code in order to
   control and customize the generated DFA.
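
   To give a feel for what "hard-coded DFA" means, the following is a
   hand-written sketch in the general style of generated scanners. It is
   not actual re2c output: the function name is_number and the label names
   are assumptions made for illustration only.

```c
#include <assert.h>

/*
 * Hand-written DFA for the language [0-9]+ followed by a NUL terminator.
 * Each label is one DFA state; the input pointer is named YYCURSOR only
 * to echo the interface described in this page.
 */
static int is_number(const char *YYCURSOR)
{
    char yych;

    assert(YYCURSOR != 0);
    yych = *YYCURSOR++;            /* start state: need one digit */
    if (yych < '0' || yych > '9')
        goto reject;
digits:                            /* seen at least one digit */
    yych = *YYCURSOR++;
    if (yych >= '0' && yych <= '9')
        goto digits;
    if (yych == '\0')
        goto accept;
reject:
    return 0;
accept:
    return 1;
}
```

   Calling is_number("123") yields 1, while is_number("12a") and
   is_number("") yield 0. The scanner never reads past the terminating
   NUL.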

OPTIONS

   -? -h --help
          Invoke a short help.

   -b --bit-vectors
          Implies -s. Use bit vectors as  well  in  the  attempt  to  coax
          better  code out of the compiler. Most useful for specifications
          with more  than  a  few  keywords  (e.g.  for  most  programming
          languages).

   -c --conditions
          Enable (f)lex-like condition support.

   -d --debug-output
          Creates  a  parser  that  dumps  information  about  the current
          position and in which state the  parser  is  while  parsing  the
          input.  This is useful to debug parser issues and states. If you
          use this switch you need to  define  a  macro  YYDEBUG  that  is
          called  like  a  function with two parameters: void YYDEBUG (int
          state, char current).  The first parameter receives the state or
          -1  and  the  second parameter receives the input at the current
          cursor.

   -D --emit-dot
          Emit Graphviz dot data. It can then be processed with  e.g.  dot
          -Tpng  input.dot  >  output.png.  Please note that scanners with
          many states may crash dot.

   -e --ecb
          Generate a parser that supports EBCDIC. The generated  code  can
          deal  with  any  character up to 0xFF. In this mode re2c assumes
          that input character size is 1 byte. This switch is incompatible
          with -w, -x, -u and -8.

   -f --storable-state
          Generate a scanner with support for storable state.

   -F --flex-syntax
          Partial  support  for flex syntax. When this flag is active then
          named definitions must be surrounded by curly braces and can  be
          defined  without  an  equal sign and the terminating semi colon.
          Instead names are treated as direct double quoted strings.

   -g --computed-gotos
          Generate a scanner that utilizes GCC's  computed  goto  feature.
          That  is  re2c generates jump tables whenever a decision is of a
          certain complexity (e.g. a lot of if  conditions  are  otherwise
          necessary).  This  is  only useable with GCC and produces output
          that cannot be compiled with any other compiler. Note that  this
          implies  -b  and that the complexity threshold can be configured
          using the inplace configuration cgoto:threshold.

   -i --no-debug-info
          Do not output #line information. This is useful when you want to
          use a CMS tool with the re2c output, which you might want if you
          do not require your users to have re2c themselves when building
          from your source.

   -o OUTPUT --output=OUTPUT
          Specify the OUTPUT file.

   -r --reusable
          Allows  reuse  of  scanner definitions with /*!use:re2c */ after
          /*!rules:re2c */.  In this mode no /*!re2c */ block and  exactly
          one /*!rules:re2c */ must be present.  The rules are being saved
          and used by every /*!use:re2c  */  block  that  follows.   These
          blocks    can   contain   inplace   configurations,   especially
          re2c:flags:e,  re2c:flags:w,  re2c:flags:x,   re2c:flags:u   and
          re2c:flags:8.   That  way  it  is  possible  to  create the same
          scanner multiple times for different character types,  different
          input   mechanisms   or   different   output   mechanisms.   The
          /*!use:re2c */ blocks can also  contain  additional  rules  that
          will be appended to the set of rules in /*!rules:re2c */.

   -s --nested-ifs
          Generate  nested ifs for some switches. Many compilers need this
          assist to generate better code.

   -t HEADER --type-header=HEADER
          Create a HEADER file that contains  types  for  the  (f)lex-like
          condition support. This can only be activated when -c is in use.

   -u --unicode
          Generate  a  parser that supports UTF-32. The generated code can
          deal with any valid Unicode character up to  0x10FFFF.  In  this
          mode  re2c  assumes  that  input character size is 4 bytes. This
          switch is incompatible with -e, -w, -x and -8. This implies -s.

   -v --version
          Show version information.

   -V --vernum
          Show the version as a number XXYYZZ.

   -w --wide-chars
          Generate a parser that supports UCS-2. The  generated  code  can
          deal  with  any  valid  Unicode character up to 0xFFFF.  In this
          mode re2c assumes that input character size  is  2  bytes.  This
          switch is incompatible with -e, -x, -u and -8. This implies -s.

   -x --utf-16
          Generate  a  parser that supports UTF-16. The generated code can
          deal with any valid Unicode character up to  0x10FFFF.  In  this
          mode  re2c  assumes  that  input character size is 2 bytes. This
          switch is incompatible with -e, -w, -u and -8. This implies -s.

   -8 --utf-8
          Generate a parser that supports UTF-8. The  generated  code  can
          deal  with  any  valid Unicode character up to 0x10FFFF. In this
          mode re2c assumes that input character  size  is  1  byte.  This
          switch is incompatible with -e, -w, -x and -u.

   --case-insensitive
          All  strings  are  case  insensitive,  so  all "-expressions are
          treated in the same way '-expressions are.

   --case-inverted
          Invert the meaning of single and  double  quoted  strings.  With
          this  switch  single quotes are case sensitive and double quotes
          are case insensitive.

   --no-generation-date
          Suppress date output in the generated file.

   --no-version
          Suppress version output in the generated file.

   --encoding-policy POLICY
          Specify how re2c must treat Unicode surrogates.  POLICY  can  be
          one  of  the  following:  fail  (abort with error when surrogate
          encountered), substitute  (silently  substitute  surrogate  with
          error  code  point  0xFFFD),  ignore (treat surrogates as normal
          code points). By default re2c ignores surrogates  (for  backward
          compatibility). Unicode standard says that standalone surrogates
          are invalid code points, but different  libraries  and  programs
          treat them differently.

   --input INPUT
          Specify  re2c  input  API.  INPUT  can  be one of the following:
          default, custom.

   -S --skeleton
          Instead of embedding  re2c-generated  code  into  C/C++  source,
          generate  a self-contained program for the same DFA. Most useful
          for correctness and performance testing.

   --empty-class POLICY
          What to do if user inputs empty character class. POLICY  can  be
          one  of  the  following:  match-empty (match empty input: pretty
          illogical, but this is the default for  backwards  compatibility
          reason),   match-none  (fail  to  match  on  any  input),  error
          (compilation  error).  Note  that  there  are  various  ways  to
          construct     empty     class,     e.g:     [],    [^\x00-\xFF],
          [\x00-\xFF][\x00-\xFF].

   --dfa-minimization <table | moore>
          Internal algorithm used by re2c to  minimize  DFA  (defaults  to
          moore).   Both  table  filling  and  Moore's  algorithms  should
          produce identical DFA (up to states relabelling).  Table filling
          algorithm  is  much simpler and slower; it serves as a reference
          implementation.

   -1 --single-pass
          Deprecated and does nothing (single pass is by default now).

   -W     Turn on all warnings.

   -Werror
          Turn warnings into errors. Note that this option  alone  doesn't
          turn  on  any warnings, it only affects those warnings that have
          been turned on so far or will be turned on later.

   -W<warning>
          Turn on individual warning.

   -Wno-<warning>
          Turn off individual warning.

   -Werror-<warning>
          Turn on individual warning and treat it as error  (this  implies
          -W<warning>).

   -Wno-error-<warning>
          Don't  treat this particular warning as error. This doesn't turn
          off the warning itself.

   -Wcondition-order
          Warn if the generated program makes implicit  assumptions  about
          condition  numbering.  One  should  use either -t, --type-header
          option or  /*!types:re2c*/  directive  to  generate  mapping  of
          condition  names  to  numbers  and  use  autogenerated condition
          names.

   -Wempty-character-class
          Warn if regular expression contains empty character class.  From
          the rational point of view trying to match empty character class
          makes no sense: it should always fail.  However,  for  backwards
          compatibility  reasons  re2c  allows  empty  character class and
          treats it as empty string. Use --empty-class  option  to  change
          default behaviour.

   -Wmatch-empty-string
          Warn if regular expression in a rule is nullable (matches empty
          string). If DFA runs in a loop and empty match is unintentional
          (input position is not advanced manually), lexer may get stuck
          in an infinite loop.

   -Wswapped-range
          Warn if range lower bound is greater than upper  bound.  Default
          re2c behaviour is to silently swap range bounds.

   -Wundefined-control-flow
          Warn if some input strings cause undefined control flow in lexer
          (the faulty patterns are reported). This is the  most  dangerous
          and  common  mistake.  It  can be easily fixed by adding default
          rule * (this rule has the lowest priority, matches any code unit
          and consumes exactly one code unit).

   -Wuseless-escape
          Warn  if  a  symbol is escaped when it shouldn't be.  By default
          re2c silently ignores escape, but this may as  well  indicate  a
          typo or an error in escape sequence.
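
   The three --encoding-policy choices described above can be pictured as
   a function applied to each code point. This is an illustrative sketch
   only, not re2c's internal code; the enum and function names are
   hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names mirroring the POLICY values fail, substitute, ignore. */
enum surrogate_policy { POLICY_FAIL, POLICY_SUBSTITUTE, POLICY_IGNORE };

/* Unicode surrogates occupy the range 0xD800..0xDFFF. */
static int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

/*
 * Apply a policy to one code point.
 * Returns 0 and stores the resulting code point in *out on success,
 * or -1 if the policy is "fail" and cp is a surrogate.
 */
static int apply_policy(enum surrogate_policy policy, uint32_t cp,
                        uint32_t *out)
{
    assert(out != 0);
    if (!is_surrogate(cp) || policy == POLICY_IGNORE) {
        *out = cp;            /* treated as a normal code point */
        return 0;
    }
    if (policy == POLICY_SUBSTITUTE) {
        *out = 0xFFFDu;       /* error code point */
        return 0;
    }
    return -1;                /* POLICY_FAIL: abort with an error */
}
```

   Non-surrogate code points pass through unchanged under every policy;
   only the handling of 0xD800..0xDFFF differs.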

INTERFACE CODE

   The  user  must  supply interface code either in the form of C/C++ code
   (macros,  functions,  variables,  etc.)  or  in  the  form  of  INPLACE
   CONFIGURATIONS.   Which  symbols must be defined and which are optional
   depends on a particular use case.
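
   As one illustration of interface code supplied as plain C, the sketch
   below defines the YYDEBUG hook that the -d option requires. The helper
   debug_hook and the two bookkeeping variables are assumptions of this
   example; only the macro name and the (int state, char current)
   signature come from this page.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping so that the hook's effect can be observed. */
static int debug_calls;
static int last_state;

/* Same signature as documented for -d: void YYDEBUG (int state, char current). */
static void debug_hook(int state, char current)
{
    assert(state >= -1);   /* the state number, or -1 */
    ++debug_calls;
    last_state = state;
    fprintf(stderr, "state %d, input 0x%02X\n",
            state, (unsigned)(unsigned char)current);
}

/* The generated code "calls" this macro like a function. */
#define YYDEBUG(state, current) debug_hook((state), (current))
```

   A scanner built with -d would then emit one trace line on stderr for
   every state it enters.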

   YYCONDTYPE
          In -c mode you can use -t to generate a file that  contains  the
          enumeration  used  as conditions. Each of the values refers to a
          condition of a rule set.

   YYCTXMARKER
          l-value of type YYCTYPE *.  The generated  code  saves  trailing
          context  backtracking  information in YYCTXMARKER. The user only
          needs to define this  macro  if  a  scanner  specification  uses
          trailing context in one or more of its regular expressions.

   YYCTYPE
          Type  used  to hold an input symbol (code unit). Usually char or
          unsigned char for ASCII, EBCDIC and UTF-8,  unsigned  short  for
          UTF-16 or UCS-2 and unsigned int for UTF-32.

   YYCURSOR
          l-value  of  type  YYCTYPE  *  that  points to the current input
          symbol. The generated code  advances  YYCURSOR  as  symbols  are
          matched.  On  entry,  YYCURSOR  is assumed to point to the first
          character of the current token. On exit, YYCURSOR will point  to
          the first character of the following token.

   YYDEBUG (state, current)
          This  is only needed if the -d flag was specified. It allows one
          to easily debug the generated parser by calling a  user  defined
          function for every state. The function should have the following
          signature: void YYDEBUG (int state,  char  current).  The  first
          parameter  receives  the  state  or  -1 and the second parameter
          receives the input at the current cursor.

   YYFILL (n)
          The generated code "calls" YYFILL (n)  when  the  buffer  needs
          (re)filling:   at   least  n  additional  characters  should  be
          provided. YYFILL (n) should adjust YYCURSOR,  YYLIMIT,  YYMARKER
          and  YYCTXMARKER  as  needed.  Note that for typical programming
          languages n will be the length of the longest keyword plus  one.
          The user can place a comment of the form /*!max:re2c*/ to insert
          YYMAXFILL definition that is set to the maximum length value.

   YYGETCONDITION ()
          This define is used to get the condition prior to  entering  the
          scanner code when using -c switch. The value must be initialized
          with a value from the enumeration YYCONDTYPE type.

   YYGETSTATE ()
          The user only needs to define this macro  if  the  -f  flag  was
          specified.  In  that case, the generated code "calls" YYGETSTATE
          () at the very beginning of the scanner in order to  obtain  the
          saved  state.  YYGETSTATE  ()  must return a signed integer. The
          value must be either -1, indicating that the scanner is  entered
          for  the  first  time, or a value previously saved by YYSETSTATE
          (s). In the second case,  the  scanner  will  resume  operations
          right after where the last YYFILL (n) was called.

   YYLIMIT
          Expression of type YYCTYPE * that marks the end of the buffer
          (YYLIMIT[-1] is the last character in the buffer). The generated
          code  repeatedly  compares YYCURSOR to YYLIMIT to determine when
          the buffer needs (re)filling.

   YYMARKER
          l-value  of  type  YYCTYPE  *.    The   generated   code   saves
          backtracking  information  in YYMARKER. Some easy scanners might
          not use this.

   YYMAXFILL
          This will be automatically defined by  /*!max:re2c*/  blocks  as
          explained above.

   YYSETCONDITION (c)
          This  define  is  used to set the condition in transition rules.
          This is only being used when -c is active and  transition  rules
          are being used.

   YYSETSTATE (s)
          The  user  only  needs  to  define this macro if the -f flag was
          specified. In that case, the generated code  "calls"  YYSETSTATE
          just before calling YYFILL (n). The parameter to YYSETSTATE is a
          signed integer that uniquely identifies the specific instance of
          YYFILL (n) that is about to be called. Should the user wish to
          save the state of the scanner and have YYFILL (n) return to the
          caller, all that is needed is to store that unique identifier in
          a variable. Later, when the scanner is called again, it will
          call YYGETSTATE () and resume execution right where it left off.
          The  generated  code  will  contain  both  YYSETSTATE  (s)   and
          YYGETSTATE even if YYFILL (n) is being disabled.
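
   The pointer adjustments that YYFILL (n) is expected to perform can be
   sketched as follows. This is one possible approach under stated
   assumptions, not the required implementation: struct input, its field
   names, the tok pointer, BUF_SIZE and the in-memory source are all
   hypothetical, and a real scanner would typically read from a file
   instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8   /* deliberately tiny so that refills are easy to observe */

/* Hypothetical scanner state; the field names are assumptions of this sketch. */
struct input {
    char buf[BUF_SIZE];
    char *tok;        /* start of the current token; bytes before it are dead */
    char *cursor;     /* what YYCURSOR would expand to */
    char *marker;     /* what YYMARKER would expand to */
    char *limit;      /* what YYLIMIT would expand to */
    const char *src;  /* in-memory stand-in for a file or socket */
    size_t src_left;  /* bytes not yet copied out of src */
};

static void input_init(struct input *in, const char *src, size_t len)
{
    in->tok = in->cursor = in->marker = in->limit = in->buf;
    in->src = src;
    in->src_left = len;
}

/*
 * Refill sketch: discard bytes before tok, slide the rest to the start of
 * buf, shift every pointer by the same distance, then append fresh bytes.
 * Returns 0 if at least `need` bytes are available at cursor afterwards,
 * -1 otherwise (end of input, or the token does not fit in buf).
 */
static int fill(struct input *in, size_t need)
{
    size_t shift, kept, room, n;

    assert(in != 0);
    if (in->marker < in->tok)
        in->marker = in->tok;      /* stale marker: clamp before shifting */
    shift = (size_t)(in->tok - in->buf);
    kept = (size_t)(in->limit - in->tok);

    memmove(in->buf, in->tok, kept);
    in->tok -= shift;
    in->cursor -= shift;
    in->marker -= shift;
    in->limit -= shift;

    room = (size_t)(in->buf + BUF_SIZE - in->limit);
    n = in->src_left < room ? in->src_left : room;
    memcpy(in->limit, in->src, n);
    in->limit += n;
    in->src += n;
    in->src_left -= n;

    return (size_t)(in->limit - in->cursor) >= need ? 0 : -1;
}

/* One possible definition; it assumes a variable `in` is in scope and
 * that the enclosing function returns int. */
#define YYFILL(n) do { if (fill(in, (n)) != 0) return -1; } while (0)
```

   If a scanner specification uses trailing context, a YYCTXMARKER pointer
   would need the same shift as the other pointers.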

SYNTAX

   Code for re2c consists of a set of RULES, NAMED DEFINITIONS and INPLACE
   CONFIGURATIONS.

   RULES
   Rules consist of a regular expression (see REGULAR  EXPRESSIONS)  along
   with  a  block of C/C++ code that is to be executed when the associated
   regular expression is matched. You can either start the  code  with  an
   opening curly brace or the sequence :=. When the code starts with a
   curly brace, re2c counts the brace depth and stops looking for code
   automatically. Otherwise curly braces are not allowed and re2c stops
   looking for code at the first line that does not begin with whitespace.
   If two or more rules overlap, the first rule is preferred.
      regular-expression { C/C++ code }

      regular-expression := C/C++ code

   There is one special rule: default rule *
      * { C/C++ code }

      * := C/C++ code

   Note  that default rule * differs from [^]: default rule has the lowest
   priority, matches any code unit (either valid or  invalid)  and  always
   consumes  one  character;  while  [^] matches any valid code point (not
   code  unit)  and  can  consume  multiple  code  units.  In  fact,  when
   variable-length  encoding  is used, * is the only possible way to match
   invalid input character (see ENCODINGS for details).
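
   A minimal sketch of an input file that combines a named definition,
   one ordinary rule and the default rule is shown here. It is input for
   re2c rather than directly compilable C. The function name lex, the
   return values and the assumption that the input is NUL-terminated
   (which is why YYFILL is disabled) are choices made for this example.

```c
/* Input for re2c: run it through re2c first, for example
 * "re2c -o lex.c lex.re" (both file names are placeholders). */
static int lex(const char *YYCURSOR)
{
    const char *YYMARKER;   /* needed because the rules can backtrack */
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;

        end = "\x00";
        dec = [0-9]+;

        dec end { return 1; }
        *       { return 0; }
    */
}
```

   The first rule accepts an input that consists entirely of decimal
   digits; every other input falls through to the default rule, which
   consumes one code unit and returns 0.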

   If -c is active then each regular expression is preceded by a  list  of
   comma  separated condition names. Besides normal naming rules there are
   two special cases: <*> (such rules are merged to all conditions) and <>
   (such a rule cannot have an associated regular expression; its code
   is merged to all actions). Non-empty rules may furthermore specify the
   new  condition.  In  that case re2c will generate the necessary code to
   change the condition automatically. Rules can use :=> as a shortcut  to
   automatically  generate code that not only sets the new condition state
   but also continues execution with the new state. A shortcut rule should
   not be used in a loop where there is code between the start of the loop
   and the re2c block unless re2c:cond:goto is  changed  to  continue.  If
   code  is  necessary  before all rules (though not simple jumps) you can
   do so by using <!> pseudo-rules.
      <condition-list> regular-expression { C/C++ code }

      <condition-list> regular-expression := C/C++ code

      <condition-list> * { C/C++ code }

      <condition-list> * := C/C++ code

      <condition-list> regular-expression => condition { C/C++ code }

      <condition-list> regular-expression => condition := C/C++ code

      <condition-list> * => condition { C/C++ code }

      <condition-list> * => condition := C/C++ code

      <condition-list> regular-expression :=> condition

      <*> regular-expression { C/C++ code }

      <*> regular-expression := C/C++ code

      <*> * { C/C++ code }

      <*> * := C/C++ code

      <*> regular-expression => condition { C/C++ code }

      <*> regular-expression => condition := C/C++ code

      <*> * => condition { C/C++ code }

      <*> * => condition := C/C++ code

      <*> regular-expression :=> condition

      <> { C/C++ code }

      <> := C/C++ code

      <> => condition { C/C++ code }

      <> => condition := C/C++ code

      <> :=> condition

      <! condition-list> { C/C++ code }

      <! condition-list> := C/C++ code

      <!> { C/C++ code }

      <!> := C/C++ code
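
   The interface that condition rules rely on can be sketched in plain C.
   The enumeration below shows the general shape of what -t is described
   as producing, using the default yyc prefix; the condition names init
   and comment, struct scanner and the hand-written dispatch loop (a
   stand-in for generated code) are assumptions of this example.

```c
#include <assert.h>

/* Assumed shape of the generated enumeration; "init" and "comment" are
 * made-up condition names, "yyc" is the default re2c:condenumprefix. */
enum YYCONDTYPE { yycinit, yyccomment };

struct scanner { enum YYCONDTYPE cond; };

#define YYGETCONDITION()  (s->cond)
#define YYSETCONDITION(c) (s->cond = (c))

/*
 * Hand-written stand-in for generated code: '#' switches from init to
 * comment, a newline switches back. Returns the number of bytes seen
 * while in the comment condition (the '#' itself excluded).
 */
static int count_comment_bytes(struct scanner *s, const char *p)
{
    int n = 0;

    assert(s != 0 && p != 0);
    for (; *p != '\0'; ++p) {
        switch (YYGETCONDITION()) {
        case yycinit:
            if (*p == '#')
                YYSETCONDITION(yyccomment);
            break;
        case yyccomment:
            if (*p == '\n')
                YYSETCONDITION(yycinit);
            else
                ++n;
            break;
        }
    }
    return n;
}
```

   Because the condition lives in the scanner structure, it survives
   between calls: a scan that ends inside a comment resumes in the
   comment condition.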

   NAMED DEFINITIONS
   Named definitions are of the form:
      name = regular-expression;

   If -F is active, then named definitions are also of the form:
      name { regular-expression }

   INPLACE CONFIGURATIONS
   re2c:condprefix = yyc;
          Allows one to specify the prefix used for condition labels. That
          is  this  text  is  prepended  to  any  condition  label  in the
          generated output file.

   re2c:condenumprefix = yyc;
          Allows one to specify the prefix used for condition values. That
          is  this  text  is  prepended to any condition enum value in the
          generated output file.

   re2c:cond:divider = /* *********************************** */ ;
          Allows one to customize the divider for  condition  blocks.  You
          can  use  @@  to  put the name of the condition or customize the
          placeholder using re2c:cond:divider@cond.

   re2c:cond:divider@cond = @@;
          Specifies  the  placeholder  that  will  be  replaced  with  the
          condition name in re2c:cond:divider.

   re2c:cond:goto = goto @@; ;
          Allows  one to customize the condition goto statements used with
          :=> style rules. You can use @@ to put the name of the condition
          or  customize the placeholder using re2c:cond:goto@cond. You can
          also change this to continue;, which would allow you to continue
          with  the  next loop cycle including any code between loop start
          and re2c block.

   re2c:cond:goto@cond = @@;
          Specifies  the  placeholder  that  will  be  replaced  with  the
          condition label in re2c:cond:goto.

   re2c:indent:top = 0;
          Specifies the minimum level of indentation to use. Requires a
          numeric value greater than or equal to zero.

   re2c:indent:string = \t ;
          Specifies the string to use for indentation. Requires  a  string
          that  should  contain  only  whitespace unless you need this for
          external tools. The easiest way to specify spaces is to  enclose
          them  in  single  or  double  quotes.   If  you  do not want any
          indentation at all you can simply set this to "".

   re2c:yych:conversion = 0;
          When this setting is non zero, then re2c automatically generates
          conversion  code  whenever yych gets read. In this case the type
          must be defined using re2c:define:YYCTYPE.

   re2c:yych:emit = 1;
          Generation of yych can be suppressed by setting this to 0.

   re2c:yybm:hex = 0;
          If set to zero then  a  decimal  table  is  being  used  else  a
          hexadecimal table will be generated.

   re2c:yyfill:enable = 1;
          Set this to zero to suppress generation of YYFILL (n). When
          using this be sure to verify that the generated scanner does not
          read past the end of the input. Allowing this behavior might
          introduce severe security issues into your programs.

   re2c:yyfill:check = 1;
          This can be set to 0 to suppress output of the precondition using
          YYCURSOR  and  YYLIMIT  which  becomes  useful  when  YYLIMIT  +
          YYMAXFILL is always accessible.

   re2c:define:YYFILL = YYFILL ;
          Substitution for YYFILL. Note that  by  default  re2c  generates
          argument  in  braces  and semicolon after YYFILL. If you need to
          make YYFILL an arbitrary  statement  rather  than  a  call,  set
          re2c:define:YYFILL:naked      to      non-zero      and      use
          re2c:define:YYFILL@len to  denote  formal  parameter  inside  of
          YYFILL body.

   re2c:define:YYFILL@len = @@ ;
          Any  occurrence  of  this text inside of YYFILL will be replaced
          with the actual argument.

   re2c:yyfill:parameter = 1;
          Controls argument in braces after YYFILL. If zero,  argument  is
          omitted.    If    non-zero,   argument   is   generated   unless
          re2c:define:YYFILL:naked is set to non-zero.

   re2c:define:YYFILL:naked = 0;
          Controls argument in braces and semicolon after YYFILL. If zero,
          argument is generated unless re2c:yyfill:parameter is set to
          zero, and semicolon is generated unconditionally. If non-zero,
          both argument and semicolon are omitted.

   re2c:startlabel = 0;
          If  set  to  a non zero integer then the start label of the next
          scanner blocks will be generated even if not used by the scanner
          itself.  Otherwise the normal yy0 like start label is only being
          generated if needed. If set to a text value then  a  label  with
          that  text  will  be  generated regardless of whether the normal
          start label is being used or not. This setting is being reset to
          0 after a start label has been generated.

   re2c:labelprefix = yy ;
          Allows  one to change the prefix of numbered labels. The default
          is yy and can be set to any string that is a valid label.

   re2c:state:abort = 0;
          When not zero and switch -f is active then the YYGETSTATE  block
          will  contain  a  default case that aborts and a -1 case is used
          for initialization.

   re2c:state:nextlabel = 0;
          Used when -f is active to control whether the  YYGETSTATE  block
          is  followed  by  a yyNext: label line.  Instead of using yyNext
          you can usually also use configuration  startlabel  to  force  a
          specific  start  label or default to yy0 as start label. Instead
          of using a dedicated label it is often better  to  separate  the
          YYGETSTATE  code  from  the  actual  scanner  code  by placing a
          /*!getstate:re2c*/ comment.

   re2c:cgoto:threshold = 9;
          When -g is active this value specifies the complexity  threshold
          that triggers generation of jump tables rather than using nested
          ifs and decision bitfields. The threshold is compared against a
          calculated estimate of the ifs needed, where every used bitmap
          divides the threshold by 2.

   re2c:yych:conversion = 0;
          When the input uses signed characters and -s or -b switches  are
          in  effect  re2c  allows  one  to  automatically  convert to the
          unsigned character type that is then necessary for its  internal
          single  character.  When this setting is zero or an empty string
          the  conversion  is  disabled.  Using  a  non  zero  number  the
          conversion is taken from YYCTYPE. If that is given by an inplace
          configuration that value is being used.  Otherwise  it  will  be
          (YYCTYPE)  and  changes  to  that  configuration  are  no longer
          possible. When this setting is  a  string  the  braces  must  be
          specified.  Now  assuming  your input is a char * buffer and you
          are using above  mentioned  switches  you  can  set  YYCTYPE  to
          unsigned char and this setting to either 1 or (unsigned char).

   re2c:define:YYCONDTYPE = YYCONDTYPE ;
          Enumeration used for condition support with -c mode.

   re2c:define:YYCTXMARKER = YYCTXMARKER ;
          Allows one to override the define YYCTXMARKER, and thus avoid
          it, by setting the value to the actual code needed.

   re2c:define:YYCTYPE = YYCTYPE ;
          Allows one to override the define YYCTYPE, and thus avoid it,
          by setting the value to the actual code needed.

   re2c:define:YYCURSOR = YYCURSOR ;
          Allows one to override the define YYCURSOR, and thus avoid it,
          by setting the value to the actual code needed.

   re2c:define:YYDEBUG = YYDEBUG ;
          Allows one to override the define YYDEBUG, and thus avoid it,
          by setting the value to the actual code needed.

   re2c:define:YYGETCONDITION = YYGETCONDITION ;
          Substitution  for  YYGETCONDITION.  Note  that  by  default re2c
          generates      braces      after       YYGETCONDITION.       Set
          re2c:define:YYGETCONDITION:naked to non-zero to omit braces.

   re2c:define:YYGETCONDITION:naked = 0;
          Controls braces after YYGETCONDITION. If zero, braces are
          generated. If non-zero, braces are omitted.

   re2c:define:YYSETCONDITION = YYSETCONDITION ;
          Substitution for  YYSETCONDITION.  Note  that  by  default  re2c
          generates argument in braces and semicolon after YYSETCONDITION.
          If you need to make YYSETCONDITION an arbitrary statement rather
          than  a  call,  set re2c:define:YYSETCONDITION:naked to non-zero
          and  use  re2c:define:YYSETCONDITION@cond   to   denote   formal
          parameter inside of YYSETCONDITION body.

   re2c:define:YYSETCONDITION@cond = @@ ;
          Any  occurrence  of  this  text inside of YYSETCONDITION will be
          replaced with the actual argument.

   re2c:define:YYSETCONDITION:naked = 0;
          Controls argument in braces and semicolon after YYSETCONDITION.
          If zero, both argument and semicolon are generated. If non-zero,
          both argument and semicolon are omitted.

   re2c:define:YYGETSTATE = YYGETSTATE ;
          Substitution for YYGETSTATE. Note that by default re2c generates
          braces  after  YYGETSTATE.  Set  re2c:define:YYGETSTATE:naked to
          non-zero to omit braces.

   re2c:define:YYGETSTATE:naked = 0;
          Controls braces after YYGETSTATE. If zero, braces are generated.
          If non-zero, braces are omitted.

   re2c:define:YYSETSTATE = YYSETSTATE ;
          Substitution for YYSETSTATE. Note that by default re2c generates
          argument in braces and semicolon after YYSETSTATE. If  you  need
          to  make  YYSETSTATE  an arbitrary statement rather than a call,
          set   re2c:define:YYSETSTATE:naked   to   non-zero    and    use
          re2c:define:YYSETSTATE@state to denote formal parameter inside of
          YYSETSTATE body.

   re2c:define:YYSETSTATE@state = @@ ;
          Any occurrence  of  this  text  inside  of  YYSETSTATE  will  be
          replaced with the actual argument.

   re2c:define:YYSETSTATE:naked = 0;
          Controls argument in braces and semicolon after YYSETSTATE. If
          zero, both argument and semicolon are generated. If non-zero,
          both argument and semicolon are omitted.

   re2c:define:YYLIMIT = YYLIMIT ;
          Allows one to override the define YYLIMIT, and thus avoid it,
          by setting the value to the actual code needed.

   re2c:define:YYMARKER = YYMARKER ;
          Allows one to override the define YYMARKER, and thus avoid it,
          by setting the value to the actual code needed.

   re2c:label:yyFillLabel = yyFillLabel ;
          Allows one to overwrite the name of the label yyFillLabel.

   re2c:label:yyNext = yyNext ;
          Allows one to overwrite the name of the label yyNext.

   re2c:variable:yyaccept = yyaccept;
          Allows one to overwrite the name of the variable yyaccept.

   re2c:variable:yybm = yybm ;
          Allows one to overwrite the name of the variable yybm.

   re2c:variable:yych = yych ;
          Allows one to overwrite the name of the variable yych.

   re2c:variable:yyctable = yyctable ;
          When  both  -c and -g are active then re2c uses this variable to
          generate a static jump table for YYGETCONDITION.

   re2c:variable:yystable = yystable ;
          Deprecated.

   re2c:variable:yytarget = yytarget ;
          Allows one to overwrite the name of the variable yytarget.

   REGULAR EXPRESSIONS
   "foo"  literal string "foo". ANSI-C escape sequences can be used.

   'foo'  literal    string    "foo"    (characters    [a-zA-Z]    treated
          case-insensitive). ANSI-C escape sequences can be used.

   [xyz]  character class; in this case, regular expression matches either
          x, y, or z.

   [abj-oZ]
          character class with a range in it; matches  a,  b,  any  letter
          from j through o or Z.

   [^class]
          inverted character class.

   r \ s  match  any  r which isn't s. r and s must be regular expressions
          which can be expressed as character classes.

   r*     zero or more occurrences of r.

   r+     one or more occurrences of r.

   r?     optional r.

   (r)    r; parentheses are used to override precedence.

   r s    r followed by s (concatenation).

   r | s  either r or s (alternative).

   r / s  r but only if it is followed by s. Note that s is  not  part  of
          the  matched  text.  This  type  of regular expression is called
          "trailing context". Trailing context can only be the  end  of  a
          rule and not part of a named definition.

   r{n}   matches r exactly n times.

   r{n,}  matches r at least n times.

   r{n,m} matches r at least n times, but not more than m times.

   .      match any character except newline.

   name   matches named definition as specified by name only if -F is off.
          If -F is active then this behaves like it was enclosed in double
          quotes and matches the string "name".

   Character  classes and string literals may contain octal or hexadecimal
   character definitions and the following set of escape sequences: \a,
   \b, \f, \n, \r, \t, \v, \\. An octal character is defined by a
   backslash followed by its three octal digits (e.g. \377).   Hexadecimal
   characters from 0 to 0xFF are defined by backslash, a lower cased x and
   two hexadecimal digits (e.g. \x12). Hexadecimal characters  from  0x100
   to  0xFFFF are defined by backslash, a lower cased \u or an upper cased
   \X and four hexadecimal digits (e.g. \u1234).   Hexadecimal  characters
   from  0x10000 to 0xFFFFffff are defined by backslash, an upper cased \U
   and eight hexadecimal digits (e.g. \U12345678).
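
   The escape rules above can be illustrated with a small decoder. This is
   a minimal sketch in plain C and not re2c's own implementation; the names
   decode_escape and parse_digits are hypothetical, and the sketch does not
   range-check octal values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse exactly `count` digits of base 8 or 16 starting at s.
 * Returns 0 and stores the value in *out, or -1 on a bad or missing digit. */
static int parse_digits(const char *s, int count, int base, uint32_t *out)
{
    uint32_t value = 0;
    for (int i = 0; i < count; ++i) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;               /* also catches '\0' (input too short) */
        if (d >= base) return -1;
        value = value * (uint32_t)base + (uint32_t)d;
    }
    *out = value;
    return 0;
}

/* Decode one escape sequence starting at s.
 * Returns the number of characters consumed, or 0 if s is not a valid escape. */
size_t decode_escape(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (s[0] != '\\' || s[1] == '\0') return 0;
    switch (s[1]) {
    case 'a':  *out = '\a'; return 2;
    case 'b':  *out = '\b'; return 2;
    case 'f':  *out = '\f'; return 2;
    case 'n':  *out = '\n'; return 2;
    case 'r':  *out = '\r'; return 2;
    case 't':  *out = '\t'; return 2;
    case 'v':  *out = '\v'; return 2;
    case '\\': *out = '\\'; return 2;
    case 'x':                          /* \xHH: two hexadecimal digits */
        return parse_digits(s + 2, 2, 16, out) == 0 ? 4 : 0;
    case 'u': case 'X':                /* \uHHHH or \XHHHH: four digits */
        return parse_digits(s + 2, 4, 16, out) == 0 ? 6 : 0;
    case 'U':                          /* \UHHHHHHHH: eight digits */
        return parse_digits(s + 2, 8, 16, out) == 0 ? 10 : 0;
    default:                           /* \ooo: three octal digits */
        return parse_digits(s + 1, 3, 8, out) == 0 ? 4 : 0;
    }
}
```

   For example, decoding the six-character text \u1234 consumes all six
   characters and yields the value 0x1234.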

   The only portable "any" rule is the default rule *.

SCANNER WITH STORABLE STATES

   When the -f flag is specified, re2c generates a scanner that can  store
   its  current  state,  return to the caller, and later resume operations
   exactly where it left off.

   The default operation of re2c is a "pull" model, where the scanner asks
   for  extra  input whenever it needs it. However, this mode of operation
   assumes that the scanner is the "owner" of the parsing loop, and that may
   not always be convenient.

   Typically,  if  there  is  a  preprocessor  ahead of the scanner in the
   stream, or for that matter any other procedural  source  of  data,  the
   scanner  cannot "ask" for more data unless both scanner and source live
   in separate threads.

   The -f flag is useful for just this situation:  it  lets  users  design
   scanners  that  work  in  a "push" model, i.e. where data is fed to the
   scanner chunk by chunk. When the scanner runs out of data  to  consume,
   it  just  stores  its  state and returns to the caller. When more input
   data is fed to the scanner, it resumes operations exactly where it left
   off.

   Changes needed compared to the "pull" model:

   * User has to supply macros YYSETSTATE (state) and YYGETSTATE ().

   * The  -f option inhibits declaration of yych and yyaccept, so the user
     has to declare them and also to save and restore  them.  In  the
     example  examples/push_model/push.re  they  are declared as fields of
     the (C++) class of which the scanner is a method, so they do not need
     to be saved/restored explicitly. For C they could e.g. be made macros
     that select fields  from  a  structure  passed  in  as  a  parameter.
     Alternatively,  they  could  be declared as local variables, saved by
     YYFILL (n) when it decides to return and restored  at  entry  to  the
     function.  Also,  it  could  be more efficient to save the state from
     YYFILL (n) because YYSETSTATE  (state)  is  called  unconditionally.
     YYFILL   (n)   however  does  not  get  state  as  a  parameter,  so
     YYSETSTATE (state) would have to store state in a local variable.

   * Modify YYFILL (n) to return (from the function calling  it)  if  more
     input is needed.

   * Modify  caller  to  recognise  if  more  input  is needed and respond
     appropriately.

   * The generated code will contain  a  switch  block  that  is  used  to
     restore  the last state by jumping behind the corresponding YYFILL (n)
     call. This code is automatically generated in the epilog of the first
     /*!re2c  */  block.  It  is  possible  to  trigger  generation of the
     YYGETSTATE () block earlier by placing a /*!getstate:re2c*/  comment.
     This  is  especially  useful  when the scanner code should be wrapped
     inside a loop.

   Please see examples/push_model/push.re for "push"  model  scanner.  The
   generated  code can be tweaked using inplace configurations state:abort
   and state:nextlabel.
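
   The control flow described above can be sketched by hand in plain C.
   This is not re2c output: the scanner below is written manually to show
   how YYSETSTATE, YYGETSTATE and a returning YYFILL cooperate. The names
   struct scanner, scan, count_words and the token values are hypothetical.
   The sketch recognises words (runs of ASCII letters) and keeps a word
   intact even when it is split across two chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NEED_INPUT, TOK_WORD, TOK_END };

struct scanner {
    int state;            /* -1: start from the top; otherwise a resume point */
    const char *cursor;   /* next unread character of the current chunk */
    const char *limit;    /* one past the last character of the chunk */
    int eof;              /* set by the caller when no more chunks will come */
};

#define YYGETSTATE()    (s->state)
#define YYSETSTATE(st)  (s->state = (st))
/* In the push model YYFILL cannot block, so it returns to the caller. */
#define YYFILL(n)       return TOK_NEED_INPUT

static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

enum token scan(struct scanner *s)
{
    assert(s != NULL);
    switch (YYGETSTATE()) {      /* jump behind the YYFILL that returned */
    case 0:  goto resume_skip;
    case 1:  goto resume_word;
    default: break;
    }
    for (;;) {                   /* skip everything that is not a letter */
        while (s->cursor == s->limit) {
            if (s->eof) { YYSETSTATE(-1); return TOK_END; }
            YYSETSTATE(0);
            YYFILL(1);
resume_skip: ;
        }
        if (is_letter(*s->cursor)) break;
        ++s->cursor;
    }
    for (;;) {                   /* consume the letters of one word */
        ++s->cursor;
        while (s->cursor == s->limit) {
            if (s->eof) goto word_done;
            YYSETSTATE(1);
            YYFILL(1);
resume_word: ;
        }
        if (!is_letter(*s->cursor)) break;
    }
word_done:
    YYSETSTATE(-1);
    return TOK_WORD;
}

/* Caller side: feed the scanner chunk by chunk and count the words. */
size_t count_words(const char *const *chunks, size_t nchunks)
{
    struct scanner sc = { -1, NULL, NULL, 0 };
    size_t next = 0, words = 0;
    for (;;) {
        switch (scan(&sc)) {
        case TOK_WORD: ++words; break;
        case TOK_END:  return words;
        case TOK_NEED_INPUT:
            if (next < nchunks) {
                sc.cursor = chunks[next];
                sc.limit = chunks[next] + strlen(chunks[next]);
                ++next;
            } else {
                sc.eof = 1;
            }
            break;
        }
    }
}
```

   With the chunks "ab c" and "d e" the scanner reports three words ("ab",
   "cd" and "e"), because the scan of "c" is suspended at the end of the
   first chunk and resumed at the start of the second one.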

SCANNER WITH CONDITION SUPPORT

   You can precede regular expressions with a list of condition names when
   using  the  -c switch. In this case re2c generates a scanner block for
   each  condition,  and  each  of  the  generated  blocks  has  its   own
   precondition.  The  precondition  is  given  by  the  interface  define
   YYGETCONDITION() and must be of type YYCONDTYPE.

   There are two special rule types. First, the rules of the condition <*>
   are merged into all conditions (note that they have lower priority than
   the other rules of that condition). Second, the empty  condition  list
   allows  one  to provide a code block that does not have a scanner part,
   meaning that it does not allow any regular expression.  The  condition
   value  referring  to  this  special  block  is  always the one with the
   enumeration value 0. This way the code of this  special  rule  can  be
   used  to initialize a scanner. These rules are in no way necessary, but
   sometimes it is helpful to have a  dedicated  uninitialized  condition
   state.

   Non-empty  rules  allow  one to specify the new condition, which makes
   them  transition  rules.  Besides  generating  calls  for  the   define
   YYSETCONDITION no other special code is generated.

   There  is  another kind of special rule that allows one to prepend code
   to any code block of all rules of a certain set of conditions or to all
   code  blocks  of  all rules. This can be helpful when some operation is
   common among rules. For instance, this can be used to store the  length
   of  the  scanned  string.  These  special  setup  rules  start  with an
   exclamation mark followed by either a list of conditions <!  condition,
   ...  >  or  a  star <!*>. When re2c generates the code for a rule whose
   state does not have a setup rule and a star'd setup  rule  is  present,
   then that code will be used as setup code.
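
   The dispatch on conditions can be illustrated with a hand-written
   sketch in plain C. This is not re2c output; it only mirrors the idea
   that each condition has its own block of rules, selected through
   YYGETCONDITION(), and that a transition rule calls YYSETCONDITION().
   The condition names yycnormal and yyccomment and the helpers step and
   count_code_chars are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum YYCONDTYPE { yycnormal, yyccomment };

struct scanner {
    const char *cursor;       /* NUL-terminated input */
    enum YYCONDTYPE cond;     /* current condition */
    size_t code_chars;        /* characters seen outside of comments */
};

#define YYGETCONDITION()   (s->cond)
#define YYSETCONDITION(c)  (s->cond = (c))

/* Apply one rule of the current condition.
 * Returns 0 at the end of input and 1 otherwise. */
static int step(struct scanner *s)
{
    if (*s->cursor == '\0') return 0;
    switch (YYGETCONDITION()) {
    case yycnormal:
        if (s->cursor[0] == '/' && s->cursor[1] == '*') {
            s->cursor += 2;
            YYSETCONDITION(yyccomment);   /* transition: normal -> comment */
        } else {
            ++s->cursor;
            ++s->code_chars;
        }
        break;
    case yyccomment:
        if (s->cursor[0] == '*' && s->cursor[1] == '/') {
            s->cursor += 2;
            YYSETCONDITION(yycnormal);    /* transition: comment -> normal */
        } else {
            ++s->cursor;                  /* comment text is ignored */
        }
        break;
    }
    return 1;
}

/* Count the characters of input that lie outside of C-style comments. */
size_t count_code_chars(const char *input)
{
    struct scanner sc;
    assert(input != NULL);
    sc.cursor = input;
    sc.cond = yycnormal;
    sc.code_chars = 0;
    while (step(&sc)) {
        /* each call handles one rule match */
    }
    return sc.code_chars;
}
```

   Keeping the current condition in the scanner structure lets the same
   function be re-entered for every token while the condition persists
   between calls.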

ENCODINGS

   re2c  supports  the  following encodings: ASCII (default), EBCDIC (-e),
   UCS-2 (-w), UTF-16 (-x), UTF-32 (-u) and UTF-8 (-8).  See also  inplace
   configuration re2c:flags.

   The following concepts should be clarified when talking about encoding.
   A code point is an abstract number that represents  a  single  encoding
   symbol. A code unit is the smallest unit of memory used in the encoded
   text (it corresponds to one character in the input stream). One or more
   code units can be needed to represent a single code point, depending on
   the  encoding.  In  a  fixed-length  encoding,  each  code   point   is
   represented  with  an equal number of code units. In a variable-length
   encoding, different code points can be  represented  with  a  different
   number of code units.

   ASCII  is  a  fixed-length encoding. Its code space includes 0x100 code
          points, from 0 to 0xFF.  One  code  point  is  represented  with
          exactly  one  1-byte  code unit, which has the same value as the
          code point. Size of YYCTYPE must be 1 byte.

   EBCDIC is a fixed-length encoding. Its code space includes  0x100  code
          points,  from  0  to  0xFF.  One  code point is represented with
          exactly one 1-byte code unit, which has the same  value  as  the
          code point. Size of YYCTYPE must be 1 byte.

   UCS-2  is a fixed-length encoding. Its code space includes 0x10000 code
          points, from 0 to 0xFFFF. One code  point  is  represented  with
          exactly  one  2-byte  code unit, which has the same value as the
          code point. Size of YYCTYPE must be 2 bytes.

   UTF-16 is a variable-length  encoding.  Its  code  space  includes  all
          Unicode  code  points,  from  0  to  0xD7FF  and  from 0xE000 to
          0x10FFFF. One code point is represented with one or  two  2-byte
          code units. Size of YYCTYPE must be 2 bytes.

   UTF-32 is  a fixed-length encoding. Its code space includes all Unicode
          code points, from 0 to 0xD7FF and from 0xE000 to  0x10FFFF.  One
          code  point  is  represented  with exactly one 4-byte code unit.
          Size of YYCTYPE must be 4 bytes.

   UTF-8  is a variable-length  encoding.  Its  code  space  includes  all
          Unicode  code  points,  from  0  to  0xD7FF  and  from 0xE000 to
          0x10FFFF. One code point is represented with  sequence  of  one,
          two,  three or four 1-byte code units. Size of YYCTYPE must be 1
          byte.

   In Unicode, values from the range 0xD800 to 0xDFFF (surrogates) are not
   valid  Unicode  code  points.  Any  encoded sequence of code units that
   would map to  Unicode  code  points  in  the  range  0xD800-0xDFFF  is
   ill-formed.  The  user  can  control  how  re2c  treats such ill-formed
   sequences with the --encoding-policy <policy> flag (see OPTIONS  for  a
   full explanation).
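
   The relation between code points and code units can be made concrete
   with a few helper functions. This is a minimal sketch in plain C,
   independent of re2c; the function names are hypothetical. Surrogates
   and values above 0x10FFFF are reported as not encodable (0 code units).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A valid Unicode code point lies in 0..0x10FFFF and is not a surrogate. */
int is_valid_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Number of 1-byte code units needed in UTF-8, or 0 if cp is invalid. */
size_t utf8_units(uint32_t cp)
{
    if (!is_valid_code_point(cp)) return 0;
    if (cp <= 0x7F)   return 1;
    if (cp <= 0x7FF)  return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;
}

/* Number of 2-byte code units needed in UTF-16, or 0 if cp is invalid. */
size_t utf16_units(uint32_t cp)
{
    if (!is_valid_code_point(cp)) return 0;
    return cp <= 0xFFFF ? 1 : 2;
}

/* Write the UTF-8 code units of cp to out; returns how many were written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    size_t n = utf8_units(cp);
    assert(out != NULL);
    switch (n) {
    case 1:
        out[0] = (unsigned char)cp;
        break;
    case 2:
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 4:
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        break;                 /* 0: surrogate or out of range */
    }
    return n;
}
```

   For example, the code point 0x20AC takes three code units in UTF-8
   (0xE2 0x82 0xAC) but a single code unit in UTF-16.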

   For  some  encodings,  there are code units that never occur in a valid
   encoded stream (e.g. the 0xFF byte in UTF-8). If the generated  scanner
   must  check  for invalid input, the only reliable way to do so is to use
   the default rule *. Note  that  the  full  range  rule  [^]  won't  catch
   invalid  code  units when a variable-length encoding is used ([^] means
   "all valid code points", while the default rule * means  "all  possible
   code units").

GENERIC INPUT API

   re2c usually operates on input using pointer-like primitives  YYCURSOR,
   YYMARKER, YYCTXMARKER and YYLIMIT.

   Generic  input  API  (enabled with --input custom switch) allows one to
   customize input  operations.  In  this  mode,  re2c  will  express  all
   operations on input in terms of the following primitives:

                 
                 YYPEEK ()        get current input character
                 YYSKIP ()        advance to the next character
                 YYBACKUP ()      backup current input position
                 YYBACKUPCTX ()   backup current input position for
                                  trailing context
                 YYRESTORE ()     restore current input position
                 YYRESTORECTX ()  restore current input position for
                                  trailing context
                 YYLESSTHAN (n)   check if less than n input characters
                                  are left
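
   As an illustration, the primitives can be defined over a buffer that is
   addressed by offsets instead of pointers. This is a minimal sketch in
   plain C: struct input, match_number and number_length are hypothetical
   names, and match_number is a hand-written stand-in for generated code
   that matches [0-9]+ ("." [0-9]+)?. As a simplification, YYPEEK () here
   returns '\0' at the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct input {
    const char *buf;     /* the whole input */
    size_t len;          /* number of characters in buf */
    size_t pos;          /* current position */
    size_t marker;       /* position saved by YYBACKUP */
    size_t ctxmarker;    /* position saved by YYBACKUPCTX */
};

#define YYPEEK()        (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()        (++in->pos)
#define YYBACKUP()      (in->marker = in->pos)
#define YYRESTORE()     (in->pos = in->marker)
#define YYBACKUPCTX()   (in->ctxmarker = in->pos)
#define YYRESTORECTX()  (in->pos = in->ctxmarker)
#define YYLESSTHAN(n)   (in->len - in->pos < (size_t)(n))

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Match [0-9]+ ("." [0-9]+)? at the current position.
 * Returns the length of the match, or 0 if there is none. */
size_t match_number(struct input *in)
{
    size_t start = in->pos;
    if (YYLESSTHAN(1) || !is_digit(YYPEEK())) return 0;
    do { YYSKIP(); } while (is_digit(YYPEEK()));
    if (YYPEEK() == '.') {
        YYBACKUP();              /* remember the end of the integer part */
        YYSKIP();
        if (is_digit(YYPEEK())) {
            do { YYSKIP(); } while (is_digit(YYPEEK()));
        } else {
            YYRESTORE();         /* "." without digits: fall back */
        }
    }
    return in->pos - start;
}

/* Convenience wrapper: length of the number at the start of text. */
size_t number_length(const char *text)
{
    struct input in;
    assert(text != NULL);
    in.buf = text;
    in.len = strlen(text);
    in.pos = in.marker = in.ctxmarker = 0;
    return match_number(&in);
}
```

   Because the scanner touches the input only through these macros, the
   same matching code would work unchanged if the macros were redefined
   over a different kind of input source.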
                 

   A couple of useful links that provide some examples:

   1. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html

   2. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html

AUTHORS

   Peter Bumbulis   peter@csg.uwaterloo.ca

   Brian Young      bayoung@acm.org

   Dan Nuffer       nuffer@users.sourceforge.net

   Marcus Boerger   helly@users.sourceforge.net

   Hartmut Kaiser   hkaiser@users.sourceforge.net

   Emmanuel Mogenet mgix@mgix.com

   Ulya Trofimovich skvadrik@gmail.com

VERSION INFORMATION

   This manpage describes re2c version 0.16, package date 21 Jan 2016.

                                                                   RE2C(1)
