ispell - format of ispell dictionaries and affix files


   Ispell(1)  requires  two files to define the language that it is spell-
   checking.  The first file is a  dictionary  containing  words  for  the
   language, and the second is an "affix" file that defines the meaning of
   special flags in  the  dictionary.   The  two  files  are  combined  by
   buildhash  (see  ispell(1))  and  written  to  a hash file which is not
   described here.

   A raw ispell  dictionary  (either  the  main  dictionary  or  your  own
   personal dictionary) contains a list of words, one per line.  Each word
   may optionally be followed by a slash ("/")  and  one  or  more  flags,
   which  modify  the  root  word  as  explained  below.  Depending on the
   options with which ispell was built, case may or may not be significant
   in  either the root word or the flags, independently.  Specifically, if
   the compile-time option CAPITALIZATION is defined, case is  significant
   in  the  root  word;  if not, case is ignored in the root word.  If the
   compile-time option MASKBITS is set to a value of 32, case  is  ignored
   in the flags; otherwise case is significant in the flags.  Contact your
   system administrator or ispell maintainer for more information (or  use
   the -vv flag to find out).  The dictionary should be sorted with the -f
   flag  of  sort(1)  before  the  hash  file  is  built;  this  is   done
   automatically by munchlist(1), which is the normal way of building
   dictionaries.
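
   The sorting step can be sketched in a POSIX shell as below.  This is
   only an illustration: the function name sort_raw_dictionary and the
   file names in the usage comment are hypothetical.  Forcing LC_ALL=C is
   an assumption, made so that the order does not vary with the user's
   locale.

```sh
# Sort a raw dictionary read from standard input, folding case (sort -f)
# so that, e.g., "alice" and "Zed" are ordered as if both were one case.
# LC_ALL=C is an assumption: it makes the byte order locale-independent.
sort_raw_dictionary() {
  LC_ALL=C sort -f
}

# Usage (hypothetical file names):
#   sort_raw_dictionary < raw.dict > sorted.dict
```

   Without -f, a C-locale sort would place every capitalized word before
   all lowercase ones.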

   If the dictionary contains words that have string characters  (see  the
   affix-file  documentation  below),  they  must be written in the format
   given by the defstringtype statement in the affix file.  This  will  be
   the  case  for  most  non-English  languages.   Be  careful to use this
   format, rather than that of your favorite formatter, when adding  words
   to a dictionary.  (If you add words to your personal dictionary during
   an ispell session, they will automatically be converted to the correct
   format.)  This feature can be used to convert an entire dictionary if
   necessary:

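               # Build a throwaway hash file from a one-word dictionary.
               echo qqqqq > dummy.dict
               buildhash dummy.dict affix-file dummy.hash
               # Send each old word to ispell -a as "*word" (add it to the
               # personal dictionary), then "#" (save the personal dictionary).
               awk '{print "*" $0} END {print "#"}' old-dict-file \
               | ispell -a -T old-dict-string-type \
                 -d ./dummy.hash -p ./new-dict-file \
                 > /dev/null
               rm dummy.*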

   The case of the root word  controls  the  case  of  words  accepted  by
   ispell, as follows:

   (1)    If the root word appears only in lower case (e.g., bob), it will
          be accepted in lower case, capitalized, or all capitals.

   (2)    If the root word appears capitalized (e.g., Robert), it will not
          be  accepted in all-lower case, but will be accepted capitalized
          or all in capitals.

   (3)    If the root word appears all in capitals (e.g., UNIX),  it  will
          only be accepted all in capitals.

   (4)    If  the  root  word appears with a "funny" capitalization (e.g.,
          ITCorp), a word  will  be  accepted  only  if  it  follows  that
          capitalization, or if it appears all in capitals.

   (5)    More  than  one  capitalization of a root word may appear in the
          dictionary.  Flags from different capitalizations  are  combined
          by OR-ing them together.

   Redundant  capitalizations  (e.g.,  bob  and  Bob)  will be combined by
   buildhash and by ispell (for personal dictionaries), and can be removed
   from a raw dictionary by munchlist.

   For example, the dictionary:

          bob
          Robert
          UNIX
          ITcorp
          ITCorp

   will  accept  bob,  Bob, BOB, Robert, ROBERT, UNIX, ITcorp, ITCorp, and
   ITCORP, and will reject all others.  Some of the unacceptable forms are
   bOb, robert, Unix, and ItCorp.
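
   Rules (1) through (5) can be modelled with a short POSIX shell sketch.
   The helpers accepts_case and dict_accepts are hypothetical (they are
   not part of ispell).  They assume ASCII letters, ignore affix flags,
   and assume that CAPITALIZATION is defined.  Rules (2) to (4) reduce to
   "the exact spelling or all capitals"; only an all-lowercase root also
   accepts its capitalized form.

```sh
# Hypothetical helper: succeed if WORD is an acceptable capitalization of
# ROOT under rules (1)-(4).  Assumes ASCII letters; flags are ignored.
accepts_case() {
  root=$1 word=$2
  upper=$(printf '%s\n' "$root" | tr '[:lower:]' '[:upper:]')
  lower=$(printf '%s\n' "$root" | tr '[:upper:]' '[:lower:]')
  # Every root accepts its own spelling and its all-capitals form.
  [ "$word" = "$root" ] && return 0
  [ "$word" = "$upper" ] && return 0
  # Only an all-lowercase root also accepts the capitalized form (rule 1).
  if [ "$root" = "$lower" ]; then
    first=$(printf '%s\n' "$root" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s\n' "$root" | cut -c2-)
    [ "$word" = "$first$rest" ] && return 0
  fi
  return 1
}

# Rule (5): WORD is accepted if any listed capitalization accepts it.
# Usage: dict_accepts WORD ROOT...
dict_accepts() {
  w=$1
  shift
  for r in "$@"; do
    accepts_case "$r" "$w" && return 0
  done
  return 1
}
```

   With the example dictionary, "dict_accepts Bob bob Robert UNIX ITcorp
   ITCorp" succeeds, while the same call with Unix in place of Bob fails.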

   As  mentioned  above,  root  words in any dictionary may be extended by
   flags.  Each flag is a single alphabetic character, which represents  a
   prefix or suffix that may be added to the root to form a new word.  For
   example, in an English dictionary the D flag can be added to  bathe  to
   make bathed.  Since flags are represented as a single bit in the hashed
   dictionary, this results in significant space savings.   The  munchlist
   script will reduce an existing raw dictionary by adding flags when
   possible.

   When a word is extended with an affix, the affix will be accepted  only
   if  it  appears  in  the  same  case  as  the initial (prefix) or final
   (suffix) letter of the word.  Thus, for example, the  entry  UNIX/M  in
   the  main  dictionary  (M  means add an apostrophe and an "s" to make a
   possessive) would accept UNIX'S but would reject UNIX's.  If UNIX's  is
   legal,  it  must appear as a separate dictionary entry, and it will not
   be combined by munchlist.  (In general, you don't need to  worry  about
   these  things;  munchlist  guarantees  that  its output dictionary will
   accept the same set of words as its input, so all you have to do is add
   words to the dictionary and occasionally run munchlist to reduce its
   size.)
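
   The case rule for affixes can be sketched as follows.  The helper
   add_suffix_in_case is hypothetical and simply generates the one form
   the rule allows.  It treats the M flag as appending 's, as described
   above, and looks only at the final character of the root.

```sh
# Hypothetical helper: attach SUFFIX to ROOT in the case of ROOT's final
# character, mirroring the rule that an affix is accepted only in that case.
add_suffix_in_case() {
  root=$1 suffix=$2
  last=${root#"${root%?}"}          # final character of ROOT
  case $last in
    [[:upper:]]) suffix=$(printf '%s\n' "$suffix" | tr '[:lower:]' '[:upper:]') ;;
    *)           suffix=$(printf '%s\n' "$suffix" | tr '[:upper:]' '[:lower:]') ;;
  esac
  printf '%s%s\n' "$root" "$suffix"
}
```

   For the entry UNIX/M this yields UNIX'S, never UNIX's; the latter
   needs its own dictionary entry.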

   As  mentioned,  the  affix  definition  file  describes   the   affixes
   associated  with particular flags.  It also describes the character set
   used by the language.

   Although the affix-definition grammar is designed for  a  line-oriented
   layout,  it  is actually a free-format yacc grammar and can be laid out
   weirdly if you want.  Comments are started by a pound (sharp) sign (#),
   and  continue to the end of the line.  Backslashes are supported in the
   usual fashion (\nnn, plus specials \n, \r, \t, \v, \f, \b, and the new
   hex format \xnn).  Any character with special meaning to the parser can
   be changed to an uninterpreted token by backslashing it;  for  example,
   you  can  declare  a  flag named 'asterisk' or 'colon' with flag \*: or
   flag \::.

   The grammar will be presented in a top-down fashion, with discussion of
   each element.  An affix-definition file must contain exactly one table:

          table     :    [headers] [prefixes] [suffixes]

   At  least one of prefixes and suffixes is required.  They can appear in
   either order.

          headers   :    [ options ] char-sets

   The headers describe options global to this  dictionary  and  language.
   These  include the character sets to be used and the formatter, and the
   defaults for certain ispell flags.

          options : { fmtr-stmt | opt-stmt | flag-stmt | num-stmt }

   The options statements define the defaults for certain ispell flags and
   for the character sets used by the formatters.

          fmtr-stmt :    { nroff-stmt | tex-stmt }

   A  fmtr-stmt  describes  characters  that  have  special  meaning  to a
   formatter.   Normally,  this  statement  is  not  necessary,  but  some
   languages  may  have  preempted the usual defaults for use as language-
   specific characters.  In this case, these statements  may  be  used  to
   redefine the special characters expected by the formatter.

          nroff-stmt     :    { nroffchars | troffchars } string

   The  nroffchars  statement allows redefinition of certain nroff control
   characters.  The string given must be exactly five characters long, and
   must list substitutions for the left and right parentheses ("()"), the
   period ("."), the backslash ("\"), and the asterisk ("*").  (The  right
   parenthesis  is  not currently used, but is included for completeness.)
   For example, the statement:

          nroffchars {}.\\*

   would replace the left and right parentheses with left and right  curly
   braces  for  purposes of parsing nroff/troff strings, with no effect on
   the others (admittedly a contrived example).  Note that  the  backslash
   is escaped with a backslash.

          tex-stmt  :    { TeXchars | texchars } string

   The TeXchars statement allows redefinition of certain TeX/LaTeX control
   characters.  The string given must be exactly thirteen characters long,
   and must list substitutions for the left and right parentheses ("()"),
   the left and right square brackets ("[]"), the  left  and  right  curly
   braces  ("{}"), the left and right angle brackets ("<>"), the backslash
   ("\"), the dollar sign ("$"), the asterisk ("*"),  the  period  or  dot
   ("."), and the percent sign ("%").  For example, the statement:

          texchars ()\[]<\><\>\\$*.%

   would replace the functions of the left and right curly braces with the
   left and  right  angle  brackets  for  purposes  of  parsing  TeX/LaTeX
   constructs,  while  retaining their functions for the tib bibliographic
   preprocessor.  Note that the backslash, the left  square  bracket,  and
   the right angle bracket must be escaped with a backslash.
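
   The length requirement applies after backslash escapes are resolved,
   so it is easy to miscount.  The POSIX shell sketch below uses two
   hypothetical helpers, unescaped_length and check_fmtr_string.  It
   counts each backslash-character pair as one character and assumes
   only single-character escapes, not the \nnn or \xnn forms.

```sh
# Length of an nroffchars/texchars argument after removing backslash
# escapes: each "\x" pair counts as the single character x.
unescaped_length() {
  printf '%s\n' "$1" | awk '{ gsub(/\\./, "x"); print length($0) }'
}

# Hypothetical checker: succeed if STRING has the length KEYWORD needs.
# Usage: check_fmtr_string KEYWORD STRING
check_fmtr_string() {
  case $1 in
    nroffchars|troffchars) want=5 ;;
    TeXchars|texchars)     want=13 ;;
    *) return 2 ;;                  # not a fmtr-stmt keyword
  esac
  [ "$(unescaped_length "$2")" -eq "$want" ]
}
```

   Both example statements pass.  The texchars string with its escapes
   removed, ()[]<><>$*.%, is one character short and fails.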

          opt-stmt  :    { cmpnd-stmt | aff-stmt }

          cmpnd-stmt     :    compoundwords compound-opt

          aff-stmt       :    allaffixes on-or-off

          on-or-off :    { on | off }

          compound-opt : { on-or-off | controlled character }

   An  opt-stmt  controls  certain  ispell  defaults  that  are  best made
   language-specific.  The allaffixes statement controls the  default  for
   the  -P  and  -m  options  to ispell.  If allaffixes is turned off (the
   default),  ispell  will  default  to  the  behavior  of  the  -P  flag:
   root/affix suggestions will only be made if there are no "near misses".
   If allaffixes is turned on, ispell will default to the behavior of  the
   -m flag: root/affix suggestions will always be made.  The compoundwords
   statement controls the default for the -B and -C options to ispell.  If
   compoundwords  is  turned off (the default), ispell will default to the
   behavior of the -B flag: run-together words will be reported as errors.
   If  compoundwords  is turned on, ispell will default to the behavior of
   the -C flag: run-together words will be considered as compounds if both
   are in the dictionary.  This is useful for languages such as German and
   Norwegian, which form large numbers of  compound  words.   Finally,  if
   compoundwords  is  set  to  controlled, only words marked with the flag
   indicated by character (which should not be  otherwise  used)  will  be
   allowed  to  participate  in  compound  formation.  Because this option
   requires the flags to  be  specified  in  the  dictionary,  it  is  not
   available from the command line.

          flag-stmt :    flagmarker character

   The  flagmarker  statement  describes  the  character  which is used to
   separate affix flags from the root word in a raw dictionary file.  This
   must be a character which is not found in any word (including in string
   characters; see below).  The default is "/" because this  character  is
   not normally used to represent special characters in any language.

          num-stmt  :    compoundmin digit

   The compoundmin statement controls the minimum length of each of the
   two components of a compound word.  This only has an effect if
   compoundwords is turned on
   or  if  the  -C  flag  is given to ispell.  In that case, only words at
   least as long as the given minimum will be accepted as components of  a
   compound.  The default is 3 characters.
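
   The effect of compoundmin on a two-part compound can be sketched as
   below.  The helper is_compound is hypothetical.  It assumes exact,
   case-sensitive matches against a list of plain words, ignores affix
   flags, and does not necessarily search for compounds the way ispell
   does.

```sh
# Hypothetical checker: succeed if WORD splits into two dictionary words,
# each at least MIN characters long.
# Usage: is_compound WORD MIN DICTWORD...
is_compound() {
  word=$1 min=$2
  shift 2
  printf '%s\n' "$@" | awk -v w="$word" -v min="$min" '
    { dict[$0] = 1 }                      # load the dictionary words
    END {
      n = length(w)
      for (i = min; i <= n - min; i++)    # both parts must be >= min long
        if ((substr(w, 1, i) in dict) && (substr(w, i + 1) in dict))
          exit 0
      exit 1
    }'
}
```

   For example, "is_compound football 3 foot ball" succeeds.  With a
   minimum of 5 it fails, because neither part is long enough.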

          char-sets :    norm-sets [ alt-sets ]

   The  character-set section describes the characters that can be part of
   a word, and defines their collating order.   There  must  always  be  a
   definition  of  "normal" character sets;  in addition, there may be one
   or more partial definitions of "alternate" sets  which  are  used  with
   various text formatters.

          norm-sets :    [ deftype ] charset-group

   A  "normal" character set may optionally begin with a definition of the
   file suffixes that make use of this set.  Following  this  are  one  or
   more character-set declarations.

          deftype : defstringtype name deformatter suffix*

   The  defstringtype  declaration  gives  a  list  of file suffixes which
   should make use of the default string characters defined as part of the
   base character set; it is only necessary if string characters are being
   defined.  The name  parameter  is  a  string  giving  the  unique  name
   associated  with  these suffixes; often it is a formatter name.  If the
   formatter is a member of the troff family, "nroff" should be  used  for
   the name associated with the most popular macro package; members of the
   TeX family should use "tex".  Other names may  be  chosen  freely,  but
   they should be kept simple, as they are used in ispell's -T switch to
   specify a formatter type.   The  deformatter  parameter  specifies  the
   deformatting  style  to  use  when  processing  files  with  the  given
   suffixes.  Currently, this must be either tex  or  nroff.   The  suffix
   parameters are a whitespace-separated list of strings which, if present
   at the end of a filename, indicate that the associated  set  of  string
   characters  should  be used by default for this file.  For example, the
   suffix list for the troff family typically includes  suffixes  such  as
   ".ms", ".me", ".mm", etc.

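   Matching a filename against a suffix list can be sketched in a POSIX
   shell as follows.  has_listed_suffix is a hypothetical helper, and the
   suffix list in the example is the troff one mentioned above.

```sh
# Hypothetical helper: succeed if FILE ends in one of the SUFFIX arguments,
# i.e. the string type declared with those suffixes applies by default.
# Usage: has_listed_suffix FILE SUFFIX...
has_listed_suffix() {
  file=$1
  shift
  for suffix in "$@"; do
    case $file in
      *"$suffix") return 0 ;;   # quoted, so the suffix matches literally
    esac
  done
  return 1
}
```

   A caller might write: has_listed_suffix "$f" .ms .me .mm && type=nroff
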
          charset-group :     { char-stmt | string-stmt | dup-stmt}*

   A  char-stmt  describes  single  characters;  a  string-stmt  describes
   characters that must appear together as a  string,  and  which  usually
   represent  a  single character in the target language.  Either may also
   describe conversion between upper and lower case.  A dup-stmt  is  used
   to  describe  alternate  forms  of  string characters, so that a single
   dictionary may be  used  with  several  formatting  programs  that  use
   different conventions for representing non-ASCII characters.

          char-stmt :    wordchars character-range
                    |    wordchars lowercase-range uppercase-range
                    |    boundarychars character-range
                    |    boundarychars lowercase-range uppercase-range
          string-stmt    :    stringchar string
                    |    stringchar lowercase-string uppercase-string

   Characters described with the boundarychars statement are considered
   part of a word only if they appear singly, embedded between characters
   declared with the wordchars or stringchar statements.  For example, if
   the hyphen is a boundary character (useful in French), the string
   "foo-bar" would be a single word, but "-foo" would be the same as
   "foo", and "foo--bar" would be two words separated by non-word
   characters.

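   The boundary-character rule can be sketched with awk.  The sketch
   assumes, hypothetically, that the only wordchars are the ASCII letters
   and the only boundarychar is the hyphen.  The helper split_words
   prints one word per line.

```sh
# Hypothetical word splitter: ASCII letters are wordchars and the hyphen is
# the only boundarychar, kept only when it sits singly between two letters.
split_words() {
  printf '%s\n' "$1" | awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c ~ /[A-Za-z]/) { out = out c; continue }
      prev = (i > 1) ? substr($0, i - 1, 1) : ""
      nxt = (i < n) ? substr($0, i + 1, 1) : ""
      if (c == "-" && prev ~ /[A-Za-z]/ && nxt ~ /[A-Za-z]/)
        out = out c               # embedded boundary character: keep it
      else
        out = out " "             # anything else separates words
    }
    count = split(out, words, " ")
    for (j = 1; j <= count; j++) print words[j]
  }'
}
```

   Run on the three examples above, it prints foo-bar, then foo, then
   foo and bar on separate lines.
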
   If two ranges or strings are given in a char-stmt or  string-stmt,  the
   first  describes  characters  that are interpreted as lowercase and the
   second describes uppercase.  In the case of a stringchar statement, the
   two  strings  must  be  of  the  same  length.   Also,  in a stringchar
   statement, the actual strings may contain both uppercase and lowercase
   characters themselves without difficulty; for instance, the statement:

          stringchar     "\\*(sS"  "\\*(Ss"

   is  legal  and will not interfere with (or be interfered with by) other
   declarations of "s" and "S" as lower and upper case, respectively.

   A final note on  string  characters:  some  languages  collate  certain
   special  characters  as  if they were strings.  For example, the German
   "a-umlaut" is traditionally sorted as if it were "ae".  Ispell  is  not
   capable  of  this;  each  character  must  be  treated as an individual
   entity.  So in certain cases, ispell will sort a list of words  into  a
   different  order  than  the  standard "dictionary" order for the target

          alt-sets  :    alttype [ alt-stmt* ]

   Because different formatters use different notations to represent  non-
   ASCII  characters,  ispell must be aware of the representations used by
   these formatters.  These are declared as alternate sets of string
   characters.

          alttype   :    altstringtype name suffix*

   The  altstringtype  statement  introduces  each  set  by  declaring the
   associated formatter name and filename suffix list.  This name and list
   are  interpreted  exactly  as  in  the  defstringtype  statement above.
   Following this header are one  or  more  alt-stmts  which  declare  the
   alternate string characters used by this formatter.

          alt-stmt       :    altstringchar alt-string std-string

   The  altstringchar  statement  describes  alternate representations for
   string characters.   For  example,  the  -mm  macro  package  of  troff
   represents  the  German "a-umlaut" as a\*:, while TeX uses the sequence
   \"a.  If the troff versions are declared as the standard versions using
   stringchar, the TeX versions may be declared as alternates by using the

          altstringchar  \\\"a     a\\*:

   When the altstringchar statement is used to  specify  alternate  forms,
   all  forms  for  a  particular formatter must be declared together as a
   group.  Also, each formatter or macro package must provide  a  complete
   set  of  characters,  both  upper-  and  lower-case,  and the character
   sequences  used  for  each  formatter  must  be  completely   distinct.
   Character  sequences  which  describe upper- and lower-case versions of
   the same printable character must also be the same length.  It  may  be
   necessary  to  define  some new macros for a given formatter to satisfy
   these restrictions.  (The current version of buildhash does not enforce
   these restrictions, but failure to obey them may result in errors being
   introduced into files that are processed with ispell.)

   An important minor point is that ispell  assumes  that  all  characters
   declared as wordchars or boundarychars will occupy exactly one position
   on the terminal screen.

   A single character-set statement can declare either a single  character
   or  a contiguous range of characters.  A range is given as in egrep and
   the shell: [a-z] means lowercase  alphabetics;  [^a-z]  means  all  but
   lowercase, etc.  All character-set statements are combined (unioned) to
   produce the final list of characters that may be part of a  word.   The
   collating  order  of  the  characters  is defined by the order of their
   declaration; if a range is used, the characters are considered to  have
   been  declared  in ASCII order.  Characters that have case are collated
   next to each other, with the uppercase character first.

   The character-declaration statements have  a  rather  strange  behavior
   caused by their need to match each lowercase character with its uppercase
   equivalent.  In any given wordchars  or  boundarychars  statement,  the
   characters  in  each  range  are  first  sorted  into  ASCII  collating
   sequence, then matched one-for-one with  the  other  range.   (The  two
   ranges must have the same number of characters.)  Thus, for example,
   the two statements:

          wordchars [aeiou] [AEIOU]
          wordchars [aeiou] [UOIEA]

   would produce exactly the same effect.  To get the vowels to  match  up
   "wrong", you would have to use separate statements:

          wordchars a U
          wordchars e O
          wordchars i I
          wordchars o E
          wordchars u A

   which would cause uppercase 'e' to be 'O', and lowercase 'O' to be 'e'.
   This should normally be a problem only with languages which  have  been
   forced  to  use  a strange ASCII collating sequence.  If your uppercase
   and lowercase letters both collate in the  same  order,  you  shouldn't
   have to worry about this "feature".
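
   The sort-then-pair behavior can be sketched as follows.  The helpers
   sort_chars and pair_case_ranges are hypothetical.  They take plain
   lists of characters rather than bracketed ranges, and they sort in the
   C locale to approximate ASCII order.

```sh
# Emit the characters of $1 in ASCII (C-locale) order as one string.
sort_chars() {
  printf '%s\n' "$1" | fold -w1 | LC_ALL=C sort | tr -d '\n'
}

# Hypothetical model of "wordchars LOWER UPPER": sort each list, then
# pair the characters position by position (lowercase first).
pair_case_ranges() {
  lo=$(sort_chars "$1")
  up=$(sort_chars "$2")
  if [ "${#lo}" -ne "${#up}" ]; then
    echo "ranges must have the same number of characters" >&2
    return 1
  fi
  awk -v lo="$lo" -v up="$up" 'BEGIN {
    n = length(lo)
    for (i = 1; i <= n; i++)
      printf "%s%s%s", substr(lo, i, 1), substr(up, i, 1), (i < n ? " " : "\n")
  }'
}
```

   Both "pair_case_ranges aeiou AEIOU" and "pair_case_ranges aeiou UOIEA"
   print aA eE iI oO uU, matching the claim that the two statements have
   the same effect.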

   The prefixes and suffixes sections have exactly the same syntax, except
   for the introductory keyword.

          prefixes  :    prefixes flagdef*
          suffixes  :    suffixes flagdef*
          flagdef   :    flag [*|~] char : repl*

   A prefix or suffix table consists of an introductory keyword and a list
   of  flag  definitions.   Flags  can be defined more than once, in which
   case the definitions are combined.  Each  flag  controls  one  or  more
   repls  (replacements) which are conditionally applied to the beginnings
   or endings of various words.

   Flags  are  named  by  a  single  character  char.   Depending   on   a
   configuration option, this character can be either any uppercase letter
   (the  default  configuration)  or  any  7-bit  ASCII  character.   Most
   languages should be able to get along with just 26 flags.

   A  flag  character  may be prefixed with one or more option characters.
   (If you wish to use one of the option characters as a  flag  character,
   simply enclose it in double quotes.)

   The  asterisk  (*)  option  means that this flag participates in cross-
   product formation.  This only matters if the file contains both  prefix
   and  suffix  tables.   If  so, all prefixes and suffixes marked with an
   asterisk will be applied in all cross-combinations to  the  root  word.
   For  example,  consider  the  root  fix  with  prefixes pre and in, and
   suffixes es and ed.   If  all  flags  controlling  these  prefixes  and
   suffixes  are  marked  with an asterisk, then the single root fix would
   also generate prefix, prefixes, prefixed, infix, infixes, infixed, fix,
   fixes,  and  fixed.  Cross-product formation can produce a large number
   of words quickly, some of which may  be  illegal,  so  watch  out.   If
   cross-products  produce illegal words, munchlist will not produce those
   flag combinations, and the flag will not be useful.
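
   The cross-product example above can be reproduced with a small POSIX
   shell sketch.  The helper cross_product is hypothetical.  It ignores
   conditions and strip strings and simply combines each prefix (or none)
   with the root and each suffix (or none).

```sh
# Hypothetical sketch of cross-product formation for flags marked "*".
# PREFIXES and SUFFIXES are whitespace-separated lists (split unquoted).
# Usage: cross_product ROOT "PREFIXES" "SUFFIXES"
cross_product() {
  root=$1 prefixes=$2 suffixes=$3
  for p in '' $prefixes; do        # '' stands for "no prefix"
    for s in '' $suffixes; do      # '' stands for "no suffix"
      printf '%s%s%s\n' "$p" "$root" "$s"
    done
  done
}
```

   "cross_product fix 'pre in' 'es ed'" prints the nine words listed
   above.  Each added prefix or suffix multiplies the count, which is why
   cross-products grow so quickly.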

   The ~ option specifies that the associated flag is only active  when  a
   compound  word  is  being  formed.   This  is useful in a language like
   German, where the form of a word sometimes changes inside a compound.

          repl :    condition* > [ - strip-string , ] append-string

   A repl is a conditional rule for  modifying  a  root  word.   Up  to  8
   conditions  may  be  specified.   If  the conditions are satisfied, the
   rules on the right-hand side of the repl are applied, as follows:

   (1)    If a strip-string is  given,  it  is  first  stripped  from  the
          beginning or ending (as appropriate) of the root word.

   (2)    Then the append-string is added at that point.

   For  example,  the  condition  .  means "any word", and the condition Y
   means "any word ending in Y".  The following (suffix) replacements:

          .    >    MENT
          Y    >    -Y,IES

   would change induce to inducement and fly  to  flies.   (If  they  were
   controlled  by  the  same  flag, they would also change fly to flyment,
   which might not be what was wanted.  Munchlist can be used  to  protect
   against this sort of problem; see the command sequence given below.)
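
   One repl can be modelled by turning its conditions into a shell case
   pattern.  This is a hypothetical sketch, not ispell's implementation.
   apply_suffix assumes whitespace-separated condition tokens and
   uppercase words, translates [^...] into the shell's [!...], and does
   not enforce the four-character minimum output length.

```sh
# Hypothetical helper modelling one suffix repl:
#   apply_suffix ROOT "CONDITIONS" STRIP APPEND
# CONDITIONS are whitespace-separated tokens (".", a letter, or a bracket
# range such as [^AEIOU]); STRIP may be empty.  The body is a subshell so
# that "set -f" (no filename globbing of the tokens) stays local.
apply_suffix() (
  set -f
  root=$1 conds=$2 strip=$3 append=$4
  pattern='*'                       # a lone "." means any word at all
  if [ "$conds" != . ]; then
    for c in $conds; do
      case $c in
        .)     pattern="$pattern?" ;;           # any one character
        '[^'*) pattern="$pattern[!${c#??}" ;;   # [^..] becomes [!..]
        *)     pattern="$pattern$c" ;;
      esac
    done
  fi
  case $root in
    $pattern) ;;                    # conditions satisfied
    *) exit 1 ;;
  esac
  if [ -n "$strip" ]; then
    case $root in
      *"$strip") root=${root%"$strip"} ;;
      *) exit 1 ;;
    esac
  fi
  printf '%s%s\n' "$root" "$append"
)
```

   "apply_suffix INDUCE . '' MENT" prints INDUCEMENT and "apply_suffix FLY
   Y Y IES" prints FLIES.  "apply_suffix FLY . '' MENT" also succeeds,
   yielding the unwanted FLYMENT.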

   No  matter how much you might wish it, the strings on the right must be
   strings of specific characters, not ranges.   The  reasons  are  rooted
   deeply in the way ispell works, and it would be difficult or impossible
   to provide for more flexibility.  For example, you might wish to write:

          [EY] >    -[EY],IES

   This will not work.  Instead, you must use two separate rules:

          E    >    -E,IES
          Y    >    -Y,IES

   The application of repls can be restricted to certain words with
   conditions:

          condition :    { . | character | range }

   A  condition is a restriction on the characters that adjoin, and/or are
   replaced by, the right-hand side of the repl.  Up to 8  conditions  may
   be  given,  which  should be enough context for anyone.  The right-hand
   side will be applied only if the conditions in the repl are  satisfied.
   The  conditions also implicitly define a length; roots shorter than the
   number of conditions will not pass the test.  (As  a  special  case,  a
   condition  of  a  single  dot "." defines a length of zero, so that the
   rule  applies  to  all  words  indiscriminately).    This   length   is
   independent of the separate test that insists that all flags produce an
   output word length of at least four.

   Conditions that are single characters  should  be  separated  by  white
   space.  For example, to specify words ending in "ED", write:

          E D  >    -ED,ING        # As in covered > covering

   If you write:

          ED   >    -ED,ING

   the effect will be the same as:

          [ED] >    -ED,ING
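
   The difference can be seen by writing each condition list as the
   equivalent shell pattern.  This is a hypothetical translation for
   illustration only: "E D" corresponds to *ED, while "ED" corresponds to
   *[ED], which also matches words such as BRAVE that end in a bare E.

```sh
# Hypothetical helper: succeed if WORD matches the shell PATTERN.
matches() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# "E D" is two conditions: the word must end in E followed by D.
two_conditions='*ED'
# "ED" written without a space is one range condition, like [ED].
one_range='*[ED]'
```

   COVERED satisfies both patterns, but BRAVE satisfies only the
   single-range form.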

   As  a  final  minor,  but  important  point,  it is sometimes useful to
   rebuild a dictionary file  using  an  incompatible  suffix  file.   For
   example,  suppose  you expanded the "R" flag to generate "er" and "ers"
   (thus making the Z flag somewhat obsolete).  To build a new  dictionary
   newdict  that,  using  newaffixes, will accept exactly the same list of
   words as the old list olddict did using oldaffixes, the  -c  switch  of
   munchlist is useful, as in the following example:

          $ munchlist -c oldaffixes -l newaffixes olddict > newdict

   If  you  use this procedure, your new dictionary will always accept the
   same list the original did, even if you  badly  screwed  up  the  affix
   file.  This is because munchlist compares the words generated by a flag
   with the original word list, and refuses to use any flags that generate
   illegal  words.  (But don't forget that the munchlist step takes a long
   time and eats up temporary file space.)


   As an example of conditional suffixes, here is the specification of the
   S flag from the English affix file:

          flag *S:
              [^AEIOU]Y  >    -Y,IES    # As in imply > implies
              [AEIOU]Y   >    S         # As in convey > conveys
              [SXZH]     >    ES        # As in fix > fixes
              [^SXZHY]   >    S         # As in bat > bats

   The  first  line applies to words ending in Y, but not in vowel-Y.  The
   second takes care of the vowel-Y words.  The third then  handles  those
   words  that  end  in a sibilant or near-sibilant, and the last picks up
   everything else.

   Note that the conditions are written very carefully so that they  apply
   to  disjoint  sets  of words.  In particular, note that the fourth line
   excludes words ending in Y as well as the obvious SXZH.  Otherwise,  it
   would convert "imply" into "implys".
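
   The S flag can be transcribed into shell case patterns to check the
   disjointness claim.  This is a hypothetical sketch that assumes
   uppercase root words and writes [^...] as the shell's [!...].  A case
   statement stops at the first matching pattern, so s_flag alone would
   hide overlapping conditions.  count_s_rules therefore counts how many
   of the four patterns a word satisfies.

```sh
# Hypothetical translation of the English S flag into shell case patterns.
# The four patterns mirror the four conditions above ([^..] becomes [!..]).
S_PATTERNS='*[!AEIOU]Y *[AEIOU]Y *[SXZH] *[!SXZHY]'

# Print how many of the four S-flag conditions WORD satisfies.
count_s_rules() (
  set -f                      # keep the patterns from globbing filenames
  n=0
  for pat in $S_PATTERNS; do
    case $1 in
      $pat) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
)

# Apply the S flag to an uppercase root word.
s_flag() {
  case $1 in
    *[!AEIOU]Y) printf '%sIES\n' "${1%Y}" ;;   # imply  > implies
    *[AEIOU]Y)  printf '%sS\n' "$1" ;;          # convey > conveys
    *[SXZH])    printf '%sES\n' "$1" ;;         # fix    > fixes
    *[!SXZHY])  printf '%sS\n' "$1" ;;          # bat    > bats
  esac
}
```

   Each of IMPLY, CONVEY, FIX, and BAT satisfies exactly one condition.
   Dropping Y from the last bracket would make IMPLY match two.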

   Although  the  English  affix  file does not do so, you can also have a
   flag generate more than one variation on a root word.  For example,  we
   could extend the English "R" flag as follows:

          flag *R:
             E           >    R         # As in skate > skater
             E           >    RS        # As in skate > skaters
             [^AEIOU]Y   >    -Y,IER    # As in multiply > multiplier
             [^AEIOU]Y   >    -Y,IERS   # As in multiply > multipliers
             [AEIOU]Y    >    ER        # As in convey > conveyer
             [AEIOU]Y    >    ERS       # As in convey > conveyers
             [^EY]       >    ER        # As in build > builder
             [^EY]       >    ERS       # As in build > builders

   This  flag  would  generate  both  "skater" and "skaters" from "skate".
   This capability can be very useful in languages that make use of  noun,
   verb,  and  adjective endings.  For instance, one could define a single
   flag that generated all of the German "weak" verb endings.



                                 local                           ISPELL(5)

