locatedb(5)


NAME

   locatedb - front-compressed file name database

DESCRIPTION

   This  manual  page  documents the format of file name databases for the
   GNU version of locate.  The file name databases contain lists of  files
   that  were  in  particular directory trees when the databases were last
   updated.

   There can be multiple databases.   Users  can  select  which  databases
   locate  searches  using an environment variable or command line option;
   see locate(1).  The system administrator can choose the  file  name  of
   the  default  database,  the  frequency  with  which  the databases are
   updated, and the directories for which they contain entries.  Normally,
   file  name  databases  are  updated  by  running  the  updatedb program
   periodically, typically nightly; see updatedb(1).

GNU LOCATE02 database format

   This is the default format of  databases  produced  by  updatedb.   The
   updatedb  program  runs frcode to compress the list of file names using
   front-compression, which reduces the database size by a factor of 4  to
   5.   Front-compression  (also  known  as incremental encoding) works as
   follows.

   The database entries are a sorted list (case-insensitively, for  users'
   convenience).   Since the list is sorted, each entry is likely to share
   a prefix (initial string) with the previous entry.  Each database entry
   begins with a signed offset-differential count byte: the number of
   characters of the preceding entry's prefix to reuse, minus the number
   that the preceding entry reused from its own predecessor.  (The counts
   can be negative.)  Following the count is a null-terminated ASCII
   remainder, the part of the name that follows the shared prefix.
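
   As an illustration of the scheme described above, the following Python
   sketch computes the offset-differential count and remainder for each
   name in an already-sorted list.  It is not the frcode source; the
   function names shared_prefix_len and differential_counts are
   hypothetical, and the byte-level encoding of the counts is left out.

```python
def shared_prefix_len(prev: bytes, cur: bytes) -> int:
    """Number of leading bytes that prev and cur have in common."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def differential_counts(names):
    """Return (count, remainder) pairs for a sorted list of byte strings.

    count is the shared-prefix length of this entry minus the
    shared-prefix length used by the preceding entry.
    """
    prev = b""
    prev_shared = 0
    pairs = []
    for name in names:
        shared = shared_prefix_len(prev, name)
        pairs.append((shared - prev_shared, name[shared:]))
        prev, prev_shared = name, shared
    return pairs
```

   Because each count is relative to the previous one, a run of entries
   that share the same prefix length yields counts of zero.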

   If  the  offset-differential  count  is  larger than can be stored in a
   signed byte (+/-127), the byte has the value 0x80 (binary 10000000) and
   the  actual  count  follows  in a 2-byte word, with the high byte first
   (network byte order).  This count can also be negative  (the  sign  bit
   being in the first of the two bytes).
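
   The count encoding above can be sketched in Python with the standard
   struct module.  The names ESCAPE, pack_count and unpack_count are
   hypothetical; the sketch assumes the 2-byte word is a signed 16-bit
   integer, as the mention of a sign bit suggests.

```python
import struct

ESCAPE = 0x80  # marker byte: the real count follows as a 2-byte word


def pack_count(count: int) -> bytes:
    """Encode one offset-differential count."""
    if -127 <= count <= 127:
        return struct.pack("b", count)  # single signed byte
    # Escape byte, then signed 16-bit value, high byte first.
    return bytes([ESCAPE]) + struct.pack(">h", count)


def unpack_count(data: bytes, pos: int = 0):
    """Decode the count at data[pos]; return (count, next_position)."""
    if data[pos] == ESCAPE:
        (count,) = struct.unpack_from(">h", data, pos + 1)
        return count, pos + 3
    (count,) = struct.unpack_from("b", data, pos)
    return count, pos + 1
```

   The single-byte range stops at -127 rather than -128 because the bit
   pattern of -128 (0x80) is the one reserved as the escape marker.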

   Every  database begins with a dummy entry for a file called `LOCATE02',
   which locate checks for to  ensure  that  the  database  file  has  the
   correct format; it ignores the entry in doing the search.
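
   A reader might perform that check along the following lines.  This is
   a sketch, not the locate source: it assumes the dummy entry is stored
   like any other entry (a zero count byte, the name, and a terminating
   null), and the names MAGIC_ENTRY and check_locate02 are hypothetical.

```python
# Assumed on-disk form of the dummy entry: count 0, "LOCATE02", null.
MAGIC_ENTRY = b"\x00LOCATE02\x00"


def check_locate02(data: bytes) -> bytes:
    """Verify the dummy entry and return the bytes that follow it."""
    if not data.startswith(MAGIC_ENTRY):
        raise ValueError("not a LOCATE02 database")
    return data[len(MAGIC_ENTRY):]
```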

   Databases  cannot  be  concatenated together, even if the first (dummy)
   entry is trimmed from all but the first database.  This is because  the
   offset-differential  count  in  the  first  entry  of  the  second  and
   following databases will be wrong.

   In the future, the data within the locate database may not be sorted in
   any  particular  order.   To  obtain sorted results, pipe the output of
   locate through sort -f.

slocate database format

   The slocate program uses a database format similar to,  but  not  quite
   the  same as, GNU locate.  The first byte of the database specifies its
   security level.  If the security level is 0, slocate will  read,  match
   and  print  filenames  on  the basis of the information in the database
   only.  However, if the security level byte is 1, slocate omits  entries
   from  its  output  if  the invoking user is unable to access them.  The
   second byte of the database is zero.  The second byte  is  followed  by
   the  first  database  entry.   The  first  entry in the database is not
   preceded by  any  differential  count  or  dummy  entry.   Instead  the
   differential count for the first item is assumed to be zero.

   Starting  with  the  second  entry  (if  any)  in the database, data is
   interpreted as for the GNU LOCATE02 format.
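
   The two-byte header can be read as in the Python sketch below.  The
   description above does not say whether the security level is stored as
   a raw byte value or as an ASCII digit, so the sketch accepts either;
   that choice, and the name parse_slocate_header, are assumptions.

```python
def parse_slocate_header(data: bytes):
    """Return (security_level, offset_of_first_entry)."""
    if len(data) < 2 or data[1] != 0:
        raise ValueError("second byte of an slocate database must be zero")
    level = data[0]
    # Assumption: tolerate an ASCII '0' or '1' as well as a raw 0 or 1.
    if level in (ord("0"), ord("1")):
        level -= ord("0")
    if level not in (0, 1):
        raise ValueError("unexpected security level byte")
    # The first entry starts right after the header, with no count byte.
    return level, 2
```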

Old locate database format

   There is also an old database format, used  by  Unix  locate  and  find
   programs  and earlier releases of the GNU ones.  updatedb runs programs
   called bigram and code to produce old-format databases.  The old format
   differs  from  the above description in the following ways.  Instead of
   each entry starting with an offset-differential count byte  and  ending
   with a null, byte values from 0 through 28 indicate offset-differential
   counts from -14 through 14.  The byte  value  indicating  that  a  long
   offset-differential  count  follows  is  0x1e (30), not 0x80.  The long
   counts are stored in host byte order, which is not necessarily  network
   byte order, and host integer word size, which is usually 4 bytes.  They
   also represent a count 14 less than their value.   The  database  lines
   have  no  termination  byte; the start of the next line is indicated by
   its first byte having a value <= 30.
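
   The old-format count rules can be sketched in Python as follows.  The
   names OLD_ESCAPE, OLD_OFFSET and decode_old_count are hypothetical.
   The sketch reads the long count with the struct format "@i", which
   uses the byte order and int size of the machine running the code, and
   it rejects byte value 29, which the description above does not cover.

```python
import struct

OLD_ESCAPE = 30  # 0x1e: a long count follows
OLD_OFFSET = 14  # stored values are biased by 14


def decode_old_count(data: bytes, pos: int = 0):
    """Decode the count at data[pos]; return (count, next_position)."""
    byte = data[pos]
    if byte == OLD_ESCAPE:
        # Long count: native byte order and native int size.
        (value,) = struct.unpack_from("@i", data, pos + 1)
        return value - OLD_OFFSET, pos + 1 + struct.calcsize("@i")
    if byte <= 28:
        # Short count: 0..28 maps to -14..14.
        return byte - OLD_OFFSET, pos + 1
    raise ValueError("byte %d is not a count code" % byte)
```

   Because the long count depends on the host, a database written on one
   architecture may not decode correctly on another.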

   In addition, instead of starting with a dummy entry, the  old  database
   format  starts  with  a  256  byte table containing the 128 most common
   bigrams in the file list.  A bigram is a pair of adjacent bytes.  Bytes
   in  the  database that have the high bit set are indexes (with the high
   bit cleared) into the bigram table.  The bigram and offset-differential
   count  coding  makes these databases 20-25% smaller than the new format,
   but makes them not 8-bit clean.  Any byte in a file name that is in the
   ranges  used  for  the  special  codes is replaced in the database by a
   question mark, which not coincidentally is the shell wildcard to  match
   a single character.
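
   Bigram expansion can be sketched in Python as below.  The sketch
   assumes the 256-byte table holds the 128 bigrams as consecutive 2-byte
   pairs, so that index i selects bytes 2i and 2i+1; the name
   expand_bigrams is hypothetical.

```python
def expand_bigrams(table: bytes, encoded: bytes) -> bytes:
    """Expand high-bit bytes in encoded using the 256-byte bigram table."""
    if len(table) != 256:
        raise ValueError("bigram table must be 256 bytes")
    out = bytearray()
    for byte in encoded:
        if byte & 0x80:
            index = byte & 0x7F  # clear the high bit to get the index
            out += table[2 * index:2 * index + 2]
        else:
            out.append(byte)
    return bytes(out)
```

   Since every byte with the high bit set is treated as a table index,
   file names containing such bytes cannot be stored literally, which is
   why the format is not 8-bit clean.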

EXAMPLE

   Input to frcode:
   /usr/src
   /usr/src/cmd/aardvark.c
   /usr/src/cmd/armadillo.c
   /usr/tmp/zoo

   Length of the longest prefix of the preceding entry to share:
   0 /usr/src
   8 /cmd/aardvark.c
   14 rmadillo.c
   5 tmp/zoo

   Output  from  frcode, with trailing nulls changed to newlines and count
   bytes made printable:
   0 LOCATE02
   0 /usr/src
   8 /cmd/aardvark.c
   6 rmadillo.c
   -9 tmp/zoo

   (6 = 14 - 8, and -9 = 5 - 14)
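
   Going the other way, the file names can be recovered from the printable
   output above by keeping a running prefix length.  The following Python
   sketch does this for the (count, remainder) pairs of the example; the
   names EXAMPLE and reconstruct are hypothetical.

```python
EXAMPLE = [
    (0, "LOCATE02"),
    (0, "/usr/src"),
    (8, "/cmd/aardvark.c"),
    (6, "rmadillo.c"),
    (-9, "tmp/zoo"),
]


def reconstruct(entries):
    """Rebuild full names from (count, remainder) pairs."""
    names = []
    prev = ""
    shared = 0  # prefix length reused from the preceding entry
    for count, remainder in entries:
        shared += count  # counts are differences, so accumulate them
        prev = prev[:shared] + remainder
        names.append(prev)
    return names
```

   The first name returned for EXAMPLE is the dummy LOCATE02 entry, which
   a search would skip.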

SEE ALSO

   find(1), locate(1), updatedb(1), xargs(1), Finding  Files  (on-line  in
   Info, or printed)

BUGS

   The   best   way   to   report   a   bug   is   to   use  the  form  at
   http://savannah.gnu.org/bugs/?group=findutils.  The reason for this  is
   that  you  will  then  be able to track progress in fixing the problem.
   Other comments about locate(1)  and  about  the  findutils  package  in
   general  can  be  sent  to the bug-findutils mailing list.  To join the
   list, send email to bug-findutils-request@gnu.org.

                                                               LOCATEDB(5)




