pgreplay(1)

NAME

   pgreplay - PostgreSQL log file replayer for performance tests

SYNOPSIS

   pgreplay [parse options] [replay options] [-d level] [infile]
   pgreplay -f [parse options] [-o outfile] [-d level] [infile]
   pgreplay -r [replay options] [-d level] [infile]

DESCRIPTION

   pgreplay reads a PostgreSQL log file (not a WAL file), extracts the SQL
   statements and executes them  in  the  same  order  and  relative  time
   against  a  PostgreSQL  database  cluster.   A final report gives you a
   useful statistical analysis of your workload and its execution.

   In the first form, the log file infile is replayed at the  time  it  is
   read.

   With the -f option, pgreplay will not execute the statements, but write
   them to a 'replay file' outfile that can be  replayed  with  the  third
   form.

   With  the -r option, pgreplay will execute the statements in the replay
   file infile that was created by the second form.
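   The second and third forms combine into a two-step workflow: parse the
   log once, then replay the resulting file as often as needed.  The Python
   sketch below only assembles the two command lines from the SYNOPSIS; the
   helper names, file names and host are hypothetical, and nothing is
   executed.

```python
import shlex


def parse_command(logfile, replayfile, csv=True):
    """Second form: parse a log file into a replay file without executing it."""
    cmd = ["pgreplay", "-f"]
    if csv:
        cmd.append("-c")  # the log file is in csvlog format
    cmd += ["-o", replayfile, logfile]
    return cmd


def replay_command(replayfile, host=None, port=None, factor=None):
    """Third form: replay a file that was written by the second form."""
    cmd = ["pgreplay", "-r"]
    if host is not None:
        cmd += ["-h", host]
    if port is not None:
        cmd += ["-p", str(port)]
    if factor is not None:
        cmd += ["-s", str(factor)]
    cmd.append(replayfile)
    return cmd


# Hypothetical file names and host, for illustration only.
print(shlex.join(parse_command("postgresql.csv", "workload.replay")))
print(shlex.join(replay_command("workload.replay", host="testhost", port=5432)))
```

   To actually run a command, the list could be passed to subprocess.run(),
   assuming pgreplay is installed and on the search path.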

   If the execution of statements gets behind schedule,  warning  messages
   are  issued  that  indicate that the server cannot handle the load in a
   timely fashion.  The idea is to replay a real-world  database  workload
   as exactly as possible.

   To  create  a  log file that can be parsed by pgreplay, you need to set
   the following parameters in postgresql.conf:

          log_min_messages=error (or more)
          log_min_error_statement=log (or more)
          log_connections=on
          log_disconnections=on
          log_line_prefix='%m|%u|%d|%c|' (if you don't use CSV logging)
          log_statement='all'
          lc_messages must be set to English (encoding does not matter)
          bytea_output=escape (from version 9.0 on, only if  you  want  to
          replay the log on 8.4 or earlier)
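   To see why the log_line_prefix setting matters for stderr logs, the
   following Python sketch splits one log line into the four prefix fields
   (%m timestamp, %u user, %d database, %c session ID) and the message.  It
   is an illustration only, not pgreplay's own parser; the sample line is
   invented, and the sketch assumes that user and database names contain no
   '|' character.

```python
def split_prefix(line):
    """Split a stderr log line written with log_line_prefix='%m|%u|%d|%c|'."""
    # Split on the first four '|' only: the message itself may contain '|'.
    timestamp, user, database, session_id, message = line.split("|", 4)
    return {
        "timestamp": timestamp,
        "user": user,
        "database": database,
        "session_id": session_id,
        "message": message,
    }


# Invented sample line for illustration.
sample = ("2010-12-31 10:59:52.243 CET|alice|shop|4d1db7a8.4227|"
          "LOG:  statement: SELECT 'a|b'")
fields = split_prefix(sample)
print(fields["session_id"], fields["message"])
```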

   The database cluster against which you replay the SQL statements must
   be a clone of the cluster that generated the logs, in the state it had
   immediately before the logs were generated.

   pgreplay is useful for performance tests, particularly in the following
   situations:

   *   You want to compare the performance of your PostgreSQL  application
       on different hardware or different operating systems.

   *   You  want  to  upgrade your database and want to make sure that the
       new database version does not suffer from  performance  regressions
       that affect you.

   Moreover, pgreplay can give you some feeling as to how your application
   might scale by allowing you to try to replay the workload at  a  higher
   speed.   Be  warned,  though, that 500 users working at double speed is
   not really the same as 1000 users working at normal speed.
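   As a rough model of replaying at a different speed, the Python sketch
   below divides each statement's offset from the start of the log by the
   speed factor.  This is an assumption about the scheduling made for
   illustration, not code taken from pgreplay.  Note that the number of
   sessions stays the same; only the gaps between statements shrink, which
   is why faster replay is not equivalent to more users.

```python
def replay_offsets(original_offsets, factor):
    """Scale offsets (seconds since the first statement) by a speed factor."""
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    return [offset / factor for offset in original_offsets]


# Statements originally issued 0, 1 and 4 seconds after the start.
print(replay_offsets([0.0, 1.0, 4.0], 2.0))   # fast forward
print(replay_offsets([0.0, 1.0, 4.0], 0.5))   # slow motion
```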

OPTIONS

   Parse options:
   -c     Specifies that the  log  file  is  in  'csvlog'  format  (highly
          recommended) and not in 'stderr' format.

   -b timestamp
          Only log entries greater than or equal to that timestamp will be
          parsed.  The format is YYYY-MM-DD HH:MM:SS.FFF, as in the log
          file.  An optional time zone part will be ignored.

   -e timestamp
          Only log entries less than or equal to that timestamp will be
          parsed.  The format is YYYY-MM-DD HH:MM:SS.FFF, as in the log
          file.  An optional time zone part will be ignored.

   -q     Specifies  that  a  backslash  in  a  simple string literal will
          escape  the   following   single   quote.    This   depends   on
          configuration  options  like  standard_conforming_strings and is
          the default for server versions 9.0 and earlier.
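   The -b and -e options together select an inclusive time window.  The
   Python sketch below shows one way such a filter could work: it reads the
   first 23 characters (YYYY-MM-DD HH:MM:SS.FFF) and ignores any time zone
   suffix.  The function names are hypothetical and the logic is a sketch of
   the documented behaviour, not pgreplay's implementation.

```python
from datetime import datetime


def parse_ts(text):
    """Parse 'YYYY-MM-DD HH:MM:SS.FFF', ignoring a trailing time zone part."""
    return datetime.strptime(text[:23], "%Y-%m-%d %H:%M:%S.%f")


def in_window(entry_ts, begin=None, end=None):
    """Return True if entry_ts lies in the inclusive window [begin, end]."""
    ts = parse_ts(entry_ts)
    if begin is not None and ts < parse_ts(begin):
        return False
    if end is not None and ts > parse_ts(end):
        return False
    return True


print(in_window("2010-12-31 10:59:52.243 CET",
                begin="2010-12-31 10:59:52.243",
                end="2010-12-31 11:00:00.000"))
```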

   Replay options:
   -h hostname
          Host name where the target database cluster is running (or
          directory where the UNIX socket can be found).  Defaults to local
          connections.  This works just like the -h option of psql.

   -p port
          TCP port where the target database cluster can be reached.

   -W password
          By default, pgreplay assumes that the target database cluster is
          configured  for trust authentication. With the -W option you can
          specify a password that will  be  used  for  all  users  in  the
          cluster.

   -s factor
          Speed  factor  for  replay,  by default 1. This can be any valid
          positive floating point number. A factor less than 1 will replay
          the  workload  in  'slow  motion', while a factor greater than 1
          means 'fast forward'.

   -E encoding
          Specifies the encoding of the log file, which will  be  used  as
          client  encoding  during  replay. If it is omitted, your default
          client encoding will be used.

   -j     If all connections are idle, jump  ahead  to  the  next  request
          instead of sleeping. This will speed up replay. Execution delays
          will still be reported correctly, but replay statistics will not
          contain the idle time.

   Output options:
   -o outfile
          Specifies the replay file where the statements will be written
          for later replay.

   Debug options:
   -d level
          Specifies the trace level (between 1 and 3).  Increasing  levels
          will  produce  more  detailed information about what pgreplay is
          doing.

   -v     Prints the program version and exits.

ENVIRONMENT

   PGHOST Specifies the default value for the -h option.

   PGPORT Specifies the default value for the -p option.

   PGCLIENTENCODING
          Specifies the default value for the -E option.
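   The relationship between these variables and the corresponding options
   can be pictured as a simple fallback.  The Python sketch below assumes
   that an explicit command-line value takes precedence over the environment
   variable; the function name is hypothetical.

```python
import os


def resolve(option_value, env_name, environ=os.environ):
    """Return the option value if given, else the environment default."""
    if option_value is not None:
        return option_value
    return environ.get(env_name)  # None if the variable is unset


# Example with an explicit environment mapping instead of the real one.
env = {"PGHOST": "dbtest", "PGPORT": "5433"}
print(resolve(None, "PGHOST", env))         # falls back to the environment
print(resolve("otherhost", "PGHOST", env))  # explicit value is used
```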

LIMITATIONS

   pgreplay can only replay what is logged by PostgreSQL.  This  leads  to
   some limitations:

   *   COPY statements will not be replayed, because the copy data are not
       logged.

   *   Fast-path API function  calls  are  not  logged  and  will  not  be
       replayed.  Unfortunately, this includes the Large Object API.

   *   Since  the log file is always in the server encoding (which you can
       specify with the -E switch of pgreplay),  all  SET  client_encoding
       statements will be ignored.

   *   Since  the  preparation  time  of prepared statements is not logged
       (unless log_min_messages is debug2 or more), these statements  will
       be  prepared  immediately  before  they  are  first executed during
       replay.

   *   Because the log file  contains  only  text,  query  parameters  and
       return values will always be in text and never in binary format. If
       you use binary mode to, say, transfer large binary  data,  pgreplay
       can cause significantly more network traffic than the original run.

   *   Sometimes, if a connection takes a long time to complete, the
       session ID unexpectedly changes in the PostgreSQL log file.  This
       causes pgreplay to treat the session as two different ones,
       resulting in an additional connection.  This is arguably a bug in
       PostgreSQL.
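   Before relying on a replay, it can help to estimate how much of the
   workload falls under the first limitation.  The Python sketch below
   counts statements that begin with COPY in a list of statement texts.
   How the statements are extracted from the log is left out, and the
   simple pattern is an assumption: it would miss a COPY preceded by a
   comment.

```python
import re

# Matches statements whose first keyword is COPY, in any letter case.
COPY_RE = re.compile(r"^\s*COPY\b", re.IGNORECASE)


def count_copy(statements):
    """Count the statements that start with the COPY keyword."""
    return sum(1 for statement in statements if COPY_RE.match(statement))


# Invented statements for illustration.
workload = [
    "COPY orders FROM STDIN",
    "SELECT 1",
    "  copy orders TO STDOUT",
    "SELECT copyright FROM books",
]
print(count_copy(workload), "of", len(workload), "statements are COPY")
```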

AUTHOR

   Written by Laurenz Albe <laurenz.albe@wien.gv.at>.
