pgreplay - PostgreSQL log file replayer for performance tests
pgreplay [parse options] [replay options] [-d level] [infile]
pgreplay -f [parse options] [-o outfile] [-d level] [infile]
pgreplay -r [replay options] [-d level] [infile]
pgreplay reads a PostgreSQL log file (not a WAL file), extracts the SQL statements and executes them in the same order and relative time against a PostgreSQL database cluster. A final report gives you a useful statistical analysis of your workload and its execution.

In the first form, the log file infile is replayed at the time it is read.

With the -f option, pgreplay will not execute the statements, but write them to a 'replay file' outfile that can be replayed with the third form.

With the -r option, pgreplay will execute the statements in the replay file infile that was created by the second form.

If the execution of statements gets behind schedule, warning messages are issued that indicate that the server cannot handle the load in a timely fashion. The idea is to replay a real-world database workload as exactly as possible.

To create a log file that can be parsed by pgreplay, you need to set the following parameters in postgresql.conf:

    log_min_messages=error (or more)
    log_min_error_statement=log (or more)
    log_connections=on
    log_disconnections=on
    log_line_prefix='%m|%u|%d|%c|' (if you don't use CSV logging)
    log_statement='all'
    lc_messages must be set to English (encoding does not matter)
    bytea_output=escape (from version 9.0 on, only if you want to replay the log on 8.4 or earlier)

The database cluster against which you replay the SQL statements must be a clone of the database cluster that generated the logs from the time immediately before the logs were generated.

pgreplay is useful for performance tests, particularly in the following situations:

* You want to compare the performance of your PostgreSQL application on different hardware or different operating systems.

* You want to upgrade your database and want to make sure that the new database version does not suffer from performance regressions that affect you.

Moreover, pgreplay can give you some feeling as to how your application might scale by allowing you to try to replay the workload at a higher speed. Be warned, though, that 500 users working at double speed is not really the same as 1000 users working at normal speed.
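With log_line_prefix='%m|%u|%d|%c|', each stderr-format log entry begins with four pipe-separated fields: timestamp, user name, database name and session ID. The Python sketch below splits such a line to show what that prefix provides. It is only an illustration and not pgreplay's actual parser; the function name split_prefix and the sample line are made up for this example, and it assumes that neither the user name nor the database name contains a '|'.

```python
from typing import NamedTuple


class LogPrefix(NamedTuple):
    timestamp: str   # %m: timestamp with milliseconds (and time zone)
    user: str        # %u: user name
    database: str    # %d: database name
    session_id: str  # %c: session ID
    message: str     # rest of the line, e.g. "LOG:  statement: ..."


def split_prefix(line: str) -> LogPrefix:
    """Split a stderr log line written with log_line_prefix='%m|%u|%d|%c|'."""
    # Split at most four times so that '|' inside the SQL text is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        raise ValueError("line does not start with the expected prefix")
    return LogPrefix(*parts)


# Hypothetical sample line, for illustration only.
sample = ("2010-12-31 10:59:52.243 CET|alice|appdb|4d1db7a8.4227|"
          "LOG:  statement: SELECT 'a' || 'b'")
entry = split_prefix(sample)
print(entry.user, entry.database, entry.session_id)
```

The session ID field is what allows statements to be grouped by connection, and the timestamp is what allows them to be replayed at the same relative time.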
Parse options:

-c
    Specifies that the log file is in 'csvlog' format (highly recommended) and not in 'stderr' format.

-b timestamp
    Only log entries greater than or equal to that timestamp will be parsed. The format is YYYY-MM-DD HH:MM:SS.FFF like in the log file. An optional time zone part will be ignored.

-e timestamp
    Only log entries less than or equal to that timestamp will be parsed. The format is YYYY-MM-DD HH:MM:SS.FFF like in the log file. An optional time zone part will be ignored.

-q
    Specifies that a backslash in a simple string literal will escape the following single quote. This depends on configuration options like standard_conforming_strings and is the default for server version 9.0 and earlier.

Replay options:

-h hostname
    Host name where the target database cluster is running (or directory where the UNIX socket can be found). Defaults to local connections. This works just like the -h option of psql.

-p port
    TCP port where the target database cluster can be reached.

-W password
    By default, pgreplay assumes that the target database cluster is configured for trust authentication. With the -W option you can specify a password that will be used for all users in the cluster.

-s factor
    Speed factor for replay, by default 1. This can be any valid positive floating point number. A factor less than 1 will replay the workload in 'slow motion', while a factor greater than 1 means 'fast forward'.

-E encoding
    Specifies the encoding of the log file, which will be used as client encoding during replay. If it is omitted, your default client encoding will be used.

-j
    If all connections are idle, jump ahead to the next request instead of sleeping. This will speed up replay. Execution delays will still be reported correctly, but replay statistics will not contain the idle time.

Output options:

-o outfile
    Specifies the replay file where the statements will be written for later replay.

Debug options:

-d level
    Specifies the trace level (between 1 and 3). Increasing levels will produce more detailed information about what pgreplay is doing.

-v
    Prints the program version and exits.
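As an illustration of how the options combine, the following Python sketch builds the argument lists for the two-step workflow: first parse a log file into a replay file with -f, then execute that replay file with -r. The helper names parse_command and replay_command, the file names, the host and the time window are all hypothetical; only the pgreplay options themselves come from the descriptions above.

```python
import shlex
from typing import List, Optional


def parse_command(logfile: str, replayfile: str, csv: bool = True,
                  begin: Optional[str] = None,
                  end: Optional[str] = None) -> List[str]:
    """Build a 'pgreplay -f' command that writes a replay file."""
    cmd = ["pgreplay", "-f"]
    if csv:
        cmd.append("-c")            # log file is in csvlog format
    if begin is not None:
        cmd += ["-b", begin]        # skip entries before this timestamp
    if end is not None:
        cmd += ["-e", end]          # skip entries after this timestamp
    cmd += ["-o", replayfile, logfile]
    return cmd


def replay_command(replayfile: str, host: Optional[str] = None,
                   port: Optional[int] = None, speed: float = 1.0,
                   jump_idle: bool = False) -> List[str]:
    """Build a 'pgreplay -r' command that executes a replay file."""
    if speed <= 0:
        raise ValueError("speed factor must be positive")
    cmd = ["pgreplay", "-r"]
    if host is not None:
        cmd += ["-h", host]
    if port is not None:
        cmd += ["-p", str(port)]
    if speed != 1.0:
        cmd += ["-s", str(speed)]   # > 1 is 'fast forward', < 1 is 'slow motion'
    if jump_idle:
        cmd.append("-j")            # skip periods where all connections are idle
    cmd.append(replayfile)
    return cmd


# Hypothetical file names, host and time window.
step1 = parse_command("postgresql.csv", "workload.replay",
                      begin="2010-12-31 10:00:00.000",
                      end="2010-12-31 11:00:00.000")
step2 = replay_command("workload.replay", host="testhost", port=5432, speed=2.0)
print(shlex.join(step1))
print(shlex.join(step2))
```

The lists can be passed unchanged to subprocess.run; printing them with shlex.join shows the equivalent shell command lines, with the timestamps quoted because they contain a space.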
PGHOST
    Specifies the default value for the -h option.

PGPORT
    Specifies the default value for the -p option.

PGCLIENTENCODING
    Specifies the default value for the -E option.
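The relationship between these variables and the corresponding options can be sketched in Python as follows. This assumes the usual precedence, in which a value given on the command line overrides the environment variable; effective_option is a hypothetical helper, not part of pgreplay, and the host name and port are made up.

```python
import os
from typing import Mapping, Optional


def effective_option(cli_value: Optional[str], env_name: str,
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the command-line value if given, else the environment value.

    None means that neither was set and the built-in default applies.
    """
    if cli_value is not None:
        return cli_value
    return env.get(env_name)


# Hypothetical environment, passed explicitly to keep the example repeatable.
env = {"PGHOST": "dbtest.example.com", "PGPORT": "5433"}
print(effective_option(None, "PGHOST", env))            # taken from PGHOST
print(effective_option("localhost", "PGHOST", env))     # -h wins over PGHOST
print(effective_option(None, "PGCLIENTENCODING", env))  # neither set
```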
pgreplay can only replay what is logged by PostgreSQL. This leads to some limitations:

* COPY statements will not be replayed, because the copy data are not logged.

* Fast-path API function calls are not logged and will not be replayed. Unfortunately, this includes the Large Object API.

* Since the log file is always in the server encoding (which you can specify with the -E switch of pgreplay), all SET client_encoding statements will be ignored.

* Since the preparation time of prepared statements is not logged (unless log_min_messages is debug2 or more), these statements will be prepared immediately before they are first executed during replay.

* Because the log file contains only text, query parameters and return values will always be in text and never in binary format. If you use binary mode to, say, transfer large binary data, pgreplay can cause significantly more network traffic than the original run.

* Sometimes, if a connection takes longer to complete, the session ID unexpectedly changes in the PostgreSQL log file. This causes pgreplay to treat the session as two different ones, resulting in an additional connection. This is arguably a bug in PostgreSQL.
Written by Laurenz Albe <laurenz.albe@wien.gv.at>.