metaphlan2(1)

NAME

   metaphlan2   -   METAgenomic   PHyLogenetic  ANalysis  for  metagenomic
   taxonomic profiling

SYNOPSIS

   metaphlan2
    --input_type        {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
   [--mpa_pkl   MPA_PKL]   [--bowtie2db   METAPHLAN_BOWTIE2_DB]  [--bt2_ps
   BowTie2 presets] [--bowtie2_exe BOWTIE2_EXE]  [--bowtie2out  FILE_NAME]
   [--no_map]   [--tmp_dir]   [--tax_lev  TAXONOMIC_LEVEL]  [--min_cu_len]
   [--min_alignment_len]     [--ignore_viruses]      [--ignore_eukaryotes]
   [--ignore_bacteria]   [--ignore_archaea]  [--stat_q]  [--ignore_markers
   IGNORE_MARKERS] [--avoid_disqm] [--stat] [-t ANALYSIS  TYPE]  [--nreads
   NUMBER_OF_READS]  [--pres_th  PRESENCE_THRESHOLD]  [--clade] [--min_ab]
   [-h] [-o output file] [--sample_id_key name]  [--sample_id  value]  [-s
   sam_output_file]  [--biom  biom_output]  [--mdelim  mdelim] [--nproc N]
   [-v] [INPUT_FILE] [OUTPUT_FILE]

DESCRIPTION

   MetaPhlAn 2 clade-abundance estimation
   The basic usage of MetaPhlAn 2 consists of identifying the clades
   (from phyla to species, and strains in particular cases) present in
   the metagenome obtained from a microbiome sample, together with their
   relative abundance. This corresponds to the default analysis type
   (--analysis_type rel_ab).

   *      Profiling a metagenome from raw reads:

          metaphlan2 metagenome.fastq --input_type fastq

   *      You can take advantage of multiple CPUs and save the
          intermediate BowTie2 output for re-running MetaPhlAn extremely
          quickly:

          metaphlan2  metagenome.fastq --bowtie2out metagenome.bowtie2.bz2
          --nproc 5 --input_type fastq

   *      If you have already mapped your metagenome against the marker DB
          (using a previous MetaPhlAn run), you can obtain the results in
          a few seconds by using the previously saved --bowtie2out file
          and specifying the input type (--input_type bowtie2out):

          metaphlan2   metagenome.bowtie2.bz2   --nproc   5   --input_type
          bowtie2out

   *      You can also provide an externally BowTie2-mapped SAM file if
          you specify this format with --input_type. This takes two
          steps: first run BowTie2, then feed the resulting SAM file to
          MetaPhlAn 2:

          bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive \
              -S metagenome.sam \
              -x /usr/share/metaphlan2/db_v20/mpa_v20_m200 \
              -U metagenome.fastq
          metaphlan2 metagenome.sam --input_type sam > \
              profiled_metagenome.txt

   *      Multiple alternative ways to pass the input are also available:

          cat metagenome.fastq | metaphlan2 --input_type fastq
          tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2 --input_type
          fastq
          metaphlan2 --input_type fastq < metagenome.fastq
          metaphlan2 --input_type fastq <(bzcat metagenome.fastq.bz2)
          metaphlan2  --input_type  fastq   <(zcat   metagenome_1.fastq.gz
          metagenome_2.fastq.gz)

   *      We  can  also  natively handle paired-end metagenomes, and, more
          generally, metagenomes stored in multiple files (but you need to
          specify the --bowtie2out parameter):

          metaphlan2   metagenome_1.fastq,metagenome_2.fastq  --bowtie2out
          metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
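
   The relative-abundance profile written by the commands above can be
   post-processed with standard tools. The following is a minimal sketch,
   not part of MetaPhlAn: species_rows is a hypothetical helper, and it
   assumes the default output layout (header lines starting with '#',
   clade name in the first tab-separated column, abundance in the second,
   taxonomic levels joined by the default pipe delimiter, and strain-level
   entries prefixed with t__).

```shell
# Hypothetical helper: print only the rows whose deepest taxonomic level
# is a species (s__ prefix), skipping '#' header lines.
# Assumes the default pipe delimiter between taxonomic levels.
species_rows() {
    awk -F '\t' '
        /^#/ { next }                      # skip header/comment lines
        {
            n = split($1, levels, "[|]")   # split the lineage on the pipe
            if (levels[n] ~ /^s__/)        # deepest level is a species
                print $1 "\t" $2
        }
    ' "$1"
}

# Example call (file name taken from the SAM example above):
#   species_rows profiled_metagenome.txt
```

   When only one level is needed from the start, the --tax_lev option
   restricts the output directly; a filter like this is mainly useful on a
   profile that was already written with all levels.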

   MetaPhlAn 2 strain tracking
   MetaPhlAn 2 introduces the capability of characterizing organisms at
   the strain level using non-aggregated marker information. This
   capability comes in several slightly different flavours, each of which
   is a way to perform strain tracking and comparison across multiple
   samples. Usually, MetaPhlAn 2 is first run with the default
   --analysis_type to profile the species present in the community, and
   then strain-level profiling is performed to zoom in on specific species
   of interest. This operation can be performed quickly because it
   exploits the --bowtie2out intermediate file saved during the execution
   of the default analysis type.

   *      The following command outputs the abundance of each marker with
          an RPK (reads per kilobase) higher than 0.0 (we are assuming
          that metagenome_outfmt.bz2 has been generated beforehand with
          --bowtie2out, as shown above):

          metaphlan2 -t marker_ab_table metagenome_outfmt.bz2 --input_type
          bowtie2out > marker_abundance_table.txt

          The obtained RPK can optionally be normalized by the total
          number of reads in the metagenome to allow fair comparisons of
          abundances across samples. The number of reads in the
          metagenome needs to be passed with the '--nreads' argument.

   *      The list of markers present in the sample can be obtained with
          '-t marker_pres_table':

          metaphlan2     -t     marker_pres_table    metagenome_outfmt.bz2
          --input_type bowtie2out > marker_abundance_table.txt

          The --pres_th argument (default 1.0) sets the minimum RPK value
          for a marker to be considered present.

   *      The '-t clade_profiles' analysis type reports the same
          information as '-t marker_ab_table', but the markers are
          reported on a clade-by-clade basis:

          metaphlan2  -t clade_profiles metagenome_outfmt.bz2 --input_type
          bowtie2out > marker_abundance_table.txt

   *      Finally, to obtain all markers present for a specific clade and
          all its subclades, '-t clade_specific_strain_tracker' should be
          used. For example, the following command reports the
          presence/absence of the markers for the B. fragilis species and
          its strains. The optional argument --min_ab specifies the
          minimum clade abundance for reporting the markers.

          metaphlan2    -t    clade_specific_strain_tracker   --clade
          s__Bacteroides_fragilis    metagenome_outfmt.bz2    --input_type
          bowtie2out > marker_abundance_table.txt
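
   Because all marker-level analyses read the same saved --bowtie2out
   file, they can be run back to back without remapping. The following is
   a minimal sketch under stated assumptions, not part of MetaPhlAn:
   count_fastq_reads and run_marker_analyses are hypothetical helpers, the
   METAPHLAN2 variable is an assumed way to override the executable name,
   and the read count assumes an uncompressed FASTQ file with exactly four
   lines per record. Only options documented in this page are used.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file, assuming
# exactly four lines per record (no wrapped sequence or quality lines).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: run three marker-level analysis types on one saved
# --bowtie2out file, writing one output file per analysis type.
# METAPHLAN2 (an assumption) overrides the executable; default: metaphlan2.
run_marker_analyses() {
    bt2out=$1   # intermediate file saved earlier with --bowtie2out
    fastq=$2    # original reads, used only to derive --nreads
    prefix=$3   # prefix for the output files
    nreads=$(count_fastq_reads "$fastq") || return 1
    for analysis in marker_ab_table marker_pres_table clade_profiles; do
        set -- -t "$analysis" "$bt2out" --input_type bowtie2out
        # Pass --nreads only to marker_ab_table, where it is documented.
        if [ "$analysis" = marker_ab_table ]; then
            set -- "$@" --nreads "$nreads"
        fi
        "${METAPHLAN2:-metaphlan2}" "$@" > "${prefix}.${analysis}.txt" \
            || return 1
    done
}

# Example call:
#   run_marker_analyses metagenome_outfmt.bz2 metagenome.fastq sample1
```

   Writing each analysis to its own file avoids the reuse of a single
   output name across different analysis types.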

OPTIONS

   positional arguments
   INPUT_FILE
          the input file can be:

   *      a fastq file containing metagenomic reads

          OR

   *      a BowTie2 produced SAM file.

          OR

   *      an  intermediary  mapping  file of the metagenome generated by a
          previous MetaPhlAn run

          If the input file is missing, the script assumes that the  input
          is   provided   using   the  standard  input,  or  named  pipes.
          IMPORTANT:  the  type  of  input  needs  to  be  specified  with
          --input_type

   OUTPUT_FILE
          the  tab-separated  output  file of the predicted taxon relative
          abundances [stdout if not present]

   Required arguments
   --input_type {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
          set whether the input is  the  multifasta  file  of  metagenomic
          reads  or  the  SAM file of the mapping of the reads against the
          MetaPhlAn db.  [default 'automatic', i.e. the script will try to
          guess the input format]

   Mapping arguments:
   --mpa_pkl MPA_PKL
          the metadata pickled MetaPhlAn file

   --bowtie2db METAPHLAN_BOWTIE2_DB
          The  BowTie2  database  file of the MetaPhlAn database.  Used if
          --input_type is fastq, fasta, multifasta, or multifastq

   --bt2_ps BowTie2 presets
          Preset options for BowTie2 (applied only when a multifasta file
          is provided). The choices enabled in MetaPhlAn are:

   *      sensitive

   *      very-sensitive

   *      sensitive-local

   *      very-sensitive-local

          [default very-sensitive]

   --bowtie2_exe BOWTIE2_EXE
          Full path and name of the BowTie2 executable. This option allows
          MetaPhlAn to reach the executable even when it  is  not  in  the
          system PATH or the system PATH is unreachable

   --bowtie2out FILE_NAME
          The file for saving the output of BowTie2

   --no_map
          Avoid storing the --bowtie2out map file

   --tmp_dir
          the  folder  used  to  store  temporary files [default is the OS
          dependent tmp dir]

   Post-mapping arguments
   --tax_lev TAXONOMIC_LEVEL
          The taxonomic level for the relative abundance output:
          'a' : all taxonomic levels
          'k' : kingdoms
          'p' : phyla only
          'c' : classes only
          'o' : orders only
          'f' : families only
          'g' : genera only
          's' : species only
          [default 'a']

   --min_cu_len
          minimum total nucleotide length for the markers in a  clade  for
          estimating   the   abundance   without   considering   sub-clade
          abundances [default 2000]

   --min_alignment_len
          The sam records for aligned reads with the longest  subalignment
          length  smaller than this threshold will be discarded.  [default
          None]

   --ignore_viruses
          Do not profile viral organisms

   --ignore_eukaryotes
          Do not profile eukaryotic organisms

   --ignore_bacteria
          Do not profile bacterial organisms

   --ignore_archaea
          Do not profile archaeal organisms

   --stat_q
          Quantile value for the robust average [default 0.1]

   --ignore_markers IGNORE_MARKERS
          File containing a list of markers to ignore.

   --avoid_disqm
          Deactivate the procedure for disambiguating the quasi-markers
          based on the marker abundance pattern found in the sample. It is
          generally recommended to keep the disambiguation procedure in
          order to minimize false positives

   --stat EXPERIMENTAL!   Statistical   approach   for  converting  marker
          abundances into clade abundances
          'avg_g'  : clade global (i.e. normalizing all markers  together)
          average
          'avg_l'  : average of length-normalized marker counts
          'tavg_g' : truncated clade global average at --stat_q quantile
          'tavg_l' : truncated average of length-normalized marker counts
          (at --stat_q)
          'wavg_g' : winsorized clade global average (at --stat_q)
          'wavg_l' : winsorized average of length-normalized marker counts
          (at --stat_q)
          'med'    : median of length-normalized marker counts
          [default tavg_g]

   Additional analysis types and arguments
   -t ANALYSIS TYPE
          Type of analysis to perform:

   *      rel_ab: profiling a metagenome in terms of relative abundances

   *      rel_ab_w_read_stats: profiling a metagenome in terms of relative
          abundances and estimating the number of reads coming from each
          clade.

   *      reads_map:  mapping  from  reads to clades (only reads hitting a
          marker)

   *      clade_profiles: normalized marker counts for clades with at
          least one non-null marker

   *      marker_ab_table:  normalized  marker counts (only when > 0.0 and
          normalized by metagenome size if --nreads is specified)

   *      marker_counts: non-normalized marker counts  [use  with  extreme
          caution]

   *      marker_pres_table: list of markers present in the sample
          (threshold at 1.0 unless a different value is specified with
          --pres_th)

          [default 'rel_ab']

   --nreads NUMBER_OF_READS
          The total number of reads in the original metagenome. It is used
          only when -t marker_ab_table is specified, to normalize the
          length-normalized counts by the metagenome size as well. No
          normalization is applied if --nreads is not specified

   --pres_th PRESENCE_THRESHOLD
          Threshold   for   calling   a   marker   present   by   the   -t
          marker_pres_table option

   --clade
          The clade for clade_specific_strain_tracker analysis

   --min_ab
          The minimum percentage abundance for the clade in the
          clade_specific_strain_tracker analysis

   -h, --help
          show this help message and exit

   Output arguments
   -o output file, --output_file output file
          The output file (if not specified as positional argument)

   --sample_id_key name
          Specify  the  sample  ID  key  for  this  analysis.  Defaults to
          '#SampleID'.

   --sample_id value
          Specify  the  sample  ID  for   this   analysis.   Defaults   to
          'Metaphlan2_Analysis'.

   -s sam_output_file, --samout sam_output_file
          The sam output file

   --biom biom_output, --biom_output_file biom_output
          If  requesting  biom file output: The name of the output file in
          biom format

   --mdelim mdelim, --metadata_delimiter_char mdelim
          Delimiter for bug metadata. Defaults to pipe, e.g. the pipe in
          k__Bacteria|p__Proteobacteria

   Other arguments
   --nproc N
          The number of CPUs to use for parallelizing the mapping [default
          1, i.e. no parallelism]

   -v, --version
          Print the current MetaPhlAn version and exit

AUTHOR

   The code of MetaPhlAn was written by Nicola Segata
   (nicola.segata@unitn.it) and Duy Tin Truong (duytin.truong@unitn.it).

   This  manpage  was written by Andreas Tille for the Debian distribution
   and can be used for any other usage of the program.
