lamboot(1)


NAME

   lamboot - Start a LAM multicomputer.

SYNOPSIS

   lamboot [-b] [-d] [-h] [-H] [-l] [-s] [-v] [-V] [-x] [-nn] [-np] [-c
          conf file] [-prefix /lam/install/path/] [-sessionprefix value]
          [-sessionsuffix value] [-withlamprefixpath value] [-ssi key
          value] [bhost]

OPTIONS

   -b      Assume local and remote shell are the same.  This means that
           only one remote shell invocation is used to each node.  If -b
           is not used, two remote shell invocations are used to each
           node.

   -d      Turn on debugging output.  This implies -v.

   -h      Print the command help menu.

   -l      Delay hostname-to-IP-address resolution.

   -prefix Use the LAM installation specified in /lam/install/path/.  Not
           compatible with LAM/MPI versions prior to 7.1.

   -s      Close stdio on the local node.

   -ssi key value
           Send arguments to various SSI modules.  See the "SSI" section,
           below.

   -v      Be verbose.

   -x      Run in fault tolerant mode.

   -H      Do not display the command header.

   -nn     Do not add "-n" to the remote agent command line.

   -np     Do not force the execution of $HOME/.profile on remote hosts.

   -session-prefix value
           Set the session prefix, overriding LAM_MPI_SESSION_PREFIX.

   -session-suffix value
           Set the session suffix, overriding LAM_MPI_SESSION_SUFFIX.

   -withlamprefixpath value
           Override the internal installation path.  For internal use
           only, do not use unless you know what you are doing.

ENVIRONMENT VARIABLES

   LAM_MPI_SESSION_PREFIX

   LAM_MPI_SESSION_SUFFIX
             It is possible to change the session directory used by
             LAM/MPI, normally of the form:

   tmpdir/lam-username@hostname[-suffix]

   tmpdir    will be set to LAM_MPI_SESSION_PREFIX if set.  Otherwise, it
             will fall back to the value of TMPDIR.  If neither of these
             is set, the default is /tmp.

   suffix    can be overridden by the LAM_MPI_SESSION_SUFFIX environment
             variable.  If LAM_MPI_SESSION_SUFFIX is not set and LAM is
             running under a supported batch scheduling system, suffix
             will be a value unique to the currently running job.
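
   The lookup order described above can be sketched in POSIX shell.  This
   is an illustration only, not LAM's own code: the function name
   session_dir is hypothetical, the user and host names are passed in as
   arguments, an empty variable is treated like an unset one (an
   assumption), and the batch-scheduler suffix is not modeled.

```shell
# Hypothetical helper: print the session directory for a user and host,
# following the tmpdir/lam-username@hostname[-suffix] pattern.
session_dir() {
    user=$1
    host=$2
    # LAM_MPI_SESSION_PREFIX wins, then TMPDIR, then /tmp.
    # Assumption: an empty value counts as "not set".
    tmpdir=${LAM_MPI_SESSION_PREFIX:-${TMPDIR:-/tmp}}
    dir="$tmpdir/lam-$user@$host"
    # The suffix is appended only when one is available.
    if [ -n "${LAM_MPI_SESSION_SUFFIX:-}" ]; then
        dir="$dir-$LAM_MPI_SESSION_SUFFIX"
    fi
    printf '%s\n' "$dir"
}
```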

DESCRIPTION

   The lamboot tool starts the LAM software on each of the machines
   specified in the boot schema, bhost.  The boot schema specifies the
   hostnames of nodes to be used in the run-time MPI environment, and
   optionally lists how many CPUs LAM may use on each node.  The user may
   wish to first run the recon(1) tool to verify that LAM can be started.

   Starting LAM is a three-step procedure.  In the first step, hboot(1) is
   invoked on each of the specified machines.  In the second step, each
   machine allocates a dynamic port and communicates it back to lamboot,
   which collects them.  In the third step, lamboot gives each machine the
   list of machines/ports in order to form a fully connected topology.  If
   any machine was not able to start, or if a timeout period expires
   before the first step completes, lamboot invokes lamwipe(1) to
   terminate LAM and reports the error.

   The bhost file is a LAM boot schema written in the host file syntax.
   See bhost(5).  Instead of being named on the command line, a boot
   schema can be specified in the LAMBHOST environment variable.
   Otherwise a default file, lam-bhost.def, is used.  LAM searches for
   bhost first in the local directory and then in the installation
   directory under etc/.
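
   The selection and search order described above can be sketched as a
   small shell function.  This is not LAM's implementation: the name
   find_bhost and its arguments are hypothetical, and the sketch ignores
   details such as absolute path names.

```shell
# Hypothetical helper: print the path of the boot schema that would be used.
#   $1 = LAM installation directory (assumed layout: $1/etc/)
#   $2 = boot schema named on the command line (may be empty)
find_bhost() {
    prefix=$1
    # Command line first, then LAMBHOST, then the default file name.
    name=${2:-${LAMBHOST:-lam-bhost.def}}
    # Search the local directory, then the installation's etc/ directory.
    for candidate in "./$name" "$prefix/etc/$name"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```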

   In addition, lamboot uses a process schema for the individual LAM
   nodes.  A process schema (see conf(5)) is a description of the
   processes which constitute the operating system on a node.  In general,
   the system administrator maintains this file -- LAM/MPI users will
   generally not need to change this file.  It is also possible for the
   user to customize the LAM software with a private process schema.

   The bhost file
   The format of the bhost file is documented in the bhost(5) man page.

   lamboot will resolve all names in bhost on the node on which lamboot
   was invoked (the origin node).  After that, LAM will only use IP
   addresses, not names.  Specifically, the name resolution configuration
   on all other nodes is not used.  Hence, the origin node must be able to
   resolve all the names in bhost to addresses that are reachable by all
   other nodes.

   A common mistake is to list localhost (or any name that resolves to the
   special address 127.0.0.1 -- the loopback TCP/IP device) in a bhost
   file that contains other nodes.  In this case, the address 127.0.0.1
   would be sent to each of the other nodes as the address of the origin
   node.  If the other nodes try to use 127.0.0.1 to contact the origin
   node, they will actually be contacting themselves, and would eventually
   time out and fail.
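
   A rough pre-flight check for this mistake might look like the
   following.  It is only a sketch: check_bhost_loopback is a hypothetical
   name, the sketch assumes the host name is the first field on each line
   and that "#" starts a comment (see bhost(5) for the real syntax), and
   it catches only the literal name localhost and 127.x.x.x addresses, not
   other names that happen to resolve to the loopback device.

```shell
# Hypothetical helper: fail if a boot schema with more than one node
# contains an obvious loopback entry.
check_bhost_loopback() {
    awk '
        { sub(/#.*/, "") }      # drop comments (assumed syntax)
        NF == 0 { next }        # skip blank lines
        { hosts++ }             # count node entries
        $1 == "localhost" || $1 ~ /^127\./ { bad = bad " " $1 }
        END {
            if (hosts > 1 && bad != "") {
                print "loopback entries in multi-node boot schema:" bad
                exit 1
            }
        }
    ' "$1"
}
```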

   The IP addresses obtained from bhost are used for LAM's meta messages:
   startup and shutdown of jobs, out-of-band messages used for
   coordination, etc.  The amount of traffic is fairly low (unless using
   the "lamd" mode of MPI message passing, in which case all MPI traffic
   will also utilize LAM's meta messages for transport -- see mpirun(1)).
   When using the TCP RPI, these IP addresses are also used for MPI
   message passing via direct sockets between each pair of nodes.

   A common case is where a "master" node has multiple network interface
   cards (NICs) -- one that is connected to a public network, and one that
   is connected to a private network where parallel jobs are to be run.
   To include the master node in a bhost file, the IP name (or address) of
   the NIC on the private network should be listed in bhost.  This ensures
   that all the other nodes can reach the master node on the private
   network.
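
   For illustration, a boot schema for such a cluster could be written as
   shown here.  The host names master-priv, node1-priv and node2-priv are
   invented stand-ins for names that resolve to private-network
   addresses, and the one-name-per-line layout with "#" comments is an
   assumption to be checked against bhost(5).

```shell
# Write a boot schema that names the private-network interfaces only.
bhost_file="${TMPDIR:-/tmp}/bhost.private.$$"
cat > "$bhost_file" <<'EOF'
# Hypothetical names bound to the private network
master-priv
node1-priv
node2-priv
EOF

# Then boot LAM with it, for example:
#   lamboot -v "$bhost_file"
```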

   As another example, some configurations have multiple TCP/IP NICs in
   each node of a parallel job.  One NIC is considered "slow" (e.g.,
   10Mbps), while the other is considered "fast" (e.g., 100Mbps).  It is
   desirable to allow LAM to take advantage of the higher bandwidth on the
   "fast" network for MPI messages.  As such, bhost should list the IP
   names (or addresses) of all the "fast" NICs.  However, if the LAM RPI
   does not use TCP/IP (e.g., the Myrinet/GM RPI), the bhost file should
   probably list the "slow" NICs so that LAM's meta message traffic does
   not cause overhead and potentially detract from the performance of
   other high-performance applications on the "fast" network.

   Delaying hostname lookups
   Normally, name resolution of hostnames is done on the machine where
   lamboot is invoked.  This is done for optimization reasons, so that the
   list of hostnames only needs to be resolved once (potentially
   minimizing the amount of DNS or other hostname-lookup network traffic).

   However, in some non-uniform networking environments, this is not
   sufficient because each host may have a different IP address on each of
   its peers.  For example, host A may have address Z on host B, but have
   address Y on host C.

   The -l option to lamboot will cause LAM to distribute hostnames to each
   node rather than a fully resolved set of IP addresses.  Hence, each
   node where LAM is booted will do its own name resolution on the list of
   hostnames.

   SSI (System Services Interface)
   The -ssi switch allows the passing of parameters to various SSI
   modules.  LAM's SSI modules are described in detail in lamssi(7).  SSI
   modules have direct impact on MPI programs because they allow tunable
   parameters to be set at run time (such as which boot device driver to
   use, what parameters to pass to that driver, etc.).

   The -ssi switch takes two arguments: key and value.  The key argument
   generally specifies which SSI module will receive the value.  For
   example, the key "boot" is used to select which boot module is used
   for starting processes on remote nodes.  The value argument is the
   value that is passed.  For example:

   lamboot -ssi boot tm
       Tells LAM to use the "tm" boot module for native launching in
       PBSPro / OpenPBS environments (the tm boot module does not require
       a boot schema).

   lamboot -ssi boot rsh -ssi rsh_agent "ssh -x" boot_schema
       Tells LAM to use the "rsh" boot module, and tells the rsh module to
       use "ssh -x" as the specific agent to launch executables on remote
       nodes.

   And so on.  LAM's boot SSI modules are described in lamssi_boot(7).
   This page should be consulted for the specific actions taken by each
   boot module and for how to tweak its run-time behavior.

   The -ssi switch can be used multiple times to specify different key
   and/or value arguments.  If the same key is specified more than once,
   the values are concatenated with a comma (",") separating them.

   Note that the -ssi switch is simply a shortcut for setting environment
   variables.  The same effect may be accomplished by setting
   corresponding environment variables before running lamboot.  The form
   of the environment variables that LAM sets is: LAM_MPI_SSI_key=value.

   Note that the -ssi switch overrides any previously set environment
   variables.  Also note that unknown key arguments are still set as
   environment variables -- they are not checked (by lamboot) for
   correctness.  Illegal or incorrect value arguments may or may not be
   reported -- it depends on the specific SSI module.
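
   As a sketch of the mapping described above (not LAM's implementation),
   the hypothetical shell function ssi_env prints the
   LAM_MPI_SSI_key=value settings that a list of -ssi key/value pairs
   corresponds to, joining the values of a repeated key with commas.

```shell
# Hypothetical helper: arguments are key/value pairs, as they would
# follow each -ssi switch.  Prints one LAM_MPI_SSI_key=value per key.
ssi_env() {
    while [ "$#" -ge 2 ]; do
        printf '%s %s\n' "$1" "$2"
        shift 2
    done | awk '
        {
            key = $1
            # Everything after the key and one space is the value,
            # so values such as "ssh -x" keep their spaces.
            val = substr($0, length(key) + 2)
            if (key in vals) {
                vals[key] = vals[key] "," val   # repeated key: join with ","
            } else {
                vals[key] = val
                order[++n] = key                # remember first-seen order
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "LAM_MPI_SSI_%s=%s\n", order[i], vals[order[i]]
        }
    '
}
```

   For example, ssi_env boot rsh rsh_agent "ssh -x" prints the two
   settings LAM_MPI_SSI_boot=rsh and LAM_MPI_SSI_rsh_agent=ssh -x.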

   Remote Executable Invocation
   All tweakable aspects of launching executables on remote nodes during
   lamboot are discussed in lamssi(7) and lamssi_boot(7).  Topics include
   (but are not limited to): discovery of remote shell, run-time overrides
   of the agent used to launch remote executables (e.g., rsh and ssh), etc.

   Closing stdio
   The stdio of each LAM daemon on a remote host that is launched by
   lamboot is closed by default.  Normally, the stdio of the LAM daemon
   launched on the local host is left open so that the internal LAM
   tstdio(3) package works properly.  However, it is sometimes desirable
   to close the stdio of the local LAM daemon as well.  For example:

          rsh somenode lamboot -s hostfile

   The -s option is needed here because rsh waits for two conditions
   before exiting: lamboot to exit, and stdout / stderr to be closed.
   Without -s, stdout / stderr would not be closed, and rsh (and ssh)
   would hang even though lamboot had completed.  -s causes the stdout /
   stderr of the local LAM daemon to be closed upon invocation, which
   allows rsh to complete.  Using -s will not affect lamboot in any other
   way, but it will prevent the tstdio(3) package from working properly.
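
   The waiting behavior can be reproduced without LAM or rsh.  The
   following is an analogy only, under the assumption that shell command
   substitution behaves like rsh in this respect: it reads until every
   process holding the output stream has closed it.  The background
   subshell stands in for the local LAM daemon.

```shell
# The background job inherits stdout, so the reader also waits for it
# and receives its late output.
held=$( (sleep 1; echo late) & echo done )

# With its stdio redirected away, the background job no longer holds
# the stream, and the reader returns once the foreground command exits.
closed=$( (sleep 1; echo late) >/dev/null 2>&1 & echo done )

printf 'held:   %s\n' "$held"
printf 'closed: %s\n' "$closed"
```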

   Fault Tolerance
   If the -x option is given, LAM runs in fault tolerant mode.  In this
   mode, nodes exchange "heart beat" messages periodically to make sure
   all nodes are running and the links connecting them are operational.
   When a node's heart beats stop, it is declared "dead" and all LAM
   nodes (and processes) are notified.  This allows users to write fault
   tolerant applications that can degrade gracefully, or fully recover by
   replacing the defunct node with another (see lamgrow(1)).  Since this
   mode introduces a performance penalty, it is not activated by default.

EXAMPLES

   lamboot -v
       Start LAM on the machines described in the default boot schema.
       Report about important steps as they are done.

   lamboot -d hostfile
       Start LAM on the machines described in file hostfile.  Provide
       incredibly detailed reports on what is happening at each stage in
       the boot process.

   lamboot mynodes
       Start LAM on the machines described in the boot schema mynodes.
       Operate silently.

FILES

   laminstalldir/etc/lam-bhost.def   default boot schema file, where
                                     "laminstalldir" is the directory
                                     where LAM/MPI was installed

   laminstalldir/etc/lam-conf.lamd   default process schema file for LAM
                                     nodes

SEE ALSO

   recon(1), lamwipe(1), hboot(1), tstdio(3), bhost(5), conf(5),
   lam-helpfile(5), lamssi(7), lamssi_boot(7)




