Difference between revisions of "OpenMPI"

From HCL
(MCA parameter files)
 
(6 intermediate revisions by 2 users not shown)
Line 7: Line 7:
 
  btl_tcp_if_exclude = lo,eth1
 
  btl_tcp_if_exclude = lo,eth1
  
== Running applications on Multiprocessors/Multicores ==
+
== Handling SSH key issues ==
Process can be bound to specific sockets and cores on nodes by choosing right options of mpirun.
+
 
* [http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php#sect9 Process binding]
+
This trick avoids a confirmation message asking "yes" when asked by SSH if a host should be added to known_hosts:
* [http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php#sect10 Rankfiles]
 
  
== Debugging applications on Multiprocessors/Multicores ==
+
    ssh -q -o StrictHostKeyChecking=no
  
* [http://www.open-mpi.org/faq/?category=debugging#serial-debuggers Serial debugger (eg:gdb)]
+
So with OpenMPI it can be used as
** 1. Attach to individual MPI processes after they are running.
 
      For example, launch your MPI application as normal with mpirun. Then login to the node(s) where your application is running and use the --pid option to gdb to attach to your application.
 
      An inelegant-but-functional technique commonly used with this method is to insert the following code in your application where you want to attach:
 
  
      {
+
    mpirun --mca plm_rsh_agent "ssh -q -o StrictHostKeyChecking=no"
          int i = 0;
 
          char hostname[256];
 
          gethostname(hostname, sizeof(hostname));
 
          printf("PID %d on %s ready for attach\n", getpid(), hostname);
 
          fflush(stdout);
 
          while (0 == i)
 
          sleep(5);
 
      }
 
  
      This code will output a line to stdout outputting the name of the host where the process is running and the PID to attach to. It will then spin on the sleep() function forever waiting for you to attach
+
== Running applications on Multiprocessors/Multicores ==
      with a debugger. Using sleep() as the inside of the loop means that the processor won't be pegged at 100% while waiting for you to attach.
+
Process can be bound to specific sockets and cores on nodes by choosing right options of mpirun.
      Once you attach with a debugger, go up the function stack until you are in this block of code (you'll likely attach during the sleep()) then set the variable i to a nonzero value. With GDB, the syntax
+
* [http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php#sect9 Process binding]
      is:
+
* [http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php#sect10 Rankfile]
        (gdb) set var i = 7
 
  
      Then set a breakpoint after your block of code and continue execution until the breakpoint is hit. Now you have control of your live MPI application and use the full functionality of the debugger.
+
== PERUSE ==
      You can even add conditionals to only allow this "pause" in the application for specific MPI processes (e.g., MPI_COMM_WORLD rank 0, or whatever process is misbehaving).
+
[[Media:current_peruse_spec.pdf|PERUSE Specification]]
** 2. Use mpirun to launch xterms (or equivalent) with serial debuggers.
 
      shell$ mpirun -np 4 xterm -e gdb my_mpi_application
 

Latest revision as of 11:45, 22 August 2012

http://www.open-mpi.org/faq/

MCA parameter files

To apply MCA parameter settings to every run without repeating them on the command line, put them in the file $HOME/.openmpi/mca-params.conf, e.g.:

   cat $HOME/.openmpi/mca-params.conf
   btl_tcp_if_exclude = lo,eth1
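
A minimal sketch of how the file above could be created from the shell. It assumes the default per-user location named on this page and adds the setting only if no btl_tcp_if_exclude line exists yet. The commented lines show two one-off alternatives (the --mca command-line option and an OMPI_MCA_-prefixed environment variable); the trailing "..." stands for the rest of your own mpirun command.

```shell
# Create the per-user MCA parameter file if it does not exist yet.
conf_dir="$HOME/.openmpi"
conf_file="$conf_dir/mca-params.conf"
mkdir -p "$conf_dir"
touch "$conf_file"

# Add the setting only once, so re-running this snippet is harmless.
grep -q '^btl_tcp_if_exclude' "$conf_file" ||
    echo 'btl_tcp_if_exclude = lo,eth1' >> "$conf_file"

# One-off alternatives that leave the file untouched:
#   mpirun --mca btl_tcp_if_exclude lo,eth1 ...
#   export OMPI_MCA_btl_tcp_if_exclude=lo,eth1
```

Settings given on the command line or in the environment take precedence over the file, so the file is a convenient place for defaults.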

Handling SSH key issues

The following SSH options suppress the interactive yes/no prompt that SSH shows when a host is not yet in known_hosts; the host key is added automatically instead:

   ssh -q -o StrictHostKeyChecking=no 

With Open MPI, pass this command to mpirun as the remote shell agent:

   mpirun --mca plm_rsh_agent "ssh -q -o StrictHostKeyChecking=no"
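
As a sketch, the same agent can also be set through the environment, so that every mpirun started from the shell picks it up. The hostfile name "hosts" and the program name "my_mpi_application" in the commented command are placeholders, not part of this page's setup.

```shell
# Equivalent to passing --mca plm_rsh_agent "..." on each mpirun command line.
export OMPI_MCA_plm_rsh_agent="ssh -q -o StrictHostKeyChecking=no"

# Hypothetical launch; "hosts" and "my_mpi_application" are placeholders:
#   mpirun -np 4 --hostfile hosts ./my_mpi_application
```

Because this setting accepts unknown host keys without asking, it is best limited to clusters on a trusted network.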

Running applications on Multiprocessors/Multicores

Processes can be bound to specific sockets and cores on a node by passing the appropriate options to mpirun.
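
A short sketch of both approaches, using the option names from the Open MPI v1.4 mpirun man page (newer releases spell the binding options differently, e.g. "--bind-to core"). The node name "node01", the file name "myrankfile" and the program "my_mpi_application" are placeholders.

```shell
# Write a hypothetical rankfile: each line maps one MPI rank to a host
# and a slot given as socket:core.
rankfile="myrankfile"
cat > "$rankfile" <<'EOF'
rank 0=node01 slot=0:0
rank 1=node01 slot=1:0
EOF

# Launch with the rankfile (rank 0 on socket 0 core 0, rank 1 on socket 1 core 0):
#   mpirun -np 2 -rf myrankfile ./my_mpi_application
#
# Or let mpirun choose the placement and bind each process to a core,
# printing the resulting bindings:
#   mpirun -np 4 --bind-to-core --report-bindings ./my_mpi_application
#
# Or distribute processes round-robin over sockets and bind to the socket:
#   mpirun -np 4 --bysocket --bind-to-socket --report-bindings ./my_mpi_application
```

Adding --report-bindings is a simple way to check that the processes ended up where intended.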

PERUSE

PERUSE Specification