
   Description of the test
   =======================

   This test performs heterogeneous block-cyclic matrix-matrix
   multiplication.

   The matrices use a 1D CONTIGUOUS block-cyclic distribution. Within
   each generalized block, every processor is assigned a contiguous run
   of rows, and the generalized blocks repeat cyclically down the matrix.

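   For example, given N=16, p=4 (processors 0, 1, 2, 3), g=8, r=1, and
   speeds (1, 2, 2, 3), the distribution is as follows. Each entry is
   the processor that owns that matrix element, and the 8 rows of each
   generalized block are split 1:2:2:3: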
   0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
   1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1
   2 2 2 2 2 2 2 2  2 2 2 2 2 2 2 2
   2 2 2 2 2 2 2 2  2 2 2 2 2 2 2 2
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
   0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
   1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1
   2 2 2 2 2 2 2 2  2 2 2 2 2 2 2 2
   2 2 2 2 2 2 2 2  2 2 2 2 2 2 2 2
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
   3 3 3 3 3 3 3 3  3 3 3 3 3 3 3 3
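
   The row-to-processor mapping shown above can be reproduced with a
   short C sketch. It is not taken from Load_balance.c: the function
   names split_generalized_block and owner_of_row are hypothetical, r is
   taken to be 1 as in the example, and the rule for handing out rows
   left over after integer division is an assumption.

```c
#include <assert.h>

/* Hypothetical helper, not the code in Load_balance.c.
 * Splits the g rows of one generalized block among p processors in
 * proportion to their speeds; counts[i] receives the row count of
 * processor i. Rows left over after integer division are handed out one
 * at a time starting from the last processor (an assumed tie-break).
 * Returns 0 on success, -1 on invalid input. */
int split_generalized_block(int g, int p, const int *speeds, int *counts)
{
    long long total = 0;
    int assigned = 0;

    if (g <= 0 || p <= 0)
        return -1;
    for (int i = 0; i < p; i++) {
        if (speeds[i] <= 0)
            return -1;
        total += speeds[i];
    }
    for (int i = 0; i < p; i++) {
        counts[i] = (int)((long long)g * speeds[i] / total);
        assigned += counts[i];
    }
    /* At most p-1 rows remain after rounding down. */
    for (int i = p - 1; assigned < g; i = (i == 0) ? p - 1 : i - 1) {
        counts[i]++;
        assigned++;
    }
    return 0;
}

/* Owner of global row `row`: the generalized block repeats every g rows,
 * and inside it each processor owns a contiguous run of counts[i] rows.
 * Returns -1 if counts[] does not sum to g. */
int owner_of_row(int row, int g, int p, const int *counts)
{
    assert(row >= 0 && g > 0);
    int offset = row % g; /* position inside the generalized block */

    for (int i = 0; i < p; i++) {
        if (offset < counts[i])
            return i;
        offset -= counts[i];
    }
    return -1;
}
```

   With the example values (g=8, p=4, speeds 1, 2, 2, 3) the split is
   1, 2, 2, 3 rows, so rows 0 to 7 map to owners 0 1 1 2 2 3 3 3 and
   rows 8 to 15 repeat that pattern.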
   
   CONDITIONS
   ----------
   N must be a multiple of both g and r (so that the matrices divide
   into a whole number of generalized blocks).
   g must be a multiple of p.
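
   These conditions can be checked up front before running the test. The
   helper below is a minimal sketch; the name mxm_params_valid is
   hypothetical and does not appear in the test sources.

```c
#include <stdbool.h>

/* Hypothetical check of the conditions listed above:
 *   - N is a multiple of g and of r
 *   - g is a multiple of p
 * All parameters must be positive. */
bool mxm_params_valid(int N, int p, int g, int r)
{
    if (N <= 0 || p <= 0 || g <= 0 || r <= 0)
        return false;
    return (N % g == 0) && (N % r == 0) && (g % p == 0);
}
```

   For the example in the description (N=16, p=4, g=8, r=1) the check
   passes; N=20, p=4, g=10, r=1 fails because g is not a multiple of p.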

   Files
   -----

   Load_balance.h   ----\ code to determine the block panel
   Load_balance.c   ----/ for the processor grid. This block panel
                          is then cyclically distributed.
 
   mxm_i.h          ----\
   mxm_i.c             -- Code for the HEHE matrix-matrix multiplication
   mxm.c            ----/ algorithm.

   simple.c         ----> Performance model definitions

   counter.h        ----> Contains the parameters:
                          N=Size of the problem
                          r=granularity or communication-to-computation ratio
                          p=Number of processes
                          g=Generalized block size along the row/column

   HOW TO RUN
   ----------
   shell$ hmpibcast mxm.c mxm_i.c mxm_i.h simple.c counter.h Load_balance.c

   shell$ hmpiload -o mxm mxm.c 

   shell$ hmpirun mxm
   N=512, p=4, block size=16, time(sec)=2.628
