
   Description of the test
   ========================

   This test executes parallel matrix-matrix multiplication
   on a two-dimensional grid of processes.
   The matrices are distributed using a two-dimensional
   homogeneous block-cyclic distribution.
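
   As an illustration of what a block-cyclic distribution means, the
   sketch below maps a matrix element to the grid position of the process
   that would own it under a textbook two-dimensional block-cyclic layout.
   It is not taken from mxm_i.c. The names grid_pos and block_cyclic_owner
   are hypothetical, and the sketch assumes r x r blocks on a grid with
   p process rows and q process columns. The layout the test actually
   uses may differ.

```c
#include <assert.h>

/* Position of a process in the process grid (hypothetical helper type). */
typedef struct {
    int row;   /* process row index, 0 .. p-1 */
    int col;   /* process column index, 0 .. q-1 */
} grid_pos;

/*
 * Owner of matrix element (i, j) under a plain 2D block-cyclic layout.
 * Assumptions (not taken from the test sources): blocks are r x r,
 * the grid has p process rows and q process columns, indices start at 0.
 */
static grid_pos block_cyclic_owner(int i, int j, int r, int p, int q)
{
    grid_pos pos;

    assert(i >= 0 && j >= 0 && r > 0 && p > 0 && q > 0);

    pos.row = (i / r) % p;   /* block row index, dealt out cyclically */
    pos.col = (j / r) % q;   /* block column index, dealt out cyclically */
    return pos;
}
```

   With r=8 and p=q=2, element (8, 0) lies in block row 1 and block
   column 0, so under these assumptions it maps to grid position (1, 0).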

   The computation and communication times of each process involved 
   in computations are returned. The computation times of all the processes 
   involved in computations are expected to be the same.

   The test uses pure MPI.

   CONDITIONS
   ----------
   N must be a multiple of lcm(gx, gy) and of r (so that the matrices are
   divided into a whole number of generalized blocks).
   gx is the generalized block size along the row.
   gy is the generalized block size along the column.
   gx must be a multiple of p.
   gy must be a multiple of q.
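
   The conditions above can be checked before launching a run. The
   following is a minimal sketch, not part of the test sources. The
   function names gcd, lcm and mxm_params_ok are hypothetical, and the
   first condition is read as "N is divisible by lcm(gx, gy) and N is
   divisible by r".

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm; arguments must be positive. */
static int gcd(int a, int b)
{
    assert(a > 0 && b > 0);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; divides first to keep the intermediate value small. */
static int lcm(int a, int b)
{
    return a / gcd(a, b) * b;
}

/*
 * Returns 1 if the parameters satisfy the stated conditions, 0 otherwise.
 * Hypothetical helper: it only encodes the conditions listed in this README.
 */
static int mxm_params_ok(int N, int r, int gx, int gy, int p, int q)
{
    if (N <= 0 || r <= 0 || gx <= 0 || gy <= 0 || p <= 0 || q <= 0)
        return 0;
    if (N % lcm(gx, gy) != 0)   /* whole number of generalized blocks */
        return 0;
    if (N % r != 0)             /* whole number of r-sized blocks */
        return 0;
    if (gx % p != 0)            /* gx must be a multiple of p */
        return 0;
    if (gy % q != 0)            /* gy must be a multiple of q */
        return 0;
    return 1;
}
```

   For example, N=768, r=8, gx=12, gy=12, p=2, q=2 passes every check,
   while changing N to 770 fails the first one.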

   FILES
   -----

   mxm_i.h         ----> Header containing the function and variable declarations
   mxm_i.c         ----> Contains the algorithm of parallel matrix-matrix multiplication 
                         using homogeneous block-cyclic distribution of matrices
   mxm.c           ----> Contains the main function
   Load_balance.c  ----> Distributes matrices A, B, and C homogeneously amongst the processes
                         (processes are assumed to have the same speed)
   counter.h       ----> Contains the parameters
                         N=Size of the matrix to solve
                         r=granularity or communication-to-computation ratio (values of 16 or 32 are typical)
                         generalised_block_size_row=Generalized block size along the row
                         generalised_block_size_col=Generalized block size along the column
                         p=number of processes along the process row
                         q=number of processes along the process column

   HOW TO RUN
   ----------

   shell$ hmpibcast mxm.c mxm_i.c mxm_i.h Load_balance.c counter.h

   shell$ hmpiload -o mxm mxm.c 

   shell$ hmpirun mxm
   Process 1, computation time=1.9913755762, communication_time=0.0573699329
   Process 2, computation time=2.0084749985, communication_time=0.0401931818
   Process 3, computation time=1.9868741535, communication_time=0.0636878246
   Process 0, computation time=1.9909653791, communication_time=0.0582450704
   N=768, p=2, q=2, r=8, grow=12, gcol=12, time(sec)=2.093113
