
   Description of the test
   ========================

   This test performs parallel matrix-matrix multiplication
   on a two-dimensional grid of processes.

   The matrices are distributed using a two-dimensional
   homogeneous block-cyclic distribution.
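
   As a rough illustration of what a block-cyclic layout means, the sketch
   below uses the textbook mapping in which block (bi, bj) is owned by the
   process at grid position (bi mod prows, bj mod pcols). This is an
   assumption made for illustration only; the mapping actually used in
   mxm_i.c and Load_balance.c may differ. The names grid_pos, block_owner,
   local_block_count, prows and pcols are hypothetical and do not come from
   the test sources.

```c
/* Position of a process in a hypothetical two-dimensional process grid. */
typedef struct {
    int row;
    int col;
} grid_pos;

/*
 * Textbook block-cyclic mapping (an assumption, not taken from mxm_i.c):
 * block (bi, bj) is owned by the process at (bi mod prows, bj mod pcols).
 */
grid_pos block_owner(int bi, int bj, int prows, int pcols)
{
    grid_pos owner;
    owner.row = bi % prows;
    owner.col = bj % pcols;
    return owner;
}

/*
 * Number of blocks, out of nblocks along one dimension, that fall to the
 * process with coordinate coord when nprocs processes share that dimension
 * cyclically.
 */
int local_block_count(int nblocks, int coord, int nprocs)
{
    return nblocks / nprocs + (coord < nblocks % nprocs ? 1 : 0);
}
```

   Under this mapping, consecutive blocks along each dimension are dealt out
   to the processes in round-robin order, so every process holds roughly the
   same number of blocks.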

   The test uses pure MPI.

   CONDITIONS
   ----------
   N must be a multiple of both lcm(gx, gy) and r (so that the matrices
   divide into a whole number of generalized blocks).
   gx is the generalized block size along the row.
   gy is the generalized block size along the column.
   gx must be a multiple of p.
   gy must be a multiple of q.
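
   The conditions above can be checked before a run with a small helper
   along the following lines. This is only a sketch: mxm_gcd, mxm_lcm and
   mxm_params_ok are hypothetical names that are not part of the test
   sources, and all parameters are assumed to be positive integers.

```c
/* Greatest common divisor (Euclid's algorithm); inputs assumed positive. */
int mxm_gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive integers. */
int mxm_lcm(int a, int b)
{
    return a / mxm_gcd(a, b) * b;
}

/*
 * Hypothetical helper (not part of the test sources): returns 1 if the
 * parameters satisfy the conditions listed above, 0 otherwise.
 */
int mxm_params_ok(int N, int r, int gx, int gy, int p, int q)
{
    if (N <= 0 || r <= 0 || gx <= 0 || gy <= 0 || p <= 0 || q <= 0)
        return 0;                        /* reject non-positive values   */
    if (N % mxm_lcm(gx, gy) != 0)
        return 0;                        /* N multiple of lcm(gx, gy)    */
    if (N % r != 0)
        return 0;                        /* N multiple of r              */
    if (gx % p != 0)
        return 0;                        /* gx multiple of p             */
    if (gy % q != 0)
        return 0;                        /* gy multiple of q             */
    return 1;
}
```

   For instance, the values N=384, r=4, gx=12, gy=12, p=2, q=2 pass every
   check, while changing p to 5 fails because 12 is not a multiple of 5.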

   Files
   -----

   mxm_i.h         ----> Header containing the function and variable declarations
   mxm_i.c         ----> Contains the parallel matrix-matrix multiplication algorithm
                         using homogeneous block-cyclic distribution of matrices
   mxm.c           ----> Contains main()
   Load_balance.c  ----> Distributes matrices A, B, and C homogeneously among the processes
                         (processes are assumed to have the same speed)
   counter.h       ----> Contains the parameters:
                         N=Size of the matrices to multiply
                         r=granularity or communication-to-computation ratio (values of 16, 32 typical)
                         generalised_block_size_row=Generalized block size along the row
                         generalised_block_size_col=Generalized block size along the column
                         p=number of processes along the process row
                         q=number of processes along the process column

   HOW TO RUN
   ----------

   shell$ hmpibcast mxm.c mxm_i.c mxm_i.h Load_balance.c counter.h

   shell$ hmpiload -o mxm mxm.c 

   shell$ hmpirun mxm
   N=384, p=2, q=2, r=4, grow=12, gcol=12, time(sec)=0.323113
