
   Description of the test
   ========================

   This test performs parallel matrix-matrix multiplication (C = A x B)
   on a two-dimensional grid of processes.
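   A serial reference product is useful for checking the parallel result on
   small matrices. The sketch below is not part of the test. The function name
   mxm_reference and the row-major storage are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical serial reference: C = A x B for n x n matrices stored
 * row-major. Intended only as a correctness check for small n; it is
 * not the parallel algorithm implemented in mxm_i.c. */
void mxm_reference(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        /* i-k-j loop order walks rows of B and C contiguously. */
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```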

   The matrices are distributed using a two-dimensional
   heterogeneous block-cyclic distribution.
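   To illustrate the idea of a heterogeneous block-cyclic distribution, the
   sketch below takes a simplified one-dimensional view: each generalized block
   of g indices is split into contiguous pieces whose sizes w[k] differ from
   process to process (for example, in proportion to speed), and the same split
   repeats cyclically along the dimension. The function hbc_owner and its
   arguments are hypothetical; the two-dimensional layout actually built in
   Load_balance.c may, for instance, split rows differently within each
   process column.

```c
#include <assert.h>

/* Hypothetical helper: returns the process that owns global index j
 * (a row or a column) under a 1-D heterogeneous block-cyclic layout.
 *   g    : generalized block size along this dimension
 *   w[k] : indices of each generalized block given to process k
 *          (w[0] + ... + w[nprocs-1] must equal g)
 * The pattern repeats every g indices, so only j % g matters. */
int hbc_owner(int j, int g, const int *w, int nprocs)
{
    int offset = j % g;   /* position inside the current generalized block */
    int start = 0;
    for (int k = 0; k < nprocs; k++) {
        if (offset < start + w[k])
            return k;
        start += w[k];
    }
    assert(0 && "widths do not sum to g");
    return -1;
}
```

   For example, with g = 32 and w = {24, 8}, indices 0..23 belong to process 0,
   indices 24..31 to process 1, and index 32 starts the next generalized block
   on process 0 again.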

   The test takes as input the number of processes along the row
   and the number of processes along the column (p and q in counter.h).

   CONDITIONS
   ----------
   N must be a multiple of lcm(gx, gy) and of r, so that the matrices are
   divided into a whole number of generalized blocks, where
   gx is the generalized block size along the row,
   gy is the generalized block size along the column, and
   r  is the granularity (see counter.h).
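   The condition can be checked before a run with a small helper such as the
   hypothetical mxm_sizes_valid below. It assumes the condition means that N is
   divisible by lcm(gx, gy) and also by r.

```c
#include <stdbool.h>

/* Greatest common divisor (Euclid's algorithm). */
static long gcd_l(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; divides first to limit overflow. */
static long lcm_l(long a, long b)
{
    return a / gcd_l(a, b) * b;
}

/* Hypothetical check: true if N is divisible by lcm(gx, gy) and by r. */
bool mxm_sizes_valid(long n, long gx, long gy, long r)
{
    if (n <= 0 || gx <= 0 || gy <= 0 || r <= 0)
        return false;
    return n % lcm_l(gx, gy) == 0 && n % r == 0;
}
```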

   Files
   -----
   ParallelAxB.mpc ----> Performance model definition (hmpicc compiles it into
                         ParallelAxB.c)
   mxm_i.h         ----> Header with the function and variable declarations
   mxm_i.c         ----> Parallel matrix-matrix multiplication algorithm using
                         the heterogeneous block-cyclic distribution of matrices
   mxm.c           ----> Contains main()
   Load_balance.c  ----> Distributes matrices A, B, and C among the processes
                         in proportion to the processor speeds
   counter.h       ----> Contains the parameters:
                         N = matrix size (the matrices are N x N)
                         r = granularity, or communication-to-computation ratio
                             (typical values: 16, 32)
                         Optimal_generalised_block_size_row = generalized block
                             size along the row (gx)
                         Optimal_generalised_block_size_col = generalized block
                             size along the column (gy)
                         p = number of processes along the row
                         q = number of processes along the column
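   Load_balance.c distributes the work in proportion to processor speeds. One
   common way to turn real-valued shares into whole rows or columns is the
   largest-remainder method, sketched below. The function partition_by_speed
   is hypothetical, and the rounding used by Load_balance.c may differ.

```c
#include <stdlib.h>

/* Hypothetical helper: splits `total` units (e.g. the rows of a generalized
 * block) among n processes in proportion to speeds[k], so that the parts
 * sum exactly to total. Each process first gets floor(total * share);
 * the leftover units go to the processes with the largest fractional parts. */
void partition_by_speed(int total, const double *speeds, int n, int *parts)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++)
        sum += speeds[k];

    double *frac = malloc((size_t)n * sizeof *frac);
    if (frac == NULL)
        abort();                      /* out of memory */

    int assigned = 0;
    for (int k = 0; k < n; k++) {
        double exact = total * speeds[k] / sum;
        parts[k] = (int)exact;        /* truncation == floor for exact >= 0 */
        frac[k] = exact - parts[k];
        assigned += parts[k];
    }
    /* Hand out the remaining units one at a time. */
    for (int left = total - assigned; left > 0; left--) {
        int best = 0;
        for (int k = 1; k < n; k++)
            if (frac[k] > frac[best])
                best = k;
        parts[best]++;
        frac[best] = -1.0;            /* at most one extra unit per process */
    }
    free(frac);
}
```

   For example, speeds {1, 2, 1} with total = 32 give parts {8, 16, 8}, and
   equal speeds {1, 1, 1} give {11, 11, 10}.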

   HOW TO RUN
   ----------
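   Compile the performance model (this generates ParallelAxB.c), copy the
   source files to all hosts, build the executable, and run it:

shell$ hmpicc ParallelAxB.mpc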

shell$ hmpibcast mxm.c mxm_i.c mxm_i.h ParallelAxB.c counter.h Load_balance.c

shell$ hmpiload -o mxm mxm.c

shell$ hmpirun mxm
N=2048, p=2, q=2, block size row=32, block size column=32, time(sec)=21.221
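   The reported time can be turned into an achieved rate using the conventional
   count of 2*N^3 floating-point operations for the classic algorithm. For the
   sample run above this gives roughly 0.81 GFLOP/s. The helper mxm_gflops is
   hypothetical and not part of the test.

```c
/* Hypothetical helper: achieved GFLOP/s for an N x N matrix product that
 * took `seconds` of wall-clock time, assuming the conventional 2*N^3
 * floating-point operations of the classic algorithm. */
double mxm_gflops(long n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1e9;
}
```

   For example, mxm_gflops(2048, 21.221) is about 0.81.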
