
   Description of the test
   ========================

   This test executes the parallel matrix-matrix multiplication
   using a two-dimensional grid of processes.
   The matrices are distributed using the two-dimensional
   heterogeneous distribution proposed by Kalinov and Lastovetsky
   in their paper:

   Kalinov A. and Lastovetsky A. (2001), "Heterogeneous Distribution of
   Computations Solving Linear Algebra Problems on Networks of
   Heterogeneous Computers", Journal of Parallel and Distributed
   Computing, 61(4), pp. 520-535.

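   The function HMPI_Timeof is used to predict the execution time for
   each candidate pair of generalized block sizes (one along the row,
   one along the column). The pair with the smallest predicted time is
   reported as optimal.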
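
   The search for the best pair of block sizes can be sketched as an
   exhaustive loop over candidate values. This is a minimal
   illustration, not the code in mxm_i.c: predict_time_fn, toy_model,
   best_block and find_best_block are hypothetical names, and toy_model
   only stands in for the time prediction that the test obtains from
   HMPI_Timeof (whose actual signature is not reproduced here).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback: predicted execution time for one
   (row, column) pair of generalized block sizes. */
typedef double (*predict_time_fn)(int row_block, int col_block);

/* Result of the search: the best pair and its predicted time. */
typedef struct {
    int row_block;
    int col_block;
    double time;
} best_block;

/* Assumed stand-in for the real prediction; its minimum is at (4, 3). */
double toy_model(int row_block, int col_block)
{
    return 1.0 + abs(row_block - 4) + abs(col_block - 3);
}

/* Try every (row, column) combination of the candidate sizes and
   keep the one with the smallest predicted time. */
best_block find_best_block(const int *candidates, size_t count,
                           predict_time_fn predict)
{
    best_block best = { 0, 0, DBL_MAX };

    assert(candidates != NULL && predict != NULL);

    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < count; j++) {
            double t = predict(candidates[i], candidates[j]);
            if (t < best.time) {
                best.row_block = candidates[i];
                best.col_block = candidates[j];
                best.time = t;
            }
        }
    }
    return best;
}
```

   The strict comparison keeps the first pair found when several pairs
   have the same predicted time.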

   CONDITIONS
   ----------
   N must be a multiple of r.
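
   This condition could be checked before running the test with a
   small helper like the one below. It is only a sketch:
   size_is_valid is a hypothetical name and is not part of the test
   sources.

```c
#include <stdbool.h>

/* Hypothetical helper: true when n is a positive multiple of r. */
bool size_is_valid(int n, int r)
{
    return r > 0 && n > 0 && n % r == 0;
}
```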
   
   FILES
   -----

   ParallelAxB.mpc ----> Performance model definition for the 2D algorithm
                         of parallel matrix-matrix multiplication using 
                         2D heterogeneous block-cyclic distribution of matrices
   ParallelAxB.c   ----> Generated code of the performance model definition
   Load_balance.c  ----> Contains calls to the HPDL partitioning library to partition the 
                         matrix given the speeds of the processors
   mxm_i.h         ----> Header containing the function and variable declarations
   mxm_i.c         ----> Contains the call to HMPI_Timeof, which determines the optimal
                         generalized block sizes
   mxm.c           ----> Contains the main function
   counter.h       ----> Contains the parameters
                         N=Size of the matrix to solve
                         r=granularity or communication-to-computation ratio (values of 16, 32 typical)
                         p=Number of processes along the row
                         q=Number of processes along the column

   HOW TO RUN
   ----------
   shell$ hmpicc ParallelAxB.mpc

   shell$ hmpibcast mxm.c mxm_i.c mxm_i.h ParallelAxB.c Load_balance.c counter.h

   shell$ hmpiload -o mxm mxm.c 

   shell$ hmpirun mxm
Recon finished
Updated processor performances are: 2825028 2825028 2825028 2825028
=========row block size=2, column block size=2=============
TIMEOF: time=19.114855
===================================
=========row block size=2, column block size=3=============
TIMEOF: time=25.486473
===================================
=========row block size=2, column block size=4=============
TIMEOF: time=38.229710

....

=========row block size=4, column block size=2=============
TIMEOF: time=19.114855
===================================
=========row block size=4, column block size=3=============
TIMEOF: time=12.743237

....

=========row block size=60, column block size=20=============
TIMEOF: time=19.114855
===================================
=========row block size=60, column block size=30=============
TIMEOF: time=19.114855
===================================
=========row block size=60, column block size=60=============
TIMEOF: time=19.114855
===================================


Optimal generalised block size row = 4, Optimal generalised block size col = 3

