1. BRIEF DESCRIPTION OF THE APPLICATION

Let us consider a regular application multiplying two dense square
nxn matrices X and Y.

Our application will use a number of virtual processors, each of
which computes a number of rows of the resulting matrix Z.  Both
the dimension n of the matrices and the number of virtual
processors involved in the computations are defined at run time.

The application implements the following scheme:

    Initializing X and Y on the virtual host-processor

    Creating network 'w'

    Scattering rows of X over virtual processors of network 'w' 

    Broadcasting Y over virtual processors of network 'w'

    Parallel computing submatrices of Z

    Gathering the resulting matrix Z on the virtual host-processor 


2. COMMENTS ON THE mpC CODE

Formal parameters 'x', 'y', 'z' and 'n' of the basic function
'MxM' belong to the virtual host-processor. Parameter 'n' holds
the dimension of the matrices, and 'x', 'y' and 'z' point to
nxn-element arrays holding matrices X, Y and Z respectively.

Library nodal function 'MPC_Processors_static_info' is called on
the entire computing space; it returns the number of actual
processors and their relative performances.  After this call,
replicated variable 'nprocs' holds the number of actual
processors, and replicated array 'powers' holds their relative
performances.

Nodal function 'Partition' is called on the entire computing
space. Based on the relative performances of the actual
processors, this function computes how many rows of the resulting
matrix will be computed by each actual processor. So, after this
call 'nrows[i]' will hold the number of rows computed by the i-th
actual processor.

The type of the automatic network 'w' is defined completely only
at run time. Network 'w', which executes the rest of the
computations and communications, is defined in such a way that
the more powerful a virtual processor is, the greater the number
of rows it computes. The mpC environment will ensure the optimal
mapping of the virtual processors constituting 'w' onto the set
of processes constituting the entire computing space. So, just
one process from the processes running on each actual processor
will be involved in the multiplication of the matrices, and the
more powerful the actual processor, the greater the number of
rows its process will compute.

Network function 'ParMult' is called on network 'w'. In this
call, topological argument '[w]nprocs' specifies a network type
as an instance of the parametrized network type 'SimpleNet', and
network argument 'w' specifies a region of the computing space
treated by function 'ParMult' as a network of this type.

The header of the definition of function 'ParMult' declares
identifier 'v' of a network, which is a special network formal
parameter of the function. Since network 'v' has a parametrized
type, topological parameter 'p' is also declared in this
header. In the function body, special formal parameter 'p' is
treated as an unmodifiable variable of type 'int' replicated over
network formal parameter 'v'.  The rest of the formal parameters
(regular formal parameters) of the function are also distributed
over 'v'.

Actually, 'p' holds the number of virtual processors in network
'v', 'n' holds the dimension of the matrices, and 'r' points to a
p-member array whose i-th element holds the number of rows of the
resulting matrix that the i-th virtual processor of network 'v'
computes. Each component of 'dy' points to an array to contain
the nxn matrix Y. Each component of 'dz' points to an array to
contain the rows of Z computed on the corresponding virtual
processor of 'v'. Each component of 'dx' points to an array to
contain the rows of X used in the computations on the
corresponding virtual processor. In addition, throughout the
function execution the components of 'dx', 'dy' and 'dz'
belonging to the parent of network 'v' are assumed to point to
arrays holding matrices X, Y and Z respectively.

The call to the so-called embedded network function 'MPC_Bcast'
broadcasts matrix Y from the parent of 'v' to all virtual
processors of 'v'. As a result, each component of the distributed
array pointed to by 'dy' will contain this matrix.

Then a few asynchronous statements follow. They form two p-member
arrays 'd' and 'l' distributed over 'v'. After their execution,
'l[i]' will hold the number of elements in the portion of the
resulting matrix which is computed by the i-th virtual processor
of 'v', and 'd[i]' will hold the displacement of this portion in
the resulting matrix.  Equivalently, 'l[i]' will hold the number
of elements in the portion of matrix X which is used by the i-th
virtual processor of 'v', and 'd[i]' will hold the displacement
of this portion in matrix X.  Each component of 'c' will hold the
number of elements in the portion of the resulting matrix which
is computed by the corresponding virtual processor (equivalently,
the number of elements in the portion of matrix X which is used
by this virtual processor).

The call to embedded network function 'MPC_Scatter' scatters
matrix X from the parent of 'v' to all virtual processors of
'v'. As a result, each component of 'dx' will point to an array
containing the corresponding portion of matrix X.
   
The call to nodal function 'SeqMult' on 'v' computes the
corresponding portions of the resulting matrix on each of its
virtual processors in parallel ('SeqMult' implements the
traditional sequential algorithm of matrix multiplication).

Finally, the call to embedded network function 'MPC_Gather'
gathers the resulting matrix Z, with each virtual processor of
'v' sending its portion of the result to the parent of 'v'.


3. FORMAT OF THE INPUT FILE

The application prompts for the dimension of the matrices on the
console of the host workstation.
 
The input file (if any) should contain the elements of matrices X
and Y, in that order. The elements of each matrix should be
ordered lexicographically, that is, row by row.

The application uses some default matrices if an input file is
absent.
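For example, assuming whitespace-separated numbers, an input file
for n = 2 with X = {{1,2},{3,4}} and Y = {{5,6},{7,8}} might look
like this (elements of X first, then Y, each row by row); the exact
whitespace handling depends on the application's input routine:

```
1 2
3 4
5 6
7 8
```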


4. HOW TO RUN THE APPLICATION

Assume the virtual parallel machine that will run the application
has already been opened. To produce the target C file, it is
necessary to type:

mpcc mxm.mpc

Note. Use the absolute path of 'mxm.mpc' if it is not in the
current directory.

The above command will produce file 'mxm.c' in the current
directory. To make the file accessible to the mpC programming
environment, it is necessary to copy it into the $MPCLOAD
directory:

cp mxm.c $MPCLOAD

To broadcast this file from the host workstation to all
workstations constituting the distributed memory machine, it is
necessary to type:
 
mpcbcast mxm.c

To produce executable 'mxm' on each workstation of the
distributed memory machine, it is necessary to type:

mpcload -het -o mxm mxm.c -lm

Finally, to run the application, it is necessary to type: 

mpcrun mxm -- <input_file>

or

mpcrun mxm

to use default matrices.

Note. Use the absolute name of the input file if it is placed in
a directory other than the directory that was current when you
opened your virtual parallel machine.
