Abstract | High Performance Computing (HPC) has grown to encompass many new
architectures and algorithms. The Top500 list, which ranks the world's
fastest supercomputers every six months, shows this trend towards a va-
riety of heterogeneous architectures - particularly multicores and general
purpose Graphical Processing Units (GPUs). Heterogeneity, whether it is
in computational power or communication interconnect, provides new chal-
lenges in programming and algorithm development. The general trend has
been to adapt algorithms used on homogeneous parallel systems for use in
the new heterogeneous parallel systems. However, assumptions carried over
from those homogeneous systems are not always applicable to heterogeneous
systems.
Linear algebra matrix operations are widely used in scientific computing
and are an area of significant HPC study. To parallelise matrix operations
over many nodes in an HPC system, each processor is given a section of the
matrix to compute. These sections are collectively called the data partition.
Linear algebra operations, such as matrix matrix multiplication (MMM) and
LU factorisation, use data partitioning based on the original homogeneous
algorithms. Specifically, each processor is assigned a rectangular sub matrix.
The primary motivation of this work is to question whether the rectangular
data partitioning is optimal for heterogeneous systems.
This thesis will show the rectangular data partitioning is not universally
optimal when applied to the heterogeneous case. The major contribution
will be a new method for creating optimal data partitions, called the Push
Technique. This method is used to make small, incremental changes to a
data partition, while guaranteeing not to worsen it. The end result is a small
number of potentially optimal data partitions, called candidates. These candidates are then analysed for differing numbers of processors and topologies.
The validity of the Push Technique is verified analytically and experimentally.
The optimal data partition for matrix operations is found for systems of
two and three heterogeneous processors, including differing communication
topologies. A methodology is outlined for applying the Push Technique to
matrix computations other than MMM, such as LU Factorisation, and for
larger numbers of heterogeneous processors. |