Optimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors

TitleOptimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors
Publication TypeThesis
Year of Publication2014
AuthorsDeFlumere, A.
Thesis TypePhD
AdvisorLastovetsky, A.
Academic DepartmentSchool of Computer Science and Informatics
UniversityUniversity College Dublin
Number of Pages161
Date Published09/2014
AbstractHigh Performance Computing (HPC) has grown to encompass many new architectures and algorithms. The Top500 list, which ranks the world's fastest supercomputers every six months, shows this trend towards a va- riety of heterogeneous architectures - particularly multicores and general purpose Graphical Processing Units (GPUs). Heterogeneity, whether it is in computational power or communication interconnect, provides new chal- lenges in programming and algorithm development. The general trend has been to adapt algorithms used on homogeneous parallel systems for use in the new heterogeneous parallel systems. However, assumptions carried over from those homogeneous systems are not always applicable to heterogeneous systems. Linear algebra matrix operations are widely used in scientifi c computing and are an area of significant HPC study. To parallelise matrix operations over many nodes in an HPC system, each processor is given a section of the matrix to compute. These sections are collectively called the data partition. Linear algebra operations, such as matrix matrix multiplication (MMM) and LU factorisation, use data partitioning based on the original homogeneous algorithms. Specifi cally, each processor is assigned a rectangular sub matrix. The primary motivation of this work is to question whether the rectangular data partitioning is optimal for heterogeneous systems. This thesis will show the rectangular data partitioning is not universally optimal when applied to the heterogeneous case. The major contribution will be a new method for creating optimal data partitions, called the Push Technique. This method is used to make small, incremental changes to a data partition, while guaranteeing not to worsen it. The end result is a small number of potentially optimal data partitions, called candidates. These candidates are then analysed for diff ering numbers of processors and topologies. The validity of the Push Technique is verifi ed analytically and experimentally. The optimal data partition for matrix operations is found for systems of two and three heterogeneous processors, including diff ering communication topologies. A methodology is outlined for applying the Push Technique to matrix computations other than MMM, such as LU Factorisation, and for larger numbers of heterogeneous processors.
AshleyPhDThesis.pdf3.22 MB