Optimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors

Title	Optimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors
Publication Type	Thesis
Year of Publication	2014
Authors	DeFlumere, A.
Thesis Type	PhD
Advisor	Lastovetsky, A.
Academic Department	School of Computer Science and Informatics
University	University College Dublin
City	Dublin
Number of Pages	161
Date Published	09/2014
Abstract	High Performance Computing (HPC) has grown to encompass many new architectures and algorithms. The Top500 list, which ranks the world's fastest supercomputers every six months, shows this trend towards a va- riety of heterogeneous architectures - particularly multicores and general purpose Graphical Processing Units (GPUs). Heterogeneity, whether it is in computational power or communication interconnect, provides new chal- lenges in programming and algorithm development. The general trend has been to adapt algorithms used on homogeneous parallel systems for use in the new heterogeneous parallel systems. However, assumptions carried over from those homogeneous systems are not always applicable to heterogeneous systems. Linear algebra matrix operations are widely used in scientific computing and are an area of significant HPC study. To parallelise matrix operations over many nodes in an HPC system, each processor is given a section of the matrix to compute. These sections are collectively called the data partition. Linear algebra operations, such as matrix matrix multiplication (MMM) and LU factorisation, use data partitioning based on the original homogeneous algorithms. Specifically, each processor is assigned a rectangular sub matrix. The primary motivation of this work is to question whether the rectangular data partitioning is optimal for heterogeneous systems. This thesis will show the rectangular data partitioning is not universally optimal when applied to the heterogeneous case. The major contribution will be a new method for creating optimal data partitions, called the Push Technique. This method is used to make small, incremental changes to a data partition, while guaranteeing not to worsen it. The end result is a small number of potentially optimal data partitions, called candidates. These candidates are then analysed for differing numbers of processors and topologies. The validity of the Push Technique is verified analytically and experimentally. The optimal data partition for matrix operations is found for systems of two and three heterogeneous processors, including differing communication topologies. A methodology is outlined for applying the Push Technique to matrix computations other than MMM, such as LU Factorisation, and for larger numbers of heterogeneous processors.

Attachment	Size
AshleyPhDThesis.pdf	3.22 MB

15086 reads
XML
BibTex
Google Scholar