# Column-Based Matrix Partitioning for Parallel Matrix Multiplication on Heterogeneous Processors Based on Functional Performance Models

In this paper we present a new data partitioning algorithm to improve the performance of parallel matrix multiplication of dense square matrices on heterogeneous clusters. Existing algorithms either use single-speed performance models, which are too simplistic, or do not attempt to minimise the total volume of communication. The functional performance model (FPM) is more realistic than single-speed models because it integrates many important features of heterogeneous processors, such as processor heterogeneity, the heterogeneity of memory structure, and the effects of paging. To load balance the computations, the new algorithm uses FPMs to compute the area of the rectangle that is assigned to each processor. The total volume of communication is then minimised by choosing a shape and ordering of the rectangles so that the sum of their half-perimeters is minimised. Experimental results demonstrate that this new algorithm can reduce the total execution time of parallel matrix multiplication in comparison to existing algorithms.
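The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The function names `balance_areas` and `best_contiguous_columns`, the use of bisection, the assumption that execution time grows with problem size, and the restriction to a unit square whose columns hold consecutive runs of the sorted areas are all assumptions made here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Speed = Callable[[float], float]  # hypothetical FPM: speed as a function of problem size


def time_for(speed: Speed, x: float) -> float:
    """Execution time for x units of work under a speed function."""
    return x / speed(x) if x > 0 else 0.0


def size_for_time(speed: Speed, t: float, cap: float, iters: int = 60) -> float:
    """Largest size in [0, cap] finishing within time t (assumes time grows with size)."""
    lo, hi = 0.0, cap
    for _ in range(iters):
        mid = (lo + hi) / 2
        if time_for(speed, mid) < t:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def balance_areas(speeds: Sequence[Speed], total: float, iters: int = 60) -> List[float]:
    """Step 1 (assumed method): bisect on a common finish time so the areas sum to total."""
    t_lo, t_hi = 0.0, max(time_for(s, total) for s in speeds)
    for _ in range(iters):
        t = (t_lo + t_hi) / 2
        if sum(size_for_time(s, t, total) for s in speeds) < total:
            t_lo = t
        else:
            t_hi = t
    t = (t_lo + t_hi) / 2
    areas = [size_for_time(s, t, total) for s in speeds]
    scale = total / sum(areas)  # remove the residual bisection error
    return [a * scale for a in areas]


def best_contiguous_columns(areas: Sequence[float]) -> Tuple[float, List[List[float]]]:
    """Step 2 (assumed restriction): split the sorted areas into columns of a unit square.

    A column of k rectangles with width w contributes k * w + 1 to the sum of
    half-perimeters, because the k heights in a column add up to 1.
    """
    order = sorted(areas)
    total = sum(order)
    n = len(order)
    prefix = [0.0]
    for a in order:
        prefix.append(prefix[-1] + a / total)  # normalised cumulative area
    best = [0.0] + [float("inf")] * n  # best[j]: minimum cost for the first j areas
    cut = [0] * (n + 1)                # cut[j]: start index of the last column
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + (j - i) * (prefix[j] - prefix[i]) + 1.0
            if cost < best[j]:
                best[j], cut[j] = cost, i
    columns: List[List[float]] = []
    j = n
    while j > 0:  # walk the cut points back to recover the columns
        columns.append(order[cut[j]:j])
        j = cut[j]
    columns.reverse()
    return best[n], columns
```

For four processors with equal areas, this sketch selects two columns of two rectangles each, with a half-perimeter sum of 4 rather than the 5 given by a single column or by four columns.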

Attachment | Size |
---|---|
Matrix_Multiplication_Heterogeneous_Heteropar2011.pdf | 653.33 KB |
