Performance-Driven Methods and Tools for Optimization of Collective Communication on Complex Networks

Title: Performance-Driven Methods and Tools for Optimization of Collective Communication on Complex Networks
Publication Type: Thesis
Year of Publication: 2014
Authors: Dichev, K.
Thesis Type: PhD Thesis
Advisor: Lastovetsky, A.
Academic Department: School of Computer Science and Informatics
University: University College Dublin
Number of Pages: 104
Date Published: 01/2014
Abstract: Modern clusters of computers are becoming more and more heterogeneous, not only in terms of their processing units, but also in terms of the underlying network. In grid networks, it is common to combine optical fiber with Ethernet or InfiniBand networks. These distributed resources have varying network properties, but even supercomputers using vendor-specific interconnects are often heterogeneous in terms of both latency and achievable bandwidth between different process pairs. In this sense, network heterogeneity is a general problem, with a different magnitude for different domains. The performance of MPI collective communication operations (e.g. broadcasts) depends strongly on awareness of the properties of such networks. The advantages of collective communication that is aware of the network topology have been clearly demonstrated in the grid computing domain; this aspect is increasingly important in the domain of supercomputing. Providing network topology to collective communication should not be the task of the application programmer; parallel programs need to be written in a network-oblivious way. For example, the Message Passing Interface was not designed to require any provisioning of network topology. But it is widely recognized that topology awareness is needed for optimal performance. In modern MPI implementations this feature can be included in a transparent way. In this thesis, we investigate and solve a number of issues that arise when designing efficient collective communication for complex platforms. We first focus on the technical difficulties of running and configuring MPI for complex grid environments. Grids are accessible and attractive to many researchers, but difficult to use in the context of message passing. We propose solutions to both technical and configuration problems. Then we proceed to develop a novel method of measuring performance, in particular achievable bandwidth, on a large scale in complex networks. The method is inspired by peer-to-peer protocols like BitTorrent and their adaptive nature. The resulting data represents a simple performance model. We then use data analysis techniques such as clustering methods to recognize bandwidth clusters. We also design a hierarchical clustering algorithm, which reconstructs the network as a hierarchy. This hierarchy can be interpreted as a network topology. With an alternative method, we are also able to reconstruct the topology as a tree. Overall, this process results in a generic technique to produce topology from performance, independent of the underlying network technology. To complete the process of designing efficient communication middleware, we also describe how both performance and topology can be used as input for performance- or topology-aware collective communication. Topology-aware communication has been studied in the past, and we outline some general hierarchical solutions. In addition, we use a flexible software tool, which separates performance models from general collective algorithms.
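Illustrative sketch (not taken from the thesis): the abstract describes reconstructing a network hierarchy from measured bandwidth by hierarchical clustering. The Python fragment below shows one generic way this idea could look, assuming a symmetric matrix of pairwise bandwidth measurements and a plain average-linkage agglomerative scheme. The function name `build_hierarchy`, the matrix `bw`, and all numbers are hypothetical; the algorithm actually designed in the thesis may differ.

```python
from itertools import combinations


def build_hierarchy(bandwidth):
    """Greedy agglomerative clustering (average linkage, an assumption):
    repeatedly merge the two clusters with the highest mean bandwidth."""
    n = len(bandwidth)
    # Each cluster is (tree, members); a leaf tree is just the node index.
    clusters = [(i, [i]) for i in range(n)]

    def link(a, b):
        # Mean measured bandwidth over all cross-cluster node pairs.
        total = sum(bandwidth[i][j] for i in a[1] for j in b[1])
        return total / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Pick the pair of clusters connected by the fastest average link.
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical measurements in MB/s: nodes 0-1 and 2-3 share fast links,
# while traffic between the two pairs crosses a slow link.
bw = [
    [0, 900, 100, 110],
    [900, 0, 105, 95],
    [100, 105, 0, 850],
    [110, 95, 850, 0],
]
tree = build_hierarchy(bw)
print(tree)
```

The nested tuple returned here can be read as a two-level topology: inner tuples group nodes that communicate at high bandwidth, and the outer tuple joins those groups across the slower link.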
KirilDichev-PhD.pdf (1.62 MB)