Optimization of data parallel applications for heterogeneous and hierarchical HPC platforms based on multicores and multi-GPUs

This talk focuses on methods for finding the optimal configuration of data-parallel applications on clusters of heterogeneous nodes built from multicore CPUs and multiple GPUs. The methods assume that optimized kernels for local computations on the GPUs and multicore CPUs are already available. They represent the target platform as a hierarchical cluster of abstract heterogeneous uniprocessors, each standing for a group of tightly coupled computing devices that is relatively independent of the other groups. Accurate application-specific performance models of this cluster are then used to distribute computations so that the load of all computing devices is balanced. Techniques for building such performance models, and the data partitioning algorithms that use them, will be presented in detail.
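
To make the idea of model-based load balancing concrete, here is a minimal sketch in Python. It is not the algorithm presented in the talk: it substitutes a simple greedy scheme that hands out computation units one at a time to whichever abstract processor would finish soonest. The names `partition`, `constant_speed`, and `time_fns` are hypothetical, and the sketch assumes each processor's performance model is given as a function mapping a number of computation units to an execution time that never decreases as the workload grows.

```python
import heapq
from typing import Callable, List, Sequence

# Hypothetical performance model: maps a workload size (in computation
# units) to the predicted execution time on one abstract processor.
TimeFn = Callable[[int], float]


def constant_speed(units_per_second: float) -> TimeFn:
    """Simplest possible model (an assumption): speed independent of size."""
    return lambda x: x / units_per_second


def partition(n: int, time_fns: Sequence[TimeFn]) -> List[int]:
    """Distribute n equal computation units over the given processors.

    Greedy sketch: each unit goes to the processor whose predicted finish
    time after receiving it is smallest. Assumes every time function is
    non-decreasing in the workload size.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not time_fns:
        raise ValueError("at least one processor model is required")

    alloc = [0] * len(time_fns)
    # Heap entries: (predicted time if this processor gets one more unit, index).
    heap = [(t(1), i) for i, t in enumerate(time_fns)]
    heapq.heapify(heap)

    for _ in range(n):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (time_fns[i](alloc[i] + 1), i))
    return alloc


# One slow and one three-times-faster abstract processor.
models = [constant_speed(1.0), constant_speed(3.0)]
print(partition(8, models))  # [2, 6]
```

With constant speeds the result is simply proportional to speed. The scheme becomes more interesting when a time function reflects size-dependent behaviour, for example a GPU model whose predicted time rises sharply once the workload no longer fits in device memory; such a function can be passed in unchanged as long as it stays non-decreasing.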
