This talk will focus on methods for finding the optimal configuration of data-parallel applications on clusters of heterogeneous nodes built from multicore CPUs and multiple GPUs. The methods assume that optimized kernels for local computations on the GPUs and multicore CPUs are already available. They represent the target platform as a hierarchical cluster of abstract heterogeneous uniprocessors, each of which stands for a group of tightly coupled computing devices that is relatively independent of other such groups.
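To make the hierarchical platform model concrete, here is a minimal sketch of how data might be distributed over such a hierarchy. It is an illustration under stated assumptions, not the method presented in the talk: the abstract does not specify the partitioning algorithm, so the sketch assumes that each abstract uniprocessor (for example, a GPU together with the host core that drives it, or a group of remaining CPU cores) has a known constant speed, and it splits the workload proportionally to speed, first across nodes and then across the abstract processors inside each node. The class and function names (`AbstractProcessor`, `Node`, `proportional_split`, `partition`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractProcessor:
    """Hypothetical model: a group of tightly coupled devices treated as one uniprocessor."""
    name: str
    speed: float  # assumption: constant speed, in data units per second


@dataclass
class Node:
    """Hypothetical model: one cluster node made of abstract uniprocessors."""
    name: str
    processors: list[AbstractProcessor] = field(default_factory=list)

    @property
    def speed(self) -> float:
        # Assumption: a node's speed is the sum of its abstract processors' speeds.
        return sum(p.speed for p in self.processors)


def proportional_split(total: int, speeds: list[float]) -> list[int]:
    """Split `total` integer units proportionally to `speeds` (largest-remainder rounding)."""
    s = sum(speeds)
    exact = [total * v / s for v in speeds]
    shares = [int(x) for x in exact]
    # Give the units lost to truncation to the largest fractional remainders.
    leftover = total - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def partition(total: int, cluster: list[Node]) -> dict[str, int]:
    """Two-level partitioning: first across nodes, then across processors in each node."""
    result = {}
    node_shares = proportional_split(total, [n.speed for n in cluster])
    for node, share in zip(cluster, node_shares):
        proc_shares = proportional_split(share, [p.speed for p in node.processors])
        for proc, units in zip(node.processors, proc_shares):
            result[f"{node.name}/{proc.name}"] = units
    return result


if __name__ == "__main__":
    cluster = [
        Node("n0", [AbstractProcessor("gpu0+core0", 30.0), AbstractProcessor("cores1-7", 10.0)]),
        Node("n1", [AbstractProcessor("cores0-7", 10.0)]),
    ]
    print(partition(1000, cluster))
    # {'n0/gpu0+core0': 600, 'n0/cores1-7': 200, 'n1/cores0-7': 200}
```

The hierarchy lets each level reason only about aggregate speeds of the level below, which mirrors the idea of treating each group of tightly coupled devices as a single abstract processor. Real heterogeneous platforms rarely have constant speeds, so a practical method would likely replace the fixed `speed` values with measured or modelled performance that depends on problem size.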