CPM: A software tool for Communication Performance Modelling
Parallel matrix multiplication
Heterogeneous parallel matrix product:
- Estimates the execution time of gsl_blas_dgemm on each processor
- Scatters the row blocks of matrix A in proportion to the processors' speeds (scatterv)
- Broadcasts matrix B
- Gathers matrix C (gatherv)
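
The proportional row distribution in the steps above can be sketched in C as follows. This is only an illustration, not CPM's actual code: the helper names partition_rows and scatterv_layout are hypothetical, the speed of a processor is assumed to be a positive number such as the inverse of its estimated gsl_blas_dgemm time, and matrix A is assumed to be stored row-major so that a row block is a contiguous range of elements.

```c
#include <assert.h>

/* Hypothetical helper (not part of CPM): split n_rows rows of A among p
 * processes in proportion to speeds[i]. Block boundaries are placed at the
 * rounded cumulative share of the rows, so the row counts always sum to
 * n_rows exactly. */
void partition_rows(int n_rows, int p, const double *speeds, int *rows)
{
    assert(n_rows >= 0 && p > 0);

    double total = 0.0;
    for (int i = 0; i < p; i++) {
        assert(speeds[i] > 0.0);
        total += speeds[i];
    }

    double cum = 0.0; /* sum of speeds[0..i] */
    int prev = 0;     /* row index where block i starts */
    for (int i = 0; i < p; i++) {
        cum += speeds[i];
        /* The last boundary is forced to n_rows to absorb rounding error. */
        int bound = (i == p - 1) ? n_rows
                                 : (int)(n_rows * cum / total + 0.5);
        rows[i] = bound - prev;
        prev = bound;
    }
}

/* Hypothetical helper: turn row counts into per-process element counts and
 * displacements for a row-major matrix with n_cols columns. */
void scatterv_layout(int p, const int *rows, int n_cols,
                     int *counts, int *displs)
{
    int offset = 0;
    for (int i = 0; i < p; i++) {
        counts[i] = rows[i] * n_cols;
        displs[i] = offset;
        offset += counts[i];
    }
}
```

With MPI, the counts and displs arrays would be passed as the sendcounts and displs arguments of MPI_Scatterv when distributing A. The same row counts, multiplied by the number of columns of C, would give the recvcounts and displs arguments of MPI_Gatherv when collecting C.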
Implemented with the following variants of the collective communications:
- native
- Hockney_dfs_binomial_min
- Hockney_dfs_binomial_max