CPM: A software tool for Communication Performance Modelling

Parallel matrix multiplication

Heterogeneous parallel matrix product:

  1. Estimates the execution time of gsl_blas_dgemm on each processor to determine the relative processor speeds
  2. Scatters the row blocks of matrix A in proportion to the processor speeds (scatterv)
  3. Broadcasts matrix B
  4. Gathers the row blocks of matrix C (gatherv)

The collective communications in this example are available in the following implementations:

  • native
  • Hockney_dfs_binomial_min
  • Hockney_dfs_binomial_max