CPM: A software tool for Communication Performance Modelling

Model-based collectives

The shared library libcpm_coll.so implements a range of model-based algorithms for collective operations.

Command-line arguments:

  • verbose: verbose mode (default: no)
  • sgv: scatterv/gatherv mode
    • 0: propagation (default)
    • 1: broadcast
    • 2: allover
  • model S: communication model, one of Hockney, PLogP or LMO (required for model-based collectives)
  • file S: model data file (required for model-based collectives)

To preserve the original MPI interface, all model-based implementations obtain the model parameters from global variables. The interface of the implementation of a collective communication operation Y based on the model X consists of the following components:

  • int X_Y(standard args) is the model-based collective operation itself, taking the same arguments as the corresponding MPI collective (for example, Hockney_Scatter_bfs_binomial_min)
  • X_model* X_model_instance is a global variable providing the model parameters (it must be available on all processes in the MPI communicator)
  • int X_initialize(MPI_Comm, X_model* model) and int X_finalize(MPI_Comm) allocate and deallocate the model instance on all processes. The model argument encapsulates the model parameters, obtained either from a file or from the model builder.
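The global-instance convention can be illustrated with a minimal, MPI-free sketch for the Hockney model, whose point-to-point time is T(m) = alpha + beta * m. All names here (hockney_params, hockney_params_instance, hockney_params_initialize, hockney_params_finalize, hockney_params_predict) are hypothetical stand-ins for X_model, X_model_instance, X_initialize and X_finalize; they are not the library's actual API. A real X_initialize takes an MPI_Comm and must make the parameters available on every process, which this local sketch omits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical Hockney parameters: T(m) = alpha + beta * m,
   alpha = latency (s), beta = transfer time per byte (s/byte). */
typedef struct {
    double alpha;
    double beta;
} hockney_params;

/* Global instance read by model-based collectives
   (mirrors the X_model_instance convention; name is illustrative). */
hockney_params* hockney_params_instance = NULL;

/* Local stand-in for X_initialize: copies the parameters into the global.
   An MPI version would also distribute them to all processes. */
int hockney_params_initialize(const hockney_params* model) {
    assert(model != NULL);
    hockney_params_instance = malloc(sizeof *hockney_params_instance);
    if (hockney_params_instance == NULL)
        return -1; /* allocation failed */
    *hockney_params_instance = *model;
    return 0;
}

/* Local stand-in for X_finalize: releases the global instance. */
int hockney_params_finalize(void) {
    free(hockney_params_instance);
    hockney_params_instance = NULL;
    return 0;
}

/* Predicted point-to-point time for an m-byte message,
   read from the global instance so collectives keep the MPI signature. */
double hockney_params_predict(double m) {
    return hockney_params_instance->alpha + hockney_params_instance->beta * m;
}
```

Because the collective reads the parameters from the global variable rather than from an extra argument, its signature can stay identical to the standard MPI one.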

[Figure: collectives.dot]

All model-based algorithms are divided into two groups: model-specific and generic.

Model-specific collectives depend on particular communication performance models and use parameters that only those models define. For example, LMO_Gather_split_flat directly uses the global LMO model variable and its threshold parameters to split medium-sized messages and perform a series of linear gathers with small messages, thereby avoiding escalations of the execution time on clusters with a TCP/IP communication layer. A model-specific collective operation Y is implemented in the following way:

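int X_Y(standard args) {
    /* choose the algorithm from the model parameters */
    if (condition with X_model_instance->param)
        return ...; /* algorithm tuned for this parameter range */
    else
        return ...; /* default algorithm */
}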

Generic collectives depend on the prediction of the execution time of some communication operation (communication primitive); they are parameterized by predictions, which can be provided by any model. The communication primitive can be either the collective operation itself or some other simple operation. See Generic model-based collectives.
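The idea of parameterizing a collective by predictions can be sketched with a function pointer for the point-to-point time: the same selection code then works with any model that supplies the predictor. The sketch below compares a flat (linear) scatter with a binomial scatter and picks the faster one. The cost formulas are simplifying assumptions (no contention, p a power of two for the binomial tree), and all names, including the example predictor hockney_p2p, are hypothetical rather than part of MPIBlib or CPM.

```c
#include <assert.h>

/* Predicted time to send m bytes point-to-point; ctx carries model data.
   Any model (Hockney, PLogP, LMO, ...) could provide such a function. */
typedef double (*p2p_predictor)(double m, void* ctx);

/* Example predictor: Hockney model, ctx points to {alpha, beta}. */
double hockney_p2p(double m, void* ctx) {
    const double* ab = ctx;
    return ab[0] + ab[1] * m;
}

/* Flat scatter: the root sends m bytes to each of the other p-1
   processes one after another. */
double predict_flat_scatter(int p, double m, p2p_predictor t, void* ctx) {
    return (p - 1) * t(m, ctx);
}

/* Binomial scatter, p a power of two: each path from the root sends
   p/2, p/4, ..., 1 blocks of m bytes in successive steps. */
double predict_binomial_scatter(int p, double m, p2p_predictor t, void* ctx) {
    assert(p > 0 && (p & (p - 1)) == 0);
    double total = 0.0;
    for (int blocks = p / 2; blocks >= 1; blocks /= 2)
        total += t(blocks * m, ctx);
    return total;
}

/* Returns 1 if the binomial algorithm is predicted to be strictly faster. */
int choose_binomial(int p, double m, p2p_predictor t, void* ctx) {
    return predict_binomial_scatter(p, m, t, ctx) <
           predict_flat_scatter(p, m, t, ctx);
}
```

For p = 8 and a latency-only model ({1.0, 0.0}), the flat scatter is predicted at 7 time units and the binomial one at 3, so the binomial algorithm is chosen; with a bandwidth-only model ({0.0, 1.0}) both predict 7 units for 1-byte blocks and the flat algorithm is kept.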

Tools

Part of MPIBlib

  • MPIBlib/tools/collective - performs a universal or operation-specific collective benchmark with given accuracy and efficiency
  • MPIBlib/tools/collective_test - verifies implementations of collective communication operations

Tests