CPM: A software tool for Communication Performance Modelling
Model-based collectives
The shared library libcpm_coll.so implements several model-based algorithms for collective operations:
- LinInterp-based collective operations
- Hockney-based collective operations
- LogGP-based collective operations
- PLogP-based collective operations
- LMO-based collective operations
Command-line arguments:
- verbose verbose mode (default: no)
- sgv scatterv/gatherv mode
  - 0 propagation (default)
  - 1 broadcast
  - 2 allover
- model S communication model: Hockney, PLogP, LMO (required for model-based collectives)
- file S model data file (required for model-based collectives)
To preserve the original MPI interface, all model-based implementations read the model parameters from global variables. The interface of an implementation of a collective communication operation Y based on the model X includes the following components:
- int X_Y(standard args) is the model-based collective operation itself (for example, Hockney_Scatter_bfs_binomial_min).
- X_model* X_model_instance is a global variable providing the model parameters (it must be available on all processors in the MPI communicator).
- int X_initialize(MPI_Comm, X_model* model) and int X_finalize(MPI_Comm) are functions responsible for allocating and deallocating the model instance on all processors. The model argument encapsulates the model parameters obtained either from a file or from the model builder.
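The global-instance pattern described above can be sketched without MPI as follows. This is only an illustration under stated assumptions: the demo_ names, the two Hockney-style fields and the absence of an MPI_Comm argument are all hypothetical and are not taken from libcpm_coll.so. The point it shows is that the operation itself takes no model argument, because it reads the parameters from the global instance set up by the initialize call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for X_model: Hockney-style parameters. */
typedef struct {
    double alpha;  /* per-message latency */
    double beta;   /* per-byte transfer time */
} demo_model;

/* Stand-in for X_model_instance: the global read by the operation. */
static demo_model *demo_model_instance = NULL;

/* Stand-in for X_initialize: allocates the global instance and copies
   the parameters into it. The real function also takes an MPI_Comm;
   that part is omitted here. Returns 0 on success, -1 on failure. */
int demo_initialize(const demo_model *model) {
    assert(model != NULL);
    if (demo_model_instance != NULL) return -1;  /* already initialized */
    demo_model_instance = malloc(sizeof *demo_model_instance);
    if (demo_model_instance == NULL) return -1;
    *demo_model_instance = *model;
    return 0;
}

/* Stand-in for X_finalize: releases the global instance. */
int demo_finalize(void) {
    free(demo_model_instance);
    demo_model_instance = NULL;
    return 0;
}

/* Stand-in for X_Y: no model argument appears in the signature,
   the parameters come from the global instance. */
double demo_predict_p2p(int bytes) {
    assert(demo_model_instance != NULL);
    return demo_model_instance->alpha + demo_model_instance->beta * bytes;
}
```

A caller would invoke demo_initialize once, use the operation any number of times, and call demo_finalize at the end.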
All model-based algorithms are divided into two groups: model-specific and generic.
Model-specific collectives depend on a particular communication performance model and use parameters that exist only in that model. For example, LMO_Gather_split_flat reads the LMO model global variable directly and uses its threshold parameters to split medium-sized messages and perform a series of linear gathers with small messages. This avoids escalations of the execution time on clusters with a TCP/IP communication layer. A model-specific collective operation Y is implemented in the following way:
int X_Y(standard args) { if (condition with X_model_instance->param) return ...; }
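As a hedged illustration of the threshold test in the schema above, the sketch below computes how many linear gathers a split strategy would perform for a given message size. It is not the code of LMO_Gather_split_flat: the structure, the field names small_max and medium_max, and the threshold values are assumptions made for this example, and no MPI calls are issued.

```c
#include <assert.h>

/* Hypothetical thresholds; the real LMO model defines its own. */
typedef struct {
    int small_max;   /* largest message size treated as small, bytes */
    int medium_max;  /* largest message size treated as medium, bytes */
} demo_lmo_model;

/* Stand-in for the global model instance, with made-up values. */
static demo_lmo_model demo_lmo_instance = { 4096, 65536 };

/* Returns the number of linear gathers to perform for a message of
   the given size: medium-sized messages are split into chunks of at
   most small_max bytes, all other sizes are gathered in one round. */
int demo_gather_rounds(int bytes) {
    const demo_lmo_model *m = &demo_lmo_instance;
    assert(bytes >= 0 && m->small_max > 0);
    if (bytes > m->small_max && bytes <= m->medium_max)
        return (bytes + m->small_max - 1) / m->small_max;  /* ceiling */
    return 1;
}
```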
Generic collectives depend on the prediction of the execution time of some communication operation (communication primitive); they are parameterized by predictions, which can be provided by any model. The communication primitive can be either the collective operation itself or some other simple operation. See Generic model-based collectives.
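The idea of parameterizing a collective by predictions can be sketched as a selector that receives prediction functions and picks the algorithm with the smallest predicted time. Everything named demo_ below is hypothetical, including the two Hockney-style predictors for a flat and a binomial tree and their parameter values; the library's own generic interface may look different.

```c
#include <assert.h>
#include <stddef.h>

/* A prediction: estimated execution time for a message size and a
   number of processes. Any model can supply such a function. */
typedef double (*demo_predictor)(int bytes, int nprocs);

/* Hockney-style point-to-point time with made-up parameters. */
static double demo_p2p(int bytes) { return 100.0 + 0.01 * bytes; }

/* Flat tree: the root sends to each of the other processes in turn. */
double demo_predict_flat(int bytes, int nprocs) {
    return (nprocs - 1) * demo_p2p(bytes);
}

/* Binomial tree: ceil(log2(nprocs)) communication steps. */
double demo_predict_binomial(int bytes, int nprocs) {
    int steps = 0;
    for (int p = 1; p < nprocs; p *= 2) steps++;
    return steps * demo_p2p(bytes);
}

/* Returns the index of the predictor with the smallest predicted
   time; on a tie the earliest entry is kept. */
size_t demo_select_fastest(const demo_predictor *preds, size_t n,
                           int bytes, int nprocs) {
    assert(preds != NULL && n > 0);
    size_t best = 0;
    double best_time = preds[0](bytes, nprocs);
    for (size_t i = 1; i < n; i++) {
        double t = preds[i](bytes, nprocs);
        if (t < best_time) { best_time = t; best = i; }
    }
    return best;
}
```

Because the selector only sees function pointers, swapping in predictions from a different model requires no change to the selection logic.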
Tools
Part of MPIBlib
- MPIBlib/tools/collective - performs a universal or operation-specific collective benchmark with given accuracy and efficiency
- MPIBlib/tools/collective_test - verifies implementations of collective communication operations