CPM: A software tool for Communication Performance Modelling

LMO: an advanced heterogeneous communication performance model

Classes

struct  LMO_model

Defines

#define LMO_C3(n)   (n) * ((n) - 1) * ((n) - 2) / 6
#define LMO_IJK2INDEX(n, i, j, k)   (2 * (n) - (i) - 1) * ((i) - 1) * (i) / 4 + (2 * (n) - (i) - (j) + 1) * ((j) - (i)) / 2 + (k) - (j) - 1

Functions

LMO_modelLMO_alloc (int n)
void LMO_free (LMO_model *model)
void LMO_read (FILE *stream, LMO_model **model)
void LMO_write (FILE *stream, const LMO_model *model)
void LMO_estimate (MPI_Comm comm, MPIB_precision precision, MPIB_msgset msgset, int parallel, LMO_model **model)
double LMO_predict_p2p (void *_this, int i, int j, int M)
double LMO_predict_scatter_flat (void *_this, int root, int M)
double LMO_predict_gather_flat (void *_this, int root, int M)
void HLMO_estimate (MPI_Comm comm, MPIB_precision precision, MPIB_msgset msgset, LMO_model **model)
double HLMO_predict_p2p (void *_this, int i, int j, int m)
double HLMO_predict_scatter_flat (void *_this, int root, int m, int n)

Detailed Description

This module provides building of the LMO model and estimation of the execution time of point-to-point and collective communication operations.


Define Documentation

#define LMO_C3 (   n  )     (n) * ((n) - 1) * ((n) - 2) / 6

$ C_n^3 $

#define LMO_IJK2INDEX (   n,
  i,
  j,
  k 
)    (2 * (n) - (i) - 1) * ((i) - 1) * (i) / 4 + (2 * (n) - (i) - (j) + 1) * ((j) - (i)) / 2 + (k) - (j) - 1

The index of the $ (i, j, k) $ element in the array of $ C_n^3 $ elements, $ i < j < k < n $. $ \displaystyle\frac{C_{n - 1}^2 + C_{n - i}^2}{2} i + \frac{(n - i + 1) + (n - j)}{2} (j - i) + (k - j - 1) $.


Function Documentation

LMO_model* LMO_alloc ( int  n  ) 

Allocates the LMO model.

Parameters:
n number of processors
Returns:
LMO model
void LMO_free ( LMO_model model  ) 

Frees the LMO model.

Parameters:
model LMO model
void LMO_read ( FILE *  stream,
LMO_model **  model 
)

Reads the LMO model.

void LMO_write ( FILE *  stream,
const LMO_model model 
)

Writes the LMO model.

void LMO_estimate ( MPI_Comm  comm,
MPIB_precision  precision,
MPIB_msgset  msgset,
int  parallel,
LMO_model **  model 
)

Estimates the parameters of the LMO model.

  1. Measures one-to-many execution time for the message sizes $ 0 < M < max\_size $ and performs the Bai & Perron algorithm over the F statistic for the data to find the $ S $ breakpoint in the piecewise linear regression $ T $ ~ $ M $.
  2. Measures many-to-one execution time for the message sizes $ 0 < M < max\_size $ and performs the Bai & Perron algorithm over the F statistic for the data to find the $ M_2 $ breakpoint in the piecewise linear regression $ T $ ~ $ 1 $.
  3. In the loop, measures many-to-one execution time for the message sizes $ 0 < M < m $ with the stride reduced twice each time. $ m $ is a message size for which a tenfold escalation of the execution time has been observed at the previous step. As the stride reaches 1 byte, $ m $ is truncated to a kb value, which will be $ M_1 $.
  4. Finds the fixed processing delays and latencies.
    For each 3 nodes $ i < j < k $, measures execution times and solves systems of equations:
    $ \begin{cases} T_{ij}(0) = 2 (C_i + L_{ij} + C_j) & i \xlongleftrightarrow[0]{0} j \\ T_{jk}(0) = 2 (C_j + L_{jk} + C_k) & j \xlongleftrightarrow[0]{0} k \\ T_{ik}(0) = 2 (C_i + L_{ik} + C_k) & i \xlongleftrightarrow[0]{0} k \\ T_{ijk}(0) = 2 (2 C_i + \displaystyle\max_{x = j, k}(L_{ix} + C_x)) & i \xlongleftrightarrow[0]{0} jk \\ T_{jik}(0) = 2 (2 C_j + \displaystyle\max_{x = i, k}(L_{jx} + C_x)) & j \xlongleftrightarrow[0]{0} ik \\ T_{kij}(0) = 2 (2 C_k + \displaystyle\max_{x = i, j}(L_{kx} + C_x)) & k \xlongleftrightarrow[0]{0} ij \\ \end{cases} $
    averages solutions:
    $ T_{ijk}(0) = 2 (2 C_i + \displaystyle\max_{x = j, k}(L_{ix} + C_x)) = 2 C_i + \displaystyle\max_{x = j, k}T_{ix}(0) $
    $ \begin{cases} C_i = (T_{ijk}(0) - \displaystyle\max_{x = j, k}T_{ix}(0)) / 2 \\ C_j = (T_{jik}(0) - \displaystyle\max_{x = i, k}T_{jx}(0)) / 2 \\ C_j = (T_{kij}(0) - \displaystyle\max_{x = i, j}T_{kx}(0)) / 2 \\ L_{ij} = T_{ij}(0) / 2 - C_i - C_j \\ L_{jk} = T_{jk}(0) / 2 - C_j - C_k \\ L_{ik} = T_{ik}(0) / 2 - C_i - C_k \\ \end{cases} $
    and checks confidence intervals.
  5. Finds the variable processing delays and transmission rates.
    For each 3 nodes $ i < j < k $, solves systems of equations:
    $ \begin{cases} T_{ij}(M) = 2 (C_i + L_{ij} + C_j + M (t_i + \displaystyle\frac{1}{\beta_{ij}} + t_j)) & i \xlongleftrightarrow[M]{M} j \\ T_{jk}(M) = 2 (C_j + L_{jk} + C_k + M (t_j + \displaystyle\frac{1}{\beta_{jk}} + t_k)) & j \xlongleftrightarrow[M]{M} k \\ T_{ik}(M) = 2 (C_i + L_{ik} + C_k + M (t_i + \displaystyle\frac{1}{\beta_{ik}} + t_k)) & i \xlongleftrightarrow[M]{M} k \\ T_{ijk}(M) = 2 (2 C_i + M t_i) + \displaystyle\max_{x = j, k}(2 (L_{ix} + C_x) + M (\frac{1}{\beta_{ix}} + t_x)) & i \xlongleftrightarrow[0]{M} jk \\ T_{jik}(M) = 2 (2 C_j + M t_j) + \displaystyle\max_{x = i, k}(2 (L_{jx} + C_x) + M (\frac{1}{\beta_{jx}} + t_x)) & j \xlongleftrightarrow[0]{M} ik \\ T_{kij}(M) = 2 (2 C_k + M t_k) + \displaystyle\max_{x = i, j}(2 (L_{kx} + C_x) + M (\frac{1}{\beta_{kx}} + t_x)) & k \xlongleftrightarrow[0]{M} ij \\ \end{cases} $
    averages solutions:
    $ T_{ijk}(M) = 2 (2 C_i + M t_i) + \displaystyle\max_{x = j, k}(2 (L_{ix} + C_x) + M (\frac{1}{\beta_{ix}} + t_x)) = 2 C_i + M t_i + \displaystyle\max_{x = j, k}(T_{ix}(0) + T_{ix}(M)) / 2 $
    $ \begin{cases} t_i = (T_{ijk}(M) - \displaystyle\max_{x = j, k}(T_{ix}(0) + T_{ix}(M)) / 2 - 2 C_i) / M \\ t_j = (T_{jik}(M) - \displaystyle\max_{x = i, k}(T_{jx}(0) + T_{jx}(M)) / 2 - 2 C_j) / M \\ t_j = (T_{kij}(M) - \displaystyle\max_{x = i, j}(T_{kx}(0) + T_{kx}(M)) / 2 - 2 C_k) / M \\ \displaystyle\frac{1}{\beta_{ij}} = (T_{ij}(M) / 2 - C_i - L_{ij} - C_j) / M - t_i - t_j \\ \displaystyle\frac{1}{\beta_{jk}} = (T_{jk}(M) / 2 - C_j - L_{jk} - C_k) / M - t_j - t_k \\ \displaystyle\frac{1}{\beta_{ik}} = (T_{ik}(M) / 2 - C_i - L_{ik} - C_k) / M - t_i - t_k \\ \end{cases} $
    and checks confidence intervals.
    Note:
    R must be initialized by LMO_init_R at root beforehand.
    Parameters:
    comm communicator
    precision measurement precision
    msgset message set
    parallel several non-overlapped point-to-point communications at the same time if non-zero
    model LMO model (significant only at root)
double LMO_predict_p2p ( void *  _this,
int  i,
int  j,
int  M 
)

Predicts the execution time of point-to-point communication. The execution time of $ i \xlongrightarrow{M} j $ is equal to
$ C_i + L_{ij} + C_j + M (t_i + \displaystyle\frac{1}{\beta_{ij}} + t_j)$

Parameters:
_this LMO model
i index of the processor
j index of the processor
M message size
Returns:
predicted execution time
double LMO_predict_scatter_flat ( void *  _this,
int  root,
int  M 
)

Predicts the execution time of flat-tree scatter. $ (n - 1) (C_r + M t_r) + \displaystyle\max_{i = 1, i \ne r}^{n - 1} (L_{ri} + C_i + M (\frac{1}{\beta_{ri}} + t_i)) $

Parameters:
_this LMO model
root root processor
M message size
Returns:
predicted execution time
double LMO_predict_gather_flat ( void *  _this,
int  root,
int  M 
)

Predicts the execution time of flat-tree gather. $ (n - 1) (C_r + M t_r) + \begin{cases} \displaystyle\max_{i = 1}^{n - 1} (L_{ri} + C_i + M (\frac{1}{\beta_{ri}} + t_i)) & M < M_1 \\ \displaystyle\sum_{i = 1}^{n - 1} (L_{ri} + C_i + M (\frac{1}{\beta_{ri}} + t_i)) & M > M_2 \\ \end{cases} $

Parameters:
_this LMO model
root root processor
M message size
Returns:
predicted execution time
void HLMO_estimate ( MPI_Comm  comm,
MPIB_precision  precision,
MPIB_msgset  msgset,
LMO_model **  model 
)

Estimates the parameters of the homogeneous LMO model (n = 2). Requires at least 3 processes. Based on the homogeneous LMO prediction of flat scatter $ t(m, n) $. Uses benchmarks for m = msgset.max_size and msgset.min_size, and n = comm_size and 0.9 * comm_size.

See also:
tests/hmodel.c
double HLMO_predict_p2p ( void *  _this,
int  i,
int  j,
int  m 
)

Predicts the execution time of point-to-point communication, using the homogeneous LMO model

double HLMO_predict_scatter_flat ( void *  _this,
int  root,
int  m,
int  n 
)

Predicts the execution time of flat-tree scatter, using the homogeneous LMO model