UTK multicores + GPU

Latest revision as of 22:32, 22 August 2012

List of machines

http://icl.cs.utk.edu/iclhelp/custom/index.html?lid=97&slid=180

Display a list of available GPUs

$ nvidia-smi -L
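
To use the device count in a script, the listing can be counted line by line. The sketch below is only an illustration: `count_gpus` is a hypothetical helper, and it assumes that each device is reported on its own line of the form `GPU <index>: <name> (UUID: ...)`.

```shell
#!/bin/sh
# count_gpus: hypothetical helper, counts "GPU <n>: ..." lines read from stdin.
# Assumption: nvidia-smi -L prints one such line per device.
count_gpus() {
    # grep -c prints 0 and returns non-zero when nothing matches; ignore that status.
    grep -c '^GPU [0-9][0-9]*:' || true
}

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs on this node: $(nvidia-smi -L | count_gpus)"
else
    echo "nvidia-smi not found on this node"
fi
```

The count gives the upper bound for the device index passed through the g suboption in the appfiles further down.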

Using Fupermod on a hybrid multicore/GPU node

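  • Compiling: create two separate build directories, one configured with the selected CPU cblas (e.g. gsl, acml, mkl) and one with the GPU cblas (e.g. cublas).

- For example, to configure with acml (http://developer.amd.com/libraries/acml/pages/default.aspx) for the CPU and cublas (http://developer.nvidia.com/cublas) for the GPU: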

$ cd fupermod/

$ mkdir acml_config
$ cd acml_config
$ ../configure --with-blas=acml
$ make
$ cd ..

$ mkdir cublas_config
$ cd cublas_config
$ ../configure --with-blas=cuda
$ make
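
The two builds follow the same pattern, so they can be generated from a list of directory/backend pairs. The following is a dry-run sketch that only prints the commands. `print_build_cmds` is a hypothetical helper, and the directory name `cublas_config` is an assumption chosen to match the paths used in the appfiles below; the `--with-blas` values are the ones from the example above.

```shell
#!/bin/sh
# print_build_cmds: hypothetical helper, prints one out-of-tree build command
# per "directory:blas" pair. It does not run configure or make itself.
print_build_cmds() {
    for pair in "$@"; do
        dir=${pair%%:*}     # text before the first ':'
        blas=${pair##*:}    # text after the last ':'
        printf 'mkdir -p %s && (cd %s && ../configure --with-blas=%s && make)\n' \
            "$dir" "$dir" "$blas"
    done
}

# Run from the fupermod/ source directory; pipe the output to sh to execute it.
print_build_cmds acml_config:acml cublas_config:cuda
```

Wrapping each build in a subshell keeps the working directory at fupermod/, so the second directory is created next to the first one rather than inside it.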
  • Building the performance model:

- The rankfile controls process binding (http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect8), and the appfile tells mpirun which programs to launch.

 $ mpirun -rf rankfile -app appfile_fpm


- Example of a rankfile:

 rank 0=ig.icl.utk.edu slot=0:0
 rank 1=ig.icl.utk.edu slot=0:1
 ...
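
Writing one line per rank by hand gets tedious for dozens of processes. As a minimal sketch, the lines can be generated, assuming ranks fill the cores of socket 0 first and then move to the next socket. `gen_rankfile` is a hypothetical helper, and the rank count and cores-per-socket values in the usage line are placeholders, not the real layout of ig.icl.utk.edu.

```shell
#!/bin/sh
# gen_rankfile: hypothetical helper, prints "rank <r>=<host> slot=<socket>:<core>".
# Assumption: ranks are packed socket by socket, cores_per_socket ranks each.
gen_rankfile() {
    host=$1
    nranks=$2
    cores_per_socket=$3
    rank=0
    while [ "$rank" -lt "$nranks" ]; do
        printf 'rank %d=%s slot=%d:%d\n' "$rank" "$host" \
            "$((rank / cores_per_socket))" "$((rank % cores_per_socket))"
        rank=$((rank + 1))
    done
}

# Placeholder values: 4 ranks, 2 cores per socket.
gen_rankfile ig.icl.utk.edu 4 2
```

Redirect the output to a file named rankfile and check it against the actual socket and core numbering of the node before use.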

- Example of an appfile for building the functional performance model (appfile_fpm):

 # GPU
 # e.g. fupermod is configured under cublas_config and linked against cublas
 # the suboption g=0 selects device 0 for the computation
 -host localhost -np 1 $HOME/fupermod/cublas_config/tools/builder -l $HOME/fupermod/cublas_config/routines/mxm/.libs/libmxm_1d.so -o k=640,g=0 -U10000 -s10
 # CPU
 # e.g. fupermod is configured under acml_config and linked against acml
 -host localhost -np 47 $HOME/fupermod/acml_config/tools/builder -l $HOME/fupermod/acml_config/routines/mxm/.libs/libmxm_1d.so -o k=640 -U10000 -s10
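
The appfile above starts 1 GPU process and 47 CPU processes, 48 in total. It seems reasonable to assume that this total should equal the number of rank lines in the rankfile. A small sketch for checking the total follows; `sum_np` is a hypothetical helper, and the command lines in the demo are shortened stand-ins for the full builder invocations.

```shell
#!/bin/sh
# sum_np: hypothetical helper, adds up the values following "-np" on every
# non-comment line read from stdin.
sum_np() {
    awk '!/^[[:space:]]*#/ {
             for (i = 1; i < NF; i++)
                 if ($i == "-np") total += $(i + 1)
         }
         END { print total + 0 }'
}

# Demo with abbreviated command lines (program paths shortened to "builder").
sum_np <<'EOF'
# GPU
-host localhost -np 1 builder -o k=640,g=0
# CPU
-host localhost -np 47 builder -o k=640
EOF
```

On a real file the call would be `sum_np < appfile_fpm`.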
  • Data partitioning

- The matrix size is D = N x N (N=100 gives D=10000 in the command below), and the machinefile lists the nodes participating in the computation.

 $ fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_1d.so -D10000 -o N=100 -m machinefile
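
Because -D and N have to agree, it can help to derive one from the other instead of typing both. A minimal sketch, assuming D = N x N as stated above; `partitioner_args` is a hypothetical helper that only prints the two options.

```shell
#!/bin/sh
# partitioner_args: hypothetical helper, prints the -D and N options for a given N.
# Assumption: D = N x N, as in the description of the partitioning step.
partitioner_args() {
    n=$1
    printf '%s\n' "-D$((n * n)) -o N=$n"
}

partitioner_args 100
```

With N=100 this reproduces the `-D10000 -o N=100` pair used in the partitioner command.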
  • Running matrix multiplication
 $ mpirun -rf rankfile -app appfile_mxm

- Example of an appfile for matrix multiplication (appfile_mxm):

 # GPU
 # Assuming fupermod is configured under cublas_config, linking against cublas
 # -g0 means device 0 is selected for computing
 -host localhost -np 1 $HOME/fupermod/cublas_config/routines/mxm/mxm_2d -k640 -g0 -m machinefile
 # CPU
 # Assuming fupermod is configured under acml_config, linking against acml
 -host localhost -np 47 $HOME/fupermod/acml_config/routines/mxm/mxm_2d -k640 -m machinefile