Difference between revisions of "UTK multicores + GPU"

Latest revision as of 23:32, 22 August 2012

List of machines

http://icl.cs.utk.edu/iclhelp/custom/index.html?lid=97&slid=180

Display a list of available GPUs

$ nvidia-smi -L
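
When this listing feeds a script, the number of devices can be derived from it. The following is a minimal sketch, assuming that `nvidia-smi -L` prints one line per device beginning with `GPU <index>:`. The function name `count_gpus` is hypothetical and not part of Fupermod.

```shell
#!/bin/sh
# count_gpus (hypothetical helper): read `nvidia-smi -L` style output on
# stdin and print the number of device lines ("GPU <index>: ...").
count_gpus() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so the exit status is neutralised to keep callers simple.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage on a node with the NVIDIA driver installed:
#   nvidia-smi -L | count_gpus
```

Reading from stdin keeps the helper usable on saved listings as well as on live output.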

Using Fupermod on a hybrid multicore/GPU node

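  • Compiling: create two separate build directories, one configured with the selected CPU cblas (e.g. gsl, acml, mkl) and one with the GPU cblas (e.g. cublas). The code currently has to be compiled separately for the CPU and for the GPU.

- For example, to configure with acml for the CPU and cublas for the GPU: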

$ cd fupermod/
$ mkdir acml_config
$ cd acml_config
$ ../configure --with-blas=acml
$ make
$ cd ..
$ mkdir cublas_config
$ cd cublas_config
$ ../configure --with-blas=cuda
$ make
  • Building performance model:

- The rankfile controls process binding, and the appfile tells mpirun which programs to launch:

 $ mpirun -rf rankfile -app appfile_fpm


- Example of a rankfile:

 rank 0=ig.icl.utk.edu slot=0:0
 rank 1=ig.icl.utk.edu slot=0:1
 ...
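
Writing one line per rank by hand gets tedious on a node with many cores. The rankfile pattern above can be generated with a short loop. This is only a sketch: `make_rankfile` is a hypothetical helper, and the socket and core counts passed to it are placeholders that must match the real topology of the node.

```shell
#!/bin/sh
# make_rankfile HOST SOCKETS CORES_PER_SOCKET (hypothetical helper)
# Print one "rank R=HOST slot=SOCKET:CORE" line per core, filling each
# socket in turn, in the same format as the rankfile example.
make_rankfile() {
    host=$1
    sockets=$2
    cores=$3
    rank=0
    socket=0
    while [ "$socket" -lt "$sockets" ]; do
        core=0
        while [ "$core" -lt "$cores" ]; do
            echo "rank $rank=$host slot=$socket:$core"
            rank=$((rank + 1))
            core=$((core + 1))
        done
        socket=$((socket + 1))
    done
}

# Usage (the counts here are illustrative, not the actual layout of ig):
#   make_rankfile ig.icl.utk.edu 2 2 > rankfile
```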

- Example of an appfile for building the functional performance model (appfile_fpm):

 # GPU
 # e.g. Linking against cublas, and fupermod is configured under cublas_config
 # suboption g=0 means device 0 is selected for computing
 -host localhost -np 1 $HOME/fupermod/cublas_config/tools/builder -l $HOME/fupermod/cublas_config/routines/mxm/.libs/libmxm_1d.so -o k=640,g=0 -U10000 -s10
 # CPU
 # e.g. Linking against acml, and fupermod is configured under acml_config
 -host localhost -np 47 $HOME/fupermod/acml_config/tools/builder -l $HOME/fupermod/acml_config/routines/mxm/.libs/libmxm_1d.so -o k=640 -U10000 -s10
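
The two appfile entries above launch 1 + 47 = 48 processes in total. The sketch below regenerates the same two lines for a different total. It assumes the split used in the example (one process drives the GPU, all remaining processes run on the CPU). `make_appfile_fpm` is a hypothetical helper, and the paths and options are copied unchanged from the example.

```shell
#!/bin/sh
# make_appfile_fpm TOTAL (hypothetical helper)
# Print the GPU and CPU builder entries for TOTAL processes, assuming
# one process is dedicated to the GPU as in the example above.
make_appfile_fpm() {
    total=$1
    cpu_np=$((total - 1))
    # Single quotes keep $HOME literal, exactly as written in the appfile.
    gpu_dir='$HOME/fupermod/cublas_config'
    cpu_dir='$HOME/fupermod/acml_config'
    lib='routines/mxm/.libs/libmxm_1d.so'
    echo "-host localhost -np 1 $gpu_dir/tools/builder -l $gpu_dir/$lib -o k=640,g=0 -U10000 -s10"
    echo "-host localhost -np $cpu_np $cpu_dir/tools/builder -l $cpu_dir/$lib -o k=640 -U10000 -s10"
}

# Usage:
#   make_appfile_fpm 48 > appfile_fpm
```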
  • Data partitioning

- The problem size is D = N x N (here D = 10000 for N = 100), and the machinefile lists the nodes participating in the computation:

 $ fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_1d.so -D10000 -o N=100 -m machinefile
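
Because D must equal N x N, it is easy to pass mismatched values when N changes. The sketch below derives D from N and prints the matching command line. It relies only on the relation stated in the note above, and `partitioner_cmd` is a hypothetical helper that prints the command without running it.

```shell
#!/bin/sh
# partitioner_cmd N (hypothetical helper)
# Print the partitioner command for an N x N matrix, with D = N * N.
partitioner_cmd() {
    n=$1
    d=$((n * n))
    echo "fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_1d.so -D$d -o N=$n -m machinefile"
}

# Usage:
#   partitioner_cmd 100
```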
  • Running matrix multiplication
 $ mpirun -rf rankfile -app appfile_mxm

- Example of an appfile for matrix multiplication (appfile_mxm)

 # GPU
 # Assuming fupermod is configured under cublas_config, linking against cublas
 # -g0 means device 0 is selected for computing
 -host localhost -np 1 $HOME/fupermod/cublas_config/routines/mxm/mxm_2d -k640 -g0 -m machinefile
 # CPU
 # Assuming fupermod is configured under acml_config, linking against acml
 -host localhost -np 47 $HOME/fupermod/acml_config/routines/mxm/mxm_2d -k640 -m machinefile