UTK multicores + GPU

Revision as of 11:56, 12 July 2012

List of machines

http://icl.cs.utk.edu/iclhelp/custom/index.html?lid=97&slid=180

Display a list of available GPUs

$ nvidia-smi -L
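To pick a value for the g suboption used further down, it can help to reduce the listing to bare device indices. The following is a minimal sketch, assuming each line printed by nvidia-smi -L starts with "GPU <index>:"; the function name gpu_indices is hypothetical. Note that the order reported by nvidia-smi is not guaranteed to match the device numbering seen by a CUDA application, so treat the result as a hint only.

```shell
# Extract the device indices from `nvidia-smi -L` output read on stdin.
# Assumption: each device line has the form "GPU <index>: <name> (UUID: ...)".
gpu_indices() {
    sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p'
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi -L | gpu_indices
```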

Using Fupermod on a hybrid multicore/GPU node

  • Compiling: create two separate build directories, one configured with the selected CPU cblas (e.g. gsl, acml, mkl) and one with the GPU cblas (e.g. cublas).
    • For example, using acml for CPU computing and cublas for GPU computing:
cd fupermod/
mkdir acml_config
cd acml_config
../configure --with-cblas=acml
cd ..
mkdir cublas_config
cd cublas_config
../configure --with-cblas=cuda
  • Building performance model:
    • the rankfile specifies process binding, and the appfile tells MPI which programs to launch
 $ mpirun -rf rankfile -app appfile_fpm
    • example of a rankfile:
 rank 0=ig.icl.utk.edu slot=0:0
 rank 1=ig.icl.utk.edu slot=0:1
 ...
    • example of an appfile for building the functional performance model (FPM):
 # GPU
 # e.g. Linking against cublas, and fupermod is configured under cublas_config
 # suboption g=0 means device 0 is selected for computing
 -host localhost -np 1 $HOME/fupermod/cublas_config/tools/builder -l $HOME/fupermod/cublas_config/routines/mxm/.libs/libmxm_col.so -o k=640,g=0 -U10000 -s10
 # CPU
 # e.g. Linking against acml, and fupermod is configured under acml_config
 -host localhost -np 47 $HOME/fupermod/acml_config/tools/builder -l $HOME/fupermod/acml_config/routines/mxm/.libs/libmxm_col.so -o k=640 -U10000 -s10
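The rankfile needs one line per process, so writing it by hand for 48 processes (1 GPU process plus 47 CPU processes in the appfile above) is tedious. The following is a minimal sketch that prints lines in the "rank N=host slot=socket:core" form shown in the rankfile example. The function name make_rankfile and the cores-per-socket argument are hypothetical, and filling sockets in order is an assumption; check the actual topology of the node before using the result.

```shell
# Print rankfile lines "rank <i>=<host> slot=<socket>:<core>".
# Arguments (hypothetical): host name, number of ranks, cores per socket.
# Assumption: ranks fill socket 0 first, then socket 1, and so on.
make_rankfile() {
    host=$1
    nranks=$2
    cores_per_socket=$3
    i=0
    while [ "$i" -lt "$nranks" ]; do
        echo "rank $i=$host slot=$((i / cores_per_socket)):$((i % cores_per_socket))"
        i=$((i + 1))
    done
}

# Usage, assuming 12 cores per socket (an assumption about the node):
#   make_rankfile ig.icl.utk.edu 48 12 > rankfile
```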
  • Data partitioning
    • the matrix size is D = N x N (here D = 10000 and N = 100), and machinefile lists the nodes participating in the computation
 $ fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_col.so -D10000 -o N=100 -m machinefile
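Since -D and N have to agree, one way to avoid a mismatch is to derive D from N when composing the command. The following is a minimal sketch that only prints the command line and reuses the paths and options from the example above; the function name partitioner_cmd is hypothetical.

```shell
# Print the partitioner command for an N x N matrix, computing D as N * N.
# Only options that appear in the example above are used.
partitioner_cmd() {
    n=$1
    d=$((n * n))
    echo "fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_col.so -D$d -o N=$n -m machinefile"
}

# Usage: print the command for N = 100, then run it by hand:
#   partitioner_cmd 100
```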
  • Running matrix multiplication
 $ mpirun -rf rankfile -app appfile_mxm
    • example of an appfile for matrix multiplication
 # GPU
 # Assuming fupermod is configured under cublas_config, linking against cublas
 # -g0 means device 0 is selected for computing
 -host localhost -np 1 $HOME/fupermod/cublas_config/routines/mxm/mxm_col -k640 -g0 -m machinefile
 # CPU
 # Assuming fupermod is configured under acml_config, linking against acml
 -host localhost -np 47 $HOME/fupermod/acml_config/routines/mxm/mxm_col -k640 -m machinefile
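Before launching, it may be worth checking that the rankfile has as many rank entries as the appfile starts processes (1 + 47 = 48 in the examples above). The following is a minimal sketch under the assumption that every non-comment appfile line carries a single "-np <count>" pair and every rankfile entry starts with "rank <number>"; both function names are hypothetical.

```shell
# Sum the -np values over all non-comment lines of an appfile read on stdin.
appfile_np_total() {
    awk '!/^[[:space:]]*#/ {
        for (i = 1; i < NF; i++) if ($i == "-np") total += $(i + 1)
    } END { print total + 0 }'
}

# Count the "rank <number>=..." lines of a rankfile read on stdin.
rankfile_rank_count() {
    grep -c '^[[:space:]]*rank [0-9]'
}

# Usage: the two numbers printed should be equal.
#   appfile_np_total < appfile_mxm
#   rankfile_rank_count < rankfile
```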