UTK multicores + GPU
From HCL
Edited by Zhongziming
Latest revision as of 23:32, 22 August 2012
List of machines
http://icl.cs.utk.edu/iclhelp/custom/index.html?lid=97&slid=180
Display a list of available GPUs
$ nvidia-smi -L
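The g=0 suboption and the -g0 option in the appfiles select a GPU by its index. Here is a minimal sketch that extracts just those indices. It assumes that `nvidia-smi -L` prints one line per device starting with `GPU <index>:`. The helper name `gpu_indices` is hypothetical. nvidia-smi's numbering usually matches CUDA's device numbering, but it is not guaranteed to (see the CUDA_DEVICE_ORDER environment variable).

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print the
# index of every listed GPU, one per line.
# Assumes each device line starts with "GPU <index>:".
gpu_indices() {
  sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p'
}

# Example: nvidia-smi -L | gpu_indices
```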
Using Fupermod on a hybrid multicore/GPU node
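- Compiling: create two separate build directories, one configured with a CPU CBLAS (e.g. GSL, ACML, MKL) and one with a GPU CBLAS (e.g. cuBLAS).
- For example, configuring with ACML (http://developer.amd.com/libraries/acml/pages/default.aspx) for the CPU and cuBLAS (http://developer.nvidia.com/cublas) for the GPU: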
$ cd fupermod/
$ mkdir acml_config
$ cd acml_config
$ ../configure --with-blas=acml
$ make
$ cd ..
$ mkdir cublas_config
$ cd cublas_config
$ ../configure --with-blas=cuda
$ make
- Building the performance model:
- The rankfile controls process binding (see http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect8), and the appfile tells mpirun which programs to launch:
$ mpirun -rf rankfile -app appfile_fpm
- Example of a rankfile:
rank 0=ig.icl.utk.edu slot=0:0
rank 1=ig.icl.utk.edu slot=0:1
...
- Example of an appfile for building the functional performance model (appfile_fpm):
# GPU
# e.g. Linking against cublas, and fupermod is configured under cublas_config
# suboption g=0 means device 0 is selected for computing
-host localhost -np 1 $HOME/fupermod/cublas_config/tools/builder -l $HOME/fupermod/cublas_config/routines/mxm/.libs/libmxm_1d.so -o k=640,g=0 -U10000 -s10
# CPU
# e.g. Linking against acml, and fupermod is configured under acml_config
-host localhost -np 47 $HOME/fupermod/acml_config/tools/builder -l $HOME/fupermod/acml_config/routines/mxm/.libs/libmxm_1d.so -o k=640 -U10000 -s10
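Writing a 48-entry rankfile and the matching appfile by hand is tedious. Here is a minimal sketch of two hypothetical helpers, `gen_rankfile` and `gen_appfile_fpm`, that generate them. It assumes the Open MPI rankfile syntax `rank N=host slot=socket:core` shown above. The socket and core counts are parameters you must take from your own node, since the real topology of ig.icl.utk.edu is not given here. `gen_rankfile` binds consecutive ranks to consecutive cores, filling socket 0 first. `gen_appfile_fpm` writes the builder lines with the same paths and options as the example. It emits one GPU process per listed device, on the assumption that g=1 selects device 1 just as g=0 selects device 0. $HOME is expanded when the file is generated, so the appfile contains absolute paths. As far as we know, Open MPI numbers ranks in appfile order, so the GPU processes come first and get the first cores.

```shell
#!/bin/sh
# Hypothetical helpers for generating the rankfile and appfile_fpm.

# gen_rankfile HOST SOCKETS CORES_PER_SOCKET
# Binds rank 0 to slot 0:0, rank 1 to 0:1, ..., filling socket 0 first.
gen_rankfile() {
  host=$1 sockets=$2 cores=$3
  rank=0 s=0
  while [ "$s" -lt "$sockets" ]; do
    c=0
    while [ "$c" -lt "$cores" ]; do
      echo "rank $rank=$host slot=$s:$c"
      rank=$((rank + 1))
      c=$((c + 1))
    done
    s=$((s + 1))
  done
}

# gen_appfile_fpm NCPU GPU_DEVICE...
# One builder process per GPU device (cuBLAS build, suboption g=<device>),
# followed by NCPU builder processes (ACML build).
# Builder options are copied from the appfile_fpm example.
gen_appfile_fpm() {
  ncpu=$1
  shift
  gpu=$HOME/fupermod/cublas_config
  cpu=$HOME/fupermod/acml_config
  for dev in "$@"; do
    echo "# GPU device $dev"
    echo "-host localhost -np 1 $gpu/tools/builder -l $gpu/routines/mxm/.libs/libmxm_1d.so -o k=640,g=$dev -U10000 -s10"
  done
  echo "# CPU"
  echo "-host localhost -np $ncpu $cpu/tools/builder -l $cpu/routines/mxm/.libs/libmxm_1d.so -o k=640 -U10000 -s10"
}
```

For example, `gen_rankfile ig.icl.utk.edu 4 12 > rankfile` together with `gen_appfile_fpm 47 0 > appfile_fpm` would reproduce the layout above (one GPU process plus 47 CPU processes). This holds only if the node really has 4 sockets of 12 cores, which is an assumption.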
- Data partitioning:
- Matrix size D = N x N (here N=100, so D=10000), and the machinefile lists the nodes participating in the computation:
$ fupermod/tools/partitioner -l fupermod/routines/mxm/.libs/libmxm_1d.so -D10000 -o N=100 -m machinefile
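This step divides the D = 10000 units of work among the processes according to their performance models. The sketch below illustrates what such a split looks like, but it is a deliberate simplification and not fupermod's algorithm. It assumes each process has a single constant speed (hypothetical numbers) and gives it a share of D proportional to that speed. The units lost to rounding down are handed out one at a time. Fupermod's partitioner instead uses the functional performance models built in the previous step, where a process's speed depends on its problem size. `partition_const` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration: split D units among processes in proportion to
# constant speeds (a simplified model, NOT fupermod's FPM-based partitioner).
# Usage: partition_const D SPEED...   (prints one share per process)
partition_const() {
  D=$1
  shift
  total=0
  for s in "$@"; do total=$((total + s)); done
  # Proportional shares, rounded down; remember how many units they cover.
  assigned=0
  shares=""
  for s in "$@"; do
    d=$((D * s / total))
    shares="$shares $d"
    assigned=$((assigned + d))
  done
  # Rounding down leaves fewer units than processes; give them out one each.
  left=$((D - assigned))
  for d in $shares; do
    if [ "$left" -gt 0 ]; then
      d=$((d + 1))
      left=$((left - 1))
    fi
    echo "$d"
  done
}
```

For instance, `partition_const 10000 10 1 1` gives a hypothetical process that is ten times faster 8334 units and the other two 833 each, for a total of 10000.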
- Running the matrix multiplication:
$ mpirun -rf rankfile -app appfile_mxm
- Example of an appfile for matrix multiplication (appfile_mxm):
# GPU
# Assuming fupermod is configured under cublas_config, linking against cublas
# -g0 means device 0 is selected for computing
-host localhost -np 1 $HOME/fupermod/cublas_config/routines/mxm/mxm_2d -k640 -g0 -m machinefile
# CPU
# Assuming fupermod is configured under acml_config, linking against acml
-host localhost -np 47 $HOME/fupermod/acml_config/routines/mxm/mxm_2d -k640 -m machinefile
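The same rankfile is passed to mpirun with both appfile_fpm and appfile_mxm, so it needs one entry per launched process (1 + 47 = 48 in the examples). Here is a minimal pre-flight check, using the hypothetical name `check_ranks`. It sums the -np values on the non-comment lines of an appfile and compares the total with the number of `rank` lines in the rankfile. It checks for exact equality, which is a conservative choice. How mpirun itself reacts to a mismatch may depend on the Open MPI version.

```shell
#!/bin/sh
# Hypothetical pre-flight check: does RANKFILE have one "rank" line for
# every process that APPFILE launches?
# Usage: check_ranks RANKFILE APPFILE
check_ranks() {
  nranks=$(grep -c '^rank ' "$1")
  # Sum the value following each -np, skipping lines that start with #.
  nprocs=$(awk '!/^[ \t]*#/ {
      for (i = 1; i < NF; i++) if ($i == "-np") n += $(i + 1)
    } END { print n + 0 }' "$2")
  if [ "$nranks" -eq "$nprocs" ]; then
    echo "ok: $nprocs processes, $nranks ranks"
  else
    echo "mismatch: appfile launches $nprocs processes, rankfile lists $nranks ranks" >&2
    return 1
  fi
}
```

Usage: `check_ranks rankfile appfile_fpm && check_ranks rankfile appfile_mxm`.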
