

This page contains legacy information only. Please use our new pages at:

http://cluster.srv.ualberta.ca/doc/

If you have questions or cannot find the information you are looking for, please feel free to contact us at research.support@ualberta.ca.


AICT General Purpose Linux Cluster

AICT operates and maintains a general purpose Linux cluster for research use by faculty, staff, and students at the University of Alberta. It offers a 64-bit environment with excellent performance for serial programs written in C/C++ and Fortran 77/90. MPI and OpenMP parallel programs are also supported. Users share the available resources by running jobs in batch mode.

The cluster is made up of one head node (login node) and 29 execution nodes, all running Linux. They are interconnected by standard Gigabit Ethernet (for NFS) as well as high-speed InfiniBand (for MPI). The head node contains two dual-core AMD Opteron 275 processors (four cores total) at 2.2GHz and has 4GB of RAM. The execution nodes provide a total of 188 cores in the following configuration.

nodes  processor                   cores  memory
   15  AMD Opteron 275  @ 2.2GHz      4†    6 GB
    4  AMD Opteron 275  @ 2.2GHz      4†   10 GB
    4  AMD Opteron 280  @ 2.4GHz      4†   32 GB
    3  AMD Opteron 8350 @ 2.0GHz     16‡  128 GB
    3  AMD Opteron 8378 @ 2.4GHz     16‡  128 GB

† two dual-core
‡ four quad-core

News
May 13, 2009
G95 Fortran compiler installed from the binary distribution.
 
May 13, 2009
Capacity of /scratch file system increased from 5 to 20TB.
 
February 9, 2009
Another three 16-core/128GB nodes added. Six older 10GB nodes removed.
 
February 2, 2009
NetCDF version 4.0, HDF5 version 1.8.1, and PGPLOT 5.2 installed. OpenFOAM upgraded to version 1.5.
 
December 18, 2008
OpenFOAM version 1.4.1 installed.
 
November 12, 2008
GNU Scientific Library (GSL) version 1.11 installed.
 
September 4, 2008
Three nodes added, each with 16 cores (four quad-core AMD Opterons) and 128GB memory.
 
August 1, 2008
FFTW (2.1.5 and 3.1.2) and Gromacs 3.3.3 installed.


Getting Started

Access to the cluster is not automatic. Your professor, supervisor, or departmental APO must send an email to idadmin@ualberta.ca, with a Cc to research.support@ualberta.ca, requesting access on your behalf. For proper identification, the request should include information that identifies you, such as your CCID.

Please allow two days for processing. Once you are authorized, you can use an ssh (secure shell) client and your CCID to connect to the cluster head node cluster.srv.ualberta.ca and begin developing programs and running jobs.
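For example, if your CCID is jdoe (a placeholder used here), you would connect from a terminal with

ssh jdoe@cluster.srv.ualberta.ca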

Microsoft Windows users, please note that a basic Windows installation does not come with an ssh client program. Therefore, we recommend that you download and install PuTTY (terminal program) and WinSCP (file transfer program). Furthermore, if you plan to run an application on the head node that has a graphical user interface, like nedit or GaussView, you will also need to install an X display server program such as Xming-mesa. Mac OS X and Linux users can use the OpenSSH command line tools, and an X display server should already be running by default.

The remainder of this document outlines basic use of the cluster. It assumes you have prior experience working in a Unix environment. If you have any questions or encounter any problems, please feel free to contact us. Our email address is research.support@ualberta.ca.

To receive important notices regarding this cluster and other high performance computing facilities managed by AICT, subscribe to the hpc-researchers mailing list through our web interface.


File Systems

Your home directory is located on an NFS file system that has a total capacity of approximately 1TB. In addition, you are assigned a directory under /scratch. This file system has a total capacity of 20TB. Both home and scratch file systems are common to all the nodes in the cluster. However, because of its larger size, we recommend that you work from your scratch directory instead of your home directory.

Neither home nor scratch file systems are backed up. Therefore, we strongly advise you to store all of your important data off-cluster.


Shell Environment

Selecting from among several available software versions or enabling support for optional software usually requires modifications to your shell environment, such as amending the PATH variable. On the cluster, this is done conveniently by loading modules (see the Environment Modules package, version 3.2.6). A module file encapsulates the necessary changes, taking into account the differences between csh/tcsh and bash shell syntax. When a module is loaded, the changes take effect immediately; there is no need to log out and log back in. Typical module commands include

module avail
See what modules are available for you to load.
 
module list
See what modules you have currently loaded.
 
module help modulename
Read the help text associated with modulename.
 
module display modulename
See how module modulename will modify your environment.
 
module load modulename
module unload modulename
Load (unload) module modulename to actually modify the environment of your current terminal session. Other terminal sessions you may have open are not affected. Moreover, changes to your environment are not permanent; they will not survive after you log out.

A default set of modules is automatically loaded each time you login. They support all the programming tools and many of the libraries currently installed on the cluster. As such, most users will not need to use any module commands. However, if you do want to change versions or run optional software, like Gaussian, you must explicitly load the appropriate module or modules. Details are provided in the relevant sections below.
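For example, to make an optional library available before compiling against it, a typical session might look like the following (the gsl module is described later in this document).

module load gsl
module list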


Programming Tools

The cluster features Portland Group compilers (version 6.1), including pgf77, pgf90/pgf95, pgcc, and pgCC. Reference documentation is available through the man pages, for example, man pgf90, and through the compiler command line, as in pgf77 -help=opt. In addition, the Intel (version 9.1) ifort and icc/icpc and GNU (version 4.2.0) gcc/g++ and gfortran compilers are also available. Although version 10.0.023 of the Intel compilers is installed (module load intel/10.0), it is not yet supported on the cluster. Please refrain from using it in production.

In order to compile and run programs with the Intel C++ compiler icpc, you must execute module load intel/icpc to work around an incompatibility with GCC 4.2.0.

OpenMP support is enabled by using the -mp switch with the Portland compilers, -openmp with the Intel compilers, and -fopenmp with the GNU compilers. If your program is compiled and linked in separate stages, be sure to also include the OpenMP switch as one of the linker flags. See the specific compiler documentation for further details.
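For illustration, a hypothetical OpenMP build of a source file prog.c with each compiler family would look like the following.

pgcc -mp -o prog prog.c
icc -openmp -o prog prog.c
gcc -fopenmp -o prog prog.c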

For optimal performance, AMD recommends using the following two switches with the Portland compilers.

-tp k8-64 -fastsse

The -fastsse switch is an aggregate of generally beneficial optimizations (see pgcc -fastsse -help) that can potentially lead to some vectorized code. However, vectorized code may not improve performance in all cases. Therefore, it is worthwhile experimenting with

-tp k8-64 -Mscalarsse -fast

which omits vectorization but performs the other optimizations. Also, try adding -Mipa=fast to cause certain optimizations to span procedure boundaries (ipa stands for interprocedural analysis).
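As a concrete example (prog.f90 being a hypothetical source file), a complete Portland compile line combining these switches might be

pgf90 -tp k8-64 -fastsse -Mipa=fast -o prog prog.f90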

For the Intel compilers, try using

-axP -xW -ipo -O3

and for GNU,

-O3 -ffast-math -funroll-all-loops -fpeel-loops \
    -ftracer -funswitch-loops -funit-at-a-time

Finally, on the subject of performance, we recommend using optimized libraries wherever possible. For example, the AMD Core Math Library and the Intel Math Kernel Library are available on the cluster, and are described later in this document.

Interactive program development (including testing and debugging) is permitted only on the head node, and only for short periods of time.


Working with MPI

You have three MPI-2 implementations to choose from. The default is MPICH2 (version 1.0.4p1), which is built with the version 6.1 Portland compilers. It supports message passing over the Gigabit ethernet network. Two additional implementations support message passing over the much faster Infiniband network. They are OpenMPI, the successor to LAM/MPI, and MVAPICH2, which is derived from MPICH2 (includes MPE support). Both are compiled with the GNU (version 4.2) compilers. A faster network may be advantageous to programs that have a high ratio of communication to calculation or that overlap communication and calculation.

To use one of the alternate MPI implementations, load the appropriate module first. For OpenMPI, execute module load mpi/openmpi-1.2.5, or for MVAPICH2, module load mpi/mvapich2-1.2p1. Regardless of the implementation, the same command names are used. To compile your program, use mpicc, mpicxx, mpif77, or mpif90. You can apply any options that are supported by the underlying compiler.

If necessary, you can change the underlying compiler by setting an environment variable recognized by the wrapper scripts.

MPICH2 or MVAPICH2: the MPICH_CC, MPICH_CXX, MPICH_F77, and MPICH_F90 variables

OpenMPI: the OMPI_CC, OMPI_CXX, OMPI_F77, and OMPI_F90 variables

For example, to compile a program with the Portland Fortran 90 compiler under OpenMPI, set OMPI_F90 to pgf90 in your shell environment before invoking mpif90. Use this feature cautiously, however, particularly with Fortran, as the replacement compiler and the compiler used to build the MPI library may implement the LOGICAL data type differently.
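A minimal sketch of that OpenMPI case (prog.f90 is a hypothetical source file):

module load mpi/openmpi-1.2.5
export OMPI_F90=pgf90
mpif90 -o prog prog.f90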

To test or debug your program, run it interactively on the head node with mpiexec as follows.

mpiexec -np numprocs ./program

This will spawn numprocs MPI tasks, all on the head node. Limit numprocs to a small number. Also, be considerate of other users by running the program for only short periods of time, much less than one hour.


Batch Jobs

Introduction

Once you have successfully tested and debugged your program on the head node, you can start running production jobs on the cluster. A job is simply a shell script that invokes the program you want to execute. For example, the following script describes a job to run the date command.

#!/bin/bash
date

In principle, you can use any supported scripting language to define a job, including Awk, Perl, and Python. Here is the "date job" as a Perl script.

#!/usr/bin/perl
print scalar(localtime), "\n";

To run a job on the cluster, you must submit the script for batch processing, thus

qsub scriptfile

This will return a unique job id, and your job will take its place in a queue that includes all jobs from all the other users. It will wait there until the necessary resources become available.

Use the qstat command to view the status of jobs in the queue. Also try qstatx. Job status will be either (R)unning or (Q)ueued. To remove a running or queued job that you previously submitted, use the qdel command with the numerical portion of the job id as the argument. Qsub, qstat, and qdel are components of the Portable Batch System (PBS). For more information on each PBS command, see the corresponding man page.

As an example, here is what to expect when you submit a job script called datejob.

$ qsub datejob
338904.opteron-cluster.nic.ualberta.ca
$ qstatx -a
Fri Oct 10, 2008 11:51:02
                    --Requested-------------- --Used (mb)-----------------------
Job Id   S Username Nodes    PVMem  Walltime  Mem    VMem   Walltime  Cputime
-------- - -------- -------- ------ --------- ------ ------ --------- ----------
.
.
.
338893   R jes6     1:ppn=4    10gb 168:00:00   2459   2512  21:20:20   83:48:20
338894   R jes6     1:ppn=4    10gb 168:00:00   3656   3715   2:36:54    9:47:59
338895   Q jes6     1:ppn=4    10gb  48:00:00                                   
338896   Q jes6     1:ppn=4    10gb 168:00:00                                   
338902   Q jes6     1:ppn=1     4gb  48:00:00                                   
338903   Q jes6     1:ppn=1     4gb  48:00:00                                   
338904   Q esumbar  1:ppn=1   512mb  24:00:00 
                    ------------------------------------------------------------
                    103 cpus used
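To remove the job submitted in this example, pass the numerical portion of its job id to qdel.

qdel 338904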


Working Directory

By default, when your job starts running on an execution node, it does so from your home directory. Usually, this is not what you want. Therefore, you should always include a cd command in your script to change to the original working directory from which the qsub command was executed. PBS defines a shell environment variable called PBS_O_WORKDIR in each job for this purpose.

#!/bin/bash
cd $PBS_O_WORKDIR
date


Standard Output

Frequently, programs are designed to print progress information on the screen. However, when run on the cluster, your programs do not have access to a screen. Therefore, PBS captures such output in a pair of files, one for standard output and the other for standard error output.

Because the contents of these files are accumulated on the execution node, they are unavailable for viewing while your job is running. Eventually, when your job terminates, PBS copies them to the original working directory, naming them using the associated job id.

If this is unacceptable, you can explicitly redirect your program's output to your own set of files as follows.

#!/bin/bash
cd $PBS_O_WORKDIR
date > stdout.$PBS_JOBID 2> stderr.$PBS_JOBID

To ensure these files are updated frequently, you must flush the buffers after every printf or write statement within your program. You can then use a command such as tail -f outputfile to monitor their contents.
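In C, for instance, this can be done with an explicit flush after each progress message (a minimal sketch).

#include <stdio.h>

/* print a progress message and force it out to the redirected
   stdout file immediately, rather than waiting for the buffer to fill */
void report_progress(int step)
{
    printf("completed step %d\n", step);
    fflush(stdout);
}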


Resource Specifications

Contention among jobs for the finite resources on a given node can lead to poor performance (or even a node crash). To avoid this, you should accurately specify your job's needs so that PBS can reserve the necessary resources. This can be done when you submit a job with the -l flag.

qsub -l pvmem=2gb,nodes=1:ppn=1,walltime=48:00:00 myjob

This demonstrates how to specify a per-process virtual memory of 2GB and an elapsed time of 48 hours. The construct nodes=1:ppn=1 is read as: one node multiplied by one processor per node equals one CPU required.

PBS treats your pvmem and walltime specifications as hard limits. In other words, jobs that exceed pvmem or walltime are summarily terminated.

Overestimating any of these parameters could potentially waste resources and cause jobs to be delayed, including yours, particularly when the cluster is busy. Therefore, specify the minimum requirements that will allow your program to run successfully.

If you omit a resource in the specification, a default value will be applied from the following list.

pvmem=512mb
nodes=1:ppn=1
walltime=24:00:00

The maximum resources available are

pvmem=126gb
nodes=32:ppn=4
nodes=3:ppn=16
walltime=168:00:00

Note that the maximum pvmem can only be satisfied on three nodes.

A convenient alternative to specifying resources on the command line is to introduce a PBS directive into your job script. A PBS directive is a shell script comment that begins with the string #PBS. It must appear before the first executable statement. For example,

#!/bin/bash
#PBS -l pvmem=2gb,nodes=1:ppn=1,walltime=48:00:00
cd $PBS_O_WORKDIR
./prog

For the sake of clarity, the individual components of the resource specification can appear on separate lines.

#!/bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
./prog

When specifications are given both on the command line and in directives, the command line takes precedence.


Job Attributes

You can also assign values to special job attributes. For example, to give your job a descriptive name, specify the -N command line flag or insert a similar directive into the job script. The following sample illustrates this and other useful attributes.

#!/bin/bash
#PBS -N job1
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
./prog

Besides altering the display name to job1, this job is configured to use the recommended bash-shell login environment when executing the script (-S /bin/bash). In addition, an email notification will be sent when the job (b)egins and (e)nds or is (a)borted (-m bea) to the email address given in the -M directive.


Execution Environment

To reproduce your login environment in the body of the script, including the shell function that is necessary for the Modules package (see above), the script must be invoked with the -l (login) bash option. The option is applied on the first line of the script, as follows.

#!/bin/bash -l
#PBS -N job1
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
./prog

Without -l, the shell that is spawned to run the script would be non-login and non-interactive, and therefore, none of the shell startup files would be executed, which is where the module function is defined. Specifying #PBS -S /bin/bash to select the login shell is not enough because the script runs as a child of this process, and child processes do not inherit functions from their parent.


Parameterized Scripts

In the course of submitting a series of jobs, you may find that the script you use is basically the same from one run to the next. For example, you may be executing the same program with a different input file each time. Instead of creating multiple script files or continually modifying an existing script, you can format your script to accept formal parameters, like a subroutine. The parameters are implemented as shell variables within the script. These are assigned values at the time the job is submitted with the qsub command.

Here is an example script in which the program input is redirected from a file whose name is given by the shell variable INPUTFILE.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=1gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=96:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
test -n "$INPUTFILE" || exit
./prog < $INPUTFILE

Note that INPUTFILE is not set within the script. Use the -v option to assign it a value when submitting the job thus

qsub -v INPUTFILE=/scratch/esumbar/input1 scriptfile

(Technically, this adds INPUTFILE to the shell environment of the script.) As a safeguard, a test command is inserted to verify that the variable has been assigned a value. If the assignment is mistakenly omitted, the variable will be empty, and the script will exit immediately without wasting computational resources.

Multiple variables can be assigned as well.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=1gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=96:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
test -n "$IN" || exit
test -n "$OUT" || exit
./prog < $IN > $OUT
The corresponding submission command is

qsub -v IN=input1,OUT=output1 scriptfile

Finally, you can synchronize the name of the job with your variable assignments by specifying the job name on the qsub command line (overriding the PBS directive in the script).

qsub -N job1 -v IN=input1,OUT=output1 scriptfile


Using Local Disk

Ideally, the time your program spends performing calculations (cputime) should be only slightly less than its elapsed time (walltime). Using qstatx, you may observe a difference on the order of a few seconds for every hour of elapsed time. However, if your program performs frequent input/output operations, in other words, if it is I/O intensive, there may be a substantial difference. This is understandable as file access to and from a network file server is slow. Moreover, because the file server is a shared resource, I/O intensive jobs tend to adversely impact the I/O performance of every other job (and interactive users as well). The effect gets progressively worse with each additional I/O intensive job running on the cluster. I/O intensive jobs are not unwelcome on the cluster, but their impact can be mitigated by configuring them to use local disk space instead of the file server.

Each running job has access to a private temporary subdirectory on a disk that is local to the execution node. This subdirectory is available through the environment variable TMPDIR. The following script demonstrates one way to utilize TMPDIR.

#!/bin/bash -l
#PBS -N job1
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
cp ./inputfile $TMPDIR/inputfile
./prog < $TMPDIR/inputfile > $TMPDIR/outputfile
mv $TMPDIR/outputfile .

Normally, the original working directory is on the file server (/scratch or /home). So this script first copies the input file to TMPDIR. While the I/O intensive program is executing, output is redirected to TMPDIR. And finally, when the program terminates, the output is moved to the original working directory.

Be aware that the file system dedicated to this purpose is only 64GB and must be shared by all the jobs running on the node at the time. Furthermore, TMPDIR and its contents are summarily deleted when the job ends. Consequently, a job that is deleted by you, or by PBS for exceeding walltime, will suffer data loss, because the file transfer commands at the end of the script will not be executed when the job is aborted.

Alternatively, you can have PBS automatically stage files out of TMPDIR at the end of a job, even if it is aborted. Data staging is requested in a PBS script with a special directive. For example, an equivalent version of the preceding script that employs data staging is given by

#!/bin/bash -l
#PBS -N job1
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -W stageout=$TMPDIR/output@localhost:$PBS_O_WORKDIR
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
cp ./inputfile $TMPDIR/inputfile
./prog < $TMPDIR/inputfile > $TMPDIR/outputfile

In the stageout directive, the item to the left of the "@" is regarded as the source of a move operation, and the item on the right, the destination. Besides PBS_O_WORKDIR, HOME can also be specified.

It is possible to stage out multiple files. However, the stageout directive becomes unwieldy in this case. Moreover, wildcards are not supported. Therefore, it may be more convenient to stage out TMPDIR in its entirety with

#PBS -W stageout=$TMPDIR@localhost:$PBS_O_WORKDIR

This will create a subdirectory under PBS_O_WORKDIR named after the job id. Because the name is unique, the potential for overwriting existing files is avoided.

Both Gaussian and GAMESS are automatically configured to use TMPDIR for temporary data.


Reading Environment Variables

Getting the value of an environment variable within a PBS script is as simple as using the shell's parameter expansion character $, as in $TMPDIR. Because the shell environment is propagated to the running program, the program can retrieve the value of an environment variable as well by using the standard C library function getenv(3).

A trivial example of its use in C is

#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
  printf("%s\n", getenv("PATH"));
  return 0;
}

and in Fortran

program gettmpdir
  character(len=255) :: tmpdir
  call getenv("TMPDIR", tmpdir)
  write(*,*) trim(tmpdir)
end program gettmpdir

This works for gfortran, pgf90, and ifort. The currently installed versions of gfortran and ifort also support the Fortran 2003 intrinsic procedure get_environment_variable.


Sample Scripts

Serial

Here is a minimal script for submitting a serial job to the cluster. Except for walltime, default resource limits are assumed.

#!/bin/bash -l
#PBS -S /bin/bash
#PBS -l walltime=72:00:00
cd $PBS_O_WORKDIR
./prog

The following example includes many of the features described above.

#!/bin/bash -l
#PBS -N job1
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=72:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
./prog > stdout.$PBS_JOBID 2> stderr.$PBS_JOBID


MPI

MPI jobs run on only those nodes and processors that PBS assigns. In the following example, the nodes specification is for eight CPUs. These will be allocated from any nodes with free CPUs, including duplicates. In addition, each MPI task (process) will have 750MB of virtual memory available.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=750mb
#PBS -l nodes=8
#PBS -l walltime=168:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
mpiexec ./a.out > output

To request one CPU on each of eight different nodes, specify nodes=8:ppn=1, or two CPUs on each of four different nodes, nodes=4:ppn=2. Note that it is not necessary to pass the number of tasks on the mpiexec command line.

This example assumes you are using the default MPI implementation, MPICH2. If your program is compiled against one of the alternate MPI implementations, you must load the appropriate module in the script. For OpenMPI, add module load mpi/openmpi-1.2.5, or for MVAPICH2, module load mpi/mvapich2-1.2p1, just before the mpiexec command.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=750mb
#PBS -l nodes=8
#PBS -l walltime=168:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load mpi/openmpi-1.2.5
mpiexec ./a.out > output

MPI job scripts are submitted with qsub in the usual way. Be aware that MPI parallel jobs may wait longer in the queue than serial jobs because multiple CPUs must become available simultaneously. Use of the ppn option will tend to increase the wait time even more.


OpenMP

OpenMP programs comprise multiple threads of execution running in the context of a single memory address space. This characterises Pthreads and multi-threaded Java programs as well. Therefore, an OpenMP job can only run on a single node, which is specified with nodes=1 in your scripts. Furthermore, you must reserve an adequate number of processors (nprocs) on the node to execute the desired number of threads (2 through 16). This is done either by appending a ppn request to the nodes specification or by reserving a whole node with a special PBS directive. The choice depends on nprocs and pvmem. For OpenMP, pvmem is estimated as the data in shared memory plus the sum, over all threads, of private data. Use the following tables to select the recommended directives. Note the use of node attributes omp and ompx on certain nodes specifications.

nprocs == 2 (any pvmem)
    -l nodes=1:ppn=2

nprocs == 4, pvmem < 32gb
    -l nodes=1:omp
    -W x=NACCESSPOLICY:SINGLEJOB
  or
    -l nodes=1:ppn=4

nprocs == 4, pvmem > 32gb
    -l nodes=1
    -W x=NACCESSPOLICY:SINGLEJOB

nprocs > 4, pvmem < 32gb, nprocs*pvmem < 128gb
    -l nodes=1:ppn=nprocs

nprocs > 4, pvmem < 32gb, nprocs*pvmem > 128gb
    -l nodes=1:ompx
    -W x=NACCESSPOLICY:SINGLEJOB

nprocs > 4, pvmem > 32gb
    -l nodes=1
    -W x=NACCESSPOLICY:SINGLEJOB
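As a hypothetical worked example: a program that keeps 6 GB of data in shared memory and runs 4 threads, each holding roughly 0.5 GB of private data, needs about pvmem = 6 + 4 × 0.5 = 8 GB. With nprocs == 4 and pvmem < 32gb, either -l nodes=1:omp together with -W x=NACCESSPOLICY:SINGLEJOB, or simply -l nodes=1:ppn=4, is appropriate.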

Here is an example script in which the desired nprocs is 4 and the estimated pvmem is much less than 32gb. This is expected to be the most common case.

#!/bin/bash -l
#PBS -N jobz
#PBS -S /bin/bash
#PBS -l pvmem=7800mb
#PBS -l nodes=1:omp
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -l walltime=36:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=4
./a.out > output

As with MPI jobs, OpenMP jobs may wait longer in the queue, in this case, because the processors on a single node must become available simultaneously.


Libraries

AMD Core Math Library

Version 3.0.1 of the AMD Core Math Library (ACML) comes with the Portland compilers and is available in serial and thread parallel (mp) variants. It supplies BLAS, LAPACK, FFT, and random number generator routines that have been optimized for the 64-bit Opteron architecture. See the documentation for details.

Because this version of the library is part of the Portland compilers distribution, it is very simple to use. Contrary to Section 2.2.3 of the ACML documentation, "Accessing the Library under Linux using PGI compilers pgf77/pgf90/pgcc", all you need to do is reference the library. For example, to link the serial library

pgf77 source.f -lacml
pgcc source.c -lacml -lpgftnrtl -lm

To link the parallel library

pgf77 -mp source.f -lacml_mp
pgcc -mp source.c -lacml_mp -lpgftnrtl -lm

By default, this procedure links static library code into your program. To link the library as a dynamically shared object, the easiest approach is to add the -fpic flag to the compiler command.

ACML version 4.0.0 is also installed as a separate product. It offers 64-bit integer (int64) versions of the standard serial and parallel libraries (for code compiled with the -i8 flag, for example) as well as the fast math and fast vector (mv) library. The previous version of ACML, 3.5.0, is still available and can be used by first loading the module, module load acml/3.5.

To link a program statically, use one of the following templates.

pgf77 [-mp] [-i8] source.f
    -L$ACML_PATH/variant/lib \
    -Bstatic -lacml -Bdynamic

pgcc [-mp] source.c -I$ACML_PATH/variant/include \
    -L$ACML_PATH/variant/lib \
    -Bstatic -lacml -Bdynamic -lpgftnrtl -lm

where ACML_PATH is defined as part of your shell environment, and variant selects one of the installed library variants (the serial, parallel (mp), and int64 builds described above).

The -mp compiler flag is used only with the mp versions of the library. Templates for using the mv library are similar, just replace -lacml with -lacml_mv. Of course, the two libraries can be used together by including both references.

Linking dynamically to ACML version 4.0.0 is somewhat more complicated.

pgf77 [-mp] [-i8] source.f \
    -L$ACML_PATH/variant/lib -L$PGI/linux86-64/6.1/libso \
    -lacml -lacml_mv \
    -Wl,-rpath,$ACML_PATH/variant/lib -Wl,-rpath,$PGI_LIB_PATH

pgcc [-mp] source.c -I$ACML_PATH/variant/include \
    -L$ACML_PATH/variant/lib -L$PGI/linux86-64/6.1/libso \
    -lacml -lacml_mv -lpgftnrtl -lm \
    -Wl,-rpath,$ACML_PATH/variant/lib -Wl,-rpath,$PGI_LIB_PATH

Use of the -Wl,-rpath flags has the effect of embedding additional search paths for dynamic libraries into the executable file. This precludes having to set LD_LIBRARY_PATH whenever you run the program.


Intel Math Kernel Library

Version 8.1 of the Intel Math Kernel Library (MKL) is installed. It provides BLAS, LAPACK, FFT, direct and iterative sparse matrix solvers, vectorized math and statistical functions (including random number generators), and an interval arithmetic package. In addition, wrapper functions for FFTW compatibility are available, and can be compiled upon request. The BLAS, LAPACK, direct sparse solver, and FFT functions are threaded. Refer to the documentation for details.

To link the static libraries, use the following template for either Fortran or C code.

ifort source.f -L$MKLPATH [-lmkl_solver] \
    -lmkl_lapack -lmkl_em64t \
    -lguide -lpthread -Wl,-rpath,$INTEL_LIB_PATH

You should include -lmkl_solver only when using the sparse solvers. To link the dynamic libraries, use this template.

ifort source.f -L$MKLPATH [-lmkl_solver] \
    -lmkl_lapack32 | -lmkl_lapack64 \
    -lmkl [-lvml] -lguide -lpthread \
    -Wl,-rpath,$INTEL_LIB_PATH -Wl,-rpath,$MKLPATH

Choose only one of the LAPACK library variants, -lmkl_lapack32 or -lmkl_lapack64. They represent single and double precision versions, respectively. Furthermore, the vector/statistical math functions are located in a separate dynamic library -lvml. Include it only when necessary.

Note that the sparse solver library is always linked statically, while -lguide and -lpthread (parallel thread support) are always linked dynamically. Regarding the use of the -Wl,-rpath flag, see the last paragraph under the ACML topic for an explanation.

Although version 9.1 of the Intel MKL is installed (module load mkl/9.1 or mkl/9.1_ilp64 or mkl/9.1_serial), it is not yet supported on the cluster. Please refrain from using it in production.


FFTW

FFTW versions 3.1.2 and 2.1.5 are installed (built using GCC 4.2.0). Both single (float) and double precision libraries are available in each version. Thread-parallel transforms are not included, nor are the MPI transforms (available only in 2.1.5).

The FFTW 3.1.2 installation directory is given by the environment variable FFTW3, and the FFTW 2.1.5 directory by FFTW2. In each case, the header files are under $FFTW3/include or $FFTW2/include, and the libraries under $FFTW3/lib or $FFTW2/lib.
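No link template is given above; the following is a minimal sketch, assuming the standard FFTW library names (libfftw3 for double precision, libfftw3f for single precision) and a hypothetical source.c.

gcc -o app -I$FFTW3/include -L$FFTW3/lib source.c -lfftw3 -lm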


GNU Scientific Library

The path to the GNU Scientific Library (GSL) version 1.11 is given by the environment variable GSLROOT, which is set by executing module load gsl. The header and library search paths should be given as -I$GSLROOT/include and -L$GSLROOT/lib, respectively. For example,

gcc -o app -I$GSLROOT/include -L$GSLROOT/lib \
    source.c -lgsl -lgslcblas -lm

module load gsl also sets LD_LIBRARY_PATH. Therefore, you must load the module in your batch scripts and when running the program interactively.


HDF5

HDF5 version 1.8.1 has been installed with Fortran support. Prepare to use the library by executing module load hdf5. This gives you access to the compiler scripts h5cc and h5fc. If you must work with the library files directly, the header and library search paths can be specified as -I$HDF5ROOT/include and -L$HDF5ROOT/lib, respectively (you will also need -L$ZLIBROOT/lib -lz). To run any program that is linked against the HDF5 library, whether interactively or in a batch script, you must execute module load hdf5 first in order to update LD_LIBRARY_PATH.
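For example, a minimal sketch of building programs with the wrapper scripts (source.c and source.f90 are hypothetical files):

module load hdf5
h5cc -o app source.c
h5fc -o app_f source.f90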


NetCDF

The root of the NetCDF version 4.0 installation is given by the environment variable NETCDFROOT, which is set by executing module load netcdf. This automatically loads the hdf5 module, in case you want to use the NetCDF-4/HDF5 features.

The header and library search paths should be given as follows.

gcc -o app -I$NETCDFROOT/include -L$NETCDFROOT/lib source.c -lnetcdf

If using HDF5 features,

gcc -o app -I$NETCDFROOT/include -L$NETCDFROOT/lib \
    -L$HDF5ROOT/lib -L$ZLIBROOT/lib \
    source.c -lnetcdf -lhdf5_hl -lhdf5 -lz

In either case, you must execute module load netcdf in your batch scripts and when running interactively to set LD_LIBRARY_PATH appropriately.


PGPLOT

PGPLOT version 5.2 (with PostScript driver support only) is installed. Execute module load pgplot to configure your shell environment. This will set the PGPLOT_DIR variable appropriately. To make linking more convenient, PGPLOTROOT is set to point to the installation directory. For example, to link against the static library, use the following template.

gcc -o app source.c $PGPLOTROOT/lib/libpgplot.a

To link against the shared library, try

gcc -o app -L$PGPLOTROOT/lib source.c -lpgplot

Remember to execute module load pgplot in your batch scripts and when running interactively in order to set LD_LIBRARY_PATH (when necessary) and PGPLOT_DIR.


Software Packages

Gaussian 03

Due to licensing restrictions, access to Gaussian 03 software is controlled. If you want to use Gaussian, please send your request by email to research.support@ualberta.ca. Include your CCID and contact information for your professor or departmental administrator. We will add you to the list of authorized Gaussian users and notify you by return email.

The version of Gaussian currently installed on the cluster is E01. To run it interactively on the head node (for a short period of time), first load the appropriate module with module load gaussian. Then, run the program with g03 < input > output, where input and output are your input and output files, respectively.

For batch processing, a typical serial Gaussian job script follows. Note the module command.

#!/bin/bash -l
#PBS -N gaussjob
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=36:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gaussian
g03 < input > output

If necessary, an older version of Gaussian, C02, is still available for interactive use and in job scripts by loading its module with module load gaussian/gaussianC02.

Best performance is achieved when the Gaussian scratch directory is located on a file system that is local to the node on which the job is running. PBS makes such a directory available to each job through the environment variable TMPDIR. Accordingly, the gaussian module assigns the value of TMPDIR to GAUSS_SCRDIR in the script environment.

Be aware, however, that the contents of each job's TMPDIR directory are automatically deleted when the job ends or is terminated. Consequently, to prevent the loss of checkpoint files, which are written to the Gaussian scratch directory by default, you will need to include a %Chk command in your input file. Giving it a simple file name, without a path specification, will save it in PBS_O_WORKDIR (the directory from which the job was submitted). Unfortunately, performance may suffer as a result. To avoid this, locate the checkpoint file in TMPDIR instead and have it staged out at the end of the job. Accordingly, your input file should contain a line similar to

%Chk=$TMPDIR/mycheckpointfile.chk

and the job script should be modified to include one additional PBS directive and two commands, thus

#!/bin/bash -l
#PBS -N gaussjob
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=36:00:00
#PBS -W stageout=$TMPDIR/mycheckpointfile.chk@localhost:$PBS_O_WORKDIR
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gaussian
cp mycheckpointfile.chk $TMPDIR
sed -i -e 's|$TMPDIR|'$TMPDIR'|' input
g03 < input > output

Before Gaussian is launched, the cp command will stage in an existing checkpoint file to TMPDIR. Then, the sed command will substitute the actual value of TMPDIR into the input file. At the end of the job, the checkpoint file will be staged out of TMPDIR to PBS_O_WORKDIR, as specified in the stageout directive.

Gaussian can use up to 16 processors for shared memory parallel execution on the cluster. Configure %NProcShared in your input file accordingly. (Distributed memory parallel execution, using Linda directives, is not supported.) An example parallel job script follows.

#!/bin/bash -l
#PBS -N pargaussjob
#PBS -S /bin/bash
#PBS -l pvmem=2gb
#PBS -l nodes=1:omp
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -l walltime=36:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gaussian
g03 < input > output

This script is appropriate when using four processors. See the OpenMP sample script section above for other possibilities.


GaussView

GaussView can be used interactively on the head node to prepare input files for Gaussian jobs and to analyze the output files from Gaussian jobs. However, it should not be used to run Gaussian directly.

Because GaussView on the cluster has a graphical user interface, you must be running an X display server on your desktop computer (see Getting Started above). Furthermore, when connecting to the cluster via ssh, you must enable X11 forwarding or tunneling. With OpenSSH, add the -X or -Y switch to the command line (PuTTY and SSH Secure Shell have configuration options).

ssh -Y esumbar@cluster.srv.ualberta.ca

Once connected, load the appropriate module with module load gaussview, then run the program by executing gv. Note that this is the 32-bit version of GaussView. It is unrelated to the 64-bit version of Gaussian.


GAMESS

GAMESS was built from the source code distribution labeled "Mar. 24 2007 R3 for 64 bit Opteron/EMT64 under Linux with gnu compilers," using GNU gfortran and the Intel Math Kernel Library (version 8.1). You can run GAMESS in either serial mode or in parallel (sockets). Documentation is provided in six files.

  1. INTRO.txt
  2. INPUT.txt (input file reference)
  3. TESTS.txt (the installed GAMESS passes all tests according to the checktst script)
  4. REFS.txt
  5. PROG.txt (contains information for estimating memory requirements)
  6. IRON.txt

Use the following template when submitting serial jobs. Make sure to adjust the memory specification to match the needs of your calculation.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=1gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=36:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gamess
rungms inputfile &> jobx.log

The input file should have an inp file extension, as in inputfile.inp. Standard output and standard error output will accumulate in jobx.log. Meanwhile, dat output files will be saved in a directory called scr. The scr directory will be created in PBS_O_WORKDIR (the directory from which the job was submitted) if it does not already exist.

You may want to run GAMESS in parallel to achieve faster execution times or, more likely, to handle problems that require large amounts of memory, more than is available on a single node. GAMESS deals with the memory issue by distributing the load. Not all types of calculations can be run in parallel, however; TESTS.txt identifies some of these. The following example illustrates how to configure a parallel job. It requests a total of 40GB: 5GB on each of 8 separate nodes (ppn=1). The input file must include the corresponding memory specifications.

#!/bin/bash -l
#PBS -N parjobx
#PBS -S /bin/bash
#PBS -l pvmem=5gb
#PBS -l nodes=8:ppn=1
#PBS -l walltime=36:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gamess
ncpus=$(wc -l $PBS_NODEFILE | awk '{print $1}')
rungms inputfile 00 $ncpus &> parjobx.log

GAMESS accurately reports memory usage and CPU times in the output file for either serial or parallel jobs. For serial jobs, these will match closely with those reported by PBS. However, in the case of parallel jobs that span multiple nodes, PBS will report values that are too low.


Gromacs

Version 3.3.3 of Gromacs is installed. Execute module load gromacs to get access to all of the single and double precision command line tools. The names of the double precision tools are the same as the single precision tools, but with a _d appended. In addition, MPI parallel versions of the mdrun program called mdrun_mpi and mdrun_d_mpi are also available (incidentally, they are linked against OpenMPI 1.2.5).

A typical PBS script for running Gromacs jobs follows.

#!/bin/bash -l
#PBS -N jobx
#PBS -S /bin/bash
#PBS -l pvmem=1gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=72:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gromacs
mdrun -s input.tpr

For parallel runs, use the following template.

#!/bin/bash -l
#PBS -N parjobx
#PBS -S /bin/bash
#PBS -l pvmem=750mb
#PBS -l nodes=4
#PBS -l walltime=168:00:00
#PBS -m bea
#PBS -M esumbar@ualberta.ca
cd $PBS_O_WORKDIR
module load gromacs
ncpus=$(wc -l $PBS_NODEFILE | awk '{print $1}')
mpiexec mdrun_mpi -s in.tpr -np $ncpus

Loading the Gromacs module automatically loads the appropriate MPI module. There is no need to load an MPI module separately. The example illustrates how to distribute the work over four processors on the cluster, not necessarily all on the same node (#PBS -l nodes=4).


OpenFOAM

To set up your environment to run OpenFOAM applications (version 1.5), execute module load openfoam followed by source $OPENFOAM_SETUP, as instructed. To run a parallel OpenFOAM job, use the following script as a template.

#!/bin/bash -l
#PBS -N job
#PBS -S /bin/bash
#PBS -l pvmem=6gb
#PBS -l nodes=8
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR
module load openfoam
source $OPENFOAM_SETUP
mpiexec -d icoFoam -case jobdir -parallel >jobdir/log 2>jobdir/log.err </dev/null

If necessary, OpenFOAM version 1.4.1 is still available by executing module load openfoam/openfoam-1.4.1. Note that the command line for version 1.4.1 is different (replace icoFoam -case jobdir with icoFoam . jobdir in the sample script).





© 2009 University of Alberta