Local Usage Policies
-
The Origin machines are expected to be used to run large parallel jobs
which cannot be run elsewhere.
-
For serial jobs and for jobs requiring distributed memory, WestGrid
provides its members with a large Compaq Alpha cluster in Calgary and a
large Linux cluster at the University of British Columbia. Check the
WestGrid web site for more information.
-
The SGI Origins are intended for parallel work, and WestGrid encourages
all of its members to run any serial (single-processor) problems on the
Compaq machines instead.
-
In addition, AICT maintains a numerical server specifically for scalar
numerical work, as well as a Linux cluster that provides a parallel
environment. Please contact Research.Support@ualberta.ca if you wish to
move your jobs to these machines.
-
Nexus, an 8-processor SGI Origin 350:
-
To be used for interactive and batch processing.
-
Interactive jobs:
-
must not use more than two CPUs, and
-
must not use more than 60 minutes of walltime.
-
Interactive use is limited to short jobs that let users compile and test
their programs, so that most of the machine is reserved for large batch
jobs. Interactive usage is monitored, and clients who repeatedly exceed
60 minutes of walltime may have their processes killed.
-
Arcturus, Aurora,
Borealis, Australis, Corona and Helios:
-
There is no interactive use on these machines.
-
All processors are reserved for batch processing.
-
Only parallel jobs using 64 or more CPUs can be submitted to Arcturus.
-
Jobs that require 64 GB of memory or more can also be submitted to
Arcturus.
-
Programs to be run on
Arcturus may be evaluated by Research Support to ensure efficient use
of the machine.
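The two Arcturus submission criteria above can be expressed as a simple check. This is a minimal Python sketch; the function and constant names are hypothetical, and passing the check does not replace the evaluation by Research Support.

```python
# Thresholds taken from the Arcturus rules above.
ARCTURUS_MIN_CPUS = 64      # parallel jobs using 64 or more CPUs
ARCTURUS_MIN_MEM_GB = 64    # or jobs requiring 64 GB of memory or more


def eligible_for_arcturus(ncpus: int, mem_gb: float) -> bool:
    """Return True if a job meets either Arcturus submission criterion."""
    return ncpus >= ARCTURUS_MIN_CPUS or mem_gb >= ARCTURUS_MIN_MEM_GB
```

For example, a 128-CPU job or an 8-CPU job needing 80 GB would qualify, while a 16-CPU job needing 8 GB would not.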
-
The Portable
Batch System (OpenPBS) and
the Maui Scheduler are used to manage all non-interactive jobs running
on Nexus and all jobs running on the other UofA WestGrid machines.
-
PBS keeps track of submitted jobs and runs them in order of their
priority.
-
All clients are expected to submit their jobs to PBS/Maui rather than
running them directly, except for limited interactive use permitted on
Nexus.
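As an illustration of submitting work through PBS rather than running it directly, the Python sketch below builds the text of a PBS job script with explicit resource requests. It assumes the standard OpenPBS resource names `ncpus`, `mem` and `walltime` and the `$PBS_O_WORKDIR` variable; the helper name, job name, and the command `./my_program` are hypothetical, and the resources actually accepted on these machines should be confirmed with Research Support.

```python
def make_pbs_script(job_name: str, ncpus: int, mem: str, walltime: str,
                    command: str) -> str:
    """Build the text of a PBS job script (hypothetical helper)."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",           # job name shown by qstat
        f"#PBS -l ncpus={ncpus}",        # number of processors requested
        f"#PBS -l mem={mem}",            # total memory, e.g. "64gb"
        f"#PBS -l walltime={walltime}",  # HH:MM:SS; at most 24:00:00 on the batch-only Origins
        "cd $PBS_O_WORKDIR",             # start in the directory qsub was run from
        command,                         # the program to run
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pbs_script("mysim", 64, "64gb", "24:00:00", "./my_program"))
```

Save the printed text to a file (for example `mysim.pbs`) and submit it with `qsub mysim.pbs`; `qstat` then shows the job while it waits in the queue.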
-
Jobs are run under a "fair share" priority system, which calculates a
job's priority from the recent CPU usage of the submitting user or
group: the more CPU time a user or group has used recently, the lower
the priority of its next jobs.
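The exact priority formula used on these machines is not given here. Purely as an illustration, the sketch below assumes a Maui-style fair-share calculation, in which CPU usage is recorded in fixed time windows (Maui's FSINTERVAL and FSDEPTH parameters) and older windows are down-weighted by a decay factor (FSDECAY). The decay value, target share, weight, and function names are hypothetical, not the values configured on the U of A machines.

```python
from typing import Sequence


def decayed_usage(window_usage: Sequence[float], decay: float) -> float:
    """Sum CPU usage over fair-share windows, weighting each by decay**age.

    window_usage[0] is the most recent window; older windows count less.
    """
    return sum(u * decay ** age for age, u in enumerate(window_usage))


def fairshare_priority(user_windows: Sequence[float],
                       total_windows: Sequence[float],
                       target_share: float,
                       decay: float = 0.5,
                       weight: float = 100.0) -> float:
    """Hypothetical fair-share priority component.

    Positive when the user's decayed share of recent CPU usage is below
    target_share and negative when it is above, so heavy recent users
    drop in priority.
    """
    user = decayed_usage(user_windows, decay)
    total = decayed_usage(total_windows, decay)
    share = user / total if total > 0 else 0.0
    return weight * (target_share - share)
```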
-
For jobs that require a large amount of memory (that is, more than a
few hundred MB), please request enough CPUs to cover your memory usage.
Arcturus has 1 GB of memory per CPU. The default memory and time limits
are set to encourage users to include these resource requests; however,
users with very high memory usage and a low CPU count may need to
request more than 64 CPUs.
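The memory rule above amounts to a small calculation: at 1 GB per CPU on Arcturus, a job should request at least as many CPUs as it needs gigabytes of memory. A minimal Python sketch (the function name is hypothetical):

```python
import math

MEM_PER_CPU_GB = 1.0  # Arcturus: 1 GB of memory per CPU


def cpus_for_memory(mem_gb: float, min_cpus: int = 1) -> int:
    """Smallest CPU request that covers mem_gb and is at least min_cpus."""
    return max(min_cpus, math.ceil(mem_gb / MEM_PER_CPU_GB))
```

For example, a program that runs on 32 CPUs but needs 100 GB should request `cpus_for_memory(100, 32)`, that is 100 CPUs, which is the high-memory, low-CPU case that requires more than 64 CPUs.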
-
The maximum walltime (physical elapsed time) for jobs on Arcturus,
Aurora, Borealis, Australis, Corona and Helios is 24 hours.
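A walltime request can be checked against the 24-hour limit before submission. This sketch assumes walltime is written in the PBS `HH:MM:SS` form; the function names are hypothetical.

```python
MAX_WALLTIME_S = 24 * 3600  # 24-hour limit on the batch-only Origins


def walltime_seconds(walltime: str) -> int:
    """Convert an HH:MM:SS walltime string to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def within_limit(walltime: str) -> bool:
    """True if the requested walltime does not exceed 24 hours."""
    return walltime_seconds(walltime) <= MAX_WALLTIME_S
```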
-
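Please use checkpointing (periodically saving the current state of your
program so that it can resume from that point) to avoid losing the
results of your calculations if a job is stopped, for example when it
reaches the walltime limit or the machine is restarted.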
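A minimal checkpointing pattern, sketched in Python: the program periodically writes its state to a file and, when restarted, resumes from the last saved step instead of starting over. The file name `state.json`, the JSON format, and the toy computation are assumptions for illustration; real codes would save their own data structures. Writing to a temporary file and then renaming it with `os.replace` means an interruption during a save leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Write state to path atomically via a temporary file and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    os.replace(tmp_path, path)  # the old checkpoint survives until this point


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh starting state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def run(n_steps: int, path: str = "state.json", every: int = 100) -> dict:
    """Toy computation that resumes from, and periodically saves, a checkpoint."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step * 0.5  # stand-in for the real calculation
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # save the final state as well
    return state
```

If a job running `run(n_steps)` is killed at the walltime limit, resubmitting the same job continues from the last saved step.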
-
The SGI Origins share disk space mounted on an SGI TP9400 and an SGI
TP9500 disk array.
-
Clients should be aware that, as with any file system and backup
facility, problems can still occur. Users should ensure that their
critical files are also archived in a separate location.
If you have concerns about this policy, please contact us at: Research.Support@ualberta.ca.
The research support team is available to assist users in porting code
and ensuring that their programs are making efficient use of the
parallel architecture of the SGI Origins.
Revised: Jan. 14, 2005