

Usage Policies for the MACI SGI Origin Machines.

NOTE: As of January 20, 2004, the local MACI and WestGrid environments will be merged.
(Reference: MACIWG.MergMain.2004Jan15.html )

January 23, 2004: This page is in the process of becoming an archive/history page.
Up-to-date information is at:
http://www.ualberta.ca/CNS/RESEARCH/WestGrid/index.html

This page describes the usage rules, scheduling policies and priority calculations for our current SGI Origin computers: Aurora, Borealis and Australis.

(For information regarding the scheduling policy for the University of Calgary's MACI cluster of Compaq Alphas see: www.maci-cluster.ucalgary.ca )

Please contact Research.Support@ualberta.ca with questions or comments related to the SGI Origin machines.




Available Machines

There are currently three SGI Origins in operation at the University of Alberta. Aurora, an Origin 2000, is the only Origin on which we permit limited interactive use: the limitations are described below. The other Origins are Borealis (an Origin 2400) and Australis (an Origin 3800). These machines are reserved for batch processing only.



Disk Usage and Backups

The three SGI Origins share disk space on an SGI TP9400 disk array. The space is mounted as /scratch, and users can create their own subdirectory in it. As with all scratch space, we do not guarantee that files in this space will be backed up.

Until now, CNS has been using /scratch in an ongoing test of the capabilities of the TSM backup facility. As a result, most files in this space have been backed up on a regular basis. Users must be aware, however, that no assurance is given that any file located in /scratch will be backed up. All critical files should therefore be copied or moved to a more secure area to ensure an appropriate archive and/or backup.



Current Scheduling Policy

The Portable Batch System (PBS) is used to manage all non-interactive jobs running on Aurora, Borealis and Australis. PBS keeps track of jobs submitted and runs them in order according to their priority. All users are, therefore, expected to submit their jobs to PBS rather than running them directly, except for limited interactive use permitted on Aurora as described below.

Interactive use:

Interactive jobs:

  1. are only allowed on Aurora, and
  2. must not use more than two CPUs, and
  3. must not use more than 60 minutes of walltime.

Short interactive jobs are intended to allow users to compile and test their programs. The limits above keep most of the machine available for large batch jobs. There is no interactive use on Borealis or Australis. Note that interactive usage is monitored, and users who repeatedly exceed 60 minutes of walltime may have their processes killed.
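
The three interactive rules can be expressed as a single check. The following is a minimal sketch only; the function name and its parameters are hypothetical and are not part of PBS or of any CNS tool.

```python
def interactive_job_allowed(machine, cpus, walltime_minutes):
    """Return True if a job fits the interactive-use rules listed above.

    Hypothetical helper for illustration: Aurora only, at most two CPUs,
    at most 60 minutes of walltime.
    """
    return (
        machine.lower() == "aurora"
        and cpus <= 2
        and walltime_minutes <= 60
    )
```

For example, a 30-minute, two-CPU test run on Aurora passes the check, while the same run on Borealis does not and would have to be submitted through PBS.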


Serial use:

The Origin machines are expected to be used to run jobs which cannot be run elsewhere. MACI has a large Compaq Alpha cluster in Calgary for its members. These machines are connected with Gigabit Ethernet and Myrinet networks. While the SGI Origins can be used for parallel work, MACI is encouraging all of its members to use the Compaq machines for any serial (i.e. single-processor) problems that they have to run. In addition, CNS maintains machines specifically for scalar numerical work (the numerical server), as well as a PC cluster which provides a parallel environment. Please contact Research.Support@ualberta.ca if you wish to move your jobs to these other machines.


Walltime and Checkpointing:

The maximum walltime (physical elapsed time) for jobs on Aurora and Australis is 24 hours, while the maximum walltime for jobs on Borealis is 12 hours. More precisely, at certain times of the day, all running jobs will be stopped. These times are currently 11:45 on Aurora and Australis, and 11:45 and 23:45 on Borealis. These restarts ensure that jobs in the queue will have a chance to start in a reasonable period and keep large parallel jobs from being shut out. Please use checkpointing (saving the current state of your program before a restart) to avoid losing the results of your calculation during a restart and wasting CPU cycles.
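
When planning checkpoints it can help to know how long a job has before the next scheduled restart. The sketch below computes that interval from the restart times quoted above. The function and table names are hypothetical, and the code assumes the clock passed in is the local time of the machines.

```python
from datetime import datetime, time, timedelta

# Daily restart times taken from the policy text above.
RESTART_TIMES = {
    "aurora": [time(11, 45)],
    "australis": [time(11, 45)],
    "borealis": [time(11, 45), time(23, 45)],
}


def time_until_restart(machine, now):
    """Return the timedelta from `now` to the next scheduled restart."""
    candidates = []
    for restart_time in RESTART_TIMES[machine]:
        restart = datetime.combine(now.date(), restart_time)
        if restart <= now:
            # Today's restart has already happened; use tomorrow's.
            restart += timedelta(days=1)
        candidates.append(restart)
    return min(candidates) - now


start = datetime(2001, 11, 7, 12, 0)
print(time_until_restart("borealis", start))  # 11:45:00
```

A job starting at noon on Borealis therefore has 11 hours 45 minutes before it is stopped, so its last checkpoint should be written comfortably before then.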


Parallel Usage:

  • Each machine has exactly one queue, named "aurora", "borealis", and "australis" respectively. When submitting a job through PBS, choose the queue for the machine you would like to use.

  • Jobs are run using the PBS "fair share" priority system, which calculates priority from the number of MACI shares held by the user/group and from that user/group's recent CPU usage. Details of the priority formula are documented separately.

  • Please request enough CPUs to cover your memory usage. We have 256 MB/cpu on Aurora and Borealis and 512 MB/cpu on Australis. For example, on Australis, using 32 GB of memory requires that 64 processors be requested. Defaults for memory and time are set to encourage inclusion of these resource requests.

  • Aurora (46 CPUs at 195 MHz; 12 GB total, or 0.25 GB/cpu)
    • all processors except 2 are reserved for batch processing
    • scalar and parallel jobs up to 44 cpus can be submitted to Aurora
    • restarts daily at 11:45


  • Borealis (64 CPUs at 400 MHz; 16 GB total, or 0.25 GB/cpu)
    • jobs must use a minimum of 8 processors
    • restarts daily at 11:45 and 23:45


  • Australis (64 CPUs at 400 MHz with a fast interconnect; 32 GB total, or 0.5 GB/cpu)
    • jobs must use a minimum of 16 processors
    • because of its fast CPU interconnect and large RAM, Australis is reserved for large scalable parallel jobs and/or jobs requiring a large amount of memory
    • restarts daily at 11:45
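
The rule "request enough CPUs to cover your memory usage" amounts to a small calculation, sketched below. The helper name and table are hypothetical. The per-CPU memory and minimum CPU counts come from the list above; the upper limits for Borealis and Australis are an assumption (the total CPU count of each machine), since the policy only states an explicit maximum of 44 for Aurora.

```python
import math

# Memory per CPU (MB) and per-job CPU limits for each machine.
# max_cpus for borealis/australis is assumed to be the machine's CPU count.
MACHINES = {
    "aurora": {"mb_per_cpu": 256, "min_cpus": 1, "max_cpus": 44},
    "borealis": {"mb_per_cpu": 256, "min_cpus": 8, "max_cpus": 64},
    "australis": {"mb_per_cpu": 512, "min_cpus": 16, "max_cpus": 64},
}


def cpus_to_request(machine, memory_mb, cpus_wanted=1):
    """Return the CPU count to request so that memory use is covered.

    The result is the largest of: the CPUs the program wants, the CPUs
    needed to cover its memory, and the machine's minimum job size.
    """
    spec = MACHINES[machine]
    cpus_for_memory = math.ceil(memory_mb / spec["mb_per_cpu"])
    cpus = max(cpus_wanted, cpus_for_memory, spec["min_cpus"])
    if cpus > spec["max_cpus"]:
        raise ValueError(
            f"{machine} cannot provide {cpus} CPUs (limit {spec['max_cpus']})"
        )
    return cpus


print(cpus_to_request("australis", 32 * 1024))  # 64
```

This reproduces the example in the text: 32 GB on Australis at 512 MB/cpu requires 64 processors. A request that exceeds the machine's limit raises an error rather than returning an unusable number.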

The research support team will be happy to assist users in porting code and ensuring that their programs are making efficient use of the parallel architecture of the SGI Origins. Programs to be run on Australis may be evaluated by Research Support to ensure efficient use of the machine.

Updated: November, 2001.



What's New

Nothing new at this time.

Updated: November 7, 2001.



Previous Scheduling Policies

Previous Scheduling Policies archive file for Aurora/Borealis (contains the Policy reports back to April of '98 when Aurora first arrived).

Updated: November 21, 2000.



University of Alberta
Computing and Network Services
E-mail: Research.Support@ualberta.ca
Web Site: www.ualberta.ca/CNS/RESEARCH