
Batch processing is easily implemented on Unix with publicly
available software that helps you spread the workload among
several systems
By Steve Hanson
Batch processing is usually the farthest thing from a Unix
user's mind. People are usually drawn to Unix so they can avoid
the old batch environments where users would queue up large
processing jobs and come back two days later to see what had
happened. Particularly in the workstation world, systems are
often single-user machines on which batching may not make a lot
of sense at first glance. However, batch processing is becoming
more important in the Unix world than it has been in the
past.
In the current frenzy to downsize from mainframes to open
systems, users often want to regain some of the mainframe
features they left behind, including batch processing. Also, the
advent of inexpensive high-powered Unix machines makes them good
candidates for doing batch processing, particularly if the
batching can be distributed across a number of nodes in a
network. We'll look at some examples of batch processing on Unix
systems, concentrating on examples for a freely available Unix
batch system.
People with a stock Unix system often start up a batch job by
using the at or
batch commands. These commands allow a user to
run a batch job either immediately or in the future. The job
will run, and the results will be mailed back to the user. This
method is less than satisfactory for several reasons. For one
thing, it is often useful to queue up a whole bunch of jobs and
have them run consecutively automatically. This way no
individual job will overload the machine. Usually six jobs
running sequentially will finish faster than six jobs running at
the same time.
In modern environments where there may be many workstations in
a workgroup, it would be useful to queue the job up to the
workstation that is least busy--after all, if your office mate
has gone out for a long lunch, it would be a shame to have that
new zippy workstation just sit idle. Modern batching software
will automatically dole out jobs to the machines that are least
busy.
A true batching system also gives finer control over the batch
jobs than do the traditional Unix tools. Using a batch control
system, users can assign priorities to jobs, suspend them, kill
them, or control them in other ways. In some cases, the software
may even make a group of systems appear to be one powerful
virtual machine by taking a job and splitting it up among the
systems in the cluster.
Thus, the goals of a Unix batching system include some or all
of the following:
- Forcing jobs in a queue to run sequentially.
- Distributing jobs between systems and balancing system load,
as well as making fuller use of resources that may be idle.
- Giving users finer control over how their jobs run.
- Making jobs parallel across a cluster of machines.
A Sample Batching System
In this tutorial I will use the Distributed Queuing System
(DQS) as an example of a batching system. DQS is publicly
available and was developed at the Supercomputer Computations
Research Institute at Florida State University by Tom Green and
Jeff Snyder. It is a fairly simple batching system, but it is
good to use as an example. And it compiles easily on many
platforms, including IBM's (Armonk, N.Y.) AIX, Silicon Graphics
Inc.'s (Mountain View, Calif.) Irix, Sun's (Mountain View,
Calif.) SunOS, Convex Computer Corp.'s (Richardson, Texas)
ConvexOS, and Cray Research Inc.'s (Eagan, Minn.) Unicos. It
also supports batching in AFS, a distributed file system,
authenticating the jobs at run time.
DQS runs on one system in the cluster as a ``qmaster''
machine. It keeps track of the different queues on the various
systems and starts the jobs on remote machines. Typically each
of the worker systems will have one queue (or possibly more) to
which jobs are submitted. Within an individual queue, one job
will be executed at a time, and the other jobs will wait for
resources to become available. Normally each system will have
one queue, which will force serialization of the jobs. However,
a system that handles multiple jobs well--for example, a
multiprocessor system--might be given more than one queue so that
multiple jobs will execute simultaneously.
The system may be built so that jobs will be distributed among
queues more or less sequentially or so that any incoming jobs
will be assigned to the node that has the lowest load currently.
This system generally assumes that nodes may be used for
something other than batch processing--that is, that you may be
dealing with nodes that are sitting on someone's desk, and the
batching system is being used to ``soak up'' those excess CPU
cycles that may be going to waste.
Normally jobs are not submitted to an individual queue but to
a group, which will usually be a number of queues distributed
across machines of the same architecture. Jobs are submitted
with the qsub command, which specifies the name of
the command file and the queue the job is being submitted to.
The qsub command files are fairly normal Bourne
shell scripts, with one exception. They have Bourne shell
comments--those starting with ``#$''--that control the DQS
system. The arguments after the ``#$'' are actually options that
are fed to the qsub command and could just as well
have been typed on the command line.
Listing 1 shows a sample command
script for a job to be submitted to a batch group called ``sun'',
a cluster of Sun workstations. This batch job recompiles the DQS
batching system. It sends electronic mail at the beginning and
end of the job and dumps the standard output and standard error
output into a file named batchjob.out.
Listing 2A shows how to submit
the batch script depicted in Listing 1. DQS also includes
powerful commands for examining and manipulating queues. Once a
job has been submitted with qsub, it may be examined
with the qstat command, which recognizes several
options. Listing 2B shows how the
-l option is used with
qstat to give more detailed information about the
queue status. Note that two jobs were submitted to the ``sun''
group, and they were distributed automatically across both of the
machines in the group.
The qmod command may be used by the owner of a
queue--for example, the workstation owner--or a system
administrator to suspend and restart a queue. Listing 2C demonstrates these
functions.
Additionally, there is a qdel command, which may
be used to delete a DQS job from a queue. It can only be used by
the owner of the job, a system operator, or a DQS manager. So,
to delete a job, a user would usually first run
qstat to determine the request ID of the job he or
she wants to delete, then use qdel to terminate the
running job (Listing 2D).
The last command demonstrated in the listing is
qacct. There is an accounting system built into DQS
that tracks the system usage by job and by user. This facility
makes it easy to track what your systems are being used for. Listing 2E shows qacct -u
displaying the accounting history by user. The
qacct -j jobname option
provides a more detailed accounting record for a particular
job.
The qmon program allows the user to do most of
the features of all of the commands above while working in an X
Window System graphical-user-interface environment. This tool
uses Motif widgets and is nicely put together though the current
version of qmon doesn't support all of the DQS
features and is therefore currently lagging a little behind the
rest of the DQS system.
There is also a qidle program, which can be run
on a system that has a graphics display running X. This program
monitors the X display and will automatically suspend the the
queues on the machine when the X display is being used
interactively. This method will keep the batch jobs from slowing
down the responsiveness of X on the workstation but will allow
the excess CPU cycles to be used when the workstation is
idle.
Interactive Access to Queues
DQS includes a program called qsh, which allows
users to submit commands to a queue interactively. It is
somewhat analogous to the difference between using the shell
interactively and executing a shell script.
A queue has several options that may be configured in DQS.
These options will modify the behavior of the queue and limit the
resources that a job may use. The queues are configured with the
qconf command, which is also used to add and delete
hosts and users from the system configuration. Some of the
options to qconf are shown in Table 1. In general, the DQS stategy is
to define access to the cluster at two levels: access to the
cluster as a whole and access to individual queues in the
cluster. Each cluster or queue may have access set to one of the
following levels:
- Free
- No restrictions on the users.
- Open
- Access is permitted to the queue unless the access list for
the queue contains the user's account name.
- Restricted
- Access is denied unless the user's name is on the access
list.
This access permission system is fairly flexible and allows
policy to be set regarding which computing resources are
available to which users.
In addition to configuring queues, hosts, and users, the
editing features of qconf allow the administrator to
change the characteristics of the system. Several features of
the system may be configured either when the system is built or
changed dynamically by using qconf. These features
include the default shell, minimum user ID and group ID values
that are allowed to use the batching system, the number of jobs
that an individual user can have running at once, whether jobs
are re-runnable, and so forth. Several features of queues may
also be configured at run time or by qconf. Queues
may be defined as batch, interactive, or either. Only batches
that are defined as interactive may be used with the
qsh command. Overload levels may be defined for a
queue: If the load on the queue node exceeds the overload level,
the icon for the queue will turn red in the qmon
program display.
As was mentioned previously, systems may be configured to
assign jobs according to the system's load average. This feature
may also be tinkered with somewhat by assigning load multipliers
for each queue, so that some queues are more ``expensive'' than
others. Therefore, these queues will be less likely to run jobs.
This capability is a way of changing the priorities of different
queues.
Queues may also be assigned ``nice'' values. Adding niceness to
a process reduces its priority, so other processes will be more
likely to be scheduled first. Therefore, a system may be given
two different queues, one of which nices jobs down quite a lot
(useful for long-running jobs) and another that doesn't nice down
jobs (intended for quick, low-resource jobs).
So how do you keep your users from cheating? We all know
computer users--if there's a high-speed queue and a low-speed
queue, nobody will use the low-speed queue. After all, their
thinking goes, my job is more important than all those other
ones. Each queue may also be assigned limits on the amount of
CPU time, core file size, total memory usage, and disk output
each job may use. This way, you can set the limits low on queues
that are intended to be express queues and set the limits higher
on queues that are niced.
Other Batching Systems
There are several other systems for batching Unix jobs, some
commercially available, others free. Note that some of these are
simple batching systems or load-leveling systems, while others
are more ambitious and aim to divide an individual job in
parallel across a number of different systems, which may be the
same architecture or different architectures. NQS is somewhat
similar in flavor to DQS; it also supports the notion of using
the batch queues for other functions, such as printing queues.
Condor is a somewhat newer package that was written at the
University of Wisconsin and is freely available in source
form.
IBM Corporation has taken the Condor package and added
features to it, including usability in an AFS environment and
some more powerful batch job and queuing control mechanisms, and
called the package Load Leveler. It is available commercially
from IBM for RS/6000, Sun, and Silicon Graphics platforms. Load
Leveler is pretty nicely implemented and is one of the packages
we've been using at Fermilab for production batching. It has
somewhat more sophisticated queue controls than DQS, although DQS
has more features in the works.
Finally, Parallel Virtual Machine (PVM) is a more extended
package that allows an application developer to treat an extended
group of systems as if they were one very large virtual machine.
Unlike the other packages in this article, PVM requires source
modification of the C or Fortran program to call the PVM
libraries. It is particularly useful for classes of programs
that are easily split up into many parallel jobs, such as certain
kinds of data analysis. It may also be used as a back end for
the DQS package so that users may submit jobs through the normal
DQS mechanism but have them run on many machines in parallel.
Which package should you use? A lot will depend on a site's
feelings about publicly available software. The publicly
available software is somewhat more of an adventure but brings
with it the advantage of having source code available, either to
fix bugs or add features. The commercial packages have the
advantage of providing vendor support, although it's often the
case that the absence of support for publicly available packages
is better than the formal support of some commercial packages.
Take a look at some of the public packages, gather up some of the
information on the commercial packages, and decide what suits
your needs best. And happy batching!
Some Sources for Batching and Load-Leveling Systems
- DQS
- Available by anonymous FTP from
ftp.scri.fsu.edu in directory /pub/DQS.
- NQS
- The public domain version is available via anonymous FTP from
ftp.cosmic.uga.edu in /pub/software/ directory. This version
is described in the ``read me'' file as:
NQS - NETWORK QUEUING SYSTEM, VERSION 2.0
The Network Queuing System, NQS, is a versatile batch and device
queuing facility for a single Unix computer or a group of
networked computers. With the Unix operating system as a common
interface, the user can invoke the NQS collection of user-space
programs to move batch and device jobs freely around the
different computer hardware tied into the network. NQS provides
facilities for remote queuing, request routing, remote status,
queue status controls, batch request resource quota limits, and
remote output return.
NQS is written in C language and has been successfully
implemented on a variety of UNIX platforms, including Sun3 and
Sun4 series computers, SGI IRIS computers running IRIX 3.3, DEC
computers running ULTRIX 4.1, AMDAHL computers running UTS 1.3
It requires quite a lot of work to get going. More recent
releases are sold by Sterling Software (Palo Alto, Calif.) and
several different computer manufacturers.
- Condor
- Source available by anonymous FTP from
ftp.cs.wisc.edu in the /condor directory. This package is
difficult to build from source so binary versions of the latest
release (5.5.4b) are available for DEC Alpha (OSF/1 v1.0), Sun
sun4m (SunOS 4.1.3), and DEC MIPS (Ultrix 4.3b). Slightly older
binary versions are available for IBM R6000 (AIX3.2) and Sun
sun4c (SunOS 4.1.3).
- Load Leveler
- Available from IBM Corporation (Armonk, N.Y.) for several
different platforms, including RS/6000, Silicon Graphics Inc.
(Mountain View, Calif.), and Sun Microsystems Inc. (Mountain
View, Calif.).
- Load Balancer
- This program is more a load-balancing system than batching
system. It will let you balance log-in load across a number of
systems and do some other nice tricks. Freedman Sharp and
Associates Inc., Ste. 508, 1011 1st St. SW Calgary, Alberta
Canada T2R 1J2; 403-264-4822 or via e-mail at lb@fsa.ca.
- Qmaster
- GD Associates Ltd., 160 Bayview Dr., SW Calgary, Alberta Canada
T2V 3N8; 403-281-6923.
- Parallel Virtual Machine (PVM)
- This program is a somewhat more complex virtual machine
paralleling program. It can also be used in conjunction with DQS
as a front end. This software is available from the ``netlib'' mail
server at Oak Ridge National Laboratory. Send mail to
netlib@ornl.gov containing the message body:
send index from pvm3??
to receive a listing of the files on PVM version 3 that may be
obtained from the server.
|