Running Jobs
The "Run SnB" screen allows you to start an
SnB job on the local computer or submit it to a batch processing
system, such as PBS or LoadLeveler, if one is available. It also supports
submission to Condor, a system that scavenges unused computing time
on a network of workstations (for more information on Condor, see http://www.cs.wisc.edu/condor/). These
options provide you with convenient ways to take maximum advantage of
the inherently parallel nature of the Shake-and-Bake algorithm
by dividing the trial structures among as many processors as possible.
Thus, jobs can be run in several parts with each subjob creating its
own set of output files. The results, however, are combined for inspection
using the tools provided by the Evaluate Trials screen.
We (the SnB developers) have a limited number
of platforms available for development and testing. Your system configuration
may differ from ours, and the batch submission options may not work
as expected. In that case, please contact us at snbhelp@hwi.buffalo.edu so that we
can work with you to support your configuration.
There are three sections on this screen: Required
Information, Local Options, and Batch Options. The
required information must be supplied. Whether or not the other sections
need to be completed depends on the choices you make in the required
information section.
- Required Information
- Queueing System: Select the queueing system you would
like to use.
None (local machine) runs the job on the machine where
the GUI is running. If you are using the X Window System, note
that this is not necessarily the machine on which the GUI is
being displayed.
PBS will submit the job to a PBS queue. The 'qsub' program
must be installed and configured on your local machine, even
if PBS is actually submitting jobs to a remote machine.
LoadLeveler submits a job to a LoadLeveler queue on an
IBM SP system.
Condor allows submission to a Condor flock.
Clicking Custom generates the dat files required to run
SnB without actually starting the job. This is useful
if you want to run SnB via a batch queueing system that
is not supported directly by SnB. Given the dat files,
you can write a script that will submit the job to the batch
queueing system that you are using at your site.
- File name prefix for results: All
files that are generated by the SnB run will start with
the prefix entered here. An underscore and a number are appended
to this prefix; the number ranges from zero to one less than the
number of SnB processes you request (see the next item). For
example, with four processes the suffixes are _0 through _3. Do NOT
use an underscore in the prefix name itself (hyphens are OK).
- Number of SnB processes to run: If the local
run method is selected, the GUI will initiate this many processes
on the local machine. If you select one of the batch methods (PBS,
LoadLeveler, Condor), this variable indicates the
number of nodes to be requested from the batch queueing system.
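The naming rule for result files described above can be sketched in Python. This is a minimal illustration, not SnB's own code: the function name `result_prefixes` is hypothetical, and the sketch produces only the per-process prefixes, without guessing at the file extensions SnB adds.

```python
def result_prefixes(prefix: str, n_processes: int) -> list[str]:
    """Return one file name prefix per SnB process.

    Hypothetical helper that mirrors the documented rule:
    the prefix, an underscore, and a number from 0 to n_processes - 1.
    """
    if "_" in prefix:
        # The manual forbids underscores in the prefix (hyphens are OK).
        raise ValueError("prefix must not contain an underscore")
    if n_processes < 1:
        raise ValueError("at least one SnB process is required")
    return [f"{prefix}_{i}" for i in range(n_processes)]


print(result_prefixes("my-run", 4))
# ['my-run_0', 'my-run_1', 'my-run_2', 'my-run_3']
```

Rejecting an underscore up front keeps the underscore that precedes the process number unambiguous, which is presumably why the manual forbids it.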
- Local Options
- Priority: Selects the "nice" value applied to the job
at the time of submission. If you are sharing a machine
and wish to run a background job, choose "low" priority.
- Process jobs: When you have finished
filling in all the required fields, click this button to begin
processing the job.
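On Unix systems, a "nice" value is commonly applied by starting a program through the `nice` command. The sketch below shows one way a priority choice could be turned into such a command line. The priority labels other than "low", the nice values in `NICE_VALUES`, the function name `niced_command`, and the `hypothetical-snb-command` executable are all assumptions made for illustration; the manual does not state what the GUI actually uses.

```python
# Assumed mapping from priority label to nice value (not taken from SnB).
NICE_VALUES = {"normal": 0, "low": 19}


def niced_command(command, priority="normal"):
    """Prefix a command (a list of strings) with `nice -n <value>`.

    Hypothetical helper: returns the command unchanged for nice value 0.
    """
    nice_value = NICE_VALUES[priority]
    if nice_value == 0:
        return list(command)
    return ["nice", "-n", str(nice_value), *command]


# Placeholder executable name; substitute the real program at your site.
print(niced_command(["hypothetical-snb-command"], priority="low"))
# ['nice', '-n', '19', 'hypothetical-snb-command']
```

The resulting list can be passed to `subprocess.Popen` to start the background process at reduced priority.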
- Batch Options
- Queue: Select the queue for PBS and LoadLeveler jobs.
Condor does not support different queues.
- Copy input files to remote machine(s): Select "yes"
if you want to copy all input files to the machine where the
job will be run. When SnB is finished, it will copy the
output files back to the working directory on the local machine.
Copying the files does little for overall performance because
the only significant amount of I/O occurs at the start
of the job. It is nevertheless recommended that you transfer input
files to remote cluster machines, since these machines typically
have low disk and network I/O performance and their network
and disk subsystems could otherwise become overloaded when a
job starts.
- Remote directory: The directory for staging files.
You need to supply this information only if you selected "yes"
for "Copy input files to remote machine(s)." If your
batch environment provides a temporary directory name in an
environment variable, you can enter that here.
- Queue type: Your choices are serial, parallel (shared
memory), and parallel (cluster). For example, suppose you entered
"8" for the number of SnB processes to run
(in the required information section). Choosing serial would
cause eight single-processor jobs to be submitted to the queue
that you selected. Both parallel selections will submit a single
eight-processor job. The difference between the two is that
the parallel shared memory option will use cp
to stage files whereas the parallel cluster option uses
rcp (a shared file system is not assumed). When running
LoadLeveler jobs, you are not prompted for this item.
Shared memory machines include the SGI Origin2000, Sun
Enterprise 10000, and any other machine that has multiple processors
in the same physical unit. On these machines you should select
parallel shared memory as the queue type.
Cluster machines include the IBM SP and Beowulf-style
clusters. Clusters consist of two or more distinct computers
that are coupled together via software. For these machines you
should select parallel cluster as the queue type.
Serial can be chosen for either shared memory or cluster
computers. Whether you choose serial or one of the parallel
options is a matter of preference. A serial job can start
as soon as a single processor is free. A parallel
job that requires n processors, on the other hand, must wait until
n processors are free. Your computing site will also
have limits on how many jobs you can have running as well as
how many processors you can allocate for a parallel job. These
limits will also influence which option you should choose. If
you are unsure, you should contact the administrator of the
machine you are using.
- Tasks per node (LoadLeveler only): The number of tasks
to start on each SP node. If you are utilizing SMP nodes, you
can set this number to the number of processors in each node.
Then, the total number of processors that your job will use
is equal to (tasks per node)*(number of nodes).
- Number of nodes (LoadLeveler only): The number of
nodes to allocate for the job.
- Process jobs: When you have finished filling in all
the required fields, click this button to submit your job to
the batch system that you have selected.
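The arithmetic behind the queue type and LoadLeveler settings above can be sketched as follows. This is an illustration of the documented rules only, not SnB code: the function names and the queue type strings ("serial", "parallel-shared", "parallel-cluster") are hypothetical labels chosen for this example.

```python
# Staging command used by each parallel queue type, as described above.
STAGING_COMMAND = {"parallel-shared": "cp", "parallel-cluster": "rcp"}


def plan_jobs(queue_type: str, n_processes: int) -> tuple[int, int]:
    """Return (number of jobs submitted, processors per job)."""
    if queue_type == "serial":
        # One single-processor job per SnB process.
        return (n_processes, 1)
    if queue_type in STAGING_COMMAND:
        # A single job that needs all processors at once.
        return (1, n_processes)
    raise ValueError(f"unknown queue type: {queue_type}")


def loadleveler_processors(tasks_per_node: int, nodes: int) -> int:
    """Total processors = (tasks per node) * (number of nodes)."""
    return tasks_per_node * nodes


print(plan_jobs("serial", 8))            # (8, 1)
print(plan_jobs("parallel-cluster", 8))  # (1, 8)
print(loadleveler_processors(4, 2))      # 8
```

With 8 processes, the serial plan yields eight jobs that can each start as soon as one processor is free, while either parallel plan yields one job that waits for eight free processors.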