The cluster is named Pulpo, the Spanish word for octopus. The machine names are pulpo, brazo1, brazo2, brazo3, ..., brazo8; in other words, there is a master node and 8 arms. You don't have to use these names; you can also refer to the machines as master, node2, node3, node4, node5, node6, node7, node8, node9.
Each of these nodes is an Alpha Processor Incorporated UP2000 system with a 667 MHz 64-bit Alpha processor (21264 generation), 4 MB of cache, and 512 MB of RAM. The parallel interconnect is Myrinet. The operating system is Red Hat 7.1 Alpha Linux. The Message Passing Interface implementation is MPICH, but the driver for the Myrinet cards is provided by Myricom, so we actually run the Myricom port of MPICH over their driver, that is, MPI over GM. To build parallel code, you call routines in the MPICH libraries.
www.myri.com
www.beowulf.org
www.netlib.org (this site is currently down; we'll see if it comes back. The links I intended were netlib.org/scalapack, netlib.org/blas, and netlib.org/atlas.)
To obtain an account on the cluster, send mail to help@math.unm.edu.
You cannot log into the cluster directly.
First log into wulfgang.math.unm.edu and then run ssh master or ssh node9. The general working model is that the first 8 nodes (master, node2, ..., node8) are for parallel operation, and node9 is for compiling your code, reading man pages, and generally getting a parallel job/program prepared for execution.
Once a job is ready to run, you can log into the master node and run the job with the mpirun command, for example:
mpirun -np <np> a.out
where <np> is the number of processors the job is to run on and a.out is the filename of the program you have compiled.
In order for this to work, you must have the right setup in your .gmpi/conf file.
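As a rough sketch of what setting up that file might look like, the snippet below writes a small configuration to a scratch location. It rests on an assumption that is not stated on this page: that the file holds a node count on the first line followed by one "hostname port" line per node, and that GM port 2 is usable. The function name write_gmpi_conf and the scratch path are made up for illustration. Check README-GM for the real format before copying anything into ~/.gmpi/conf.

```shell
#!/bin/sh
# Sketch: write a GM configuration file listing the nodes for an MPI run.
# ASSUMPTION: the file is a node count followed by one "hostname port" line
# per node, and GM port 2 is usable. Verify against README-GM before use.
write_gmpi_conf() {
    conf=$1
    shift
    {
        echo "$#"                 # number of node entries that follow
        for host in "$@"; do
            echo "$host 2"        # hostname and (assumed) GM port number
        done
    } > "$conf"
}

# Example: write to a scratch file rather than the real ~/.gmpi/conf.
example_conf=${TMPDIR:-/tmp}/gmpi.conf.example
write_gmpi_conf "$example_conf" master node2 node3 node4
cat "$example_conf"
```

Writing to a scratch file first lets you compare the result with your existing configuration before replacing it.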
From /usr/local/src/mpich/mpich-1.2..8/README-GM, more examples of building and running:
MPI programs can be compiled with
mpicc *.[oc] -o exec
or
mpif77 *.[oc] -o exec
They can be launched with
mpirun -np <num_node> exec
In case of a problem with the mpirun script, try instead:
mpirun.ch_gm -np <num_node> exec
To use a different config file just use the "-f" parameter to mpirun.ch_gm.
mpirun.ch_gm -np <num_nodes> -f <new_conf_file> program
=================================================================
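The launch variants quoted above can be collected in a small helper. This is only a sketch: launch_cmd is a hypothetical name, and it prints the command instead of running it so you can check it first. It picks mpirun.ch_gm when a config file is given, since the README describes "-f" as a parameter to that script.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the launch command described above.
# Usage: launch_cmd <num_nodes> <program> [conf_file]
launch_cmd() {
    np=$1
    prog=$2
    conf=${3:-}
    case $np in
        ''|*[!0-9]*|0)
            echo "num_nodes must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ -n "$conf" ]; then
        # -f is described as an mpirun.ch_gm parameter, so call that script.
        echo "mpirun.ch_gm -np $np -f $conf $prog"
    else
        echo "mpirun -np $np $prog"
    fi
}

launch_cmd 8 ./exec
launch_cmd 8 ./exec my.conf
```

Once the printed line looks right, run it by hand on the master node.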
There is also a doc directory in this mpich directory.
Some additional libraries you may need to use/locate are
/usr/lib/libscalapack.a
/usr/lib/petsc/lib/libO/linux_alpha/lib*
/usr/lib/libblas*
The Compaq optimized BLAS is in /usr/lib/compaq/cxml-5.0.0/libcxml_ev6.so.
All or most of these are for linear algebra and matrix manipulation. LAPACK is also included in the Compaq libraries.
The GNU serial compilers are gcc and g77. The optimized Compaq serial C compiler is ccc, and the Compaq optimized Fortran 90 compiler is fort.
http://www.support.compaq.com/alpha-tools/software/index.html
Another interesting directory is /usr/local/src/mpich/current/examples/
Another way, and perhaps someday the only way, to run programs/jobs on the
cluster is with the batch scheduling software PBS Pro. The idea behind this
software is to let users submit jobs to a job queue. When the scheduling
rules determine that it is a job's turn and the resources it requested are
available, PBS schedules and runs the job. This software can be configured
in very complicated setups to handle large and varied user bases. Our user
base is quite small at this time, so PBS is configured very simply with only
a single working queue.
If you are on the master node or node9, the following should serve as a simple example.
At the command line run
qsub /usr/local/output/test2.job
and the contents of test2.job are
==================================
#!/bin/sh
#PBS -l nodes=8:ncpus=1
#PBS -o master:/usr/local/output/myoutput
#PBS -e master:/usr/local/output/myjoberror
mpirun -np 8 -machinefile $PBS_NODEFILE /usr/local/src/mpich/current/examples/basic/cpi
==================================
Run qstat to check on the progress of the job:
==================================
master:~ qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
41.wulfgang      test2.job        sulsky           00:00:00 E workq
42.wulfgang      test2.job        vageli                  0 Q workq
==================================
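The S column in that listing is the job state; in PBS, Q marks a queued job and E a job that is exiting. As a small sketch, the following picks out the job ids in a given state. The function name jobs_in_state is made up, and the filter assumes the six-column layout shown above with two header lines.

```shell
#!/bin/sh
# List job ids in a given state from qstat-style output read on stdin.
# Assumes the layout shown above: id, name, user, time, state, queue,
# preceded by a header line and a line of dashes.
jobs_in_state() {
    awk -v want="$1" 'NR > 2 && $5 == want { print $1 }'
}

# Sample text copied from the qstat listing above. On the cluster you would
# pipe the real command instead:  qstat | jobs_in_state Q
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
41.wulfgang      test2.job        sulsky           00:00:00 E workq
42.wulfgang      test2.job        vageli                  0 Q workq'

printf '%s\n' "$sample" | jobs_in_state Q
```

If a job name or user name ever contained spaces the column positions would shift, so treat this as a convenience for the simple single-queue setup described here.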
You can also submit jobs from wulfgang, but the situation changes a little.
The script test2.job must now exist on your filesystem on wulfgang. You can
submit the exact same test2.job, but the file itself must be on wulfgang.
Notice that the paths to my output files in the PBS script are in the cluster
filesystem; /usr/local/output does not exist on wulfgang. So the references
to programs and files that you make in your PBS job script are to the cluster
filesystem.
There is also a graphical user interface to PBS Pro. If you ssh to wulfgang
from your Unix workstation, the display should automatically be forwarded
(over the encrypted ssh channel) back to your machine, and you can run xpbs.
This opens a graphical window in which you can submit jobs to the queue on
wulfgang. You can also track and monitor your jobs' status in the queue.
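If you want to run the same kind of job with a different node count, a script like test2.job can be generated instead of edited by hand. The following is a sketch only: write_job, its arguments, and the scratch file name are hypothetical, while the #PBS lines and the mpirun line are copied from the test2.job example above.

```shell
#!/bin/sh
# Sketch: generate a PBS job script like test2.job for a chosen node count.
# write_job and the scratch path are hypothetical; the #PBS directives and
# the mpirun line follow the test2.job example above.
write_job() {
    job=$1      # where to write the job script
    nodes=$2    # number of nodes to request
    prog=$3     # program path on the cluster filesystem
    cat > "$job" <<EOF
#!/bin/sh
#PBS -l nodes=${nodes}:ncpus=1
#PBS -o master:/usr/local/output/myoutput
#PBS -e master:/usr/local/output/myjoberror
mpirun -np ${nodes} -machinefile \$PBS_NODEFILE ${prog}
EOF
}

job_file=${TMPDIR:-/tmp}/test4.job
write_job "$job_file" 4 /usr/local/src/mpich/current/examples/basic/cpi
cat "$job_file"
# You would then submit it with:  qsub "$job_file"
```

The backslash before $PBS_NODEFILE keeps the variable unexpanded in the generated file, so PBS fills it in when the job actually runs.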
For further information on using PBS Pro, consult the PBS Pro User Guide.
Another interesting link is supercluster.org.