- I would like to ask for an account on XYZ.
- I would like to make a request for an account for all the clusters.
Please send this kind of request to: cluster-admin@democritos.it
specifying:
- sector (CM, AP, SBP, ...);
- the username of your SISSA main cluster account and/or sector server account (e.g. on shannon.sissa.it and on cm-srv.sissa.it);
- which cluster you need to use (read the documentation below);
- the name of your supervisor (for postdocs, students, guests, etc.).
Here you can find some documentation about local clusters:
READ IT and subscribe to the cluster users' mailing list:
- Houston we have a problem....
Please, write to cluster-admin@democritos.it
E-mail about cluster problems sent to personal addresses (e.g. Moreno/Marco/...) will be ignored.
- I wrote to cluster-admin@..., but I received a message that says: "Your message is being held until the list moderator can review it for approval"
That's OK: your message is held until one of us approves it (which should happen as soon as possible).
- [briareo/hokule] I cannot find a file/directory on the scratch filesystem, what happened?
- [briareo/hokule] I get the message "Stale NFS file handle" when I try to access the /scratch filesystem. What does it mean?
The GPFS filesystem could be down or unmounted;
contact cluster-admin@democritos.it.
- [briareo/hokule] There's a hanging job in the queue. What can I do?
The PBS service (the pbs_mom daemon) has probably stopped on the remote node.
An automatic script checks the status of the daemons every hour.
Just wait or contact cluster-admin@democritos.it.
- I tried to link my application with the MKL libraries but I found that routine DGxxxx is not present. What should I do?
Remember that the MKL libraries contain just a subset of the full LAPACK
routines. So please use the MKL and LAPACK libraries together in this way:
|
ifc myprog.f -L/usr/local/intel/mkl/lib/32 -L/usr/local/lib/ -lifclapack_std ...
|
|
g77 myprog.f -L/usr/local/intel/mkl/lib/32 -L/usr/local/lib/ -lgnulapack_std ...
|
where std stands for standard, i.e. the standard LAPACK package
downloadable from www.netlib.org.
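The compiler-to-library pairing above can be captured in a small helper for build scripts. This is only a sketch: the function name lapack_std_flag is hypothetical, the two library names are taken from the command lines above, and any further options (the trailing "..." above) are site-specific and not shown.

```shell
# Hypothetical helper: print the standard-LAPACK link flag that this FAQ
# pairs with each compiler.
lapack_std_flag() {
    case "$1" in
        ifc) echo "-lifclapack_std" ;;   # library used with ifc above
        g77) echo "-lgnulapack_std" ;;   # library used with g77 above
        *)   echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here), with FC set to ifc or g77:
# $FC myprog.f -L/usr/local/intel/mkl/lib/32 -L/usr/local/lib/ $(lapack_std_flag "$FC") ...
```

An unknown compiler name makes the helper return a non-zero status, so a build script can stop instead of linking against the wrong library.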
- [mulo/somaro] May I use the MKL libraries on the openMosix cluster?
Sure, why not?
Please note, however, that MKL is a threaded library and
therefore applications linked against it CANNOT migrate to
free (or less busy) CPUs.
If you want to make them migratable, just tell your application to use
a single thread by setting the OMP_NUM_THREADS environment variable to 1.
In this case your application will be migrated without any problem.
To define the OMP_NUM_THREADS environment variable:
| for [t]csh:
setenv OMP_NUM_THREADS 1
|
| for [ba]sh:
export OMP_NUM_THREADS=1
|
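If you would rather not change the variable for your whole session, it can be set for a single command only. A minimal sketch follows; the function name run_single_threaded is made up for illustration, and ./myprog stands for your own MKL-linked executable.

```shell
# Hypothetical wrapper: run one command with OMP_NUM_THREADS=1
# without touching the environment of the calling shell.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}

# Example (not executed here):
# run_single_threaded ./myprog
```

Using env keeps the setting local to the launched program, so other jobs started from the same shell keep their own thread settings.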
- [mulo/somaro] I get the "cp: skipping file `foo', as it was replaced while being copied" message when trying to use cp under mfs.
This is a known problem of cp when working on oMFS (or an oMFS bug in the implementation of some
*stat function handler).
As a simple workaround we have installed a safe cp command as /bin/mfscp and aliased cp to it for all users.
If you still get the above error, it means you have aliased the cp command yourself in your rc files;
to avoid the problem just use /bin/mfscp in your cp aliases (use the same trick in your scripts or makefiles
when they work on /mfs).
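For scripts that must run both on and off the cluster, one option is to pick the copy command at run time. The following is only a sketch: pick_cp is a hypothetical helper name, and it assumes /bin/mfscp accepts the same arguments as cp, which this FAQ does not state explicitly.

```shell
# Hypothetical helper: prefer the site's safe copy command when it exists,
# otherwise fall back to the ordinary cp.
pick_cp() {
    if [ -x /bin/mfscp ]; then
        echo /bin/mfscp
    else
        echo cp
    fi
}

CP=$(pick_cp)

# Example (not executed here):
# "$CP" input.dat /mfs/5/local_scratch/foo/
```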
- [mulo/somaro] I want to run a huge-output job, which filesystem can I use?
If your job is I/O intensive (frequent I/O operations) you should consider using mfs (the openMosix filesystem).
Create your own directory under /mfs/<PREFEREDNODE>/local_scratch/ (e.g. mkdir /mfs/5/local_scratch/foo)
and then run your job bound to that (remote) filesystem (see below for a silly example).
Keep in mind that, in this case, you will no longer benefit from openMosix load balancing (this is a
sort of "manual balancing").
To see which nodes have few or no jobs doing I/O on mfs, you can use this command line:
ompsinfo -A -a mfs | grep -v /
Note that mfs is not a stable filesystem: too many simultaneous (I/O
bound) jobs may crash the node involved in the I/O operations.
If you only have a huge output (not I/O intensive), all you need is
the local scratch directory.
Example of using the openMosix filesystem:
As script:
#!/bin/bash
cd /mfs/5/local_scratch/foo/
myexe 1>./output.log 2>./output.err
As cmdline:
myexe >/mfs/5/local_scratch/foo/output.log
If your program has some sort of smart cmdline parsing:
myexe --output /mfs/5/local_scratch/foo/output.log
Here you can find another example.
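The script form above can be made slightly more defensive by creating the directory first and stopping if it cannot be entered. This is a sketch under assumptions: the function name run_on_mfs and its argument order are invented here, and the base path is an argument only so that the function can be tried outside the cluster.

```shell
# Hypothetical helper: run a command inside BASE/NODE/local_scratch/NAME,
# sending stdout to output.log and stderr to output.err in that directory.
# usage: run_on_mfs BASE NODE NAME COMMAND [ARGS...]
run_on_mfs() {
    base=$1; node=$2; name=$3
    shift 3
    dir="$base/$node/local_scratch/$name"
    mkdir -p "$dir" || return 1
    # Use a subshell so the caller's working directory is left unchanged.
    (
        cd "$dir" || exit 1
        "$@" 1>./output.log 2>./output.err
    )
}

# Example (on the cluster, not executed here):
# run_on_mfs /mfs 5 foo myexe
```

The return status is that of the command itself, or non-zero if the directory could not be created or entered.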
- [mulo/somaro] I ran my job over mfs as suggested but it seems that it does not migrate. What's wrong?
Probably your job uses threads or other non-migratable features.
If your program is linked against the MKL libraries, it uses threads (see the MKL question above).
Launching such jobs bound to a specific node is useless unless they can migrate to that node.
somaro:~# ompsinfo -a mfs,ppid -u xyz
CMD   PID   PPID  NODE  NMIGS  MFS  LOCK  CANTMOVE
sh    2903  1     0     0      5    0     migratable
sshd  5356  5352  0     0      /    0     migratable
tcsh  5357  5356  0     0      /    0     migratable
b     5366  2903  0     0      5    0     clone_vm
b     5367  5366  0     0      5    0     clone_vm
b     5368  5367  0     0      5    0     clone_vm
b     5369  5367  0     0      5    0     clone_vm
In this example there are several threads (at least two, one per processor, plus their parents)
running on the master node instead of node005.
My advice is to set the OMP_NUM_THREADS environment variable to 1 in
your scripts or in your environment, as suggested in the MKL question above.
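To spot such processes quickly, the CANTMOVE column can be filtered. The sketch below assumes the eight-column layout of the sample output above (one header line, CANTMOVE as the last column, command names without spaces); the function name list_non_migratable is hypothetical.

```shell
# Hypothetical filter: read ompsinfo-style output on stdin, skip the header
# line, and print CMD, PID and the CANTMOVE reason for every process whose
# last column is not "migratable".
list_non_migratable() {
    awk 'NR > 1 && $NF != "migratable" { print $1, $2, $NF }'
}

# Example (on the cluster, not executed here):
# ompsinfo -a mfs,ppid -u xyz | list_non_migratable
```

An empty result means that every listed process is reported as migratable.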
- [mulo/somaro] Mathematica complains about some fonts that are not installed. It works, but the output is difficult to read...
Try a script like this:
xset fp+ tcp/somaro.sissa.it:7100
xhost +somaro.sissa.it
ssh [<USERNAME>@]somaro mathematica