Compilers


On all the clusters we use three different compiler suites. Their versions and locations are as follows:


Available Compilers
Clusters: Briareo, BaCiuco, Opteron, Helium

Intel
  Briareo: Fortran 77/90/95; locations are:
    /usr/local/intel/compiler70/
    /usr/local/intel/intel_fc_81/
  BaCiuco: Fortran 77/90/95
    /opt/intel_fc_80/
    ICC
  Opteron: Fortran 77/90/95
  Helium: Fortran 77/90/95
    /usr/local/intel_fc_80/bin/ifort (32 bit)
    /usr/local/intel_fce_80/bin/ifort (64 bit)

Intel Documentation N/A /opt/intel_cc_80/doc/ /opt/intel/mkl72/doc/
/opt/intel_fc_80/docs/
N/A

Portland Group (PGI)
  Briareo: Version 3.1.4; Fortran 77, Fortran 90, C, C++
  BaCiuco: N/A
  Opteron: N/A
  Helium: N/A

GNU
  Briareo: Fortran 77, C, C++
  BaCiuco: Fortran 77/95, C, C++
  Opteron: Fortran 77, C, C++
  Helium: Fortran 77/95, C, C++

Note: on clusters where the ifc_81 environment is installed, use the following command to run it (this will not work on Opteron):


Tips and Tricks

ifort on Helium

-xN Generate specialized code to run exclusively on Intel Pentium 4 and compatible Intel processors.
Using this option during compilation, we measured an improvement of about 11% on our specific test.
-O0 Disable optimizations
-O1 Enable optimizations; this is the default, which is used even when no optimization option is specified
-O3 Enable optimizations plus more aggressive optimizations that may not improve performance for all programs
-static Prevents linking with shared libraries
-ipo Enable multi-file Interprocedural optimizations (between files)
-ip Enable single-file Interprocedural optimizations (within files)
-ipo_separate Create one object file for every source file (overrides -ipo[n])
-tune pn4 Optimize for Pentium(R) 4 processor (DEFAULT)

-xN -O3 -ip -static This is the optimal solution. With these flags we obtained the best results running our specific test.
-xN -O3 -ipo -static

-O3 -ipo_separate -static
With these options enabled during compilation of our specific test, the computing time was about 9% longer than with the optimal solution.
-O0 -static

-xN -O1 -ipo_separate -static
With these options enabled during compilation of our specific test, the computing time was about 25% longer than with the optimal solution.
-O1 -static -tune pn4 With these options enabled during compilation of our specific test, we had the worst results: about 80% longer than with the optimal solution.
-fast Our test does not run if compiled with this option.
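
The percentages above depend on the test code, so it may be worth repeating the comparison on your own program. The following is only a minimal sketch of how that could be done: FC, SRC and the bench_* binary names are placeholders that are not part of the original setup, and the timing uses whole seconds from date, which is coarse for short runs.

```shell
#!/bin/sh
# Placeholders (assumptions): compiler command and source file to benchmark.
FC=${FC:-ifort}
SRC=${SRC:-mytest.f90}

# Compile SRC with one flag set, run the binary, and print the elapsed seconds.
# $1 = name of the binary to create, $2 = compiler flags.
# With DRY_RUN=1 the compile command is only printed, not executed.
bench_one() {
    name=$1
    flags=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$FC $flags -o $name $SRC"
        return 0
    fi
    # $flags is left unquoted on purpose so that it splits into separate options.
    $FC $flags -o "$name" "$SRC" || return 1
    start=$(date +%s)
    "./$name" > /dev/null || return 1
    end=$(date +%s)
    echo "$flags: $((end - start)) s"
}

# A few of the flag combinations listed above.
bench_all() {
    bench_one bench_ip  "-xN -O3 -ip -static"
    bench_one bench_ipo "-xN -O3 -ipo -static"
    bench_one bench_o1  "-xN -O1 -ipo_separate -static"
}

# Usage (commented out because it needs the compiler and a source file):
# DRY_RUN=1 bench_all    # only show the commands
# bench_all              # compile, run and time each variant
```

Running with DRY_RUN=1 first is a cheap way to check the command lines before spending compile time on them.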

Dynamically linking libraries outside default path using rpath

Sometimes you may need to use a library outside the standard path for a single job (e.g. you usually compile 64-bit codes and have to compile a single 32-bit one). Instead of rewriting your .bashrc before the compilation, which causes problems if you run multiple jobs, simply add an option like the following when you compile your code:
(ifort/ifc/icc/gcc/...) -Wl,-rpath=/full/path/to/libdir/
or
ld -rpath=/full/path/to/libdir/
e.g.:
ifort_32 -Wl,-rpath=/usr/local/intel_fc_80/lib/
ifort_64 -Wl,-rpath=/usr/local/intel_fce_80/lib/
When you run the code, the given directory is added to the library search path without changing LD_LIBRARY_PATH or LD_RUN_PATH.
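
To confirm that the path was actually recorded in the executable, the dynamic section can be inspected with readelf. The helpers below are a small sketch; the function names and the mycode.f90 source file are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Build the linker option for a given library directory.
rpath_flag() {
    printf '%s%s\n' '-Wl,-rpath=' "$1"
}

# Show the run-time library search path stored in an executable, if any.
# Depending on the linker, the entry is called RPATH or RUNPATH.
show_rpath() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}

# Usage (commented out because it needs the compiler and a source file):
# ifort_32 $(rpath_flag /usr/local/intel_fc_80/lib/) -o mycode mycode.f90
# show_rpath ./mycode
```

If show_rpath prints nothing, the option did not reach the linker and the executable will fall back to the default search path.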

Cross compilation

If you need to compile a code as a 32-bit executable on a 64-bit architecture (e.g. on Helium), use the GNU compiler option -m<ARCH> (where <ARCH> is 32 or 64) or the matching ifort wrapper (ifort_32 or ifort_64).
gcc -m32 ...
ifort_32 ...
ifort_64 ...
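
After compiling, it is easy to check which kind of executable was produced. The file command reports "ELF 32-bit" or "ELF 64-bit"; the sketch below does the same check by reading the ELF header directly with od, so it also works where file is not installed. The function name elf_bits is made up for this example.

```shell
#!/bin/sh
# Report whether an ELF file is a 32-bit or a 64-bit object.
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' ')
    if [ "$magic" != "7f454c46" ]; then
        echo unknown
        return 0
    fi
    # The next byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Usage (commented out because it needs a compiled program):
# gcc -m32 -o mycode mycode.c
# elf_bits ./mycode
```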

Increasing Quantum-ESPRESSO performance

You simply have to move global scratch files to local scratch on the nodes. A sample batch script that does this will be located in /opt/espresso, along with an example make.sys. Note: example batch scripts will be provided for both CPMD and PWSCF. Check /opt/cpmd/Makefile and /opt/espresso/make.sys periodically for well-tuned compilation options. See also the comment about the batch script.
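
Until the sample script is available, the idea of staging scratch files can be sketched as follows. This is not the script from /opt/espresso: the directory names, the job.in and job.out files and the helper names are all assumptions, and on a multi-node job the copy would have to be repeated on every node.

```shell
#!/bin/sh
# Hypothetical locations: adapt them to the cluster's file system layout.
GLOBAL_SCRATCH=${GLOBAL_SCRATCH:-$HOME/scratch/myjob}
LOCAL_SCRATCH=${LOCAL_SCRATCH:-/tmp/${USER:-user}/myjob}

# Copy the contents of the global scratch directory to the node-local disk.
stage_in() {
    mkdir -p "$LOCAL_SCRATCH" && cp -R "$GLOBAL_SCRATCH"/. "$LOCAL_SCRATCH"/
}

# Copy the results back to global scratch and clean up the local copy.
stage_out() {
    [ -n "$LOCAL_SCRATCH" ] || return 1
    cp -R "$LOCAL_SCRATCH"/. "$GLOBAL_SCRATCH"/ && rm -rf "$LOCAL_SCRATCH"
}

# Usage inside a batch script (commented out; job.in and job.out are placeholders,
# and outdir in job.in is assumed to point to $LOCAL_SCRATCH):
# stage_in
# pw.x < job.in > job.out
# stage_out
```

The gain comes from the code reading and writing its temporary files on the local disk during the run, with the shared file system touched only at the start and at the end.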

---- More to come ----