[Pw_forum] File closing errors when using NEB image parallelization

J. J. Ramsey jjr19 at uakron.edu
Wed Jun 17 16:25:27 CEST 2009


----- Original Message ----

> Could you please provide more information on the compilers you have used
> on the two different clusters

On the MJM cluster (http://www.arl.hpc.mil/Systems/mjm.html), the compiler was ifort 11.0, and the LAPACK used was the *sequential* version of MKL 10.1. The MPI implementation was OpenMPI 1.2.9. On the OSC BALE cluster (http://www.osc.edu/supercomputing/computing/bale/), the compiler was ifort 9.1, and the LAPACK used was the one bundled with the QE tarball. The MPI implementation was MVAPICH 0.9.9.

> the exact command line you have used to  
> start the job (are you using pools?) 

The exact commands are as follows.

On MJM: mpirun.lsf ./pw.x -nimage 3 -in "smallerProb_PWscfNEB2_c18_v10_k4.in" > "smallerProb_PWscfNEB2_c18_v10_k4.out" 2> "smallerProb_PWscfNEB2_c18_v10_k4.err"

On BALE: mpiexec "$TMPDIR/pw.x" -nimage 3 -in smallerProb_PWscfNEB2_c18_v10_k4.in > smallerProb_PWscfNEB2_c18_v10_k4.out 2> smallerProb_PWscfNEB2_c18_v10_k4.err

> and the output of the "env" command
> from pw.x point of view (i.e. you put env in your job's script).

These are attached.
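
For reference, a dump of this kind can be produced by lines such as the following in the job script, placed just before the mpirun/mpiexec line. This is only a minimal sketch: the file name my_env_vars.out is taken from the attachment names, while the sorting and the gzip step are assumptions made for convenience, not a record of the exact commands used on either cluster.

```shell
# Dump the environment the job actually sees, sorted for easier diffing
# (the output file name is an assumption based on the attachment names)
env | sort > my_env_vars.out

# Compress a copy for attaching; -c writes to stdout, so the
# uncompressed file is left in place
gzip -c my_env_vars.out > my_env_vars.out.gz
```

Sorting the output makes it easier to compare the two clusters side by side with diff.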

> Finally, please take no offense if this sounds like a stupid suggestion;
> the only way I could reproduce your problem was to run multiple copies of a
> serially compiled executable via mpirun. Could you please double-check it
> is not your case?

They definitely were not compiled serially. Here are the first few lines of the output file, which are the same for both MJM and BALE (except for the date at the top):

     Program PWSCF     v.4.0.5  starts ...
     Today is 17Jun2009 at  8:44:24 

     Parallel version (MPI)

     Number of processors in use:      24
     path-images division:  nimage    =    3
     R & G space division:  proc/pool =    8

     For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW

     Current dimensions of program pwscf are:
     Max number of different atomic species (ntypx) = 10
     Max number of k-points (npk) =  40000
     Max angular momentum in pseudopotentials (lmaxx) =  3

     initial path length           =  7.5354 bohr
     initial inter-image distance  =  1.8838 bohr
 
     calculation                   =  neb
     restart_mode                  =  from_scratch
     opt_scheme                    =  broyden
     num_of_images                 =  5
     nstep                         =  50
     CI_scheme                     =  no-CI
     first_last_opt                =  F
     coarse-grained phase-space    =  F
     use_freezing                  =  F
     ds                            =  1.0000 a.u.
     k_max                         =  0.6169 a.u.
     k_min                         =  0.6169 a.u.
     suggested k_max               =  0.6169 a.u.
     suggested k_min               =  0.6169 a.u.
     path_thr                      =  0.0500 eV / A
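
As a quick consistency check on the header above (a sketch added for illustration, not part of the pw.x output): 24 processors divided over 3 images gives 8 processors per image, which is consistent with the proc/pool value reported when no pools are requested, and the initial path length divided by the 4 segments between the 5 images gives the inter-image distance. The variable names below are hypothetical.

```shell
# Values copied from the pw.x header above
nproc=24          # Number of processors in use
nimage=3          # path-images division
num_of_images=5   # images along the NEB path, endpoints included

# Processors available to each image group
procs_per_image=$(( nproc / nimage ))
echo "processors per image: $procs_per_image"

# Path length divided by the number of segments between images
awk -v len=7.5354 -v n="$num_of_images" \
    'BEGIN { printf "inter-image distance: %.5f bohr\n", len / (n - 1) }'
```

If the executable had been built serially and merely launched 24 times, the header would not report a parallel version or an image division at all.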


      
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BALE-my_env_vars.out.gz
Type: application/x-gzip
Size: 1352 bytes
Desc: not available
Url : http://www.democritos.it/pipermail/pw_forum/attachments/20090617/3ab5625e/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MJM-my_env_vars.out.gz
Type: application/x-gzip
Size: 2476 bytes
Desc: not available
Url : http://www.democritos.it/pipermail/pw_forum/attachments/20090617/3ab5625e/attachment-0001.bin 

