[Pw_forum] Calculation stopped at the beginning of ph.x process

GAO Zhe flux_ray12 at 163.com
Thu Feb 9 15:15:18 CET 2012

Dear QE developers and users,
First, I want to thank Axel Kohlmeyer, who made me realize that I can still run the calculation with normal user access.
I compiled pw.x and ph.x again, this time with mpich2 1.4.1p1, which was itself compiled with PGI Fortran 9.0 (trial version). The pw.x calculation on 12 nodes (24 cores) ran fine. But when I ran ph.x, the calculation stopped at the initial step.
The terminal displayed:
mpirun -machinefile nodes -np 24 ${KKK}/ph.x -npool 3 -in ${KKK}/binary.ph.in > ${KKK}/binary.ph.out
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 23
I then checked the output file, binary.ph.out, but it gives no information about why the processes stopped:
     Parallel version (MPI), running on    24 processors
     K-points division:     npool     =    3
     R & G space division:  proc/pool =    8

     Ultrasoft (Vanderbilt) Pseudopotentials

   Info: using nr1, nr2, nr3 values from input

   Info: using nr1s, nr2s, nr3s values from input
     Message from routine read_ions :
     PP will be read from ./

     Parallelization info
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min         253     253     72                 8611     8611    1298
     Max         254     254     73                 8614     8614    1301
     Sum        2025    2025    577                68891    68891   10395

     Dynamical matrices for ( 3, 3, 3,)  uniform grid of q-points
     (   4q-points):
       N         xq(1)         xq(2)         xq(3)
       1   0.000000000   0.000000000   0.000000000
       2   0.000000000   0.000000000   0.333333333
       3   0.000000000   0.333333333   0.333333333
       4   0.333333333   0.333333333   0.333333333

     Calculation of q =    0.0000000   0.0000000   0.0000000
rank 3 in job 73  node01_35097   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
rank 22 in job 73  node01_35097   caused collective abort of all ranks
  exit status of rank 22: killed by signal 9
Why did this problem occur? I ran the same input file on my own computer, with mpich2 1.4.1p1 and ifort 12, and it worked well, at least past the point where the run above stopped. Could this problem be caused by the PGI 9.0 Fortran compiler?
Looking forward to any suggestions.
Thanks a lot.

CMC Lab, Materials Science & Engineering Department,
Seoul National University, South Korea
