[Pw_forum] [Fwd: diagonalization failure (david, cg) for large numbers of bands]

Stefano de Gironcoli degironc at sissa.it
Tue Dec 1 22:42:40 CET 2009


Dear Vivek Ranjan... (or Joseph Turnbull?),

   I do not know if this comment is relevant... I have never tried to
compute so large a fraction of the band structure.
   How many plane waves does your basis set contain?
   By default, Davidson diagonalization expands the basis set up to
diago_david_ndim times (default = 4) the number of required bands
(nbnd). Are you asking for more than 1/4 of the total number of
elements in your basis set?
   It seems to me that NPW in your case should be of the order of
8000, shouldn't it?
   You could try using diago_david_ndim=2.
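
   The comparison suggested above can be sketched in a few lines of
Python. This is only an illustration of the arithmetic, not code taken
from pw.x: the function names are hypothetical, and NPW_GUESS = 8000 is
the rough guess from this message, not a value read from the actual
output. The real number of plane waves should be taken from the pw.x
output before drawing conclusions.

```python
def davidson_max_subspace(nbnd: int, diago_david_ndim: int = 4) -> int:
    """Largest Davidson work-space dimension: diago_david_ndim * nbnd."""
    return diago_david_ndim * nbnd


def exceeds_basis(nbnd: int, npw: int, diago_david_ndim: int = 4) -> bool:
    """True if the expanded subspace would be larger than the plane-wave basis."""
    return davidson_max_subspace(nbnd, diago_david_ndim) > npw


if __name__ == "__main__":
    NPW_GUESS = 8000  # assumption: the order-of-magnitude guess from this message
    for nbnd in (1280, 3328):        # the working and the failing nbnd from the report
        for ndim in (4, 2):          # default value and the suggested reduced value
            dim = davidson_max_subspace(nbnd, ndim)
            flag = "exceeds" if exceeds_basis(nbnd, NPW_GUESS, ndim) else "fits within"
            print(f"nbnd={nbnd:5d} ndim={ndim}: subspace {dim:6d} {flag} NPW~{NPW_GUESS}")
```

   Under that assumed NPW, nbnd = 3328 with the default factor of 4
asks for a subspace of 13312 vectors, while a factor of 2 asks for 6656.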

    stefano


Quoting Vivek Ranjan <vranjan at ncsu.edu>:

> Hi,
>
> ## Summary:  Running pw.x on 128-1024 processors, testing bulk 64-Si cell
> at gamma (gamma tricks not used because of incompatibility with subsequent
> calculations) with a "large" number of (extra) bands.  No problems reported
> when nbnd is small.  With 128-256 processors, when nbnd>1300, if using
> Davidson diag, program exits before completion of 1 scf step, with Cholesky
> decomposition failure error; if using iterative diag (cg), fails at same
> stage with error "(ZHEGV*) failed".  System is Cray XT4.
>
> ## Purpose:  reproducing the beautiful results of PHYSICAL REVIEW B 79,
> 201104, 2009 for GWW education purposes.  :)
>
> ## Background:  I have found similar-looking problems reported here, and
> have tried several of the recommendations (switching to ndiag 1 at runtime
> to use serial diag instead of parallel; switching from david to cg).
>
> In addition, I have tried increasing the PW cutoff (to provide more PWs
> relative to requested bands for the sake of Davidson diag), but this does
> not really help.
>
> I also attempted to do a regular SCF calculation with no nbnd specification,
> followed by a NSCF calculation with extra bands specified.  The same errors
> are obtained.
>
> ## Current status:  I am now trying to rule out memory-related errors (via
> running on more nodes), and will update this thread accordingly if the
> problem is related to memory requirements.  Running on 512 processors
> permitted nbnd=2500 (converged results should require ~3300 bands for this
> particular calculation, according to my understanding of the noted paper),
> and I have some 1024 processor runs queued up.
>
> It does not seem to me that such a system, even with so many states, should
> have such large memory demands, so I am wondering if I am doing something
> stupendously wrong (or perhaps not exactly doing something wrong, but
> failing to do something glaringly obvious that would solve the problem).
> Below is my input file, followed by some brief technical specs in case such
> are helpful.
>
> ## Sample input file:
>
> &control
>  calculation='scf'
>  restart_mode='from_scratch',
>  prefix='si'
>  outdir='/scr/josepht/espresso/bsi64/Large_GAMMA/STEP_B/tmp'
>  pseudo_dir='/scr/josepht/espresso/bsi64/pseudo'
> /
> &system
>  ibrav= 8,
>  celldm(1)= 20.52,
>  celldm(2)= 1,
>  celldm(3)=1,
>  nat=  64,
>  ntyp= 1,
>  ecutwfc = 35.0,
>  nosym=.true.
>  nbnd = 3328,
> /
> &electrons
>  diagonalization='david',
>  conv_thr =  1.0d-8,
>  mixing_beta = 0.5,
> /
> ATOMIC_SPECIES
> Si  1. Si.pbe-rrkj.UPF
> ATOMIC_POSITIONS (bohr)
> Si      0.00000000        0.00000000        0.00000000
> Si      5.13000000        5.13000000        0.00000000
> Si      0.00000000        5.13000000        5.13000000
> Si      5.13000000        0.00000000        5.13000000
> Si      2.56500000        2.56500000        2.56500000
> Si      7.69500000        7.69500000        2.56500000
> Si      7.69500000        2.56500000        7.69500000
> Si      2.56500000        7.69500000        7.69500000
> Si     10.26000000        0.00000000        0.00000000
> Si     15.39000000        5.13000000        0.00000000
> Si     10.26000000        5.13000000        5.13000000
> Si     15.39000000        0.00000000        5.13000000
> Si     12.82500000        2.56500000        2.56500000
> Si     17.95500000        7.69500000        2.56500000
> Si     17.95500000        2.56500000        7.69500000
> Si     12.82500000        7.69500000        7.69500000
> Si      0.00000000       10.26000000        0.00000000
> Si      5.13000000       15.39000000        0.00000000
> Si      0.00000000       15.39000000        5.13000000
> Si      5.13000000       10.26000000        5.13000000
> Si      2.56500000       12.82500000        2.56500000
> Si      7.69500000       17.95500000        2.56500000
> Si      7.69500000       12.82500000        7.69500000
> Si      2.56500000       17.95500000        7.69500000
> Si      0.00000000        0.00000000       10.26000000
> Si      5.13000000        5.13000000       10.26000000
> Si      0.00000000        5.13000000       15.39000000
> Si      5.13000000        0.00000000       15.39000000
> Si      2.56500000        2.56500000       12.82500000
> Si      7.69500000        7.69500000       12.82500000
> Si      7.69500000        2.56500000       17.95500000
> Si      2.56500000        7.69500000       17.95500000
> Si     10.26000000       10.26000000        0.00000000
> Si     15.39000000       15.39000000        0.00000000
> Si     10.26000000       15.39000000        5.13000000
> Si     15.39000000       10.26000000        5.13000000
> Si     12.82500000       12.82500000        2.56500000
> Si     17.95500000       17.95500000        2.56500000
> Si     17.95500000       12.82500000        7.69500000
> Si     12.82500000       17.95500000        7.69500000
> Si     10.26000000        0.00000000       10.26000000
> Si     15.39000000        5.13000000       10.26000000
> Si     10.26000000        5.13000000       15.39000000
> Si     15.39000000        0.00000000       15.39000000
> Si     12.82500000        2.56500000       12.82500000
> Si     17.95500000        7.69500000       12.82500000
> Si     17.95500000        2.56500000       17.95500000
> Si     12.82500000        7.69500000       17.95500000
> Si      0.00000000       10.26000000       10.26000000
> Si      5.13000000       15.39000000       10.26000000
> Si      0.00000000       15.39000000       15.39000000
> Si      5.13000000       10.26000000       15.39000000
> Si      2.56500000       12.82500000       12.82500000
> Si      7.69500000       17.95500000       12.82500000
> Si      7.69500000       12.82500000       17.95500000
> Si      2.56500000       17.95500000       17.95500000
> Si     10.26000000       10.26000000       10.26000000
> Si     15.39000000       15.39000000       10.26000000
> Si     10.26000000       15.39000000       15.39000000
> Si     15.39000000       10.26000000       15.39000000
> Si     12.82500000       12.82500000       12.82500000
> Si     17.95500000       17.95500000       12.82500000
> Si     17.95500000       12.82500000       17.95500000
> Si     12.82500000       17.95500000       17.95500000
> K_POINTS
> 1
> 0.0 0.0 0.0 1.0
>
> ##END OF INPUT
>
> The above file runs when nbnd = 1280, and (possibly) relevant output from
> the successful run includes:
>
> (Each subspace H/S matrix      400.00 Mb     (   5120,5120)
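
The quoted 400.00 Mb figure can be reproduced with a short calculation,
sketched here in Python. It rests on two assumptions that are not stated
in the output itself: that each matrix element is a double-precision
complex number (16 bytes) and that "Mb" means 2**20 bytes. The function
name is hypothetical. Under those assumptions the (5120,5120) matrix for
nbnd = 1280 comes out at exactly 400, and the same formula gives the
per-matrix size for the larger nbnd requested in the input file.

```python
BYTES_PER_ELEMENT = 16  # assumption: double-precision complex (2 x 8 bytes)
MIB = 2**20             # assumption: "Mb" in the output means 2**20 bytes


def subspace_matrix_mib(nbnd: int, diago_david_ndim: int = 4) -> float:
    """Size in MiB of one square subspace matrix of dimension diago_david_ndim * nbnd."""
    dim = diago_david_ndim * nbnd
    return dim * dim * BYTES_PER_ELEMENT / MIB


if __name__ == "__main__":
    # nbnd = 1280 is the run that worked; nbnd = 3328 is the value in the input file.
    for nbnd, ndim in ((1280, 4), (3328, 4), (3328, 2)):
        size = subspace_matrix_mib(nbnd, ndim)
        print(f"nbnd={nbnd} ndim={ndim}: {size:.2f} MiB per subspace matrix")
```

Because the size grows with the square of diago_david_ndim * nbnd,
halving diago_david_ndim cuts each such matrix to a quarter of its size.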
>
> ## Technical specs:  Code was compiled on a Cray XT4 (unsure if compilation
> details would be helpful), and runs were performed on Cray XT4 nodes with
> two quad-core 2.3 GHz AMD Opteron processors with 16 GBytes of usable
> memory (requesting 4 cores per node).
>
> I've read here that the problem might be related to libraries/compilers
> (issues with PGI, ACML, etc.). If that is likely the case, I would be
> interested in insight regarding optimal compilation on Cray.
>
> Thanks in advance for any assistance, and I apologize if this question has
> essentially already been answered on the forum. I searched but did not come
> across an explicit solution to something matching this, though I admit that
> the general theme is present in several independent threads.
>
> Joseph Turnbull
> Department of Physics
> NC State University


