
G0W0 crashes on frontera supercomputer

Posted: Thu Feb 20, 2025 6:05 pm
by nicholas_dimakis1

Hello

I am trying to run G0W0 on a 2x2 MoS2 monolayer. The POSCAR is shown below:

Code:

Mo2 S4                                  
1.00000000000000
6.3806309700000003 0.0000000000000000 0.0000000000000000
-3.1903140134000001 5.5257893618000002 0.0000000000000000
0.0000000000000000 0.0000000000000000 14.8790035247999999
Mo S
4 8
Direct
0.1666665429999981 0.3333335530000028 0.2500000080000007
0.1666665429999981 0.8333335530000028 0.2500000080000007
0.6666665429999981 0.3333335530000028 0.2500000080000007
0.6666665429999981 0.8333335530000028 0.2500000080000007
0.3333335680000005 0.1666665050000020 0.3549265529975554
0.3333335680000005 0.6666665050000020 0.3549265529975554
0.8333335680000005 0.1666665050000020 0.3549265529975554
0.8333335680000005 0.6666665050000020 0.3549265529975554
0.3333335680000005 0.1666665050000020 0.1450734630024459
0.3333335680000005 0.6666665050000020 0.1450734630024459
0.8333335680000005 0.1666665050000020 0.1450734630024459
0.8333335680000005 0.6666665050000020 0.1450734630024459

I first run an SCF calculation, then an exact diagonalization (ALGO = EXACT), and finally the GW run. The INCAR for the GW step is given below:

Code:

SYSTEM  = MoS2

#NCORE  = 4
KPAR   = 2

ENCUT   = 500
#IBRION  = -1

ISMEAR  = 0
SIGMA   = 0.01
#NBANDS  = 96
NEDOS   = 3000

#LOPTICS = TRUE
#ALGO =EXACT
#NELM =1
NBANDS= 300
#ISIF = 2 ; IBRION = 2; NSW = 100


ALGO = EVGW0
NELMGW = 1
NOMEGA = 50;


PREC    = Single
EDIFF   = 1.e-8
LREAL   = Auto
LASPH   = True

Running grep memory OUTCAR gives the following output:

Code:

 total amount of memory used by VASP MPI-rank0    59275. kBytes
 available memory per node:   22.90 GB, setting MAXMEM to   23450
 files read and symmetry switched off, memory is now:
 total amount of memory used by VASP MPI-rank0   617433. kBytes
 min. memory requirement per mpi rank  21762.4 MB, per node 152336.9 MB
 all allocation done, memory is now:
 total amount of memory used by VASP MPI-rank0 22734744. kBytes

I am running this job on the TACC Frontera supercomputer using 20 nodes and keeping the number of MPI tasks per node low (140 tasks in total, i.e. 7 per node):

#SBATCH -n 140
#SBATCH -N 20

Each node has 192 GB of RAM, so the total RAM is about 3.8 TB. However, the GW run crashes.
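
As a rough cross-check of these numbers, the sketch below parses the "min. memory requirement" line from the grep output above and compares it with the RAM of a single node. It assumes decimal units (1 GB = 1000 MB) and 7 MPI tasks per node (140 tasks on 20 nodes). The value printed by VASP is labelled as a minimum, so the actual peak usage may be higher than what this estimate suggests.

```python
import re

# Line copied from the `grep memory OUTCAR` output quoted above.
outcar_line = " min. memory requirement per mpi rank  21762.4 MB, per node 152336.9 MB"

NODE_RAM_GB = 192.0          # per-node RAM as stated in the post
TASKS_PER_NODE = 140 // 20   # from "#SBATCH -n 140" and "#SBATCH -N 20"

pattern = re.compile(r"per mpi rank\s+([\d.]+)\s+MB, per node\s+([\d.]+)\s+MB")
match = pattern.search(outcar_line)
per_rank_mb, per_node_mb = (float(value) for value in match.groups())

# Consistency check: per-rank requirement times tasks per node.
expected_per_node_mb = per_rank_mb * TASKS_PER_NODE

# Fraction of one node's RAM taken by the reported minimum (decimal GB assumed).
fraction_of_node = per_node_mb / (NODE_RAM_GB * 1000.0)

print(f"tasks per node:           {TASKS_PER_NODE}")
print(f"per rank x tasks (MB):    {expected_per_node_mb:.1f}")
print(f"reported per node (MB):   {per_node_mb:.1f}")
print(f"fraction of node RAM:     {fraction_of_node:.2f}")
```

Under these assumptions the reported minimum already takes up roughly four fifths of one node, so the relevant limit is the memory of each individual node rather than the 3.8 TB total.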

I have similar crashes using the Lonestar 6 supercomputer.

Thank you-Nick


Re: G0W0 crashes on frontera supercomputer

Posted: Fri Feb 21, 2025 10:27 am
by manuel_engel1

Hello Nick,

Thanks for reaching out. I suspect that the crash is due to insufficient memory, but this is not yet fully clear to me. The first thing I would like you to try is to lower the memory requirements. This will tell us whether the crash is really due to memory constraints.
Here are a few things you can try to reduce memory consumption:

  • Lower ENCUTGW. This should drastically reduce the required memory.

  • Reduce the number of cores per node even further.

  • Reduce the number of bands in your calculation.

If the crash is due to insufficient memory, you will have to look for ways to converge your calculation within the given memory constraints.
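
To get a feeling for how strongly the cutoff affects memory, here is a back-of-the-envelope sketch. It relies on two assumptions that are not stated above: the number of plane waves below a cutoff grows as cutoff^(3/2), and the dominant memory cost is a response-function matrix whose size is the square of that number. The real memory footprint also depends on NBANDS, NOMEGA, the k-point mesh and the parallel distribution, so treat the result as a trend only. The cutoff values are illustrative.

```python
def planewave_ratio(cutoff_new: float, cutoff_old: float) -> float:
    """Ratio of plane-wave counts, assuming N_pw ~ cutoff**1.5."""
    return (cutoff_new / cutoff_old) ** 1.5


def response_matrix_ratio(cutoff_new: float, cutoff_old: float) -> float:
    """Ratio of matrix sizes, assuming an N_pw x N_pw response matrix."""
    return planewave_ratio(cutoff_new, cutoff_old) ** 2


# Illustrative values only: reduce a cutoff from 500 eV to 400 eV.
old_cutoff, new_cutoff = 500.0, 400.0

pw = planewave_ratio(new_cutoff, old_cutoff)
rm = response_matrix_ratio(new_cutoff, old_cutoff)

print(f"plane-wave count ratio:      {pw:.3f}")
print(f"response-matrix size ratio:  {rm:.3f}")
```

With this scaling, a 20% lower cutoff roughly halves the size of the response matrix. If ENCUTGW is not set explicitly, it is derived from ENCUT (to my knowledge the default is 2/3 of ENCUT), so lowering ENCUT lowers it by the same factor.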

One more thing that might be worth considering is the load distribution across the different nodes. With

Code:

#SBATCH -n 140
#SBATCH -N 20

Slurm will try to allocate the cores evenly across the nodes. However, it is a good idea to check whether this is actually the case, since a load imbalance could easily cause your calculation to run out of memory on one node. I recommend starting the job with

Code:

#SBATCH --nodes=20
#SBATCH --ntasks-per-node=7

which is more explicit about the placement of cores/tasks across nodes.
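
One way to verify the placement is to collect one hostname per MPI task (for example by running srun hostname inside the job and saving the output) and count the tasks per node. The sketch below does the counting. The helper names and the node names in the sample are hypothetical, and the sample list stands in for the real hostname output.

```python
from collections import Counter


def tasks_per_node(hostnames):
    """Count how many tasks reported each hostname (blank lines are ignored)."""
    return Counter(name.strip() for name in hostnames if name.strip())


def is_balanced(counts, expected_nodes, expected_per_node):
    """True if every expected node hosts exactly the expected number of tasks."""
    return (
        len(counts) == expected_nodes
        and all(n == expected_per_node for n in counts.values())
    )


# Hypothetical sample: 20 nodes with 7 tasks each, one hostname per task.
sample = [f"node{i:02d}" for i in range(20) for _ in range(7)]

counts = tasks_per_node(sample)
print("balanced:", is_balanced(counts, expected_nodes=20, expected_per_node=7))

# Report any node that deviates from the expected task count.
for host, n in sorted(counts.items()):
    if n != 7:
        print(f"{host}: {n} tasks")
```

If any node shows more than 7 tasks, that node needs correspondingly more memory and is the most likely place for an out-of-memory failure.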

Let me know how it goes.

Kind regards


Re: G0W0 crashes on frontera supercomputer

Posted: Fri Feb 21, 2025 3:56 pm
by nicholas_dimakis1

Thank you very much for your help. I used ENCUT = 400, and it now works.

Nick