Questions for merging different ML_AB files

Posted: Fri Sep 27, 2024 10:44 am
by jing_huang

Hi everyone,

I'd like to ask two questions about how to merge multiple ML_AB files in different situations.
1. According to the VASP wiki, we can simply concatenate several ML_AB files to create a new ML_AB. That makes sense for systems containing the same elements, but what should I do if the ML_ABs contain different elements? For example, I trained two ML_ABs: one for bulk MAPbI3 and one for MAPbI3 with a Br_{I} substitution.
2. Which is the better way to train a force field for different temperatures? (1) Training a higher-temperature MLFF starting from the low-temperature data, or (2) merging two separate ML_AB files from high- and low-temperature training?

Cheers,
Jing


Re: Questions for merging different ML_AB files

Posted: Fri Sep 27, 2024 11:46 am
by martin.schlipf

Yes, merging multiple ML_AB files is possible. It is best to first generate a proper ML_AB file for the system with the most atoms to see what the structure of the file looks like. This gives you confidence that you can produce the right format for the merged ML_AB file. If your ML_AB file is well formatted, you can run ML_MODE = SELECT, and VASP does not care whether the structures were put into the ML_AB file by on-the-fly training, by merging two ML_AB files, or by manually composing it from different POSCAR files.

Training at high temperature is probably the most efficient way because a more diverse set of configurations is visited. If you do not expect a phase change, it may even be sufficient to train only at high temperature, though it is always good to double-check this. Alternatively, you can run a temperature ramp (TEBEG, TEEND) to sample various temperatures in a single MD run. Merging two ML_AB files would also work, but it requires more DFT calculations to produce the MLFF because both files need to sample similar environments to get a general idea of the structure.


Re: Questions for merging different ML_AB files

Posted: Fri Sep 27, 2024 1:34 pm
by jing_huang

Dear Schlipf,

Thank you for your kind reply. I now know how to merge a couple of ML_ABs.
Is temperature-ramp training suitable if I want to study a solid-state phase transition? In the classic work on the phase transitions of MAPI (Phys. Rev. Lett. 122, 225701), they trained an MLFF at a few discrete temperatures instead of using a temperature ramp. I'm curious which approach is better, though I'm not sure this has a definite answer. ^-^

Cheers,
Jing



Re: Questions for merging different ML_AB files

Posted: Fri Sep 27, 2024 2:04 pm
by martin.schlipf

No, in the case of a phase transition, I would rather train at distinct temperatures. Usually it takes too long for a material to transition to the new phase, so it is better to set it up in the appropriate phase from the outset.


Re: Questions for merging different ML_AB files

Posted: Tue Oct 01, 2024 9:01 am
by ferenc_karsai

Usually we advise training with a temperature ramp for a single phase if you use the automatic determination of the threshold (ML_ICRITERIA=1). The reason is rather technical: ML_CTIFOR, the threshold for the Bayesian force errors, is determined as the average over the last few sampling steps. If we have very little training data (e.g. when starting from scratch), the averages of the Bayesian force errors can be badly mispredicted. If they are predicted too low, nothing bad happens, since we then simply keep learning until the predictions become reasonable. The problem occurs if the error threshold is too large. Then the predicted errors in all upcoming steps stay below the threshold and no learning happens at all. To overcome this, we can introduce a temperature ramp. This helps because the errors of the force field grow with temperature, so eventually the predicted errors will rise above the threshold.
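
To make the mechanism above more concrete, here is a deliberately simplified toy model in Python. It is not VASP's actual algorithm: the linear growth of the predicted error with temperature, the window length, the numerical values, and the names `linear_ramp` and `count_learning_steps` are all assumptions made only for illustration.

```python
from collections import deque


def linear_ramp(t_begin, t_end, n_steps):
    """Temperatures of a linear ramp from t_begin to t_end (toy helper)."""
    if n_steps < 2:
        return [float(t_begin)]
    step = (t_end - t_begin) / (n_steps - 1)
    return [t_begin + i * step for i in range(n_steps)]


def count_learning_steps(initial_threshold, temperatures,
                         error_per_kelvin=1e-4, window=3):
    """Count MD steps on which the toy model decides to learn.

    Assumptions (illustration only, not VASP's implementation):
    - the predicted error grows linearly with temperature,
    - learning is triggered when the predicted error exceeds the threshold,
    - after each learning step the threshold becomes the average of the
      last `window` errors that triggered learning.
    """
    threshold = initial_threshold
    recent_errors = deque(maxlen=window)
    learned = 0
    for temperature in temperatures:
        predicted_error = error_per_kelvin * temperature
        if predicted_error > threshold:
            learned += 1
            recent_errors.append(predicted_error)
            threshold = sum(recent_errors) / len(recent_errors)
    return learned


# Threshold that starts out too large for the errors seen at 300 K.
too_large = 0.05
fixed_t = count_learning_steps(too_large, [300.0] * 100)
ramped = count_learning_steps(too_large, linear_ramp(300.0, 800.0, 100))
print("learning steps at fixed temperature:", fixed_t)
print("learning steps with temperature ramp:", ramped)
```

In this toy picture the fixed-temperature run never exceeds the overestimated threshold, while the ramp eventually pushes the predicted error above it and learning resumes.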

If you train multiple phases, you can ideally use a temperature ramp for each phase when learning it (ramping up to close to, or slightly above, the phase transition temperature). It may be enough, however, to learn with a temperature ramp only for the first phase and then train the others at a fixed temperature. You will have to try that out and see how much training data is picked up.
Alternatively, you can set ML_ICRITERIA=0 and choose a value for ML_CTIFOR in the INCAR file. That way you can always train at a fixed temperature for a given phase.
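
As a rough sketch of the two options, the following Python helper prints the machine-learning and temperature tags for each variant. The function name `mlff_training_tags` and the example values (temperatures, NSW, and the ML_CTIFOR value) are hypothetical and need to be chosen for your system. The output is only a fragment, not a complete INCAR: the usual MD and electronic settings are still required.

```python
def mlff_training_tags(tebeg, teend=None, nsw=10000, ctifor=None):
    """Render an INCAR fragment for on-the-fly MLFF training.

    ctifor=None  -> automatic threshold (ML_ICRITERIA = 1), typically
                    combined with a temperature ramp (teend > tebeg).
    ctifor=value -> fixed threshold (ML_ICRITERIA = 0, ML_CTIFOR = value),
                    which allows training at a fixed temperature.
    All numerical values passed in are the caller's assumptions.
    """
    if teend is None:
        teend = tebeg  # no ramp: constant target temperature
    tags = {
        "ML_LMLFF": ".TRUE.",
        "ML_MODE": "train",
        "TEBEG": tebeg,
        "TEEND": teend,
        "NSW": nsw,
    }
    if ctifor is None:
        tags["ML_ICRITERIA"] = 1
    else:
        tags["ML_ICRITERIA"] = 0
        tags["ML_CTIFOR"] = ctifor
    return "\n".join(f"{key} = {value}" for key, value in tags.items())


# Variant 1: automatic threshold with a temperature ramp (example values).
print(mlff_training_tags(tebeg=200, teend=400, nsw=20000))
print()
# Variant 2: fixed threshold at a fixed temperature (example values).
print(mlff_training_tags(tebeg=300, nsw=20000, ctifor=0.02))
```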

Regarding the MAPbI3 paper, the production calculations were carried out using a temperature ramp, and we could see the phase transitions happen by themselves within this calculation. In some materials the phase transitions can even be observed in heating and cooling runs (with a hysteresis), while in others nothing happens. If you try this, make sure that the box is large enough and that you don't ramp too fast (by setting NSW quite large).
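
To judge whether a ramp is "too fast", it can help to convert the settings into a heating rate. The small Python sketch below assumes a linear ramp from TEBEG to TEEND over NSW steps with an MD time step POTIM given in fs. The function name `ramp_rate_k_per_ps` and the example numbers are hypothetical, and which rate is slow enough depends on the material.

```python
def ramp_rate_k_per_ps(tebeg, teend, nsw, potim_fs):
    """Average heating (or cooling) rate of a linear temperature ramp in K/ps.

    Assumes the target temperature changes linearly from tebeg to teend
    over nsw MD steps of length potim_fs (in femtoseconds).
    """
    if nsw <= 0 or potim_fs <= 0:
        raise ValueError("nsw and potim_fs must be positive")
    total_time_ps = nsw * potim_fs / 1000.0
    return (teend - tebeg) / total_time_ps


# Example values only: 200 K -> 400 K with a 2 fs time step.
for nsw in (10000, 100000):
    rate = ramp_rate_k_per_ps(200.0, 400.0, nsw, 2.0)
    print(f"NSW = {nsw:6d}: {rate:.1f} K/ps")
```

Increasing NSW by a factor of ten lowers the rate by the same factor, which is the sense in which a large NSW gives a slower ramp.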