Nov 4 – 5, 2024
IT4Innovations
Europe/Prague timezone

Bottom-up machine learning potentials for peptides

Nov 4, 2024, 12:40 PM
20m
atrium (IT4Innovations)

atrium

IT4Innovations

Studentská 6231/1B, 708 00 Ostrava-Poruba
Users' Talks I

Speaker

Erik Andris (IOCB Prague)

Description

Can larger peptides be described just by knowing their smaller constituents? Concretely: can we infer the potential energy surface (PES) of a 20-residue peptide from the PESs of single amino acids and dipeptides alone? To answer these long-standing questions,[1] we trained equivariant neural network potentials[2] on oligopeptides of varying sizes (1-3 residues; taken from the PeptideCS[3] and P-CONF_1.6M[4] datasets) and tested the performance of these potentials on larger peptides. Both the training and the evaluation data consisted of structures optimized at the GFN-2+ALPB(water) level of theory, some of them with fixed main-chain and side-chain dihedral angles. The energies were calculated at the BP86/D3Rezac, COSMO-RS level described previously.[3] Because the neural network has no built-in inductive biases besides the locality of interactions (5 Å distance cutoff) and equivariance, we can test whether any new interactions appear in larger peptides that are not present in the dipeptides used for training. Previous research in our group indicated that this should not be possible and that an energy function estimating the energy of longer chains from that of shorter ones could not be constructed.[5] Interestingly, a network trained on dipeptides and amino acids only can already predict the energies of pentapeptides with an RMSE of 1 kcal mol-1, and it can also correctly identify the global minimum of a larger protein out of 1000 candidate structures (Figure 1). We believe that the resulting potentials can be used immediately to significantly accelerate calculations. In addition, the excellent performance of the ML potentials indicates that a bottom-up theoretical approach to predicting protein structures from first principles might be possible.

Figure 1. (a) Description of the training process and the training peptide structures. (b) Example of a larger test peptide. (c,d) Actual (DFT) vs predicted (NN trained on mono- and dipeptides) absolute energies of (c) pentapeptides and (d) conformers of (b) (in eV; atomic energies were subtracted).
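The evaluation protocol described above (comparing DFT and NN energies after subtracting atomic energies, reporting an RMSE, and checking that the NN picks the same global-minimum conformer) can be sketched in a few lines. This is an illustrative toy example with made-up numbers, not the authors' actual pipeline; the function names and baseline values are assumptions.

```python
import numpy as np

def shifted_energies(total_energies, atom_counts, atomic_energies):
    """Subtract a composition-dependent baseline (sum of per-atom energies),
    as done for the absolute energies in Figure 1c,d."""
    baseline = atom_counts @ atomic_energies
    return total_energies - baseline

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference energies."""
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

# Toy data: 5 conformers of one composition, 3 element types (energies in eV).
atomic_energies = np.array([-13.6, -1030.0, -2040.0])  # illustrative baselines
atom_counts = np.tile([10, 4, 2], (5, 1))              # same formula, 5 conformers
dft_total = atom_counts @ atomic_energies + np.array([0.10, 0.02, 0.35, 0.50, 0.21])
nn_total = atom_counts @ atomic_energies + np.array([0.12, 0.01, 0.30, 0.55, 0.20])

dft_rel = shifted_energies(dft_total, atom_counts, atomic_energies)
nn_rel = shifted_energies(nn_total, atom_counts, atomic_energies)

print(round(rmse(nn_rel, dft_rel), 4))                     # RMSE on shifted energies
print(int(np.argmin(nn_rel)) == int(np.argmin(dft_rel)))   # same global minimum?
```

Subtracting the atomic baseline removes the large composition-dependent offset, so the RMSE and the argmin comparison reflect conformational energy differences only.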

References
[1] Schweitzer-Stenner, R. The relevance of short peptides for an understanding of unfolded and intrinsically disordered proteins. Phys. Chem. Chem. Phys. 2023, 25, 11908–11933.
[2] Batzner, S.; Musaelian, A.; Sun, L.; Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022, 13, 2453.
[3] Kalvoda, T.; Culka, M.; Rulíšek, L.; Andris, E. Exhaustive Mapping of the Conformational Space of Natural Dipeptides by the DFT-D3//COSMO-RS Method. J. Phys. Chem. B 2022, 126, 5949–5958.
[4] Culka, M.; Kalvoda, T.; Gutten, O.; Rulíšek, L. Mapping Conformational Space of All 8000 Tripeptides by Quantum Chemical Methods: What Strain Is Affordable within Folded Protein Chains? J. Phys. Chem. B 2021, 125, 58–69.
[5] Kalvoda, T. Studium konformačního chování krátkých peptidových fragmentů metodami kvantové chemie [Study of the Conformational Behavior of Short Peptide Fragments by Quantum Chemical Methods]. Master Thesis [Online], Charles University, Prague, July 2020. http://hdl.handle.net/20.500.11956/122714 (accessed Sep. 4, 2024).

Primary author

Erik Andris (IOCB Prague)

Co-authors

Lubomír Rulíšek; Tadeáš Kalvoda (Institute of Organic Chemistry and Biochemistry of the CAS); Ján Michael Kormaník (IOCB Prague)

Presentation materials

There are no materials yet.