With machine learning and its use in science rapidly growing in popularity, the need for high-quality training data is increasing. However, most researchers train their models either on their own data or on curated databases. With the growing emphasis on open science, a large amount of data from other researchers is now openly available, but such data often come without any guarantee of quality, so their suitability for machine learning is uncertain.
In this work, we assess the quality of the data in NOMAD, the largest open materials simulation database, and its practical applicability for training machine-learning interatomic potentials for atomistic simulations. We present a workflow designed to tackle several challenges associated with the NOMAD data: automatically filtering out results with low numerical accuracy, deduplicating structures in the training data, and combining results from multiple DFT implementations with different total-energy offsets.
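The last step, reconciling total-energy offsets between DFT codes, can be illustrated with a common technique: fitting per-code atomic reference energies by least squares and subtracting them, so that energies from different codes become comparable. The sketch below uses synthetic pure-silicon data with an arbitrary 2.0 eV/atom offset between two hypothetical codes; it is not the actual NOMAD workflow, only an assumption-laden toy model of the idea.

```python
import numpy as np

# Toy example: total energies of pure-Si structures from two hypothetical
# DFT codes, where "code B" carries a constant per-atom offset of 2.0 eV
# (values chosen arbitrarily for illustration, not taken from NOMAD).
n_atoms = np.array([2, 8, 64, 2, 8, 64], dtype=float)  # atoms per structure
code = np.array([0, 0, 0, 1, 1, 1])                    # 0 = code A, 1 = code B
e_per_atom = -4.6                                      # assumed common slope (eV/atom)
offset = np.where(code == 0, 0.0, 2.0)                 # per-atom offset of code B
energies = (e_per_atom + offset) * n_atoms             # total energies (eV)

# Design matrix: one atomic-reference-energy column per code; for a
# multi-element dataset this would have one column per (code, element) pair.
X = np.zeros((len(n_atoms), 2))
X[np.arange(len(n_atoms)), code] = n_atoms

# Least-squares fit of the per-code atomic references, then subtract them.
mu, *_ = np.linalg.lstsq(X, energies, rcond=None)
aligned = energies - X @ mu  # energies relative to fitted references

print(mu[1] - mu[0])  # recovered inter-code offset, close to 2.0 eV/atom
```

For real data the aligned quantity is a formation-like energy per structure, and the residuals carry the physics that the potential is trained on; here, with a single element and an exact linear model, the residuals vanish by construction.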
With this workflow, we have successfully trained silicon-based potentials using only simulations from NOMAD as training data. The resulting potential predicts phase stability at a level comparable to state-of-the-art potentials, while also accurately describing large-scale atomic systems, even at high temperatures. This demonstrates that using open data can significantly reduce the time and cost of generating suitable training datasets for machine-learning interatomic potentials.