Annotation
Although plants are an incredibly rich source of pharmaceutically relevant specialised metabolites, biosynthetic pathway elucidation in non-model plant species has proven challenging. Unlike in bacteria and many fungi, which contain biosynthetic operons, the genes of a given plant biosynthetic pathway are typically scattered randomly across the genome, making pathway discovery via genome mining nearly impossible. My lab is developing generalised workflows for connecting biosynthetic gene sequences (obtained using RNAseq) to their downstream metabolites (detected in LC-MS data). To this end, we have developed EnzymeExplorer, a targeted machine learning pipeline for predicting the enzymatic functions of terpene synthases directly from their amino acid sequences (Samusevich et al., bioRxiv 2025), and DreaMS, a self-supervised foundation machine learning model for tandem mass spectrometry, which outperforms state-of-the-art methods in a range of prediction tasks (Bushuiev et al., Nature Biotechnology 2025). We are building on these foundations towards the final goal: a full computational characterisation of the chemodiversity and biosynthetic potential of each plant species using easy-to-obtain experimental datasets.
Language
English
Tutor
Tomáš Pluskal, Ph.D., IOCB Prague
Tomáš Pluskal studied software engineering at the Faculty of Mathematics and Physics, Charles University in Prague. He earned his PhD in molecular biology at Hiroshima University in Japan, where he spent ten years. He subsequently worked for nearly five years at the Whitehead Institute at MIT in Cambridge, USA. Since 2020, he has been working at the Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, where he leads a research group focused on the study of plant metabolites. His research centres on the discovery of bioactive molecules in plants and the exploration of tools for the biosynthesis of these compounds.

This course was supported by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: 90254).
All presentations and educational materials of this course are provided under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
