[ONLINE] Data science with R and Python (PTC course)

Europe/Prague
online

Description

Annotation

The R part of the course focuses on the basics of exploratory data analysis in R, on presenting findings through visualization, and on the basics of statistical and machine learning modelling. It covers the basic workflow of exploratory analysis using packages from the 'tidyverse' collection, including packages for loading data, preprocessing, basic data exploration, and visualization. In the second part, we will work on the basics of modelling in R, starting with data preparation (missing data handling, one-hot encoding, etc.) and continuing with model training and model evaluation. The main tools in this part are the packages 'caret' and 'xgboost'.

The Python-oriented part introduces essential data science packages and demonstrates their use on real-world data analysis problems, showing how such problems can be tackled.

Level

beginner - intermediate

Language

English

Prerequisites

Basic knowledge of Python and/or R.

Purpose of the course (benefits for the attendees)

Target audience: users who want to use Python and/or R for data analysis and prototyping. Participants will learn basic and intermediate skills for exploratory data analysis and visualization in R and Python.

About the tutor(s)

Tomáš Martinovič obtained his PhD in computational sciences at IT4Innovations, VSB - Technical University of Ostrava in 2018. From 2015 to 2018 he worked in a team focused on analysis of complex dynamical systems, where he worked on scalable implementations of algorithms from the field of nonlinear time series analysis. Since the start of 2019 he has been working in a team focused on high performance data analysis with the defined objective of research and transfer of knowledge in cooperation with industry.

Stanislav Böhm has a PhD in computer science, and is a researcher at IT4Innovations. He is interested in distributed systems, verification, and scheduling.

Acknowledgements

This event was partially supported by the Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project "e-Infrastruktura CZ – LM2018140" and partially by the PRACE-6IP project - the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823767.

This course is supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90140).

    • Time to join the meeting
    • Exploratory analysis in R
    • 10:30 Comfort Break
    • XGBoost basics
    • 12:15 Lunch
    • Pandas Introduction
    • 14:45 Comfort Break
    • Selected Data Analytic problems