14–15 Oct 2015
VŠB - Technical University Ostrava, IT4Innovations building
Europe/Prague timezone

Annotation

Data volumes keep growing across a broad spectrum of applications, from traditional database applications and scientific simulations to emerging applications such as Web 2.0 and online social networks. To cope with this added weight of Big Data, we have recently witnessed a paradigm shift in the way data is processed, driven by the MapReduce model. First promoted by Google, MapReduce has become, thanks to the popularity of its open-source implementation Hadoop, the de facto programming paradigm for Big Data processing in large-scale data centers and clouds.

The goal of this tutorial is to serve as a first step towards exploring the Hadoop platform and to provide a short introduction to working with big data in Hadoop. An overview of Big Data will be presented, including definitions, the sources of Big Data, and the main challenges it introduces. We will then present MapReduce as an important programming model for Big Data processing in the Cloud. The Hadoop ecosystem and some of the major Hadoop features will then be discussed. Finally, we will discuss several approaches and methods used to optimise the performance of Hadoop in the Cloud.
 
Several hands-on sessions will be provided to study the operation of the Hadoop platform along with the implementation of MapReduce applications.
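
For readers new to the model, the following is a minimal sketch of the map, shuffle, and reduce steps applied to the classic word-count problem. It runs entirely in memory in plain Python and does not use Hadoop or its API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and the example is not taken from the course material.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values collected for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data processing"]))
```

In a real Hadoop job the map and reduce functions would run in parallel on many machines and the grouping step would be handled by the framework; here everything is sequential so that the data flow is easy to follow.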
 
This course is a substitute for the Hadoop session, which could not be held during the PRACE Winter School 2015.

Level

basic/intermediate

Language

English

About the tutor(s)

Dr. Shadi Ibrahim is a permanent Inria research scientist within the KerData research team. He obtained his Ph.D. in Computer Science from Huazhong University of Science and Technology in Wuhan, China, in 2011. His research interests are in cloud computing, big data management, data-intensive computing, high performance computing, virtualization technology, and file and storage systems. He has published several research papers in recognized big data and cloud computing research journals and conferences, including several papers on optimizing and improving Hadoop MapReduce performance in the cloud and one book chapter on the MapReduce framework.

VŠB - Technical University Ostrava, IT4Innovations building
207
Studentská 6231/1B 708 33 Ostrava–Poruba Czech Republic

Practicalities

Prerequisites

This tutorial assumes some experience with the Linux command line. Programming skills in Java are a plus. A laptop is needed to participate in the exercises.

NEW: For the practical session environment, we are going to use an Ubuntu virtual machine, to be installed on participants' laptops, preferably before the event. Please download it from
https://transfert.inria.fr/fichiers/22f50e749b1a3df51351fd4c9232e9e1/Ubuntu-without.zip

To run this virtual machine, install the free VMware Workstation Player, available at:
https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0

Registration

Registration is obligatory (registration form here). The deadline is 11/10/2015 23:30, or earlier if the course capacity is exhausted.

Fees

The event is provided free of charge for the participants.

Capacity

30 attendees

Practicalities

  • NEW: For training environment preparation, please see the Prerequisites section above.
  • See the page on transport and accommodation (in Czech) for how to get to the campus of VŠB - Technical University Ostrava and to the new IT4Innovations building.
  • Participants without an IT4Innovations card should arrive early enough to settle the formalities of obtaining an entry permit.
  • System documentation is available at http://support.it4i.cz/docs.