Course

Management and processing of massive data

Department	UFR de mathématique et d'informatique
Language(s) of instruction	English
Teaching hours	Lectures (CM): 12 h, Tutorials (TD): 9 h, Practical work (TP): 9 h
Open to students from other disciplines
Open to exchange students

Description

Data processing, real-time processing, distributed storage, distributed processing, stream processing

• Introduction and fundamental concepts

• Batch processing and MapReduce

• The Hadoop ecosystem

• Real-time processing using Spark

• Distributed storage and processing in NoSQL databases (MongoDB)

• Streaming with Spark

Learning outcomes

The main goal of this course is to develop the skills needed to handle large volumes and high-frequency flows of data using distributed storage, processing and computing solutions: (1) to understand the challenges of big data storage and processing, (2) to understand the limitations of traditional (relational) database systems and explore data-intensive alternatives, (3) to understand how the MapReduce algorithm works and learn how to use the Apache Hadoop ecosystem, (4) to learn basic distributed computing techniques using Apache Spark, (5) to learn how to handle streams of data using Spark, and (6) to discover real-time processing and storage relying on NoSQL technologies (using MongoDB).

Bibliography

• Big Data & Streaming : Le traitement streaming & temps réel des données en Big Data, Juvénal Chokogoue, Juvénal & Associés.
• Hadoop Devenez Opérationnel dans le monde du Big Data, Juvénal Chokogoue, Juvénal & Associés.
• Les bases de données NoSQL et le Big Data, Rudi Bruchez, Eyrolles.

Contact

Course coordinator(s)

Maxime Gueriau: gueriau@unistra.fr