Subject
Management and processing of massive data
Description
Data processing, real-time processing, distributed storage, distributed processing, stream processing
• Introduction and fundamental concepts
• Batch processing and MapReduce (see the sketch after this list)
• The Hadoop ecosystem
• Real-time processing using Spark
• Distributed storage and processing in NoSQL databases (MongoDB)
• Streaming with Spark
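As a minimal sketch of the MapReduce model covered in the batch-processing topic, the snippet below counts words with PySpark. It assumes a local Spark installation and a hypothetical input file `input.txt`; the map phase emits (word, 1) pairs and the reduce phase sums the counts per word.

```python
# Minimal word count in PySpark, illustrating the MapReduce model:
# the "map" phase emits (word, 1) pairs, the "reduce" phase sums them per key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("input.txt")                     # hypothetical input file
counts = (lines.flatMap(lambda line: line.split())   # map: line -> words
               .map(lambda word: (word, 1))          # map: word -> (word, 1)
               .reduceByKey(lambda a, b: a + b))     # reduce: sum counts per word

for word, count in counts.take(10):                  # print a sample of the result
    print(word, count)

spark.stop()
```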
Targeted skills
The main goal of this course is to develop the skills needed to handle very large volumes and high arrival rates of data using distributed storage, processing and computing solutions:
(1) understand the challenges of big data storage and processing;
(2) understand the limitations of traditional (relational) database systems and explore data-intensive alternatives;
(3) understand how the MapReduce algorithm works and learn to use the Apache Hadoop ecosystem;
(4) learn basic distributed computing techniques with Apache Spark;
(5) learn how to handle streams of data with Spark (see the sketch below);
(6) discover real-time processing and storage based on NoSQL technologies (using MongoDB).
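As an illustration of objective (5), here is a minimal Spark Structured Streaming sketch. It assumes PySpark is installed and that a text stream is available on a local socket, for example one started with `nc -lk 9999`; it maintains a running word count and prints it to the console.

```python
# Minimal Spark Structured Streaming sketch: running word count over text
# arriving on a local socket (start one with `nc -lk 9999` before running).
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

lines = (spark.readStream
              .format("socket")
              .option("host", "localhost")
              .option("port", 9999)
              .load())

# Split each incoming line into words, then keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# "complete" output mode re-emits the full counts table after each micro-batch.
query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .start())
query.awaitTermination()
```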
Organisation
Bibliography
• Big Data & Streaming : Le traitement streaming & temps réel des données en Big Data, Juvénal Chokogoue, Juvénal & Associés.
• Hadoop - Devenez opérationnel dans le monde du Big Data, Juvénal Chokogoue, Juvénal & Associés.
• Les bases de données NoSQL et le Big Data, Rudi Bruchez, Eyrolles.