Data science

Academic unit	Faculté des sciences économiques et de gestion (FSEG)
Language(s) of instruction	English
Level of instruction (languages only)	B2 - Advanced - Independent user
Teaching hours	Lectures (CM): 50 h
Period	Semester 3
Campus	Campus Esplanade
Open to students from other disciplines
Open to exchange students	6 ECTS suggested
Apogée code	EG35KU41

Description

1) The Data science part is structured in four macro blocks:

1. The art of learning from data. What is learning; supervised learning and function approximation; bias-variance trade-off; model accuracy, assessment and selection; cross validation.

2. Regression methods and regularization. Least squares revisited; model selection and regularization; subset selection methods; shrinkage methods (ridge, LASSO, LARS, elastic nets); dimension reduction methods (PCA, PLS).

3. Classification. Linear regression on indicator matrices; logistic regression; linear and quadratic discriminant analysis (LDA and QDA); hyperplane separation theorems; optimal separating hyperplane; “kernel trick”; Support Vector Machines (SVM).

4. Tree-based methods. Stratified feature space; tree-building process; recursive binary splitting and pruning.
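
To make blocks 1 and 2 concrete, here is a minimal Python sketch of ridge shrinkage combined with k-fold cross validation. It is only an illustration, not course material: the synthetic data, the function names `ridge_fit` and `kfold_mse`, the penalty grid, and the single-feature, no-intercept setting are all assumptions chosen to keep the example short.

```python
import random

def ridge_fit(xs, ys, lam):
    """Closed-form ridge estimate for one feature and no intercept:
    beta = sum(x*y) / (sum(x^2) + lam). With lam = 0 this is least squares."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def kfold_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        beta = ridge_fit(train_x, train_y, lam)
        mse = sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Synthetic data (assumption): true slope 2.0 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]  # hypothetical penalty grid
betas = {lam: ridge_fit(xs, ys, lam) for lam in lambdas}
cv_scores = {lam: kfold_mse(xs, ys, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)

for lam in lambdas:
    print(f"lambda={lam:6.1f}  beta={betas[lam]:.3f}  cv_mse={cv_scores[lam]:.3f}")
print("selected lambda:", best_lam)
```

The printed coefficients shrink toward zero as the penalty grows, and the cross-validated error is what selects among the candidate penalties. In practice a library such as scikit-learn would typically be used instead of hand-written loops.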

2) The Deep Learning part is structured in five macro blocks:

1. Machine learning paradigm; overfitting and underfitting; bias and variance; gradient-based learning; motivations for deep models; historical trends in artificial neural networks research.

2. Architecture design for deep feedforward neural networks; hidden layers, hidden and output units; universal approximation theorem; the language of computational graphs; back-propagation algorithm.

3. Surrogate loss functions; batch/minibatch deterministic and stochastic methods; main challenges in neural network optimization (ill-conditioning, local minima, flat regions, cliffs, etc.); stochastic gradient descent; momentum; Nesterov momentum; parameter initialization strategies; algorithms with adaptive learning rates; supervised pre-training.

4. Regularization strategies for deep models; parameter norm penalties; data augmentation and sparse representation; early stopping algorithm; ensemble methods; dropout; adversarial training.

5. Introduction to convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
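
As an illustration of the optimization topics in block 3, here is a minimal Python sketch of minibatch stochastic gradient descent with classical momentum, applied to a one-feature linear model. It is a sketch under stated assumptions, not course material: the synthetic data, the function names `mse` and `sgd_momentum`, and the hyperparameter values are all hypothetical choices made for the example.

```python
import random

def mse(w, b, data):
    """Mean squared error of the linear model y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd_momentum(data, lr=0.05, mu=0.9, batch_size=8, epochs=50, seed=0):
    """Minibatch SGD with classical momentum:
    v <- mu * v - lr * gradient, then parameter <- parameter + v."""
    rng = random.Random(seed)
    data = list(data)          # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0            # parameters, initialized at zero
    vw, vb = 0.0, 0.0          # velocity terms
    for _ in range(epochs):
        rng.shuffle(data)      # new minibatch order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the minibatch mean squared error.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            vw = mu * vw - lr * gw
            vb = mu * vb - lr * gb
            w += vw
            b += vb
    return w, b

# Synthetic data (assumption): y = 3x + 1 plus small Gaussian noise.
rng = random.Random(1)
data = []
for _ in range(64):
    x = rng.uniform(-1, 1)
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)))

initial_loss = mse(0.0, 0.0, data)
w, b = sgd_momentum(data)
final_loss = mse(w, b, data)
print(f"w={w:.3f}  b={b:.3f}  loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Setting `mu=0.0` recovers plain minibatch SGD, which makes it easy to compare the two update rules on the same data. Deep learning frameworks provide these optimizers ready-made; the loop is written out here only to expose the update.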

Learning outcomes

Upon completion of this course, students will have solid theoretical knowledge of the most effective (supervised) machine learning techniques and will have gained practice in implementing them. They will be able to:

Select the appropriate method based on the scope and available data.
Implement a range of regression and classification methods.
Develop predictive tools for economics and business problems.
Source, store and pre-process heterogeneous (large scale) data.
Choose, design and train supervised machine learning techniques.
Code in R and Python.
Speak in public to present an empirical project.

Course organization and follow-up

Supervised learning: Oral lectures (in English) [22h] and computer exercises with Python [8h]
Deep learning: Oral lectures (in English) [14h] and computer exercises with Python [6h]

Discipline(s)

Economics

Bibliography

Part 1:

- Hastie T., R. Tibshirani, J. Friedman, 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer.

- James G., D. Witten, T. Hastie, R. Tibshirani, 2013, An Introduction to Statistical Learning with Applications in R, Springer.

Part 2:

- Goodfellow I., Y. Bengio, A. Courville, 2016, Deep Learning, MIT Press.

- Chollet F., J. J. Allaire, 2017, Deep Learning with R, Manning Publications.

- Chollet F., 2017, Deep Learning with Python, Manning Publications.

Contact

Course instructor(s)

Stefano Bianchini: s.bianchini@unistra.fr