Data Mining
Public syllabus for 2025-2026
Academic overview
Teaching team
Learning time distribution
| Activity | Hours |
|---|---|
| Curriculum (total) | 42 |
| Curriculum: lecture | 28 |
| Curriculum: practice | 14 |
| Weekly (total) | 3 |
| Weekly: lecture | 2 |
| Weekly: practice | 1 |
| Exam hours | 6 |
| Individual study (total) | 102 |
| Bibliography study | 20 |
| Field study | 19 |
| Homework | 58 |
| Tutoring | 5 |
| Others | 0 |
| Overall | 150 |
Learning outcomes
Knowledge
- Understanding how to design decision models starting from data
- Knowledge of the concepts related to the data mining process
- Knowledge of the main data mining techniques: classification, clustering, regression, association rule mining
- Understanding the process of constructing data-driven models, the evaluation of their performance and their limitations
Skills
- Ability to use knowledge of data-driven models to design, implement and deploy decision-support systems in different application domains
- Ability to analyze data and extract knowledge from it
- Ability to identify the appropriate algorithm or method to classify and cluster data and to make predictions from data
- Ability to solve a real-world problem using data mining tools
- Ability to work in a data mining project team
Responsibility
- Identification of solutions in an autonomous manner
- Understanding all aspects related to data integrity and the risks of inappropriate usage of incomplete and/or biased data
Online platform
Course content
| Content | Methods | Obs |
|---|---|---|
| L1. Introduction to knowledge discovery from data. Basic concepts and main data mining tasks. Data categories and types of attributes. | Discourse, conversation, illustration by examples | 2 hours ([1] – ch 1, [2] – ch 1, [3] – ch 2) |
| L2-3. Data pre-processing. Basic transformations on data (discretization, normalization, standardization). Data cleaning and dealing with missing values. Attribute selection and feature extraction. Filter-based and wrapper-based methods. Dimensionality reduction (Principal Component Analysis). | Discourse, conversation, illustration by examples | 4 hours ([1]-ch 2, [8] – ch 3, ch 8) |
| L4-6. Classification methods. Basic concepts and performance measures (accuracy, precision, recall, specificity, sensitivity, ROC). Training, testing and cross-validation. Instance-based classifiers (k Nearest Neighbour). Decision tables and rule-based classifiers. Decision trees (ID3, C4.5). Probability-based classifiers (Bayesian networks). Neural networks. Support Vector Machines. | Discourse, conversation, illustration by examples | 6 hours ([1] – ch 10; [2] – ch 4; [3] – ch 4, sect 5.2, 5.3, 5.5, 5.6; [8] – ch 7) |
| L7-8. Clustering methods. Basic concepts (cluster, centroid). Similarity and dissimilarity measures. Cluster quality measures. Partitional algorithms (k-Means, Fuzzy C-Means). Hierarchical algorithms (agglomerative, divisive). Statistical-based clustering (EM algorithm). Spatial clustering (DBSCAN). | Discourse, conversation, illustration by examples | 4 hours ([1] – ch 6, [2] – ch 5, [3] – sect 5.8, [7] – ch 4) |
| L9. Association rules. Basic concepts (support, confidence, frequent itemsets). Rule quality measures. Apriori algorithm. | Discourse, conversation, illustration by examples | 2 hours ([1] – ch 4; [2] – ch 6; [3] – sect 5.4) |
| L10-12. Regression and time series processing. Nonlinear regression models. Regression trees. Radial Basis Function networks. Time series analysis (trend analysis, pattern detection, prediction models, anomaly detection). | Discourse, conversation, illustration by examples | 6 hours ([1] – ch 8, ch 11.5, 14, [2] – ch 9, [7] – ch 5) |
| L13. Ensemble methods. Voting. Bagging (Random forests). Boosting (AdaBoost). Stacking. | Discourse, conversation, illustration by examples | 4 hours ([1] – sect 11.8, [5]) |
| L14. Processing unstructured data and massive data. Revision. | Discourse, conversation, illustration by examples | 2 hours ([1] – ch 13, 18, [4], [2] – ch 7, [6]) |
Course bibliography
- [1] Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015
- [2] M. H. Dunham, Data Mining. Introductory and Advanced Topics, Pearson Education, 2003
- [3] F. Gorunescu, Data Mining. Concepts, Models and Techniques, Springer, 2011
- [4] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008
- [5] I. H. Witten, E. Frank, M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, 2011
- [6] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, http://infolab.stanford.edu/~ullman/mmds.html, 2020
- [7] D. Kroese, Z. Botev, T. Taimre, R. Vaisman, Data Science and Machine Learning: Mathematical and Statistical Methods, CRC Press, 2020
- [8] S. Skiena, The Data Science Design Manual, Springer, 2017
- [9] D. Zaharie, course materials (Google Classroom, code crsuf57u)
Seminar content
| Content | Methods | Obs |
|---|---|---|
| L1. Data sets and repositories. Introduction to Pandas and Scikit-learn packages. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L2. Data visualization. Data pre-processing. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L3. Data classification using rules, decision trees, probabilistic models, neural networks and SVMs. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L4. Data clustering using partitional, hierarchical and density-based algorithms. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L5. Extracting association rules. Applications in market basket analysis. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L6. Nonlinear regression & Time series analysis. Pre-processing. Analysis. Forecasting models. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
| L7. Ensemble methods. Applications in data classification. | Problem-based approach, dialogue, learning through collaboration | 2 hours |
Seminar bibliography
- Test data sets: http://archive.ics.uci.edu/ml/datasets, https://www.kaggle.com/
- J. Grus, Data Science from Scratch. First Principles with Python, O’Reilly, 2015
- D. Zaharie, lab support (Google Classroom, code crsuf57u)
Corroboration
The content is consistent with similar courses offered at other universities and covers the basic data mining techniques used to solve problems arising in various domains.
AI tools guidance
Evaluation and delivery
| Activity | Criteria | Methods | Percentage |
|---|---|---|---|
| C | | | |
| C | | | |
| S | | | |
Performance standards
- Knowledge of basic concepts in data mining
- Knowledge of the main classification, clustering, regression and forecasting algorithms
- Ability to identify the appropriate data mining method for solving real-world problems
- Ability to use software tools for data mining

The final mark is computed as the weighted average of the marks for the components specified at 10.4 and 10.5. The exam is passed if the average is at least 5 and each mark used to compute the average is at least 4. The same rule is used in every exam session, including re-examinations. A student can be re-examined only for the components whose current mark is below 5, unless the student explicitly asks to be re-examined.
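The pass rule can be illustrated with a short Python sketch. The component names, weights and marks below are hypothetical placeholders chosen for illustration, not the course's actual percentages; only the thresholds (average of at least 5, each mark at least 4, re-examination for marks below 5) come from the rule stated here.

```python
def final_mark(marks, weights):
    """Return (weighted average, passed) for a dict of component marks."""
    total_weight = sum(weights[name] for name in marks)
    average = sum(marks[name] * weights[name] for name in marks) / total_weight
    # Passed only if the average is at least 5 and every component is at least 4.
    passed = average >= 5 and all(mark >= 4 for mark in marks.values())
    return average, passed


def components_to_retake(marks):
    """Components eligible for re-examination by default: marks below 5."""
    return [name for name, mark in marks.items() if mark < 5]


# Hypothetical weights and marks, for illustration only.
weights = {"exam": 0.5, "project": 0.3, "seminar": 0.2}
marks = {"exam": 6, "project": 4, "seminar": 8}

average, passed = final_mark(marks, weights)
print(f"average = {average:.2f}, passed = {passed}")
print("eligible for re-examination:", components_to_retake(marks))
```

With these placeholder values the average is about 5.8 and every mark is at least 4, so the exam counts as passed. A high average does not compensate for a component below 4: replacing the project mark with 3 fails the rule whatever the other marks are.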
Additional info
(none)