Public Syllabus

Learning time distribution

Total curriculum	Lecture	Practice	Total weekly	Lecture	Practice
42	28	14	3	2	1
Exam hours
6
Individual Study	Bibliography study	Field study	Homework	Tutoring	Others
102	20	19	58	5	0
Overall
150

Learning outcomes

Knowledge

Understanding how to design decision models starting from data
Knowledge of the concepts related to the data mining process
Knowledge of the main data mining techniques: classification, clustering, regression, association rule mining
Understanding the process of constructing data-driven models, the evaluation of their performance and their limitations

Skills

Ability to use knowledge of data-driven models to design, implement and deploy decision support systems in different application domains
Ability to analyze data and extract knowledge from them
Ability to identify the algorithm/method appropriate to classify and cluster data and to make predictions starting from data
Ability to solve a real-world problem using data mining tools
Ability to work in a data mining project team

Responsibility

Identification of solutions in an autonomous manner
Understanding all aspects related to data integrity and the risks of inappropriate usage of incomplete and/or biased data

Online platform

https://classroom.google.com/c/ODQ0OTUyOTE0MzM1?cjc=crsuf57u

Course content

Content	Methods	Obs
L1. Introduction to knowledge discovery from data. Basic concepts and main data mining tasks. Data categories and types of attributes.	Discourse, conversation, illustration by examples	2 hours ([1] – ch 1; [2] – ch 1; [3] – ch 2)
L2-3. Data pre-processing. Basic transformations on data (discretization, normalization, standardization). Data cleaning and dealing with missing values. Attribute selection and feature extraction. Filter-based and wrapper-based methods. Dimensionality reduction (Principal Component Analysis).	Discourse, conversation, illustration by examples	4 hours ([1] – ch 2; [8] – ch 3, ch 8)
L4-6. Classification methods. Basic concepts and performance measures (accuracy, precision, recall, specificity, sensitivity, ROC). Training, testing and cross-validation. Instance-based classifiers (k-Nearest Neighbour). Decision tables and rule-based classifiers. Decision trees (ID3, C4.5). Probability-based classifiers (Bayesian networks). Neural networks. Support Vector Machines.	Discourse, conversation, illustration by examples	6 hours ([1] – ch 10; [2] – ch 4; [3] – ch 4, sect 5.2, 5.3, 5.5, 5.6; [8] – ch 7)
L7-8. Clustering methods. Basic concepts (cluster, centroid). Similarity and dissimilarity measures. Cluster quality measures. Partitional algorithms (k-Means, Fuzzy C-Means). Hierarchical algorithms (agglomerative, divisive). Statistical-based clustering (EM algorithm). Spatial clustering (DBSCAN).	Discourse, conversation, illustration by examples	4 hours ([1] – ch 6; [2] – ch 5; [3] – sect 5.8; [7] – ch 4)
L9. Association rules. Basic concepts (support, confidence, frequent itemsets). Measures of rule quality. Apriori algorithm.	Discourse, conversation, illustration by examples	2 hours ([1] – ch 4; [2] – ch 6; [3] – sect 5.4)
L10-12. Regression and time series processing. Nonlinear regression models. Regression trees. Radial Basis Networks. Time series analysis (trend analysis, pattern detection, prediction models, anomaly detection).	Discourse, conversation, illustration by examples	6 hours ([1] – ch 8, ch 11.5, 14; [2] – ch 9; [7] – ch 5)
L13. Ensemble methods. Voting. Bagging (Random forests). Boosting (AdaBoost). Stacking.	Discourse, conversation, illustration by examples	4 hours ([1] – sect 11.8; [5])
L14. Processing unstructured data and massive data. Revision.	Discourse, conversation, illustration by examples	2 hours ([1] – ch 13, 18; [4]; [2] – ch 7; [6])

Course bibliography

[1] C. C. Aggarwal, Data Mining: The Textbook, Springer, 2015
[2] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education, 2003
[3] F. Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011
[4] C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008
[5] I. H. Witten, E. Frank, M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, 2011
[6] J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, http://infolab.stanford.edu/~ullman/mmds.html, 2020
[7] D. Kroese, Z. Botev, T. Taimre, R. Vaisman, Data Science and Machine Learning: Mathematical and Statistical Methods, CRC Press, 2020
[8] S. Skiena, The Data Science Design Manual, Springer, 2017
[9] D. Zaharie, course materials (Google Classroom, code crsuf57u)

Seminar content

Content	Methods	Obs
L1. Data sets and repositories. Introduction to Pandas and Scikit-learn packages.	Problem-based approach, dialogue, learning through collaboration	2 hours
L2. Data visualization. Data pre-processing.	Problem-based approach, dialogue, learning through collaboration	2 hours
L3. Data classification using rules, decision trees, probabilistic models, neural networks and SVMs.	Problem-based approach, dialogue, learning through collaboration	2 hours
L4. Data clustering using partitional, hierarchical and density-based algorithms.	Problem-based approach, dialogue, learning through collaboration	2 hours
L5. Extracting association rules. Applications in market basket analysis.	Problem-based approach, dialogue, learning through collaboration	2 hours
L6. Nonlinear regression & Time series analysis. Pre-processing. Analysis. Forecasting models.	Problem-based approach, dialogue, learning through collaboration	2 hours
L7. Ensemble methods. Applications in data classification.	Problem-based approach, dialogue, learning through collaboration	2 hours
Bibliography:
• Test data collections: http://archive.ics.uci.edu/ml/datasets, https://www.kaggle.com/
• J. Grus, Data Science from Scratch: First Principles with Python, O’Reilly, 2015
• D. Zaharie, lab support (Google Classroom, code crsuf57u)

Seminar bibliography

The content is in accordance with similar courses provided at other universities, and it covers the basic aspects of data mining techniques in solving problems arising in various domains.

Corroboration

(none)

AI tools guidance

When completing assignments or the final project, generative Artificial Intelligence (AIgen) tools may be used to identify documentation resources and as an assistant in the coding stage. Any use of AIgen must be explicitly declared, together with the prompts used.

Evaluation and delivery

Activity	Criteria	Methods	Percentage
C	Knowledge of basic data mining techniques	Written test	20.0%
C	Correct identification of the appropriate technique to solve a given problem and ability to use and implement data mining algorithms	Presentation of a project	60.0%
S	Usage of software tools for data mining	Lab applications and homework	20.0%

Performance standards

Knowledge of basic concepts in data mining
Knowledge of the main classification, clustering, regression and forecasting algorithms
Ability to identify the appropriate data mining method for solving real-world problems
Ability to use software tools for data mining

The final mark is computed as the weighted average of the marks for the components specified at 10.4 and 10.5. The exam is passed if the average is at least 5 and each grade used to compute the average is at least 4. The same rule is used to compute the mark in every exam session, including re-examinations. A student is re-examined only for the components whose current mark is below 5, unless the student asks to be re-examined.
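
The marking rule above can be illustrated with a short Python sketch. It is only an illustration, under stated assumptions: the weights are read from the "Evaluation and delivery" table (written test 20%, project 60%, lab applications and homework 20%), the component names are hypothetical labels, and no rounding is applied because the syllabus does not specify any.

```python
# Illustrative sketch of the marking rule; not an official calculator.
# Weights (in percent) are taken from the "Evaluation and delivery" table.
# The dictionary keys are hypothetical labels chosen for this example.
WEIGHTS_PERCENT = {"written_test": 20, "project": 60, "lab": 20}

MIN_COMPONENT = 4  # each grade used in the average must be at least 4
MIN_AVERAGE = 5    # the weighted average must be at least 5


def final_mark(grades):
    """Return (weighted average, passed) for a dict of component grades."""
    average = sum(WEIGHTS_PERCENT[name] * grades[name]
                  for name in WEIGHTS_PERCENT) / 100
    passed = (average >= MIN_AVERAGE
              and all(grades[name] >= MIN_COMPONENT
                      for name in WEIGHTS_PERCENT))
    return average, passed


# A high average alone is not enough: the second student fails
# because the written test grade is below 4.
print(final_mark({"written_test": 6, "project": 8, "lab": 7}))  # (7.4, True)
print(final_mark({"written_test": 3, "project": 9, "lab": 8}))  # (7.6, False)
```

Whether the mark is rounded before the thresholds are checked is not stated in the syllabus, so the sketch compares the unrounded average.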

Additional info