Big Data Technologies
Public syllabus for 2025-2026
Academic overview
Teaching team
Learning time distribution
| Curriculum hours (total) | Lecture (total) | Practice (total) | Weekly hours | Weekly lecture | Weekly practice |
|---|---|---|---|---|---|
| 42 | 28 | 14 | 3 | 2 | 1 |

| Exam hours |
|---|
| 8 |

| Individual study (total) | Bibliography study | Field study | Homework | Tutoring | Others |
|---|---|---|---|---|---|
| 100 | 30 | 23 | 37 | 10 | 0 |

| Overall (hours) |
|---|
| 150 |
Learning outcomes
Knowledge
- (C1) Understanding statistical analysis methods and communicating results in line with statistical validation standards
- (C2) Understanding the principles of processing big data while ensuring fast access, consistency, security and confidentiality
Skills
- (A1) Identification of techniques and IT tools suitable for data processing and building models
- (A2) Installation, configuration and use of platforms for processing large volumes of data
- (A3) Building workflows by aggregating already implemented components
Responsibility
- Maintaining autonomy, integrity and independence in professional opinions
Online platform
Course content
| Content | Methods | Hours |
|---|---|---|
| L1-2. Big Data and Big Data Analysis. Cloud computing. | Lecturing, conversation, flipped learning. | 4 |
| L3-4. The CRISP-DM data cycle. LSEPI (legal, social, ethical and professional issues). | idem | 4 |
| L5-6. Loading, storing and cleaning Big Data. | idem + tutorial | 4 |
| L7-8. Handling volume. Distributed data processing. Simple analytics. | idem + tutorial | 4 |
| L9-10. Predictive analytics on time series. | idem + tutorial | 4 |
| L11-12. Data clustering vs. classification. Feature reduction. | idem + tutorial | 4 |
| L13-14. Handling velocity. Stream processing. | idem + tutorial | 4 |
Course bibliography
Wes McKinney, Python for Data Analysis, 3rd ed., O’Reilly, 2022. Available online at https://wesmckinney.com/book/
Seminar content
| Content | Methods | Hours and notes |
|---|---|---|
| 1. Big Data real-life examples. Challenges, opportunities, solutions. | Active learning, SCALE-UP. | 2 hours. Associated with L1-2 |
| 2. Applying CRISP-DM to real-life problems. Identifying LSEPI. | Active learning, SCALE-UP. | 2 hours. Associated with L3-4 |
| 3. Google Colab. Python overview. Loading, storing, and cleaning data in Python using pandas DataFrames. | Active learning | 2 hours. Associated with L5-6 |
| 4. Coursework review | Discussion | 2 hours. Project selection and discussion of requirements. Delivered online via Google Meet |
| 5. Data analytics with pandas and PySpark DataFrames. | Active learning | 2 hours. Associated with L7-8 |
| 6. Predictive analytics on time series: regression, ARIMA, neural networks. | Active learning | 2 hours. Associated with L9-10 |
| 7. Data clustering with PySpark and pandas. Classification with scikit-learn (sklearn). | Active learning | 2 hours. Associated with L11-12 |
Seminar bibliography
Marc Frincu, posts on ML for time series, https://saveawatthour.com/index.php/category/data-science/
PySpark documentation, https://spark.apache.org/docs/latest/api/python/index.html
The course bibliography also applies.
Corroboration
The course content is in line with similar courses held at other universities and with the requirements of IT companies. Moreover, at the national level, the introduction of Big Data technologies into the curricula of computer science graduates is being considered.
AI tools guidance
(none)
Evaluation and delivery
| Activity | Criteria | Methods | Percentage |
|---|---|---|---|
| C (course) | | | |
| S (seminar) | | | |
Performance standards
- Grade 5. Fair presentation with minimal information that covers the key aspects without providing any details. No critical analysis. Some experiments are presented, but they are incomplete and lack a conclusion. The problem is a Big Data problem, but the chosen dataset does not demonstrate this.
- Grade 6-7. Good presentation, but the focus is on description rather than critical analysis. Experiments are described, but the results are not discussed in detail; the presentation relies mostly on figures without explaining the plots.
- Grade 8-9. Very good presentation covering all aspects, though not in enough depth, so further clarification is required. Experiments are presented and the results are discussed with evidence of critical analysis.
- Grade 10. Exceptional presentation covering all aspects of the problem, the dataset and the Big Data challenges, with complete solutions and an extensive analysis of the experimental results. Close to publication or professional presentation standards.
Additional info
Given current trends in the use of generative AI (GenAI) tools, students will be assessed orally, through discussions of their code and implementation decisions. The assessment therefore evaluates students' understanding and critical thinking, not their ability to use GenAI tools.