Big Data Technologies

Public syllabus for 2025-2026

Academic overview

Programme
CS
Period
Year 1, Semester 2
Credits
5
Weeks
14

Curriculum placement

Appears in study plans

Teaching team

Course coordinator
Seminar coordinators
Elena Flondor

Learning time distribution

Curriculum hours: Total 42, Lecture 14, Practice 28
Weekly hours: Total 3, Lecture 1, Practice 2
Exam hours: 8
Individual study: Total 75 (Bibliography study 24, Field study 17, Homework 27, Tutoring 7, Others 0)
Overall: 125

Learning outcomes

Knowledge

  • (C1) Understanding statistical analysis methods and communicating results in line with statistical validation standards
  • (C2) Understanding the principles of Big Data processing, including fast access, consistency, security and confidentiality

Skills

  • (A1) Identification of techniques and IT tools suitable for data processing and building models
  • (A2) Installation, configuration and use of platforms for processing large volumes of data
  • (A3) Building workflows by aggregating already implemented components

Responsibility

  • Maintaining autonomy, integrity and independence in professional opinions

Online platform

Google Classroom

Course content

Content | Methods | Obs
L1. Big Data and Big Data Analysis. Cloud computing. | Lecturing, conversation, flipped learning. | 2
L2. The CRISP-DM data cycle. LSEPI. | idem | 2
L3. Loading, storing and cleaning Big Data. | idem + tutorial | 2
L4. Handling volume. Distributed data processing. Simple analytics. | idem + tutorial | 2
L5. Predictive analytics on time series. | idem + tutorial | 2
L6. Data clustering vs. classification. Feature reduction. | idem + tutorial | 2
L7. Handling velocity. Stream processing. | idem + tutorial | 2

Course bibliography

Wes McKinney, Python for Data Analysis, 3rd Ed., O’Reilly, 2022. Available online at https://wesmckinney.com/book/

Seminar content

Content | Methods | Obs
1-2. Big Data real-life examples. Challenges, opportunities, solutions. | Active learning, SCALE-UP. | 4. Associated with L1
3-4. Applying CRISP-DM to real-life problems. Identifying LSEPI. | Active learning, SCALE-UP. | 4. Associated with L2
5-6. Google Colab. Python overview. Loading, storing, and cleaning data in Python using pandas dataframes. | Active learning | 4. Associated with L3
7-8. Coursework revision. | Discussion | 4. Project selection and requirements discussion. ONLINE delivery on Google Meet
9-10. Data analytics with pandas and PySpark dataframes. | Active learning | 4. Associated with L4
11-12. Predictive analytics on time series: regression, ARIMA, neural networks. | Active learning | 4. Associated with L5
13-14. Data clustering with PySpark and pandas. Classification with sklearn. | Active learning | 4. Associated with L6

Seminar bibliography

Marc Frincu, Posts on ML for time series, https://saveawatthour.com/index.php/category/data-science/
PySpark Documentation: https://spark.apache.org/docs/latest/api/python/index.html
The bibliography for the lecture.

Corroboration

The course content is in line with similar courses held at other universities and with the requirements of IT companies. Moreover, at the national level, the introduction of Big Data technologies into the training curricula of computer science graduates is being considered.

AI tools guidance

Use of GenAI is allowed in both code generation and presentation preparation.

Evaluation and delivery

Activity | Criteria | Methods | Percentage
C | Project based evaluation | Poster analysis | 40.0%
S | Project based evaluation | Oral presentation | 60.0%

Performance standards

Grade 5. Fair presentation with minimal information, covering key aspects without providing any details. No critical analysis. Some experiments are presented but they are incomplete and lack a conclusion. The problem is a Big Data problem but the chosen dataset does not demonstrate it.
Grade 6-7. Good presentation, but the focus is on description rather than critical analysis. Experiments are described but results are not discussed in detail; the focus is mostly on images, without an explanation of the plots.
Grade 8-9. Very good presentation covering all aspects, but not in enough detail, so further details are required. Experiments are presented and results are discussed with evidence of critical analysis.
Grade 10. Exceptional presentation covering all aspects of the problem, the dataset and the Big Data challenges, with complete solutions and an extensive analysis of the experimental results. Close to publishing/professional presentation standards.

Additional info

Given the current trends in the use of GenAI, the student assessment will be oral and will include discussions of the code and of implementation decisions. The assessment therefore targets the students' ability to understand and think critically, not their ability to use GenAI tools.