Data Warehouses

Public syllabus for 2025-2026

Academic overview

Programme
BD
Period
Year 1, Semester 2
Credits
5
Weeks
14

Curriculum placement

Appears in study plans

Teaching team

Course coordinator
Seminar coordinators
Daniel Pop

Learning time distribution

Total
Curriculum total | Curriculum lecture | Curriculum practice | Weekly total | Weekly lecture | Weekly practice
42 | 28 | 14 | 3 | 2 | 1
Exam hours
4
Individual study | Bibliography study | Field study | Homework | Tutoring | Others
79 | 20 | 10 | 45 | 4 | 0
Overall
125

Learning outcomes

Knowledge

  • (6a03a0952355ae3a04d2f310) Knowledge of statistical methods specific to different types of processing and understanding of how machine learning algorithms can be used;
  • (6a03a0952355ae3a04d2f311) Understanding how large volumes of data can be processed in a distributed manner and the principles underlying high-performance computing;
  • (6a03a0952355ae3a04d2f312) Understanding how platforms specific to the processing of large volumes of data work;

Skills

  • (6a03a0952355ae3a04d2f315) Using concepts from computer science, mathematics and statistics in defining models and designing data analysis strategies and interpreting results;
  • (6a03a0952355ae3a04d2f316) Identifying statistical and machine learning techniques as well as appropriate IT tools for data processing and building decision models;
  • (6a03a0952355ae3a04d2f317) Designing, implementing and testing software modules suitable for processing and analyzing large volumes of data;
  • (6a03a0952355ae3a04d2f318) Using distributed parallel processing principles in designing scalable applications;
  • (6a03a0952355ae3a04d2f319) Using knowledge regarding the construction of data-driven models to develop decision support systems specific to different application domains.

Responsibility

  • (6a03a0952355ae3a04d2f31a) Responsibility to act in accordance with the interest of users;

Online platform

Google Classroom

Course content

Content | Methods | Obs
Introduction to Data Warehousing. Course and lab description | Lecture | [1]
Data Modelling for NoSQL systems. Conceptual modelling. Database Model | Lecture. Exercise in small groups | [1]
Columnar databases. Theoretical considerations. Case study: Apache Cassandra | Lecture | [3] Chapter 10
Cassandra Query Language | Lecture | [4]
Introduction to NoSQL Database Management Systems. Data Models. Distribution Models | Lecture | [3] Chapters 1-4
Consistency and Versioning in Distributed Database Systems | Lecture | [3] Chapters 5-6
Key-value databases. Case study: Amazon Dynamo. Document databases. Case study: MongoDB | Lecture | [3] Chapters 8, 9
Graph-oriented storage. Case study: Neo4j. Vector databases | Lecture | [3] Chapter 11
Data Warehouse Architecture and Technologies | Lecture | [2] Chapter 31
Design for Data Warehouses. Part 1 | Lecture | [2] Chapter 32
Design for Data Warehouses. Part 2 | Lecture | [2] Chapter 32
Cloud-based Data Warehousing solutions | Lecture | [1]
Reporting and analytics tools | Lecture | [1]
Design considerations for Big Data applications | Lecture | [3] Chapters 12, 13, 14, 15

Course bibliography

[1] Classroom materials
[2] T. Connolly, C. Begg – Database Systems. A Practical Approach to Design, Implementation and Management. 6th Edition. Pearson, 2014
[3] Pramod J. Sadalage, Martin Fowler – NoSQL Distilled. A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2013. https://martinfowler.com/books/nosql.html
[4] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/

Seminar content

Content | Methods | Obs
1. Project setup and connecting to DataStax Astra DB | Students will have to configure the project in an IDE of choice (IntelliJ is recommended) | This is project-based, individual work that will span 7 laboratories. The objective is to develop a distributed, scalable data platform for the management of financial data. The system will integrate data from different financial data providers.
2. Design and implementation of Data Access Layer | Self-conducted work. Tutoring | Idem
3. Data Ingestion Process | Self-conducted work | Idem
4-5. Consuming Data from Data Warehouse. Design and implementation of a REST API | Self-conducted work | Idem
6-7. Data Analytics with Apache Spark | Self-conducted work | Idem
Bibliography:
[1] Lab notes: https://drive.google.com/drive/folders/19YziQ2Iow8BDZV52a1NQhTibX-BGQp5t?usp=sharing
[2] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/
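
The first laboratories (data access layer, then data ingestion) could be approached along the lines of the sketch below. It is only an illustration under stated assumptions, not the project's required design: the `PriceRecord` schema, the two provider payload formats and all class and function names are hypothetical, and an in-memory dictionary stands in for Apache Cassandra so that the example runs without a cluster.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Protocol


@dataclass(frozen=True)
class PriceRecord:
    """One normalized daily closing price (hypothetical schema)."""
    symbol: str
    day: date
    close: float
    provider: str


class PriceRepository(Protocol):
    """Data access layer contract; a Cassandra-backed class could implement it."""
    def save(self, record: PriceRecord) -> None: ...
    def find_by_symbol(self, symbol: str) -> list[PriceRecord]: ...


class InMemoryPriceRepository:
    """Stand-in store: rows grouped by symbol and ordered by day."""

    def __init__(self) -> None:
        self._partitions: dict[str, dict[date, PriceRecord]] = {}

    def save(self, record: PriceRecord) -> None:
        # Saving the same (symbol, day) twice overwrites the earlier row.
        self._partitions.setdefault(record.symbol, {})[record.day] = record

    def find_by_symbol(self, symbol: str) -> list[PriceRecord]:
        rows = self._partitions.get(symbol, {})
        return [rows[day] for day in sorted(rows)]


def from_provider_a(raw: dict) -> PriceRecord:
    # Hypothetical payload: {"ticker": ..., "date": "YYYY-MM-DD", "close": ...}
    return PriceRecord(raw["ticker"].upper(), date.fromisoformat(raw["date"]),
                       float(raw["close"]), "provider_a")


def from_provider_b(raw: dict) -> PriceRecord:
    # Hypothetical payload: {"sym": ..., "d": "DD/MM/YYYY", "px": "..."}
    day, month, year = (int(part) for part in raw["d"].split("/"))
    return PriceRecord(raw["sym"].upper(), date(year, month, day),
                       float(raw["px"]), "provider_b")


def ingest(repo: PriceRepository, raw_rows: list[dict],
           normalize: Callable[[dict], PriceRecord]) -> int:
    """Normalize each raw row and store it; returns the number of rows written."""
    for raw in raw_rows:
        repo.save(normalize(raw))
    return len(raw_rows)


repo = InMemoryPriceRepository()
ingest(repo, [{"ticker": "acme", "date": "2025-03-03", "close": 10.5}], from_provider_a)
ingest(repo, [{"sym": "ACME", "d": "04/03/2025", "px": "11.0"}], from_provider_b)
history = repo.find_by_symbol("ACME")
print([(r.day.isoformat(), r.close, r.provider) for r in history])
```

Keeping the ingestion code dependent only on the `PriceRepository` contract means the in-memory class can later be swapped for one that talks to the real datastore, without touching the provider-specific normalizers.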

Seminar bibliography

Although the relational model remains the prevalent, de facto standard, the explosion of unstructured and semi-structured data over the past 25 years calls for new ways to store, retrieve and process large amounts of data. As a result, the local, national and international job market is increasingly seeking highly skilled personnel to develop, administer or configure NoSQL database management systems.

Corroboration

(none)

AI tools guidance

Students are allowed to use AI / GenAI tools to implement the project or to learn about, research or document the field of study. AI / GenAI tools are not allowed in the examination.

Evaluation and delivery

Activity Criteria Methods Percentage
C
  Criteria: The final test will assess the students’ level with respect to the following course objectives:
  • Understand the data warehouse paradigm: concepts, tools, methods and processes.
  • Good understanding of relational and aggregate data models.
  • Understand the challenges and solutions of distributed storage systems.
  • Design simple business domains using a NoSQL approach.
  Methods and percentages:
  • Quiz at each lecture: 20.0%
  • Final test (Exam): 40.0%
S
  Criteria: Assess the following elements:
  • Design and implement a distributed data platform for storage, retrieval and analysis (using a Machine Learning library) of large amounts of semi-structured and unstructured data, using a Java/Python technology stack and Apache Cassandra.
  • Use a datastore-specific query language to manipulate schema and data, e.g. Apache Cassandra Query Language.
  • Programmatically use NoSQL database systems (Apache Cassandra) from Java/Python programming environments.
  Methods and percentages:
  • Multiple-choice test in exam session: 40.0%

Performance standards

Minimal knowledge for passing this subject
------------------------------------------------
- Basic knowledge of the relational and aggregate data models,
- Understanding of the challenges of NoSQL data systems,
- Basic ability to query Apache Cassandra or MongoDB, both programmatically and ad hoc.

Students Evaluation Details
------------------------------
The final grade is computed as the weighted average of the grades obtained for components 10.4 and 10.5. The exam is passed if each individual grade obtained for components 10.4 and 10.5 (i.e. both lecture and lab evaluations) is greater than or equal to 5. Grades for components 10.4 and 10.5 are carried over to the next year if a student needs re-examination.

The lecture score is composed of the final test grade weighted with the results obtained on each lecture quiz. Quiz scores cannot be changed afterwards and are computed as the ratio of the student's quiz points to the total quiz points available across all lectures. A student who wants to improve a passing grade should retake the final test or the project assessment.

Final remark: All students are welcome to the tutoring meetings scheduled by the department.

Attendance Rules
--------------------
For all students:
- lectures, minimum 3
- laboratories, minimum 2

If the attendance rules are not fulfilled, the student needs to re-contract the subject next year.
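
The grading rules above can be illustrated with a short calculation. This is a sketch under assumptions the syllabus does not state explicitly: grades are taken to be on a 10-point scale, the quiz ratio is scaled to that same scale, the 20% / 40% / 40% weights from the evaluation table are applied as given, and no rounding is performed. The function names are hypothetical, and attendance rules are not modelled.

```python
QUIZ_WEIGHT = 0.20  # "Quiz at each lecture" share (evaluation table)
TEST_WEIGHT = 0.40  # "Final test (Exam)" share
LAB_WEIGHT = 0.40   # lab (S) component share


def quiz_grade(points_earned: float, points_available: float) -> float:
    """Quiz points over all available quiz points, scaled to 0-10 (assumed scale)."""
    if points_available <= 0:
        raise ValueError("points_available must be positive")
    return 10.0 * points_earned / points_available


def lecture_grade(final_test: float, quiz: float) -> float:
    """Final test weighted with quiz results, normalized within the lecture component."""
    return (TEST_WEIGHT * final_test + QUIZ_WEIGHT * quiz) / (TEST_WEIGHT + QUIZ_WEIGHT)


def final_grade(final_test: float, quiz: float, lab: float) -> tuple[float, bool]:
    """Return (weighted overall grade, passed); passing needs both components >= 5."""
    lecture = lecture_grade(final_test, quiz)
    overall = (TEST_WEIGHT + QUIZ_WEIGHT) * lecture + LAB_WEIGHT * lab
    passed = lecture >= 5 and lab >= 5
    return overall, passed


quiz = quiz_grade(points_earned=30, points_available=40)  # 7.5
overall, passed = final_grade(final_test=6, quiz=quiz, lab=8)
print(round(overall, 2), passed)  # 7.1 True
```

Under these assumptions a high lecture score cannot compensate for a lab grade below 5: `final_grade(9, 10, 4)` reports a failed exam even though the weighted average is above 5.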

Additional info

(none)