Data Warehouses
Public syllabus for 2025-2026
Academic overview
Teaching team
Learning time distribution
| Category | Item | Hours |
|---|---|---|
| Total | Curriculum | 42 |
| Total | Lecture | 28 |
| Total | Practice | 14 |
| Weekly | Total | 3 |
| Weekly | Lecture | 2 |
| Weekly | Practice | 1 |
| Exam hours | | 4 |
| Individual study | Total | 79 |
| Individual study | Bibliography study | 20 |
| Individual study | Field study | 10 |
| Individual study | Homework | 45 |
| Individual study | Tutoring | 4 |
| Individual study | Others | 0 |
| Overall | | 125 |
Learning outcomes
Knowledge
- Knowledge of statistical methods specific to different types of processing and understanding of how machine learning algorithms can be used;
- Understanding how large volumes of data can be processed in a distributed manner and the principles underlying high-performance computing;
- Understanding how platforms specific to the processing of large volumes of data work.
Skills
- Using concepts from computer science, mathematics and statistics in defining models, designing data analysis strategies and interpreting results;
- Identifying statistical and machine learning techniques as well as appropriate IT tools for data processing and building decision models;
- Designing, implementing and testing software modules suitable for processing and analyzing large volumes of data;
- Using distributed parallel processing principles in designing scalable applications;
- Using knowledge regarding the construction of data-driven models to develop decision support systems specific to different application domains.
Responsibility
- Responsibility to act in accordance with the interest of users.
Online platform
Course content
| Content | Methods | Obs |
|---|---|---|
| Introduction to Data Warehousing. Course and lab description | Lecture | [1] |
| Data Modelling for NoSQL systems. Conceptual modelling. Database Model | Lecture. Exercise in small groups | [1] |
| Columnar databases. Theoretical considerations. Case study Apache Cassandra | Lecture | [3] Chapter 10 |
| Cassandra Query Language | Lecture | [4] |
| Introduction to NoSQL Database Management Systems. Data Models. Distribution Models | Lecture | [3] Chapters 1 - 4 |
| Consistency and Versioning in Distributed Database Systems | Lecture | [3] Chapters 5 - 6 |
| Key-value databases. Case study Amazon Dynamo. Document databases. Case study MongoDB | Lecture | [3] Chapters 8, 9 |
| Graph-oriented storage. Case study Neo4j. Vector databases | Lecture | [3] Chapter 11 |
| Data Warehouse Architecture and Technologies | Lecture | [2] Chapter 31 |
| Design for Data Warehouses. Part 1 | Lecture | [2] Chapter 32 |
| Design for Data Warehouses. Part 2 | Lecture | [2] Chapter 32 |
| Cloud-based Data Warehousing solutions | Lecture | [1] |
| Reporting and analytics tools | Lecture | [1] |
| Design considerations for Big Data applications | Lecture | [3] Chapters 12, 13, 14, 15 |
Course bibliography
[1] Classroom materials
[2] T. Connolly, C. Begg. Database Systems: A Practical Approach to Design, Implementation, and Management. 6th Edition. Pearson, 2014
[3] Pramod J. Sadalage, Martin Fowler. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2013. https://martinfowler.com/books/nosql.html
[4] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/
Seminar content
| Content | Methods | Obs |
|---|---|---|
| 1. Project setup and connecting to DataStax Astra DB | Students will have to configure the project in an IDE of their choice (IntelliJ is recommended) | This is project-based, individual work that spans 7 laboratories. The objective is to develop a distributed, scalable data platform for the management of financial data. The system will integrate data from different financial data providers. |
| 2. Design and implementation of Data Access Layer | Self-conducted work. Tutoring | Idem |
| 3. Data Ingestion Process | Self-conducted work | Idem |
| 4 - 5. Consuming Data from Data Warehouse. Design and implementation of a REST API | Self-conducted work | Idem |
| 6 - 7. Data Analytics with Apache Spark | Self-conducted work | Idem |
Seminar bibliography
[1] Lab notes: https://drive.google.com/drive/folders/19YziQ2Iow8BDZV52a1NQhTibX-BGQp5t?usp=sharing
[2] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/
Corroboration
Although the relational model remains the de facto standard, the explosion of unstructured and semi-structured data over the past 25 years calls for new ways to store, retrieve and process large amounts of data. As a result, the local, national and international job market increasingly seeks highly skilled personnel to develop, administer or configure NoSQL database management systems.
AI tools guidance
Evaluation and delivery
| Activity | Criteria | Methods | Percentage |
|---|---|---|---|
| C | | | |
| S | | | |
Performance standards
Minimal knowledge for passing this subject
------------------------------------------
- Basic knowledge of the relational and aggregate data models
- Understanding of the challenges of NoSQL data systems
- Basic ability to query Apache Cassandra or MongoDB, both programmatically and ad hoc

Students Evaluation Details
---------------------------
The final grade is computed as the weighted average of the grades obtained for components 10.4 and 10.5. The exam is passed if each individual grade for components 10.4 and 10.5 (i.e. both the lecture and the lab evaluation) is greater than or equal to 5. Grades for components 10.4 and 10.5 are carried over to the next year if a student needs re-examination.

The lecture score is composed of the final test grade weighted with the results obtained on each lecture quiz. Quiz scores cannot be changed afterwards and are computed as the ratio of the student's quiz points to the total quiz points available across all lectures. A student who wants to improve a passing grade should retake the final test or the project assessment.

Final remark: All students are welcome at the tutoring meetings scheduled by the department.

Attendance Rules
----------------
For all students:
- lectures: minimum 3
- laboratories: minimum 2

If the attendance rules are not fulfilled, the student needs to re-contract the subject next year.
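As an illustration of the rules under Students Evaluation Details, the sketch below shows the quiz ratio, the weighted average of the lecture and lab components, and the pass condition. The syllabus does not state the component weights or how the final test and quiz results are combined, so the weights are left as caller-supplied parameters and all function names are hypothetical.

```python
PASS_MARK = 5.0  # from the syllabus: each component grade must be >= 5


def quiz_ratio(points_earned: float, points_available: float) -> float:
    """Quiz score as the ratio of earned points to all available quiz points."""
    if points_available <= 0:
        raise ValueError("points_available must be positive")
    return points_earned / points_available


def final_grade(lecture: float, lab: float,
                w_lecture: float, w_lab: float) -> tuple[float, bool]:
    """Return (weighted average, passed) for the lecture and lab components.

    w_lecture and w_lab are assumptions: the actual percentages are not
    given in this syllabus and must be supplied by the caller.
    """
    if w_lecture < 0 or w_lab < 0 or w_lecture + w_lab <= 0:
        raise ValueError("weights must be non-negative and not both zero")
    grade = (w_lecture * lecture + w_lab * lab) / (w_lecture + w_lab)
    # Both components must reach the pass mark, regardless of the average.
    passed = lecture >= PASS_MARK and lab >= PASS_MARK
    return grade, passed


if __name__ == "__main__":
    # Equal weights are used here purely as an example.
    print(final_grade(8.0, 4.0, 0.5, 0.5))
    print(quiz_ratio(18, 24))
```

With equal example weights, a lecture grade of 8 and a lab grade of 4 average to 6 but do not pass, because the lab component is below 5.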
Additional info
(none)