Public Syllabus

Learning time distribution

Total hours
Curriculum (total)	Lecture (total)	Practice (total)	Weekly total	Weekly lecture	Weekly practice
42	28	14	3	2	1
Exam hours
4
Individual study (total)	Bibliography study	Field study	Homework	Tutoring	Others
79	20	10	45	4	0
Overall (total hours)
125

Learning outcomes

Knowledge

Knowledge of statistical methods specific to different types of data processing, and understanding of how machine learning algorithms can be applied;
Understanding of how large volumes of data can be processed in a distributed manner, and of the principles underlying high-performance computing;
Understanding of how platforms for processing large volumes of data work;

Skills

Using concepts from computer science, mathematics and statistics to define models, design data analysis strategies and interpret results;
Identifying appropriate statistical and machine learning techniques, as well as suitable IT tools, for data processing and for building decision models;
Designing, implementing and testing software modules suitable for processing and analyzing large volumes of data;
Applying distributed parallel processing principles when designing scalable applications;
Using knowledge of how data-driven models are built to develop decision support systems for different application domains.

Responsibility

Responsibility to act in accordance with the interests of users.

Online platform

Google Classroom

Course content

Content	Methods	Obs
Introduction to Data Warehousing. Course and lab description	Lecture	[1]
Data Modelling for NoSQL systems. Conceptual modelling. Database Model	Lecture. Exercise in small groups	[1]
Columnar databases. Theoretical considerations. Case study Apache Cassandra	Lecture	[3] Chapter 10
Cassandra Query Language	Lecture	[4]
Introduction to NoSQL Database Management Systems. Data Models. Distribution Models	Lecture	[3] Chapters 1 - 4
Consistency and Versioning in Distributed Database Systems	Lecture	[3] Chapters 5 - 6
Key-value databases. Case study Amazon Dynamo. Document databases. Case study MongoDB	Lecture	[3] Chapters 8, 9
Graph-oriented storage. Case study Neo4j. Vector databases	Lecture	[3] Chapter 11
Data Warehouse Architecture and Technologies	Lecture	[2] Chapter 31
Design for Data Warehouses. Part 1	Lecture	[2] Chapter 32
Design for Data Warehouses. Part 2	Lecture	[2] Chapter 32
Cloud based Data Warehousing solutions	Lecture	[1]
Reporting and analytics tools	Lecture	[1]
Design considerations for Big Data applications	Lecture	[3] Chapters 12, 13, 14, 15

Course bibliography

[1] Classroom materials
[2] T. Connolly, C. Begg – Database Systems. A Practical Approach to Design, Implementation and Management. 6th Edition. Pearson, 2014
[3] Pramod J. Sadalage, Martin Fowler – NoSQL Distilled. A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2013. https://martinfowler.com/books/nosql.html
[4] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/

Seminar content

Content	Methods	Obs
1. Project setup and connecting to DataStax Astra DB	Students configure the project in an IDE of their choice (IntelliJ is recommended)	This is an individual, project-based assignment spanning 7 laboratories. The objective is to develop a distributed, scalable data platform for managing financial data. The system will integrate data from different financial data providers.
2. Design and implementation of Data Access Layer	Self-conducted work. Tutoring	Idem
3. Data Ingestion Process	Self-conducted work	Idem
4 - 5. Consuming Data from Data Warehouse. Design and implementation of a REST API	Self-conducted work	Idem
6 - 7. Data Analytics with Apache Spark	Self-conducted work	Idem

Seminar bibliography

[1] Lab notes: https://drive.google.com/drive/folders/19YziQ2Iow8BDZV52a1NQhTibX-BGQp5t?usp=sharing
[2] Apache Cassandra Documentation: https://cassandra.apache.org/doc/latest/

Corroboration

Although the relational approach remains the prevalent, de facto standard for databases, the explosion of unstructured and semi-structured data over the past 25 years calls for new ways to store, retrieve and process large amounts of data. Consequently, local, national and international employers are increasingly seeking highly skilled personnel to develop, administer and configure NoSQL database management systems.

AI tools guidance

Students are allowed to use AI/GenAI tools to implement the project or to learn about, research and document the field of study. AI/GenAI tools are not allowed during the examination.

Evaluation and delivery

Activity	Criteria	Methods	Percentage
C	The final test assesses the students' level with respect to the following course objectives: - Understand the data warehouse paradigm: concepts, tools, methods and processes. - Understand relational and aggregate data models. - Understand the challenges of distributed storage systems and their solutions. - Design simple business domains using a NoSQL approach.	Quiz at each lecture; Final test (exam)	20.0%; 40.0%
S	Assesses the following elements: - Design and implementation of a distributed data platform for storing, retrieving and analyzing (using a machine learning library) large amounts of semi-structured and unstructured data, using a Java/Python technology stack and Apache Cassandra. - Use of a datastore-specific query language to manipulate schema and data, e.g. Cassandra Query Language. - Programmatic use of NoSQL database systems (Apache Cassandra) from Java/Python programming environments.	Multiple-choice test in exam session	40.0%

Performance standards

Minimal knowledge for passing this subject
------------------------------------------
- Basic knowledge of the relational and aggregate data models;
- Understanding of the challenges of NoSQL data systems;
- Ability to query Apache Cassandra or MongoDB at a basic level, both programmatically and ad hoc.

Student evaluation details
--------------------------
The final grade is computed as the weighted average of the grades obtained for components 10.4 (lecture) and 10.5 (laboratory). The subject is passed only if each of these two grades, i.e. both the lecture and the laboratory evaluation, is greater than or equal to 5. The grades for components 10.4 and 10.5 are carried over to the next year if a student needs re-examination.

The lecture score combines the final test grade with the results obtained on the quizzes given at each lecture. Quiz scores cannot be changed afterwards; they are computed as the ratio of the student's quiz points to the total quiz points available across all lectures. Students who want to increase a passing grade should retake the final test or the project assessment.

Final remark: all students are welcome at the tutoring meetings scheduled by the department.

Attendance rules
----------------
For all students:
- lectures: minimum 3
- laboratories: minimum 2
If the attendance rules are not fulfilled, the student must re-enroll in the subject the following year.
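The grading rules can be summarised as a small calculation. The sketch below is a hypothetical, non-official illustration in Python that uses the weights from the evaluation table (quizzes 20%, final test 40%, seminar/laboratory 40%) and the rule that each component must reach at least 5. It assumes, since the syllabus does not say so, that grades use a 1-10 scale, that the quiz grade is 10 × (quiz points earned / quiz points available), and that the lecture grade keeps the same 20:40 quiz-to-final-test ratio. The names `quiz_grade`, `evaluate` and `GradeReport` are illustrative only.

```python
from dataclasses import dataclass

# Weights taken from the "Evaluation and delivery" table.
QUIZ_WEIGHT = 0.20        # quiz at each lecture (C)
FINAL_TEST_WEIGHT = 0.40  # final test in the exam session (C)
SEMINAR_WEIGHT = 0.40     # seminar/laboratory evaluation (S)
PASS_THRESHOLD = 5.0      # each component must reach at least 5


@dataclass
class GradeReport:
    lecture_grade: float  # component 10.4
    seminar_grade: float  # component 10.5
    final_grade: float
    passed: bool


def quiz_grade(points_earned: float, points_available: float) -> float:
    """Quiz points earned over points available, scaled to 10 (ASSUMPTION: 1-10 scale)."""
    if points_available <= 0:
        raise ValueError("points_available must be positive")
    return 10.0 * points_earned / points_available


def evaluate(quiz_points: float, quiz_total: float,
             final_test: float, seminar: float) -> GradeReport:
    """Hypothetical helper: weighted final grade plus the per-component pass rule."""
    quiz = quiz_grade(quiz_points, quiz_total)
    lecture_weight = QUIZ_WEIGHT + FINAL_TEST_WEIGHT
    # ASSUMPTION: the lecture grade keeps the 20:40 quiz/final-test ratio.
    lecture = (QUIZ_WEIGHT * quiz + FINAL_TEST_WEIGHT * final_test) / lecture_weight
    # Weighted average over all three weights (they sum to 1.0).
    final = lecture_weight * lecture + SEMINAR_WEIGHT * seminar
    # Both components must pass independently; a high average is not enough.
    passed = lecture >= PASS_THRESHOLD and seminar >= PASS_THRESHOLD
    return GradeReport(lecture, seminar, final, passed)


if __name__ == "__main__":
    report = evaluate(quiz_points=15, quiz_total=20, final_test=6, seminar=8)
    # prints: lecture=6.50 final=7.10 passed=True
    print(f"lecture={report.lecture_grade:.2f} "
          f"final={report.final_grade:.2f} passed={report.passed}")
```

Under these assumptions, a student with 15 of 20 quiz points, 6 on the final test and 8 on the project would have a lecture grade of 6.5 and a final grade of 7.1, while a seminar grade below 5 would fail the subject regardless of the weighted average.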

Additional info