Big Data Processes

2024/2025

Content, progress and pedagogy of the module

Disclaimer. This is an English translation of the module. In case of discrepancy between the translation and the Danish version, the Danish version of the module is valid.

Learning objectives

Knowledge

  • Data preparation, such as data quality, data cleaning, outlier detection, data generation and augmentation, data transformation, and data wrangling.

  • Data integration, such as schema matching, schema mapping, schema integration, global-as-view and local-as-view, wrappers/mediators, advanced ETL, change detection and management, and metadata and master data management.

  • DataOps, such as data flow specification, testing, and management; versioning; deployment; robustness/resilience; data auditing and control; continuous integration, testing, and deployment; and performance tuning on a distributed infrastructure.

  • The concepts above are illustrated by advanced realistic use cases, typically based on open data.

Skills

  • Must be able to integrate data from multiple sources held by separate providers into a unified and consistent representation.

  • Must be able to cleanse and prepare realistic Big Data sets with relevant tools while managing their volume, velocity, variety, and veracity.

  • Must be able to apply DataOps using best practices on realistic data flows running on a distributed infrastructure.

Competences

  • The student is able to pick relevant techniques for data integration, data preparation, and DataOps for a given realistic Big Data scenario.

  • The student is able to understand and reason about the results of Big Data processes.

Exam

Name of exam: Big Data Processes
Type of exam: Written or oral exam
ECTS: 5
Assessment: 7-point grading scale
Type of grading: Internal examination
Criteria of assessment: The criteria of assessment are stated in the Examination Policies and Procedures

Facts about the module

Danish title: Big data processer
Module code: DSNDVK103
Module type: Course
Duration: 1 semester
Semester: Spring
ECTS: 5
Language of instruction: Danish
Empty-place Scheme: Yes
Location of the lecture: Campus Aalborg
Responsible for the module:

Organisation

Education owner: Master of Science (MSc) in Data Science and Machine Learning
Study Board: Study Board of Computer Science
Department: Department of Computer Science
Faculty: The Technical Faculty of IT and Design