Disclaimer. This is an English translation of the module. In case of discrepancy between the translation and the Danish version, the Danish version of the module is valid.
Data preparation, such as data quality, data cleaning, outlier detection, data generation and augmentation, data transformation, and data wrangling.
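To make the cleaning and outlier-detection topics concrete, here is a minimal sketch in Python using pandas; the column names, values, and the 1.5 × IQR threshold are illustrative assumptions, not tooling prescribed by the module:

```python
import pandas as pd

# Hypothetical readings; column names and values are illustrative only.
df = pd.DataFrame({"sensor_id": [1, 1, 2, 2, 2],
                   "reading": [21.3, 21.7, 22.1, 98.6, 21.9]})

# Tukey's rule: flag readings outside 1.5 * IQR of the quartiles.
q1, q3 = df["reading"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["reading"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

clean = df[in_range]      # rows kept for downstream steps
outliers = df[~in_range]  # rows to inspect, impute, or discard
print(outliers)
```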
Data integration, such as schema matching, schema mapping, schema integration, global-as-view and local-as-view mappings, wrappers/mediators, advanced ETL, change detection and management, and metadata and master data management.
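As a small illustration of schema matching and mapping in the global-as-view style, the following Python sketch rewrites two hypothetical source schemas into one integrated schema; all provider names and columns are invented for the example:

```python
import pandas as pd

# Two hypothetical providers expose customers under different schemas.
src_a = pd.DataFrame({"cust_id": [1], "full_name": ["Ada Lovelace"]})
src_b = pd.DataFrame({"id": [2], "first": ["Alan"], "last": ["Turing"]})

# Global-as-view style mapping: each source is rewritten as a view
# over one integrated schema (customer_id, name).
global_a = src_a.rename(columns={"cust_id": "customer_id",
                                 "full_name": "name"})
global_b = pd.DataFrame({
    "customer_id": src_b["id"],
    "name": src_b["first"] + " " + src_b["last"],
})

integrated = pd.concat([global_a, global_b], ignore_index=True)
print(integrated)
```

The same mapping could equally be expressed local-as-view, defining each source as a view over the integrated schema instead.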
DataOps, such as data flow specification, testing, and management; versioning; deployment; robustness/resilience; data auditing and control; continuous integration, testing, and deployment; and performance tuning on a distributed infrastructure.
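One way the data flow testing idea can look in practice is a unit test over a flow step, runnable by a continuous integration pipeline; this pytest-style sketch uses invented table and column names and a hypothetical 25% VAT rule:

```python
import pandas as pd

def transform(orders: pd.DataFrame) -> pd.DataFrame:
    """A tiny step of a hypothetical data flow: drop refunds, add VAT."""
    kept = orders[orders["amount"] > 0].copy()
    kept["amount_incl_vat"] = kept["amount"] * 1.25
    return kept

def test_transform_contract():
    """A pytest-style test that CI can run on every change to the flow."""
    sample = pd.DataFrame({"order_id": [1, 2, 3],
                           "amount": [100.0, -20.0, 50.0]})
    out = transform(sample)
    assert out["order_id"].is_unique            # keys stay unique
    assert (out["amount_incl_vat"] >= 0).all()  # no negative totals
    assert len(out) == 2                        # refund row removed

if __name__ == "__main__":
    test_transform_contract()
    print("flow test passed")
```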
The concepts above are illustrated by advanced realistic use cases, typically based on open data.
Must be able to integrate data from multiple data sources and separate providers into a consistent, unified representation.
Must be able to cleanse and prepare realistic Big Data sets with relevant tools while managing volume, velocity, variety, and veracity.
Must be able to apply DataOps best practices to realistic data flows running on a distributed infrastructure.
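For the distributed-infrastructure aspect, a minimal sketch using PySpark (assuming a local Spark installation; the event schema and audit rule are hypothetical, not the module's prescribed tooling) shows an audit gate that fails a flow fast when a quality rule is violated:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataops-sketch").getOrCreate()

# Invented event data; in a real flow this would be read from storage.
events = spark.createDataFrame(
    [(1, "2024-01-01"), (2, "2024-01-02")],
    ["event_id", "ts"],
)

# An audit gate in the flow: abort if any event lacks a timestamp.
bad = events.filter(F.col("ts").isNull()).count()
if bad:
    raise ValueError(f"{bad} event(s) lack a timestamp; aborting the flow")

print(f"audit passed for {events.count()} events")
spark.stop()
```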
The student is able to select relevant techniques for data integration, data preparation, and DataOps for a given realistic Big Data scenario.
The student is able to understand and reason about the results of Big Data processes.
|Name of exam|Big Data Processes|
|Type of exam|Written or oral exam|
|Type of grading|7-point grading scale|
|Criteria of assessment|The criteria of assessment are stated in the Examination Policies and Procedures|
|Danish title|Big data processer|
|Language of instruction| |
|Location of the lecture| |
|Responsible for the module| |
|Study board|Study Board of Computer Science|
|Department|Department of Computer Science|
|Faculty|The Technical Faculty of IT and Design|