This course focuses on the techniques and tools required to manage, process, and optimize large-scale data workflows. Students will explore advanced data storage concepts, including data lakes, lakehouses, and efficient file formats (e.g., Parquet, ORC). The course delves into both batch and real-time data processing frameworks (e.g., Apache Hadoop, Flink), teaching students to design ETL workflows and handle high-throughput, low-latency data streams. Topics also include scalable database management systems for unstructured and semi-structured data (e.g., HBase, Cassandra) and workflow orchestration tools (e.g., Apache Airflow). Practical exercises emphasize performance tuning and optimization, equipping students with the skills to build robust, efficient, and scalable big data systems for a variety of real-world applications.
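As a flavour of the ETL workflows covered in the course, the following is a minimal batch extract-transform-load sketch in plain Python. The field names and sample data are hypothetical; real pipelines would typically use a framework such as Spark, with orchestration by Airflow.

```python
import csv
import io
import json

# Extract: read raw CSV records (here from an in-memory sample
# standing in for a file in a data lake).
raw = io.StringIO("user_id,amount\n1,10.5\n2,not_a_number\n1,4.5\n")
rows = list(csv.DictReader(raw))

# Transform: drop malformed rows and aggregate spend per user.
totals = {}
for row in rows:
    try:
        amount = float(row["amount"])
    except ValueError:
        continue  # skip dirty records
    totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + amount

# Load: serialize the aggregate to JSON, a stand-in for a warehouse write.
output = json.dumps(totals, sort_keys=True)
print(output)
```

The try/except around the numeric cast illustrates a common ETL concern: dirty records are filtered during the transform step rather than allowed to fail the whole batch.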
Students must have prior knowledge of fundamental big data concepts, such as the characteristics of big data and data lake architectures.
Students must have prior knowledge of data streams.
Understand the foundational principles of managing large-scale data, including storage, processing, and retrieval.
Learn methods for structuring and optimizing data storage to support analytical and operational needs.
Explore techniques for handling both batch and real-time data processing in distributed systems.
Comprehend the architecture and functionality of scalable database systems for unstructured and semi-structured data.
Understand strategies for optimizing data workflows, focusing on scalability, reliability, and performance.
Design and implement workflows for data extraction, transformation, and storage across diverse datasets.
Query and analyze large-scale data efficiently using optimized storage and processing techniques.
Develop workflows to process high-throughput, low-latency data streams in real time.
Implement scalable and reliable database systems for managing diverse data types.
Monitor, troubleshoot, and optimize data workflows to meet performance and business objectives.
Design and manage comprehensive systems for processing and analyzing large-scale datasets.
Evaluate and select appropriate data management techniques and tools for specific scenarios.
Lead the development of scalable, efficient data pipelines that handle batch and real-time workflows effectively.
Innovate and adapt data management practices to emerging technologies and evolving industry needs.
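The real-time processing objectives above can be illustrated with a minimal tumbling-window aggregation over a simulated stream, written in plain Python. The event data is invented for illustration; frameworks such as Apache Flink provide this windowing at scale, with event-time semantics and fault tolerance.

```python
from collections import defaultdict

# Simulated stream of (event_time_seconds, value) pairs.
events = [(0.5, 3), (1.2, 1), (4.9, 2), (5.1, 7), (9.8, 4)]

WINDOW = 5  # tumbling-window size in seconds

# Assign each event to a window by flooring its timestamp to the
# window boundary, then sum the values per window.
windows = defaultdict(int)
for ts, value in events:
    window_start = int(ts // WINDOW) * WINDOW
    windows[window_start] += value

print(dict(windows))  # → {0: 6, 5: 11}
```

Tumbling windows partition an unbounded stream into fixed, non-overlapping intervals, which is what makes continuous aggregation over high-throughput streams tractable.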
Instruction will combine lectures, invited talks, assignments, and exercises.
Name of exam | Big Data Management |
Type of exam | Written or oral exam |
ECTS | 5 |
Permitted aids | With certain aids: see exam specification |
Assessment | 7-point grading scale |
Type of grading | Internal examination |
Criteria of assessment | The criteria of assessment are stated in the Examination Policies and Procedures |
Danish title | Big data management |
Module code | ESNCEKK2K2 |
Module type | Course |
Duration | 1 semester |
Semester | Spring |
ECTS | 5 |
Language of instruction | English |
Empty-place Scheme | Yes |
Location of the lecture | Campus Copenhagen |
Responsible for the module | |
Used in |
Education owner | Master of Science (MSc) in Engineering (Computer Engineering) |
Study Board | Study Board of Electronics and IT |
Department | Department of Electronic Systems |
Faculty | The Technical Faculty of IT and Design |