Recommended prerequisite for participation in the module
This module builds upon knowledge of algorithms, data structures,
databases, distributed systems, and programming. Moreover, a good
understanding of math principles is recommended to grasp the time
and memory complexity of processing large amounts of data.
Content, progress and pedagogy of the module
The course teaches a systematic way of approaching problems that
deal with large and/or scattered data. The students will initially
learn about the concepts of big data, data lake, and structured and
unstructured data. Then, the data collection process is introduced,
which is composed of steps such as Extract, Load, and Transform.
The course covers each of these steps in turn: techniques for
extracting/mining data from various sources, how and where to
load/store the data afterwards, and, finally, how and where to
process/transform it. Furthermore, different data
structures and databases for working with large amounts of data
will be taught. Finally, principles of data visualization will also
be covered in the course.
The course will make use of real-world datasets with practical
examples by using the latest data analysis tools such as Python
libraries.
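As an illustration of the Extract, Load, and Transform steps described above, the following is a minimal sketch rather than course material. It uses only the Python standard library. The sample records, the `staging` table, and the function names `extract`, `load`, `transform`, and `run_pipeline` are all assumptions made for this example.

```python
import json
import sqlite3

# Hypothetical raw records standing in for data pulled from an external source.
RAW_RECORDS = [
    '{"sensor": "a", "value": 21.5}',
    '{"sensor": "b", "value": 19.0}',
    '{"sensor": "a", "value": 22.5}',
]


def extract():
    """Extract: yield raw JSON strings (here from an in-memory list)."""
    yield from RAW_RECORDS


def load(conn, raw_records):
    """Load: store the raw, untransformed records in a staging table."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (payload TEXT)")
    conn.executemany(
        "INSERT INTO staging (payload) VALUES (?)",
        [(record,) for record in raw_records],
    )
    conn.commit()


def transform(conn):
    """Transform: parse the stored records and compute the mean value per sensor."""
    totals = {}
    for (payload,) in conn.execute("SELECT payload FROM staging"):
        row = json.loads(payload)
        count, total = totals.get(row["sensor"], (0, 0.0))
        totals[row["sensor"]] = (count + 1, total + row["value"])
    return {sensor: total / count for sensor, (count, total) in totals.items()}


def run_pipeline():
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        return transform(conn)
    finally:
        conn.close()
```

Calling `run_pipeline()` returns a dictionary mapping each sensor name to its mean value. Loading the raw payloads before parsing them keeps the original data available, so the transformation can be changed later without extracting again.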
Learning objectives
Knowledge
- Must have knowledge about data concepts such as big data and
data lakes
- Must have knowledge about data streams
- Must have knowledge about systems and software that can be used
to collect data from various sources
- Must have knowledge about restrictions when accessing online
data, such as API limitations or privacy considerations
- Must have knowledge about different standards for data
formatting and interchanging such as JSON, XML, protocol buffers,
etc.
- Must have knowledge about diverse ways to store structured and
unstructured data
- Must have knowledge about different programming tools to be
used for processing data
- Must have knowledge about different algorithms for analyzing
data
- Must have knowledge about the time and memory complexity of the
algorithms
- Must have knowledge about diverse ways to visualize
data
Skills
- Must be able to design, develop, and deploy a system for data
collection/mining
- Must be able to work with real-time data streams
- Must be able to use online APIs to retrieve data from various
sources
- Must be able to analyze several types of databases for storing
the data
- Must be able to use programming tools and libraries to handle
the data correctly
- Must be able to identify and use the appropriate data
structures for processing the data in an efficient way
- Must be able to use data processing algorithms and analyze
their complexity
- Must be able to scale the system as needed vertically or
horizontally depending on the amount of data and algorithms
used
- Must be able to perform diverse types of data analysis such as
clustering, classification, and prediction, among others
- Must be able to use programming tools to visualize results and
findings from the data via scientific plots
Competences
- Must have the competence to work with real datasets in order to
extract useful information
- Must have the competence to design a scalable mining system, a
reliable data storage platform, an efficient processing tool, and
an insightful visualization framework
- Must have the competence to apply the knowledge and skills
gained through the course to problems from various domains such as
healthcare, smart city, industry, etc.
- Must have the competence to document the systematic approach
followed during the process in a clear way
Type of instruction
Types of instruction are listed at the start of §17; Structure
and contents of the programme.
Exam
Exams
Name of exam | Data Mining and Analysis |
Type of exam | Written or oral exam |
ECTS | 5 |
Permitted aids | Find the information about the allowed tools on the exam specification |
Assessment | 7-point grading scale |
Type of grading | Internal examination |
Criteria of assessment | The criteria of assessment are stated in the Examination Policies and Procedures |