Data Mining and Analysis

2024/2025

Recommended prerequisite for participation in the module

This module builds on knowledge of algorithms, data structures, databases, distributed systems, and programming. A good understanding of mathematical principles is also recommended in order to grasp the time and memory complexity of processing large amounts of data.

Content, progress and pedagogy of the module

The course teaches a systematic way of approaching problems that involve large and/or scattered data. Students first learn about the concepts of big data, data lakes, and structured and unstructured data. Then the data collection process is introduced, which consists of steps such as Extract, Load, and Transform. The course covers each of these steps in depth: techniques for extracting/mining data from various sources, how and where to load/store it afterwards, and, finally, how and where to process/transform it. Data structures and databases for working with large amounts of data are also taught, and principles of data visualization are covered as well.
The course uses real-world datasets and practical examples, working with current data analysis tools such as Python libraries.
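
As an illustration of the Extract, Load, and Transform steps described above, the following is a minimal sketch in Python using only the standard library. The sample records, the table name `raw`, and the function names are hypothetical and chosen for illustration only; they are not part of the course material. A real pipeline would typically extract from an API or file and load into a persistent store.

```python
import json
import sqlite3

# Hypothetical raw records standing in for data pulled from an API or a file.
RAW_RECORDS = [
    '{"sensor": "a", "temp_c": 21.5}',
    '{"sensor": "b", "temp_c": 19.0}',
    '{"sensor": "a", "temp_c": 22.5}',
]


def extract():
    """Extract: return the raw records unchanged."""
    return list(RAW_RECORDS)


def load(conn, records):
    """Load: store the raw, untransformed records in a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw (payload) VALUES (?)", [(r,) for r in records]
    )
    conn.commit()


def transform(conn):
    """Transform: parse the stored records and average the temperature per sensor."""
    readings = {}
    for (payload,) in conn.execute("SELECT payload FROM raw"):
        record = json.loads(payload)
        readings.setdefault(record["sensor"], []).append(record["temp_c"])
    return {sensor: sum(v) / len(v) for sensor, v in readings.items()}


def run_pipeline():
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        return transform(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the data is loaded in raw form first and transformed afterwards, which mirrors the Extract, Load, Transform ordering named in the text; transforming before loading would be an equally valid design in other settings.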

Learning objectives

Knowledge

  • Must have knowledge about different data concepts such as big data and data lakes
  • Must have knowledge about data streams
  • Must have knowledge about systems and software that can be used to collect data from various sources
  • Must have knowledge about restrictions when accessing online data, such as API limitations or privacy considerations
  • Must have knowledge about different standards for data formatting and interchange, such as JSON, XML, and Protocol Buffers
  • Must have knowledge about diverse ways to store structured and unstructured data
  • Must have knowledge about different programming tools to be used for processing data
  • Must have knowledge about different algorithms for analyzing data
  • Must have knowledge about the time and memory complexity of the algorithms
  • Must have knowledge about diverse ways to visualize data

Skills

  • Must be able to design, develop, and deploy a system for data collection/mining
  • Must be able to work with real-time data streams
  • Must be able to use online APIs to retrieve data from various sources
  • Must be able to analyze several types of databases for storing the data
  • Must be able to use programming tools and libraries to handle the data correctly
  • Must be able to identify and use the appropriate data structures for processing the data in an efficient way
  • Must be able to use data processing algorithms and analyze their complexity
  • Must be able to scale the system vertically or horizontally as needed, depending on the amount of data and the algorithms used
  • Must be able to perform diverse types of data analysis such as clustering, classification, and prediction, among others
  • Must be able to use programming tools to visualize results and findings from the data via scientific plots

Competences

  • Must have the competence to work with real datasets in order to extract useful information
  • Must have the competence to design a scalable mining system, a reliable data storage platform, an efficient processing tool, and an insightful visualization framework
  • Must have the competence to apply the knowledge and skills gained through the course to problems from various domains such as healthcare, smart city, industry, etc.
  • Must have the competence to document the systematic approach followed during the process in a clear way

Type of instruction

Types of instruction are listed at the start of §17; Structure and contents of the programme.

Exam

Exams

Name of exam: Data Mining and Analysis
Type of exam: Written or oral exam
ECTS: 5
Permitted aids: Find the information about the allowed tools on the exam specification
Assessment: 7-point grading scale
Type of grading: Internal examination
Criteria of assessment: The criteria of assessment are stated in the Examination Policies and Procedures

Facts about the module

Danish title: Data mining og analyse
Module code: ESNICTEK3K10
Module type: Course
Duration: 1 semester
Semester: Autumn
ECTS: 5
Language of instruction: English
Empty-place Scheme: Yes
Location of the lecture: Campus Copenhagen
Responsible for the module

Organisation

Study Board: Study Board of Electronics and IT
Department: Department of Electronic Systems
Faculty: The Technical Faculty of IT and Design