The course teaches a systematic approach to problems involving large and/or scattered data, building on the concepts of big data, data lakes, and structured and unstructured data. It introduces the data collection process, which consists of steps such as Extract, Load, and Transform, and covers each step in turn: techniques for extracting/mining data from various sources, how and where to load/store the data afterwards, and how and where to process/transform it. The insights obtained from analyzing the data are then used to drive and optimize the operation of example systems. Finally, the course covers principles of data visualization.
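As a rough illustration of the Extract, Load, and Transform steps named above, the following minimal Python sketch parses a small JSON payload, stores it unchanged in a staging table, and then derives per-sensor averages inside the store. The payload, the `raw_readings` table, and all function names are hypothetical and chosen only for this example; the course itself does not prescribe these tools.

```python
import json
import sqlite3

# Hypothetical raw payload standing in for an API response or a file export.
RAW_JSON = """[
  {"sensor": "a", "temp_c": 21.5},
  {"sensor": "a", "temp_c": 22.5},
  {"sensor": "b", "temp_c": 19.0}
]"""


def extract(payload: str) -> list:
    """Extract: parse records from the source format (JSON here)."""
    return json.loads(payload)


def load(conn: sqlite3.Connection, records: list) -> None:
    """Load: store the records unchanged in a raw staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_readings (sensor TEXT, temp_c REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_readings (sensor, temp_c) VALUES (:sensor, :temp_c)",
        records,
    )
    conn.commit()


def transform(conn: sqlite3.Connection) -> dict:
    """Transform: compute the average temperature per sensor inside the store."""
    rows = conn.execute(
        "SELECT sensor, AVG(temp_c) FROM raw_readings "
        "GROUP BY sensor ORDER BY sensor"
    ).fetchall()
    return dict(rows)


def run_pipeline(payload: str = RAW_JSON) -> dict:
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract(payload))
        return transform(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

Loading the raw records before transforming them keeps the original data available for later, different analyses; an in-memory SQLite database is used here only to keep the sketch self-contained.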
Must have knowledge about systems and software that can be used to collect data from various sources
Must have knowledge about restrictions when accessing online data, such as API limitations or privacy considerations
Must have knowledge about different standards for data formatting and interchanging such as JSON, XML, protocol buffers, etc.
Must have knowledge about diverse ways to store structured and unstructured data
Must have knowledge about different programming tools to be used for processing data
Must have knowledge about different algorithms for analyzing data
Must have knowledge about the time and memory complexity of the algorithms
Must have knowledge about diverse ways to visualize data
Must be able to design, develop, and deploy a system for data collection/mining
Must be able to work with real-time data streams
Must be able to use online APIs to retrieve data from various sources
Must be able to analyze several types of databases for storing the data
Must be able to use programming tools and libraries to handle the data correctly
Must be able to identify and use the appropriate data structures for processing the data in an efficient way
Must be able to use data processing algorithms and analyze their complexity
Must be able to scale the system vertically or horizontally as needed, depending on the amount of data and the algorithms used
Must be able to perform diverse types of data analysis such as clustering, classification, and prediction, among others
Must be able to use programming tools to visualize results and findings from the data via scientific plots
Must have the competence to create datasets from real world systems and sources
Must have the competence to analyze the collected data to extract useful information
Must have the competence to design a scalable mining system, a reliable data storage platform, an efficient processing tool, and an insightful visualization framework
Must have the competence to apply the knowledge and skills gained through the course to problems from various domains such as healthcare, smart city, industry, etc.
Must have the competence to document the systematic approach followed during the process in a clear way
The instruction will combine lectures, showcasing of examples, and exercises.
Name of exam | Data Mining and Analysis |
Type of exam | Written or oral exam |
ECTS | 5 |
Permitted aids | With certain aids: see exam specification |
Assessment | 7-point grading scale |
Type of grading | Internal examination |
Criteria of assessment | The criteria of assessment are stated in the Examination Policies and Procedures |
Danish title | Data mining og analyse |
Module code | ESNCEKK3K1 |
Module type | Course |
Duration | 1 semester |
Semester | Autumn |
ECTS | 5 |
Language of instruction | English |
Empty-place Scheme | Yes |
Location of the lecture | Campus Copenhagen |
Responsible for the module | |
Used in |
Education owner | Master of Science (MSc) in Engineering (Computer Engineering) |
Study Board | Study Board of Electronics and IT |
Department | Department of Electronic Systems |
Faculty | The Technical Faculty of IT and Design |