Syllabus of CSCE822 Data Mining

Course Summary

This course covers techniques and topics that are widely used in real-world data mining projects, including classification, clustering, dimension reduction, feature selection, and open-ended knowledge discovery. We will use real-world data to challenge your data mining skills. Students from computer science, engineering, biostatistics, (molecular) biology, and medicine are all encouraged to enroll.

The easy acquisition of huge amounts of data in science, business, and national security makes it critical to extract informative knowledge and patterns from these data in order to stay competitive. Data mining has been used intensively in large companies such as IBM, HP, eBay, and Wells Fargo, by governmental organizations such as the National Security Agency and the CIA, and in the emerging fields of genomics and bioinformatics. Understanding the principles of data mining and gaining hands-on experience in implementing data mining projects will greatly improve students' competitiveness in the job market and enhance their research skills.

Course Objectives

  • To develop an understanding of the concepts in data mining
  • To be able to locate and evaluate popular data mining techniques and software packages
  • To be able to identify promising applications of data mining
  • To be able to implement prototype data mining systems
  • To be able to design/implement new data mining algorithms

Prerequisite

You are expected to have basic programming skills. Any of C, C++, Java, R, MATLAB, Perl, or Python is acceptable.

Textbooks
You can select either one of the following two textbooks. The first is more interesting to read, while the second is more comprehensive and covers the latest research topics.

(You can find cheap books at http://www.addall.com.)

Data Mining: Practical Machine Learning Tools and Techniques

Meeting Time(s): TTH 2:00PM-3:15PM
Classroom: 2A11 Swearingen Engineering Center
Instructor: Dr. Jianjun Hu
Email: jianjunh AT cse.sc.edu
Office: 3A66 Swearingen Engineering Center
Office Hours: TTH 3:30PM-4:30PM or by appointment.

Lecture Notes/Assignments/Readings

Lecture notes and homework assignments will be available on the class website. You are responsible for downloading them to prepare for class and homework.

Supplementary Readings

Extensive reading materials will be provided each week to help you develop a broad understanding of data mining research and applications.

Software

We will either develop our own code for projects or use existing data mining packages. In many cases, you will be asked to find those packages yourself using search engines such as Google.

Grading

Your course grade will be based on homework assignments, a mid-term exam, a final project, and classroom participation. The weights given to these components are:

  • Homework assignments (40%)
  • Mid-term exam (15%)
  • Project report and presentation (35%)
  • Classroom participation (10%)