**Syllabus of CSCE822 Data Mining**

**Course Summary**

This course covers the techniques and topics that are widely used in real-world data mining projects, including classification, clustering, dimension reduction, feature selection, and open-ended knowledge discovery. We will use real-world data to challenge your data mining skills. Students from computer science, engineering, biostatistics, (molecular) biology, and medicine are all encouraged to enroll.

The easy acquisition of huge amounts of data in science, business, and national security makes it critical to extract informative knowledge and patterns from these data in order to stay competitive. Data mining has been used intensively in large companies such as IBM, HP, eBay, and Wells Fargo, by governmental organizations such as the National Security Agency and the CIA, and in the emerging fields of genomics and bioinformatics. Understanding the principles of data mining and gaining hands-on experience implementing data mining projects will greatly improve students' competitiveness in the job market and enhance their research skills.

**Course Objective**

- To develop an understanding of the concepts in data mining
- To be able to locate and evaluate popular data mining techniques and software packages
- To be able to identify promising applications of data mining
- To be able to implement prototype data mining systems
- To be able to design/implement new data mining algorithms

**Prerequisite**
You are expected to have some basic programming skills. Any of C, C++, Java, R, MATLAB, Perl, or Python is acceptable.

**Textbooks**

You can select either of the following two textbooks. The first is more interesting to read, while the second is more comprehensive and covers the latest research topics.

- Introduction to Data Mining, P.-N. Tan, M. Steinbach, and V. Kumar, Addison Wesley, 2005. ISBN 0321321367. (Well written, good for undergraduates)
- Data Mining: Concepts and Techniques (Second Edition), Jiawei Han & Micheline Kamber, Morgan Kaufmann Publishers, 2005. ISBN 1558609016. (More advanced, good for graduates)
- For reference only: Principles of Data Mining, Hand, Mannila, and Smyth. Cambridge, MA: MIT Press, 2001. ISBN: 026208290X.

(You can find inexpensive copies at http://www.addall.com)

Data Mining: Practical Machine Learning Tools and Techniques

Meeting Time(s): TTH 2:00PM-3:15PM

Classroom: 2A11 Swearingen Engineering Center

Instructor: Dr. Jianjun Hu

Email: jianjunh AT cse.sc.edu

Office: 3A66 Swearingen Engineering Center

Office Hours: TTH 3:30PM-4:30PM or by appointment.

**Lecture Notes/Assignments/Readings**

Lecture notes and homework assignments will be available on the class website. You are responsible for downloading them to prepare for class and homework.

**Supplementary Readings**
Extensive reading materials will be provided each week to help you develop a broad understanding of data mining research and applications.

**Software**
We will either develop our own code for projects or use existing data mining packages. In many cases, you will be asked to find those packages using search engines such as Google.

**Grading**

Your course grade will be based on homework assignments, a mid-term exam, a final project, and classroom participation. The weights given to these components are:

Homework assignments (40%); Mid-term exam (15%); Project report and presentation (35%); Classroom participation (10%).