Suggested Readings for the course

This list will be updated every week. Read most or all of the items to gain a broad understanding of the field. Be sure to read something before each lecture so that you have ideas ready for discussion and brainstorming.

Week 1

  1. New York Times interview on Data mining (2005) dm_NYT_interview2005.pdf
  2. Data mining at Walmart (2004) DM_walmart.pdf
  3. Jinyan Li (2005). Data mining applications in bioinformatics dm_bioinformatics.pdf
  4. Jiawei Han, Kevin Chen-Chuan Chang (2002). Data mining for web intelligence dm_webintelligence.pdf
  5. Shashi Shekhar, Pusheng Zhang, Yan Huang, Ranga Raju Vatsavai. Trends in Spatial Data Mining. dm_spatial.pdf
  6. Introduction to data mining: what data, what to mine dm_intro.pdf
  7. Data Mining and Homeland Security: An Overview dm_homelandsecurity.pdf
  8. Data Mining: Federal Efforts Cover a Wide Range of Uses dmgovernment.pdf
  9. KNN in handwritten digit recognition KNN_recognition.pdf
  10. The overfitting problem in KNN KNN_overfit.pdf
  11. Bin Zhang, Srihari, S.N. Fast k-nearest neighbor classification using cluster-based trees. KNN_fastbyClustering.pdf

Week 2

  1. Quackenbush, Computational Analysis of microarray data DM_microarray01.pdf
  2. Leung, et al. Fundamentals of microarray data analysis DM_microarray03.pdf
  3. Enhancement of Data for data mining DM_enhanceData.pdf
  4. Future trends in data mining with discussion on preprocessing DM_Trends2007.pdf

Week 3

  1. 2006 Classification of Mammograms Using Decision Trees. DT_imageclassifier.pdf
  2. 2006 Ian Fette et al. Learning to Detect Phishing Emails. DM_phishingemail.pdf DM_phishingemail1.pdf
    (Please check how they evaluate their method and compare it to other methods.)
  3. GFI MailEssentials - a company using Bayesian filters for spam detection. DT_spamBayes.pdf

Week 4

  1. (Seminar paper) BREIMAN, Leo, 1996. Bagging predictors, Machine Learning 24(2) ML_bagging.pdf

Week 5

  1. Ensemble methods. ML_ensemble.pdf

Week 6

  1. Google's U.S. patent on clustering. Scalable User clustering based on Set similarity. ML_UserClusteringSetSimilarity.pdf
    You may be interested in finding out what other patents Google has obtained.
  2. Slonim N 2005. Information-based clustering. PNAS 2005. ML_InfoClustering.pdf
  3. Fraley. How many clusters? Model-based clustering.
  4. Handl 2005. Computational cluster validation in post-genomic data analysis. ML_ClusteringValidation.pdf
  5. 2004 BiClustering algorithm: a Survey. ML_surveyBicluster.pdf
  6. 2005 Clustering algorithm: a Survey. ML_surveyCluster.pdf

Week 7

  1. 2003 Introduction to Variable and Feature Selection. ML_FeatureSelection.pdf
  2. 2004 Feature Selection for Unsupervised learning. ML_FeatureSelUnsupervised.pdf

Week 8

The Million Song Dataset Challenge paper

Week 9

Week 10

  1. 2004 FP tree method for frequent item mining. DM_FPtree.pdf
  2. 2004 FP tree slides. DM_FPtreeSlide.pdf

Week 11

  1. 2005 Graph mining in Chemical structures. DM_GraphMineChem.pdf
  2. 2006 In Silico fishing: Predicting biological targets from chemical structure. DM_BioTargetPredict.pdf

Week 12

  1. Hidden Markov Models. DM_HMM.pdf
  2. Expectation Maximization. DM_EM.pdf


Feature Extraction course with slides

Feature Extraction

KDD 2007 conference proceedings

KDD 2008 papers by subject