• What are the “filter” and “wrapper” methods in feature selection?
• What are their advantages and disadvantages?
• What are the common methods to impute missing data?
• What problems arise when learning from imbalanced datasets? What are the
• Explain the bottom-up and top-down approaches in hierarchical clustering.
• How do you construct a dendrogram from a given data set (HW)?
• What are single-link, complete-link, average-link, and centroid-link?
• How does the k-means algorithm work?
• What is SSE in k-means?
• How to select the optimal k?
• What is the k-medoids method? What is its advantage over k-means?
• How to measure clustering quality (internal – scatter criteria, external – precision, recall,
• What is association rule mining? What are its applications?
• What is a strong association rule?
• How to calculate the support and confidence of a given association rule?
• What is the apriori property?
• How to generate C_i’s and L_i’s?
• How to evaluate whether a strong association rule is interesting: lift? What are the
implications of lift > 1, = 1, < 1?
• What is a sequential pattern?
• What is the apriori property in sequence mining?
• How to generate the C_i’s and L_i’s?
• What are the four components of a time series?
• What is an autoregressive model? How to construct lagged predictors?
• Why do we need DTW?
• What are the 3 properties a warping path must follow?
• How to find the DTW distance given a cost matrix?
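
For the last DTW question, a minimal Python sketch may help when checking a hand calculation. It assumes the usual textbook formulation: the cost matrix holds local distances, each cell of the accumulated matrix adds the cheapest of its three predecessors (up, left, diagonal), and the warping path is read off by backtracking from the last cell to the first. The function names `dtw_distance` and `local_cost_matrix` and the absolute-difference local cost are illustrative choices, not something prescribed by the course material.

```python
def local_cost_matrix(a, b):
    """Local cost matrix using absolute difference (an assumed cost function)."""
    return [[abs(x - y) for y in b] for x in a]


def dtw_distance(cost):
    """Return (DTW distance, warping path) for a non-empty local cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")

    # acc[i][j] = cheapest total cost of a warping path from (0, 0) to (i, j)
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    acc[i - 1][j] if i > 0 else inf,                 # step down
                    acc[i][j - 1] if j > 0 else inf,                 # step right
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,   # diagonal
                )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the last cell to the first, always moving to the
    # cheapest allowed predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [1, 2, 2, 3]
    dist, path = dtw_distance(local_cost_matrix(a, b))
    print(dist, path)
```

The backtracked path starts at the first cell and ends at the last one, never moves backwards in either index, and advances by at most one step per index at a time. These correspond to the three warping-path conditions as they are usually stated (boundary, monotonicity, continuity).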