# Data Mining

ITS-632 Intro to Data Mining

Dr. Patrick Haney

Dept. of Information Technology &

School of Computer and Information Sciences

University of the Cumberlands

Chapter 2 Assignment

1. What’s an attribute? What’s a data instance?

2. What’s noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What’s sampling?

9. What’s simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What’s stratified sampling?

11. What’s “the curse of dimensionality”?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what’s the input and what the output of PCA is.

13. What’s the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a co-variance matrix, …), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).

16. What’s the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What’s data discretization and when is it needed?

20. What’s the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply variable transformation: Either a simple given function, normalization, or standardization.

24. Definition of Correlation and Covariance, and how to use them in data pre-processing (see pp. 76-78).

## We've got everything to become your favourite writing service

### Money back guarantee

Your money is safe. Even if we fail to satisfy your expectations, you can always request a refund and get your money back.

### Confidentiality

We don’t share your private information with anyone. What happens on our website stays on our website.

### Our service is legit

We provide you with a sample paper on the topic you need, and this kind of academic assistance is perfectly legitimate.

### Get a plagiarism-free paper

We check every paper with our plagiarism-detection software, so you get a unique paper written for your particular purposes.

### We can help with urgent tasks

Need a paper tomorrow? We can write it even while you’re sleeping. Place an order now and get your paper in 8 hours.

### Pay a fair price

Our prices depend on urgency. If you want a cheap essay, place your order in advance. Our prices start from $11 per page.