Project 1: Data Exploration and Description
NAME:
INSTRUCTIONS:
· Place your name above.
· You will show your work in this file by making room right under each item.
· Submit this file once your report is complete, before the deadline (4/26), to the Project-1 assignment item in the Week 4 folder.
Max points: 50 points
Each item: 5 points
Extra credit: 5 points x 2 = 10 points
The goal of this project is to collect, explore, and describe your own data using the knowledge you gained through the course. In doing so, you will have an opportunity to calibrate and apply what you’ve learned in class to real data that interests you.
You are tasked with the following activities, most likely in sequence. What you will include in the final report is indicated with the description of each task.
LOGISTICS:
There are two discussion boards dedicated to this project.
Discussion board #1: For all communications regarding this project
Discussion board #2: “Data Collection Ideas” for sharing ideas
TASK #1: Data Collection (15 points)
You will collect a sample from a population that interests you. We limit our interest to quantitative data, either discrete or continuous (Definition 2.1). Your sample must contain at least 30 observations; the reason for this lower bound will become clear later in the course. Although a perfectly random sample of the population is difficult to obtain, aim for a simple random sample (Definition 1.4), drawn without replacement. You are encouraged to read Sections 1.2 and 1.3 in your e-Text.
A discussion board, titled “Data Collection Ideas,” is set up for sharing ideas about populations that might interest you. If you have ideas about data that can be collected without much burden, you are encouraged to post them there.
Items to report in this task:
1. What is the population?
2. What characteristic of this population interests you? This will be the variable. Is the variable discrete or continuous?
3. How did you collect the samples?
Tips: Data is everywhere. Choose a simple population that you can sample easily from your surroundings, your house, or your workplace.
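If you can list the members of your population, one way to draw a simple random sample without replacement is to number them and let software pick the numbers. The following is a minimal Python sketch of that idea, not a required method. The function name `simple_random_sample`, the frame of 200 ID numbers, and the seed value are all hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n=30, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample selects without replacement, so no member repeats.
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers 1..200, one per population member.
frame = list(range(1, 201))
chosen = simple_random_sample(frame, n=30, seed=1)
print(sorted(chosen))
```

You would then measure your variable on the 30 members whose ID numbers were chosen.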
TASK #2: Data Exploration with Visualization (15 points)
Before you start exploring your sample data, look closely at the observations to see whether any were recorded in error. If so, you may remove them and replace them with new observations. Once you have a clean sample, you will visualize it with histograms and a stem-and-leaf diagram (Section 2.3). Based on the relative-frequency histogram of your sample data, you will determine the shape of the sample distribution (Section 2.4).
Items to report in this task:
1. Histograms of the dataset (both frequency histogram and relative-frequency histogram)
2. Stem-and-Leaf diagram of the dataset
3. Identify the shape of the distribution of the sample data based on the relative-frequency histogram of your sample data.
Tips: You may search online for instructions on creating histograms in Excel. You may use any software tool to create the graphs.
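As an alternative to Excel, the counts behind both histograms and the stem-and-leaf diagram can be tallied in a few lines of Python. This is only a sketch under stated assumptions: the data values are made up for illustration, the class width of 10 and starting point of 10 are arbitrary choices, the starting point must not exceed the smallest observation, and the stem-and-leaf helper assumes non-negative whole numbers with the tens digit as the stem.

```python
from collections import Counter, defaultdict

# Hypothetical sample of 30 observations, for illustration only (not real data).
data = [12, 15, 21, 22, 22, 25, 28, 30, 31, 31, 34, 35, 37, 38, 40,
        41, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 58, 61, 64, 69]

def frequency_table(values, start, width):
    """Rows of (lower, upper, frequency, relative frequency) per class.

    Each class contains values v with lower <= v < upper.
    """
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        freq = counts.get(k, 0)  # empty classes are kept with frequency 0
        rows.append((lower, lower + width, freq, freq / n))
    return rows

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include stems with no leaves so gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

# Text version of the frequency and relative-frequency histograms.
for lower, upper, freq, rel in frequency_table(data, start=10, width=10):
    print(f"{lower:>3} to under {upper:<3} {freq:>3} {rel:6.3f}  {'*' * freq}")

# Stem-and-leaf diagram.
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The frequency and relative-frequency histograms share the same bars; only the vertical scale differs, since each relative frequency is the class frequency divided by the sample size.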
TASK #3: Data Exploration with Descriptive Measures (Numbers) (15 points)
In Task #2, you explored the sample data visually. The graphs gave a sense of the overall distribution of the sample data. Here, you will summarize the sample data with numbers, especially the sample mean and standard deviation, along with other measures.
Items to report in this task:
1. The sample mean, median, and mode of the sample data (Section 3.1 Measures of Center)
2. The sample variance and standard deviation of the sample data (Section 3.2 Measures of Variation)
3. The five-number summary of the sample data (Section 3.4)
4. The box-plot based on the five-number summary in Item 3. (Extra credit: 5 points)
Tips: Box-plots can also be created with Excel.
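If you would like to check your hand or Excel calculations, the measures listed above can be computed with Python's standard `statistics` module. The sketch below uses made-up data. The helper `five_number_summary` is hypothetical, and it assumes one common convention for quartiles (the medians of the lower and upper halves of the sorted data, leaving out the middle observation when the sample size is odd). Software and textbooks differ on this convention, so compare with the definition in Section 3.4 before relying on it.

```python
import statistics

# Hypothetical sample of 30 observations, for illustration only (not real data).
data = [12, 15, 21, 22, 22, 25, 28, 30, 31, 31, 34, 35, 37, 38, 40,
        41, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 58, 61, 64, 69]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; the middle value is left out when n is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations before the median position
    upper = xs[half + n % 2:]   # observations after the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)    # every value tied for most frequent
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance
summary = five_number_summary(data)

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"variance={variance:.2f} standard deviation={std_dev:.2f}")
print("min, Q1, median, Q3, max =", summary)
```

The five values in `summary` are exactly what a box-plot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.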
TASK #4: Interpretation (5 points)
Now that you have a good understanding of your sample data, it’s time to interpret what you found in Tasks 2 and 3 in the context of the population of interest and the meaning of the variable you chose to investigate. You will also compare the spread of your sample data against the rules given in Key Facts 3.3 and 3.4.
Items to report in this task:
1. Brief summary of your findings in Tasks 1, 2, and 3 in the context of the meaning of the variable in one paragraph.
2. Key Facts 3.3 and 3.4 (Section 3.3 – Chebyshev’s Rule and Empirical Rule) tell us what percentage of the observations lie within one, two, or three standard deviations to either side of the mean: Chebyshev’s Rule gives a minimum percentage for any quantitative data set, and the Empirical Rule gives approximate percentages for data sets with a roughly bell-shaped distribution. Report what percentage of the observations in your sample data lie within one, two, and three standard deviations to either side of the mean. It is fine if your numbers don’t match the rules. (Extra credit: 5 points)
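The percentages asked for in Item 2 can be counted directly. The Python sketch below is one way to do it, using made-up data and a hypothetical helper named `percent_within`. It counts observations that fall on the interval endpoints as inside, which is an assumption you may adjust. For reference, the Empirical Rule figures are roughly 68%, 95%, and 99.7%, and Chebyshev's Rule guarantees at least 75% and about 89% within two and three standard deviations.

```python
import statistics

# Hypothetical sample of 30 observations, for illustration only (not real data).
data = [12, 15, 21, 22, 22, 25, 28, 30, 31, 31, 34, 35, 37, 38, 40,
        41, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 58, 61, 64, 69]

def percent_within(values, k):
    """Percentage of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    low, high = mean - k * sd, mean + k * sd
    inside = sum(1 for v in values if low <= v <= high)  # endpoints count as inside
    return 100 * inside / len(values)

empirical = {1: 68.0, 2: 95.0, 3: 99.7}  # approximate, bell-shaped data only

for k in (1, 2, 3):
    observed = percent_within(data, k)
    # Chebyshev's lower bound 1 - 1/k^2 is informative only for k > 1.
    chebyshev = 100 * (1 - 1 / k**2) if k > 1 else 0.0
    print(f"k={k}: observed {observed:.1f}%, "
          f"Empirical Rule about {empirical[k]}%, "
          f"Chebyshev at least {chebyshev:.1f}%")
```

Your observed percentages should never fall below the Chebyshev minimums, but they may differ noticeably from the Empirical Rule figures if your histogram is not bell-shaped.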