Assignment Instructions
To fulfill the Unit 1 Assignment, complete the following steps:
1. Locate a dataset that can be examined in RStudio. This dataset can be one of the built-in practice datasets available in R, a dataset of your own making, or one downloaded from another source. If you choose one of R’s built-in datasets, do not use any that appeared in the book chapters or in Seminar, so that you get additional practice rather than simply re-creating an analysis you have already seen. Your chosen dataset does not need to be particularly large or complex, but it should include several columns and at least a dozen rows, preferably more. Be sure that the data are primarily numerical, since you will create summary statistics later in this Assignment. Do not use a dataset without permission, especially if the data belong to your employer.
2. Import your dataset into a data frame in RStudio.
3. Create a summary of your dataset in RStudio.
4. Open a new Microsoft® Word® document. Save it as Unit1.
5. Install the “psych” package in R: install.packages("psych"). Once it is installed, load the library with library(psych).
6. Create descriptive statistics for your dataset. The function in the psych package that generates descriptive statistics is “describe”. Take a screenshot of your descriptive statistics and place it in your Word file. Write a short description of what you have done with your data, and why it might be interesting to a data analyst. If necessary, use more than one screenshot and description.
7. Chapter 1 of the textbook demonstrates how to do a simple scatterplot in R, using the “plot” function. Pick two numeric columns in your dataset and create a scatterplot showing their relationship. Place a screenshot of your scatterplot in your Word document, and then describe how your scatterplot may be useful to a data analyst.
8. Research some common issues with data formatting, transfer, and manipulation. In APA format, write 2–3 paragraphs describing some of the issues you learned about. Describe why such issues might represent a problem for data analysts. Cite at least three sources in APA format. Be sure to also cite the source for the dataset you used in this Assignment.
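
The R portions of steps 2, 3, 5, 6, and 7 might look roughly like the sketch below. It is only an illustration under stated assumptions: the built-in airquality data set, the variable names my_data and desc, and the commented-out file name are placeholders chosen here, not requirements of the Assignment, and airquality may not be an allowed choice if it appeared in the book or in Seminar. The CRAN mirror URL is likewise just one common option.

```r
# Step 2: put the data into a data frame.
# airquality is a placeholder (an assumption for illustration only);
# substitute the dataset you chose in step 1.
my_data <- airquality

# For a downloaded CSV file you would import it instead, for example:
# my_data <- read.csv("my_file.csv")   # hypothetical file name

# Step 3: base-R summary of every column.
print(summary(my_data))

# Step 5: install psych once if it is missing, then load it.
if (!requireNamespace("psych", quietly = TRUE)) {
  install.packages("psych", repos = "https://cloud.r-project.org")
}
library(psych)

# Step 6: descriptive statistics (n, mean, sd, median, and so on),
# one row per column of the data frame.
desc <- describe(my_data)
print(desc)

# Step 7: scatterplot of two numeric columns.
# Temp and Ozone are columns of the placeholder data set.
plot(my_data$Temp, my_data$Ozone,
     xlab = "Temp", ylab = "Ozone",
     main = "Ozone vs. Temp")
```

With your own data, replace the two column names in the plot call with any two numeric columns from your data frame. Rows with missing values in either column are simply left off the scatterplot.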