R assignment using RStudio, due tomorrow night
Hi, do you know how to use RStudio? This is from an economics course, but it only involves some basic operations in the R language.
# ECON320 SPRING 2020: R EXAM (Due 5/17, 11:59PM)
# Author: (put your name here)
# Change the file name by adding your name at the end
# Points for different tasks are as indicated
# This assignment is an EXAM. YOU CANNOT COLLABORATE DIRECTLY ON ANSWERING QUESTIONS ON THIS EXAM. UNDENIABLE INSTANCES OF COLLABORATION/COPYING WILL BE CONSIDERED VIOLATION OF ACADEMIC INTEGRITY AND TREATED ACCORDINGLY.
# You can, however, use previous lectures, other online and printed resources. IF YOU ARE UNSURE ABOUT WHAT IS PERMITTED, ASK ME.
# You can skip parts you can’t execute and get credit for subsequent parts.
# You will need to Knit the document at the end, so make sure that every part works and that you have commented out any part that doesn’t work, as well as any package installation, esquisse calls, and help requests made through “?”.
# DELIVERABLES: you will submit this script with your code in it and the knitted html file
###################################################################
# PART 1. (36 points, 2 points for each subpart) Below I wanted to give you some freebies. You have a lot of freedom in what to do.
# i. Clear any data that may already exist
# ii. Display any array, but not one identical to what we created in class
# iii. Display any list, but not one identical to what we created in class
# iv. Display any logical vector, but not one identical to what we created in class
# v. Create a vector containing values of 81, 71, 91, 12; call it v1 and display it
# vi. Create another vector containing values of 1984, 2014, 1995, 2022; call it v2 and display it
# vii. Create a matrix with 4 rows and 2 columns; the first column is vector v1 and the second is v2; name the matrix mat1 and display it (hint: you can use c() to combine the two vectors in the first argument of the matrix function)
# viii. Create any data frame
# ix. Add any type of a calculated column to it
# x. Change column names to whatever you want
# xi. Delete any one column from the data frame
# xii. Delete any row from the data frame
# xiii. Use the data frame to show that you understand how a simple “For” loop works
# xiv. Display mean, standard deviation, minimum and maximum of 1 numeric column in your data frame
# xv. Set directory to where you want your data stored on your laptop
# xvi. Export/save your data frame in .csv format
# xvii. Save all objects you have created in RData format
# xviii. Clear all objects from the environment
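The basic operations listed in Part 1 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame contents, the column names, and the file names are hypothetical, and files are written to a temporary directory rather than a directory set with setwd().

```r
# Vectors and a 4 x 2 matrix built from them (matrix() fills column by column)
v1 <- c(81, 71, 91, 12)
v2 <- c(1984, 2014, 1995, 2022)
mat1 <- matrix(c(v1, v2), nrow = 4, ncol = 2)
print(mat1)

# A hypothetical data frame; its columns are made up for illustration
df <- data.frame(id = 1:4, score = v1, year = v2)
df$years_since <- 2020 - df$year                     # calculated column
names(df) <- c("id", "score", "year", "gap")         # rename all columns
df$id <- NULL                                        # delete one column
df <- df[-2, ]                                       # delete the second row

# A simple for loop over the rows of the data frame
for (i in seq_len(nrow(df))) {
  cat("Row", i, "has score", df$score[i], "\n")
}

# Summary statistics for one numeric column
score_stats <- c(mean = mean(df$score), sd = sd(df$score),
                 min = min(df$score), max = max(df$score))
print(score_stats)

# Save to a temporary directory (an assumption; setwd() would set your own)
out_dir <- tempdir()
csv_path <- file.path(out_dir, "example_df.csv")
rdata_path <- file.path(out_dir, "example_objects.RData")
write.csv(df, csv_path, row.names = FALSE)
save(df, mat1, file = rdata_path)

# rm(list = ls()) would clear every object from the environment
```

The rm(list = ls()) call is left commented out so that the objects stay available for inspection after running the sketch.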
###################################################################
# PART 2. (15 points, 5 points each subpart) Gyms are closed and you need to decide whether to buy a treadmill for home. The damn things are expensive, however, so you don’t know if it’s worth it. To help your decision you want to have a quick look at whether exercise has a protective effect against succumbing to covid.
# i. At the county level, find the correlation between covid deaths per capita as of May 7, 2020, and physical inactivity (from the file with county characteristics we used for the lectures). Is it positive or negative and is it statistically different from 0? What does this suggest about the importance of physical activity?
# ii. At the county level, regress deaths per capita on physical inactivity. What do the results suggest?
# iii. Again at the county level, regress number of deaths per capita on physical inactivity while controlling for Adult.smoking, % Rural, % 65.and.older, % Non.Hispanic.African.American, % Hispanic, Income.inequality, Median.household.income, High.school.graduation, Air.pollution…particulate.matter, Adult.obesity, Poor.or.fair.health. Interpret the coefficient on physical inactivity: when the share of physically inactive adults increases by 10 percentage points, what is the predicted change in the number of deaths per 100,000 people? Is it statistically different from 0? What does this suggest about the importance of physical activity? If the conclusion is different than when looking at the simple OLS regression above, why might that be?
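The correlation test and the two regressions in Part 2 follow a common pattern, sketched below on synthetic data. Everything here is an assumption for illustration: the data frame `county`, its columns, and the simulated relationship are made up and say nothing about the real county file or about covid. Inactivity is assumed to be measured in percentage points and deaths per 100,000 people, so a 10 point change corresponds to 10 times the coefficient.

```r
set.seed(320)
n <- 200

# Synthetic county-level data (hypothetical columns, not the course file)
county <- data.frame(
  inactivity = runif(n, 15, 40),   # % of adults physically inactive
  smoking    = runif(n, 10, 30)    # % of adults who smoke
)
county$deaths_per_100k <- 2 + 0.3 * county$inactivity +
  0.5 * county$smoking + rnorm(n, sd = 5)

# Pearson correlation with a test of H0: correlation = 0
ct <- cor.test(county$deaths_per_100k, county$inactivity)
print(ct$estimate)
print(ct$p.value)

# Simple OLS, then OLS with a control variable
simple_fit <- lm(deaths_per_100k ~ inactivity, data = county)
multi_fit  <- lm(deaths_per_100k ~ inactivity + smoking, data = county)
print(summary(multi_fit)$coefficients)

# Predicted change in deaths per 100,000 for a 10 percentage point increase
effect_10pp <- 10 * coef(multi_fit)[["inactivity"]]
print(effect_10pp)
```

Comparing the inactivity coefficient in `simple_fit` and `multi_fit` shows how adding controls can change the estimate when the controls are correlated with both the regressor and the outcome.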
###################################################################
# PART 3. (22 points) You were asked by the California governor to figure out which counties have seen a slowing growth rate of covid cases and which have not. You need to (not necessarily in this order and not necessarily in separate steps):
# clear any objects already in the environment
# i. Load covid-19 data on us cases
# ii. Separate just California counties
# iii. Reshape data (wide to long) for easier analysis, so that instead of having separate columns for each date, you have dates in one column and values corresponding to those dates in another
# iv. Keep only March and April values (no February, January or May) (hint: you can use c() to combine the selection of different months or you can select them separately and then combine)
# v. Sort by county-date
# vi. Find day-to-day growth rate for all the counties (make sure to remove cases where growth rate = infinity or NA)
# vii. Run a t-test to see if the average day-to-day growth rate in the number of cases was equal in April and March against the alternative that it was slower in April. What do you conclude?
# viii. Keep only April data. Create a variable that takes the value 0 if the date is in the first half of April (the 1st to the 15th) and 1 if it is in the second half. (hint: you may want to change date format to numeric)
# ix. Collapse/aggregate/summarize your data so that for each county (there are 60) you have the average growth rate for the first half of April and for the second
# x. Create a variable with the difference in growth rate between the second half of April and the first half of April for each county. So, if the growth rate in the second half was 1.2 and in the first 1.3, then the difference is -0.1.
# xi. Sort or rank counties by this difference so you can answer the following. Which 5 counties saw the rate slow down the most and which 5 saw it slow down the least or even increase?
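The reshaping and growth-rate steps above can be sketched in base R on simulated data. This is a hedged illustration, not the expected solution: the three county names, the simulated cumulative counts, and the date-named columns are assumptions, and the real file would first need to be loaded and filtered to California. The wide-to-long step is done by hand with rep() and unlist() rather than with a reshaping package.

```r
set.seed(1)
dates <- seq(as.Date("2020-02-25"), as.Date("2020-05-05"), by = "day")
counties <- c("Alpha", "Beta", "Gamma")   # hypothetical county names

# Simulated wide data: one row per county, one column per date
cum_cases <- t(sapply(counties, function(x) cumsum(rpois(length(dates), 3))))
wide <- data.frame(county = counties, cum_cases, check.names = FALSE)
names(wide)[-1] <- format(dates, "%Y-%m-%d")

# Wide to long: dates in one column, values in another
long <- data.frame(
  county = rep(wide$county, times = length(dates)),
  date   = rep(dates, each = nrow(wide)),
  cases  = unlist(wide[-1], use.names = FALSE)
)

# Keep March and April only, then sort by county and date
long <- long[format(long$date, "%m") %in% c("03", "04"), ]
long <- long[order(long$county, long$date), ]

# Day-to-day growth rate within each county; drop NA, NaN and Inf
long$growth <- ave(long$cases, long$county,
                   FUN = function(x) c(NA, diff(x) / head(x, -1)))
long <- long[is.finite(long$growth), ]

# One-sided t-test: H0 equal means, H1 March growth greater than April
march <- long$growth[format(long$date, "%m") == "03"]
april_growth <- long$growth[format(long$date, "%m") == "04"]
tt <- t.test(march, april_growth, alternative = "greater")
print(tt$p.value)

# April only: 0 for days 1 to 15, 1 for the rest of the month
april <- long[format(long$date, "%m") == "04", ]
april$second_half <- as.integer(as.integer(format(april$date, "%d")) > 15)

# Average growth per county and half, then the second-minus-first difference
agg <- aggregate(growth ~ county + second_half, data = april, FUN = mean)
diff_df <- merge(agg[agg$second_half == 0, c("county", "growth")],
                 agg[agg$second_half == 1, c("county", "growth")],
                 by = "county", suffixes = c("_first", "_second"))
diff_df$change <- diff_df$growth_second - diff_df$growth_first
diff_df <- diff_df[order(diff_df$change), ]   # most negative change first
print(diff_df)
```

Because growth is computed after the month filter, the first retained day of each county has no previous value and is dropped; computing growth before filtering would keep it.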
###################################################################
# PART 4. (12 points) Take the 5 counties you identified above as having the smallest slowdown/largest growth; if you were unable to [identify the 5 counties], use any other 5 counties.
# i. Get back to the data frame with growth rate in March and April and map growth rate in these two months using a line graph – you should have 5 lines for all 5 counties (if you were unable to obtain growth rates, then just graph cases) where the vertical axis is growth rate and horizontal – date. What kind of patterns in growth rate do you see?
# ii. For the 5 counties, create a clustered bar chart, horizontal or vertical, where the height of bars is the average growth rate. You should have 5 bars where the growth rate is for April and 5 (of different color) where the average growth rate is for March. You may need a couple of data manipulation steps to be able to produce the graph.
# iii. For all California counties (for each county), find the % increase in the number of new covid cases in the last week of April (=100*(the number on April 30 - the number on April 23)/ the number on April 23). If the number of cases on April 23 was 0, make sure to change growth value to 0. Plot a histogram to show the distribution of new cases.
# iv. Using the % increase in the number of new covid cases in the last week of April you found in the previous step, map this increase using a county-level map of California. Color of polygons should reflect the % increase. (details of formatting up to you)
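Two of the Part 4 steps, the zero-safe percent increase and the clustered bar chart, can be sketched with base R graphics. The county names and all numbers below are made up for illustration, and base graphics are used here only to keep the sketch free of package dependencies; the line graph and the map are not covered.

```r
# Hypothetical cumulative case counts on April 23 and April 30
week <- data.frame(
  county = c("Alpha", "Beta", "Gamma", "Delta"),
  apr23  = c(100, 0, 40, 250),
  apr30  = c(130, 5, 40, 300)
)

# Percent increase, set to 0 when the April 23 count is 0
week$pct_increase <- ifelse(
  week$apr23 == 0,
  0,
  100 * (week$apr30 - week$apr23) / week$apr23
)
print(week)

# Histogram of the percent increase across counties
hist(week$pct_increase,
     main = "Percent increase, April 23 to April 30",
     xlab = "% increase")

# Hypothetical average growth rates: rows are months, columns are counties
avg_growth <- matrix(
  c(0.20, 0.08,  0.15, 0.05,  0.25, 0.10,  0.18, 0.12,  0.22, 0.09),
  nrow = 2,
  dimnames = list(c("March", "April"),
                  c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"))
)

# Clustered bars: beside = TRUE puts the two months next to each other
barplot(avg_growth, beside = TRUE,
        col = c("steelblue", "tomato"),
        legend.text = rownames(avg_growth),
        ylab = "Average daily growth rate")
```

The ifelse() guard avoids the division by zero that would otherwise produce Inf or NaN for counties with no cases on April 23.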
###################################################################
# PART 5. (15 points) You work as an analyst at a think tank and you were asked to find the date of peak growth in each state. You may need to perform some data manipulation for each step.
# clear any objects already in the environment
# i. For each state, find the average, standard deviation, minimum and maximum of the daily increase in the number of cases (number-wise, not % wise) from Jan 22 until May 7. For example, if Alabama had, for 4 consecutive days, 100, 300, 100, and 100 new cases, the average is 150.
# ii. For each state, find the date when the number of new cases was the largest
# iii. Create a U.S. state map with the date of the largest increase displayed as text and the color of polygons reflecting the maximum value of daily increase (details of formatting up to you)
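The per-state statistics and the peak date can be sketched in base R, assuming the data are already in long format with one row per state and date and a cumulative case count. The two-state data frame below is hypothetical; its "AL" rows are chosen so the daily increases are 100, 300, 100, and 100. The map step is not covered.

```r
# Hypothetical cumulative cases in long format
state_long <- data.frame(
  state = rep(c("AL", "AK"), each = 5),
  date  = rep(seq(as.Date("2020-05-03"), by = "day", length.out = 5),
              times = 2),
  cases = c(1000, 1100, 1400, 1500, 1600,
            50, 50, 58, 60, 61)
)
state_long <- state_long[order(state_long$state, state_long$date), ]

# Daily increase = difference of cumulative counts within each state
state_long$new_cases <- ave(state_long$cases, state_long$state,
                            FUN = function(x) c(NA, diff(x)))
daily <- state_long[!is.na(state_long$new_cases), ]

# Mean, sd, min and max of the daily increase for each state
state_stats <- aggregate(
  new_cases ~ state, data = daily,
  FUN = function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
)
state_stats <- do.call(data.frame, state_stats)   # flatten the matrix column
print(state_stats)

# Date of the largest daily increase in each state (first one if tied)
peak <- do.call(rbind, lapply(split(daily, daily$state), function(d) {
  d[which.max(d$new_cases), c("state", "date", "new_cases")]
}))
print(peak)
```

After flattening, the statistics live in columns named `new_cases.mean`, `new_cases.sd`, `new_cases.min`, and `new_cases.max`, which can be merged with `peak` by `state` before mapping.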
###################################################################
# PART 6. (3 points) Knit the document in html format.