# data analysis and viz

• Use the tool of your choice (RStudio, Excel, Python) to generate a Word document with a basic analysis of the data set posted in the Week 2 content folder.
• Create a Word document that includes the screen shots described below.

Questions/Requests:

• Create a summary of statistics for the dataset. (provide a screen shot)
• Create a correlation of statistics for the dataset. (provide a screen shot)
• What is the Min, Max, Median, and Mean of the Price? (provide a screen shot)
• What are the correlation values between Price, Ram, and Ads? (provide a screen shot)
• Create a subset of the dataset with only Price, CD, and Premium. (provide a screen shot)
• Create a subset of the dataset with only Price, HD, and Ram where Price is greater than or equal to \$1750. (provide a screen shot)
• What percentage of Premium computers were sold? (provide a screen shot)(Hint: Categorical analysis)
• How many Premium computers with CDs were sold? (provide a screen shot)(Hint: Contingency table analysis)
• How many Premium computers with CDs priced over \$2000 were sold? (provide a screen shot)(Hint: Conditional table analysis)
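The requests above map onto a handful of base R functions. The sketch below is illustrative only: it uses a tiny made-up data frame named `computers`, with hypothetical columns `price`, `ram`, `hd`, `ads`, `cd`, and `premium`, standing in for the Week 2 data set. The real column names and values may differ.

```r
# Hypothetical stand-in for the Week 2 data set (values invented for illustration)
computers <- data.frame(
  price   = c(1499, 1795, 2595, 1849, 2125, 1690),
  ram     = c(4, 2, 16, 8, 8, 4),
  hd      = c(80, 85, 340, 210, 250, 120),
  ads     = c(94, 94, 120, 120, 150, 150),
  cd      = c("no", "no", "yes", "yes", "yes", "no"),
  premium = c("yes", "yes", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Summary statistics for every column, then for price alone
summary(computers)
price_stats <- summary(computers$price)  # Min, 1st Qu., Median, Mean, 3rd Qu., Max

# Correlation matrix for the numeric columns of interest
price_cor <- cor(computers[, c("price", "ram", "ads")])

# Subsets: chosen columns, then chosen columns with a row condition
sub_cols  <- computers[, c("price", "cd", "premium")]
sub_price <- subset(computers, price >= 1750, select = c(price, hd, ram))

# Categorical analysis: percentage of premium computers
premium_pct <- prop.table(table(computers$premium)) * 100

# Contingency table: premium by cd
premium_cd <- table(premium = computers$premium, cd = computers$cd)

# Conditional count: premium computers with a CD priced over 2000
n_over_2000 <- sum(computers$premium == "yes" &
                   computers$cd == "yes" &
                   computers$price > 2000)
```

Printing each object at the console (for example `price_cor` or `premium_cd`) gives the output to capture in a screen shot.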

Your document should be an easy-to-read font in MS Word. Your cover page should contain the following: Title, Student’s name, University’s name, Course name, Course number, Professor’s name, and Date.

Analyzing and Visualizing Data

Chapter 4
Working With Data

Data Assets and Tabulation Types

• Two main categories
o Data that exist in tables; Datasets
o Data that exist as isolated values

• Data Types
o Levels of data or scales of measurement
o Type of exploratory data analysis you can undertake
o Editorial thinking you establish
o Specific chart types you might use
o Color choices and layout decisions around composition

Data Assets and Tabulation Types cont.

• Textual (Qualitative)
o Unstructured streams of words
o Descriptive details of a weather forecast for a given city
o The full title of an academic research project
o The description of a product on Amazon

Data Assets and Tabulation Types cont.

• Ordinal (Qualitative)
o Ordinal data is still categorical and qualitative in nature
o Characteristics of order
o The response to a survey question: based on a scale of 1 (unhappy) to 5 (very happy)
o The general weather forecast: expressed as Very Hot, Hot, Mild, Cold, Freezing
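In R, ordered categories of this kind can be stored as an ordered factor. The following is a minimal sketch using the weather categories from the slide; the variable name `forecast` and the sample values are made up for illustration.

```r
# Forecast categories stored as an ordered factor (levels listed from low to high)
forecast <- factor(
  c("Hot", "Mild", "Freezing", "Very Hot", "Mild"),
  levels  = c("Freezing", "Cold", "Mild", "Hot", "Very Hot"),
  ordered = TRUE
)

# Order-aware comparisons are meaningful for ordinal data
warmer_than_mild <- forecast > "Mild"

# Frequency counts follow the level order rather than alphabetical order
counts <- table(forecast)
```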

Data Assets and Tabulation Types cont.

• Interval (Quantitative)
o Interval data is the less common form of quantitative data
o Quantitative and numeric measurement
o Measure for temperature

Data Assets and Tabulation Types cont.

• Ratio (Quantitative)
o Most common quantitative variable
o Age of a survey participant in years
o Forecasted amount of rainfall in millimetres
o Unlike interval data, for ratio data variables zero means something

Data Assets and Tabulation Types cont.

• Temporal Data
o Time-based data
o Textual: ‘Four o’clock in the afternoon on Monday, 12 March 2016’
o Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
o Interval: ‘12’, ‘12/03/2016’, ‘2016’
o Ratio: ‘16:00’

Data Assets and Tabulation Types cont.

• Discrete
o No ‘in-between’ state
o Days of the week
o Heads or tails for a coin toss
o 1,2,3,4,5,6,etc.

• Continuous
o Has in-between state
o Height and weight
o Temperature
o Time
o 1.1,1.2,1.3,1.4,1.5,etc.

Data Acquisition

• What data do you need and why?
• From where, how, and by whom will the data be acquired?
• When can you obtain it?

Data Acquisition cont.

• Curated by You
o Primary data collection
o Manual collection and data foraging
o Extracted from pdf files
o Web scraping (also known as web harvesting)

Data Acquisition cont.

• Curated by Others
o Issued to you
o System report or export
o Third-party services
o API

Data Examination

• Data Properties
o Data types
o Size
o Condition

▪ Missing values
▪ Erroneous values
▪ Inconsistencies
▪ Duplicate records
▪ Out of date
▪ Uncommon system characters or line breaks
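Several of these condition checks can be run with base R functions. The sketch below assumes a small invented data frame named `records`; the rule that a negative price is erroneous is an assumption made for the example.

```r
# Hypothetical records with a missing value, an erroneous value, and a duplicate row
records <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  price = c(1499, 1795, 1795, NA, -50)
)

n_missing   <- sum(is.na(records$price))  # count of missing values
n_duplicate <- sum(duplicated(records))   # count of fully duplicated rows
bad_values  <- which(records$price < 0)   # row positions of negative prices
```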

Data Examination cont.

• How to Approach This?
o Inspect and scan
o Data operations
o Statistical methods
o Frequency counts
o Frequency distribution
o Measurements of central tendency
o Maximum, minimum and range
o Percentiles
o Standard deviation
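The statistical methods listed above each have a base R counterpart. This is a minimal sketch on an invented numeric vector named `ram`.

```r
# Made-up values for illustration
ram <- c(2, 4, 4, 8, 8, 8, 16)

freq    <- table(ram)                                  # frequency counts
centre  <- c(mean = mean(ram), median = median(ram))   # central tendency
extent  <- range(ram)                                  # minimum and maximum
spread  <- diff(extent)                                # range as a single number
pct     <- quantile(ram, probs = c(0.25, 0.5, 0.75))   # percentiles
std_dev <- sd(ram)                                     # sample standard deviation
```

A histogram, `hist(ram)`, is one way to view the frequency distribution graphically.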

Influence on Process

• Moving forward
o Purpose map ‘tone’
o Editorial angles
o Physical properties influence scale

Data Transformation

• Potential Activities
o Transform to clean
o Transform to convert
o Transform to create
o Transform to consolidate
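One possible reading of the four activities above, sketched in R. The data frames `raw` and `extra` and all of their values are hypothetical.

```r
# Hypothetical raw data with inconsistent text and numbers stored as text
raw <- data.frame(
  id    = c(1, 2, 3),
  cd    = c(" Yes", "no ", "YES"),
  price = c("1499", "1795", "2595"),
  stringsAsFactors = FALSE
)

raw$cd        <- tolower(trimws(raw$cd))  # clean: strip spaces, unify case
raw$price     <- as.numeric(raw$price)    # convert: text to numeric
raw$over_2000 <- raw$price > 2000         # create: derive a new variable

# consolidate: bring in a second table that shares the id column
extra  <- data.frame(id = c(1, 2, 3), ram = c(4, 2, 16))
merged <- merge(raw, extra, by = "id")
```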

Data Exploration

• Exploratory Data Analysis
o Instinct of the analyst
o Reasoning

▪ Deductive
▪ Inductive

o Chart types
o Research
o Statistical methods
o Nothings
o Not always needed

How to Use the R Programming Language for Statistical Analyses
Part I: An Introduction to R

What Is R?

◼ a programming “environment”

◼ object-oriented

◼ similar to S-Plus

◼ freeware

◼ provides calculations on matrices

◼ excellent graphics capabilities

◼ supported by a large user network

What is R Not?

◼ a statistics software package

◼ quick to learn

◼ a program with a complex graphical interface

Installing R

◼ http://www.r-project.org/

Tutorials

◼ From R website under “Documentation”

– “Manual” is the listing of official R documentation

• An Introduction to R

• R Language Definition

• Writing R Extensions

• R Data Import/Export

• The R Reference Index

Tutorials cont.

– “Contributed” documentation consists of tutorials and manuals created by R users

• Simple R

• R for Beginners

• Practical Regression and ANOVA Using R

– R FAQ

– Mailing Lists (listserv)

• r-help

Tutorials cont.

◼ Textbooks

– Venables & Ripley (2002). Modern Applied Statistics with S. New York: Springer-Verlag.

– Chambers (1998). Programming with Data: A Guide to the S Language. New York: Springer-Verlag.

R Basics

◼ objects

◼ naming convention

◼ assignment

◼ functions

◼ workspace

◼ history

Objects

◼ names

◼ types of objects: vector, factor, array, matrix, data.frame, ts, list

◼ attributes

– mode: numeric, character, complex, logical

– length: number of elements in object

◼ creation

– assign a value

– create a blank object

Naming Convention

◼ can contain letters, digits (0-9), and/or periods “.”

◼ case-sensitive

– mydata different from MyData

◼ avoid underscore “_” (older versions of R did not allow it in names; current versions do)

Assignment

◼ “<-” used to indicate assignment

– x<-c(1,2,3,4,5,6,7)

– x<-c(1:7)

– x<-1:4

◼ note: as of version 1.4 “=” is also a valid assignment operator

Functions

◼ actions can be performed on objects using functions (note: a function is itself an object)

◼ have arguments and options; often there are defaults

◼ provide a result

◼ parentheses () are used to specify that a function is being called
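A short sketch of these points using the built-in `mean()` function; the vector `x` is made up for the example.

```r
x <- c(1, 2, 3, 4, NA)

# Called with defaults: na.rm defaults to FALSE, so the NA propagates
m_default <- mean(x)

# Overriding the default argument drops the missing value first
m_clean <- mean(x, na.rm = TRUE)

# Without parentheses, mean refers to the function object itself
is_fun <- is.function(mean)
```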

Let’s look at R

R Workspace & History

Workspace

◼ during an R session, all objects are stored in temporary working memory

◼ list objects

– ls()

◼ remove objects

– rm()

◼ objects that you want to access later must be saved in a “workspace”

– from the menu bar: File->save workspace

– from the command line: save(x, file="MyData.Rdata")
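A minimal round trip through `save()`, `rm()`, and `load()`. Writing to `tempdir()` is a choice made for this sketch so that no file is left in the working directory; in practice a path such as "MyData.Rdata" would be used.

```r
x <- 1:7

# Temporary location, chosen for illustration only
path <- file.path(tempdir(), "MyData.Rdata")

save(x, file = path)   # write the object to disk
rm(x)                  # remove it from working memory
exists_after_rm <- exists("x", inherits = FALSE)

load(path)             # restore x under its original name
```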

History

◼ command line history

◼ can be saved, loaded, or displayed

– savehistory(file="MyData.Rhistory")

– history(max.show=Inf)

◼ during a session you can use the arrow keys to review the command history

Two most common object types for statistics:

◼ matrix

◼ data frame

Matrix

◼ a matrix is a vector with an additional attribute (dim) that defines the number of rows and columns

◼ only one mode (numeric, character, complex, or logical) allowed

◼ can be created using matrix()

x<-matrix(data=0,nrow=2,ncol=2) # nr and nc also work through partial argument matching

or

x<-matrix(0,2,2)
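The two matrix properties above can be checked directly. This is a small sketch with invented values.

```r
# Giving a vector a dim attribute turns it into a matrix (filled column by column)
v <- 1:6
dim(v) <- c(2, 3)

# The same matrix built with matrix()
m <- matrix(1:6, nrow = 2, ncol = 3)
same <- identical(v, m)

# Mixing modes forces everything to a single mode (here, character)
mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
mixed_mode <- mode(mixed)
```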

Data Frame

◼ several modes allowed within a single data frame

◼ can be created using data.frame()
L<-LETTERS[1:4] #A B C D

x<-1:4 #1 2 3 4

data.frame(x,L) #create data frame

◼ attach() and detach()
– the database is attached to the R search path so that the database is searched by R when it is evaluating a variable.

– objects in the database can be accessed by simply giving their names
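A brief sketch of a mixed-mode data frame and of `attach()`/`detach()`. The column names `score` and `label` are hypothetical, chosen so they do not clash with other objects in the session.

```r
# A data frame holding a numeric column and a character column
df2 <- data.frame(score = 1:4, label = LETTERS[1:4], stringsAsFactors = FALSE)
col_classes <- sapply(df2, class)

# After attach(), columns can be referenced by name alone
attach(df2)
mean_score <- mean(score)
detach(df2)   # remove the data frame from the search path again

# with() gives the same access without changing the search path
total <- with(df2, sum(score))
```

Using `with()` avoids the risk of an attached column being masked by another object of the same name.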

Data Elements

◼ select only one element

– x[2]

◼ select range of elements

– x[1:3]

◼ select all but one element

– x[-3]

◼ slicing: including only part of the object

– x[c(1,2,5)]

◼ select elements based on logical operator

– x[x>3]
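The selections above, applied to an invented vector so the results are easy to follow. Note that logical selection uses square brackets, like every other form of indexing.

```r
x <- c(10, 20, 30, 40, 50)

second   <- x[2]           # one element: 20
first3   <- x[1:3]         # range of elements: 10 20 30
no_third <- x[-3]          # all but the third: 10 20 40 50
picked   <- x[c(1, 2, 5)]  # chosen positions: 10 20 50
large    <- x[x > 30]      # logical condition: 40 50
```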

Data Import & Entry

Importing Data

◼ read.table()

– reads in data from an external file

◼ data.entry()

– create object first, then enter data

◼ c()

– concatenate

◼ scan()

– prompted data entry

◼ R has ODBC for connecting to other programs
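A minimal sketch of reading an external file, assuming `read.table()` as the import function. The file name `computers.txt`, its contents, and the use of `tempdir()` are all made up so the example is self-contained.

```r
# Write a small whitespace-delimited file to a temporary location
path <- file.path(tempdir(), "computers.txt")
write.table(data.frame(price = c(1499, 1795), cd = c("no", "yes")),
            file = path, row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE takes the column names from the first line
imported <- read.table(path, header = TRUE, stringsAsFactors = FALSE)

# c() concatenates existing values with new ones
combined <- c(imported$price, 2595)
```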

Data entry & editing

◼ start editor and save changes

– data.entry(x)

◼ start editor, changes not saved

– de(x)

◼ start text editor

– edit(x)
