How The Author Used Guide To Summarizing Datasets To Write Sample Statistical Report

Summarizing Dataset Variables

The author of the previous report used several techniques for summarizing data. First, the author introduces what the report will cover. The author then describes the dataset by stating what each variable represents and whether it is categorical or quantitative. For instance, “Gender”, “Are they old? Above or under 40” and “Do they like the product” are treated as categorical variables, while “How much they would pay for it” is treated as a quantitative variable.

For the quantitative variables, the author uses descriptive summary statistics such as the mean, median and mode. For the categorical variables, the author uses frequency tables and bar graphs.

Section 2

Pivot tables that let you investigate the relationship between the variables
“old or young” and “do they like the product? hate or like”

Count of do they like product?	Column Labels
Row Labels	old	young	Grand Total
hate	15	10	25
like	55	20	75
Grand Total	70	30	100

Count of do they like product?	Column Labels
Row Labels	old	young	Grand Total
hate	21.43%	33.33%	25.00%
like	78.57%	66.67%	75.00%
Grand Total	100.00%	100.00%	100.00%

Make a simple comment

A larger share of the young people (33.33%, n = 10) than of the old people (21.43%, n = 15) hate the product, although in both groups the majority like it.

Using your sample, what is the estimate for p₁ – p₂? In other words, what is the difference between the sample proportions?

0.7857-0.6667 = 0.119
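
The column percentages and the difference above can be recomputed from the counts in the first pivot table. The following is a minimal sketch, not the author's method (the report used Excel); the dictionary layout and the helper name `column_percentages` are assumptions made for illustration.

```python
# Counts copied from the Section 2 pivot table (columns are age groups).
counts = {
    "old": {"hate": 15, "like": 55},
    "young": {"hate": 10, "like": 20},
}


def column_percentages(table):
    """Return each cell as a share of its column (age group) total."""
    shares = {}
    for group, cells in table.items():
        total = sum(cells.values())
        shares[group] = {answer: n / total for answer, n in cells.items()}
    return shares


shares = column_percentages(counts)
p1 = shares["old"]["like"]    # sample proportion of old people who like the product
p2 = shares["young"]["like"]  # sample proportion of young people who like the product
diff = p1 - p2                # estimate of p1 - p2

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p1 - p2 = {diff:.3f}")
```

Working from the raw counts rather than the rounded percentages avoids carrying rounding error into later steps.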

Section 3

A pivot table that lets you investigate the relationship between the variables
“old or young” and “how much they would pay for the product”

sample collector id	420
Row Labels	Average of how much would pay?	StdDev of how much would pay?	Count of are they old?
old	2.520	1.224	70
young	2.183	1.405	30
Grand Total	2.419	1.283	100

Make a simple comment about the relationship between the variables

On average, old people are willing to pay slightly more for the product than young people (2.520 versus 2.183).

Using your sample, what is the estimate for µ₁ – µ₂? In other words, what is the difference between the sample means?

2.520 – 2.183 = 0.337

Section 4

Scatterplot

Make a simple comment about the relationship between the variables
Estimated profit for the casino when there 1000 bets is

Section 5

A) Using the answer in section 2

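Test the claim that there is a difference in the proportions, using a 5% level of significance

State an appropriate H₀ and H₁

Solution

H₀: p₁ = p₂ (the proportion who like the product is the same for old and young people)
H₁: p₁ ≠ p₂ (the proportions differ)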

Find the p-value using only the answers to part (A) and the webpage
https://epitools.ausvet.com.au/content.php?page=z-test-2

Results

	Sample 1	Sample 2	Difference
Sample proportion	0.7857	0.6667	0.119
95% CI (asymptotic)	0.6896 – 0.8818	0.498 – 0.8354	-0.0662 – 0.3042
z-value	1.3
P-value	0.2079
Interpretation	Not significant, accept null hypothesis that sample proportions are equal
n by pi	n * pi >5, test ok

State whether or not you reject the H₀

Solution

We fail to reject the null hypothesis (H₀) since the p-value > 0.05
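
The z-value and p-value reported above can be checked with a short script. This is a sketch under stated assumptions: it uses the counts from the Section 2 pivot table (55 of 70 old, 20 of 30 young) and a pooled standard error, which is an assumption about how the webpage computes the test rather than something the report states. The function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test of H0: p1 = p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail area
    return z, p_value


# 55 of 70 old people and 20 of 30 young people like the product.
z, p_value = two_proportion_z_test(55, 70, 20, 30)
reject_h0 = p_value < 0.05

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the script starts from exact counts while the webpage was given rounded proportions, the p-value may differ from the reported 0.2079 in the fourth decimal place; the decision at the 5% level is the same.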

Give a conclusion in plain English

Solution

There is no statistically significant evidence that the proportion of old people who like the product differs from the proportion of young people who like the product.

B) Using the answer in section 3
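Test the claim that there is a difference between the means, using a 5% level of significance
State an appropriate H₀ and H₁

Solution

H₀: µ₁ = µ₂ (old and young people would pay the same amount on average)
H₁: µ₁ ≠ µ₂ (the average amounts differ)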

Find the p-value using the answers to part (A) and the webpage
https://www.medcalc.org/calc/comparison_of_means.php

Solution

Results

Difference	-0.337
Standard error	0.279
95% CI	-0.8914 to 0.2174
t-statistic	-1.206
DF	98
Significance level	P = 0.2306

State whether or not you reject H₀

Solution

We fail to reject the null hypothesis (H₀) since the p-value > 0.05
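
The t-test results above can be reproduced from the summary statistics in Section 3. The sketch below assumes a pooled-variance (equal variance) test, which is consistent with the reported 98 degrees of freedom but is not stated in the report. Because the Python standard library has no t-distribution CDF, the p-value is approximated by integrating the t density with Simpson's rule; this is a stand-in for a statistics library, and the function names are hypothetical. The sign is flipped relative to the reported output because the script computes old minus young.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area  # the density is symmetric around zero


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Equal-variance two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, se, two_sided_p(t, df)


# Means, standard deviations and counts from the Section 3 pivot table.
t, df, se, p_value = pooled_t_test(2.520, 1.224, 70, 2.183, 1.405, 30)

print(f"t = {t:.3f}, df = {df}, SE = {se:.3f}, p-value = {p_value:.4f}")
```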

Give a conclusion in plain English

Solution

There is no statistically significant evidence that the average amount old people would pay for the product differs from the average amount young people would pay.

Section 6
Use the dataset given below; you must use your own sample

Suppose a business has conducted an opinion poll to find out whether its customers support a change to the business

Use the PivotTable feature in Excel to find appropriate summary statistics for your sample. You should paste both into Word; you do not need the Excel file.

This pivot table must show the number of people who answered yes and the number of people who answered no

Solution

Row Labels	Count of do you support proposed change?
no	90
yes	112
Grand Total	202

The sample size is n = 202 (the grand total in the pivot table) and the sample proportion of people who support the change is p̂ = 112/202 ≈ 0.5545

Find a 90% confidence interval for the proportion of people that support the change

standard error = √(p̂(1 – p̂)/n) = √(0.5545 × 0.4455/202) = 0.03497

Using the z distribution, 90% of sample proportions are within 1.645 standard errors of the population proportion, so the 90% confidence interval for the population proportion is
Lower bound: 0.5545 – 1.645 × 0.03497 ≈ 0.497

Upper bound: 0.5545 + 1.645 × 0.03497 ≈ 0.612
We are 90% confident that the population proportion of people that support the change is between about 0.497 and 0.612.
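
The interval can be recomputed from the pivot table counts. This is a minimal sketch that assumes the sample size is the pivot table's grand total (112 yes out of 202), which is the value that reproduces the standard error of 0.03497. The function name `proportion_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.90):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    return p_hat, se, p_hat - z * se, p_hat + z * se


# 112 "yes" answers out of 202 responses (counts from the pivot table).
p_hat, se, lower, upper = proportion_confidence_interval(112, 202)

print(f"p_hat = {p_hat:.4f}, SE = {se:.5f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```

Since the unrounded proportion is used, the bounds can differ in the fourth decimal place from values obtained with a rounded proportion.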

Section 7

Histogram

The histogram below shows the relationship between the variables “win or loss” and “goal difference” for the Man United football club.

Description of the variables

The variable “win or loss” is a categorical variable because it answers the question “Was it a win or a loss?” The variable “goal difference” is a quantitative variable because its values are numbers.

Description of the relationship

The amount people would pay for the snack food is between 0 and 6

Larger goal differences are observed for the wins than for the losses

Consider the histogram you found yourself and discussed in parts (a), (b) and (c).
Would the discussion be useful in business? Give a reason for your answer.

Solution

Yes. The discussion would be useful in business because it indicates the goal difference the team is likely to get in a win or a loss, which helps the manager prepare for either case.

Consider the following discussion taken from the sample report you had to read in section 1. Would the discussion be useful in business? Give a reason for your answer.

Solution

The discussions in section 1 are useful because they help summarize a business case. The summaries report values such as the mean or the median, which help decision makers plan well.

Section 8

This section is abstract, so you are encouraged to try to roughly understand the following before attempting the task

a) Using section 2
Find the zscore of the estimate in section 2d. Note that the average of the estimates is 0.14 with standard deviation 0.088

Solution

Count of do they like product?	Column Labels
Row Labels	old	young	Grand Total
hate	21.43%	33.33%	25.00%
like	78.57%	66.67%	75.00%
Grand Total	100.00%	100.00%	100.00%

Using part (i), find P(Z<zscore) using wolframalpha.com

for example, if the zscore is 0.5, type in
P(Z<0.5)
into wolframalpha.com

If there was a list of 1000 estimates ranked from lowest to highest, roughly what rank do you expect your estimate to have?

Hint: just use the formula
expected rank = P(Z<zscore)*1000

Solution

Complete the following table using https://app.box.com/s/2to195ysj0deo5wawwjp53e9jlt4peqp

	Which sample	Rank lowest to highest	Estimate X	Zscore=(X-mean)/stdev
Lowest estimate	475	1	-0.14306	-3.19465
Estimate from allocated sample	420	422	0.11905	-0.2386
Highest estimate	663	1000	0.543672	4.570203

b) Using section 3

Find the zscore of the estimate in section 3c. Note that the average of the estimates is 0.408 with standard deviation 0.26

Solution

sample collector id	420
Row Labels	Average of how much would pay?	StdDev of how much would pay?	Count of are they old?
old	2.520	1.224	70
young	2.183	1.405	30
Grand Total	2.419	1.283	100

The estimate is x̄₁ – x̄₂ = 2.520 – 2.183 = 0.337

So the zscore is (0.337 – 0.408)/0.26 = –0.273

Using part (ii), what is P(Z<zscore)? You can find the answer using wolframalpha.com

for example, if the zscore = -1, type in
P(Z<-1)
into wolframalpha.com

If there was a list of 1000 estimates ranked from lowest to highest, what rank do you think yours would be close to? Hint: just use the formula
expected rank = P(Z<zscore)*1000

Complete the following table using https://app.box.com/s/kiqemn0h0m3d03uygo1dhemvx4e5uf6r

	Which sample	Rank lowest to highest	Estimate X	Zscore=(X-mean)/stdev
Lowest estimate	475	1	-0.43474	-3.23897
Estimate from allocated sample	420	416	0.3367	-0.27308
Highest estimate	663	1000	1.607576	4.613465

c) Using section 4

Find the zscore of the slope estimate in section 4a. Note that the average of the estimates is 0.952 with standard deviation 0.237

Solution

Using part (ii), what is P(Z<zscore)? You can find the answer using wolframalpha.com

for example, if the zscore = -1, type in
P(Z<-1)
into wolframalpha.com

If there was a list of 1000 estimates ranked from lowest to highest, what rank do you think yours would be close to? Hint: just use the formula
expected rank = P(Z<zscore)*1000

Summary of some of the 1000 estimates; the full list of estimates is available from https://app.box.com/s/35a0x0hnxcqq2qh6krzua6qp587fke51

	Which sample	Rank lowest to highest	Estimate X	Zscore=(X-mean)/stdev
Lowest estimate	141	1	-0.00348010	-4.03134
Estimate from allocated sample	420	471	0.93864267	-0.05654
Highest estimate	683	1000	3.878984	3.876998

For parts a, b and c, compare the predicted rank for your sample in (iii), found using P(Z<zscore), to the actual rank in part (iv)

Solution

Section	Predicted rank	Actual rank
Section 2	406	422
Section 3	392	416
Section 4	478	471

As can be seen, the predicted and actual ranks differ slightly in every section; none of the predicted ranks matched the actual rank exactly.
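
The predicted ranks in the table can be recomputed with the formula expected rank = P(Z<zscore)*1000, using the standard normal CDF in place of wolframalpha.com. This is a sketch for checking the arithmetic; the estimates, means, standard deviations and actual ranks are copied from the sections above, and the function name `expected_rank` is hypothetical.

```python
from statistics import NormalDist


def expected_rank(estimate, mean, stdev, n_estimates=1000):
    """Return (zscore, expected rank) of an estimate among n_estimates."""
    z = (estimate - mean) / stdev
    return z, NormalDist().cdf(z) * n_estimates


# (estimate, mean of estimates, stdev of estimates, actual rank)
sections = {
    "Section 2": (0.11905, 0.14, 0.088, 422),
    "Section 3": (0.337, 0.408, 0.26, 416),
    "Section 4": (0.93864267, 0.952, 0.237, 471),
}

predicted = {}
for name, (estimate, mean, stdev, actual) in sections.items():
    z, rank = expected_rank(estimate, mean, stdev)
    predicted[name] = round(rank)
    gap = predicted[name] - actual
    print(f"{name}: z = {z:.4f}, predicted rank = {predicted[name]}, "
          f"actual rank = {actual}, gap = {gap}")
```

The gaps are small relative to the 1000 estimates, which fits the comment that the predicted and actual ranks differ only slightly.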

Comment on the connection between the following facts
*“Part (d) shows that totally different populations with totally different variables have the same sampling distribution (the normal distribution)”

*“Hypothesis testing uses a sampling distribution; the p-value is a shaded area on the sampling distribution”

Solution

Yes. The predicted values differed from the actual values because they are based on samples, which are expected to come from the same population and to have similar, but not identical, characteristics.
