Statistics Example: Frequency Distribution, Regression Analysis, ANOVA, And Sales Data

Frequency Distribution of Students’ Scores

The scores of students in an exam are given as follows:

52  99  92  86  84
63  72  76  95  88
92  58  65  79  80
90  75  74  56  99

Table 1: Scores

The frequency distribution of the students' scores is given in the following table. The table also contains the cumulative frequency, the relative frequency, the cumulative relative frequency and the percentage frequency. The scores have been divided into 5 class intervals, each of width 10.

Score        Frequency  Cumulative Frequency  Relative Frequency  Cumulative relative frequency  Percentage frequency
52-61        3          3                     0.15                0.15                           15.00%
62-71        2          5                     0.10                0.25                           10.00%
72-81        6          11                    0.30                0.55                           30.00%
82-91        4          15                    0.20                0.75                           20.00%
92-101       5          20                    0.25                1.00                           25.00%
Grand Total  20                               100.00%                                            100.00%

Table 2: Frequency Distribution 
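The columns of Table 2 can be reproduced from the raw scores with a short script. The following is a minimal sketch using only the Python standard library; the function name `frequency_table` and its parameters are illustrative choices, and the class limits are treated as inclusive on both ends, as in the table.

```python
# Scores from Table 1.
scores = [52, 99, 92, 86, 84, 63, 72, 76, 95, 88,
          92, 58, 65, 79, 80, 90, 75, 74, 56, 99]

# Class intervals are inclusive on both ends: 52-61, 62-71, ..., 92-101.
LOWER, WIDTH, N_CLASSES = 52, 10, 5


def frequency_table(values, lower=LOWER, width=WIDTH, n_classes=N_CLASSES):
    """Return one dict per class interval with the columns of Table 2."""
    n = len(values)
    rows, cumulative = [], 0
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width - 1
        freq = sum(lo <= v <= hi for v in values)  # count scores in the class
        cumulative += freq
        rows.append({
            "class": f"{lo}-{hi}",
            "freq": freq,
            "cum_freq": cumulative,
            "rel_freq": freq / n,
            "cum_rel_freq": cumulative / n,
        })
    return rows


table = frequency_table(scores)
for row in table:
    # The last column is the relative frequency formatted as a percentage.
    print(f"{row['class']:>7} {row['freq']:>3} {row['cum_freq']:>3} "
          f"{row['rel_freq']:.2f} {row['cum_rel_freq']:.2f} {row['rel_freq']:.2%}")
```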

The histogram of the students' scores shows the relative frequency on the primary vertical axis and the cumulative relative frequency on the secondary vertical axis.

The distribution has higher frequencies on the right-hand side and is thus not symmetric. With its longer tail towards the lower scores, the distribution appears to be negatively skewed, that is, left skewed. The interval 72 to 81 is the class interval which contains the median, since the 10th and 11th of the 20 ordered scores both fall in it. It is also the class with the highest frequency and is therefore the modal class.
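As a quick check on these statements, the mean, the median, the median class and the modal class can be computed directly from the scores. This is a sketch using the standard `statistics` module; the helper `class_of` is a hypothetical name introduced for illustration. A mean slightly below the median is consistent with a longer tail towards the low scores.

```python
import statistics

scores = [52, 99, 92, 86, 84, 63, 72, 76, 95, 88,
          92, 58, 65, 79, 80, 90, 75, 74, 56, 99]
class_bounds = [(52, 61), (62, 71), (72, 81), (82, 91), (92, 101)]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)  # average of the 10th and 11th ordered scores


def class_of(value):
    """Return the (lower, upper) class interval containing value."""
    for lo, hi in class_bounds:
        if lo <= value <= hi:
            return (lo, hi)
    # A median that is an average of two scores could fall between two classes.
    raise ValueError(f"{value} does not fall inside any class interval")


median_class = class_of(median_score)

# The modal class is the interval with the largest frequency.
counts = {b: sum(b[0] <= s <= b[1] for s in scores) for b in class_bounds}
modal_class = max(counts, key=counts.get)

print(mean_score, median_score, median_class, modal_class)
```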

The output of a regression analysis is given in the tables below, where the dependent variable y is in thousands of units and the independent variable X is in thousands of dollars.

ANOVA TABLE

            Degrees of freedom  Sum of Squares (SS)
Regression  1                   354.689
Residual    39                  7035.262

            Est. Coefficients   Standard Error (SE)
Intercept   54.076              2.358
X           0.029               0.021

Table 3: Regression Output 

The sample size n can be computed from the degrees of freedom of the regression and the residual. The regression degrees of freedom equal k = 1, since there is only one independent variable, and the residual degrees of freedom equal n - k - 1 = 39 (given). Then n = 39 + k + 1 = 39 + 1 + 1 = 41. So there are 41 observations in the sample.

The regression coefficient of X, the price, is given as 0.029 and its standard error as 0.021. The t-statistic to test the significance of the regression coefficient is their ratio, 0.029/0.021, which equals 1.38. The test uses the residual degrees of freedom, 39, for which the two-tailed critical t-value at alpha = 0.05 is approximately 2.023. The observed statistic is smaller than the critical value. Therefore the coefficient of X is not significantly different from zero at the 5% level, and the data do not provide sufficient evidence that the supply y is linearly related to X.
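The test statistic can be recomputed from the figures in Table 3. The sketch below derives the t-ratio from the coefficient and its standard error, and the F-statistic from the ANOVA sums of squares. With a single predictor, the F-statistic equals the square of the slope's t-statistic, so the two should agree up to the rounding of the reported coefficients. Variable names are illustrative.

```python
import math

# Values copied from Table 3.
ssr, sse = 354.689, 7035.262          # regression and residual sums of squares
df_regression, df_residual = 1, 39
slope, slope_se = 0.029, 0.021        # reported to three decimals only

# t-statistic for the slope: estimate divided by its standard error.
t_stat = slope / slope_se

# F-statistic from the ANOVA table: mean square regression / mean square residual.
msr = ssr / df_regression
mse = sse / df_residual
f_stat = msr / mse

print(f"t = {t_stat:.3f}")
print(f"F = {f_stat:.3f}, sqrt(F) = {math.sqrt(f_stat):.3f}")
```

The small gap between the t-ratio and the square root of F is attributable to the coefficient and standard error being rounded to three decimals in the table.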

The coefficient of determination is the R-squared statistic of the regression model. It is the ratio of the variation explained by the model to the total variation and serves as a measure of goodness of fit for the model. The statistic is given by the ratio SSR / (SSR + SSE), where SSR is the regression sum of squares and SSE is the residual (error) sum of squares.

Regression Analysis Output

Here SSR is given to be 354.689 and SSE is given to be 7035.262.

Then the coefficient of determination is r² = 354.689 / (354.689 + 7035.262) = 354.689 / 7389.951 = 0.047996. This means that the model explains only about 4.80% of the total variation of the dependent variable, supply (y).

The coefficient of correlation, denoted by r, is the square root of the coefficient of determination r², taken with the sign of the slope, which is positive here. Hence r = sqrt(0.047996) = 0.21908. So the correlation coefficient between the supply in thousands of units and the price in thousands of dollars is 0.21908.
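The two goodness-of-fit figures follow directly from the sums of squares in Table 3. A minimal sketch, with illustrative variable names, that takes the sign of r from the reported slope:

```python
import math

ssr, sse = 354.689, 7035.262   # from the ANOVA part of Table 3
slope = 0.029                  # estimated coefficient of X

sst = ssr + sse                # total variation
r_squared = ssr / sst          # share of variation explained by the model

# In simple regression, r has the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r^2 = {r_squared:.6f}, r = {r:.5f}")
```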

The supply predicted by the fitted regression model for a price of $50,000 is obtained from the regression equation y = 54.076 + 0.029 X. Since X is measured in thousands of dollars, a price of $50,000 corresponds to X = 50.

Then y = 54.076 + 0.029 x 50 = 55.526 thousand units, that is, about 55,526 units.

To help the Allied Corporation increase the productivity of its line workers, four programs, A, B, C and D, were designed. Twenty employees were randomly selected and each was assigned to one of the four programs. Their daily output was then compared using ANOVA to check whether any program improved productivity more than the others. The following table shows the observed output for each group.

Program A  Program B  Program C  Program D
150        150        185        175
130        120        220        150
120        135        190        120
180        160        180        130
145        110        175        175

Table 4: Line worker output for Groups A, B, C, D

 The following table gives the descriptive statistics of the output per day of the employees in each program group.

SUMMARY TABLE

Groups     Count  Sum  Average  Variance
Program A  5      725  145      525
Program B  5      675  135      425
Program C  5      950  190      312.5
Program D  5      750  150      637.5

Table 5 : Summary measures of Worker Output for each Group (A, B, C, D)
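The summary measures in Table 5 can be reproduced from the observations in Table 4. The sketch below uses the standard `statistics` module, whose `variance` function computes the sample variance (divisor n - 1), which matches the values in the table. The dictionary layout is an illustrative choice.

```python
import statistics

# Daily output per program, read column-wise from Table 4.
output = {
    "Program A": [150, 130, 120, 180, 145],
    "Program B": [150, 120, 135, 160, 110],
    "Program C": [185, 220, 190, 180, 175],
    "Program D": [175, 150, 120, 130, 175],
}

summary = {
    name: {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance, divisor n - 1
    }
    for name, values in output.items()
}

for name, stats in summary.items():
    print(name, stats["count"], stats["sum"], stats["mean"], stats["variance"])
```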

The following table gives the results of the ANOVA where the alpha or level of significance was taken to be 0.05.

ANOVA TABLE

Source of Variation       Sum of Squares (SS)  Degree of freedom (df)  Mean Squares (MS)  Observed F-statistic  P-value  F critical value
Between Groups variation  8750                 3                       2916.667           6.140351              0.00557  3.238872
Within Groups variation   7600                 16                      475
Total variation           16350                19

Table 6: One-way ANOVA output for the test of equal mean output across the four programs
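The sums of squares, mean squares and F-statistic in Table 6 can be recomputed from the raw observations. The following is a sketch of the standard one-way ANOVA decomposition in plain Python, with illustrative variable names. The p-value and critical value require the F distribution, which is not in the standard library, so they are not recomputed here.

```python
# Daily output per program, read column-wise from Table 4.
groups = {
    "A": [150, 130, 120, 180, 145],
    "B": [150, 120, 135, 160, 110],
    "C": [185, 220, 190, 180, 175],
    "D": [175, 150, 120, 130, 175],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between-groups variation: group size times squared distance of each
# group mean from the grand mean.
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
)

# Within-groups variation: squared distance of each observation from
# its own group mean.
ss_within = sum(
    (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
)

df_between = len(groups) - 1                 # 4 - 1 = 3
df_within = len(all_values) - len(groups)    # 20 - 4 = 16

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(ss_between, ss_within, ms_between, ms_within, round(f_stat, 6))
```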

The ANOVA table shows that the observed F-statistic is 6.14 and the critical value is 3.2389. Since the observed statistic is greater than the critical value, there is evidence of a difference in mean output among the four groups A, B, C and D. The p-value of 0.0056 is less than the level of significance 0.05, and this too supports the rejection of the null hypothesis, which asserts that no difference exists. Looking at the mean output of the four groups, group C has a mean output of 190, which is markedly greater than the mean output of groups A, B and D. It is therefore suggested that the company use program C to increase the daily productivity of all its line workers.

Weekly sales data for a company's product have been provided for 7 weeks, and it is of interest to establish the relationship of the weekly sales (y) with the competitor's price (x1) and the company's own expenditure on advertising (x2).

The following table shows the data given on the same.

Week  Price(x1)  Advertising(x2)  Sales(y)
1     0.33       5                20
2     0.25       2                14
3     0.44       7                22
4     0.4        9                21
5     0.35       4                16
6     0.39       8                19
7     0.29       9                15

Table 7: Data Given

The regression equation obtained by fitting the given data is given as follows:

y = 3.5976 + 41.32 x1 + 0.0132 x2
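The fitted coefficients can be reproduced from Table 7 by ordinary least squares. The sketch below solves the normal equations for two predictors in closed form, working with deviations from the means. It is one way to obtain the estimates and not necessarily the procedure used to produce the output in this example. The helper names `mean` and `cross` are illustrative.

```python
# Data from Table 7.
x1 = [0.33, 0.25, 0.44, 0.40, 0.35, 0.39, 0.29]  # competitor's price
x2 = [5, 2, 7, 9, 4, 8, 9]                       # advertising expenditure
y = [20, 14, 22, 21, 16, 19, 15]                 # weekly sales


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations of u and v from their means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Solve the 2x2 centered normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# The intercept makes the fitted plane pass through the point of means.
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(f"y = {b0:.4f} + {b1:.2f} x1 + {b2:.4f} x2")
```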

The following tables show the regression output for the fitted model:

                              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 90.0%  Upper 90.0%
Intercept                     3.597615      4.052244        0.887808  0.424805  -7.65322   14.84845   -5.04115     12.23638
Price of Competitor (x1)      41.32002      13.33736        3.098065  0.036289  4.289567   78.35048   12.88681     69.75324
Advertising Expenditure (x2)  0.013242      0.327592        0.040422  0.969694  -0.8963    0.922782   -0.68513     0.711617

 Table 8: Regression Output- Estimated Coefficients and Significance 

Regression Statistics

Multiple R         0.877814
R Square           0.770558
Adjusted R Square  0.655837
Standard Error     1.83741
Observations       7

Table 9: Regression model fit measures 

The significance of the model is assessed by the ANOVA table below, which gives the results of the F-test for overall significance of the model. The p-value was obtained as 0.053, which is less than the 0.1 (10%) level of significance, and hence the model was found to be significant in explaining the variation in the weekly sales of the company.

Source of Variation  Degree of freedom (df)  Sum of Squares (SS)  Mean Squares (MS)  Observed F-statistic  P-value
Regression           2                       45.35284             22.67642           6.716801              0.052644
Residual             4                       13.5043              3.376075
Total variation      6                       58.85714

Table 10: Significance test of the regression model
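The fit measures in Table 9 and the F-statistic in Table 10 are all functions of the two sums of squares and the degrees of freedom. A short sketch with illustrative variable names shows how they are connected:

```python
import math

n, k = 7, 2                                   # observations and predictors
ss_regression, ss_residual = 45.35284, 13.5043
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total
# Adjusted R-squared penalises the fit for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual

# The "Standard Error" of the regression is the root mean square residual.
standard_error = math.sqrt(ms_residual)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"F = {f_stat:.6f}, standard error = {standard_error:.5f}")
```

Note how the adjusted R-squared (about 0.656) is well below the R-squared (about 0.771) here, reflecting the small sample and the second predictor contributing little.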

From Table 8, the company's advertising expenditure was found to be insignificant at the 0.1 level, as its p-value of 0.970 is greater than 0.1. By comparison, the competitor's price had a p-value of 0.036, which is less than 0.1, and it is hence significantly related to the weekly product sales of the company.

Since advertising expenditure was found to be insignificant, it was dropped from the regression model, and the following regression equation, as apparent from the table below, was obtained: y = 3.5818 + 41.603 x1

                          Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 90.0%  Upper 90.0%
Intercept                 3.581788      3.608215        0.992676  0.366447  -5.69342   12.857     -3.68894     10.85252
Price of Competitor (x1)  41.60305      10.15521        4.096719  0.009385  15.49825   67.70786   21.13981     62.0663

Table 11: Regression Output- Estimated Coefficients and Significance

SUMMARY OUTPUT

Regression Statistics

Multiple R         0.877761
R Square           0.770464
Adjusted R Square  0.724557
Standard Error     1.643765
Observations       7

Table 12: Regression model fit measures

ANOVA TABLE

Source of Variation  Degree of freedom (df)  Sum of Squares (SS)  Mean Squares (MS)  Observed F-statistic  P-value
Regression           1                       45.34733             45.34733           16.78311              0.009385
Residual             5                       13.50981             2.701963
Total variation      6                       58.85714

Table 14: Significance test of the regression model

The new regression model implies that a unit increase in the price of the competitor's product raises the company's weekly sales by 41.603 units, and that if the competitor were giving away its product for free (x1 = 0), the weekly sales of the company would be 3.582 units.
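The slope and intercept of the reduced model can be checked against Table 7 with the usual simple-regression formulas. This is a minimal sketch with illustrative variable names. Because the observed prices only range from 0.25 to 0.44, the last line also evaluates the effect of a 0.10 increase in price, which stays within the observed range, unlike a full unit increase or a price of zero.

```python
# Data from Table 7.
x1 = [0.33, 0.25, 0.44, 0.40, 0.35, 0.39, 0.29]  # competitor's price
y = [20, 14, 22, 21, 16, 19, 15]                 # weekly sales

n = len(y)
mean_x, mean_y = sum(x1) / n, sum(y) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in x1)
sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(x1, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted change in weekly sales for a 0.10 increase in the competitor's price.
effect_of_0_10_increase = slope * 0.10

print(f"y = {intercept:.4f} + {slope:.3f} x1")
print(f"change in sales per 0.10 price increase: {effect_of_0_10_increase:.2f}")
```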
