© Cengage Learning. All Rights Reserved
Chapter 11
Inferences About Population Variances
Inference about a Population Variance
Inferences about Two Population Variances
Inferences About a Population Variance
Chi-Squared Distribution
Interval Estimation
Hypothesis Testing
Chi-Squared Distribution
We can use the chi-squared distribution to construct interval estimates and conduct hypothesis tests about a population variance.
The sampling distribution of (n − 1)s²/σ² has a chi-squared distribution whenever a simple random sample of size n is selected from a normal population.
The chi-squared distribution is the sum of squared standardized normal random variables:
  χ² = (Z₁)² + (Z₂)² + (Z₃)² + . . .
Examples of the Sampling Distribution of (n − 1)s²/σ²
[Figure: chi-squared density curves of (n − 1)s²/σ² with 2, 5, and 10 degrees of freedom]
Chi-Squared Distribution
We will use the notation χ²_α to denote the value for the chi-squared distribution that gives an area of α to the right of the stated value.
For example, there is a 0.95 probability of obtaining a χ² (chi-squared) value such that
  χ²_.975 ≤ χ² ≤ χ²_.025
Interval Estimation of σ²
[Figure: chi-squared distribution of (n − 1)s²/σ²; 95% of the possible χ² values lie between χ²_.975 and χ²_.025, with an area of 0.025 in each tail]
Interval Estimation of σ²
There is a (1 − α) probability of obtaining a χ² value such that
  χ²_(1−α/2) ≤ χ² ≤ χ²_(α/2)
Substituting (n − 1)s²/σ² for χ², we get
  χ²_(1−α/2) ≤ (n − 1)s²/σ² ≤ χ²_(α/2)
Performing algebraic manipulation, we get
  (n − 1)s² / χ²_(α/2) ≤ σ² ≤ (n − 1)s² / χ²_(1−α/2)
Interval Estimation of σ²
Interval Estimate of a Population Variance
  (n − 1)s² / χ²_(α/2) ≤ σ² ≤ (n − 1)s² / χ²_(1−α/2)
where the χ² values are based on a chi-squared distribution with n − 1 degrees of freedom and 1 − α is the confidence coefficient.
Interval Estimation of σ
Interval Estimate of a Population Standard Deviation
Taking the square root of the upper and lower limits of the variance interval provides the confidence interval for the population standard deviation:
  √[(n − 1)s² / χ²_(α/2)] ≤ σ ≤ √[(n − 1)s² / χ²_(1−α/2)]
Interval Estimation of σ²
Example: Buyer's Digest (A)
Buyer's Digest rates thermostats manufactured for home temperature control. In a recent test, 10 thermostats manufactured by ThermoRite were selected and placed in a test room that was maintained at a temperature of 20°C. The temperature readings of the ten thermostats are shown on the next slide.
Interval Estimation of σ²
Example: Buyer's Digest (A)
We will use the 10 readings below to construct a 95% confidence interval estimate of the population variance.

Thermostat     1    2    3    4    5    6    7    8    9   10
Temperature  19.7 19.9 20.1 20.7 20.8 19.4 20.1 20.3 19.9 19.6
Interval Estimation of σ²
Selected Values from the Chi-Squared Distribution Table

Degrees                        Area in Upper Tail
of Freedom    .99    .975    .95    .90     .10     .05     .025    .01
    5        0.554  0.831  1.145  1.610   9.236  11.070  12.832  15.086
    6        0.872  1.237  1.635  2.204  10.645  12.592  14.449  16.812
    7        1.239  1.690  2.167  2.833  12.017  14.067  16.013  18.475
    8        1.647  2.180  2.733  3.490  13.362  15.507  17.535  20.090
    9        2.088  2.700  3.325  4.168  14.684  16.919  19.023  21.666
   10        2.558  3.247  3.940  4.865  15.987  18.307  20.483  23.209

For n − 1 = 10 − 1 = 9 d.f. and α = 0.05, our χ²_.975 value is 2.700.
Interval Estimation of σ²
For n − 1 = 10 − 1 = 9 d.f. and α = 0.05
[Figure: chi-squared distribution of (n − 1)s²/σ²; the value 2.700 leaves an area of 0.975 in the upper tail and 0.025 in the lower tail, so χ²_.975 = 2.700]
Interval Estimation of σ²
Selected Values from the Chi-Squared Distribution Table (shown on the earlier table slide)
For n − 1 = 10 − 1 = 9 d.f. and α = 0.05, our χ²_.025 value is 19.023.
Interval Estimation of σ²
n − 1 = 10 − 1 = 9 degrees of freedom and α = 0.05
[Figure: chi-squared distribution showing the interval 2.700 ≤ (n − 1)s²/σ² ≤ 19.023, with an area of 0.025 in each tail]
Interval Estimation of σ²
The sample variance s² provides a point estimate of σ²:
  s² = Σ(xi − x̄)² / (n − 1) = 1.94 / 9 = 0.216
A 95% confidence interval for the population variance is given by:
  (10 − 1)(0.216) / 19.02 ≤ σ² ≤ (10 − 1)(0.216) / 2.70
  0.102 ≤ σ² ≤ 0.72
Hypothesis Testing About a Population Variance
Left-Tailed Test
• Hypotheses
  H0: σ² ≥ σ0²
  Ha: σ² < σ0²
• Test Statistic
  χ² = (n − 1)s² / σ0²
where σ0² is the hypothesized value for the population variance.
Hypothesis Testing About a Population Variance
Left-Tailed Test (continued)
• Rejection Rule
  Critical value approach: Reject H0 if χ² ≤ χ²_(1−α)
  p-Value approach: Reject H0 if p-value < α
where χ²_(1−α) is based on a chi-squared distribution with n − 1 d.f.
Hypothesis Testing About a Population Variance
Right-Tailed Test
• Hypotheses
  H0: σ² ≤ σ0²
  Ha: σ² > σ0²
• Test Statistic
  χ² = (n − 1)s² / σ0²
where σ0² is the hypothesized value for the population variance.
Hypothesis Testing About a Population Variance
Right-Tailed Test (continued)
• Rejection Rule
  Critical value approach: Reject H0 if χ² ≥ χ²_α
  p-Value approach: Reject H0 if p-value < α
where χ²_α is based on a chi-squared distribution with n − 1 d.f.
Hypothesis Testing About a Population Variance
Two-Tailed Test
• Hypotheses
  H0: σ² = σ0²
  Ha: σ² ≠ σ0²
• Test Statistic
  χ² = (n − 1)s² / σ0²
where σ0² is the hypothesized value for the population variance.
Hypothesis Testing About a Population Variance
Two-Tailed Test (continued)
• Rejection Rule
  Critical value approach: Reject H0 if χ² ≤ χ²_(1−α/2) or χ² ≥ χ²_(α/2)
  p-Value approach: Reject H0 if p-value < α
where χ²_(1−α/2) and χ²_(α/2) are based on a chi-squared distribution with n − 1 d.f.
Hypothesis Testing About a Population Variance
Example: Buyer's Digest (B)
Recall that Buyer's Digest is rating ThermoRite thermostats. Buyer's Digest gives an "acceptable" rating to a thermostat with a temperature variance of 0.15 or less.
We will conduct a hypothesis test (with α = 0.10) to determine whether the ThermoRite thermostat's temperature variance is "acceptable".
Hypothesis Testing About a Population Variance
Example: Buyer's Digest (B)
Using the 10 readings below, we will conduct a hypothesis test (with α = 0.10) to determine whether the ThermoRite thermostat's temperature variance is "acceptable".

Thermostat     1    2    3    4    5    6    7    8    9   10
Temperature  19.7 19.9 20.1 20.7 20.8 19.4 20.1 20.3 19.9 19.6
Hypothesis Testing About a Population Variance
• Hypotheses
  H0: σ² ≤ 0.15
  Ha: σ² > 0.15
• Rejection Rule
  Reject H0 if χ² ≥ 14.684
Hypothesis Testing About a Population Variance
Selected Values from the Chi-Squared Distribution Table (shown earlier)
For n − 1 = 10 − 1 = 9 d.f. and α = 0.10, our χ²_.10 value is 14.684.
Hypothesis Testing About a Population Variance
Rejection Region
[Figure: chi-squared distribution of χ² = (n − 1)s²/σ0² = 9s²/0.15; reject H0 in the upper tail beyond 14.684, where the area is 0.10]
Hypothesis Testing About a Population Variance
• Test Statistic
  χ² = (n − 1)s² / σ0² = 9(0.216) / 0.15 = 12.96
• Conclusion
  Because χ² = 12.96 is less than 14.684, we cannot reject H0. The sample variance s² = 0.216 is insufficient evidence to conclude that the temperature variance for ThermoRite thermostats is not acceptable.
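The right-tailed test above can be sketched in a few lines. This is illustrative, not part of the slides; the critical value 14.684 is χ²_.10 with 9 d.f. from the table shown earlier.

```python
# Sketch of the right-tailed chi-squared test for the ThermoRite example:
# H0: sigma^2 <= 0.15 vs Ha: sigma^2 > 0.15, with alpha = 0.10 and n = 10.
n = 10
s2 = 0.216
sigma2_0 = 0.15            # hypothesized variance
critical = 14.684          # chi-squared value with area 0.10 to its right, 9 d.f.

chi2 = (n - 1) * s2 / sigma2_0   # test statistic
print(round(chi2, 2))            # 12.96
print(chi2 >= critical)          # False -> cannot reject H0
```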
Hypothesis Testing About a Population Variance
Using the p-Value
• The rejection region for the ThermoRite thermostat example is in the upper tail; so the appropriate p-value is less than 0.90 (χ² = 4.168) and greater than 0.10 (χ² = 14.684).
• Because the p-value > α = 0.10, we cannot reject the null hypothesis.
• The sample variance of s² = 0.216 is insufficient evidence to conclude that the temperature variance is not acceptable (> 0.15).
A precise p-value can be found using Minitab, SPSS, or Excel.
Hypothesis Testing About the Variances of Two Populations
One-Tailed Test
• Hypotheses
  H0: σ1² ≤ σ2²
  Ha: σ1² > σ2²
  (Denote the population providing the larger sample variance as population 1.)
• Test Statistic
  F = s1² / s2²
Hypothesis Testing About the Variances of Two Populations
One-Tailed Test (continued)
• Rejection Rule
  Critical value approach: Reject H0 if F ≥ Fα
  p-Value approach: Reject H0 if p-value < α
where the value of Fα is based on an F distribution with n1 − 1 (numerator) and n2 − 1 (denominator) d.f.
Hypothesis Testing About the Variances of Two Populations
Two-Tailed Test
• Hypotheses
  H0: σ1² = σ2²
  Ha: σ1² ≠ σ2²
  (Denote the population providing the larger sample variance as population 1.)
• Test Statistic
  F = s1² / s2²
Hypothesis Testing About the Variances of Two Populations
Two-Tailed Test (continued)
• Rejection Rule
  Critical value approach: Reject H0 if F ≥ Fα/2
  p-Value approach: Reject H0 if p-value < α
where the value of Fα/2 is based on an F distribution with n1 − 1 (numerator) and n2 − 1 (denominator) d.f.
Hypothesis Testing About the Variances of Two Populations
Example: Buyer's Digest (C)
Buyer's Digest has conducted the same test, as was described earlier, on another 10 thermostats, this time manufactured by TempKing. The temperature readings of the ten thermostats are listed on the next slide.
We will conduct a hypothesis test with α = 0.10 to see if the variances are equal for ThermoRite's thermostats and TempKing's thermostats.
Hypothesis Testing About the Variances of Two Populations
Example: Buyer's Digest (C)

ThermoRite Sample
Thermostat     1    2    3    4    5    6    7    8    9   10
Temperature  19.7 19.9 20.1 20.7 20.8 19.4 20.1 20.3 19.9 19.6

TempKing Sample
Thermostat     1    2    3    4    5    6    7    8    9   10
Temperature  19.8 19.1 20.1 21.2 20.8 20.9 20.1 19.2 19.6 19.7
Hypothesis Testing About the Variances of Two Populations
• Hypotheses
  H0: σ1² = σ2² (TempKing and ThermoRite thermostats have the same temperature variance)
  Ha: σ1² ≠ σ2² (their variances are not equal)
• Rejection Rule
  The F distribution table (on the next slide) shows that with α = 0.10, 9 d.f. (numerator), and 9 d.f. (denominator), F.05 = 3.18.
  Reject H0 if F ≥ 3.18
Hypothesis Testing About the Variances of Two Populations
Selected Values from the F Distribution Table

Denominator   Area in       Numerator Degrees of Freedom
Degrees of    Upper
Freedom       Tail        7      8      9      10     15
    8         .10       2.62   2.59   2.56   2.54   2.46
              .05       3.50   3.44   3.39   3.35   3.22
              .025      4.53   4.43   4.36   4.30   4.10
              .01       6.18   6.03   5.91   5.81   5.52
    9         .10       2.51   2.47   2.44   2.42   2.34
              .05       3.29   3.23   3.18   3.14   3.01
              .025      4.20   4.10   4.03   3.96   3.77
              .01       5.61   5.47   5.35   5.26   4.96
Hypothesis Testing About the Variances of Two Populations
• Test Statistic
  TempKing's sample variance is 0.546. ThermoRite's sample variance is 0.216.
  F = s1² / s2² = 0.546 / 0.216 = 2.53
• Conclusion
  We cannot reject H0: F = 2.53 < F.05 = 3.18. There is insufficient evidence to conclude that the population variances differ for the two thermostat brands.
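The F statistic above is a one-line computation. The following sketch (not part of the slides) uses the two sample variances and the table value F.05 = 3.18:

```python
# Sketch of the two-tailed F test comparing TempKing and ThermoRite
# temperature variances (alpha = 0.10; 9 and 9 degrees of freedom).
s2_tempking = 0.546      # larger sample variance -> population 1
s2_thermorite = 0.216
f_crit = 3.18            # F value with area 0.05 to its right (9, 9 d.f.)

F = s2_tempking / s2_thermorite
print(round(F, 2))       # 2.53
print(F >= f_crit)       # False -> cannot reject H0
```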
Hypothesis Testing About the Variances of Two Populations
Determining and Using the p-Value

Area in Upper Tail           .10    .05    .025   .01
F Value (df1 = 9, df2 = 9)   2.44   3.18   4.03   5.35

• Because F = 2.53 is between 2.44 and 3.18, the area in the upper tail of the distribution is between 0.10 and 0.05.
• But this is a two-tailed test; after doubling the upper-tail area, the p-value is between 0.20 and 0.10. (A precise p-value can be found using SPSS, Minitab, or Excel.)
• Because α = 0.10, we have p-value > α and therefore we cannot reject the null hypothesis.
End of Chapter 11
Chapter 14
Simple Linear Regression
Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
  Y = β0 + β1x + ε
where:
  β0 and β1 are called parameters of the model,
  ε is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
  E(Y) = β0 + β1x
• Graph of the regression equation is a straight line.
• β0 is the y-intercept of the regression line.
• β1 is the slope of the regression line.
• E(Y) is the expected value of Y for a given x value.
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: E(Y) vs. x; regression line with intercept β0 and positive slope β1]

Negative Linear Relationship
[Figure: E(Y) vs. x; regression line with intercept β0 and negative slope β1]

No Relationship
[Figure: E(Y) vs. x; horizontal regression line with intercept β0 and slope β1 = 0]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
  ŷ = b0 + b1x
• The graph is called the estimated regression line.
• b0 is the y-intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of Y for a given x value.
Estimation Process
Regression Model: Y = β0 + β1x + ε
Regression Equation: E(Y) = β0 + β1x
Unknown Parameters: β0, β1

Sample Data: (x1, y1), (x2, y2), . . . , (xn, yn)
Estimated Regression Equation: ŷ = b0 + b1x
Sample Statistics: b0, b1

b0 and b1 provide estimates of β0 and β1.
Least Squares Method
Least Squares Criterion
  min Σ(yi − ŷi)²
where:
  yi = observed value of the dependent variable for the ith observation
  ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope for the Estimated Regression Equation
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
Least Squares Method
y-Intercept for the Estimated Regression Equation
  b0 = ȳ − b1x̄
where:
  xi = value of independent variable for the ith observation
  yi = value of dependent variable for the ith observation
  x̄ = mean value for the independent variable
  ȳ = mean value for the dependent variable
  n = total number of observations
Simple Linear Regression
Example: Kako Auto Sales
Kako Auto periodically has a special week-long sale. As part of the advertising campaign Kako runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Simple Linear Regression
Example: Kako Auto Sales

Number of TV Ads   Number of Cars Sold
       1                  14
       3                  24
       2                  18
       1                  17
       3                  27
Estimated Regression Equation
Slope for the Estimated Regression Equation
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20 / 4 = 5
y-Intercept for the Estimated Regression Equation
  b0 = ȳ − b1x̄ = 20 − 5(2) = 10
Estimated Regression Equation
  ŷ = 10 + 5x
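The least squares computations above can be sketched directly from the formulas (this code is illustrative, not part of the slides):

```python
# Sketch of the least squares computations for the Kako Auto data
# (5 sales: TV ads x, cars sold y), following the slide formulas.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar = sum(x) / n                        # 2.0
y_bar = sum(y) / n                        # 20.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # 20.0
den = sum((xi - x_bar) ** 2 for xi in x)                         # 4.0
b1 = num / den                            # 5.0
b0 = y_bar - b1 * x_bar                   # 10.0

print(b0, b1)   # 10.0 5.0  ->  y-hat = 10 + 5x
```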
Scatter Diagram and Trend Line
[Figure: scatter plot of Cars Sold vs. TV Ads with trend line y = 5x + 10]
Coefficient of Determination
Relationship Among SST, SSR, SSE
  SST = SSR + SSE
  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
where:
  SST = total sum of squares
  SSR = sum of squares due to regression
  SSE = sum of squares due to error
Coefficient of Determination
The coefficient of determination is:
  r² = SSR/SST
where:
  SSR = sum of squares due to regression
  SST = total sum of squares
Coefficient of Determination
r2 = SSR/SST = 100/114 = 0.8772
The regression relationship is very strong; 88%
of the variability in the number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
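The sums of squares behind r² = 100/114 can be sketched as follows (illustrative, not part of the slides):

```python
# Sketch of the SST/SSR/SSE decomposition and r^2 for the Kako Auto fit
# y-hat = 10 + 5x, matching the slide's values SSR = 100 and SST = 114.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)                                # 20.0
y_hat = [10 + 5 * xi for xi in x]                      # predicted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # 114.0
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # 100.0
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # 14.0

r2 = ssr / sst
print(round(r2, 4))   # 0.8772
```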
Sample Correlation Coefficient
  rxy = (sign of b1)√(Coefficient of Determination) = (sign of b1)√r²
where:
  b1 = the slope of the estimated regression equation ŷ = b0 + b1x

The sign of b1 in the equation ŷ = 10 + 5x is "+".
  rxy = +√0.8772 = +0.9366
Assumptions About the Error Term ε
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and F test require an estimate of σ², the variance of ε in the regression model.
Testing for Significance
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
  s² = MSE = SSE/(n − 2)
where:
  SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²
Testing for Significance
An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
  s = √MSE = √(SSE/(n − 2))
Testing for Significance: t Test
• Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
• Test Statistic
  t = b1 / s_b1
Testing for Significance: t Test
• Rejection Rule
  Reject H0 if p-value < α or t ≤ −tα/2 or t ≥ tα/2
where tα/2 is based on a t distribution with n − 2 degrees of freedom
Testing for Significance: t Test
1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0
2. Specify the level of significance.  α = 0.05
3. Select the test statistic.  t = b1 / s_b1
4. State the rejection rule.  Reject H0 if p-value < 0.05 or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test
5. Compute the value of the test statistic.
  t = b1 / s_b1 = 5 / 1.08 = 4.63
6. Determine whether to reject H0.
  t = 4.541 provides an area of 0.01 in the upper tail. Hence, the p-value is less than 0.02. (Also, t = 4.63 > 3.182.) We can reject H0.
Confidence Interval for β1
We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
Confidence Interval for β1
The form of a confidence interval for β1 is:
  b1 ± tα/2 · s_b1
where b1 is the point estimator and tα/2 · s_b1 is the margin of error; tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.
Confidence Interval for β1
• Rejection Rule
  Reject H0 if 0 is not included in the confidence interval for β1.
• 95% Confidence Interval for β1
  b1 ± tα/2 · s_b1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44
• Conclusion
  0 is not included in the confidence interval. Reject H0.
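The t statistic and the confidence interval for β1 can be sketched together. Note the formula s_b1 = s / √Σ(xi − x̄)² is the standard one; the slides only show the resulting value 1.08.

```python
import math

# Sketch of the t test and 95% confidence interval for beta1 in the Kako
# Auto example; SSE = 14, n = 5, and t_{.025} = 3.182 (3 d.f.) come from
# the slides.
n = 5
b1 = 5.0
sse = 14.0
sxx = 4.0                      # sum((xi - x_bar)^2)
t_crit = 3.182                 # t value, 3 degrees of freedom

s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated std. dev. of b1, about 1.08

t = b1 / s_b1
print(round(t, 2))                               # 4.63
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)    # roughly 1.56 to 8.44
```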
Testing for Significance: F Test
• Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
• Test Statistic
  F = MSR/MSE
Testing for Significance: F Test
• Rejection Rule
  Reject H0 if p-value < α or F ≥ Fα
where Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator
Testing for Significance: F Test
1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0
2. Specify the level of significance.  α = 0.05
3. Select the test statistic.  F = MSR/MSE
4. State the rejection rule.  Reject H0 if p-value < 0.05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)
Testing for Significance: F Test
5. Compute the value of the test statistic.
  F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject H0.
  F = 17.44 provides an area of 0.025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(0.025) = 0.05. Hence, we reject H0.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
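The F statistic uses the sums of squares computed earlier. A quick sketch (not part of the slides):

```python
# Sketch of the F test computation for the Kako Auto example, using the
# sums of squares from the earlier slides (SSR = 100, SSE = 14, n = 5).
ssr, sse, n = 100.0, 14.0, 5

msr = ssr / 1          # one independent variable -> 1 numerator d.f.
mse = sse / (n - 2)    # 14/3, about 4.667
F = msr / mse

print(round(F, 2))     # 21.43
print(F > 10.13)       # True -> reject H0
```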
Some Cautions about the Interpretation of Significance Tests
Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
Using the Estimated Regression Equation for Estimation and Prediction
• Confidence Interval Estimate of E(yp)
  ŷp ± tα/2 · s_ŷp
• Prediction Interval Estimate of yp
  ŷp ± tα/2 · s_pred
where the confidence coefficient is 1 − α and tα/2 is based on a t distribution with n − 2 degrees of freedom
Point Estimation
If 3 TV ads are run prior to a sale, we expect
the mean number of cars sold to be:
ŷ = 10 + 5(3) = 25 cars
Excel's Confidence Interval Output
Confidence Interval for E(yp)

  xp                  3
  x̄                  2.0
  xp − x̄             1.0
  (xp − x̄)²          1.0
  Σ(xi − x̄)²         4.0
  Variance of ŷ      2.1000
  Std. Dev. of ŷ     1.4491
  t Value            3.1824
  Margin of Error    4.6118
  Point Estimate    25.0
  Lower Limit       20.39
  Upper Limit       29.61
The 95% confidence interval estimate of the mean
number of cars sold when 3 TV ads are run is:
Confidence Interval for E(yp)
25 ± 4.61 = 20.39 to 29.61 cars
Excel's Prediction Interval Output
Prediction Interval for yp

  Variance of yp     6.76667
  Std. Dev. of yp    2.60128
  Margin of Error    8.27844
  Lower Limit       16.72
  Upper Limit       33.28
The 95% prediction interval estimate of the
number of cars sold in one particular week when 3
TV ads are run is:
Prediction Interval for yp
25 ± 8.28 = 16.72 to 33.28 cars
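Both intervals can be reproduced from the standard variance formulas. This is a sketch under the slides' values (MSE = 14/3, t = 3.1824 with 3 d.f.); the formulas for the two variances are the textbook-standard ones rather than quoted from the slides.

```python
import math

# Sketch reproducing the confidence and prediction intervals at xp = 3
# TV ads for the Kako Auto example.
xp, x_bar, sxx, n = 3, 2.0, 4.0, 5
mse = 14.0 / 3
t_crit = 3.1824
y_hat = 10 + 5 * xp                                     # point estimate, 25

var_conf = mse * (1 / n + (xp - x_bar) ** 2 / sxx)      # 2.1000
var_pred = mse * (1 + 1 / n + (xp - x_bar) ** 2 / sxx)  # 6.7667

me_conf = t_crit * math.sqrt(var_conf)                  # about 4.61
me_pred = t_crit * math.sqrt(var_pred)                  # about 8.28

print(y_hat - me_conf, y_hat + me_conf)   # about 20.39 to 29.61
print(y_hat - me_pred, y_hat + me_pred)   # about 16.72 to 33.28
```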
Residual Analysis
Residual for Observation i
  yi − ŷi
If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about ε.
Much of the residual analysis is based on an examination of graphical plots.
Residual Plot Against x
If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
Residual Plot Against x
[Figure: residuals (y − ŷ) vs. x forming a horizontal band around 0 — Good Pattern]
[Figure: residual spread increasing with x — Nonconstant Variance]
[Figure: curved residual pattern — Model Form Not Adequate]
Residuals

Observation   Predicted Cars Sold   Residuals
     1                15               −1
     2                25               −1
     3                20               −2
     4                15                2
     5                25                2
Residual Plot Against x
[Figure: TV Ads residual plot — residuals (−3 to 3) plotted against number of TV Ads (0 to 4), forming a horizontal band]
Residual Analysis: Autocorrelation
Often, the data used for regression studies in
business and economics are collected over time.
It is not uncommon for the value of Y at one time
period to be related to the value of Y at previous time
periods.
In this case, we say autocorrelation (or serial
correlation) is present in the data.
Residual Analysis: Autocorrelation
With positive autocorrelation, we expect a positive
residual in one period to be followed by a positive
residual in the next period.
With positive autocorrelation, we expect a negative
residual in one period to be followed by a negative
residual in the next period.
With negative autocorrelation, we expect a positive
residual in one period to be followed by a negative
residual in the next period, then a positive residual,
and so on.
Residual Analysis: Autocorrelation
When autocorrelation is present, one of the
regression assumptions is violated: the error terms
are not independent.
When autocorrelation is present, serious errors can
be made in performing tests of significance based
upon the assumed regression model.
The Durbin-Watson statistic can be used to detect
first-order autocorrelation.
Residual Analysis: Autocorrelation
Durbin-Watson Test Statistic
  d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²
The ith residual is denoted e_i = y_i − ŷ_i.
Residual Analysis: Autocorrelation
Durbin-Watson Test Statistic
• The statistic ranges in value from zero to four.
• A value of two indicates no autocorrelation.
• If successive values of the residuals are close together (positive autocorrelation is present), the statistic will be small.
• If successive values are far apart (negative autocorrelation is present), the statistic will be large.
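The statistic is simple to compute. The sketch below applies it to the Kako Auto residuals listed earlier; with only n = 5 observations this is purely illustrative (the Durbin-Watson tables do not even cover samples this small).

```python
# Sketch of the Durbin-Watson statistic d applied to the Kako Auto
# residuals (-1, -1, -2, 2, 2). Illustrative only: n = 5 is far too
# small for a meaningful autocorrelation test.
e = [-1, -1, -2, 2, 2]

num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))  # 17
den = sum(et ** 2 for et in e)                               # 14
d = num / den

print(round(d, 3))    # 1.214
```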
Residual Analysis: Autocorrelation
Suppose the values of ε (residuals) are not independent but are related in the following manner:
  e_t = ρ e_{t−1} + z_t
where ρ is a parameter with an absolute value less than one and z_t is a normally and independently distributed random variable with a mean of zero and variance of σ².
We see that if ρ = 0, the error terms are not related.
The Durbin-Watson test uses the residuals to determine whether ρ = 0.
Residual Analysis: Autocorrelation
The null hypothesis always is:
  H0: ρ = 0 (there is no autocorrelation)
The alternative hypothesis is:
  Ha: ρ > 0 to test for positive autocorrelation
  Ha: ρ < 0 to test for negative autocorrelation
  Ha: ρ ≠ 0 to test for positive or negative autocorrelation
Residual Analysis: Autocorrelation
A Sample of Critical Values for the Durbin-Watson Test for Autocorrelation
Significance Points of dL and dU: α = .05

              Number of Independent Variables
          1           2           3           4           5
  n     dL    dU    dL    dU    dL    dU    dL    dU    dL    dU
 15    1.08  1.36  0.95  1.54  0.82  1.75  0.69  1.97  0.56  2.21
 16    1.10  1.37  0.98  1.54  0.86  1.73  0.74  1.93  0.62  2.15
 17    1.13  1.38  1.02  1.54  0.90  1.71  0.78  1.90  0.67  2.10
 18    1.16  1.39  1.05  1.53  0.93  1.69  0.82  1.87  0.71  2.06
Residual Analysis: Autocorrelation
[Figure: Durbin-Watson decision regions on the 0-to-4 scale.
Test for positive autocorrelation: positive autocorrelation for d < dL; inconclusive for dL ≤ d ≤ dU; no evidence of positive autocorrelation for d > dU.
Test for negative autocorrelation: no evidence of negative autocorrelation for d < 4 − dU; inconclusive for 4 − dU ≤ d ≤ 4 − dL; negative autocorrelation for d > 4 − dL.
Two-sided test: combines both, with no evidence of autocorrelation for dU < d < 4 − dU.]
End of Chapter 14
Chapter 13, Part A
Analysis of Variance and Experimental Design
■ Introduction to Analysis of Variance
■ Analysis of Variance: Testing for the Equality of
k Population Means
■ Multiple Comparison Procedures
Introduction to Analysis of Variance
Analysis of Variance (ANOVA) can be used to test
for the equality of three or more population means.
Data obtained from observational or experimental
studies can be used for the analysis.
We want to use the sample results to test the
following hypotheses:
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
Introduction to Analysis of Variance
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
If H0 is rejected, we cannot conclude that all
population means are different.
Rejecting H0 means that at least two population
means have different values.
Introduction to Analysis of Variance
■ Sampling Distribution of x̄ Given H0 is True
[Figure: a single sampling distribution of x̄ centered at µ, with σx̄² = σ²/n; the sample means x̄1, x̄2, x̄3 are close together because there is only one sampling distribution when H0 is true.]
Introduction to Analysis of Variance
■ Sampling Distribution of x̄ Given H0 is False
[Figure: three different sampling distributions centered at µ1, µ2, µ3; the sample means come from different sampling distributions and are not as close together when H0 is false.]
Assumptions for Analysis of Variance
For each population, the response variable is normally distributed.
The variance of the response variable, denoted σ², is the same for all of the populations.
The observations must be independent.
Analysis of Variance:
Testing for the Equality of k Population Means
■ Between-Treatments Estimate of Population Variance
■ Within-Treatments Estimate of Population Variance
■ Comparing the Variance Estimates: The F Test
■ ANOVA Table
Between-Treatments Estimate
of Population Variance
■ The between-treatments estimate of σ² is called the
mean square due to treatments and is denoted MSTR.

MSTR = SSTR/(k – 1)

where SSTR = Σj nj(x̄j – x̄)², with the sum over j = 1, …, k.

The numerator is the sum of squares due to treatments
and is denoted SSTR; the denominator represents the
degrees of freedom associated with SSTR.
10
Slide © Cengage Learning. All Rights Reserved
Within-Treatments Estimate
of Population Variance
■ The estimate of σ² based on the variation of the
sample observations within each sample is called the
mean square due to error and is denoted by MSE.

MSE = SSE/(nT – k)

where SSE = Σj (nj – 1)sj², with the sum over j = 1, …, k.

The numerator is the sum of squares due to error
and is denoted SSE; the denominator represents the
degrees of freedom associated with SSE.
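The two variance estimates above can be sketched in a few lines of Python. The sample data here are made-up numbers purely for illustration, not data from the text:

```python
from statistics import mean, variance

# Hypothetical samples from k = 3 treatments (illustrative numbers only).
samples = [[3, 5, 7], [6, 8, 10], [2, 4, 6]]

k = len(samples)
n_T = sum(len(s) for s in samples)              # total number of observations
grand_mean = mean(x for s in samples for x in s)

# SSTR = sum of n_j * (xbar_j - xbar)^2 ;  MSTR = SSTR / (k - 1)
SSTR = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
MSTR = SSTR / (k - 1)

# SSE = sum of (n_j - 1) * s_j^2 ;  MSE = SSE / (n_T - k)
SSE = sum((len(s) - 1) * variance(s) for s in samples)
MSE = SSE / (n_T - k)
```

With these numbers, MSTR = 13.0 and MSE = 4.0, so MSTR/MSE = 3.25.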
11
Slide © Cengage Learning. All Rights Reserved
Comparing the Variance Estimates: The F Test
■ If the null hypothesis is true and the ANOVA
assumptions are valid, the sampling distribution of
MSTR/MSE is an F distribution with MSTR d.f.
equal to k – 1 and MSE d.f. equal to nT – k.
■ If the means of the k populations are not equal, the
value of MSTR/MSE will be inflated because MSTR
overestimates σ².
■ Hence, we will reject H0 if the resulting value of
MSTR/MSE appears to be too large to have been
selected at random from the appropriate F distribution.
12
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means
■ Hypotheses
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
■ Test Statistic
F = MSTR/MSE
13
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means
■ Rejection Rule
p-value Approach: Reject H0 if p-value < α
Critical Value Approach: Reject H0 if F > Fα
where the value of Fα is based on an F distribution
with k – 1 numerator d.f. and nT – k denominator d.f.
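As an illustration of the rejection rule, SciPy's F distribution can supply both the critical value and the p-value. The design sizes and F value below are hypothetical, chosen only to show the mechanics (this assumes SciPy is available):

```python
from scipy.stats import f

k, n_T = 3, 15            # hypothetical design: 3 treatments, 15 observations
alpha = 0.05
df1, df2 = k - 1, n_T - k  # numerator and denominator degrees of freedom

F_crit = f.ppf(1 - alpha, df1, df2)   # critical value F_alpha
F_stat = 9.55                         # a hypothetical MSTR/MSE value
p_value = f.sf(F_stat, df1, df2)      # area to the right of F_stat
reject = F_stat > F_crit              # equivalently: p_value < alpha
```

Both forms of the rule agree by construction: F exceeds Fα exactly when the upper-tail area beyond F is smaller than α.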
14
Slide © Cengage Learning. All Rights Reserved
Sampling Distribution of MSTR/MSE
■ Rejection Region
[Figure: sampling distribution of MSTR/MSE; the area α to the right of the critical value Fα is the Reject H0 region, and values of MSTR/MSE below Fα fall in the Do Not Reject H0 region]
15
Slide © Cengage Learning. All Rights Reserved
ANOVA Table
SST is partitioned
into SSTR and SSE.
SST’s degrees of freedom
(d.f.) are partitioned into
SSTR’s d.f. and SSE’s d.f.

Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square    F
Treatment     SSTR       k – 1         MSTR      MSTR/MSE
Error         SSE        nT – k        MSE
Total         SST        nT – 1
16
Slide © Cengage Learning. All Rights Reserved
ANOVA Table
SST divided by its degrees of freedom nT – 1 is the
overall sample variance that would be obtained if we
treated the entire set of observations as one data set.
With the entire data set as one sample, the formula
for computing the total sum of squares, SST, is:

SST = Σj Σi (xij – x̄)² = SSTR + SSE

where the outer sum runs over treatments j = 1, …, k
and the inner sum over observations i = 1, …, nj.
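The partition SST = SSTR + SSE can be checked numerically. The samples below are illustrative numbers, not data from the text:

```python
from statistics import mean, variance

samples = [[3, 5, 7], [6, 8, 10], [2, 4, 6]]   # hypothetical data
grand_mean = mean(x for s in samples for x in s)

# Total sum of squares: squared deviations of every observation
# from the grand mean, pooling all samples into one data set.
SST = sum((x - grand_mean) ** 2 for s in samples for x in s)

# Between-treatments and within-treatments components.
SSTR = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
SSE = sum((len(s) - 1) * variance(s) for s in samples)

# The identity SST = SSTR + SSE holds (up to floating-point error).
```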
17
Slide © Cengage Learning. All Rights Reserved
ANOVA Table
ANOVA can be viewed as the process of partitioning
the total sum of squares and the degrees of freedom
into their corresponding sources: treatments and error.
Dividing the sum of squares by the appropriate
degrees of freedom provides the variance estimates
and the F value used to test the hypothesis of equal
population means.
18
Slide © Cengage Learning. All Rights Reserved
■ Example: Kako Manufacturing
Test for the Equality of k Population Means
Susi Kako would like to know if
there is any significant difference in
the mean number of hours worked per
week for the department managers
at her three manufacturing plants
(in Budapest, Balaton, and Bratislava).
19
Slide © Cengage Learning. All Rights Reserved
■ Example: Kako Manufacturing
Test for the Equality of k Population Means
A simple random sample of five
managers from each of the three plants
was taken and the number of hours
worked by each manager for the
previous week is shown on the next
slide.
Conduct an F test using α = 0.05.
20
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means

Observation      Plant 1      Plant 2      Plant 3
                 Budapest     Balaton      Bratislava
1                48           73           51
2                54           63           63
3                57           66           61
4                54           64           54
5                62           74           56
Sample Mean      55           68           57
Sample Variance  26.0         26.5         24.5
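For readers with SciPy available, scipy.stats.f_oneway reproduces from the plant data the F statistic that the following slides compute by hand:

```python
from scipy.stats import f_oneway

budapest   = [48, 54, 57, 54, 62]   # Plant 1
balaton    = [73, 63, 66, 64, 74]   # Plant 2
bratislava = [51, 63, 61, 54, 56]   # Plant 3

F_stat, p_value = f_oneway(budapest, balaton, bratislava)
# F_stat ≈ 9.55 and p_value < 0.01, so H0 is rejected at alpha = 0.05.
```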
21
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means
■ p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: µ1 = µ2 = µ3
Ha: Not all the means are equal
where:
µ1 = mean number of hours worked per
week by the managers at Plant 1
µ2 = mean number of hours worked per
week by the managers at Plant 2
µ3 = mean number of hours worked per
week by the managers at Plant 3
22
Slide © Cengage Learning. All Rights Reserved
■ p-Value and Critical Value Approaches
2. Specify the level of significance. α = 0.05
3. Compute the value of the test statistic.
Mean Square Due to Treatments
x̄ = (55 + 68 + 57)/3 = 60 (sample sizes are all equal)
SSTR = 5(55 – 60)² + 5(68 – 60)² + 5(57 – 60)² = 490
MSTR = 490/(3 – 1) = 245
23
Slide © Cengage Learning. All Rights Reserved
■ p-Value and Critical Value Approaches
3. Compute the value of the test statistic. (continued)
Mean Square Due to Error
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 – 3) = 25.667
F = MSTR/MSE = 245/25.667 = 9.55
24
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means
■ ANOVA Table

Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square    F
Treatment     490        2             245       9.55
Error         308        12            25.667
Total         798        14
25
Slide © Cengage Learning. All Rights Reserved
Test for the Equality of k Population Means
■ p-Value Approach
4. Compute the p-value.
With 2 numerator d.f. and 12 denominator d.f.,
F.01 = 6.93. Because F = 9.55 > 6.93, the
p-value is less than 0.01.
5. Determine whether to reject H0.
The p-value < 0.05, so we reject H0.
We have sufficient evidence to conclude that the
mean number of hours worked per week by
department managers is not the same at all three plants.
26
Slide © Cengage Learning. All Rights Reserved
■ Critical Value Approach
4. Determine the critical value and rejection rule.
Based on an F distribution with 2 numerator
d.f. and 12 denominator d.f., F.05 = 3.89.
Reject H0 if F > 3.89.
5. Determine whether to reject H0.
Because F = 9.55 > 3.89, we reject H0.
We have sufficient evidence to conclude that the
mean number of hours worked per week by
department managers is not the same at all three plants.
27
Slide © Cengage Learning. All Rights Reserved
Multiple Comparison Procedures
■ Suppose that analysis of variance has provided
statistical evidence to reject the null hypothesis of
equal population means.
■ Fisher’s least significant difference (LSD) procedure
can be used to determine where the differences
occur.
28
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
■ Hypotheses
H0: µi = µj
Ha: µi ≠ µj
■ Test Statistic

t = (x̄i – x̄j) / √(MSE(1/ni + 1/nj))
29
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
■ Rejection Rule
p-value Approach: Reject H0 if p-value < α
Critical Value Approach: Reject H0 if t < –tα/2 or t > tα/2
where the value of tα/2 is based on a
t distribution with nT – k degrees of freedom.
30
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
■ Hypotheses
H0: µi = µj
Ha: µi ≠ µj
■ Test Statistic
x̄i – x̄j
■ Rejection Rule
Reject H0 if |x̄i – x̄j| > LSD
where

LSD = tα/2 √(MSE(1/ni + 1/nj))
31
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
■ Example: Kako Manufacturing
Recall that Susi Kako wants to know
if there is any significant difference in
the mean number of hours worked per
week for the department managers
at her three manufacturing plants.
Analysis of variance has provided
statistical evidence to reject the null
hypothesis of equal population means.
Fisher’s least significant difference (LSD) procedure
can be used to determine where the differences occur.
32
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
For α = 0.05 and nT – k = 15 – 3 = 12
degrees of freedom, t.025 = 2.179.

LSD = tα/2 √(MSE(1/ni + 1/nj))
LSD = 2.179 √(25.667(1/5 + 1/5)) = 6.98

(The MSE value was computed earlier.)
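The LSD calculation above can be verified with SciPy's t distribution (assuming SciPy is available):

```python
from math import sqrt
from scipy.stats import t

MSE = 25.667          # from the ANOVA table computed earlier
n_i = n_j = 5         # five managers sampled at each plant
df = 15 - 3           # n_T - k degrees of freedom

t_crit = t.ppf(1 - 0.05 / 2, df)                # t.025 with 12 d.f.
LSD = t_crit * sqrt(MSE * (1 / n_i + 1 / n_j))  # least significant difference
```

This reproduces t.025 = 2.179 and LSD = 6.98.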
33
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
■ LSD for Plants 1 and 2
• Hypotheses (A)
H0: µ1 = µ2
Ha: µ1 ≠ µ2
• Rejection Rule
Reject H0 if |x̄1 – x̄2| > 6.98
• Test Statistic
|x̄1 – x̄2| = |55 – 68| = 13
• Conclusion
The mean number of hours worked at Plant 1 is
not equal to the mean number worked at Plant 2.
34
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
■ LSD for Plants 1 and 3
• Hypotheses (B)
H0: µ1 = µ3
Ha: µ1 ≠ µ3
• Rejection Rule
Reject H0 if |x̄1 – x̄3| > 6.98
• Test Statistic
|x̄1 – x̄3| = |55 – 57| = 2
• Conclusion
There is no significant difference between the mean
number of hours worked at Plant 1 and the mean
number of hours worked at Plant 3.
35
Slide © Cengage Learning. All Rights Reserved
Fisher’s LSD Procedure
Based on the Test Statistic x̄i – x̄j
■ LSD for Plants 2 and 3
• Hypotheses (C)
H0: µ2 = µ3
Ha: µ2 ≠ µ3
• Rejection Rule
Reject H0 if |x̄2 – x̄3| > 6.98
• Test Statistic
|x̄2 – x̄3| = |68 – 57| = 11
• Conclusion
The mean number of hours worked at Plant 2 is
not equal to the mean number worked at Plant 3.
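The three pairwise comparisons above can be collected in a small sketch, with the sample means and LSD value taken from the earlier slides:

```python
from itertools import combinations

means = {"Plant 1": 55, "Plant 2": 68, "Plant 3": 57}   # sample means
LSD = 6.98                                              # from the LSD slide

# A pair of plants differs significantly when |xbar_i - xbar_j| > LSD.
significant = {
    (a, b): abs(means[a] - means[b]) > LSD
    for a, b in combinations(means, 2)
}
```

As on the slides, Plants 1 and 2 and Plants 2 and 3 differ significantly, while Plants 1 and 3 do not.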
36
Slide © Cengage Learning. All Rights Reserved
Type I Error Rates
■ The comparisonwise Type I error rate α indicates the
level of significance associated with a single pairwise
comparison.
■ The experimentwise Type I error rate αEW is the
probability of making a Type I error on at least one of
the k(k – 1)/2 pairwise comparisons:
αEW = 1 – (1 – α)^(k(k – 1)/2)
■ The experimentwise Type I error rate gets larger for
problems with more populations (larger k).
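For the plant example (k = 3, α = 0.05), the experimentwise rate works out as follows:

```python
k = 3
alpha = 0.05

# Number of pairwise comparisons among k populations: C(k, 2).
C = k * (k - 1) // 2

# Probability of at least one Type I error across the C comparisons
# (treating the comparisons as independent, as the slide formula does).
alpha_EW = 1 - (1 - alpha) ** C
```

With three comparisons, αEW = 1 – 0.95³ ≈ 0.143, nearly three times the comparisonwise rate of 0.05.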
37
Slide © Cengage Learning. All Rights Reserved
End of Chapter 13, Part A