# Need discussion for BUS308 Statistics for Managers details & lecture below.

Read Lecture 2. React to the material in the lecture. Is there anything you found to be unclear? How could you use these ideas within your degree area?

BUS308Week 3 Lecture 2

Examining Differences – ANOVA and Chi Square

Expected Outcomes

After reading this lecture, the student should be familiar with:

1. Conducting hypothesis tests with the ANVOA and Chi Square tests

2. How to interpret the Analysis of Variance test output

3. How to interpret Determining significant differences between group means

4. The basics of the Chi Square Distribution.

Overview

This week we introduced the ANOVA test for multiple mean equality and the Chi Square

tests for distributions. This lecture will focus on interpreting the outcomes of both tests. The

process of setting them up will be covered in Lecture 3 for this week.

ANOVA

Hypothesis Test

The week 3 question 1 asks if the average salary per grade is equal? While this might

seem like a no-brainer (we expect each grade to have higher average salaries), we need to test all

assumed relationships. This is much like our detectives saying “we need to exclude you from the

suspect pool; where were you last night?” This example will, of course use the compa-ratio

instead of the salary values you will use in the homework.

The ANOVA test is found in the Data | Analysis tab.

Step 5 in the hypothesis testing process asks us to “Perform the test.” Here is a screen

shot of the ANOVA output for a test of the null hypothesis: “All grade compa-ratio means are

equal.” For this question we will be using the ANOVA-Single Factor option as we are testing

mean equality for a single factor, Grades. We will briefly cover the other ANOVA options in

Lecture 3 for this week.

Note that The ANOVA single factor output includes the test name, a summary table, and

an ANOVA table. The summary table that gives us the count, sum, average, and variance for the

compa-ratios by the analysis groups (in this case our grades). Note that we are assuming equal

variances within the grades within the population for this example, and your assignment. This

may not actually be true for this example (note the values in the Variance column), but we will

ignore this for now. ANOVA is somewhat robust around violations on the variance equality

assumption – means it may still produce acceptable results with unequal variances. There is a

non-parametric alternate if the variances are too different, but we do not cover it in this course.

Please note that the column and row values are present in this screenshot. These will be needed

as references in question 2.

The next table is the meat of the test. While for all practical purposes, we are only

interested in the highlighted p-value, knowing what the other values are is helpful. When we

introduced ANVOA in lecture 1, we discussed the between and within groups variation. As you

recall, the between groups focused on the data set as a single group and not distinct groups. For

the Between Groups row, we have an Sum of Squares (SS) value, which is a raw estimate of the

variation that exists. The degrees of freedom (df) for Between Groups equals the number of

groups (k) we have minus 1 (k-1), which equals 5 for our 6 groups. The Mean Square variation

estimate equals the SS divided by the df.

The Within Group focuses on the average variation for all our groups. SS gives us the

same raw estimate as for the BG row. The df for Within Groups is the total count (N) minus the

number of groups (N-k), or 44 for our 50 employees in the 6 groups. MSwg equals SS/df.

The F statistic is calculated by dividing the MSbg by MSwg. The next column gives us

our p-value followed by the critical value of F (when the p-value would be exactly 0.05). The

total line is the sum of the SS values and the overall df which equal the total count -1 (N – 1).

(As with the t and F tests, we could make our decision by comparing the calculated F

value (in cell O20, with critical value of F in cell Q20. We reject the null when the calculated F

is greater than the critical F. The critical value of F or any statistic in an Excel output table is the

value that exactly provides a p-value equaling our selected value for Alpha. However, we will

continue to use the P-value in our decisions.)

Now that we have our test results, we can complete step 6 of the hypothesis testing

procedure.

Step 6: Conclusions and Interpretation

What is the p-value? Found in the table, it is 0.0186 (rounded).

(Side note: at times Excel will produce a p-value that looks something like 3.8E-14. This

is called the scientific or exponential format. It is the same as writing 3.8 * 10-14 and

equals 0.000000000000038. A simple way of knowing how many 0s go between the

decimal point and the first non-zero number is to subtract 1 from the E value, so with E-

14, we have 13 zeros. At any rate, any Excel p-value using E-xx format will always be

less than 0.05.)

Decision: Reject the null hypothesis.

Why? P-value is less than 0.05.

Conclusion: at least one mean differs across the grades.

Question 2: Group Comparisons

Now that we know at least one grade compa-ratio mean is not equal to the rest, we need

to determine which mean(s) differ. We do this by creating ranges of the possible difference in

the population mean values. Remember, that our sample results are only a close approximation

of the actual population mean. We can estimate the range of values that the population mean

actually equals (remember that discussion of the sampling distribution of the mean from last

week). So, using the variation that exists in our groups, we estimate the range of differences

between means (the possible outcomes of subtracting one mean from another).

The following screen shot shows a completed comparison table for the grade related

compa-ratio means.

Let’s look at what this table tells us before focusing on how to develop the values

(covered in Lecture 3 for this week). Looking at the Groups Compared Column, we see the

comparison groups listed, A-B for grades A and B, A-C for grades A and C, etc. The next

column is the difference between the average compa-ratio values for each pair of grades. The T

value column is the value for a 95% two tail test for the degrees of freedom we have. (Lecture 3

discusses how to identify the correct value). Note that it is the same value for all of our

comparison groups, the explanation comes in Lecture 3.

The next column, labeled the +/- term, is the margin of error that exists for the mean

difference being examined. This is a function of sampling error that exists within each sample

mean. These are all of the values we need to create a range of values that represent, with a 95%

confidence, what the actual population mean differences are likely to be. We subtract this value

from the mean (in column B) to get our low-end estimate (Low column values), and we add it to

the mean to get our high-end estimate (High column values).

Now, we need to decide which of these ranges indicates a significantly different pair of

means (within the population) and which ranges indicate the likelihood of equal population

means (non-significant differences). This is fairly simple, if the range contains a 0 (that is, one

endpoint is negative and the other is positive), then the difference is not significant (since a mean

difference of 0 would never be significant). Notice in the table, that the A-B, A-C, and A-D

range all contain 0, and the results are not significant different. The A-E and A-F comparisons,

however have positive values for each end, and do not contain 0; these means are different in the

population.

We now know how to interpret an ANOVA table and an accompanying table of

differences for significant mean differences between and among groups.

Chi Square Tests

With the Chi Square tests, we are going to move from looking at population parameters,

such as means and standard deviations, and move to looking at patterns or distributions. The

shape or distribution of variables is often an important way of identifying differences that could

be important. For example, we already suspect that males and females are not distributed across

the grades in a similar manner. We will confirm or refute this idea in the weekly assignment.

Generally, when looking at distributions and patterns we can create groups within our

variable of interest. For example, the Grades variable is already divided into 6 groups, making it

easy to count how many employees exist in each group. But what about a continuous variable

such as Compa-ratio, where no such clear division into separate groups exists. This is not a

problem as we can always divide any range of values into groups such as quartiles (4 groups) or

any other number of distinct ranges. Most variables can be subdivided this way.

The Chi Square test is actually a group of comparisons that depend upon the size of the

table the data is displayed in. We will examine different tables and tests in Lecture 3, for this

lecture we want to focus on how to interpret the outcome of a Chi Square test – as outcomes are

the same regardless of the table size. The details of setting up the data will be covered in Lecture

3.

Example – Question 3

The third question for this week asks about employee grade distribution. We are

concerned here about the possible impact of an uneven distribution of males and females in

grades and how this might impact average salaries. While we are concerned about an uneven

distribution, our null hypothesis is always about equality, so the null would respond to a question

such as are males and females distributed across the grades in a similar pattern; that is, we are

either males or females more likely to be in some grades rather than others.

A similar question can be asked about degrees, are graduate and undergraduate degrees

distributed across grades in a similar pattern? If not, this might be part of the cause for unequal

salary averages.

The step 5 output for a Chi Square test is very simple, it is the p-value, the probability of

getting a chi square value as large or larger than what we see if the null hypothesis is true.

That’s it – the data is set up, the Chi Square test function is selected from the Fx statistical list,

and we have the p-value. There is not output table to examine.

So, for an examination of are degrees distributed across grades in a similar manner, we

would have an actual distribution table (counts of what exists) looking like this:

Place the actual distribution in the table below.

A B C D E F Total

UnderG 7 5 3 2 5 3 25

Grad 8 2 2 3 7 3 25

Total 15 7 5 5 12 6 50

This table would be compared to an expected table where we show what we expect if the null

hypothesis was correct. (Setting up this table is discussed in Lecture 3.) Then we just get our

answer.

So, steps 5 and 6 would look like:

Step 5: Conduct the test. 0.85 (the Chi Square p-value from the Chisq.Test function

Step 6: Conclusion and Interpretation

What is the p-value? 0.85

Decision on rejecting the null: Do Not Reject the null hypothesis.

Why? P-value is > 0.05.

Conclusion on impact of degrees? Degrees are distributed equally across the grades

and do not seem to have any correlation with grades. This suggests they are not an

important factor in explaining differing salary averages among grades.

Of course, a bit more of getting the Chi Square result depends on the data set up than

with the other tests, but the overall interpretation is quite similar – does the p-value indicate we

should reject or not reject the null hypothesis claim as a description of the population?

Summary

Both the ANOVA and Chi Square tests follow the same basic logic developed last week

with the F and t-tests. The analysis is started with developing the first four (4) hypothesis testing

steps which set-up the purpose and decision-making rules for the analysis.

Running the tests (step 5) will be covered in the third lecture for this week.

Step 6 (Interpretation) is also done in the same fashion as last week. Look for the p-value

for each test and compare it to the alpha criteria. If the p-value is less than alpha, we reject the

null hypothesis.

When the null is rejected in the ANOVA test, we then create difference intervals to

determine which pair of means differs. If any of these intervals contains the value 0 (meaning

one end is a negative value and the other is a positive value), we can say that those means are not

significantly different within the population.

The Chi Square has two tests that were presented. One test looks at a single group

compared to an expected distribution, which we provide. The other version compares two or

more groups to an expected distribution which is generated by the existing distributions. How

these “expected” tables are generated will be discussed in Lecture 3 for this week.

Please ask your instructor if you have any questions about this material.

When you have finished with this lecture, please respond to Discussion thread 2 for this

week with your initial response and responses to others over a couple of days.

## We've got everything to become your favourite writing service

### Money back guarantee

Your money is safe. Even if we fail to satisfy your expectations, you can always request a refund and get your money back.

### Confidentiality

We don’t share your private information with anyone. What happens on our website stays on our website.

### Our service is legit

We provide you with a sample paper on the topic you need, and this kind of academic assistance is perfectly legitimate.

### Get a plagiarism-free paper

We check every paper with our plagiarism-detection software, so you get a unique paper written for your particular purposes.

### We can help with urgent tasks

Need a paper tomorrow? We can write it even while you’re sleeping. Place an order now and get your paper in 8 hours.

### Pay a fair price

Our prices depend on urgency. If you want a cheap essay, place your order in advance. Our prices start from $11 per page.