
Read Lecture 3. React to the material in this lecture. Is there anything you found to be unclear about setting up and using Excel for these statistical techniques? Looking at the data, present an ANOVA on the differences by grade mean on a variable—other than compa-ratio or salary—you feel might be important in answering our equal pay for equal work question. Interpret your results.

BUS308 Week 3 Lecture 3

Setting up ANOVA and Chi Square

Expected Outcomes

After reading this lecture, the student should know how to:

1. Set up the data for an ANOVA analysis.

2. Set up and perform an ANOVA test.

3. Set up a table of mean differences.

4. Set up and perform a Chi Square test.

Overview

Setting up the ANOVA test is quite similar to how the t and F tests were set up. The Chi

Square set-up is a bit more complex, as it is not found in the Data Analysis list of tools.

ANOVA

The set-up of ANOVA within Excel is very similar to how we set up the F and T tests

last week; place the data set in appropriate groups and then use the ANOVA input box. One

difference this week is that the Fx (or Formulas) list does not include an option for ANOVA, so

we need to use the Data | Analysis tools.

Data Set-up

Single Factor. As with the t-test, ANOVA has a couple of versions to select between.

Each is used to answer slightly different questions, and these will be examined below. The most

significant difference lies in the data table used for each version.

We will be working primarily with the ANOVA Single Factor, which deals with

examining possible differences between the means of a single variable within different groups.

A question of whether or not the mean compa-ratios are equal across the grades is an example of

the kind of question answered with this approach.

Question 1. Week 3’s first question is about salary mean equality across the grades. Our

lecture example will deal with compa-ratio mean equality across the grades. The set-up for the

Single Factor ANOVA we just went through assumed this. The initial steps in the hypothesis

testing process are similar to what we have done before:

Step 1: Ho: All Compa-Ratio means are equal across the grades

Ha: At least one compa-ratio mean differs

Notice that these are the standard ANOVA – Single factor null and alternate hypothesis

statements that identify the specific variable (compa-ratio) and statistic (mean) that we

are testing, and merely say “no difference” and “at least one differs.”

Step 2: Alpha = 0.05

Step 3: F statistic and Single Factor ANOVA; used to test multiple means

Step 4: Decision Rule: Reject Ho if the p-value < 0.05

Step 5: Conduct the test – place the test function in cell K08.

As with the F and T tests, we need to group the data into distinct groups. For example, if

we are going to test the compa-ratio mean across grades, then the data must be set-up in a

table with grades across the top, as in the screen shot below. Note that as was done with

the T and F test input data, the raw or initial data was listed and then sorted. Values were

then copied into related groups; we used male and female groups for the F and t tests and

grade groups for this test.
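To make the mechanics concrete, here is a sketch in Python of the same calculation the ANOVA Single Factor tool performs behind its dialog box. The three groups of values below are made up for illustration; they are not the course data set.

```python
# One-way (single factor) ANOVA computed long-hand, mirroring the
# Between Groups / Within Groups rows of Excel's ANOVA output table.

def anova_single_factor(groups):
    """Return (F, df_between, df_within, ms_within) for a one-way ANOVA."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # Between-groups sum of squares: group size times the squared
    # distance of each group mean from the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each value
    # from its own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    msb, msw = ssb / df_between, ssw / df_within
    return msb / msw, df_between, df_within, msw

grade_a = [1.0, 2.0, 3.0]    # hypothetical values for grade A
grade_b = [2.0, 3.0, 4.0]    # hypothetical values for grade B
grade_c = [10.0, 11.0, 12.0]  # hypothetical values for grade C
f_stat, df_b, df_w, ms_w = anova_single_factor([grade_a, grade_b, grade_c])
```

The F statistic is simply the Between Groups mean square divided by the Within Groups mean square – the same values Excel reports in its ANOVA output.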

Test Set-up. Going to Data | Analysis and selecting ANOVA Single Factor gives us the

following input screen. This is completed for our compa-ratio test. Notice that the entire table

range, including the column labels, is entered into the Input box as a single entry.

We do need to check the labels box, as Excel needs to be explicitly told that some of the

data range is not numeric. Our normal alpha value of 0.05 is automatically filled in, but you can

change this value. The last entry is where we want the output table to start. As with the T and

F tests, this cell is the upper left corner of the output and is given as K08 for question 1 this

week.

Clicking OK gives us the data output that we examined in Lecture 2 for this week.

Here is a video on ANOVA: https://screencast-o-matic.com/watch/cb6jecIkLg

Other ANOVA Versions

Two-Factor. While we will not work with either of the two-factor forms, a brief

explanation will help show the difference and usefulness of these forms. The ANOVA Two-

Factor without replication allows us to test the means of two factors at once. An example of this

kind of question might be: are the compa-ratio means equal across grades when grouped by gender?

The outcome of this test gives us the significance of each group (grade average and gender

average) as if the other variable was held constant. In other words, it removes some of the

variation on what we are measuring.

A data set-up table for this version might look like this:

A B C D E F

Male

Female

The values in each cell would be a single measure for that group – for example, the male salary in grade A. For situations where we have multiple values, we could use the average or median value.


For the with replication version, the more significant test is to see if the variables interact with each other rather than simply examining mean equality. This requires multiple data points for each of the groups (females in grade C, for example).

A B C D E F

Male

Female

The values in each cell would be measures for each group. For example, we could use the minimum, maximum, and mean for each grade and gender group.

For more information on these versions of ANOVA, please see a web-based statistics reference. The data input for the with and without replication versions is quite similar – the entire data table, including any top and side labels, is entered as a single range.

Question 2. This question asks for the mean difference intervals so we can identify the

significantly different grade means. The formula for developing the range to examine mean

differences is: (mean1 – mean2) +/- t* sqrt(mse*(1/n1 + 1/n2)).

Ok – breathe. Most of the values we need are in the ANOVA table, and Excel will let us

set up a table and do all these actions one step at a time. The completed table was examined in

Lecture 2, so let’s step back from the complex table and develop it one cell at a time. This is the

same as the old adage “How do we eat an elephant? One bite at a time.”

Before starting on the table, we need to recall where the different outcomes are from our

ANOVA table. See the screenshot below – this is the same as in Lecture 2, with some of the

values we will be using bolded for easy identification.

Now, let’s take a look at setting up the values in the table. The following screen shot is

of the same table, but different cells display the formulas used to create the values rather than the

values. This can help us see the relationships.

Let’s take a look at each column and see how the calculations are set up.

Row 31 contains the names of the values we want in each column, starting with the

groups we want to compare. Going down Column B, we simply list the grade pairs we will look

at in each row, such as A-B (comparing grades A and B), etc. Just set up a convenient label

telling us what the row refers to, something like A-B.

Column C, labeled Mean Diff., is where we set up our first values, the difference between

the two means. The generic formula is =ABS(Mean1 – Mean2).

• the ABS function provides the absolute value, always providing a positive

difference and eliminating any negative signs (as if we always subtracted the

smaller value from the larger value). This is not needed; the author just likes it.

• The A-B row (row 32) shows =ABS($N$11 – N12). These cell references refer

to the mean values located in the Summary table from our ANOVA results. Cell

N11 refers to the mean of grade A, while cell N12 refers to the mean of grade B.

• The next row contains the reference to grade A (N11) but changes the second

reference to the location of the grade C mean (N13). Repeat this pattern all the

way down the table, referencing the two grades being compared in each row.

• Don’t worry about the dollar signs right now; we will cover these after we have

completed a full row of formulas.

In column D, we have the t-value used to provide our confidence in the range outcomes.

Since we are building our ranges based on the ANOVA results, the df for every row remains the

same rather than changing with each pair of grades. The formula for finding a specific t-value

based on a desired probability and df is “=T.INV.2T(alpha, df).” We are using the 2-tail value

for t as we want to cut off values at both ends of our range, rather than just focusing on one

end. Since we want a 95% interval (consistent with our alpha = 0.05), use .05 to tell Excel what

percent to cut off from the extremes (0.025 on each tail) from the t distribution. The df for each

pair is the df associated with the within groups variation, found in cell M23 and equaling 44.

The resulting cell formula becomes: =T.INV.2T(0.05, $M$23). We can use the same copying

approach to copy this value to the end of the table.

Note: The lower the alpha used, the higher our level of confidence and the larger the

range. A 100% confidence results in a range from – infinity to + infinity, of no help whatsoever.

A larger alpha value gives us a smaller interval and less confidence that the range contains the

actual difference of the means within the population.

Column E develops our range constant that is added to and subtracted from the mean difference. This is similar to a margin of error that we discussed earlier. The general cell formula in this column is: =t*SQRT(MSwg*(1/count1 + 1/count2)), where MSwg is the MS value for the Within Groups row from the ANOVA table, and 1 and 2 refer to the groups being compared. For the

comparison of Grades A and F shown in Row 36, the specific formula shows,

=D36*SQRT($N$23 * (1/$L$12 + 1/L17)).

• D36 refers to the T-value found in column D. (You could enter the actual t value, use

an absolute reference to a single cell, or use the value in each row – they all work.)

The SQRT is Excel’s code for taking the square root of whatever is within the ( ).

• The $N$23 is the cell reference to the MSwg measure in the ANOVA table. This is

the common variance estimate for the samples, so adding the $ makes sense.

• The (1/$L$12 and 1/L17) are the references to the counts for grades A and F that are found in the Summary part of the ANOVA output.

Now, let’s develop the ranges. The low-end value of the difference range (column F) equals the Mean Diff. (column C) minus the +/- term (column E), so the formula for row 32 would be =C32 – E32; for row 33, the values change to =C33 – E33, etc. The high-end value (column H) for the range equals column C + column E, or =C32 + E32, etc.
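Putting the column formulas together, one row of the table can be sketched in Python. All of the inputs below (the two means, the two counts, and the MS Within Groups value) are hypothetical stand-ins rather than the course data; the t-value is the one =T.INV.2T(0.05, 44) returns.

```python
import math

# One row of the mean-difference table, using made-up inputs.
t_crit = 2.0154       # two-tail t cut-off for alpha = 0.05, df = 44
mean1, n1 = 1.05, 8   # hypothetical group 1 mean and count
mean2, n2 = 0.98, 9   # hypothetical group 2 mean and count
ms_within = 0.004     # hypothetical MS Within Groups from the ANOVA table

mean_diff = abs(mean1 - mean2)                              # column C: =ABS(Mean1 - Mean2)
margin = t_crit * math.sqrt(ms_within * (1 / n1 + 1 / n2))  # column E: +/- term
low, high = mean_diff - margin, mean_diff + margin          # columns F and H
# If the interval does not contain 0, the two means differ significantly.
significant = not (low <= 0 <= high)
```

With these stand-in numbers the interval runs from about 0.008 to 0.132, so it excludes 0 and the two hypothetical means would be judged significantly different.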

We discussed how to interpret the significance of each interval in Lecture 2 and will not

repeat that here.

Now, to make things a bit easier. Notice the dollar signs around some of the cell references – for example, in $N$11; these are made by typing N11 and then pressing F4. They tell Excel to keep N11 constant if we copy the cell. Without them, copying the cell would change values we want to remain the same. What does this mean? If you want to try copying cells rather than writing the formula in each cell, try the following.

try copying cells rather than writing the formula in each cell, try the following.

• Using just cell C32, move the cursor to the bottom right corner of the cell. When it is placed correctly at the corner, the cursor will change to a small +.

• When you see the +, depress the left mouse button and pull the cursor down one cell to C33.

• You should now see =ABS($N$11 – N13) rather than =ABS($N$11 – N12). The relative reference to cell N12 went down 1 row as you pulled the cell down one row.

What this means is that after you set up the entire row 32 (from column C thru column I) you can highlight the entire range, place the cursor on the far-right corner, and after you see the + drag all of the cells down from row 32 to row 38, where we start to compare grade B. First, delete the

mess in row 37, which is just a separator row. Then in cell C38, change the references to $N$12 and N13 (for grades B and C), and do the same in cell E38 for the related counts in $L$13 and L14.

Highlight and drag the range down to C42 and make the appropriate adjustments again. Do this

until you have reached and edited the cells in row 49. You should now have all the table

calculations done, and are ready to make your comparison decisions in columns J and K.

Note when your cursor is on a cell reference with an = in it, such as =N11 in a formula, pressing F4 will place $ signs in front of both the column and row. Pressing F4 a second time places

the $ sign in front of the row value; pressing it a third time places the $ sign in front of the

column value. Pressing it a fourth time removes all of the $ signs.

Chi Square Tests

This lecture will look at setting up two related Chi Square tests. The first, called the Goodness of Fit Test, involves a single row of counts, such as the dice example we discussed in Lecture 2 for Week 1. This form of the test would answer a question such as: are the dice we tossed fair – that is, did we get the distribution of outcomes we expected? The second, called Contingency Table analysis, involves multiple rows in the table, such as we might have if we looked at how degrees (undergraduate and graduate) are distributed across the grades.

Both Chi Square statistical values are calculated the same way. Both of these tests will use

counts (how many) rather than the measurements (how much) we have been using to date.

The Chi Square tests use the difference between an actual distribution/counts and an

expected distribution to reach decisions on the similarity or difference in patterns. The Chi

Square distribution examines the differences between what we see (actual counts per group) and

what we expect in each group. Once we have these two counts, the actual calculation of the Chi

Square statistic (which Excel can do for us automatically) is:

∑ (Observed count – Expected count)^2/(Expected count).

This is simply the sum (∑) of the squared differences between what we saw and what we expected, divided by the expected count. The Chi Square statistic is also evaluated with a

degree of freedom measure that varies with each test.
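The statistic itself is a one-line calculation. A minimal sketch in Python, using arbitrary illustrative counts rather than any data from the assignment:

```python
# A direct translation of the formula above.

def chi_square_stat(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two cells, 20 items total: observed 8 and 12 versus an even 10/10 split.
stat = chi_square_stat([8, 12], [10, 10])
```

Here each cell contributes (2)^2/10 = 0.4, so the statistic is 0.8; larger gaps between observed and expected counts drive the statistic up.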

The expected values are obviously critical to outcomes with this test, and they can be

developed in several different ways if they are not already known. These approaches depend

upon the complexity of the situation and will be discussed below.

Two input tables are required for all Chi Square test set-ups. The first table is the

“actual” or “observed” counts, a table showing how many items fit into each group we care

about. The second is a table showing the expected counts.

Example

The assignment does not ask for a simple one-row table of counts (a Goodness of Fit test), but we will start with this simpler example first. In the goodness of fit test, our table is a single row showing the counts. Recall from Week 1 that we looked at how many times each total on the showing faces of a pair of dice came up when we tossed the pair 50 times. We got the following distribution of scores.

Outcomes from tossing a pair of dice

Count showing 2 3 4 5 6 7 8 9 10 11 12

Frequency seen 1 2 4 3 9 12 7 5 4 1 2

In the language of a Chi Square test, the frequency seen row would be called the “Actual” data; it is simply the count of how many outcomes we see that fit each criterion, such as the sum of dots on the showing faces of the dice. Typically, the Actual counts are easy to get – simply count what is seen.

The “Expected” counts are sometimes harder to figure out. For example, what is the

expected number of 2’s when we toss the dice 50 times? Why? We could say we expect each

value to occur the same number of times and use 50/11 (number of possible outcomes) as the

expected value. In some situations, this would be fine (note: expected values do not need to be

whole numbers). In this case, that is probably not the best choice. Fortunately, probability

theory can give us an answer.

There are 36 possible outcome combinations – we have 6 outcomes for die 2 for

each of the 6 outcomes on die 1; 6 * 6 = 36. So for a run of 36 tosses, a “perfect”

distribution showing each of the possible outcomes would look like:

Count showing 2 3 4 5 6 7 8 9 10 11 12

Expected 1 2 3 4 5 6 5 4 3 2 1

To translate this to a run of 50, we would multiply each frequency by 50/36. So our Expected

outcome would look like (rounded to 2 decimal points):

Count showing 2 3 4 5 6 7 8 9 10 11 12

Actual 1 2 4 3 9 12 7 5 4 1 2

Expected 1.39 2.78 4.17 5.56 6.94 8.33 6.94 5.56 4.17 2.78 1.39

Going to the Fx Statistical list and picking CHISQ.TEST(actual range, expected range), we get a value of 0.877. This is our p-value – the probability of getting a Chi Square statistic as large as or larger than ours if the null hypothesis is true.

So, if we were testing a null hypothesis of No difference from Expected, we would not reject this

null. Based on these 50 tosses, the dice cannot be said to be unfair or biased. You could

calculate the Chi Square statistic long hand; for this example it would be:

Chi = ((1-1.39)^2)/1.39 + ((2 – 2.78)^2)/2.78 + … + ((2-1.39)^2)/1.39 = 5.2. The Chi Square df

for a single row table is (number of cells – 1) or (11 – 1) = 10 for this example. Now, Excel can

find the Chi Square statistic from the p-value found with CHISQ.TEST by using CHISQ.INV.RT(probability, df). Since our p-value is the probability in the right tail of the distribution, we use the right-tail (RT) version of the function to find the cut-off value: =CHISQ.INV.RT(0.877, 10) = 5.2.
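The whole dice example can be checked outside Excel. The sketch below recomputes the statistic long-hand and the right-tail probability that CHISQ.TEST reports, using the closed-form tail formula that holds for even degrees of freedom (df = 10 here):

```python
import math

# Observed counts from the 50 tosses, and expected counts scaled up
# from the perfect 36-toss distribution by 50/36.
observed = [1, 2, 4, 3, 9, 12, 7, 5, 4, 1, 2]
perfect_36 = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
expected = [e * 50 / 36 for e in perfect_36]

chi_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_right_tail_even_df(x, df):
    """Right-tail Chi Square probability; exact for even df
    (the distribution is then a Gamma with integer shape df/2)."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

p_value = chi2_right_tail_even_df(chi_stat, df=10)
```

This reproduces the lecture’s values: a Chi Square statistic of 5.2 and a p-value of about 0.877.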

Example – Question 3

The third question for this week asks about employee grade distribution. We are

concerned here about the possible impact of an uneven distribution of males and females in

grades and how this might impact average salaries. If employees are not distributed in a similar

pattern, we can expect that this grade difference could be a factor in the observed salary

difference.

While we are concerned about an uneven distribution, our null hypothesis is always about

equality, so the null would respond to a question such as: are males and females distributed across the grades in a similar pattern? That is, are either males or females more likely to be in some grades rather than others?

A similar question can be asked about degrees, are graduate and undergraduate degrees

distributed across grades in a similar pattern? If not, this might be part of the cause for unequal

salary averages.

The data for this test would be found in a contingency table with rows showing the

degree and columns showing grades. Set-up of this table is fairly simple and involves copying

the variables we want (grade and Deg, in this example), sorting them by grade and then Deg, and

simply counting how many fit each cell (degree – grade match). Our final actual count table is

shown below.

Place the actual distribution in the table below.

        A   B   C   D   E   F   Total
UnderG  7   5   3   2   5   3    25
Grad    8   2   2   3   7   3    25
Total  15   7   5   5  12   6    50

The second table for each form is the expected value table. It will have the same row and

column totals as the actual table has. This is an important check to ensure that the tables are set

up correctly. The set-up of the Contingency Table Expected values is slightly more complicated

than for the Goodness-of-Fit expected table.

In general, we do not have a specific expected frequency count for these tables, so we

need to create them using the information available to us from the Actual table. For each cell in

the Expected table, we multiply its row total times its column total and divide by the grand total

(50). For example, in the above table, the expected entry for Grad in grade D would be the Grad

total (25) times the Grade D total (5) divided by the grand total (50); this gives us 25*5/50 = 2.5

for that cell. We can use the cell formulas shown below to create the first column values, and

drag them across the rows thru grades B to F. See the screen print below.
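The row total times column total divided by grand total rule is easy to sketch in code. The counts below are the degree-by-grade table from above:

```python
# Building the expected table from the actual counts using
# expected cell = row total * column total / grand total.
actual = [
    [7, 5, 3, 2, 5, 3],  # UnderG counts for grades A-F
    [8, 2, 2, 3, 7, 3],  # Grad counts for grades A-F
]
row_totals = [sum(row) for row in actual]            # 25, 25
col_totals = [sum(col) for col in zip(*actual)]      # 15, 7, 5, 5, 12, 6
grand_total = sum(row_totals)                        # 50

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
```

The expected entry for Grad in grade D comes out to 25*5/50 = 2.5, matching the worked example above, and the expected table keeps the same row and column totals as the actual table.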

Now that we have our data tables created, we can look at performing the Chi Square

Contingency Table analysis using the hypothesis testing procedure.

Step 1: Ho: Grad and Undergrad degrees are distributed in a similar fashion.

Ha: Grad and Undergrad degrees are not distributed in a similar fashion.

(Note that an alternate wording could be that Degrees and grades are unrelated (not correlated)

versus the alternate that they are significantly correlated. Both interpretations are appropriate for

the contingency table test.)

Step 2: Alpha = 0.05

Step 3: Chi Square statistic and Contingency table test, used for count data

Step 4: Decision Rule: Reject the null hypothesis if the p-value is < 0.05.

Step 5: Conduct the test.

As with the F and T-tests, we use the Fx (or Formulas) list of statistical tools. The CHISQ.TEST

function has inputs for the actual and Expected ranges and returns the p-value. This data entry is

exactly the same as we saw in the F and T-test examples last week. The Chi square does not

have a function listed in the Data | Analysis functions. We get a p-value of 0.85 (rounded) using

=CHISQ.TEST(L58:Q59,L63:Q64). Note that the row and column total values are NOT

included in the data ranges. (See the above screen print of the input tables.)

Step 6: Conclusion and Interpretation

What is the p-value? 0.85

Decision on rejecting the null: Do Not Reject the null hypothesis

Why? P-value is > 0.05.

Conclusion on impact of degrees? Degrees appear to be distributed similarly across the grades and do not seem to have any correlation with grades. This suggests they are not an important factor in explaining differing salary averages among grades.

Here is a video on Chi Square: https://screencast-o-matic.com/watch/cb6jffIk8T

NOTE: There are some issues with both versions of the Chi Square test when we have

20% or more of the cells with expected values less than 5. In most cases, this presents a p-value

that is too small, potentially causing incorrect rejections of the null. There are conflicting

recommendations on what to do with this issue. Some say make what is called the Yates’

correction (do a search on this), others say combine columns to reduce the number of small cells,

and still others say just be aware of this if your rejection p-value is close to alpha. We are

choosing not to emphasize this issue, but merely leave it up to you to investigate if it becomes a

concern in your professional life.

Question 4

Having looked at grade mean differences for compa-ratios and educational degree

distribution, neither seems to help answer our equal pay question. The compa-ratios show that

not all of the grades have an equal average, with some senior grades having higher averages than

the lower grades. This could be due to poorly aligned midpoints (higher midpoints would lower

the average compa-ratios in those grades) or to a pattern of paying relatively more for the higher

graded work. We do not know right now. At any rate, since none of this week’s analysis

focused on gender, we have not really gained any additional insights into pay practices based on

gender.

Summary

In most respects setting up the ANOVA test is similar to what we did with the F and t-tests. The principal difference lies with the number of columns we have. The input data table for ANOVA should have multiple columns, each headed by a group name (such as A, B, C, etc. for our grades) with the data values for each group listed below (such as all grade A salaries listed under the A label, etc.). The set-up window for ANOVA will have the entire data range (labels and values) entered as a single range (such as G1:K12). ANOVA is found in the Data | Analysis tab.

The set-up for the Chi Square tests is a bit more complicated as it involves not only the

actual data being set up in one table but also the expected values that are used for comparison

purposes being set up in a separate table. Both tables consist of counts rather than actual values

from the data set – for example, the number of employees in each grade.

The expected distribution table set-up differs depending upon which Chi Square test we are

doing. If we are comparing a single distribution (such as number of employees per grade), we

would set-up a single row expected table that matched the distribution we were concerned with;

possibly equal number in each grade, or a decreasing number in each grade such as a pyramid

might have, or more in the middle, etc.

If, however, we are looking at comparing several distributions, such as males and females across the grades, the expected table is generated using the actual distribution. For each cell in

the expected table, we would find the value of the row total * the column total divided by the

grand total for the respective values in the actual table.

In both cases, the Chi Square set-up (found in the Fx or Formula links) asks us to identify

the range of the actual values and then the range of the expected values.

Please ask your instructor if you have any questions about this material.

When you have finished with this lecture, please respond to Discussion thread 3 for this

week with your initial response and responses to others over a couple of days before reading the

third lecture for the week.
