Assignment - Paper Answers

SOCW 6311: Social Work Research in Practice II

Please note that this is a master's-level course, so master's-level work is expected. Please check the grammar, use APA format, and use the readings that I have provided. You must answer all the questions that I post. Thank you.

Week 4

Readings

• Dudley, J. R. (2014). Social work evaluation: Enhancing what we do. (2nd ed.) Chicago, IL: Lyceum Books.

o Chapter 9, “Is the Intervention Effective?” (pp. 226–236: Read from “Determining a Causal Relationship” to “Outcome Evaluations for Practice”)

• Plummer, S.-B., Makris, S., & Brocksen, S. (Eds.). (2014b). Social work case studies: Concentration year. Baltimore, MD: Laureate International Universities Publishing. [Vital Source e-reader].

Read the following section:

o “Social Work Research: Chi Square” (pp. 63–65)

• Stocks, J. T. (2010). Statistics for social workers. In B. Thyer (Ed.), The handbook of social work research methods (2nd ed., pp. 75–118). Thousand Oaks, CA: Sage.

• Trochim, W. M. K. (2006). Internal validity. Retrieved from http://www.socialresearchmethods.net/kb/intval.php

Be sure to click on all the links in the narrative.

• Document: Week 4: A Short Course in Statistics Handout (PDF)

Week 4: A Short Course in Statistics Handout

This information was prepared to call your attention to some basic concepts underlying

statistical procedures and to illustrate what types of research questions can be

addressed by different statistical tests. You may not fully understand these tests without

further study. However, you are strongly encouraged to note distinctions related to type

of measurement used in gathering data and the choice of statistical tests. Feel free to

post questions in the “Contact the Instructor” section of the course.

Statistical symbols:

µ mu (population mean)

α alpha (degree of error acceptable for incorrectly rejecting the null hypothesis,

probability that results are unlikely to occur by chance)

≠ (not equal)

≥ (greater than or equal to)

≤ (less than or equal to)

r (sample correlation)

ρ rho (population correlation)

t (t score)

z (standard score based on standard deviation)

χ² Chi square (statistical test for variables that are not interval or ratio scale, i.e., nominal or ordinal)

p (probability that results are due to chance)

Descriptives:

Descriptives are statistical tests that summarize a data set.

They include calculations of measures of central tendency (mean, median, and mode),

and dispersion (e.g., standard deviation and range).

Note: The measures of central tendency depend on the measurement level of the

variable (nominal, ordinal, interval, or ratio). If you do not recall the definitions for these

levels of measurement, see

http://www.ats.ucla.edu/stat/mult_pkg/whatstat/nominal_ordinal_interval.htm

You can only calculate a mean and standard deviation for interval or ratio scale

variables.

For nominal or ordinal variables, you can examine the frequency of responses. For

example, you can calculate the percentage of participants who are male and female; or

the percentage of survey respondents who are in favor, against, or undecided.

Often nominal data is recorded with numbers, e.g. male=1, female=2. Sometimes

people are tempted to calculate a mean using these coding numbers. But that would be

meaningless. Many questionnaires (even course evaluations) use a Likert scale to

represent attitudes along a continuum (e.g. Strongly like … Strongly dislike). These too

are often assigned a number for data entry, e.g. 1–5. Suppose that most of the

responses were in the middle of a scale (3 on a scale of 1–5). A researcher could

observe that the mode is 3, but it would not be reasonable to say that the average

(mean) is 3 unless there were exact differences between 1 and 2, 2 and 3 etc. The

numbers on a scale such as this are ordered from low to high or high to low, but there is

no way to say that there is a quantifiably equal difference between each of the choices.

In other words, the responses are ordered, but not necessarily equal. Strongly agree is

not five times as large as strongly disagree. (See the textbook for differences between

ordinal and interval scale measures.)
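
The point about nominal and ordinal data can be illustrated with a short Python sketch. The responses below are hypothetical (they are not from the course materials), and the variable and function names are illustrative only. The sketch reports percentages for a nominal variable and the mode for Likert-coded responses, and it deliberately does not compute a mean of the codes.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from the handout).
# Nominal variable: position on a proposal.
positions = ["favor", "against", "favor", "undecided", "favor", "against"]
# Ordinal variable: Likert responses coded 1 (strongly dislike) to 5 (strongly like).
likert = [3, 3, 2, 4, 3, 5, 1, 3]


def percentages(values):
    """Return the percentage of responses falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}


position_pct = percentages(positions)

# The mode is the most frequent code; it is meaningful for ordinal data.
likert_mode = Counter(likert).most_common(1)[0][0]

print(position_pct)
print("Modal Likert response:", likert_mode)
```

A frequency table and the mode describe these variables without assuming equal distances between the coded values.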

Inferential Statistics:

Statistical tests for analysis of differences or relationships are Inferential,

allowing a researcher to infer relationships between variables.

All statistical tests have what are called assumptions. These are essentially rules that

indicate that the analysis is appropriate for the type of data. Two key types of

assumptions relate to whether the samples are random and the measurement levels.

Other assumptions have to do with whether the variables are normally distributed. The

determination of statistical significance is based on the assumption of the normal

distribution. A full course in statistics would be needed to explain this fully. The key point

for our purposes is that some statistical procedures require a normal distribution and

others do not.

Understanding Statistical Significance

Regardless of what statistical test you use to test hypotheses, you will be looking to see

whether the results are statistically significant. The statistic p is the probability that the

results of a study would occur simply by chance. Essentially, a p that is less than or

equal to a predetermined (α) alpha level (commonly .05) means that we can reject a null

hypothesis. A null hypothesis always states that there is no difference or no relationship

between the groups or variables. When we reject the null hypothesis, we conclude (but

don’t prove) that there is a difference or a relationship. This is what we generally want to

know.

Parametric Tests:

Parametric tests are tests that require variables to be measured at interval or ratio

scale and for the variables to be normally distributed.

These tests compare the means between groups. That is why they require the data to

be at an interval or ratio scale. They make use of the standard deviation to determine

whether the results are likely to occur or very unlikely in a normal distribution. If they are

very unlikely to occur, then they are considered statistically significant. This means that

the results are unlikely to occur simply by chance.

The T test

Common uses:

• To compare the mean from a sample group to a known mean from a population

• To compare the mean between two samples

o The research question for a t test comparing the mean scores between

two samples is: Is there a difference in scores between group 1 and group

2? The hypotheses tested would be:

H0: µgroup1 = µgroup2

H1: µgroup1 ≠ µgroup2

• To compare pre- and post-test scores for one sample

o The research question for a t test comparing the mean scores for a

sample with pre and posttests is: Is there a difference in scores between

time 1 and time 2? The hypotheses tested would be :

H0: µpre = µpost

H1: µpre ≠ µpost

Example of the form for reporting results: The results of the test were not statistically significant, t (57) = .282, p = .779; thus, the null hypothesis is not rejected. There is no statistically significant difference between pre- and posttest scores for participants on a measure of knowledge (for example).

An explanation: The t is a value calculated using means and standard deviations and a

relationship to a normal distribution. If you calculated the t using a formula, you would

compare the obtained t to a table of t values that is based on one less than the number

of participants (n-1). n-1 represents the degrees of freedom. The obtained t must be

greater than a critical value of t in order to be significant. For example, if statistical

analysis software calculated that p = .779, this result is much greater than .05, the usual

alpha-level which most researchers use to establish significance. In order for the t test

to be significant, it would need to have a p ≤ .05.
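
As a rough illustration of the explanation above, the following Python sketch computes a t value for a pre- and posttest comparison using only means and standard deviations. The scores are hypothetical, and the critical value 2.776 is the two-tailed value for 4 degrees of freedom at α = .05 taken from a standard t table; both are assumptions made for illustration and do not come from the case study.

```python
import math
import statistics

# Hypothetical pre- and posttest knowledge scores for the same five participants.
pre = [10, 12, 9, 14, 11]
post = [12, 13, 9, 15, 14]

# A paired t test works on the difference score for each participant.
diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
standard_error = sd_diff / math.sqrt(n)

t_stat = mean_diff / standard_error
df = n - 1                               # degrees of freedom

# Two-tailed critical t for df = 4 at alpha = .05 (from a standard t table).
critical_t = 2.776
significant = abs(t_stat) > critical_t

print(f"t({df}) = {t_stat:.3f}, significant at .05: {significant}")
```

Statistical software reports the exact p value instead of requiring a table lookup, but the decision rule is the same: reject the null hypothesis only when the obtained t exceeds the critical value (equivalently, when p ≤ .05).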

ANOVA (Analysis of variance)

Common uses: Similar to the t test. However, it can be used when there are more than

two groups.

The hypotheses would be

H0: µgroup1 = µgroup2 = µgroup3 = µgroup4

H1: The means are not all equal (some may be equal)

Correlation

Common use: to examine whether two variables are related, that is, they vary together.

The calculation of a correlation coefficient (r or rho) is based on means and standard

deviations. This requires that both (or all) variables are measured at an interval or ratio

level.

The coefficient can range from −1 to +1. An r of +1 or −1 is a perfect correlation, and an r of 0 means there is no linear relationship. A + means that as one variable increases, so does the other. A – means that as one variable increases, the other decreases.

The research question for correlation is: “Is there a relationship between variable 1 and

one or more other variables?”

The hypotheses for a Pearson correlation:

H0: ρ = 0 (there is no correlation)

H1: ρ ≠ 0 (there is a real correlation)

Non-parametric Tests

Nonparametric tests are tests that do not require variables to be measured at interval or ratio scale and do not require the variables to be normally distributed.

Chi Square

Common uses: Chi square tests of independence and measures of association and

agreement for nominal and ordinal data.

The research question for a chi square test for independence is: Is there a relationship

between the independent variable and a dependent variable?

The hypotheses are:

H0 (The null hypothesis) There is no difference in the proportions in each category of

one variable between the groups (defined as categories of another variable).

Or:

The frequency distribution for variable 2 has the same proportions for both categories of

variable 1.

H1 (The alternative hypothesis) There is a difference in the proportions in each category

of one variable between the groups (defined as categories of another variable).

The calculations are based on comparing the observed frequency in each category to

what would be expected if the proportions were equal. (If the proportions between

observed and expected frequencies are equal, then there is no difference.)
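
The comparison of observed and expected frequencies can be sketched in Python as follows. The counts are hypothetical (they are not the SPSS output from the case study), and the names `observed`, `chi_square`, and `df` are illustrative. Expected frequencies are computed in the usual way for a test of independence: row total times column total, divided by the grand total.

```python
# Hypothetical counts: rows are groups, columns are employment categories
# (not employed, part-time, full-time). These are NOT the case study data.
observed = {
    "program": [10, 15, 25],
    "waitlist": [20, 15, 15],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        # Frequency expected if group and employment category were unrelated.
        expected = row_totals[r] * col_totals[c] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(rows) - 1) * (len(col_totals) - 1)

print(f"chi-square({df}) = {chi_square:.3f}")
```

The larger the gap between observed and expected counts, the larger the chi square value. Software such as SPSS then converts that value and its degrees of freedom into a p value.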

See the SOCW 6311: Week 4 Working With Data Assignment Handout to explore the Crosstabs procedure for chi square analysis.

Other non-parametric tests:

Spearman rho: A correlation test for rank ordered (ordinal scale) variables.

• Document: Week 4 Handout: Chi-Square findings (PDF)

Week 4 Handout: Chi-Square Findings

The chi square test for independence is used to determine whether there is a relationship between

the two variables that are categorical in the level of measurement. In this case, the variables are:

employment level and treatment condition. It tests whether there is a difference between groups.

The research question for the study is: Is there a relationship between the independent variable,

treatment, and the dependent variable, employment level? In other words, is there a difference in

the number of participants who are not employed, employed part-time and employed full-time in

the program and the control group (i.e., waitlist group)?

The hypotheses are:

H0 (The null hypothesis): There is no difference in the proportions of individuals in the three

employment categories between the treatment group and the waitlist group. In other words, the

frequency distribution for variable 2 (employment) has the same proportions for both categories

of variable 1 (program participation).

** It is the null hypothesis that is actually tested by the statistic. A chi square statistic that is found to be statistically significant (e.g., p < .05) indicates that we can reject the null hypothesis (understanding that there is less than a 5% chance that the relationship between the variables is due to chance).

H1 (The alternative hypothesis): There is a difference in the proportions of individuals in the

three employment categories between the treatment group and the waitlist group.

** The alternative hypothesis states that there is a difference. It would allow us to say

that it appears that the treatment (voc rehab program) is effective in increasing the

employment status of participants.

Assume that the data has been collected to answer the above research question. Someone has

entered the data into SPSS. A chi-square test was conducted, and you were given the following

SPSS output data:

Assignment

Working With Data

Statistical analysis software is a valuable tool that helps researchers perform the complex calculations. However, to use such a tool effectively, the study must be well designed. The social worker must understand all the relationships involved in the study. He or she must understand the study’s purpose and select the most appropriate design. The social worker must correctly represent the relationship being examined and the variables involved. Finally, he or she must enter those variables correctly into the software package. This assignment will allow you to analyze in detail the decisions made in the “Social Work Research: Chi Square” case study and the relationship between study design and statistical analysis. Assume that the data has been entered into SPSS and you’ve been given the output of the chi-square results. (See Week 4 Handout: Chi-Square findings).

Submit a 1-page paper that includes the following:

• An analysis of the relationship between study design and statistical analysis used in the case study that includes:

o An explanation of why you think that the agency created a plan to evaluate the program

• An explanation of why the social work agency in the case study chose to use a chi-square statistic to evaluate whether there is a difference between those who participated in the program and those who did not (Hint: Think about the level of measurement of the variables)

• A description of the research design in terms of observations (O) and interventions (X) for each group.

• Interpret the chi-square output data. What do the data say about the program?

Statistics for Social Workers

J. Timothy Stocks

Statistics refers to a branch of mathematics dealing with the direct description of sample or population characteristics and the analysis of population characteristics by inference from samples. It covers a wide range of content, including the collection, organization, and interpretation of data. It is divided into two broad categories: descriptive statistics and inferential statistics.

Descriptive statistics involves the computation of statistics or parameters to describe a sample or a population. All the data are available and used in computation of these aggregate characteristics. This may involve reports of central tendency or variability of single variables (univariate statistics). It also may involve enumeration of the relationships between or among two or more variables (bivariate or multivariate statistics). Descriptive statistics are used to provide information about a large mass of data in a form that may be easily understood. The defining characteristic of descriptive statistics is that the product is a report, not an inference.

Inferential statistics involves the construction of a probable description of the characteristics of a population based on sample data. We compute statistics from a partial set of the population data (a sample) to estimate the population parameters. These [...] mathematics to provide evidence for the exi[...]

Descriptive Statistics

Measures of Central Tendency
Measures of central tendency are individual numbers that typify the total set of scores. The three most frequently used measures of central tendency are the arithmetic mean, the mode, and the median.

Arithmetic Mean. The arithmetic mean usually is simply called the mean. It also is called the average. It is computed by adding up all of a set of scores and dividing by the number of scores in the set. The algebraic representation of this is

76 PART I • QUANTITATIVE APPROACHES: FOUNDATIONS OF DATA COLLECTION

$$\mu = \frac{\sum X}{n},$$

where $\mu$ represents the population mean, $X$ represents an individual score, and $n$ is the number of scores being added.

The formula for the sample mean is the same except that the mean is represented by the variable letter with a bar above it:

$$\bar{X} = \frac{\sum X}{n}.$$

Following are the numbers of class periods skipped by 20 seventh-graders during 1 week: {1, 6, 2, 6, 15, 20, 3, 20, 17, 11, 15, 18, 8, 3, 17, 16, 14, 17, 0, 10}. We compute the mean by adding up the class periods missed and dividing by 20:

$$\mu = \frac{\sum X}{n} = \frac{219}{20} = 10.95.$$
Mode. The mode is the most frequently appearing score. It really is not so much a measure of centrality as it is a measure of typicalness. It is found by organizing scores into a frequency distribution and determining which score has the greatest frequency. Table 6.1 displays the truancy scores arranged in a frequency distribution.

TABLE 6.1 Truancy Scores

Score    Frequency
20       2
19       0
18       1
17       3
16       1
15       2
14       1
13       0
12       0
11       1
10       1
9        0
8        1
7        0
6        2
5        0
4        0
3        2
2        1
1        1
0        1

Because 17 is the most frequently appearing number, the mode (or
modal number) of class periods skipped is 17.

Unlike the mean or median, a distribution of scores can have more than one mode.

Median. If we take all the scores in a set of scores, place them in order from least to greatest, and count in to the middle, then the score in the middle is the median. This is easy enough if there is an odd number of scores. However, if there is an even number of scores, then there is no single score in the middle. In this case, the two middle scores are selected, and their average is the median.

There are 20 scores in the previous example. The median would be the average of the 10th and 11th scores. We use the frequency table to find these scores, which are 11 and 14. Thus, the median is 12.5.
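
The three measures of central tendency for the truancy example can be checked with Python's standard `statistics` module. This is only a sketch: the score list is transcribed from the 20 class-period counts given earlier in the chapter, and the variable names are illustrative.

```python
import statistics

# Class periods skipped by 20 seventh-graders during 1 week (from the chapter).
skipped = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
           15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

# Mean: sum of the scores divided by the number of scores.
mean_skipped = statistics.mean(skipped)

# Mode: multimode returns every score tied for the highest frequency,
# because a distribution can have more than one mode.
modes = statistics.multimode(skipped)

# Median: with an even number of scores, the average of the two middle scores.
median_skipped = statistics.median(skipped)

print("Sum:", sum(skipped), "n:", len(skipped))
print("Mean:", mean_skipped)
print("Mode(s):", modes)
print("Two middle scores:", sorted(skipped)[9:11])
print("Median:", median_skipped)
```

Printing the sorted middle scores makes it easy to confirm the 10th and 11th values by hand against the frequency table.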

Measures of Variability
Whereas measures of central tendency are used to estimate a typical score in a distribution, measures of variability may be thought of as a way in which to measure departure from typicalness. They provide information on how “spread out” scores in a distribution are.

Range. The range is the distance from the minimum (lowest) score to the maximum (highest) score. It is obtained by subtracting the minimum score from the maximum score.

Let us compute the range for the following data set:

{2, 6, 10, 14, 18, 22}.

The minimum is 2, and the maximum is 22:

$$\text{Range} = 22 - 2 = 20.$$

Sum of Squares. The sum of squares is a measure of the total amount of variability in a set of scores. Its name tells how to compute it. Sum of squares is short for sum of squared deviation scores. It is represented by the symbol SS.

The formulas for sample and population sums of squares are the same except for sample and population mean symbols:

$$SS = \sum (X - \mu)^2 \qquad SS = \sum (X - \bar{X})^2$$

Using the data set for the range, the sum of squares would be computed as in Table 6.2.

TABLE 6.2 Computing the Sum of Squares

X     X − μ    (X − μ)²
2     −10      100
6     −6       36
10    −2       4
14    +2       4
18    +6       36
22    +10      100

NOTE: ΣX = 72; n = 6; μ = 12; Σ(X − μ)² = 280.

Variance. Another name for variance is mean square. This is short for mean of squared deviation scores. This is obtained by dividing the sum of squares by the number of scores (n). It is a measure of the average amount of variability associated with each score in a set of scores. The population variance formula is

$$\sigma^2 = \frac{SS}{n},$$

where $\sigma^2$ is the symbol for population variance, SS is the symbol for sum of squares, and n stands for the number of scores in the population.

The variance for the example we used to compute sum of squares would be

$$\sigma^2 = \frac{280}{6} = 46.67.$$

The sample variance is not an unbiased estimator of the population variance. If we compute the variances for these samples using the SS/n formula, then the sample variances will average out smaller than the population variance. For this reason, the sample variance is computed differently from the population variance:

$$s^2 = \frac{SS}{n - 1}.$$


The n − 1 is a correction factor for this tendency to underestimate. It is called degrees of freedom. For the example data set, the sample variance would be

$$s^2 = \frac{280}{6 - 1} = \frac{280}{5} = 56.$$

Standard Deviation. Although the variance is a measure of average variability associated [...] on a different scale from the score itself. The variance measures average squared deviation from the mean. To get a me[...]

Using the same set of numbers as before, the population standard deviation would be

$$\sigma = \sqrt{46.67} = 6.83,$$

and the sample standard deviation would be

$$s = \sqrt{56} = 7.48.$$

For a normally distributed set of scores, approximately 68% of all scores will be within ±1 standard deviation of the mean.
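
The range, sum of squares, variance, and standard deviation for the example data set can be reproduced with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are illustrative, and the module's `pvariance`/`pstdev` functions divide by n while `variance`/`stdev` divide by n − 1.

```python
import statistics

scores = [2, 6, 10, 14, 18, 22]   # data set used for the range example

score_range = max(scores) - min(scores)

# Sum of squares: sum of squared deviations from the mean.
mean_score = statistics.mean(scores)
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

# Population formulas divide SS by n; sample formulas divide SS by n - 1.
pop_variance = statistics.pvariance(scores)
sample_variance = statistics.variance(scores)
pop_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print("Range:", score_range)
print("SS:", sum_of_squares)
print(f"Population variance: {pop_variance:.2f}, SD: {pop_sd:.2f}")
print(f"Sample variance: {sample_variance:.2f}, SD: {sample_sd:.2f}")
```

Dividing the same sum of squares by n − 1 instead of n is the only difference between the sample and population results.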

Measures of Relationship
Table 6.3 shows the relationship between number of stressors experienc[...] of corporal punishment during the same week.

One can use regression procedures to derive the line that best fits the data. This line is referred to as a regression line (or line of best fit or prediction line). Such a line has been calculated for the example plot. It has a Y intercept of −3.555 and a slope of +1.279. This gives us the prediction equation of

$$Y_{pred} = -3.555 + 1.279X,$$

where Y is frequency of [...]

Slope is the change in Y for a unit increase in X. So, the slope of +1.279 means that an increase in stressors (X) of 1 will be accompanied by an increase in predicted frequency of corporal punishment (Y) of +1.279 incidents per week. If the slope were a negative number, then an increase in X would be accompanied by a predicted decrease in Y.

The equation does not give the actual value of Y (called the obtained or observed score); rather, it gives a prediction of the value of Y for a certain value of X. For example, if X were 3, then we would predict that Y would be −3.555 + 1.279(3) = −3.555 + 3.837 = 0.282.

[Figure 6.1 Frequency of Stressors and Use of Corporal Punishment: scatterplot with the regression line Y_pred = −3.555 + 1.279X; x-axis: Stressors]

TABLE 6.3 Frequency of Stressors and Use of Corporal Punishment

Stressors    Punishment
3            0
4            1
4            2
5            3
6            4
7            5
8            6
7            7
9            8
10           9

The regression line is the line that predicts Y such that the error of prediction is minimized. Error is defined as the difference between the predicted score and the obtained score:

$$E = Y - Y_{pred}.$$

When X = 4, there are two obtained values of Y: 1 and 2. The predicted value of Y is

$$Y_{pred} = -3.555 + 1.279(4) = -3.555 + 5.116 = 1.561.$$

The error of prediction is E = 1 − 1.561 = −0.561 for Y = 1, and E = 2 − 1.561 = +0.439 for Y = 2. If we square each error difference score and sum the squares, then we get a quantity called the error sum of squares, which is represented by

$$SSE = \sum (Y - Y_{pred})^2.$$

The regression line is the one line that gives the smallest value for SSE.
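
The least-squares idea can be sketched in Python. The data pairs below are the stressor and punishment values as read from Tables 6.3 and 6.4 (the pairing is a reconstruction from the printed tables, so treat it as an assumption), and the function names are illustrative. The slope is computed as the covariance sum of squares divided by the sum of squares for X, which is the standard least-squares formula.

```python
# Stressors (X) and frequency of corporal punishment (Y), as reconstructed
# from Tables 6.3 and 6.4 in the chapter.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stressors, punishment))

# Least-squares slope and intercept.
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x


def error_sum_of_squares(a, b):
    """SSE for the line Y_pred = a + b * X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(stressors, punishment))


sse = error_sum_of_squares(intercept, slope)
# Any other line, such as one with a slightly steeper slope, has a larger SSE.
sse_other_line = error_sum_of_squares(intercept, slope + 0.1)

print(f"Y_pred = {intercept:.3f} + {slope:.3f}X")
print(f"Predicted Y at X = 4: {intercept + slope * 4:.3f}")
print(f"SSE (regression line) = {sse:.3f}; SSE (other line) = {sse_other_line:.3f}")
```

The prediction at X = 4 differs slightly from the hand calculation in the text because the text rounds the slope and intercept to three decimals before multiplying.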


The SSE is a measure of the total variability of obtained score values around their predicted values. There are two other sums of squares that are important to understanding correlation and regression.

The total sum of squares (SST) is a measure of the total variability of the obtained score values around the mean of the obtained scores. The SST is represented by

$$SST = \sum (Y - \bar{Y})^2.$$

The remaining sum of squares is called the regression sum of squares (SSR):

$$SSR = \sum (Y_{pred} - \bar{Y})^2.$$

The SSR is a measure of the total variability of the predicted score values around the mean of the obtained scores.

An important and interesting feature of these three sums of squares is that the sum of the SSR and SSE is equal to the SST:

$$SST = SSR + SSE.$$

This leads us to three other important statistics: the proportion of variance explained (PVE), the correlation coefficient, and the standard error of estimate.

Proportion of Variance Explained. The PVE is a measure of how well the regression line predicts obtained scores. The values of PVE range from 0 (no predictive value) to 1 (prediction with perfect accuracy). The equation for PVE is

$$PVE = \frac{SSR}{SST}.$$

There also is a computational equation for the PVE, which is

$$PVE = \frac{(SSXY)^2}{SSX \cdot SSY},$$

where

SSXY is the “covariance” sum of squares: $\sum (X - \bar{X})(Y - \bar{Y})$,
SSX is the sum of squares for variable X: $\sum (X - \bar{X})^2$, and
SSY is the sum of squares for variable Y: $\sum (Y - \bar{Y})^2$.

The procedure for computing these sums of squares is outlined in Table 6.4.
The proportion of variance in the frequency of corporal punishment that may be explained by stressors experienced is

$$PVE = \frac{(61.5)^2}{(48.1)(82.5)} = \frac{3782.25}{3968.25} = 0.953.$$

TABLE 6.4 Computation of r² (PVE)

Y     Y − Ȳ    (Y − Ȳ)²    X    X − X̄    (X − X̄)²    (X − X̄)(Y − Ȳ)
3     −3.3     10.89       0    −4.5     20.25       +14.85
4     −2.3     5.29        1    −3.5     12.25       +8.05
4     −2.3     5.29        2    −2.5     6.25        +5.75
5     −1.3     1.69        3    −1.5     2.25        +1.95
6     −0.3     0.09        4    −0.5     0.25        +0.15
7     +0.7     0.49        5    +0.5     0.25        +0.35
8     +1.7     2.89        6    +1.5     2.25        +2.55
7     +0.7     0.49        7    +2.5     6.25        +1.75
9     +2.7     7.29        8    +3.5     12.25       +9.45
10    +3.7     13.69       9    +4.5     20.25       +16.65

NOTE: Ȳ = 6.3; SSY = 48.1; X̄ = 4.5; SSX = 82.5; SSXY = +61.5.

The PVE sometimes is called the coefficient of determination and is represented by the symbol r².

Correlation Coefficient. A correlation coefficient also is a measure of the strength of relationship between two variables. The correlation coefficient is represented by the letter r and can take on values between −1 and +1 inclusive. The correlation coefficient always has the same sign as the slope. If one squares a correlation coefficient, then one obtains the PVE (r²). The computational formula is

$$r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}.$$

For our example data, the correlation coefficient would be

$$r = \frac{+61.5}{\sqrt{(48.1)(82.5)}} = \frac{+61.5}{\sqrt{3968.25}} = \frac{+61.5}{62.994} = +0.976.$$
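
The relationships among the three sums of squares, the PVE, and r can be verified numerically. The Python sketch below again uses the stressor and punishment pairs as reconstructed from Tables 6.3 and 6.4 (an assumption about the garbled tables), with punishment treated as the predicted variable; all names are illustrative.

```python
import math

# Data pairs as reconstructed from Tables 6.3 and 6.4.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]    # predictor
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    # predicted variable

n = len(stressors)
mean_s = sum(stressors) / n
mean_p = sum(punishment) / n

ss_stressors = sum((s - mean_s) ** 2 for s in stressors)      # 48.1
ss_punishment = sum((p - mean_p) ** 2 for p in punishment)    # 82.5
ss_cross = sum((s - mean_s) * (p - mean_p)
               for s, p in zip(stressors, punishment))        # 61.5

slope = ss_cross / ss_stressors
intercept = mean_p - slope * mean_s
predicted = [intercept + slope * s for s in stressors]

sst = ss_punishment
ssr = sum((y_hat - mean_p) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(punishment, predicted))

pve_ratio = ssr / sst                                          # SSR / SST
pve_computational = ss_cross ** 2 / (ss_stressors * ss_punishment)
r = ss_cross / math.sqrt(ss_stressors * ss_punishment)

print(f"SST = {sst:.3f}, SSR + SSE = {ssr + sse:.3f}")
print(f"PVE = {pve_ratio:.3f} (computational formula: {pve_computational:.3f})")
print(f"r = {r:+.3f}, r squared = {r ** 2:.3f}")
```

Both PVE formulas give the same value, and squaring r returns the PVE, which is why the PVE is written as r².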

Standard Error of Estimate. The standard error of estimate is the [...]

The first step is to compute the variance error ($s_E^2$):

$$s_E^2 = \frac{SSE}{n - 2}.$$

Notice that the value for degrees of freedom is n − 2 rather than n − 1. The reason why we subtract 2 in this instance is that variance error (and standard error of estimate) is a statistic describing characteristics of two variables. They deal with the error involved in the prediction of Y (one variable) from X (the other variable).

The standard error of estimate is the square root of the variance error:

$$s_E = \sqrt{s_E^2}.$$

The standard error of estimate tells us how spread out scores are with respect to their predicted values. If the error scores (E = Y − Y_pred) are normally distributed around the prediction line, then about 68% of actual scores will fall between ±1 s_E of their predicted values.

We can calculate the standard error of estimate using the following computing formula:

$$s_E = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)},$$

where

$s_Y$ is the standard deviation of Y,
r is the correlation coefficient for X and Y, and
n is the sample size.

For the example data, this would be

$$s_E = 2.311\sqrt{(1 - .953)\left(\frac{10 - 1}{10 - 2}\right)} = 2.311\sqrt{(0.047)(1.125)} = 2.311\sqrt{0.053} = (2.311)(0.230) = 0.532.$$

Inferential Statistics: Hypothesis Testing

The Null and Alternative Hypotheses

Classical statistical hypothesis testing is based on the evaluation of two rival hypotheses: the null hypothesis and the alternative hypothesis.

We try to detect relationships by identifying changes that are unlikely to have occurred simply because of random fluctuations.

The null hypothesis is the hypothesis that there is no relationship between two variables. This implies that if the null hypothesis is true, then any apparent relationship in samples is the result of random fluctuations in the dependent measure or sampling error.

Statistical hypothesis tests arc carried out on samples. for example, in nn experi-
ment!// two-gro11p posttcst-only design, there would be a sample whose members
received an intervention and a sample whose members did not. Both of these would be
probability samples from a larger population. The interven tion >ample would reprcse>11

Figure 6.2
The Null Hypothesis
and Type I Error

C14Anu 6 • StAJtmu f

the popula tion of all individuals as if they had received the i.ntervt•ntion. Th e control
sample would be repre·ed the inten-emion.

lf the intervention had no effect, then th e populations would be iden tical. However, it
would be unlikely that two samples from two ident ical popula tions would he ident ical. So,
although the sample mea ns would be diffe rent, they would not rcpre>CtH any effect of t he
independent variable. The apparent difference would be due to sampling error.

Statistical hypothC$is tests invoh·e e’-aluating evidence from .amples to make inler-
ences about populations. II is for this reason that the null hypothe>i> is a statement about
population parameters. For example, o ne null hypothe>iS for I he previous design cou ld be
stated as

or as

H, : ll = ~to = 0.

H, stands for the null hypothC$iS. It is J letter H with J ” ro subscript. It is a statement
t.ha t the m~ans of the experime ntal ( Mean I) and cont rol ( Mean 2) popultnio’ls arc eq ual.

To <:>tablish that a relat ionship exists between th e in tervention (independent Vilfi:tble)
and the outcome (measure o f the dependent variable), we must collect evidenOthesis.

Strictly speaking, we do not mak~ J decision as to whether the nul] hypoth eoi:. is
correct. \Ve evaluate the evidence to determine the ext<·nL to which it •cncls to confirn"' or disconfi rm the null hypothesis. If the evide nce wct·e suc.h that it is unlikely that an observed relationship would have ocwrrcd as the re.ult of sampling e r ror, then we would reject the null hypothesis. If the eviden«: were more ambiguous, then we would f.1il to reject the null hypothesis. The terms re;err and fail to rrjm carry the implicit under

When we reject the null hypothesis and it is true, we have committed a Type I error. By
setting certain statistical criteria beforehand, we can establish the probability that we will
commit a Type I error. We decide what proportion of the time we are willing to commit a
Type I error. This proportion (probability) is called alpha (α). If we are willing to reject
the null hypothesis when it is true only 1 in 20 times, then we set our α level at .05. If only 1
in 100 times, then we set it at .01.

The probability that we will fail to reject the null hypothesis when it is true (correct
decision) is 1 – α (Figure 6.2).

Figure 6.2 The Null Hypothesis and α Level

Situation: NULL HYPOTHESIS TRUE

Decision             Result
Reject H0            Type I Error. α = the probability of rejecting the Null Hypothesis
                     when it is true.
Fail to Reject H0    Correct Decision. 1 – α = the probability of not rejecting the Null
                     Hypothesis when it is true.

The following hypothesis would be evaluated by comparing the difference between
sample means:

    H0: μ1 – μ2 = 0.

If we carried out multiple samples from populations with identical means (the null
hypothesis was true), then we would find that most of the values for the differences
between the sample means would not be 0. Figure 6.3 represents a distribution of the
differences between sample means drawn from identical populations.

The mean difference for the total distribution of sample means is 0, and the standard
deviation is 5. If the differences are normally distributed, then approximately 68% of
these differences will be between –5 (z = –1) and +5 (z = +1). Fully 95% of the differences
in the distribution will fall between the range of –9.8 (z = –1.96) and +9.8 (z = +1.96). If
we drew a random sample from each population, it would not be unusual to find a difference
between sample means of as much as 9.8, even though the population means were
the same.

On the other hand, we would expect to find a difference more than 9.8 about 1 in 20
times. If we set our criterion for rejecting the null hypothesis such that a mean difference
must be greater than +9.8 or less than –9.8, then we would commit a Type I error only 1
in 20 times (.05) on average. Our α level (the probability of committing a Type I error)
would be set at .05.
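
The 68% range and the ±9.8 cutoff described above can be checked with a minimal Python sketch. It assumes only what the example states (differences normally distributed with mean 0 and standard deviation 5); the variable names are illustrative and not part of the original text.

```python
from statistics import NormalDist

# Distribution of differences between sample means when the null hypothesis
# is true (mean 0, standard deviation 5, as in the example).
null_dist = NormalDist(mu=0, sigma=5)

# Share of differences expected to fall within one standard deviation of 0.
within_one_sd = null_dist.cdf(5) - null_dist.cdf(-5)

# Cutoff that leaves alpha / 2 in each tail for a nondirectional test.
alpha = 0.05
critical_difference = null_dist.inv_cdf(1 - alpha / 2)

print(f"P(-5 <= difference <= +5) = {within_one_sd:.4f}")
print(f"Reject H0 if |difference| > {critical_difference:.1f}")
```

Changing alpha to .01 in this sketch widens the cutoff, which is the trade-off behind choosing a stricter criterion.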

The probability that a relationship or a difference of a certain size would be seen in a
sample if the null hypothesis were true is represented by p. To reject the null hypothesis,
p must be less than or equal to α.

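[Figure 6.3: Distribution of the differences between sample means (X̄1 – X̄2) drawn from
identical populations. The horizontal axis runs from –20 to +20 in raw differences
(z = –4 to z = +4). The central region is labeled 1 – α = .95, and the tails are labeled α = .05.]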

Rejecting the H0: We believe that it is likely that the relationship in the sample is
generalizable to the population.

Not rejecting the H0: We do not believe that we have sufficient evidence to draw
inferences about the population.

For the previous example, let us imagine that we have set α = .05. Also, imagine that we
obtained a difference between the sample means of 10. The probability that we would
obtain a difference of +10 or –10 would be equivalent to the probability of a z score
greater than +2.0 plus the probability of a z score less than –2.0 or .0228 + .0228 = .0456.
This is our p value; p = .0456. Because p is less than α (.0456 < .05), we would reject the
null hypothesis.
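
As a rough check on this p value, the two tail areas can be computed directly. This is a sketch under the example's assumptions (standard deviation of 5 for the differences, obtained difference of 10). The unrounded result is about .0455; the text's .0456 comes from adding two tail areas that were each already rounded to .0228.

```python
from statistics import NormalDist

sd_of_differences = 5      # standard deviation of the differences (from the example)
observed_difference = 10   # obtained difference between the sample means
alpha = 0.05

z = observed_difference / sd_of_differences   # z score of the obtained difference
upper_tail = 1 - NormalDist().cdf(z)          # probability of a z score above +2.0
p_value = 2 * upper_tail                      # add the matching lower tail
reject_null = p_value <= alpha

print(f"z = {z:.1f}, p = {p_value:.4f}, reject H0: {reject_null}")
```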

Some texts create the impression that the alternative (or research or experimental)
hypothesis is simply the opposite of the null hypothesis. In fact, sometimes this naive
alternative hypothesis is used. However, it generally is not particularly useful to
researchers. Usually, we are interested in detecting an intervention effect of a particular
size. On certain measures, we would be interested in small effects (e.g., death rate),
whereas on others, only larger effects would be of interest.

When we are interested in an effect of a particular size, we use a specific alternative
hypothesis that takes the following form:

    H1: μ1 – μ2 ≥ |d|,

where d is a difference of a particular size. If the test is a nondirectional test, then the
difference in the alternative hypothesis would be expressed as an absolute value, |d|, to
show that either a positive or negative difference is involved.

It is customary to express the mean difference in an H1 in units of standard deviation.
Such scores are called z scores. The difference is called an effect size. Effect sizes frequently
are used in meta-analyses of outcome studies to compare the relative efficacy of different
types of interventions across studies.

Cohen (1988) groups effect sizes into small, medium, and large categories. The criteria
for each are as follows:

Small effect size (d = .2): It is approximately the effect size for the average difference in
height (i.e., 0.5 inches and s = 2.1) between 15- and 16-year-old girls.

Medium effect size (d = .5): It is approximately the effect size for the average difference
in height (i.e., 1.0 inches and s = 2.0) between 14- and 18-year-old girls.

Large eff

Intuitively, it would seem that we would want to detect even very small effect sizes in
our research. However, there is a practical trade-off involved. All other things being equal,
the consistent detection of small effect sizes requires very large (n > 200) sample sizes.

Because very large sample sizes require resources that might not be readily available,
they might not be practical for all studies. Furthermore, there are certain outcome
variables for which we would not be particularly interested in small effects.

If we reject the null hypothesis, then we implicitly have decided that the evidence
supports the alternative hypothesis. If the alternative hypothe

Figure 6.4 The Null Hypoth

Situation: ALTERNATIVE HYPOTHESIS TRUE

Decision             Result
Reject H0            Correct Decision. 1 – β = the probability of rejecting the Null
                     Hypothesis when the Alternative Hypothesis is true. The power of a test.
Fail to Reject H0    Type II Error. β = the probability of not rejecting the Null
                     Hypothesis when the Alternative Hypothesis is true.

Beta (β) is the probability of committing a Type II error. This probability is established
when we set our criterion for rejecting the null hypothesis. The probability of a correct
decision (1 – β) is an important probability. It is so important that it has a name: power.
Power refers to the probability that we will detect an effect of the size we have selected.

We should decide on the power (1 – β) as well as the α level before we carry out a
statistical test. Just as with Type I error, we should decide beforehand how often we are
willing to make a Type II error (fail to detect a certain effect size). This is our β level. The
procedure for making such determinations is discussed in Cohen (1988).
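
To make the link between sample size, effect size, and power concrete, here is a hedged Python sketch. It is not Cohen's (1988) procedure. It assumes a nondirectional comparison of two equal-sized groups and uses a normal (z) approximation, and the function name approximate_power is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for a nondirectional two-group test of means.

    Assumption (not from the text): a normal approximation with equal group
    sizes, where d is the effect size in standard deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z score when the effect is real
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Power to detect a small effect (d = .2) grows slowly with sample size.
for n in (25, 100, 400):
    print(n, round(approximate_power(0.2, n), 2))
```

Under these assumptions, a small effect needs several hundred cases per group before power approaches .80, which is consistent with the trade-off described earlier for small effect sizes.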

Assumptions for Statistical Hypothesis Tests

Although assumptions are different for different tests, all tests of the null hypothesis share
two related assumptions: randomness and independence.

The randomness assumption is that sample members must be randomly selected from
the population being evaluated. If the sample is being divided into groups (e.g., treatment
and control), then assignment to groups also must be random. This is referred to as
random selection and random assignment.

The mathematical models that underlie statistical hypothesis testing depend on
random sampling. If the samples are not random, then

The independence assumption is that one member's score

Again, the mathematical models are dependent on the independence of sample scores.
If the scores are not independent, then the probability (p) is, as before, simply a number
that has little to do with the probability of a Type I error.

Parametric and Nonparametric Hypothesis Tests

Traditionally, hypothesis tests are grouped into parametric and nonparametric tests. The
names are misleading given that one class of test has no more or less to do with
population parameters than the other. The difference between the two tests lies in the
mathematical assumptions used to compute the likelihood of a Type I error.

Parametric tests are based on the assumption that the populations from which the
samples are drawn are normally distributed. Nonparametric tests do not have this rigid
assumption. Thus, a nonparametric test can be carried out on a broader range of data
than can a parametric test. Nonparametric tests remain serviceable even in circumstances
where parametric procedures collapse.

When the populations from which we sample are normally distributed, and when all
the other assumptions of the parametric test are met, parametric tests are slightly more
powerful than nonparametric tests. However, when the parametric assumptions are not
met, nonparametric tests are more powerful.

Specific Hypothesis Tests

We now investigate several frequently used hypothesis tests and issues surrounding their
appropriate use. Where appropriate, parametric and nonparametric tests are presented
together for each type of design.

Single-Sample Hypothesis Tests

These are tests in which a single sample is drawn. Comparisons are made between sample
values and population parameters to see whether the sample differs in a statistically
significant way from the parent population. Occasionally, these tests are used to determine

For example, we might wish to gather evidence as to whether a particular population
was normally distributed. We would take a random sample from this population and
compare the

Typically, these tests are not used for experiments. They tend to be used to demonstrate
that certain strata within populations differ from the population as a whole.

Here, we investigate two single-sample tests:

1. Single-sample t test (interval or ratio scale)

2. χ² (chi-square) goodness-of-fit test (nominal scale)

The Single-Sample t Test. This test usually is used to see whether a stratum of a population
is different on average from the population as a whole (e.g., are the mean wages received
by social workers in Lansing different from the mean for all social workers in Michigan?).

The null hypothesis for this test is that the mean wages for a particular stratum
(Lansing social workers) of the population and the population as a whole (Michigan
social workers) will be the same:

    H0: μ0 = μ1,

where μ0 is the mean wage for the population and μ1 is the mean wage for the stratum.
The assumptions of the single-sample t test are as follows:

Randomness: Sample members must be randomly drawn from the population.

Independence: Sample (X) scores must be independent of each other.

Scaling: The dependent measure (X scores) must be interval or ratio.

Normal distribution: The population of X scores must be normally distributed.

These assumptions are listed more or less in order of importance. Violations of the first
three assumptions are essentially "fatal" ones. Even slight violations of the first two
assumptions can introduce major error into the computation of p values.

Violation of the assumption of a normal distribution will introduce some error into
the computation of p values. Unless the population distribution is markedly different
from a normal distribution, the errors will tend to be slight (e.g., a reported p value of .042
actually will be a p value of .057). This is what is meant when someone says that the t test
is a "robust" test.

The t statistic for the single-sample t test is computed by subtracting the null hypothesis
(population) mean from the sample mean and dividing by the standard error of the
mean.

The formula for t_obt (pronounced "t obtained") is

    t_obt = (X̄ – μ0) / s_X̄.

As the absolute value of t_obt gets larger, the more unlikely it is that such a difference
could occur if the null hypothesis is true. At a certain point, the probability (p) of
obtaining a t so large becomes sufficiently small (reaches the α level) that we reject the
null hypothesis.

The critical value of t (the value that t_obt must equal or exceed to reject the null
hypothesis) depends on the degrees of freedom. For a single-sample t test, the degrees of
freedom are df = n – 1, where n is the sample size.

Let us look at how to compute t_obt.
We know from a statewide survey that the average time taken to complete an outpatient
rehabilitation program for a certain injury, X, is 46.6 days. We wish to see whether
clients seen at our clinic are taking longer or shorter than the state average.

We randomly sample 16 files from the past year. We review these cases and determine
the length of program for each of the clients in the sample. The mean number of days to
complete rehabilitation at our clinic is 29.875 days. This is lower than the population
mean of 46.6 days. The question is whether this result is statistically significant. Is it likely
that this sample could have been drawn from a population with a mean of 46.6?

To determine this, we ne

The standard error of the mean is calculated by dividing the standard deviation by the
square root of the sample size or

    s_X̄ = s / √n = 11.888 / √16 = 11.888 / 4 = 2.972.

We take the formula for t_obt and plug in our numbers to obtain

    t_obt = (29.875 – 46.6) / 2.972 = –16.725 / 2.972 = –5.628.

We look up the tabled t value (t_crit) at 15 degrees of freedom. This turns out to be 2.131
for a nondirectional test at α = .05 (see a table of the critical values for the t test,
nondirectional, found in most statistics texts). The absolute value of t_obt = 5.628. This is
greater than t_crit = 2.131, so we reject the null hypothesis. The evidence suggests that
clients in our clinic average fewer days in rehabilitation than is the case in the statewide
population.

The effect size index for a test of means is d and is computed as follows for a
single-sample t test:

    d = (X̄ – μ0) / s.

The effect size for our example would be as follows:

    d = (29.875 – 46.6) / 11.888 = –16.725 / 11.888 = –1.4069,

which would be classified as a large effect.
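
The clinic example can be reproduced with a short Python sketch. All inputs come from the text (sample mean, statewide mean, standard deviation, n, and the tabled critical value). The variable names are illustrative, and the critical value is copied from the text, not computed.

```python
from math import sqrt

sample_mean = 29.875     # mean days for the 16 sampled clinic files
population_mean = 46.6   # statewide mean (the null hypothesis value)
sample_sd = 11.888       # sample standard deviation
n = 16
t_critical = 2.131       # tabled value from the text: df = 15, alpha = .05, nondirectional

standard_error = sample_sd / sqrt(n)
t_obtained = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1
reject_null = abs(t_obtained) >= t_critical

# Effect size d: the mean difference in standard deviation units.
effect_size_d = (sample_mean - population_mean) / sample_sd

print(f"SE = {standard_error:.3f}, t = {t_obtained:.3f}, df = {degrees_of_freedom}")
print(f"reject H0: {reject_null}, d = {effect_size_d:.4f}")
```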

The χ² Goodness-of-Fit Test. The χ² goodness-of-fit test is a single-sample test. It is used in
the evaluation of nominal (categorical) variables. The test involves comparisons between
observed and expected frequencies within strata in a sample. Expected frequencies are
derived from either population values or theoretical values. Observed frequencies are
those derived from the sample.

The null hypothesis for the χ² test is that the population from which the sample has
been drawn will have the same proportion of members in each category as the empirical
or theoretical null hypothesis population:

    H0: P_Ok = P_Ek,

where

P_Ek is the proportion of cases within category k in the null hypothesis population
(expected), and

P_Ok is the proportion of cases within category k in the population from which the test
sample was drawn (observed).

The assumptions for the χ² goodness-of-fit test are as follows:

• Randomness: Sample members must be randomly drawn from the population.
• Independence: Sample scores must be independent of each other. One implication of
  this is that categories must be mutually exclusive (no case may appear in more than
  one category).
• Scaling: The dependent measure (categories) must be nominal.
• Expected frequencies: No expected frequency within a category should be less than 1,
  and no more than 20% of the expected frequencies should be less than 5.

As with all tests of the null hypothesis, the χ² test begins with the assumptions of
randomness and independence. Deriving from these assumptions is the requirement that the
categories in the cross-tabulation must be mutually exclusive and exhaustive.

Mutually exclusive means that an individual may not be in more than one category per
variable. Exhaustive means that all categories of interest are covered.

These assumptions are listed more or less in order of importance. Violations of the first
three assumptions are essentially "fatal" ones. Even slight violations of the first two
assumptions can introduce major errors into the computation of p values.

The χ² goodness-of-fit test is basically a large-sample test. When the expected
frequencies are small (expected frequency less than 1 or more than 20% of expected
frequencies less than 5), the probabilities associated with the χ² test will be inaccurate.

The usual procedure in this case is either to increase expected frequencies by collapsing
adjacent categories (also called cells) or to u

The workers at the Interdenominational Social Services Center in St. Winifred
Township wanted to see whether they were serving people of all faiths (and those of no
faith) equally. They had census figures indicating that religious preferences in the township
were as follows: Christian (64%), Jewish (10%), Muslim (8%), other religion/no preference
(14%), and agnostic/atheist (4%).

The workers randomly sampled 50 clients from those seen during the previous year.
Before they drew the sample, they calculated the expected frequency for each category. To
obtain the expected frequencies for the sample, they converted the percentage for each
preference to a decimal proportion and multiplied it by 50. Thus, the expected frequency
for Christians was 64% of 50 or .64 × 50 = 32, the Jewish category was 10% of 50 or
.10 × 50 = 5, and so on. Table 6.5 depicts the expected frequencies.

TABLE 6.5 Expected Frequencies for Religious Preferences

                     Christian   Jewish   Muslim   Other/No Preference   Agnostic/Atheist
Expected frequency   32          5        4        7                     2

Two (40%) of our expected frequencies (Muslim and agnostic/atheist) are less than 5.
Given that the maximum allowable is 20%, we are violating a test assumption. We can
remedy this by collapsing categories (merging two or more categories into one) or by
increasing the sample size. However, there is no category that we could reasonably
combine with agnostic/atheist. It would not work to combine this category with any of the
other categories because the latter are religious individuals, whereas atheists and agnostics
are not religious.

However, we could increase the sample size. To get a sample in which only one (20%)
of the expected frequencies was less than 5, we would need a sample large enough so that
8% (percentage of the population identifying as Muslim) of it would equal 5:

    0.08 × n = 5

    n = 5 / 0.08 = 62.5 ≈ 63.

So, our sample size would need to be 63, giving us the expected frequencies shown in
Table 6.6. Only one of five (20%) of the expected frequencies is less than 5, and none of
them is less than 1, so the sample size assumption is met. The results of a random sample
of 63 cases were as found in Table 6.7.

TABLE 6.6 New Expected Frequencies for Religious Preferences

                     Christian   Jewish   Muslim   Other/No Preference   Agnostic/Atheist
Expected frequency   40.32       6.30     5.04     8.82                  2.52
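
The expected-frequency check and the sample-size adjustment can be sketched in Python as follows. The census percentages are those given in the text. The helper names expected_frequencies and share_below_five are hypothetical and used only for illustration.

```python
from math import ceil

# Census percentages for St. Winifred Township, as given in the text.
census_percent = {
    "Christian": 64,
    "Jewish": 10,
    "Muslim": 8,
    "Other/No Preference": 14,
    "Agnostic/Atheist": 4,
}


def expected_frequencies(n):
    """Expected count per category for a sample of size n."""
    return {category: pct * n / 100 for category, pct in census_percent.items()}


def share_below_five(expected):
    """Proportion of categories whose expected frequency is less than 5."""
    small = [f for f in expected.values() if f < 5]
    return len(small) / len(expected)


expected_50 = expected_frequencies(50)

# Smallest n that raises the Muslim category (8%) to an expected frequency of 5.
required_n = ceil(5 * 100 / census_percent["Muslim"])
expected_63 = expected_frequencies(required_n)

print(expected_50, share_below_five(expected_50))
print(required_n, expected_63, share_below_five(expected_63))
```

With n = 50, two of five categories fall below 5. With the larger sample only the agnostic/atheist category does, which matches the 20% limit stated in the assumptions.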

TABLE 6.7 Observed and Expected Frequencies for Religious Preferences

                     Christian   Jewish   Muslim   Other/No Preference   Agnostic/Atheist
Expected frequency   40.32       6.30     5.04     8.82                  2.52
Observed frequency   49          2        2        9                     1

The null hypothesis for this example is that the proportion of people living in St.
Winifred Township who identify with each religious category will be the same as the
proportion of people who have received services at the Interdenominational Services Center
in St. Winifred Township who identify with each religious category.

The null hypothesis expresses the expectation that observed and expected frequencies
will not be different. Notice the similarity between the null hypothesis and the numerator
of the χ²_obt test statistic:

    χ²_obt = Σ (f_O – f_E)² / f_E.

The formula tells us to subtract the expe

The χ²_obt is evaluated by comparing it to a critical value

For a χ² goodness of fit, the degrees of freedom are equal to the number of categories
(c) minus 1 or df = c – 1. In our case, we have five categories (Christian, Jewish, Muslim,
other/no preference, and agnostic/atheist), so df = 5 – 1 = 4.

The critical value for χ² at α = .05 and df = 4 is χ²_crit = 9.49. We have calculated χ²_obt as
7.5577 (Table 6.8). Because χ²_obt is less than χ²_crit, we fail to reject the null hypothesis.
The evidence is not sufficient to conclude that people of all faiths (and those of no faith)
are being seen disproportionately to their representations in the township.

TABLE 6.8 Computation of χ²_obt

Observed (f_O)   Expected (f_E)   f_O – f_E   (f_O – f_E)²   (f_O – f_E)² / f_E
49               40.32            +8.68       75.3424        1.8686
2                6.30             –4.30       18.4900        2.9349
2                5.04             –3.04       9.2416         1.8337
9                8.82             +0.18       0.0324         0.0037
1                2.52             –1.52       2.3104         0.9168

NOTE: Σ (f_O – f_E)² / f_E = 1.8686 + 2.9349 + 1.8337 + 0.0037 + 0.9168 = χ²_obt = 7.5577.

Earlier, we discussed the use of the effect size measure d for the t test. It is an
appropriate measure of effect size for a test of means. However, the χ² test does not compare
means. It compares frequencies (or proportions). Therefore, a different effect size index is
used for the χ² test: w. This measure of effect size ranges from 0 to 1. Cohen (1988)
classifies these effect sizes into three categories:

Small effect size: w = .10

Medium effect size: w = .30

Large effect size: w = .50

The effect size coefficient for a χ² goodness-of-fit test is computed according to the
following formula:

    w = √(χ²_obt / N),

where N = the total sample size.
For the St. Winifred Township example,

    w = √(7.5577 / 63) = √(0.1200) = 0.3464,

which would be classified as a medium effect.
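
The χ² computation and the effect size w can be reproduced with a minimal Python sketch. The observed and expected frequencies are those of the St. Winifred Township example. The fifth observed count (1) is inferred from the sample total of 63, and the critical value is copied from the text. Each squared difference is divided by its own expected frequency, so the Christian category contributes 75.3424 / 40.32 ≈ 1.87 and the total is about 7.56.

```python
from math import sqrt

# Order: Christian, Jewish, Muslim, Other/No Preference, Agnostic/Atheist.
observed = [49, 2, 2, 9, 1]                    # last count inferred from N = 63
expected = [40.32, 6.30, 5.04, 8.82, 2.52]
chi_square_critical = 9.49                     # tabled value from the text: df = 4, alpha = .05

# Contribution of each category: (f_O - f_E)^2 / f_E.
contributions = [(f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected)]
chi_square_obtained = sum(contributions)
degrees_of_freedom = len(observed) - 1
reject_null = chi_square_obtained >= chi_square_critical

# Effect size w = sqrt(chi-square / N).
effect_size_w = sqrt(chi_square_obtained / sum(observed))

print([round(c, 4) for c in contributions])
print(f"chi-square = {chi_square_obtained:.4f}, df = {degrees_of_freedom}, "
      f"reject H0: {reject_null}, w = {effect_size_w:.4f}")
```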

Hypothesis Tests for Two Related Samples
These are tests in which either a single sample is drawn and measurements are taken at
two times or two samples are drawn and members of the sample are individually matched
on some attribute. Measurements are taken for each member of the matched groups.

We investigate three examples of two related sample tests in this section:

1. Dependent (matched, paired, correlated) samples t test (interval or ratio scale)

2. Wilcoxon matched pairs, signed ranks test (ordinal scale)

3. McNemar change test (nominal scale)

Difference Scores. The dependent t test and the Wilcoxon matched pairs, signed ranks test
evaluate difference scores. These may be differences between scores f

    X2 – X1 = X_D,

where

X1 is the first of a pair of scores,

X2 is the second of a pair of scores, and

X_D is the difference between the two.

The null hypothesis for all these tests is that the samples came from populations in
which the expected differences are zero.

The Dependent Samples t Test. This also is called the correlated, paired, or matched t test.
The null hypothesis for this test is that the mean of the differences between the paired
scores is 0:

    H0: μ_XD = μ_D0,

where

μ_XD = the mean difference between the populations from which the samples were
drawn, and

μ_D0 = the mean difference between the populations specified by the null hypothesis.

Because the null hypothesis typically sp
The t statistic for the dependent t test is the mean of the sample differences divided by
the standard error of the mean difference or

    t_obt = (X̄_D – μ_D0) / s_X̄D.

As the absolute value of t_obt gets larger, the more unlikely it is that such a difference could
occur if the null hypothesis is true. At a certain point, the probability (p) of obtaining a t so
large becomes sufficiently small (reaches the alpha level) that we reject the null hypothesis.

The assumptions of the dependent t test are as follows:

Randomness: Sample members must be randomly drawn from the population.

Independence: X_D scores must be independent of each other.

Scaling: The dependent measure (X_D scores) must be interval or ratio.

Normal distribution: The population of X_D scores must be normally distributed.

These assumptions are listed more or less in order of importance. Violations of the first
three assumptions are essentially "death penalty" violations. Even slight violations of the
first two assumptions can introduce major error into the computation of p values. Similarly,
difference scores computed from two sets of ordinal data may incorporate major error.

Violation of the assumption of a normal distribution will introduce some error into
the computation of p values. However, unless the population distribution is markedly
different from a normal distribution, the errors will tend to be slight (e.g., a reported p value
of .042 actually will be a p value of .057). This is what is meant when someone says that
the t test is a "robust" test.

Still, even though the error is slight, the nonparametric Wilcoxon matched pairs,
signed ranks test (discussed in the next section) probably will yield a more accurate test
when there are violations of this normal distribution assumption.

Let us look at the proc<"urc nf
dcpn:s~;un.

Ten clients were randomly selected from clients seen for depression problems at a
community center. They were pretested (X1) with the BDI, received the treatment, and then
were posttested (X2) with the same instrument. The mean of the difference scores (X̄_D)
was –1. This means that the average change in BDI scores from pretest to posttest was a
decrease of 1 point. The standard deviation of the difference scores was 1.33.

The next step is the computation of the standard error of the mean. We divide the
standard deviation by the square root of the sample size to get the standard error of the mean:

    s_X̄D = 1.33 / √10 = 1.33 / 3.16 = 0.42.

We plug the values into the formula for t_obt:

    t_obt = X̄_D / s_X̄D = –1 / 0.42 = –2.38.

For α = .05 and df = n – 1 = 10 – 1 = 9, t_crit = 2.262 (see a table of critical values for the
t test, nondirectional, found in most statistics texts). Because |t_obt| = 2.38 is greater than or
equal to the critical value, we reject the null hypothesis at α = .05.

The effect size index for this test is d and is computed as follows:

    d = (X̄_D – μ_D0) / s_D.

For the depression intervention example,

    d = (–1 – 0) / 1.33 = –1 / 1.33 = –0.752,

which would be classified as a medium effect.
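
The dependent t test for the depression example can be sketched in Python from the summary statistics reported in the text (mean difference of –1, standard deviation of 1.33, n = 10). The critical value is the tabled figure quoted in the text, and the variable names are illustrative. The sketch does not round the standard error before dividing, so t differs from the hand calculation only past the second decimal place.

```python
from math import sqrt

mean_difference = -1.0       # mean of the difference scores (posttest - pretest)
sd_difference = 1.33         # standard deviation of the difference scores
n = 10
null_mean_difference = 0.0   # difference specified by the null hypothesis
t_critical = 2.262           # tabled value from the text: df = 9, alpha = .05, nondirectional

standard_error = sd_difference / sqrt(n)
t_obtained = (mean_difference - null_mean_difference) / standard_error
degrees_of_freedom = n - 1
reject_null = abs(t_obtained) >= t_critical

# Effect size d: mean difference in standard deviation units.
effect_size_d = (mean_difference - null_mean_difference) / sd_difference

print(f"SE = {standard_error:.2f}, t = {t_obtained:.2f}, df = {degrees_of_freedom}")
print(f"reject H0: {reject_null}, d = {effect_size_d:.3f}")
```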

Wilcoxon Matched Pairs, Signed Ranks Test. The Wilcoxon matched pairs, signed ranks test
is a nonparametric test for the evaluation of difference scores. The test involves ranking
difference scores as to how far they are from 0. The difference score closest to 0 receives
the rank of 1, the next score receives the rank of 2, and so on. The ranks for difference
scores below 0 are given a negative sign, whereas those above 0 are given a positive sign.
The null hypothesis is that the sample comes from a population of difference scores in
which the expected difference score is 0.

The assumptions for the Wilcoxon matched pairs, signed ranks test are as follows:

• Randomness: Sample members must be randomly drawn from the population.
• Independence: X_D scores must be independent of each other.
• Scaling: The dependent measure (X_D scores) must be ordinal (interval or ratio
  differences must be converted to ranks).

Let us look at the procedure for computing the Wilcoxon matched pairs, signed ranks
test statistic. We use the same example as for the t test. The dependent measure is the BDI,
a measure of depression. Scores on the BDI are not normally distributed, tending to be
positively skewed.

Ten clients were randomly selected from clients seen for depression problems at a
community center. They were pretested with the BDI, received the treatment, and then were
posttested with the same instrument. We compute the difference scores (post – pre) for
each individual. We assign a rank to each difference score based on its closeness to 0.
Difference scores of 0 do not receive a rank. Tied ranks receive the average rank for the tie.

So, if we look at Table 6.9, we see that there is one difference score of 0 that goes
unranked. There are five difference scores of either –1 or +1. These cover the first five
ranks (1, 2, 3, 4, 5), giving an average rank of 3. There are three difference scores of –2
(and none of +2). These cover the next three ranks (6, 7, 8), giving an average rank of 7.
The final score is –3, which is given the rank of 9.

TABLE 6.9 Computation of the Wilcoxon T_obt

                                                       Signed Ranks
ID Number   Pretest   Posttest   Difference   Rank   Positive   Negative
1           17        16         –1           3                 3
2           19        18         –1           3                 3
3           18        15         –3           9                 9
4           18        17         –1           3                 3
5           16        16         0
6           16        17         +1           3      3
7           18        16         –2           7                 7
8           21        19         –2           7                 7
9           18        19         +1           3      3
10          18        16         –2           7                 7

NOTE: Sum of ranks for less frequent sign = 6.

T he M(Ore wa, PO>i
tivc or ncg.uivc.

We then determine which sign (positive or negative) appeared less frequently. Because the
positive sign appeared only twice (compared to seven times for the negative sign), we add
up the ranks in the positive column and obtain 6. This is the test statistic value for the
Wilcoxon matched pairs, signed ranks test.

The test statistic is called T_obt. This is an uppercase T and is not the same as the statistic
used with the (lowercase) t distribution.

There are two other aspects to the Wilcoxon T_obt that should be addressed:

1. The Wilcoxon T_obt is evaluated according to the number of nonzero difference scores.
   So, we should subtract 1 from the original n for each difference score of 0.
2. Unlike most other test statistics, the Wilcoxon T_obt must be less than or equal to the
   critical value to reject the null hypothesis.

We consult a table of critical values for the Wilcoxon T (see table of critical values for
Wilcoxon T in any general statistics book) and see whether the result (T_obt = 6) was
significant at α = .05. Because there was one difference score equal to 0, the corrected n = 9.
The critical value for the Wilcoxon T at n = 9 and α = .05 is T_crit = 5. T_obt = 6 is not less
than or equal to the critical value, so we fail to reject the null hypothesis at α = .05.
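
The ranking procedure can be sketched in plain Python using the pretest and posttest scores from Table 6.9. The helper average_rank is hypothetical and handles ties as described in the text by averaging the ranks a tied value covers. The critical value of 5 is the tabled figure quoted in the text.

```python
pretest = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]
posttest = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]

differences = [post - pre for pre, post in zip(pretest, posttest)]
nonzero = [d for d in differences if d != 0]   # differences of 0 go unranked

# Absolute differences in ascending order; position in this list gives the rank.
ordered = sorted(abs(d) for d in nonzero)


def average_rank(value):
    """Average of the ranks covered by all scores tied at this absolute value."""
    first = ordered.index(value) + 1           # first rank held by this value
    last = first + ordered.count(value) - 1    # last rank held by this value
    return (first + last) / 2


positive_ranks = [average_rank(abs(d)) for d in nonzero if d > 0]
negative_ranks = [average_rank(abs(d)) for d in nonzero if d < 0]

# T_obt is the sum of ranks for the less frequent sign.
if len(positive_ranks) <= len(negative_ranks):
    t_obtained = sum(positive_ranks)
else:
    t_obtained = sum(negative_ranks)

corrected_n = len(nonzero)
t_critical = 5                                 # tabled value from the text: n = 9, alpha = .05
reject_null = t_obtained <= t_critical         # reject only if T_obt <= T_crit

print(f"T = {t_obtained}, corrected n = {corrected_n}, reject H0: {reject_null}")
```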

There is no well-accepted post hoc measure of effect size for ordinal tests of related
scores. One possible measure would be proportion of nonoverlapping scores as a measure
of effect. Cohen (1988) briefly discusses this measure, called U.

The procedure begins with computing the minimum and maximum scores for each of
the two related groups. We choose the least maximum and the greatest minimum. This
establishes the end points for the overlap range.

We count the number of scores in both groups within this range (including the end
points) and divide by the total number of scores. This gives a proportion of overlapping
scores. Subtract this number from 1, and we obtain the proportion of nonoverlapping
scores. This index ranges from 0 to 1. Lower proportions are indicative of smaller effects,
and higher ones are indicative of larger effects.
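
The overlap procedure can be illustrated with a short Python sketch. It uses the pretest and posttest scores from Table 6.9 as sample data, and the variable names are illustrative only.

```python
pretest = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]
posttest = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]

# End points of the overlap range: greatest minimum and least maximum.
overlap_low = max(min(pretest), min(posttest))
overlap_high = min(max(pretest), max(posttest))

# Count scores from both groups inside the range, end points included.
all_scores = pretest + posttest
overlapping = sum(overlap_low <= score <= overlap_high for score in all_scores)

proportion_overlapping = overlapping / len(all_scores)
u = 1 - proportion_overlapping   # proportion of nonoverlapping scores

print(f"overlap range = {overlap_low} to {overlap_high}")
print(f"{overlapping} of {len(all_scores)} scores overlap, U = {u:.2f}")
```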

Cohen (1988) calculates equivalents between U and d, which would imply the following
definitions of strength of effect:

Small effect size     d = .2    U = .15
Medium effect size    d = .5    U = .33
Large effect size     d = .8    U = .47

For the example data, the minimum score for the pretest was 16, and the maximum
score was 21. The posttest minimum and maximum scores were 15 and 19, respectively.
The greatest minimum is 16, and the least maximum is 19.

Of 20 total scores, 18 fall within this overlap range. The pro

.WcNmmr Change Test. The Mc:-icmar change test is used for pre- and post intervention
designs “‘here the variables in the anai)’Sis arc dichotomously scored (e.g., improved ~.
not impro,•ed, same,.,_ different, increase ‘s. decrease).

The layout for the McNemar change test is shown in Figure 6.5. Cell A contains the
number of individuals who changed from + to −. Cell B contains the number of individuals
who received + on both measurements. Cell C contains the number of individuals
who received − on both measurements. Cell D contains the number of individuals who
changed from − to +. The null hypothesis is expressed as

$$H_0: P_A - P_D = 0,$$

where

$P_A$ is the proportion of cases shifting from + to − (decreasing) in the null hypothesis
population, and

$P_D$ is the proportion of cases shifting from − to + (increasing) in the null hypothesis
population.

The assumptions for the McNemar change test are similar to those for the χ² test:

Randomness: Sample members must be randomly drawn from the population.

Independence: Within-group sample scores must be independent of each other (although
between-group scores [pre- and posttest scores] will necessarily be dependent).

Scaling: The dependent measure (categories) must be nominal.

Expected frequencies: No expected frequency within a category should be less than 5.

A special case of $\chi^2_{obt}$ is the test statistic for the McNemar change test:

$$\chi^2_{obt} = \frac{(|f_A - f_D| - 1)^2}{f_A + f_D},$$

where

$f_A$ = the frequency in Cell A, and
$f_D$ = the frequency in Cell D.

This is a test statistic with df = 1. For df = 1, we need to include something called the
Yates correction for continuity in the equation. This is −1, which appears in the numerator
of the test statistic.

Figure 6.5  McNemar Change Test Layout

                  After
                −       +
Before    +     A       B
          −     C       D


Let us imagine that we are interested in marijuana use among high school students. We
also are interested in change in marijuana use over time. Imagine that we collected survey
data on a random sample of ninth-graders in 2007. In 2009, we surveyed the same sample
that had been in ninth grade in 2007. We found that 32 of 65 students said that they used
marijuana during the previous year, as compared to 23 of 65 in 2007. The results are
summarized in Table 6.10.

TABLE 6.10 Observed and Expected Frequencies for the McNemar Change Test

                            2009
2007           None            Marijuana       Total
Marijuana      2 (Cell A)      21 (Cell B)     23
None           31 (Cell C)     11 (Cell D)     42
Total          33              32              65

Cell A represents those students who had used marijuana in 2007 but who had not used
it in 2009. Cell B shows the number of students who had used marijuana in both 2007 and
2009. Cell C shows the number of students who did not use marijuana either in 2007 or in
2009. Cell D shows the number of students who did not use marijuana in 2007 but who did
use it in 2009.

So, the sum of Cells A and D is the total number of students whose patterns of marijuana
use changed. The null hypothesis for the McNemar change test is that changing from
nonuse to use would be just as likely as changing from use to nonuse.

In other words, of the 13 individuals who changed their pattern of marijuana use, we
would expect half (6.5) to go from not using to using and the other half (6.5) to go from
using to not using if the null hypothesis were true.

The calculation of the McNemar change test statistic is shown in Table 6.11.
For df = 1 and α = .05, $\chi^2_{crit}$ = 3.84 (see a table of critical values in most
statistics texts). Because $\chi^2_{obt}$ = 4.92, we would reject the null hypothesis at α = .05. We would
conclude that there was in fact an increase in marijuana use between 2007 and 2009.

TABLE 6.11 Computation of the McNemar Change Test Statistic

$f_A$    $f_D$    $|f_A - f_D| - 1$    $(|f_A - f_D| - 1)^2$    $(|f_A - f_D| - 1)^2 / (f_A + f_D)$
2        11       8                    64                       4.9230769

NOTE: $\chi^2_{obt}$ = 4.923.


The effect size coefficient for a McNemar change test is w and is computed according
to the following formula:

$$w = \sqrt{\chi^2_{obt} / N}.$$

For the high school survey,

$$w = \sqrt{4.923/65} = \sqrt{0.0757} = 0.2752,$$

which would be classified as a medium effect.
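As a quick check on the arithmetic, the McNemar statistic and the w effect size can be computed from the two change cells. This is a minimal sketch; the function names `mcnemar_chi2` and `effect_size_w` are illustrative and not part of the chapter.

```python
import math


def mcnemar_chi2(f_a, f_d):
    """McNemar chi-square using the continuity correction of 1 described in the text."""
    return (abs(f_a - f_d) - 1) ** 2 / (f_a + f_d)


def effect_size_w(chi2, n):
    """Effect size w: square root of chi-square divided by the total sample size."""
    return math.sqrt(chi2 / n)


# Cell A = 2 (use to nonuse), Cell D = 11 (nonuse to use), N = 65 students.
chi2_obt = mcnemar_chi2(2, 11)
w = effect_size_w(chi2_obt, 65)
print(round(chi2_obt, 3), round(w, 4))  # 4.923 0.2752
```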

Hypothesis Tests for Two Independent Samples

These are tests in which a sample is randomly drawn and individuals from the sample are
randomly assigned to one of two experimental conditions.

We investigate three examples of two independent samples tests:

1. Independent samples (group) t test (interval or ratio scale)

2. Wilcoxon/Mann-Whitney (W/M-W) test (ordinal scale)

3. χ² test of independence (2 × k) (nominal scale)

Independent Samples t Test. This sometimes is called the group t test. It is a test of means
whose null hypothesis is formally stated as follows:

$$H_0: \mu_1 - \mu_2 = 0.$$

Following are the assumptions of the independent t test:

Randomness: Sample members must be randomly drawn from the population and randomly
assigned to one of the two groups.

Independence: Scores must be independent of each other.

Scaling: The dependent measure must be interval or ratio.

Normal distribution: The populations from which the individuals in the samples were
drawn must be normally distributed.

Homogeneity of variances ($\sigma_1^2 = \sigma_2^2$): The samples must be drawn from populations
whose variances are equal.

Equality of sample sizes ($n_1 = n_2$): The samples must be of the same size.

As before, these assumptions are listed more or less in order of importance. The first
three assumptions are the "fatal" assumptions.

Violation of the normality assumption will make for less accurate p values. However,
unless the population distribution is markedly different from a normal distribution, the
errors will tend to be slight. Still, even though the error is slight, the nonparametric
W/M-W test probably will be more accurate when the normality assumption is violated.

The independent groups t test also is fairly robust with respect to violation of the
homogeneity of variances assumption and the equal sample size assumption. A problem
may arise when both of these assumptions are violated at the same time.

If the smaller variance is in the smaller sample, then the probability of a Type II error (not
detecting an existing difference) increases. If the larger variance is in the smaller sample,
then the probability of a Type I error (rejecting the null hypothesis when it is true) increases.

If there is no association between sample size and variance, then violation of each of
these assumptions is not particularly problematic. There may be fairly substantial
discrepancies between sample sizes without much effect on the accuracy of our p estimates.
Similarly, if every other assumption is met, then a slight difference in variances will not
have a large effect on probability estimates.

The t statistic for the independent t test is

$$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}.$$

Because two sample means are computed, 2 degrees of freedom are lost:

$$df = n_1 + n_2 - 2,$$

where

$n_1$ = number of scores for the first group, and

$n_2$ = number of scores for the second group.

Following is an example of the use of the independent t test statistic. We wish to see
whether there is a difference in level of social activity in children depending on whether
they are in after-school care or home care. Because more children attend […] sample of
16 children in after-school care (Group 1) and 14 children in home care (Group 2) was
drawn. The dependent measure was a score on a social activity scale in which lower scores
represent less social activity and higher scores represent more social activity.

We evaluate this with an independent t test. The first step in calculating $t_{obt}$ is to
compute the sample mean for each group. The next step is to compute the standard error of
the mean. However, the procedure for doing this is a little different from that used before.
As you might recall, the standard error of the mean is the standard deviation divided by
the square root of the sample size:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{s^2 \left(\frac{1}{n}\right)}.$$

This also is equivalent to the square root of the variance times the inverse of the sample
size (1/n).

Unfortunately, we cannot use this formula for the standard error of the mean. It is the
standard error for a single sample. Because we have two samples in an independent groups
test, the formula has to be altered a bit.

The first difference is in the formula for the variance. The variance is the sum of
squares divided by the degrees of freedom. It is the same here except that we have two
sums of squares (one for Group 1 and one for Group 2), and our degrees of freedom are
$n_1 + n_2 - 2$. This gives us the following equation:

$$s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2},$$


where

$s_p^2$ is the pooled estimate of the variance based on two groups,

$SS_1$ is the sum of squares for Group 1,

$SS_2$ is the sum of squares for Group 2,

$n_1$ is the number of scores in Group 1, and

$n_2$ is the number of scores in Group 2.

Because there are two groups, we do not multiply $s_p^2$ times (1/n); rather, we multiply it
by $(1/n_1 + 1/n_2)$. We take the square root of this and obtain the pooled standard error of
the mean:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

The means and sums of squares for our example are presented in Table 6.12. Now, let
us try computing $t_{obt}$.

TABLE 6.12 Group Statistics

Group                Mean     Sum of Squares    n
After-school care    27.88    4330.40           16
Home care            21.36    1707.16           14

First, we compute the pooled standard error of the mean (also called the standard
error of the mean difference). We begin by calculating the pooled variance:

$$s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{4330.40 + 1707.16}{16 + 14 - 2} = \frac{6037.56}{28} = 215.63.$$

From the estimate for the pooled variance, we compute the pooled standard error of the mean:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{215.63 \left(\frac{1}{16} + \frac{1}{14}\right)} = \sqrt{28.88} = 5.37.$$

We calculate $t_{obt}$:

$$t_{obt} = \frac{27.88 - 21.36}{5.37} = \frac{6.52}{5.37} = 1.213.$$

For α = .05 and df = $n_1 + n_2 - 2$ = 16 + 14 − 2 = 28, $t_{crit}$ = 2.048. Because $t_{obt}$ = 1.213 is
less than the critical value, we fail to reject the null hypothesis at α = .05.
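The same computation can be reproduced from the summary statistics alone. The sketch below assumes only the group means, sums of squares, and sample sizes given for the example; the function name `independent_t` is illustrative, and the critical value is copied from the text rather than computed.

```python
import math


def independent_t(mean1, mean2, ss1, ss2, n1, n2):
    """Return (t, pooled standard error, pooled variance) from summary statistics."""
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, se_diff, pooled_var


t_obt, se_diff, pooled_var = independent_t(27.88, 21.36, 4330.40, 1707.16, 16, 14)

T_CRIT = 2.048  # two-tailed critical t for df = 28, alpha = .05, as given in the text
reject_null = abs(t_obt) > T_CRIT

print(round(pooled_var, 2), round(se_diff, 2), round(t_obt, 3), reject_null)
```

Carrying full precision through each step gives t of about 1.213, which matches the value reported in the text.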


There are two post hoc effect size measures for an independent t test. The first of these
(d) already has been discussed:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}.$$

Note that the numerator is the difference between the two sample means and that the
denominator is the pooled estimate of the standard deviation. The pooled standard deviation
is the square root of the pooled variance that we calculated earlier:

$$s_p = \sqrt{s_p^2} = \sqrt{215.63} = 14.68.$$

The effect size for the example would be

$$d = \frac{27.88 - 21.36}{14.68} = \frac{6.52}{14.68} = 0.44,$$

which would be classified as a small to medium effect size.

The other measure is η² (eta-square). η² is the proportion of variance explained (PVE).
This is equivalent to the squared point-biserial correlation coefficient and is computed by

$$\eta^2 = \frac{t_{obt}^2}{t_{obt}^2 + df}.$$

We were comparing social activity in children in after-school care versus those in home
care. Children in after-school care scored higher on social activity than did children in
home care. The difference was not statistically significant for our chosen α = .05.

$t_{obt}$ was 1.213 with df = 28. Putting these numbers in the formula, we obtain the
following:

$$\eta^2 = \frac{(1.213)^2}{(1.213)^2 + 28} = \frac{1.471}{29.471} = 0.0499.$$

So, a little less than 5% of the variability in social activity among the children was
potentially explained by whether they were in after-school care or home care.
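Both effect size measures follow directly from quantities already computed for the example. The short sketch below uses the rounded values from the text (pooled variance 215.63, t of 1.213, df of 28); the function names are illustrative only.

```python
import math


def cohens_d(mean1, mean2, pooled_var):
    """Difference between sample means divided by the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(pooled_var)


def eta_squared_from_t(t, df):
    """Proportion of variance explained, computed from t and its degrees of freedom."""
    return t ** 2 / (t ** 2 + df)


d = cohens_d(27.88, 21.36, 215.63)
eta2 = eta_squared_from_t(1.213, 28)
print(round(d, 2), round(eta2, 4))  # 0.44 0.0499
```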

Wilcoxon/Mann-Whitney Test. Statistics texts used to refer to this test as the Mann-Whitney
test. Recently, the name of Wilcoxon has been added to it. The reason that
Wilcoxon's name has been added is that he developed the test first and published it first
(Wilcoxon, 1945). Unfortunately, more folks noticed the article published by Mann and
Whitney (1947) 2 years later.

The W/M-W test is a nonparametric test that involves initially treating both samples as
one group and ranking scores from least to most. After this is done, the frequencies of low
and high ranks between groups are compared.

The assumptions of the W/M-W test are as follows:

Randomness: Sample members must be randomly drawn from the population of interest
and randomly assigned to one of the two groups.


Independence: Scores must be independent of each other.

Scaling: The dependent measure must be ordinal (interval or ratio scores must be converted
to ranks).

When the assumptions of the t test are met, the t test will be slightly more powerful
than the W/M-W test. However, if the distribution of population scores is even slightly
different from normal, then the W/M-W test may be the more powerful test.

Let us look at the procedure for computing the W/M-W test statistic. We use the same
example as we did for the independent t test. We evaluated level of social activity in
children in after-school care and in home care. The dependent measure was a score on a
social activity scale in which lower scores represent less social activity and higher scores
represent more social activity.

The first step in carrying out the W/M-W test is to assign ranks to the scores without
respect to which group individuals were in. The rank of 1 goes to the highest score, the
rank of 2 to the next highest score, and so on. Tied ranks receive the average rank. We then
sum the ranks within each group. The summed ranks are called $W_1$ for Group 1 and $W_2$
for Group 2 and are found in Table 6.13.

TABLE 6.13 Summed Ranks for the Wilcoxon/Mann-Whitney Test

                After-School Care    Home Care
n               $n_1$ = 16           $n_2$ = 14
Summed ranks    $W_1$ = 218          $W_2$ = 247
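The ranking step described above (rank 1 to the highest score, tied scores sharing the average rank, ranks summed within each group) might be sketched as follows. The raw scores behind Table 6.13 are not given in this section, so the two small groups below are hypothetical, as are the function names.

```python
def average_ranks_descending(scores):
    """Rank 1 goes to the highest score; tied scores receive the average of their ranks."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in scores]


def rank_sums(group1, group2):
    """Rank both groups together, then sum the ranks within each group."""
    ranks = average_ranks_descending(list(group1) + list(group2))
    w1 = sum(ranks[:len(group1)])
    w2 = sum(ranks[len(group1):])
    return w1, w2


# Hypothetical scores with one tie (the two 8s share rank 2.5).
w1, w2 = rank_sums([10, 8, 8], [7, 5])
print(w1, w2)  # 6.0 9.0
```

A useful sanity check is that the two rank sums always add up to N(N + 1)/2, here 5(6)/2 = 15.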

The test statistic for the W/M-W test is $U_{obt}$. We begin by calculating U statistics for
each group according to the following equations:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - W_2$$

For our example,

$$U_1 = (16)(14) + \frac{(16)(16 + 1)}{2} - 218 = 224 + \frac{272}{2} - 218 = 224 + 136 - 218 = 142$$

$$U_2 = (16)(14) + \frac{(14)(14 + 1)}{2} - 247 = 224 + \frac{210}{2} - 247 = 224 + 105 - 247 = 82.$$

As a check, $U_1 + U_2 = 142 + 82 = 224 = n_1 n_2$.

We choose the smaller U as $U_{obt}$. In this instance, $U_{obt} = U_2 = 82$.

$U_{obt}$ must be less than or equal to the critical value to reject the null hypothesis.
The critical value for the W/M-W U at $n_1$ = 16, $n_2$ = 14, and α = .05 is $U_{crit}$ = 64.
$U_{obt}$ = 82 is not less than or equal to the critical value, so we fail to reject the null
hypothesis at α = .05.
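The U computation from the summed ranks can be sketched as below. The sketch applies the formulas with the n(n + 1)/2 term and uses the identity that the two U values must add up to the product of the sample sizes as a built-in check. The function name is illustrative, and the critical value of 64 is taken from the text rather than computed.

```python
def mann_whitney_u(n1, n2, w1, w2):
    """U statistics for each group from sample sizes and summed ranks."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    # The two U values must sum to n1 * n2 when the rank sums are valid.
    assert u1 + u2 == n1 * n2, "rank sums are inconsistent with the sample sizes"
    return u1, u2


u1, u2 = mann_whitney_u(16, 14, 218, 247)
u_obt = min(u1, u2)

U_CRIT = 64  # critical U for n1 = 16, n2 = 14, alpha = .05, as given in the text
reject_null = u_obt <= U_CRIT

print(u1, u2, u_obt, reject_null)  # 142.0 82.0 82.0 False
```

For the rank sums in Table 6.13 this evaluates to 142 and 82, so the smaller U is 82, which exceeds the critical value and leaves the null hypothesis standing.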

As before, there is no well-established effect size measure for the W/M-W test. The U
measure of nonoverlap probably would be the best bet.

For our example data, the minimum and maximum for the after-school care group
were 2 and 55, whereas they were 7 and 40 for the home care group. The greatest minimum
is 7, and the least maximum is 40. All 14 scores in the home care group are within
the overlap range, and 12 of 16 scores in the after-school care group are in the overlap
range. This gives us a proportion of overlap of 26/30 = .867. The proportion of nonoverlap
is U = 1 − .867 = .133. This would be a small effect.

χ² Test of Independence (2 × k). The assumptions for the χ² test of independence are as
follows:

Randomness: Sample members must be randomly drawn from the population.

Independence: Sample scores must be independent of each other. One implication of
this is that categories must be mutually exclusive (no case may appear in more than
one category).

Scaling: The dependent measure (categories) must be nominal.

Expected frequencies: No expected frequency within a category should be less than 1,
and no more than 20% of the expected frequencies should be less than 5.

As with all tests of the null hypothesis, the χ² test begins with the assumptions of
randomness and independence. Deriving from these assumptions is the requirement that the
categories in the cross-tabulation be mutually exclusive and exhaustive.

Mutually exclusive means that an individual may not be in more than one category per
variable. Exhaustive means that all possible categories are covered.

Let us imagine that we are interested in marijuana use among high school students and
specifically whether there are any differences in such use between 9th- and 12th-graders in
our school district. We conduct a proportionate stratified sample in which we randomly
sample sixty-five 9th-graders and fifty-five 12th-graders from all students in the
district. The students are surveyed on their use of drugs over the past year under
conditions guaranteeing confidentiality of response. Table 6.14 depicts reported marijuana use
for the students in the sample over the past year.

TABLE 6.14 Marijuana Use

                    Grade
               9th     12th    Total
None           42      33      75
Marijuana      23      22      45
Total          65      55      120

A higher proportion of 12th-graders than 9th-graders in this sample used marijuana
at least once during the past year. The question we are interested in is whether it is
likely that such a sample could have come from a population in which the proportions
of 9th- and 12th-graders using marijuana were identical.

The usual test used to evaluate such data is the χ² test of independence. The χ²
test evaluates the likelihood that a perceived relationship between proportions
in categories (called being dependent) could have come from a population in which
no such relationship existed (called independence).

The null hypothesis for this example would be that the same proportion of 9th-graders
as 12th-graders used marijuana during the past year. The null hypothesis values for this
test are called the expected frequencies. These expected frequencies for marijuana are
calculated so as to be proportionately equal for both 9th- and 12th-graders.

Because 45 of 120 of the total sample (9th- and 12th-graders) used marijuana during
the past year, the proportion for the total sample is 45/120 = .375. The expected frequency
of marijuana use for the sixty-five 9th-graders would be .375(65) = 24.375. The expected
marijuana use for the fifty-five 12th-graders would be .375(55) = 20.625. Table 6.15 shows
the expected frequencies in parentheses.

The χ² test evaluates the likelihood of the observed frequency departing from the
expected frequency. The null hypothesis is

$$H_0: P_{o_k} - P_{e_k} = 0,$$

where $P_{e_k}$ is the proportion of cases within category k in the null hypothesis population
(expected; in this case, this is the expected proportion of students in each of the two grade
levels [9th and 12th] who fell into one or the other use category [marijuana use or no
marijuana use]); and $P_{o_k}$ is the proportion of cases within category k drawn from the
actual population (observed; in this case, this is the observed [or obtained] proportion of
students in each of the two grade levels [9th and 12th] who fell into one or the other use
category [marijuana use or no marijuana use]).

The $\chi^2_{obt}$ test statistic is

$$\chi^2_{obt} = \sum \frac{(f_o - f_e)^2}{f_e}.$$

Degrees of freedom for a χ² test of independence are computed by multiplying the
number of rows minus 1 times the number of columns minus 1, or

$$df = (\text{Rows} - 1)(\text{Columns} - 1).$$

TABLE 6.15 Observed and Expected Frequencies for Marijuana Use

                           Grade
               9th            12th           Total
None           42 (40.625)    33 (34.375)    75
Marijuana      23 (24.375)    22 (20.625)    45
Total          65             55             120

NOTE: Expected frequencies are in parentheses.

For our example, this would be

$$df = (2 - 1)(2 - 1) = (1)(1) = 1.$$

Recall from our discussion of the McNemar change test that we include the Yates
correction for continuity in the formula when df = 1. The equation for the corrected test
statistic is as follows:

$$\chi^2_{obt} = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}.$$

The form of the equation tells us to subtract the expected score from the observed
score and take the absolute value of the difference (make the difference positive). Then,
subtract 0.5 from the absolute difference ($|f_o - f_e| - 0.5$) and square the result. Next, divide
by the expected score. This is repeated for each cell, and the results are summed.

The reader might have noticed that the correction for the McNemar change test was
1.0, whereas the correction for the χ² test of independence (and the goodness-of-fit test)
was 0.5. I will not go into any detail beyond saying that this is because the McNemar
change test uses only half of the available cross-tabulation cells (two of four) to compute
its $\chi^2_{obt}$, whereas all cells are used to compute $\chi^2_{obt}$ in the independence and
goodness-of-fit tests.

Table 6.16 shows how to work out the marijuana survey data.
For df = 1 and α = .05, the critical value for $\chi^2_{crit}$ is 3.84. Our calculated value
($\chi^2_{obt}$) was 0.109. Because 0.109 is less than the critical value, we fail to reject the
null hypothesis at α = .05.

As before, the effect size measure is w, which is computed as a post hoc measure by

$$w = \sqrt{\chi^2_{obt} / N}.$$

For a 2 × 2 table, w is equal to the absolute value of the phi coefficient (φ).

TABLE 6.16 Computation of $\chi^2_{obt}$

Observed ($f_o$)    Expected ($f_e$)    $|f_o - f_e| - 0.5$    $(|f_o - f_e| - 0.5)^2$    $(|f_o - f_e| - 0.5)^2 / f_e$
42                  40.625              .875                   0.765625                   0.019
33                  34.375              .875                   0.765625                   0.022
23                  24.375              .875                   0.765625                   0.031
22                  20.625              .875                   0.765625                   0.037

NOTE: $\chi^2_{obt}$ = 0.019 + 0.022 + 0.031 + 0.037 = 0.109.


For our example,

$$w = \sqrt{0.109/120} = \sqrt{0.0009083} = 0.0301$$

and

$$w^2 = PVE = .0009.$$

This is an extremely small effect size.
For 2 × k tabulation, we cannot convert w to PVE.
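The expected frequencies, the Yates-corrected statistic, and w can all be derived from the observed table. The following is a minimal sketch with illustrative function names; it applies the 0.5 correction to every cell, which is appropriate here only because the table is 2 × 2 (df = 1).

```python
import math


def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_yates(table):
    """Chi-square with the 0.5 continuity correction applied to each cell."""
    expected = expected_frequencies(table)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: None, Marijuana. Columns: 9th grade, 12th grade.
observed = [[42, 33], [23, 22]]
expected = expected_frequencies(observed)
chi2_obt = chi2_yates(observed)
w = math.sqrt(chi2_obt / 120)

print(expected)
print(round(chi2_obt, 4), round(w, 4))
```

Without intermediate rounding the statistic comes out near 0.110 rather than 0.109, because the worked table sums cell values that were each rounded to three decimals. The decision is unaffected, since either value is far below the critical value of 3.84.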

Hypothesis Tests for k > 2 Independent Samples

Imagine that we were interested in ageist attitudes among social workers. Specifically, we
are interested in whether there are any differences in the magnitudes of ageist attitudes
among (a) hospital social workers, (b) nursing home social workers, and (c) adult protective
services social workers.

We could conduct independent group tests among all possible pairings: hospital (a) with
nursing home (b), hospital (a) with protective services (c), and nursing home (b) with
protective services (c).

This gives us three tests. When we conduct one test at the α = .05 level, we have a
.05 chance of committing a Type I error (rejecting the null hypothesis when it is true) and
a .95 chance of making a correct decision (not rejecting the null hypothesis when it is
true). If we conduct three tests at α = .05, our chance of committing at least one Type I
error increases to about .15 (the precise probability is .142625). So, we actually are testing
at around α = .15.

As the number of comparisons increases, the likelihood of rejecting the null hypothesis
when it is true increases. We are "capitalizing on chance."

One way of dealing with capitalization on chance would be to use a stricter alpha
level. For three comparisons, we might conduct our tests at α = .05/3 = .0167.
Unfortunately, if we do this, then we will reduce the power (1 − β) of our test to detect a
possible existing effect.
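The compounding of the alpha level, and the stricter per-test level that offsets it, can be verified in a few lines. This sketch assumes the comparisons are independent, which is what the .142625 figure implies; the function names are illustrative.

```python
def familywise_error(alpha, comparisons):
    """Chance of at least one Type I error across independent tests at the same alpha."""
    return 1 - (1 - alpha) ** comparisons


def adjusted_alpha(alpha, comparisons):
    """Stricter per-test alpha obtained by dividing alpha by the number of comparisons."""
    return alpha / comparisons


fw = familywise_error(0.05, 3)
per_test = adjusted_alpha(0.05, 3)
print(round(fw, 6), round(per_test, 4))  # 0.142625 0.0167
```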

However, there are tests that allow one to detect whether there are any differences
among groups without compromising power. This is done by simultaneously evaluating
all groups for any differences. If no differences are detected, then we fail to reject the null
hypothesis and stop. No further tests are conducted because we already have our answer.
The differences among all groups are not sufficiently large that we can reject the notion
that all of the samples come from the same population.

If significant differences are detected, then further pair comparisons are conducted to
determine which pairs are different. The screening tests do not tell us whether only one
pair, two pairs, or all pairs show statistically significant differences. Screening tests show
only that there are some differences among all possible comparisons.

If we conduct our screening test at α = .05, then we will carry out the pair comparisons
when the null hypothesis is true 1 out of 20 times (commit a Type I error). By conducting
the initial overall screening in a single test, we protect against the compounding of the
alpha level brought on by multiple comparisons.

We look at three examples of screening tests for k > 2 independent samples:

1. One-way analysis of variance (ANOVA) (interval or ratio scale)

2. Kruskal-Wallis (K-W) test (ordinal scale)

3. χ² test of independence (k × k) (nominal scale)


One-Way Analysis of Variance. The ANOVA is a test of means. The null hypothesis is

$$H_0: \mu_1 = \mu_2 = \dots = \mu_k,$$

where k is the number of population means being estimated.
If all of the means are equal, then it follows that the variance of the means is 0, or

$$H_0: \sigma_{\mu}^2 = 0.$$

The test statistic used in ANOVA is called F and is calculated as follows:

$$F_{obt} = \frac{n s_{\bar{X}}^2}{s_p^2},$$

where the numerator is the variance of the sample means multiplied by the sample size,
and the denominator is a pooled estimate of the score variances within the samples.

The assumptions underlying one-way ANOVA are as follows:

Randomness: Sample members must be randomly drawn from the population and randomly
assigned to one of the k groups.

Independence: Scores must be independent of each other.

Scaling: The dependent measure must be interval or ratio.

Normal distribution: The populations from which the individuals in the samples were
drawn must be normally distributed.

Homogeneity of variances ($\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$): The samples must be drawn from
populations whose variances are equal.

Equality of sample sizes ($n_1 = n_2 = \dots = n_k$): The samples must be of the same size.

ANOVA involves taking the variability among scores and determining which is
variability due to membership in a particular group (variability associated with group means
or between-group variance) and which is variability associated with unexplained
fluctuations (within-group variance).

The total variability of scores is divided into one component representing the variability
of treatment group means around an overall mean (sometimes called a grand mean) and
another component representing the variability of group scores around their own individual
group means. The variability of group means around the grand mean is called between-group
variance. The variability of individual scores around their own group means is called
within-group variance. This division is represented by the following equation:

$$\underbrace{(X - \bar{\bar{X}})}_{\text{Total}} = \underbrace{(X - \bar{X})}_{\text{Within}} + \underbrace{(\bar{X} - \bar{\bar{X}})}_{\text{Between}}.$$

The X with two bars represents the grand mean, which is the mean of all scores without
respect to which group they are in. X is a particular score, and the X with one bar is
the mean of the group to which that score belongs.


This equation illustrates that the deviation of the particular score from the grand mean
is the sum of the deviation of the score from its group mean and the deviation of the
group mean from the grand mean. This might be a little clearer if we look at a simple data
set. Let us take the example about ageist attitudes among hospital social workers (Group 1),
nursing home social workers (Group 2), and adult protective services social workers
(Group 3). The dependent measure quantifies ageist attitudes (higher scores represent
more ageist sentiment).

There are k = 3 groups, with each containing n = 4 scores. The total number of scores
is N = 12. The group means are 3 (Group 1), 5 (Group 2), and 9 (Group 3), and the grand
mean is 5.67.

There are three types of sum of squares calculated in ANOVA. The formulas for the
sums of squares are derived from the deviation scores.

$SS_{total}$ is calculated by subtracting the grand mean from each score, squaring the
differences, and adding up (summing) the squared differences:

$$SS_{total} = \sum (X - \bar{\bar{X}})^2.$$

$SS_{within}$ is calculated by subtracting the group mean from each score within a group,
squaring the differences, and adding up (summing) the squared differences for each
group. This gives us three sums of squares: $SS_{Group1}$, $SS_{Group2}$, and $SS_{Group3}$. These are added
up to give us $SS_{within}$:

$$SS_{within} = \sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2 + \sum (X - \bar{X}_3)^2.$$

$SS_{between}$ is calculated by subtracting the grand mean from each group mean, squaring
the differences, and adding up (summing) the squared differences. Then, we multiply the
total by the sample size. This is because this sum of squares needs to be weighted. Whereas
N = 12 scores went to make up $SS_{total}$, and (k)(n) = (3)(4) = 12 scores went to make up
$SS_{within}$, only the k = 3 group means went to make up $SS_{between}$. We multiply by n = 4 so that
$SS_{between}$ will have the same weight as the other two sums of squares:

$$SS_{between} = n \sum (\bar{X} - \bar{\bar{X}})^2.$$

The sums of squares are as follows:

$SS_{within}$ = 20 + 20 + 20 = 60

$SS_{between}$ = (4)(18.667) = 74.667

$SS_{total}$ = 134.667.

The total sum of squares ($SS_{total}$) is the sum of the within-group sum of squares and
the between-group sum of squares, or

134.667 = 60.00 + 74.667.


Each of these sums of squares is a component of a different variance. In ANOVA jargon,
a variance is called a mean square. Each particular mean square (variance) has its
own degrees of freedom.

Because the total sum of squares ($SS_{total}$) involves the variability of all scores around
one grand mean, the degrees of freedom are N − 1. The within-groups sum of squares
($SS_{within}$) involves the variability of all scores within groups around k group means, where
k is the number of groups. So, the within-groups degrees of freedom are N − k. The
between-groups sum of squares ($SS_{between}$) involves the variability of k group means
around the grand mean. So, the between-groups degrees of freedom are k − 1.

Because a variance (mean square) is a sum of squares divided by degrees of freedom,
the formula for a mean square would be MS = SS/df.

Two mean squares are used to calculate the $F_{obt}$ statistic: $MS_{within}$ and $MS_{between}$. Their
specific formulas are as follows:

$$MS_{between} = \frac{SS_{between}}{df_{between}}$$

and

$$MS_{within} = \frac{SS_{within}}{df_{within}}.$$

There are k = 3 groups, so $df_{between}$ = k − 1 = 3 − 1 = 2. We may now compute
$MS_{between}$ = 74.667/2 = 37.333.

There are a total of N = 12 scores within k = 3 groups, so $df_{within}$ = 12 − 3 = 9 and
$MS_{within}$ = 60/9 = 6.667.

These are the two variances used to make up the F ratio ($F_{obt}$): $MS_{between}$ and $MS_{within}$.
The formula for $F_{obt}$ is

$$F_{obt} = \frac{MS_{between}}{MS_{within}}.$$

If we plug in the values from our example, then we obtain

$$F_{obt} = \frac{MS_{between}}{MS_{within}} = \frac{37.333}{6.667} = 5.60.$$

This is a bit confusing when presented in bits and pieces. The ANOVA summary table
is a way of presenting the information about the sums of squares, degrees of freedom,
mean squares, and F statistics in a more easily understood fashion. Table 6.17 uses the
example data.

Once we have computed the $F_{obt}$, it is compared to a critical F. Because two variances
were used to calculate our $F_{obt}$, there are two types of degrees of freedom associated with
it: numerator degrees of freedom (between groups) and denominator degrees of freedom
(within groups). These are used either to look up values in a table of the F distribution or
by computer programs to compute p values.

For our example, the numerator degrees of freedom are df = 2 because 2 degrees of
freedom were used in the calculation of $MS_{between}$. The denominator degrees of freedom
are df = 9 because 9 degrees of freedom were used in the calculation of $MS_{within}$. The
critical value for F at 2 and 9 degrees of freedom is $F_{crit}$ = 4.26. Because $F_{obt}$ = 5.60 is
greater than the critical value, we reject the null hypothesis at α = .05.

TABLE 6.17 ANOVA Summary Table

Source     Sum of Squares    Degrees of Freedom    Mean Square           $F_{obt}$
Between    74.667            3 − 1 = 2             74.667/2 = 37.333     37.333/6.667 = 5.60
Within     60.00             12 − 3 = 9            60.00/9 = 6.667
Total      134.667           12 − 1 = 11
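The partition of the sums of squares and the F ratio can be sketched as below. The raw scores are not given in this section, so the twelve scores here are hypothetical: they were chosen only so that the group means (3, 5, 9) and each within-group sum of squares (20) match the summary values in the text. The function name and dictionary keys are likewise illustrative.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares, and F for a one-way ANOVA."""
    all_scores = [x for group in groups for x in group]
    n_total = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n_total

    ss_within = 0.0
    ss_between = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((x - group_mean) ** 2 for x in group)
        # Weight each squared deviation of a group mean by that group's size.
        ss_between += len(group) * (group_mean - grand_mean) ** 2

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_obt": ms_between / ms_within,
    }


# Hypothetical scores reproducing the reported means (3, 5, 9) and SS of 20 per group.
hospital = [0, 2, 4, 6]
nursing_home = [2, 4, 6, 8]
protective_services = [6, 8, 10, 12]

result = one_way_anova([hospital, nursing_home, protective_services])
print({name: round(value, 3) for name, value in result.items()})
```

Under these assumed scores the sketch returns a between-groups sum of squares of about 74.667, a within-groups sum of squares of 60, and an F of about 5.60, and the two components add up to the total of about 134.667.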

Based on these findings, it is likely that at least one pair of means come from different
populations. Because we already have screened out other opportunities to commit Type I
error, further testing would not be capitalizing on chance. Thus, we may carry out the
following pair comparisons:

Group 1 versus Group 2

Group 1 versus Group 3

Group 2 versus Group 3

The individual pair comparisons may be carried out using any of a number of multiple
comparison tests. One of the more frequently used is the least significant difference
(LSD) test. The LSD test is a variant on the t test. However, the standard error of the mean
is calculated from the within-groups mean square (variance) from the ANOVA:

$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_{within} \left(\frac{1}{n_i} + \frac{1}{n_j}\right)},$$

where

$n_i$ is the number of scores in Group i, and

$n_j$ is the number of scores in Group j.

If the group ns are equal, then this becomes

$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{2 MS_{within}}{n}}.$$

For our example,

$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{(2)(6.667)}{4}} = \sqrt{3.333} = 1.826.$$

We now may carry out our comparisons, evaluating t at df = N - k = 12 - 3 = 9 (Figure 6.6).
At α = .05, we reject the null hypothesis only for the comparison of Group 1 with Group 3;
the other two differences do not reach the critical value.

Figure 6.6 Multiple Comparisons

Hospital (Group 1) vs. Nursing Home (Group 2):
$t_{\text{obt}}$ = (3 - 5)/1.826 = -1.095; df = 9, α = .05, $t_{\text{crit}}$ = 2.262. Fail to reject $H_0$.

Hospital (Group 1) vs. Adult Protective Services (Group 3):
$t_{\text{obt}}$ = (3 - 9)/1.826 = -3.286; df = 9, α = .05, $t_{\text{crit}}$ = 2.262. Reject $H_0$.

Nursing Home (Group 2) vs. Adult Protective Services (Group 3):
$t_{\text{obt}}$ = (5 - 9)/1.826 = -2.191; df = 9, α = .05, $t_{\text{crit}}$ = 2.262. Fail to reject $H_0$.
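
The LSD comparisons can be checked with a short script. This is a sketch under stated assumptions: the helper name `lsd_t` is hypothetical, the group means of 3, 5, and 9 are read from the mean differences reported in Figure 6.6 (the third mean is inferred from those differences), and the critical t of 2.262 is the tabled value quoted in the figure.

```python
import math


def lsd_t(mean_i, mean_j, ms_within, n_i, n_j):
    """Return the LSD t statistic for one pair of group means.

    The standard error uses the within-groups mean square from the ANOVA.
    """
    standard_error = math.sqrt(ms_within * (1.0 / n_i + 1.0 / n_j))
    return (mean_i - mean_j) / standard_error


MS_WITHIN = 6.667   # within-groups mean square from the ANOVA summary table
N_PER_GROUP = 4     # 12 scores divided equally among 3 groups
T_CRIT = 2.262      # tabled two-tailed critical t at df = 9, alpha = .05

# Assumed group means, read from the differences shown in Figure 6.6.
means = {"Group 1": 3.0, "Group 2": 5.0, "Group 3": 9.0}
pairs = [("Group 1", "Group 2"), ("Group 1", "Group 3"), ("Group 2", "Group 3")]

results = {}
for first, second in pairs:
    t_obt = lsd_t(means[first], means[second], MS_WITHIN, N_PER_GROUP, N_PER_GROUP)
    reject = abs(t_obt) >= T_CRIT
    results[(first, second)] = (t_obt, reject)
    print(f"{first} vs. {second}: t = {t_obt:.3f}, reject H0: {reject}")
```

Because every pair shares the same standard error when group sizes are equal, the comparisons differ only in the size of the mean difference.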

There are a number of measures of effect size for ANOVA. For the sake of simplicity,
we deal with two: Cohen's (1988) f and η².

The f effect-size measure is equal to the standard deviation of the sample means divided
by the pooled within-group standard deviation. It ranges from a minimum of 0 to an
indefinitely large upper limit. It may be estimated from $F_{\text{obt}}$ by using the following formula:

$$f = \sqrt{\frac{df_{\text{between}}\,F_{\text{obt}}}{df_{\text{within}}}}$$

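η² was discussed earlier and defined as a proportion of variance explained. It is
calculated by the following formula:

$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}$$

It also may be calculated from an $F_{\text{obt}}$:

$$\eta^2 = \frac{df_{\text{between}}\,F_{\text{obt}}}{df_{\text{between}}\,F_{\text{obt}} + df_{\text{within}}}$$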

Cohen (1988) categorizes these effect sizes into small, medium, and large categories.
The criteria for each are as follows:

Small effect size: f = .10, η² = .01
Medium effect size: f = .25, η² = .06
Large effect size: f = .40, η² = .14

Using the example data, η² is

$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} = \frac{74.667}{134.667} = 0.554,$$

which is a very large effect.
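
The two effect-size measures can be computed side by side. The sketch below is illustrative only: all function names are hypothetical, and the conversion from η² to f uses the relation f = sqrt(η² / (1 - η²)), which is a standard identity assumed here rather than a formula quoted from this chapter. The cutoffs are Cohen's (1988) criteria as listed in the text.

```python
import math


def eta_squared_from_ss(ss_between, ss_total):
    """Proportion of variance explained, from sums of squares."""
    return ss_between / ss_total


def eta_squared_from_f(f_obt, df_between, df_within):
    """Proportion of variance explained, recovered from an obtained F."""
    return (df_between * f_obt) / (df_between * f_obt + df_within)


def cohens_f_from_eta_squared(eta_sq):
    """Cohen's f from eta squared (assumed identity, see lead-in)."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def label_effect(f_value):
    """Classify f using the small/medium/large criteria listed in the text."""
    if f_value >= 0.40:
        return "large"
    if f_value >= 0.25:
        return "medium"
    if f_value >= 0.10:
        return "small"
    return "below small"


eta_sq = eta_squared_from_ss(74.667, 134.667)      # from the ANOVA summary table
eta_sq_from_f = eta_squared_from_f(5.6, 2, 9)      # from F_obt and its df
cohens_f = cohens_f_from_eta_squared(eta_sq)
effect_label = label_effect(cohens_f)

print(f"eta squared (SS) = {eta_sq:.3f}")
print(f"eta squared (F)  = {eta_sq_from_f:.3f}")
print(f"Cohen's f        = {cohens_f:.3f} ({effect_label})")
```

Computing η² both ways is a useful consistency check: the two routes should agree up to rounding in the reported F.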

Kruskal-Wallis Test. The K-W test is the k > 2 groups equivalent of the W/M-W test.
The test involves initially treating all samples as one group and ranking scores from
least to most. After this is done, the frequencies of low and high ranks among groups are compared.

The assumptions of the K-W test are as follows:

Randomness: Sample members must be randomly drawn from the population of interest
and randomly assigned to one of the k groups.

Independence: Scores must be independent of each other.

Scaling: The dependent measure must be ordinal (interval or ratio scores must be
converted to ranks).

When the assumptions of ANOVA are met, the analysis of variance will be slightly
more powerful than the K-W test. However, if the distribution of population scores is not normal and/or the population variances are not equal, then the K-W test might be the more powerful test.

The K-W test is a screening test. If there is no significant difference found, then we stop
testing. If a significant difference is found, then we proceed to test individual pairs with
the W/M-W test.

Our example involves the evaluation of three intervention techniques being used with
clients who wish to stop making negative self-statements: (a) self-disputation,
(b) thought stopping, and (c) identifying the source of the negative statement (insight). A
total of 27 clients with this concern were randomly selected and assigned to one of the
three intervention conditions. On the 28th day of the intervention, each client counted
the number of negative self-statements that he or she had made.

The procedure for the K-W test is similar to that for the W/M-W test. We begin by
assigning ranks to the scores without regard to which group individuals were in. We then
sum the ranks within each group. The summed ranks are called $W_1$ for Group 1, $W_2$ for
Group 2, and $W_3$ for Group 3 (Table 6.18).

The test statistic for the K-W test is $H_{\text{obt}}$, which is approximately distributed as χ² with
df = k - 1. It is calculated according to the following equation:

$$H_{\text{obt}} = \frac{12}{N(N+1)} \sum \frac{W_k^2}{n_k} - 3(N+1)$$

where

$W_k$ is the sum of ranks for Group k,

$n_k$ is the number of individuals in Group k, and

N is the total number of individuals in all groups.

TABLE 6.18 Summed Ranks for the Kruskal-Wallis Test

Group 1                    Group 2                       Group 3
Summed Rank1 = W1 = 89     Summed Rank2 = W2 = 122.5     Summed Rank3 = W3 = 166.5

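From our example, we obtain the following:

$$\begin{aligned}
H_{\text{obt}} &= \frac{12}{27(27+1)} \left( \frac{(89)^2}{9} + \frac{(122.5)^2}{9} + \frac{(166.5)^2}{9} \right) - 3(27+1) \\
&= \frac{12}{27(28)} \cdot \frac{7921 + 15006.25 + 27722.25}{9} - 3(28) \\
&= \frac{12}{756} \cdot \frac{50649.5}{9} - 84 = (0.0159)(5627.7222) - 84 = 89.3289 - 84 \\
&= 5.3289
\end{aligned}$$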
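
The same computation can be expressed as a small function. This is a minimal sketch: the name `kruskal_wallis_h` is hypothetical, the summed ranks come from Table 6.18, and the function implements only the uncorrected statistic (no adjustment for ties).

```python
def kruskal_wallis_h(rank_sums, group_sizes):
    """Return the uncorrected Kruskal-Wallis H from summed ranks per group."""
    n_total = sum(group_sizes)
    # Sum of (W_k squared / n_k) across the k groups.
    weighted_sum = sum(w ** 2 / n for w, n in zip(rank_sums, group_sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


rank_sums = [89.0, 122.5, 166.5]   # W1, W2, W3 from Table 6.18
group_sizes = [9, 9, 9]            # 27 clients divided among three conditions

# Sanity check: ranks 1..N always sum to N(N + 1)/2.
n_total = sum(group_sizes)
ranks_are_consistent = sum(rank_sums) == n_total * (n_total + 1) / 2

h_obt = kruskal_wallis_h(rank_sums, group_sizes)
print(f"H = {h_obt:.4f}, rank sums consistent: {ranks_are_consistent}")
```

The rank-sum check is worth running before anything else, because a ranking mistake shows up there immediately.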

This is the test statistic if there are no tied scores. However, if there are tied scores, then
the K-W test statistic has a correction for ties, which is as follows:

$$C = 1 - \frac{\sum (t^3 - t)}{N^3 - N}$$

The letter t refers to the number of tied scores for a particular tied group of numbers.
In our example, the score of 4 occurred twice, so t = 2 for this group. The score of 5
occurred three times, so t = 3 for this group. There were seven groups for which t = 2 and
two groups for which t = 3.

The correction is calculated as follows:

$$\begin{aligned}
C &= 1 - \frac{7(2^3 - 2) + 2(3^3 - 3)}{27^3 - 27} \\
&= 1 - \frac{7(8 - 2) + 2(27 - 3)}{19{,}683 - 27} \\
&= 1 - \frac{42 + 48}{19{,}656} = 1 - 0.0046 \\
&= 0.9954
\end{aligned}$$

We divide $H_{\text{obt}}$ by the correction factor (C) to obtain the corrected test statistic H':

$$H' = \frac{H_{\text{obt}}}{C} = \frac{5.3289}{0.9954} = 5.3535.$$

H' is approximately distributed as χ² with k - 1 degrees of freedom. The critical value
for χ² at df = 2 and α = .05 is $\chi^2_{\text{crit}}$ = 5.99 (see a table of critical values of χ² found in most
statistics texts). H' = 5.35 is not greater than or equal to the critical value, so we fail to
reject the null hypothesis at α = .05.
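
The correction for ties can be sketched as follows. The function name `tie_correction` is hypothetical; the tie pattern (seven tied groups of two scores and two tied groups of three scores), the uncorrected H of 5.3289, and the critical value of 5.99 are all taken from the text.

```python
def tie_correction(tie_group_sizes, n_total):
    """Return C = 1 - sum(t^3 - t) / (N^3 - N) for the given tied groups."""
    tie_sum = sum(t ** 3 - t for t in tie_group_sizes)
    return 1.0 - tie_sum / (n_total ** 3 - n_total)


# Seven tied groups with t = 2 and two tied groups with t = 3, as in the example.
tie_group_sizes = [2] * 7 + [3] * 2

H_OBT = 5.3289      # uncorrected Kruskal-Wallis statistic from the example
CHI2_CRIT = 5.99    # tabled critical chi-square at df = 2, alpha = .05

correction = tie_correction(tie_group_sizes, n_total=27)
h_corrected = H_OBT / correction
reject_null = h_corrected >= CHI2_CRIT

print(f"C = {correction:.4f}, corrected H = {h_corrected:.4f}, reject H0: {reject_null}")
```

Because C can never exceed 1, the corrected statistic is always at least as large as the uncorrected one; with few ties the adjustment is small.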

Based on these results, we would not carry out multiple pair comparisons. Because the
K-W test did not find any significant differences among the three groups, retesting the
same null hypothesis by a series of pair comparisons would not be justified.

χ² Test of Independence (k × k). The test statistic is the same for a k × k χ² test of
independence as for a 2 × k test. The assumptions are as follows:

Randomness: Sample members must be randomly drawn from the population.

Independence: Sample scores must be independent of each other. One implication of
this is that categories must be mutually exclusive (no case may appear in more than
one category).

Scaling: The dependent measure (category) must be nominal.

Expected frequencies: No expected frequency within a category should be less than 1,
and no more than 20% of the expected frequencies should be less than 5.

Let us imagine that we still are interested in marijuana use among high school
students. We are interested in the marijuana use differences (if any) among 10th, 11th,
and 12th graders in our school district. A proportionate stratified random sample was
drawn of sixty 10th graders, sixty-five 11th graders, and fifty-five 12th graders from all
students in the district. The students were surveyed on their use of drugs over the past
year under conditions guaranteeing confidentiality of response. Table 6.19 shows
reported marijuana use for the sampled students.

The null hypothesis for this example would be that the same proportions of 10th, 11th,
and 12th graders used marijuana during the past year. The null hypothesis values for this
test are the expected frequencies. These expected frequencies are calculated in the same
way as for a 2 × k χ².

Table 6.20 shows the cross-tabulation with the expected frequencies. Table 6.21 shows
the procedure for calculating $\chi^2_{\text{obt}}$. For df = 2 and α = .05, the critical value for χ² is 5.99
(see a table of critical values of χ² found in most statistics texts). Our calculated value
($\chi^2_{\text{obt}}$) was 3.420. Because the obtained (calculated) value did not exceed the critical value,
we would not reject the null hypothesis at α = .05. Because the screening test results were
not statistically significant at α = .05, we do not carry out the pair comparisons (10th with
11th grades, 10th with 12th grades, and 11th with 12th grades).

TABLE 6.19 Reported Frequencies of Marijuana Use

            10th   11th   12th   Total
None        30     28     33     91
Marijuana   30     37     22     89
Total       60     65     55     180

TABLE 6.20 Observed and Expected Frequencies for Marijuana Use

            Grade
            10th         11th         12th         Total
None        30 (30.33)   28 (32.86)   33 (27.81)   91
Marijuana   30 (29.67)   37 (32.14)   22 (27.19)   89
Total       60           65           55           180

NOTE: Expected frequencies are in parentheses.

TABLE 6.21 Computation of $\chi^2_{\text{obt}}$

Observed (f_o)   Expected (f_e)   (f_o - f_e)   (f_o - f_e)²   (f_o - f_e)²/f_e
30               30.33            -0.33         0.1089         0.00359050
28               32.86            -4.86         23.6196        0.71879489
33               27.81            +5.19         26.9361        0.96857605
30               29.67            +0.33         0.1089         0.00367037
37               32.14            +4.86         23.6196        0.73489732
22               27.19            -5.19         26.9361        0.99066201

NOTE: $\chi^2_{\text{obt}}$ = 0.00359050 + 0.71879489 + 0.96857605 + 0.00367037 + 0.73489732 + 0.99066201
= 3.42019114.
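
The expected frequencies and the χ² statistic can be reproduced from the observed counts alone. The sketch below is illustrative: both function names are hypothetical, the observed counts come from Table 6.19, and the critical value of 5.99 is the tabled value quoted in the text. Because the code keeps the expected frequencies at full precision rather than rounding them to two decimals, its result may differ from the hand computation in the third decimal place.

```python
def chi_square_independence(observed):
    """Return (chi-square, df, expected frequencies) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency for each cell: (row total * column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


def expected_frequencies_ok(expected):
    """Check the expected-frequency assumption stated for the test."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Rows: no use, marijuana use. Columns: 10th, 11th, 12th grade.
observed = [[30, 28, 33], [30, 37, 22]]
CHI2_CRIT = 5.99  # tabled critical chi-square at df = 2, alpha = .05

chi2_obt, df, expected = chi_square_independence(observed)
assumption_met = expected_frequencies_ok(expected)
reject_null = chi2_obt > CHI2_CRIT

print(f"chi-square = {chi2_obt:.3f}, df = {df}")
print("Expected-frequency assumption met:", assumption_met)
print("Reject the null hypothesis:", reject_null)
```

Checking the expected-frequency assumption in code before interpreting the statistic mirrors the order in which the assumptions are listed for the test.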

Conclusion

This chapter has discussed some of the more frequently used statistical hypothesis tests
and their associated measures of effect size. Of course, there are many other important
statistical hypothesis tests that were not discussed. These include tests of correlation
coefficients, multiple regression analysis, and factorial and block design ANOVAs, among
many others. The reader who wishes to learn more should consult one of the recommended
further readings at the end of the chapter.

Similarly, the discussion of statistical power in this chapter was necessarily limited due
to space constraints. I strongly urge the reader to become more deeply acquainted with
power analysis.

Finally, the reader should recognize that statistical hypothesis tests provide evidence only
for relationships between independent and dependent variables. They do not provide
evidence that such relationships are functional ones. This is the more difficult task of accounting
for or controlling extraneous variables that is discussed in other chapters of this handbook.

Notes

1. A sample is a subgroup from a population.
2. A population is all that there is of a particular thing.
3. A variable is a characteristic that may assume more than one value. It varies. Some examples
of variables include number of people living in a household, score on the Index of Family
Relations, length of time engaged in cooperative play, and self-rating of anxiety.

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum.

Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is
stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80-83.

Recommended Readings

Cohen, J., & Cohen, P. (2003). Applied multiple regression/correlation analysis for the behavioral sciences
(3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: M

Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahw

http://statistics.com/
This is a wonderful Web site listing online classes you can take in a wide array of statistical topics,
ranging from the introductory to the advanced.

http://www.cochrane.org/news/workshops.htm
This Web site from the Cochrane Collaboration lists a number of learning opportunities related to
designing and conducting systematic reviews of the empirical research literature.

http://www.stats.gla.ac.uk/steps/glossary/
This is a site for an online glossary of statistics terms and tests.

DISCUSSION QUESTIONS

1. Locate a recently published research study in a social work journal, and see if you can find where the
authors explicitly state one or more predictive hypotheses. Underline these and bring the article to
class, so you can read the hypothesis to your classmates. Discuss the qualities of this hypothesis in
terms of its testability.

2. Ask your instructor to explain why a measure of effect size should always accompany any report of
a statistically significant difference. What do effect size reports add to simply reporting whether a
given difference exceeded chance expectations?

3. Suppose you measure a group of social work clients before and after they receive a specific social
work intervention. What type of statistical test would be most appropriate, one for independent
samples or one for dependent samples? Explain why.
