PLEASE POST EACH ASSIGNMENT SEPARATELY
Assignment 1
Read Chapter 1 and Chapter 2 (ATTACHED). Write your reflections by selecting an idea from each chapter and describing your thoughts and feelings about it (total of 1 page: half a page per chapter).
Assignment 2
Read Chapter 3 and Chapter 4 (ATTACHED). Write your reflections by selecting an idea from each chapter and describing your thoughts and feelings about it (total of 1 page: half a page per chapter).

Chapter 1: The Links Between Testing and Teaching

(From Popham, W. James. Test Better, Teach Better: The Instructional Role of Assessment. Alexandria, VA: Association for Supervision and Curriculum Development, 2003.)

You’d probably find it difficult to locate anyone, teacher or non-teacher, who doesn’t recognize that there’s some sort of a relationship between teaching and testing. Just about everyone realizes that if a teacher does a great instructional job, that teacher’s students will usually perform better on tests. It’s the other side of the equation that’s less often understood, namely, that how a teacher tests—the way a teacher designs tests and applies test data—can profoundly affect how well that teacher teaches.

The connection between one’s teaching and one’s testing is a critical one that, if properly understood, can lead to a substantial increase in instructional effectiveness. I want you not only to accept the idea that testing can help teaching, but also to act on that idea. I want you to pick up tangible instructional payoffs from linking your tests to your teaching. You’ll teach better, and your students will learn more. You’ll be a better teacher, and I’ll be a happy author. Let’s get started.

What’s in a Name?

I need to define some terms as we get under way. First, what is a test or, more specifically, what is an educational test? Simply put, an educational test is a formal attempt to determine a student’s status with respect to specific variables, such as the student’s knowledge, skills, and attitudes.

The adjective “formal” in the previous sentence is important, because it distinguishes a test from the many casual judgments that teachers routinely make about their students. For example, during my first year of teaching (in a small eastern Oregon high school), I had a student named Mary Ruth Green. I could almost always tell (or so I thought) how well Mary Ruth had mastered the previous night’s English homework assignment. When it came time to discuss the homework topic, if Mary Ruth was animated and eager to contribute, I concluded that she knew the assigned stuff. If she sat silently and avoided eye contact with me, however, I guessed that she and the previous night’s homework topic were unacquainted.

I made all sorts of on-the-spot judgments about what Mary Ruth and my other students knew, but those judgments were informal ones and often based on pretty skimpy observational data. In contrast, a test entails a systematic effort to get a fix on a student’s status with respect to such things as the student’s ability to perform an intellectual skill—to compose a job-application letter, for instance, or to carry out a hypothesis-testing experiment in a chemistry class.

For many people, the word test conjures up images of traditional, paper-and-pencil forms (multiple-choice exams or True-False quizzes). Perhaps this explains why a growing number of educators prefer to use the term assessment, which seems to embrace both traditional forms of testing and comparatively recent ones like looking for evidence of learning by examining student-generated work portfolios or group reports of experimental projects. Still, as long as you don’t restrict yourself to only traditional testing approaches, the terms test and assessment are really interchangeable. And while we’re swimming in this particular synonym pool, let me toss in two more: the slightly technical-sounding measurement and the serious-sounding examination (or exam). Each of these four terms describes a formal attempt to determine a student’s status with respect to an educationally relevant variable. In this book, you’ll find that I use all four interchangeably, not for any subtle reasons, but just because I get tired of using the same word all the time.

Why We Test

Human beings are tough to figure out. Ask any psychiatrist. Ask yourself. And young human beings in grades K–12 are no exception. To illustrate, if a teacher wants to determine Ted’s ability to read, the teacher won’t find that information tattooed on Ted’s arm. Ted’s reading ability is covert. The teacher must figure out how to uncover that hidden ability. So the teacher whips up a 15-item reading test calling for Ted to read several short passages and then answer a series of questions getting at (1) the central messages in the passages and (2) certain key details in those passages. Ted takes the test and does a great job, answering each of the 15 items correctly. The teacher then makes an inference about Ted’s covert reading ability based on Ted’s overt performance on the 15-item test.

If you think about it, just about every worthwhile thing that educators try to promote is unseeable. Consider spelling ability as another example. A child’s spelling ability cannot be seen, only inferred. What goes through the teacher’s head is something like this:

    Martha did well on this month’s spelling test. She wrote out “from scratch” the correct spellings for 18 of 20 words I read out loud. It is reasonable for me to infer, then, that Martha possesses a really high level of spelling ability—a level of ability that would display itself in a fairly similar fashion if Martha were asked to take other, similar 20-item spelling tests.

Remember, what the teacher sees when Martha spells the word “awry” properly is only Martha’s spelling of “awry” and not Martha’s spelling ability. The teacher needs to infer the level of Martha’s spelling skill by seeing how well Martha does on her spelling tests. The more spelling tests that Martha takes, the more confidence the teacher can have in any inferences about Martha’s spelling skill. An inference about a student can be based on a single test; a more accurate inference will be made if multiple tests are employed.
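
One way to quantify why multiple tests support firmer inferences is the standard error of a proportion-correct score. The formula below is a standard statistics result rather than anything from this chapter, and the numbers are purely illustrative; a minimal Python sketch, assuming Martha’s 90 percent accuracy holds across tests:

    import math

    def proportion_se(p_correct, n_items):
        """Standard error of a proportion-correct score on an n-item test."""
        return math.sqrt(p_correct * (1 - p_correct) / n_items)

    # Martha: 18 of 20 words correct, so p_correct = 0.9
    print(round(proportion_se(0.9, 20), 3))  # 0.067 -- one 20-item test
    print(round(proportion_se(0.9, 60), 3))  # 0.039 -- three such tests pooled

Tripling the number of spelling words cuts the uncertainty around Martha’s true spelling ability by roughly 40 percent, which is the statistical face of the claim above.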

Likewise, a child’s ability to perform basic arithmetic skills is unseeable; it’s something we infer from the child’s performance on an exam (or, preferably, more than one exam) dealing with adding, subtracting, multiplying, and dividing. Children’s confidence in being able to present an oral report to their classmates is certainly unseeable, but again, we can infer it from students’ responses to an assessment instrument constructed specifically to measure such things. (You’ll learn more about that sort of noncognitive assessment in Chapter 8.)

So educational measurement is, at bottom, an inference-making enterprise in which we formally collect overt, test-based evidence from students to arrive at what we hope are accurate inferences about students’ status with respect to covert, educationally important variables: reading ability, knowledge of history, ability to solve simultaneous equations, interest in social studies, and so on. The process is represented in Figure 1.1.

[Figure 1.1: Educational testing as an inference-making process]

Yes, as my experience with Mary Ruth and her homework showed, it is certainly possible for a teacher to make an inference about students based on informal, nontest evidence. Suppose your student Alvin gives you a note in which he has misspelled several words. Based on this evidence, you might infer that Alvin’s spelling ability isn’t all that wonderful. However, a formal assessment of Alvin’s spelling skill, one based on a larger and more age-appropriate set of words, would increase the likelihood of your making an accurate inference about Alvin’s spelling ability.

The accuracy of these inferences is critical, because a teacher’s understanding of students’ knowledge, abilities, and attitudes should form the basis for the teacher’s instructional decisions. And, of course, the more accurate the test-based inferences a teacher makes, the more defensible will be the teacher’s instructional decisions based on those inferences.

What Sorts of Teaching Decisions Can Tests Help?

I’ve been touting the tight relationship that should be present between testing and teaching. It’s time to get more specific. There are four types of teaching decisions that should rest squarely on what a teacher finds out, either from the structure of educational tests themselves or from the way students perform on those tests.

Decisions about the nature and purpose of the curriculum. Essentially, the teacher seeks answers to questions like these: “What am I really trying to teach? What do my students need to know and be able to do? How can I translate the big curricular goals set for my students into specific, teachable components?”

Decisions about students’ prior knowledge. Questions include, “What do my students already know about the topic I’m planning to teach? Are there any gaps that I need to address before we can tackle this material? Based on what my students know and can do, how can I tailor my instruction to provide the proper balance of remediation and challenge?”

Decisions about how long to teach something. Questions include, “How long do I think it will take my students to master this content? What kind of progress are they making? Are we on the right track? Should I continue teaching on my planned schedule, or are we ready to move on?”

Decisions about the effectiveness of instruction. Questions include, “Did my students learn? Was the instructional approach I took a good one? What specific activities were the most advantageous? Where do I need to make alterations?”

Now, let’s take a closer look at how tests—both their design and the results of their application—can help teachers make these kinds of decisions with confidence.

Using Tests to Clarify the Curriculum

Typically, educators think of a curriculum as the set of intended outcomes that we want students to achieve. During the bulk of my teaching career, most teachers have used the phrase educational objectives to describe their curricular intentions. These days, of course, we find that most curricula are described as sets of content standards—that is, the knowledge and skills students are supposed to master as a consequence of instruction. Sometimes we see the term benchmarks used to describe the more specific skills and knowledge often subsumed beneath fairly broad content standards. The descriptors may change, but the mission of a curriculum remains constant: Its essential purpose is to lay out the stuff we want kids to learn.

Regardless of whether we call them content standards, goals, or objectives, the curricular intentions handed down by states and districts are often less clear than teachers need them to be for purposes of day-to-day instructional planning. For example, a group of elementary teachers might find themselves responsible for promoting this district-approved social studies content standard: “Students will comprehend the formal and informal nature of the interrelationships among the executive, legislative, and judicial branches of U.S. government.”

Let’s imagine you’re one of the 5th grade teachers who is supposed to help students master this content standard. How would you go about planning your instruction? Personally, I think there’s way too much fuzz on this curricular peach. Different teachers could easily read this social studies content standard and come up with quite divergent ideas of what it signifies. For example, one teacher might conclude that this content standard focuses exclusively on the formal and informal “checks and balances” when one governmental branch interacts with the other two. Another teacher might think that this content standard emphasizes the distinction between “formal” and “informal” interrelationships among the three governmental branches.

Now suppose that your 5th graders will be taking an important “standards-based” social studies achievement test at the end of the school year. If the people who built that test interpret this social studies content standard in one way, and you interpret it in another way—and teach toward your interpretation—it’s almost certain that your students won’t do as well on the achievement test as you, your principal, or your students’ parents would like.

Clearly, if the curricular aims that a teacher must address are open to multiple interpretations, then off-the-mark instruction is likely to occur, bringing with it lower test performances. But if a curricular goal is accompanied by a set of illustrative test items indicating the ways that the goal will be measured, then teachers can analyze those items and form a far more accurate idea of the outcome that the state or district is actually seeking. Because the sample test items exemplify what the curricular intention really represents, teachers can plan and provide their students with better, more curricularly relevant instruction.

To illustrate, suppose you knew that mastery of the fairly fuzzy 5th grade social studies goal about the three branches of the U.S. government would be assessed by items similar to the following:

SAMPLE ITEM 1

Which of the following three branches of U.S. government, if any, is primarily responsible for the final enactment of treaties with foreign nations?

a. Legislative
b. Executive
c. Judicial
d. No single branch is responsible.

SAMPLE ITEM 2

Which, if any, of the following statements about governmental stability is true? (Mark each statement as True or False.)

a. The enactment of term-limiting legislation at the local level has made the U.S. federal legislative branch of government more stable.
b. The availability of the impeachment process tends to decrease the stability of the executive branch of U.S. government.
c. Historically, the judicial branch of U.S. federal government has been the most stable.

SAMPLE ITEM 3

Our founding fathers charted a meaningful series of governmental checks and balances. Focus on the area of taxation, then select two of the three branches and briefly describe the formal way(s) in which one branch can check the other. Answer in the space provided below.

Having read these sample items, wouldn’t you have a much better idea of what to teach your students in order for them to come to “comprehend the formal and informal nature of the interrelationships among the executive, legislative, and judicial branches of U.S. government”?

Sample Item 1 makes it clear that students will need to learn the primary responsibilities of each governmental branch. Sample Item 2 suggests that students must learn why important factors such as governmental stability are present for each branch. And Sample Item 3 indicates that, as the content standard said, students will need to understand the “formal and informal nature of the relationships” among the governmental branches. For this item, as you can see, the focus is on formal. In another item, you can reasonably assume, the focus might be on informal. Moreover, Sample Item 3 tips you off that students may need to display this understanding by constructing their own responses, rather than merely selecting a response from a set of options.

I believe that elementary teachers who consider these three illustrative items along with the original statement of the content standard are going to have a far more lucid idea of what the content standard actually means. Consequently, they’ll be able to deliver instruction that is more on-target and more effective.

The payoffs from test-triggered clarity about curricular goals can apply with equal force to a teacher’s own, personally chosen curricular aspirations. If teachers are pursuing curricular aims of their own choosing, but those aims are less clear (in a teacher’s mind) than is desirable for instructional planning purposes, then teachers are likely to come up with less relevant instruction. To illustrate, when I was a first-year teacher, I wanted the students in my two English classes “to be better writers.” But even though that very general goal was in my mind as the school year got under way, I really had no idea of what it meant for my students to be “better writers.” As the months went by, I occasionally had my students write a practice essay. However, for their final exam, I had them answer multiple-choice items about the mechanics of writing. Shame on me!

The task of creating a few sample assessment items can bring the desired outcomes into focus. In short, test-exemplified curricular goals will almost always be better promoted instructionally than will unexemplified curricular goals. Because of the significance of tests in helping teachers clarify their instructional targets, I’m going to dig into this topic a bit more deeply in Chapter 2. Stay tuned.

Using Tests to Determine Students’ Entry Status

In most instructional settings, teachers inherit a new crop of students each year, and more often than not, these teachers really don’t know what sorts of capabilities the new students bring with them. Likewise, teachers looking ahead in their planning books to new topics or skills (weather systems, Homer’s epics, multiplying fractions, group discussion skills, the ability to work independently) frequently find they have only the roughest idea, usually based on the previous grade level’s content standards, of their students’ existing familiarity with or interest in the upcoming topics, or of their students’ expertise in the upcoming skill areas. Knowing where students stand in relation to future content, both as a group and as individuals, is one of a teacher’s most valuable tools in planning appropriate and engaging instruction. Therefore, it’s an eminently sensible thing for teachers to get a fix on their students’ entry status by pre-assessing them, usually using teacher-created tests to find out what sorts of skills, knowledge, or attitudes these students have. The more diagnostic a pretest is, the more illuminating it will be to the teacher.

You can use pretests to isolate the things your new students already know as well as the things you will need to teach them. If you are a middle school English teacher aspiring to have your 8th graders write gripping narrative essays, and you’re certain that these 8th graders haven’t seriously studied narrative essays during their earlier years in school, you could use a pre-assessment to help you determine whether your students possess important enabling subskills. Can they, for example, write sentences and paragraphs largely free of mechanical errors in spelling, punctuation, and word usage? If their pre-assessment results show that they already possess these enabling subskills, there’s no need to re-teach such subskills. If the pre-assessment results show that your students’ mastery of the mechanics of writing is modest, then you’ll need to devote appropriate time to promoting such subskills before you move on.

This example brings up an important point. If you’re using a classroom pretest chiefly to get a picture of what your students already can do regarding a particular content standard, you should always try to employ a pretest that covers the standard’s key enabling subskills or bodies of knowledge. For instance, when I taught a speech class in high school, I always had my students deliver a two- to three-minute extemporaneous speech early in the term. I was looking particularly for the fundamentals—posture, eye contact, organization of content, introductions, conclusions, and avoidance of time-fillers such as “uh” and “you know”—those things I knew students needed to master before they could work on refining their abilities as first-class public speakers. Those pretests helped me decide where I wanted to aim my early instruction, and it was always at the most serious weaknesses the students displayed during their “mini-orations.”

Using Tests to Determine How Long to Teach Something

One of the classes I taught in my early years on the “grown-up” side of the desk was 10th grade geography. Thanks to a blessed red geography textbook and my ability to read more rapidly than my 10th graders, I survived the experience (barely). I remember that one of my units was a three-week focus on map projections and map skills, during which we explored the use of such map types as Mercator and homolographic projections. Each year that I taught 10th grade geography, my three-week unit on maps was always precisely three weeks in length. I never altered the duration of the unit because, after all, I had originally estimated that it would take 15 days of teaching to stuff the designated content into my students’ heads. Yes, I was instructionally naïve. Beginning teachers often are.

What I should have done instead was use some sort of “dipstick” assessment of students’ map skills throughout that three-week period to give me a better idea of how long I really needed to keep teaching map skills to my 10th graders. I always gave my students a 30-item map skills exam at the end of the three weeks; I could easily have taken that exam and split it up into 15 microquizzes of 1 or 2 items each, and then randomly administered each of those microquizzes to different students at the end of, say, two weeks. Students would have needed only two or three minutes to complete their microquizzes.

This approach is a form of what’s called item sampling, a manner of testing in which different students are asked to complete different subsamples of items from a test. It works quite well if a teacher is trying to get a fix on the status of an entire class. (Clearly, item sampling wouldn’t permit sensible inferences about individual students, because different students would be completing different microquizzes.) By reviewing the results of my item-sampled, en route assessment, I could have determined whether, at the end of only two weeks, my students had already learned enough from their meanderings through Mapland. Looking back, I suspect we continued to mess with Mercators and homolographics well beyond what was necessary.
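
To make the microquiz mechanics concrete, here is a minimal sketch of item sampling; this is my illustration, not the book’s, and the class size, item counts, and student labels are hypothetical:

    import random

    def assign_microquizzes(item_ids, students, items_per_quiz=2, seed=1):
        """Split a test's items into microquizzes; each student draws one."""
        rng = random.Random(seed)
        items = list(item_ids)
        rng.shuffle(items)
        quizzes = [items[i:i + items_per_quiz]
                   for i in range(0, len(items), items_per_quiz)]
        return {student: rng.choice(quizzes) for student in students}

    def class_percent_correct(responses):
        """responses maps each student to a list of per-item True/False marks."""
        marks = [ok for per_student in responses.values() for ok in per_student]
        return 100 * sum(marks) / len(marks)

    # A 30-item map exam becomes 15 two-item microquizzes spread across a class of 30.
    roster = ["student%02d" % i for i in range(1, 31)]
    assignments = assign_microquizzes(range(1, 31), roster)
    print(assignments["student01"])  # the two item numbers this student answers

Because each student answers only a couple of items, the pooled percent correct supports an inference about the class as a whole, not about any individual student; that is exactly the caveat noted in the paragraph above.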

You can do something similar with your own students to help you decide how long to continue teaching toward a particular content standard. By using an occasional en route test (either item sampling or giving the same, possibly shortened, test to all of your students), you can tell whether you need to keep banging away on a topic or can put your drumsticks away.

This kind of instructionally illuminating testing, sometimes referred to as formative assessment, is a particularly valuable tool today, when there’s so much to fit into each school year. The time saved on an easily mastered unit can be applied to other material that students have unexpected difficulty with. Flexible, en route, test-guided instructional scheduling can allow your students to move on to fascinating application activities or delve more deeply into other content.

Using Tests to Determine the Effectiveness of Instruction

The issue of how well a teacher has taught is becoming more and more critical as the educational accountability movement places teachers under ever-intensifying scrutiny. The folks who are demanding evidence that teachers are doing a solid instructional job are looking for hard evidence that proves instructional effectiveness.

This is such a serious and complicated assessment-related issue that I’ve devoted three chapters in this book to it. Chapter 9 explores how not to evaluate a teacher’s effectiveness; Chapters 10 and 11 tell how to go about it properly. But, because finding out how effective your own instruction is should be important to you, I need to address some important assessment-related topics first.

These days, many teachers’ instructional competence is being determined on the basis of a single achievement test administered to students each spring. For instance, a 4th grade teacher’s students complete a state-approved standardized achievement test in May, and then the test results of this year’s 4th graders are compared with the test results of last year’s 4th graders. If this year’s scores are better than last year’s scores, the 4th grade teacher is thought to be doing a good instructional job . . . and vice versa.

But this sort of teacher-appraisal model flunks on several counts. For one thing, it relies on the wrong kind of measurement tool, as you’ll learn when you get to Chapter 9. And there’s another, more obvious shortcoming in these year-to-year comparison models. The problem is that each year’s testing takes place with a different group of students, and the results depend on the collection of kids being compared. If your students last year were an atypical collection of gifted girls and boys and this year’s crop is a more typical mix, then you can expect your year-to-year results to show a decline, regardless of your abilities as an instructor.

The simple little model of pre-assessment and post-assessment comparison displayed in Figure 1.2 is the most fundamental way teachers can judge their own teaching skill. A pretest gets a fix on students’ status before instruction (at the start of the school year, say), and a post-test measures the same students’ status again, once instruction is complete (at the end of the school year).

[Figure 1.2: A common model for determining instructional impact]

As you can see from the figure, the heart of this evaluative model is students’ test performance. Although a teacher’s overall performance should be determined using a variety of evaluative considerations, not just students’ test data, one overridingly important factor should be how well the teacher’s students have learned what they were supposed to learn. A pretest/post-test evaluative approach (using some refinements that you’ll read about in Chapter 11) can contribute meaningfully to how teachers determine their own instructional impact.
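
As a bare-bones illustration of the pretest/post-test comparison in Figure 1.2 (my sketch, not the author’s; the Chapter 11 refinements are not modeled, and the scores are invented), a class’s average gain might be computed like this:

    def average_gain(pre_scores, post_scores):
        """Mean post-test minus pretest score, computed for the SAME students."""
        assert set(pre_scores) == set(post_scores), "compare the same students"
        gains = [post_scores[s] - pre_scores[s] for s in pre_scores]
        return sum(gains) / len(gains)

    pre = {"Ana": 42, "Ben": 55, "Cal": 38}   # hypothetical percent-correct scores
    post = {"Ana": 71, "Ben": 80, "Cal": 66}
    print(round(average_gain(pre, post), 1))  # 27.3 points of growth

The assert makes the key contrast explicit: unlike the year-to-year comparison criticized above, this model measures growth within one and the same group of students.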

Okay, we’ve considered four ways in which testing—the tests themselves and the student results they produce—can help a teacher make better instructional decisions. The rest of this book will provide you with sufficient information about these and other ways of using assessment in your own classes to make your instructional decisions more defensible.

Recommended Resources

Falk, B. (2000). The heart of the matter: Using standards and assessment to learn. Westport, CT: Heinemann.
Popham, W. J. (Program Consultant). (1996). Improving instruction through classroom assessment [Videotape]. Los Angeles: IOX Assessment Associates.
Popham, W. J. (2001). The truth about testing: An educator’s call to action. Alexandria, VA: Association for Supervision and Curriculum Development.
Popham, W. J. (Program Consultant). (2002). Educational tests: Misunderstood measuring sticks [Videotape]. Los Angeles: IOX Assessment Associates.
Ramirez, A. (1999, November). Assessment-driven reform: The emperor still has no clothes. Phi Delta Kappan, 81(3), 204–208.
Shepard, L. A. (2000, October). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
Sirotnik, K. A., & Kimball, K. (1999, November). Standards for standards-based accountability systems. Phi Delta Kappan, 81(3), 209–214.
Stiggins, R. J. (Program Consultant). (1996). Creating sound classroom assessments [Videotape]. Portland, OR: Assessment Training Institute.
Stiggins, R. J. (2001). Student-involved classroom assessment (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Wiggins, G., Stiggins, R. J., Moses, M., & LeMahieu, P. (Program Consultants). (1991). Redesigning assessment: Introduction [Videotape]. Alexandria, VA: Association for Supervision and Curriculum Development.

INSTRUCTIONALLY FOCUSED TESTING TIPS

• Recognize that students’ overt responses to educational tests allow teachers to make inferences about students’ covert status.
• Use tests to exemplify—and, thus, clarify—fuzzy statements of curricular aims.
• Pre-assess any new group of students to identify those students’ entry status. Also pre-assess students when they’ll be encountering new skills and knowledge to be learned.
• Use test results to determine how much instruction on a given topic your students need.
• Include the data generated by educational tests in evaluations of your own instructional effectiveness.

Chapter 2: How Tests Can Clarify the Curriculum

Teachers who truly understand what they want their students to accomplish will almost surely be more instructionally successful than teachers whose understanding of hoped-for student accomplishments is murky. In this chapter, I’ll elaborate on how educational tests can help teachers get a clearer idea of the direction in which their instruction ought to be headed.

When I talk about curriculum and instruction, I use both terms in a fairly traditional fashion. By “curriculum,” I mean the outcomes that educators hope to achieve with their students. The three most common kinds of outcomes sought are students’ acquisition of cognitive skills (such as being able to multiply pairs of triple-digit numbers); their acquisition of bodies of knowledge (such as understanding chlorophyll’s role in photosynthesis); and their affect (such as particular attitudes, interests, or values). When I use the word “instruction,” I am thinking of the activities that educators carry out in order to help their students attain those intended outcomes. As Figure 2.1 illustrates, curriculum and instruction can be understood as educational ends and educational means.

Although teachers often make their own decisions about which educational ends they want their students to achieve, in most instances these days, higher authorities (district or state officials) stipulate what those ends ought to be.

Still, it’s almost always up to teachers to devise the instructional activities (the means) they believe will achieve those particular ends. For example, a high school English teacher trying to get students to become skilled writers of persuasive essays (a state-approved curricular outcome) might plan a four-week unit giving students plenty of guided and independent practice composing such essays. Other instructional tactics might include providing solid models of well-written persuasive essays for students to review, having students critique each other’s draft essays, and arranging for a motivational visit from an editorial writer at the local newspaper. Together, all these activities constitute the teacher’s instruction and, thus, represent the teacher’s best judgment about the means that are necessary to accomplish a particular curricular end.

[Figure 2.1: The relationship between curriculum and instruction]

As indicated earlier, a teacher who is clear about what the curricular ends are will almost always come up with more successful instructional means than will a teacher who has only a muddle-minded notion of what a curricular end really represents.

And that’s precisely where educational assessments can help. Tests can enable teachers to get a more meaningful fix on what it really is that a curricular aim (a content standard, goal, objective, or intended outcome) seeks from students. Some teachers may think it odd that they can glean valuable information not just from test results, but from an analysis of test design and from designing tests themselves. But that’s exactly what properly constructed tests can do, well in advance of their actual administration.

Three Instructional Payoffs

What, then, are the real-world instructional dividends of clarifying curricular aims via tests? There are three of them, and they are huge.

More accurate task analyses. When you have a better understanding of where your final destination is, you can more sensibly identify any necessary knowledge or skills that students will either need to possess already or that you must teach them along the way. The more accurately you can single out any key enabling subskills or enabling knowledge that your students must possess, the greater the odds that you will address those enabling subskills or knowledge in your instruction.

Clearer explanations. Have you ever tried to explain something to someone when you yourself were less than 100 percent clear about what you were yammering about? I have, and I always knew deep down that my explanations were more than a mite muddy. Test-induced clarity about what a given content standard really means will help you to supply your students with more coherent, more accurate explanations.

More appropriate practice activities. Practice exercises, both guided practice and independent practice, are terrifically important, especially if you are seeking high-level curricular outcomes. In my own teaching experience, whenever my instruction turned out poorly, it was almost always because I had failed to give my students sufficient practice activities, that is, sufficient time-on-task exercises.

Both experience and empirical research indicate that time-on-task exercises, accompanied by immediate feedback for students, are a key ingredient in stellar instruction. If practice activities miss the mark because of a teacher’s misunderstanding about what an educational objective truly represents, then students won’t learn as well. If you use tests to exemplify educational objectives, you’ll understand those objectives better and be able to incorporate on-the-mark practice activities into your lessons.

To sum up, teachers who clarify their curricular aims by using tests to illustrate what those aims really signify will be able to devise more effective instructional means to promote students’ mastery of test-clarified curricular ends. In concert, these three instructional advantages—improved task analyses, clearer explanations, and more on-target practice activities—are genuinely potent.

How Assessments Can Clarify Curricular Aims

Recall that educational tests are employed to secure overt (visible) evidence from students so that teachers can make an inference about the covert (unseen) status of their students. Collecting data to support wise inferences is the essence of educational measurement.

As a school-world reality, however, many of the curricular aims that teachers encounter come in the form of content standards stated so generally that it’s tough to get an accurate sense of what the curricular aim really signifies. That’s certainly true for many of today’s content standards. If you study any set of state-approved or district-approved content standards, you’ll almost always find a few that will force you to wonder aloud about the nature of the curricular aim that’s being sought. I still remember my all-time favorite opaque curricular aim, a state-approved language arts objective: “Students will learn to relish literature.” I’m sure the educators who devised this curricular aim had some sort of idea about what they wanted students to be able to do at the completion of instruction, but as written, I daresay most teachers would be more than a little unclear about what this chopped-pickle objective really represents. (Incidentally, I have yet to encounter an objective calling for students “to mayonnaise mathematics.” But there’s still time!)

To cope with overly general curricular aims, educators can take a lesson from the tactics of experimental psychologists who, because they frequently deal with covert and elusive human attributes (things like a person’s “motivation,” “anxiety,” or “happiness”), have come up with a strategy for clarifying covert variables. The psychologist first describes an overt behavior, such as a person’s tendency to persevere unproductively when attempting to solve a problem or perform a task. Then the psychologist chooses an overt operation to represent these covert “perseveration tendencies.” This might be the number of times a person continues to press a lever after a lever-press fails to yield the silver dollars previously awarded by each and every lever-press. The psychologist signifies that the number of unrewarded lever-presses a person makes represents that individual’s perseveration tendencies. The more unrewarded lever-presses the person makes, the stronger that person’s tendency to perseverate.

What the experimental psychologist does is operationalize the meaning of a covert variable by setting up an operation to yield a quantifiable result that, in effect, serves as a proxy for the variable being studied. Educators do much the same thing when they use a student’s performance on a 25-item spelling test (an overt operation) to represent a student’s spelling ability (a covert variable).

Because most curricular aims are stated quite generally, there are different ways a teacher might operationalize any given aim. To illustrate, let’s consider a common language arts goal related to a student’s reading comprehension. Suppose you were trying to measure your students’ ability to identify main ideas in paragraphs. You might operationalize this ill-defined skill by asking your students to read a series of paragraphs (each with a reasonably discernible main idea) and then asking them to do one or more of the following things:

• Write an accurate statement of any paragraph’s main idea.
• Select from four-option, multiple-choice items the best statement of a given paragraph’s main idea.
• Orally provide a reasonable paraphrase of any paragraph’s main idea.

Notice that each of these ways of operationalizing a student’s ability to discern a paragraph’s main idea revolves around the student’s chief cognitive task in main-idea detection: ferreting out the paragraph’s central message, whether implied or explicit. What you must remember as you consider how this or any other curricular aim has been operationalized by a test is that the specific testing approach being used is often only one way of operationalizing the covert knowledge, skills, or affect staked out in the form of a curricular aim. Nevertheless, each of these operationalizations is slightly different and, as a consequence, might lead you toward somewhat different sorts of instructional plans.

Dissimilar Assessments, Dissimilar Instruction

To illustrate how it is that different ways of assessing students’ curricular mastery will often lead to different instructional approaches, suppose you are a 4th grade teacher charged with promoting your students’ mastery of a state-stipulated language arts goal in reading described only as “increasing students’ ability to comprehend the main ideas in paragraphs.”

As we just saw, there are numerous ways that this fairly loose curricular aim might be operationalized. Let’s say you discover that a newly developed statewide achievement test for 4th graders attempts to measure this skill exclusively by designating particular paragraphs in somewhat lengthy “authentic” passages and then requiring students to select from four multiple-choice options the best statement of a designated paragraph’s main idea. Based on this operationalization, what kind of instruction would help students achieve this objective?

First, you’d definitely want your lessons to incorporate plenty of practice time dealing with the intellectual skills students need to master this sort of multiple-choice test item. You would need to help your students make the kinds of discriminations required to distinguish between “best” and “less than best” options in a multiple-choice test item. You might have your students read a paragraph and then lead an exploration into why it is that certain statements of the paragraph’s main idea are better or weaker than others. You’d also want to make sure your students understand that there is rarely only one way to state a given paragraph’s main idea. Knowing this, they will more readily see that they could display their understanding of a paragraph’s main idea (as operationalized by the state’s test) by being able to compare the merits of alternative statements of that paragraph’s central message.

But let’s say that you don’t think the statewide test demands enough from your 4th grade students. Let’s say you believe that a 4th grader should be able to read a paragraph, perhaps one of several in an extended passage, and then figure out “from scratch” the main idea of any paragraph that happens to have one. And yes, you’d surely want to teach your students that not every paragraph has a main idea!

At any rate, although the state has operationalized this language arts aim in one way (maybe because of the cost savings to be gained from a large-scale test that uses multiple-choice items), you decide to operationalize this main-idea reading skill in a different way, a way that suits your own instructional preferences. You decide to have your students read a paragraph, then write a single sentence that satisfactorily reflects the paragraph’s central message. In fact, you decide to develop your own test of students’ main-idea skills and administer it at both mid-year and near the close of the school year. Your test presents students with several multiparagraph passages, asks students to read each passage in its entirety, and then, for each paragraph that you designate, to write one well-worded sentence that suitably captures that paragraph’s main idea. If any designated paragraph does not have a main idea, your students must say so in writing.

Think about how this different way of operationalizing the curricular aim would require different instructional activities. For instance, you’d need to help your 4th graders analyze paragraphs to determine if there is a main idea. Then, for paragraphs containing a main idea, you’d need to provide guidance and practice related to isolating the paragraph’s central message from its supporting details so that students could learn how to produce pithy paraphrases. If you had used the state’s multiple-choice test as your guide, these obviously sensible approaches might not have occurred to you—and might not have seemed (or been) necessary for your students to demonstrate “mastery.”

What we learn from this example is that the same curricular aim, when operationalized differently, often requires dissimilar instructional approaches. For an additional illustration, take a look at Figure 2.2, which shows a typical 7th grade mathematics benchmark, lists two different ways to assess the benchmark’s mastery, and outlines the contrasting instructional activities that might proceed from each of these two testing tactics. The two assessment approaches shown in this figure are, of course, fairly simplistic, but I hope they illustrate that one’s teaching tactics ought to flow directly from one’s testing tactics.

Seeking Students’ Generalizable Mastery

You may already have picked up on a potential pitfall associated with test-operationalized curricular goals: the possibility that teachers might end up teaching their students to master only one kind of test-operationalized curricular aim. This kind of tunnel-vision instructional approach would indeed seriously shortchange students.

Remember, a test represents one way of getting evidence that you can use to make an inference about a student’s covert status with respect to a variable like the student’s skill in comprehending the main ideas in paragraphs.

Each assessment approach that’s chosen to operationalize a curricular aim carries with it implications for instructional design. Typically, tests requiring students to answer selected-response items (items in which students select from two or more already-present choices) will require meaningfully different instructional approaches than will tests requiring students to answer constructed-response items (items in which students must generate their answers).

[Figure 2.2: One benchmark, two testing tactics, and their instructional implications]

7th Grade Mathematics Benchmark: Students will be able to employ appropriate measures of central tendency (mean, median, and mode) when describing sets of numerical data.

Testing Tactic One: Given a brief written or oral description of a fictitious but realistic setting in which the central tendency of numerical data must be described, the student will write a one-word answer indicating whether a mean, median, or mode is the most appropriate way to represent the data’s central tendency.

Instructional Implications: Students must first be taught the reasons that each measure of central tendency would be preferred over its two counterparts for describing particular sets of numerical data. Then students should be supplied with ample guided and independent practice in (1) reading and listening to descriptions of lifelike situations involving numerical data and (2) supplying the name of the best measure of central tendency for describing such data.

Testing Tactic Two: Students will be asked (1) to write a brief description of each of the following types of central-tendency measures: mean, median, and mode, and then (2) to supply one written “real-world” example, of the student’s own choosing, in which each of these measures has been properly used.

Instructional Implications: As this testing tactic calls for students to restate a memorized definition of the mean, median, and mode, students would need to be provided with accurate definitions so that those definitions can be memorized. Then, because students will be asked to come up with an appropriate real-world example for each measure of central tendency, students should practice generating their own examples, which the teacher or other students could check for accuracy.
Any teacher trying to have students display main-idea mas-
tery in the form of both selected-response (the state’s multiple-choice
test) and constructed-response tests (the teacher’s “generate main
ideas from scratch” test) will need to take two dissimilar instructional
approaches.
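
To ground Figure 2.2’s benchmark in actual arithmetic, here is a minimal Python sketch of the three measures of central tendency and of the judgment Testing Tactic One asks for. The income figures are invented for illustration and do not come from the book.

    # Worked example: mean, median, and mode for a small, skewed data set.
    # The data are hypothetical; Python's statistics module does the math.
    import statistics

    # Nine household incomes (in thousands); one extreme outlier.
    incomes = [28, 30, 31, 31, 33, 34, 35, 36, 400]

    mean = statistics.mean(incomes)      # 658 / 9 = about 73.1
    median = statistics.median(incomes)  # middle of the sorted list = 33
    mode = statistics.mode(incomes)      # most frequent value = 31

    print(f"mean={mean:.1f}, median={median}, mode={mode}")

    # The judgment a Tactic One item would elicit: with an extreme outlier,
    # the median (33) describes a typical income far better than the mean
    # (about 73), so the one-word answer here should be "median."
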
As a teacher, what you first need to figure out for any curricular
aim is just what sort of cognitive demand will be imposed on students
by any test that’s being used to measure mastery of that curricular
aim. A cognitive demand describes the intellectual activities in which
a student must engage in order to be successful in dealing with the
tasks contained in the particular kind of assessment. The critical ques-
tion you need to ask yourself is this: “For students to truly master this
curricular aim, what must be going on inside their heads?”
If you ask that question after considering different types of tests
that could be used to measure a given content standard, you’ll often
find that the different types of tests impose different cognitive
demands on students. We saw this to be true for the fourth-grade
teacher who chose to measure students’ main idea skills in a manner
other than that used in a statewide test. Different assessments in that
instance called for different instructional activities.
Clearly, teachers want to teach their students to master important
cognitive skills in such a way that students’ skill-mastery is generaliz-
able. A student who possesses generalizable skill-mastery will be able
to apply that mastery in all sorts of in-school and out-of-school set-
tings. There is a real virtue, then, in employing diverse kinds of
assessments. In other words, try to measure a student’s skill-mastery
in several ways, some requiring the student to generate a correct
answer and some requiring the student to select a correct answer
from an array of presented options. Make sure that some of your con-
structed-response tasks call for written responses and others call for
oral responses. Use a kind of “mix and match” approach to educa-
tional testing so that you can get an accurate fix on your students’
generalizable mastery. The more diverse the assessment techniques
that you adopt, the stronger the inferences you can draw about
the essence of the cognitive demands those varied tests are imposing
on your students. Accordingly, the instruction you design will incor-
porate a variety of different explanations, different kinds of model-
ing, and different sorts of practice exercises.
Take another look at the three ways of measuring main idea com-
prehension identified on page 21. If I were promoting my own stu-
dents’ main-idea mastery, I’d initially consider all three assessment
tactics to help me get a better handle on the true cognitive demand
of this skill. Then I’d incorporate all three assessment-grounded
instructional approaches into my teaching, obliging my students to
show me the various ways that they could pull out the main idea
from any main-idea–containing paragraph they happened to
encounter. This is how generalized skill-mastery is born.
Of course, there are some skills for which it is difficult to come up
with genuinely diverse assessment tactics. For instance, if I wanted
my students to be able to write accurate direction-giving essays in
order to show a friend how to walk from Place X to Place Y, my chief
approach to test-based operationalizing of this skill would revolve
around a written test of students’ direction-giving skill—some form
of a written direction-giving essay. Multiple-choice items, matching
items, or True-False items simply wouldn’t fill the bill in this instance,
either to help me better understand the skill’s essence or as practice
exercises to help my students become more skilled writers of direc-
tion-giving essays.
Most of the time, though, you will find that it is possible to come
up with substantively different ways of measuring a student’s mastery
of individual content standards. Those diverse assessment tactics will
not only help you better understand what each content standard is
really seeking, but will also provide you with instructional cues about
how best to get your students to master each content standard in a
generalizable manner.
Teaching Toward Test-Represented Targets,
Not Toward Tests
As you’ve seen, educational tests can be used to clarify a curricular
outcome by exemplifying a way of using students’ overt perform-
ances to make inferences about students’ covert status regarding
sought-for knowledge, skills, or affect. Thus, a test functions as a rep-
resentation of the educational variable to be sought instructionally.
Let me say it again: A test is only a representation, meaning that
teachers must aim their instruction not at the tests, but toward the skill,
knowledge, or affect that those tests represent. This is such an impor-
tant point that I dare not race forward without pounding my measure-
ment tom-tom for just a moment more. Far too many folks in educa-
tion, often those who know the least about testing, ascribe far too much
importance to educational tests. Rather than seeing tests as a means
to a fairly accurate fix on an unseen educational variable, these people
believe that tests actually are the educational variable being promoted.
In short, they reify educational tests as the true target of their instruc-
tional efforts. The idea that tests are instructional targets, wrongheaded
as it is, is unfortunately bolstered by many high-stakes statewide
testing programs. Preoccupation with test scores becomes so profound
that many teachers and administrators mistakenly succumb to the
belief that increased test scores are appropriate educational targets.
They’re not.
Educators must recognize that tests are sometimes fallible tools
and that the only legitimate application of these tools is to help us
determine a student’s status with respect to a significant educational
variable. When this point is clear, most teachers will try to aim their
instruction toward what a test represents rather than toward the test
itself. Diverse types of tests will usually incline teachers to do just
this.
Recommended Resources
Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning,
teaching, and assessing: A revision of Bloom’s taxonomy of educational objec-
tives. New York: Longman.
Jacobs, H. H. (Program Consultant). (1991). Curriculum mapping: Charting the
course for content [Videotape]. Alexandria, VA: Association for Supervision
and Curriculum Development.
Kendall, J. S., & Marzano, R. J. (2000). Content knowledge: A compendium of
standards and benchmarks for K–12 education (3rd ed.). Alexandria, VA:
Association for Supervision and Curriculum Development; and Aurora,
CO: McREL.
Ohanian, S. (1999). One size fits few: The folly of educational standards.
Portsmouth, NH: Heinemann.
Popham, W. J. (Program Consultant). (2000). Test preparation: The wrong
way/right way [Videotape]. Los Angeles: IOX Assessment Associates.
INSTRUCTIONALLY FOCUSED TESTING TIPS
• Clarify the nature of any curricular aim by analyzing the assess-
ments intended to measure students’ attainment of that aim.
• Clarify your own understanding of a curricular aim by consider-
ing the various ways that students’ achievement of that aim
might be assessed.
• Promote your students’ mastery of any important curricular aim
by employing diverse assessment approaches to measure that
aim.
• Teach toward the skills or knowledge a test represents, not
toward the test itself.
3
Too Many Testing Targets
TEACHERS WANT TO FIND OUT WHAT’S GOING ON INSIDE STUDENTS’ HEADS. PUT-
ting it a bit more formally, teachers want to make inferences about
the nature of their students’ skills, knowledge, and affect. And that’s
the precise point at which the integral relationship between curricu-
lum and assessment takes center stage. You see, testing doesn’t take
place in a vacuum. More specifically, testing doesn’t take place in a
curriculum-free vacuum. Typically, teachers test students to get a fix
on their students’ status with respect to those cognitive or affective
targets that have been laid out in a prescribed curriculum. More often
than not, therefore, teachers find themselves trying to “test what’s in
the curriculum.”
As a result, most educators today also end up trying to solve an
essentially insolvable problem: how to satisfactorily assess students’
mastery of too many curricular aims imposed “from above.” There’s
an oft-uttered saying that you can’t make a silk purse out of a sow’s
ear. The gist of the saying’s message, of course, is that if you are given
unsuitable raw material, it’s impossible to transform that raw material
into something spectacular. When teachers are informed that they
should promote their students’ mastery of a huge number of state-
stipulated or district-stipulated curricular goals, many teachers actu-
ally try to do so. Unfortunately, that’s almost always a mistake.
Assessment and curriculum are not a pair of isolated enclaves,
each functioning happily without any reference to the other.
Teachers ought to test what students ought to learn and, yes, what’s
set forth in a curriculum should be the dominant determiner of the
sorts of classroom tests that teachers give to their students. But too
many educators reckon that assessment programs can blossom irre-
spective of the curricula from which they spring. State-level testing
does not occur apart from the test-developers’ attention to state cur-
ricula. And, just as surely, classroom-level testing should not occur
apart from a teacher’s attention to curricular considerations. Thus, if
teachers are dealing with a sow’s-ear curriculum, they’ll never be able
to create silk-purse assessments. It’s important, therefore, to know
what you can do if you find yourself forced to cope with a curriculum
prescription that contains too many pills in its bottle. In this chapter,
I’ll look at the origins of the problem and then offer my recommen-
dations on how to solve it in your own classroom.
Standards-Based Reform
A half-century ago, almost every U.S. state set out an officially
approved curriculum for its schools. This authorized curriculum,
sometimes called a “state curriculum syllabus,” identified the knowl-
edge and skills that the state’s teachers were supposed to promote at
different grade levels (or grade ranges) and in most subjects. Frankly,
few educators paid much attention to those curricular documents.
State curricular syllabi nestled quietly in teachers’ desk drawers, rarely
to be opened.
In the 1990s, many educational policymakers began to proclaim
the virtues of standards-based reform as a new strategy to reform pub-
lic schools. Here’s how it was intended to work. First, a group of expe-
rienced teachers and curriculum specialists in each state met to stake
out a set of challenging content standards for each grade level, or for
grade ranges, in all key subjects. (In most instances, these collections
of content standards were dead ringers for yesteryear’s curricular
syllabi.) Then, a statewide test was either selected or constructed to
assess students’ mastery of those content standards. Finally, students’
test performances were employed to identify the schools (and teach-
ers) whose students were achieving high scores or low scores. High-
scoring schools were regarded as winners; low-scoring schools were
regarded as losers.
The underlying strategy of standards-based reforms rested solidly
on the proposition that by administering standards-based tests annu-
ally to students at selected grade levels, the state’s teachers would be
spurred to effectively promote students’ content-standard mastery (as
reflected by higher test scores). Those higher test scores would, of
course, signify improved student mastery of the state’s challenging
content standards.
Unfortunately, there turned out to be several significant short-
comings in the way that the educational officials in most states tried
to implement standards-based reform. For one thing, typically, the
quality of the alignment between state-approved content standards
and state-approved achievement tests has been very weak. Rather
than devote scarce educational funds to the development of cus-
tomized tests based exclusively on a state’s curriculum, many states
simply chose to use an off-the-shelf standardized achievement test.
Such tests were selected despite the fact that many of the state’s
approved content standards appeared to be addressed by only one or
two items in those off-the-shelf tests. And, worse, in some instances
there were state content standards that weren’t measured by even one
item in those already-built standardized achievement tests.
Rarely, however, do even custom-built statewide standards-based
tests include sufficient numbers of items to provide evidence of a stu-
dent’s status with respect to the mastery of particular content stan-
dards. That’s a serious shortcoming, because unless teachers, students,
and parents can determine which content standards are being mas-
tered and which ones are not, then standards-based reform is doomed
to be little more than an attractive but meaningless bit of rhetoric.
You see, if we don’t know which content standards have been suc-
cessfully promoted, then teachers can’t tell whether particular parts of
their instructional programs have been effective or ineffective. But
there’s an equally serious deficiency with most of today’s standards-
based reform efforts, and that problem lies in the curricular corner-
stones of this entire reform strategy: the content standards themselves.
Typically, the people who carve out the content standards for any
given state are a group of experienced educators, perhaps 20–30 such
individuals, who have been selected for the task because of their
expertise in the subject areas for which the content standards are
being identified. For instance, the Content Standards Committee of
25 educators in State X who have been directed to identify the knowl-
edge and skills in science that State X students in grades 4–8 should
master is typically composed of science-education specialists.
Specialists, as we all know, tend to love their specialties. That’s the
strength and the weakness of the way that most states have identified
their content standards. When subject-matter specialists are asked to
isolate what students need to know about their specialty, the special-
ists’ response will almost always be “Everything!” In other words,
because the people who take part in the identification of content
standards typically revere their subject fields, they often identify all
the knowledge and all the skills that they wish well-taught children
at a given age would possess.
The net effect of this kind of wish-list curricular thinking is that,
in most states, the number of content standards officially approved
for most subjects is enormous. These wish-list content standards sim-
ply aren’t teachable in the instructional time available to the class-
room teachers. It’s equally unrealistic to think that these litanies of
content standards can be satisfactorily assessed by an annual state-
wide test of 50–60 items per subject area. In most instances, only a
small proportion of these lengthy lists of state-approved content
standards are measured at all, even by one puny item.
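
A back-of-the-envelope calculation shows just how lopsided the arithmetic is. The figures below are hypothetical, chosen only to match the magnitudes just described (hundreds of standards, a 50-to-60-item annual test):

    # Hypothetical coverage check: wish-list standards versus test items.
    num_standards = 300      # state-approved content standards, one subject
    num_test_items = 60      # items on the annual statewide test
    items_per_standard = 3   # rough minimum for a per-standard estimate

    # Even if every item targeted a different standard:
    touched = num_test_items / num_standards
    print(f"At most {touched:.0%} of the standards get even one item.")

    # If judging per-standard mastery takes roughly three items apiece:
    measurable = (num_test_items // items_per_standard) / num_standards
    print(f"Only {measurable:.0%} could be measured with ~3 items each.")
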
Think of the teachers in states where wish-list content standards
abound. You can be certain that there is nothing even approximating
total measurement of these standards. Unquestionably, therefore,
these teachers will face some absurd choices. The educational-reform
strategy in place has likely attracted substantial political support in
high places. (Politicians typically revel in rallying behind the “pro-
motion of high standards.”) Moreover, most citizens joyfully applaud
the merits of any accountability strategy aimed at promoting stu-
dents’ mastery of “demanding” content standards. And yet, the two
most important components of this accountability strategy typically
malfunction. The content standards are far too numerous to teach,
and far, far too numerous to test. The culminating high-stakes tests
used to assess students’ mastery of those standards don’t actually
measure those standards and, worse, don’t help teachers make defen-
sible instructional decisions.
Coping with Unrealistic Curricular Demands
Teachers who are being asked to promote their students’ mastery of
an unwieldy array of state-sanctioned content standards, and to do so
without receiving test results that reveal students’ standard-by-
standard status, are in an untenable instructional situation. But even
though such expectations may be unrealistic, those expectations are
still very real. And there are adverse consequences for teachers who
fall short. How, then, should teachers cope?
My advice is to select a coping strategy, given your available alter-
natives, that is most likely to benefit your students. One popular but
unwise response to a rambling, excessive set of standards-based
expectations—an aiming-for-a-silk-purse response that most certainly
won’t help students—is to try to “cover” all the state-stipulated stan-
dards. Shallow coverage, almost by definition, will rarely lead to deep
understanding, especially when teachers must race on to tomorrow’s
content standards before students have the chance to master today’s.
It would be far better, of course, if a state’s policymakers would
realize that they’ve given teachers too much to teach (and test) in the
time available, and, after some serious curricular soul-searching, they
would attach estimates of importance to their too-long sets of con-
tent standards. Thereafter, the state’s teachers could determine which
curricular outcomes were really the most imperative to foster. What’s
more, a more modest number of super-significant content standards
would make it possible to create standards-based tests that could
report students’ results on a standard-by-standard basis, thus provid-
ing the explicit standards-mastery information that teachers could
use to refine or revise their instructional approaches.
However, many teachers will find that their own state’s standards-
based reform strategy has not been improved along these lines. So,
given this sort of unhappy situation, my best advice to teachers is to
prioritize for yourself. Either on your own or (even better) in collegial
teams, review the complete set of content standards for which you
are responsible, and set your own priorities.
Here’s how I think the prioritizing game should be played. First,
consider every single content standard on the state- or district-
approved list for the subject area or areas in which you teach. For each
content standard in your subject, assign one of three ratings:
• Essential. It is absolutely necessary for my students to have mastered
this content standard by the end of my instruction.
• Highly desirable. It is very important for my students to have mastered
this content standard by the end of my instruction.
• Desirable. If possible, I would like my students to have mastered
this content standard by the end of my instruction.
After you’ve rated all the content standards, then you need to rank
only the ones that you already rated as “essential.” Rank all of those
essential content standards from most important (#1 on your list) to
next most important (#2), and so on. Then, design your instruction
so that you can do a thorough and effective job of promoting your
students’ mastery of as many of the highest prioritized content stan-
dards as you can promote in the instructional time you have avail-
able. Naturally, you’d like to have your students learn as many good
things as they possibly can, so you’ll also want to give at least some
instructional attention to those content standards you regarded as
“highly desirable” and “desirable.” But any such instructional atten-
tion need not be as extensive as the instructional attention you
devote to your highest-priority essential content standards.
Remember, what you’re trying to do is to get your students to master
the knowledge and skills that, in your professional judgment, are the
“very most important.”
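
For teachers who like to keep such bookkeeping in a spreadsheet or a few lines of code, the following Python sketch mirrors the rating-then-ranking procedure just described. The standards, ratings, and ranks are invented placeholders.

    # A minimal sketch of the prioritizing game: rate every standard, then
    # rank only the "essential" ones. All entries are hypothetical.
    standards = [
        # (content standard, rating, rank among essentials or None)
        ("Summarize a paragraph's main idea", "essential", 1),
        ("Choose appropriate measures of central tendency", "essential", 2),
        ("Identify persuasive techniques in ads", "highly desirable", None),
        ("Recognize haiku structure", "desirable", None),
    ]

    # The essentials, ordered from most important (#1) downward, get
    # thorough instruction and formal, systematic assessment.
    essentials = sorted((s for s in standards if s[1] == "essential"),
                        key=lambda s: s[2])
    for name, _, rank in essentials:
        print(f"#{rank}: {name}")

    # "Highly desirable" and "desirable" standards get whatever time
    # remains, with lighter, less formal assessment.
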
And what about assessment? Well, although we’ve seen that total
measurement of innumerable content standards is impossible, espe-
cially if one is trying to get a fix on students’ per-standard mastery, it
is definitely possible to assess a more limited number of high-priority
content standards. And that’s just what you need to do.
In the last chapter, I pointed out how a somewhat vague content
standard can be operationalized by the assessment (or, preferably, by
the multiple assessments) employed to measure mastery of that vague
content standard. Consistent with that advice, what you need to do
is come up with suitable assessments for each of the high-priority con-
tent standards you’ve decided to tackle instructionally. The assess-
ments you choose will typically lead you to better instructional deci-
sions because of the increased clarity that those “operationalizing”
assessments will provide. Moreover, if you use pre-assessments to
determine what your students already know and can do, you’ll be
better able to choose suitable instructional activities for those stu-
dents. At the end of your instruction, of course, you’ll be able to get
a test-based estimate of how well your students have mastered each
high-priority content standard that you’ve set out to pursue.
Additionally, if you use any sort of less-formal en route assessments,
perhaps you’ll find that your students have mastered some content
standards more rapidly than you’d anticipated, thus enabling you to
move on to your next high-priority standard and, ultimately, give
instructional attention to other important standards that did not
make your initial cut.
Counteracting the Myth of Total Measurement
It is impossible, in real schools, to assess students’ attainment of all
the lovely curricular aims we’d like those students to accomplish.
“Total measurement” is a myth. Moreover, the pretense that a
lengthy list of content standards can somehow be accurately meas-
ured breeds a corollary myth of “total instruction.” It is just as silly to
think that teachers can effectively promote students’ mastery of
seemingly endless content standards as it is to think that all those
content standards can be properly assessed.
Prioritizing offers a reasonable way out of this instructional
dilemma. (To my mind, it is the only way out of this bind.) Teachers
need to prioritize a set of content standards so they can identify the
content standards at which they will devote powerful, thoroughgoing
instruction, and then they need to formally and systematically assess
students’ mastery of only those high-priority content standards. Time
permitting, instructional attention to other, lower-priority content
standards could certainly be worked in. And other, less systematic
assessments could also be used to gauge students’ mastery of those
same lower-priority content standards.
Some might object to the prioritization strategy I have recom-
mended here because it seems to suggest that a teacher should be
doing less with students—should be focusing on fewer curricular aims.
To such criticism, I reply that there’s a whopping big difference
between content standards that are simply sought and content stan-
dards that are truly taught. Indeed, when teachers unthinkingly set
out to seek student mastery of so many curricular outcomes, what is
sought will, in fact, rarely be taught. The more ground that teachers
try to cover, the less likely it is they’ll be able to help students gain
deep understandings and facility with complex skills. That’s what
“challenging” content standards call for, and that’s what everyone
really wants.
I’m suggesting that teachers, and the administrators they report
to, need to be honest about what’s instructionally accomplishable in
a finite amount of teaching time. And that kind of honesty leads me
to encourage you to do a serious prioritizing job on any curricular
aims that have been officially tossed into your lap. If the pressure to
“cover all content standards” is really severe, then you may wish to
give really rapid-fire treatment to the whole set of content standards,
but give your genuine instructional and assessment attention to only
the curricular aims you believe to be truly of the highest significance.
My proposed prioritization strategy was initially spawned in
response to ill-conceived, state reform strategies built around exces-
sively lengthy lists of content standards. Please recognize, however,
that this prioritization strategy can work just as effectively when
teachers are trying to figure out how best to organize their own
instruction and to evaluate its effectiveness. After all, even if a teacher
had no official state-imposed content standards to wrestle with, that
teacher would still have to decide what to emphasize instructionally
and what to emphasize from an assessment perspective. There’s lots
of stuff to teach, and there’s lots of stuff to test. Teachers will do a bet-
ter job for their students if they devote serious effort to identifying
the most significant things that can be taught in the time available.
Then those high-priority curricular aims can be well taught and well
tested. A prioritization strategy is clearly predicated on a less-is-more
approach to teaching and testing. And it works!
A Labyrinth of Labels
Now I want to change my focus for just a bit. In this chapter, I’ve tried
to tackle a curricular problem that is causing today’s teachers a heap
of instructional and assessment headaches. The essence of my mes-
sage is that excessive curricular aspirations can make instructionally
sensible assessment impossible. Thus, when teachers are tussling with
their own testing issues, it’s rare that curriculum considerations aren’t
involved.
And because that’s so, I want to close out the chapter with a brief
treatment of several curriculum-related labels that today’s classroom
teachers are almost sure to encounter. As you’ll see, teachers who are
baffled by the meaning of these labels will rarely be able to make the
best curricular or assessment decisions.
In the previous chapter, we saw that there are lots of ways to
describe curricular aims: goals, objectives, expectancies, benchmarks,
learning outcomes, and so on. Today’s most popular label is content
standards. In the No Child Left Behind Act, curricular aims are described
as academic content standards. Typically, classroom teachers pursue con-
tent standards stipulated by state or district authorities, although many
teachers sometimes add on a few noncompulsory content standards
that they personally regard as worthwhile.
There is one strong possibility of semantic confusion these days,
however, and that stems from the use of another descriptive label:
performance standards.* A performance standard refers to the required
level of proficiency students are expected to display when they have
mastered a content standard. For example, if a content standard calls
for a student to write a coherent paragraph, a performance standard
indicates how well that paragraph must be written. Or, if a content
standard focuses on the student’s acquisition of knowledge—say,
knowing the definitions of 50 biological terms—a performance stan-
dard describes how many of the 50 terms a student would need to
know in order to have “mastered” the content standard. Thirty-five
terms? Forty terms? Forty-five?
*The authors of the No Child Left Behind Act used the term academic achievement standards
rather than “performance standards.” As the nation’s educators become more familiar with
NCLB requirements, this phrase may become more widely used. However, for the time being,
I’m going to stick with the more commonly employed terminology, namely, performance
standards.
A content standard without an accompanying performance stan-
dard is like a play without a final act. You can’t tell what the content
standard really signifies until you know how students’ mastery of it
will be assessed and, thereafter, until you know what sort of per-
formance standard has been set on that assessment. Even the loftiest
content standard can be transformed into a trifling expectation if the
bar is set very low. Likewise, a trivial content standard can become a
challenging, praiseworthy expectation when it’s accompanied by a
super-stringent performance standard. For instance, if elementary
school students were to be able to name the capitals of all 50 states
(the content standard), you could set a mandatory 100 percent cor-
rect level (the performance standard) for proficient performance. The
result would be a tough task for students, even though the task deals
with a rather trivial curricular aim.
Increasingly, U.S. educators are building performance standards
along the lines of the descriptive categories used in the National
Assessment of Educational Progress (NAEP), a test administered periodi-
cally under the auspices of the federal government. NAEP results per-
mit students’ performances in participating states to be compared, so
that it can be seen which states’ students outperform which other
states’ students. (Given Americans’ abiding preoccupation with win-
ners and losers, it’s surprising that no one has thought to report NAEP
results in newspapers’ sports sections.) At any rate, since 1990, NAEP
results have been described in four performance categories: advanced,
proficient, basic, and below basic. Most of the 50 states now use those
four categories or labels quite similar to them. For example, if stu-
dents were taking a statewide examination consisting of 65 multiple-
choice items, the performance standards for that test could be set by
deciding how many of the 65 items must be answered correctly for a
student to be classified as advanced, how many items for proficient,
and so on.
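
In practice, setting performance standards on such a test boils down to choosing cut scores and mapping each raw score to a label. In the Python sketch below, the 65-item test and the NAEP-style labels come from the passage above, but the specific cut scores are invented:

    # Map a raw score on a 65-item test to a NAEP-style performance level.
    # The cut scores are hypothetical; a state panel would actually set them.
    CUTS = [           # (minimum items correct, label), highest first
        (55, "advanced"),
        (42, "proficient"),
        (30, "basic"),
        (0,  "below basic"),
    ]

    def performance_level(items_correct: int) -> str:
        for minimum, label in CUTS:
            if items_correct >= minimum:
                return label
        raise ValueError("score cannot be negative")

    print(performance_level(58))  # advanced
    print(performance_level(41))  # basic

    # Drop the "proficient" cut from 42 to 33, and that same raw score
    # of 41 suddenly counts as proficient.
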
My point is that performance standards are malleable, and you
never know what something like “basic” means until you read the
fine-print description of that level of performance. For example, the
No Child Left Behind Act calls for states to establish at least three
levels of academic achievement standards (advanced, proficient, and
basic) and to demonstrate, over time, state-decreed increases in the
proportion of students deemed “proficient” or above. Because the
thrust of NCLB is to get all students to be proficient or advanced,
the legislation describes the basic level as “the progress of lower-
achieving children toward mastering the proficient and advanced
levels of achievement.” However, each state is allowed to define “pro-
ficient” in its own way. And because there are significant negative
sanctions for schools that fail to get enough students to score at the
proficient levels on NCLB tests, in some states there have been
remarkably lenient levels of “proficiency” established.
Clearly, because the two kinds of standards (content and per-
formance) have such distinctive purposes, teachers need to avoid the
confusion that’s likely to arise when one simply refers to “standards”
without a preceding adjective. I encourage you to avoid talking about
“high standards” or “challenging standards” because, as you’ve just
seen, if we really want our students to accomplish worthwhile things,
we’ll need high-level content standards accompanied by high-level
performance standards.
INSTRUCTIONALLY FOCUSED TESTING TIPS
• Learn which content standards are required by your state or
district.
• Determine if those content standards are accompanied by per-
formance standards.
• Prioritize any officially sanctioned content standards.
• Systematically assess only the high-priority content standards.
• Give major instructional attention to only the high-priority con-
tent standards.
Recommended Resources
Falk, B. (2000). The heart of the matter: Using standards and assessment to learn.
Westport, CT: Heinemann.
Jacobs, H. H. (Program Consultant). (1991). Curriculum mapping: Charting the
course for content [Videotape]. Alexandria, VA: Association for Supervision
and Curriculum Development.
Kohn, A. (1999). The schools our children deserve: Moving beyond traditional
classrooms and “tougher standards.” Port Chester, NY: National Professional
Resources, Inc.
Kohn, A. (Program Consultant). (2000). Beyond the standards movement:
Defending quality education in an age of test scores [Videotape]. Port Chester,
NY: National Professional Resources, Inc.
Linn, R. L. (2000, March). Assessments and accountability. Educational
Researcher, 29(2), 4–16.
4
Validity, Reliability, and Bias
FEW IF ANY SIGNIFICANT ASSESSMENT CONCEPTS ARE COMPLETELY UNRELATED TO
a teacher’s instructional decision making. To prove it, in this chapter
I’m going to trot out three of the most important measurement
ideas—validity, reliability, and assessment bias—and then show you
how each of them bears directly on the instructional choices that
teachers must make.
These three measurement concepts are just about as important as
measurement concepts can get. Although teachers need not be meas-
urement experts, basic assessment literacy is really a professional obli-
gation, and this chapter will unpack some key terminology and clar-
ify what you really need to know.
It is impossible to be assessment literate without possessing at
least a rudimentary understanding of validity, reliability, and assess-
ment bias. They are the foundation for trustworthy inferences. As
teachers, we can’t guarantee that any test-based inference we make is
accurate, but a basic understanding of validity, reliability, and assess-
ment bias increases the odds in our favor. And the more we can trust
our test-based inferences, the better our insight into students and the
better our test-based decisions. Also, in this age of accountability, the
fallout from invalid inferences could be invalid conclusions by
administrators, by parents, and by politicians that you and your
school are doing a sub-par job.
Validity
At the very apex of all measurement concepts is the notion of valid-
ity. Indeed, the concept of validity almost always finds its way into
any conversation about educational testing. Well, in the next few
paragraphs you’ll learn that there is no such thing as a valid test.
The Validity of Inferences
The reason that there’s no such thing as a valid test is quite straight-
forward: It’s not the test itself that can be valid or invalid but, rather,
the inference that’s based on a student’s test performance. Is the score-
based inference that a teacher has made a valid one? Or, in contrast,
has the teacher made a score-based inference that’s invalid? All valid-
ity analysis should center on the test-based inference, rather than on
the test itself. Let’s see how this inference-making process works and,
thereafter, how educators can determine if their own score-based
inferences are valid or not.
You’ll remember from Chapter 1 that educators use educational
tests to secure overt evidence about covert variables, such as a stu-
dent’s ability to spell, read, or perform arithmetic operations. Well,
even though teachers can look at students’ overt test scores, they’re
still obliged to come up with the interpretation about what those test
scores mean. If the interpretation is accurate, we say that the teacher
has arrived at a valid test-based inference. If the interpretation is inac-
curate, then the teacher’s test-based inference is invalid.
You might be wondering, why is this author making such a big
fuss about whether it’s the test or the test-based inference that’s valid?
Well, if a test can be labeled as valid or invalid, then it is surely true
that assessment accuracy resides in the test itself. By this logic, that
test would yield unerringly accurate information no matter how it’s
used or to whom it’s administered. For instance, let’s say a group of
Ethiopian test-developers have created a brand new science test for
children. Because the test was developed by Ethiopians in Ethiopia,
the test items and text passages are in Amharic, the official language
of that nation. Now, if that test were administered to Ethiopian chil-
dren, the scores would probably yield valid inferences about these
children’s science skills and knowledge. But if it were administered to
English-speaking children in a Kansas school district, any test-based
inferences about those children’s science skills and knowledge would
be altogether inaccurate. The test itself would yield accurate interpre-
tations in one setting with one group of test-takers, yet inaccurate
interpretations in another setting, with another group of test-takers.
It is the score-based inference that is accurate in Ethiopia, but inaccu-
rate in Kansas. It is not the test.
The same risk of invalid inferences applies in any testing situation
where factors interfere with test-takers’ ability to demonstrate what
they know and can do. For example, consider a 14-year-old gifted
writer recently arrived from El Salvador who cannot express herself in
English; a trigonometry student with cerebral palsy who cannot con-
trol a pencil well enough to draw sine curves; a 6th grader who falls
asleep 5 minutes into the test and only completes 10 of the 50 test
items. Even superlative tests, when used in these circumstances or
other settings where extraneous factors can diminish the accuracy of
score-based inferences, will often lead to mistaken interpretations. If
you recognize that educational tests do not possess some sort of
inherent accurate or inaccurate essence, then you will more likely
realize that assessment validity rests on human judgment about the
inferences derived from students’ test performances. And human
judgment is sometimes faulty.
The task of measurement experts who deal professionally with
assessment validity, then, is to assemble evidence that a particular
score-based inference made in a particular context is valid. With tests
used in large-scale assessments (such as a statewide, high school grad-
uation test), the experts’ quest is to assemble a collection of evidence
that will support the validity of the test-based inferences that are
drawn from the test. Rarely does a single “validity study” supply suf-
ficiently compelling evidence regarding the validity of any test-based
inference. In most cases, to determine whether any given type of
score-based inference is on the mark, it is necessary to consider the
collection of validity evidence in a number of studies.
Three Varieties of Validity Evidence
There are three kinds of validity evidence sanctioned by the relevant
professional organizations: (1) criterion-related evidence, (2) construct-
related evidence, and (3) content-related evidence. Each type, usually col-
lected via some sort of investigation or analytic effort, contributes to
the conclusion that a test is yielding data that will support valid infer-
ences. Typically, such investigations are funded by test-makers before
they bring their off-the-shelf product to the market. In other in-
stances, validity studies are required by state authorities prior to test-
selection or prior to “launching” a test they’ve commissioned. We’ll
take a brief peek at each evidence type, paying particular attention to
the one that should most concern a classroom teacher.
Before we do that, though, I need to call your attention to an
important distinction to keep in mind when dealing with educa-
tional tests: the difference between an achievement test and an aptitude
test. An achievement test is intended to measure the skills and know-
ledge that a student currently possesses in a particular subject area.
For instance, the social studies test in the Metropolitan Achievement
Tests is intended to supply an idea of students’ social studies skills
and knowledge. Classroom tests that teachers construct to see how
much their students have learned are other examples of achievement
tests. In contrast, an aptitude test is intended to help predict a stu-
dent’s future performance, typically in a subsequent academic set-
ting. The best examples of this test type are the widely used ACT and
SAT, which are supposed to predict how well high school students
will perform when they get to college.
As you’ll read in Chapter 9, there are many times when these two
supposedly different kinds of educational tests actually function in an
almost identical manner. Nevertheless, it’s an important terminology
difference that you need to know, and it comes into particular focus
regarding the first type of validity evidence, which we’re going to
examine right now.
Criterion-related evidence of validity. This kind of validity concerns
whether aptitude tests really predict what they were intended to pre-
dict. Investigators directing a study to collect criterion-related evi-
dence of validity simply administer the aptitude test to high school
students and then follow those students during their college careers
to see if the predictions based on the aptitude test were accurately
predictive of the criterion. In the case of the SAT or ACT, the criterion
would be those students’ college grade-point averages. If the rela-
tionship between the predictive test scores and the college grades is
strong, then this finding constitutes criterion-related evidence sup-
porting the validity of score-based inferences about high school stu-
dents’ probable academic success in college.
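
For readers who want to see the core computation in such a study, here is a minimal sketch: correlate the aptitude scores with the later criterion. The ten score/GPA pairs are fabricated purely for illustration, and the statistics.correlation function requires Python 3.10 or later.

    # Criterion-related validity evidence in miniature: how strongly do
    # aptitude-test scores predict the criterion (college GPA)?
    # All data below are fabricated for illustration.
    import statistics

    sat_scores   = [900, 1020, 1100, 1150, 1210, 1280, 1330, 1400, 1460, 1550]
    college_gpas = [2.1, 2.4, 2.6, 2.9, 2.8, 3.1, 3.3, 3.2, 3.6, 3.8]

    # Pearson's r near +1.0 means the test predicts the criterion well,
    # supporting the validity of score-based predictions.
    r = statistics.correlation(sat_scores, college_gpas)
    print(f"r = {r:.2f}")
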
Clearly, classroom teachers do not have the time to collect
criterion-related evidence of validity, especially about aptitude
tests. That’s a task better left to assessment specialists. Few teachers I
know have ever been mildly tempted to create an aptitude test, much
less collect criterion-related validity evidence regarding that test.
Construct-related evidence of validity. Most measurement specialists
regard construct-related evidence as the most comprehensive form of
validity evidence because, in a sense, it covers all forms of validity
evidence. To explain what construct-related evidence of validity is, I
need to provide a short description of how it is collected.
The first step is identifying some sort of hypothetical construct,
another term for the covert variable sought. As we’ve learned, this
can be something as exotic as a person’s “depression potential” or as
straightforward as a student’s skill in written composition. Next,
based on the understanding of the nature of the hypothetical
construct identified, a test to measure that construct is developed,
and a study is designed to help determine whether the test does, in
fact, measure the construct. The form of such studies can vary sub-
stantially, but they are all intended to supply empirical evidence that
the test is behaving in the way it is supposed to behave.
Here’s what I hope is a helpful example. One kind of construct-
related validity study is called a differential population investigation.
The validity investigator identifies two groups of people who are
almost certain to differ in the degree to which they possess the con-
struct being measured. For instance, let’s say a new test is being inves-
tigated that deals with college students’ mathematical skills. The
investigator locates 25 college math majors and another 25 college
students who haven’t taken a math course since 9th grade. The inves-
tigator then predicts that when all 50 students take the new test, the
math whizzes will blow away the math nonwhizzes. The 50 students
take the test, and that’s just the way things turn out. The investiga-
tor’s prediction has been confirmed by empirical evidence, so this
represents construct-related evidence that, yes, the new test really
does measure college students’ math skills.
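
Stripped to its skeleton, a differential population investigation looks like the sketch below: give the new test to both groups and check that the group that should possess more of the construct actually outscores the other. The scores are fabricated.

    # Differential population investigation, in miniature. Scores invented.
    import statistics

    math_majors = [88, 92, 85, 95, 90, 91, 87, 93, 89, 94]
    non_majors  = [55, 62, 48, 70, 58, 65, 52, 60, 67, 57]

    gap = statistics.mean(math_majors) - statistics.mean(non_majors)
    print(f"mean difference = {gap:.1f} points")

    # The prediction is confirmed if the majors clearly outscore the
    # non-majors; a real study would also test statistical significance.
    assert gap > 0, "prediction not confirmed; evidence does not support test"
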
There are a host of other approaches to the collection of
construct-related validity evidence, and some are quite exotic. As a
practical matter, though, busy classroom teachers don’t have time to
be carrying out such investigations. Still, I hope it’s clear that because
all attempts to collect validity evidence really do relate to the exis-
tence of an unseen educational variable, most measurement special-
ists believe that it’s accurate to characterize every validity study as
some form of a construct-related validity study.
Content-related evidence of validity. This third kind of validity evi-
dence is also (finally) the kind that classroom teachers might wish to
collect. Briefly put, this form of evidence tries to establish that a test’s
items satisfactorily reflect the content the test is supposed to repre-
sent. And the chief method of carrying out such content-related
validity studies is to rely on human judgment.
I’ll start with an example that shows how to collect content-
related evidence of validity for a district-built or school-built test.
Let’s say that the English teachers in a high school are trying to build
a new test to assess students’ mastery of four language arts content
standards—the four standards from their state-approved set that they,
as a group, have decided are the most important. They decide to
assess students’ mastery of each of these content standards through a
32-item test, with 8 items devoted to each of the 4 standards.
The team of English teachers creates a draft version of their new
test and then sets off in pursuit of content-related evidence of the
draft test’s validity. They assemble a review panel of a dozen individ-
uals, half of them English teachers from another school and the other
half parents who majored or minored in English in college. The
review panel meets for a couple of hours on a Saturday morning, and
its members make individual judgments about each of the 32 items,
all of which have been designated as primarily assessing one of the 4
content standards. The judgments required of the review panel all
revolve around the four content standards that the items are suppos-
edly measuring. For example, all panelists might be asked to review
the content standards and then respond to the following question for
each of the 32 test items:
Will a student’s response to this item help teachers determine
whether a student has mastered the designated content standard
that the item is intended to assess?
Yes            No            Uncertain
After the eight items linked to a particular content standard have
been judged by members of the review panel, the panelists might be
asked to make the following sort of judgment:
Considering the complete set of eight items intended to measure
this designated content standard, indicate how accurately you
believe a teacher will be able to judge a student's content-standard
mastery based on the student's responses to these eight items.
Very Accurately       Somewhat Accurately       Not Too Accurately
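Tabulating a dozen panelists' judgments across 32 items is simple bookkeeping. Here is a minimal sketch of how it might be done; the responses and the 75 percent comfort threshold are hypothetical, not part of any standard procedure.

```python
# A minimal sketch of summarizing a review panel's item judgments.
# The responses and the 75 percent threshold are hypothetical.
from collections import Counter

# item number -> the 12 panelists' "Yes"/"No"/"Uncertain" responses
judgments = {
    1: ["Yes"] * 10 + ["Uncertain"] * 2,
    2: ["Yes"] * 6 + ["No"] * 4 + ["Uncertain"] * 2,
    # ... items 3 through 32 would follow in the same fashion
}

for item, responses in judgments.items():
    tally = Counter(responses)
    pct_yes = 100 * tally["Yes"] / len(responses)
    flag = "  <-- revisit this item" if pct_yes < 75 else ""
    print(f"Item {item}: {dict(tally)} ({pct_yes:.0f}% Yes){flag}")
```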
It’s true that the process I’ve just described represents a consider-
able effort. That’s okay for state-developed, district-developed, or
school-developed tests . . . but how might individual teachers go
about collecting content-related evidence of validity for their own,
individually created classroom tests? First, I’d suggest doing so only
for very important tests, such as midterm or final exams. Collecting
content-related evidence of validity does take time, and it can be a
ton of trouble, so do it judiciously. Second, you can get by with a
small number of validity judges, perhaps a colleague or two or a par-
ent or two. Remember that the essence of the judgments you’ll be
asking others to make revolves around whether your tests satisfacto-
rily represent the content they are supposed to represent. The more
“representative” your tests, the more likely it is that any of your test-
based inferences will be valid.
Ideally, of course, individual teachers should construct classroom
tests from a content-related validity perspective. What this means, in
practical terms, is that teachers who develop their own tests
should be continually attentive to the curricular aims each test is
supposed to represent. It’s advisable to keep a list of the relevant
objectives on hand and create specific items to address those objec-
tives. If teachers think seriously about the content-representativeness
of the tests they are building, those tests are more likely to yield valid
score-based inferences.
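A quick, informal way to act on this advice is to count how many items address each curricular aim as the test takes shape. The sketch below is hypothetical throughout: the objectives, the item counts, and the four-item floor are all invented for illustration.

```python
# A minimal sketch of checking a draft test's content representativeness.
# Objectives, item counts, and the four-item floor are all hypothetical.
items_by_objective = {
    "Write a coherent persuasive paragraph": 8,
    "Identify the main idea of a passage": 8,
    "Use correct punctuation and capitalization": 8,
    "Revise sentences for clarity": 2,  # thin coverage
}

total = sum(items_by_objective.values())
for objective, n_items in items_by_objective.items():
    share = n_items / total
    note = "  <-- add items or narrow the inference" if n_items < 4 else ""
    print(f"{n_items:>2} items ({share:.0%})  {objective}{note}")
```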
Figure 4.1 presents a summing-up graphic depiction of how students'
mastery of a content standard is measured by a test, along with the
three kinds of validity evidence that can be collected to support the
validity of a test-based inference about students' mastery of that
content standard.

[Figure 4.1. How a Test-Based Inference Is Made and
How the Validity of This Inference Is Supported]
In contrast to measurement specialists, who are often called on to
collect all three varieties of validity evidence (especially for important
tests), classroom teachers rarely collect any validity evidence at all.
But it is possible for teachers to collect content-related evidence of
validity, the type most useful in the classroom, without a Herculean
effort. When it comes to the most important of your classroom
exams, it might be worthwhile for you to do so.
Validity in the Classroom
Now, how does the concept of assessment validity relate to a teacher’s
instructional decisions? Well, it’s pretty clear that if you come up
with incorrect conclusions about your students’ status regarding
important educational variables, including their mastery of particular
content standards, you’ll be more likely to make unsound instruc-
tional decisions. The better fix you get on your students’ status
(especially regarding such unseen variables as their cognitive skills),
the more defensible your instructional decisions will be. Valid
assessment-based inferences about students don’t always translate
into brilliant instructional decisions; however, invalid assessment-
based inferences about students almost always lead to dim-witted or,
at best, misguided instructional decisions.
Let me illustrate the sorts of unsound instructional decisions that
teachers can make when they base those decisions on invalid test-
based inferences. Unfortunately, I can readily draw on my own class-
room experience to do so. When I first began teaching in that small,
eastern Oregon high school, one of my assigned classes was senior
English. As I thought about that class in the summer months leading
up to my first salaried teaching position, I concluded that I wanted
those 12th grade English students to leave my class “being good writ-
ers.” In other words, I wanted to make sure that if any of my students
(whom I’d not yet met) went on to college, they’d be able to write
decent essays, reports, and so on.
Well, when I taught that English course, I was pretty pleased with
my students’ progress in being able to write. As the year went by, they
were doing better and better on the exams I employed to judge their
writing skills. My instructional decisions seemed to be sensible,
because the 32 students in my English class were scoring well on my
exams. The only trouble was . . . all of my exams contained only
multiple-choice items about the mechanics of writing. With suitable
shame, I now confess that I never assessed my students’ writing skills by
asking them to write anything! What a cluck.
I never altered my instruction, not even a little, because I used my
students’ scores on multiple-choice tests to arrive at an inference that
“they were learning how to write.” Based on my students’ measured
progress, my instruction was pretty spiffy and didn’t need to be
changed. An invalid test-based inference had led me to an unsound
instructional decision, namely, providing mechanics-only writing
instruction. If I had known back then how to garner content-related
evidence of validity for my multiple-choice tests about the mechan-
ics of writing, I’d probably have figured out that my selected-response
exams weren’t truly able to help me reach reasonable inferences
about my students’ constructed-response abilities to write essays,
reports, and so on.
You will surely not be as much of an assessment illiterate as I was
during my first few years of teaching, but you do need to look at the
content of your exams to see if they can contribute to the kinds of
inferences about your students that you truly need to make.
Reliability
Reliability is another much-talked-about measurement concept. It’s a
major concern to the developers of large-scale tests, who usually
devote substantial energy to its calculation. A test’s reliability refers to
its consistency. In fact, if you were never to utter the word “reliability”
again, preferring to employ “consistency” instead, you could still live
a rich and satisfying life.
Three Kinds of Reliability
As was true with validity, assessment reliability also comes in three
flavors. However, the three ways of thinking about an assessment
instrument's consistency are strikingly different, and it's
important that educators understand the distinctions.
Stability reliability. This first kind of reliability concerns the con-
sistency with which a test measures something over time. For
instance, if students took a standardized achievement test on the first
day of a month and, without any intervening instruction regarding
what the test measured, took the same test again at the end of the
month, would students’ scores be about the same?
Alternate-form reliability. The crux of this second kind of reliability
is fairly evident from its name. If there are two supposedly equivalent
forms of a test, do those two forms actually yield student scores that
are pretty similar? If Jamal scored well on Form A, will Jamal also
score well on Form B? Clearly, alternate-form reliability only comes
into play when there are two or more forms of a test that are sup-
posed to be doing the same assessment job.
Internal consistency reliability. This third kind of reliability focuses
on the consistency of the items within a test. Do all of the test’s items
appear to be doing the same kind of measurement job? For internal
consistency reliability to make much sense, of course, a test should be
aimed at a single overall variable—for instance, a student’s reading
comprehension. If a language arts test tried to simultaneously meas-
ure a student’s reading comprehension, spelling ability, and punctu-
ation skills, then it wouldn’t make much sense to see if the test’s
items were functioning in a similar manner. After all, because three
distinct things are being measured, the test’s items shouldn’t be func-
tioning in a homogeneous manner.
Do you see how the three forms of reliability, although all related
to aspects of a test’s consistency, are conceptually dissimilar? From
now on, if you’re ever told that a significant educational test has
“high reliability,” you are permitted to ask, ever so suavely, “What
kind or what kinds of reliability are you talking about?” (Most often,
you’ll find that the test developers have computed some type of
internal consistency reliability, because it’s possible to calculate such
reliability coefficients on the basis of only one test administration.
Both of the other two kinds of reliability require at least two test
administrations.) It’s also important to note that reliability in one
area does not ensure reliability in another. Don’t assume, for exam-
ple, that a test with high internal consistency reliability will auto-
matically yield stable scores over time. It just isn’t so.
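For readers who want to see the arithmetic, here is a minimal sketch (invented scores throughout) of two common reliability estimates: a correlation between two administrations, which serves for stability and, applied to two forms, for alternate-form reliability; and Cronbach's alpha, a widely used internal consistency coefficient that needs only one administration.

```python
# A minimal sketch of two reliability computations; all data invented.
from statistics import mean, pstdev, variance

def pearson_r(x, y):
    """Correlation between two sets of scores from the same students."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Stability: same test, same students, two occasions a month apart.
# (The identical calculation, applied to Form A and Form B scores,
# yields an alternate-form reliability estimate instead.)
first_admin = [72, 85, 90, 66, 78, 95, 81, 70]
second_admin = [75, 83, 92, 64, 80, 93, 79, 73]
print("Stability estimate (r):", round(pearson_r(first_admin, second_admin), 2))

# Internal consistency (Cronbach's alpha): one administration,
# scored item by item. Rows are students; columns are items.
item_scores = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
k = len(item_scores[0])  # number of items
item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
total_var = variance([sum(row) for row in item_scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print("Internal consistency (alpha):", round(alpha, 2))
```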
Reliability in the Classroom
So, what sorts of reliability evidence should classroom teachers col-
lect for their own tests? My answer may surprise you. I don’t think
teachers need to assemble any kind of reliability evidence. There’s just
too little payoff for the effort involved.
I do believe that classroom teachers ought to understand test reli-
ability, especially that it comes in three quite distinctive forms and
that one type of reliability definitely isn’t equivalent to another. And
this is the point at which even reliability has a relationship to instruc-
tion, although I must confess it’s not a strong relationship. Suppose
your students are taking important external tests—say, statewide
achievement tests assessing their mastery of state-sanctioned content
standards. Well, if the tests are truly important, why not find out
something about their technical characteristics? Is there evidence of
assessment reliability presented? If so, what kind or what kinds of
reliability evidence?
If the statewide test’s reliability evidence is skimpy, then you
should be uneasy about the test’s quality. Here’s why: An unreliable
test will rarely yield scores from which valid inferences can be drawn.
I’ll illustrate this point with an example of stability evidence of
reliability. Suppose your students took an important exam on
Tuesday morning, and there was a fire in the school counselor’s
office on Tuesday afternoon. Your students' exam papers went up in
smoke. A week later, you re-administer the big test, only to learn
soon after that the original exam papers were saved from the flames
by an intrepid school custodian. When you compare the two sets of
scores on the very same exam administered a week apart, you are
surprised to find that your students’ scores seem to bounce all over
the place. For example, Billy scored high on the first test, yet scored
low on the second. Tristan scored low on the first test, but soared on
the second.
Based on the variable test scores, have these students mastered
the material or haven’t they? Would it be appropriate or inappropri-
ate to send Billy on to more challenging work? Does Tristan need
extra guided practice, or is she ready for independent application? If
a test is unreliable—inconsistent—how can it contribute to accurate
score-based inferences and sound instructional decisions? Answer: It
can’t.
Assessment Bias
Bias is something that everyone understands to be a bad thing. Bias
beclouds one’s judgment. Conversely, the absence of bias is a good
thing; it permits better judgment. Assessment bias is one species of
bias and, as you might have already guessed, it’s something educators
need to identify and eliminate, both for moral reasons and in the
interest of promoting better, more defensible instructional decisions.
Let’s see how to go about doing that for large-scale tests and for the
tests that teachers cook up for their own students.
The Nature of Assessment Bias
Assessment bias occurs whenever test items offend or unfairly penal-
ize students for reasons related to students’ personal characteristics,
such as their race, gender, ethnicity, religion, or socioeconomic sta-
tus. Notice that there are two elements in this definition of assess-
ment bias. A test can be biased if it offends students or if it unfairly
penalizes students because of students’ personal characteristics.
An example of a test item that would offend students might be
one in which a person, clearly identifiable as a member of a particu-
lar ethnic group, is described in the item itself as displaying patently
unintelligent behavior. Students from that same ethnic group might
(with good reason) be upset at the test item’s implication that persons
of their ethnicity are not all that bright. And that sort of upset often
leads the offended students to perform less well than would other-
wise be the case. Another example of offensive test items would arise
if females were always depicted in a test’s items as being employed in
low-level, undemanding jobs whereas males were always depicted as
holding high-paying, executive positions. Girls taking a test com-
posed of such sexist items might (again, quite properly) be annoyed
and, hence, might perform less well than they would have otherwise.
Turning to unfair penalization, think about a series of mathemat-
ical items set in a context of how to score a football game. If girls par-
ticipate less frequently in football and watch, in general, fewer
football games on television, then girls are likely to have more diffi-
culty in answering math items that revolve around how many points
are awarded when a team “scores a safety” or “kicks extra points
rather than running or passing for extra points.” These sorts of math-
ematics items simply shimmer with gender bias.
Not all penalties are unfair, of course. If students don’t study
properly and then perform poorly on a teacher’s test, such penalties
are richly deserved. What’s more, the inference a teacher would
make, based on these low scores, would be a valid one: These students
haven’t mastered the material! But if students’ personal characteris-
tics, such as their socioeconomic status (SES), are a determining fac-
tor in weaker test performance, then assessment bias has clearly
raised its unattractive head. It distorts the accuracy of students’ test
performances and invariably leads to invalid inferences about stu-
dents’ status and to unsound instructional decisions about how best
to teach those students.
Bias Detection in Large-Scale Assessments
There was a time, not too long ago, when the creators of large-scale
tests (such as nationally standardized achievement tests) didn’t do a
particularly respectable job of identifying and excising biased items
from their tests. I taught educational measurement courses in the
UCLA Graduate School of Education for many years, and in the late
1970s, one of the nationally standardized achievement tests I rou-
tinely had my students critique employed a bias-detection procedure
that would be considered laughable today. Even back then, it made
me smirk a bit.
Here’s how the test-developers’ bias-detection procedure worked.
A three-person bias review committee was asked to review all of the
items of a test under development. Two of the reviewers “represented”
minority groups. If all three reviewers considered an item to be biased,
the item was eliminated from the test. Otherwise, that item stayed on
the test. If the two minority-representing reviewers considered an
item biased beyond belief, but the third reviewer disagreed, the item
stayed. Today, such a superficial bias-review process would be recog-
nized as absurd.
Developers of large-scale tests now employ far more rigorous bias-
detection procedures, especially with respect to bias based on race
and gender. A typical approach these days calls for the creation of a
bias-review committee, usually of 15–25 members, almost all of
whom are themselves members of minority groups. Bias-review com-
mittees are typically given ample training and practice in how best to
render their item-bias judgments. For each item that might end up
being included in the test, every member of the bias-review commit-
tee would be asked to respond to the following question:
Might this item offend or unfairly penalize students because of
such personal characteristics as gender, ethnicity, religion, or
socioeconomic status?
Yes            No            Uncertain
Note that the above question asks reviewers to judge whether a stu-
dent might be penalized, not whether the item would (for absolutely
certain) penalize a student. This kind of phrasing makes it clear that
the test’s developers are going the extra mile to eliminate any items
that might possibly offend or unfairly penalize students because of per-
sonal characteristics. If the review question asked whether an item
“would” offend or unfairly penalize students, you can bet that fewer
items would be reported as biased.
If a certain percentage of bias-reviewers believe the item might be
biased, it is eliminated from those to be used in the test. Clearly, the
determination of what proportion of reviewers is needed to delete an
item on potential-bias grounds represents an important issue. With
the three-person bias-review committee my 1970s students read
about at UCLA, the percentage of reviewers necessary for an item’s
elimination was 100 percent. In recent years I have taken part in
bias-review procedures where an item would be eliminated if more than
five percent of the reviewers rated the item as biased. Times have def-
initely changed.
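The elimination rule itself is mechanical, as this minimal sketch shows. The committee size, vote counts, and item labels are hypothetical; the 5 percent threshold is the one just described.

```python
# A minimal sketch of the more-than-5-percent elimination rule.
# Committee size, vote counts, and item labels are hypothetical.
THRESHOLD = 0.05  # the 5 percent rule described above

committee_size = 20
# item label -> number of reviewers answering "Yes, this item might
# offend or unfairly penalize students"
might_be_biased = {"item 1": 0, "item 2": 2, "item 3": 1, "item 4": 0}

for item, yes_votes in might_be_biased.items():
    rate = yes_votes / committee_size
    verdict = "eliminate" if rate > THRESHOLD else "keep"
    print(f"{item}: {yes_votes}/{committee_size} flagged ({rate:.0%}) -> {verdict}")
```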
My experiences suggest that the developers of most of our current
large-scale tests have been attentive to the potential presence of
assessment bias based on students’ gender or membership in a minor-
ity racial or ethnic group. (I still think that today’s major educational
tests feature far too many items that are biased on SES grounds. We’ll
discuss this issue in Chapter 9.) There may be a few items that have
slipped by the rigorous item-review process but, in general, those test
developers get an A for effort. In many instances, such assiduous
attention to gender and minority-group bias eradication stems
directly from the fear that any adverse results of their tests might be
challenged in court. The threat of litigation often proves to be a
potent stimulant!
Assessment Bias in the Classroom
Unfortunately, there’s much more bias present in teachers’ classroom
tests than most teachers imagine. The reason is not that today’s class-
room teachers deliberately set out to offend or unfairly penalize cer-
tain of their students; it’s just that too few teachers have systemati-
cally attended to this issue.
The key to unbiasing tests is a simple matter of serious, item-by-
item scrutiny. The same kind of item-review question that bias-
reviewers typically use in their appraisal of large-scale assessments will
work for classroom tests, too. A teacher who is instructing students
from racial/ethnic groups other than the teacher’s own racial/ethnic
group might be wise to ask a colleague (or a parent) from those
racial/ethnic groups to serve as a one-person bias review committee.
This can be very illuminating. Happily, most biased items can be
repaired with only modest effort. Those that can’t should be tossed.
This whole bias-detection business is about being fair to all
students and assessing them in such a way that they are accurately
measured. That’s the only way a teacher’s test-based inferences will be
valid. And, of course, valid inferences about students serve as the
foundation for defensible instructional decisions. Invalid inferences
don’t.
Recommended Resources
American Educational Research Association. (1999). Standards for educational
and psychological testing. Washington, DC: Author.
McNeil, L. M. (2000, June). Creating new inequalities: Contradictions of
reform. Phi Delta Kappan, 81(10), 728–734.
Popham, W. J. (2000). Modern educational measurement: Practical guidelines for
educational leaders (3rd ed.). Boston: Allyn & Bacon.
Popham, W. J. (Program Consultant). (2000). Norm- and criterion-referenced
testing: What assessment-literate educators should know [Videotape]. Los
Angeles: IOX Assessment Associates.
Popham, W. J. (Program Consultant). (2000). Standardized achievement tests:
How to tell what they measure [Videotape]. Los Angeles: IOX Assessment
Associates.
INSTRUCTIONALLY FOCUSED TESTING TIPS
• Recognize that validity refers to a test-based inference, not to
the test itself.
• Understand that there are three kinds of validity evidence, all of
which can contribute to the confidence teachers have in the
accuracy of a test-based inference about students.
• Assemble content-related evidence of validity for your most
important classroom tests.
• Know that there are three related, but meaningfully different
kinds of reliability evidence that can be collected for educational
tests.
• Give serious attention to the detection and elimination of
assessment bias in classroom tests.