1. What is the general question or issue that motivated the study?
2. What is the specific question or hypothesis that the experiment is designed to address? (How does the general question lead to the specific one?)
3. What is the method of the experiment(s)—the materials and procedure? If multiple experiments were done, why were they done?
4. Identify the dependent and independent variables, and identify any confounds that might compromise the results.
5. What are the results (in a general way)?
6. If you had to design a follow-up study, what might you test next (and how)?
Psychonomic Bulletin & Review
1999, 6 (4), 641-646

Name that tune: Identifying popular recordings from brief excerpts

E. GLENN SCHELLENBERG, University of Toronto, Mississauga, Ontario, Canada
PAUL IVERSON, University of Washington, Seattle, Washington
and
MARGARET C. MCKINNON, University of Toronto, Mississauga, Ontario, Canada

We tested listeners’ ability to identify brief excerpts from popular recordings. Listeners were required to match 200- or 100-msec excerpts with the song titles and artists. Performance was well above chance levels for 200-msec excerpts and poorer but still better than chance for 100-msec excerpts. Performance fell to chance levels when dynamic (time-varying) information was disrupted by playing the 100-msec excerpts backward and when high-frequency information was omitted from the 100-msec excerpts; performance was unaffected by the removal of low-frequency information. In sum, successful identification required the presence of dynamic, high-frequency spectral information.

Author note: Funding for this research was provided by a grant awarded to the first author from the Natural Sciences and Engineering Research Council of Canada. We thank Dennis Phillips for extensive discussions about all aspects of the study, Susan Hall for her assistance in preparing Figure 1, and Andrea Halpern, Dan Levitin, John Wixted, and an anonymous reviewer for their insightful comments on earlier versions of the manuscript. Correspondence concerning this article should be addressed to E. G. Schellenberg, Department of Psychology, University of Toronto at Mississauga, Mississauga, ON, L5L 1C6, Canada (e-mail: g.schellenberg@utoronto.ca).

Copyright 1999 Psychonomic Society, Inc.
A song’s identity is specified by its pitch and rhythmic
structure. Accordingly, these structures have been the
primary focus of psychological research on music (e.g.,
Jones & Yee, 1993; Krumhansl, 1990). Songs are a par-
ticularly interesting domain of study because their iden-
tity is determined from abstracted information about re-
lations between tones, rather than from the tones’ absolute
characteristics. For example, the frequency (pitch) of the
initial tone of “Happy Birthday” can be selected arbitrar-
ily, but the song will retain its identity if the relations (in-
tervals) between tones are preserved. Hence, regardless
of whether a song is sung with a high or a low voice, it is
recognizable if its intervallic structure is maintained. Dif-
ferences in tone durations (rhythm) work similarly. Songs
can be sung fast or slow and still be recognized (within
limits; see Warren, Gardner, Brubaker, & Bashford, 1991),
if the durational differences between consecutive tones
maintain the correct ratios.
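To make the relational-coding point concrete, here is a minimal Python sketch (ours, not from the article; the note numbers and durations are approximations of the "Happy Birthday" opening) showing that transposing a melody or rescaling its tempo leaves the identity-defining relations untouched:

```python
# A melody is specified by pitch intervals and duration ratios,
# not by absolute pitches or absolute durations.
pitches = [60, 60, 62, 60, 65, 64]            # opening tones (MIDI numbers, approximate)
durations = [0.75, 0.25, 1.0, 1.0, 1.0, 2.0]  # tone durations in beats (approximate)

def intervals(ps):
    """Semitone differences between consecutive tones."""
    return [b - a for a, b in zip(ps, ps[1:])]

def duration_ratios(ds):
    """Ratios between consecutive tone durations."""
    return [b / a for a, b in zip(ds, ds[1:])]

transposed = [p + 7 for p in pitches]   # sung higher, up a fifth
slowed = [d * 1.5 for d in durations]   # sung 50% slower

assert intervals(transposed) == intervals(pitches)   # intervals preserved
assert all(abs(x - y) < 1e-9
           for x, y in zip(duration_ratios(slowed), duration_ratios(durations)))
```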
By contrast, the sound quality of musical instruments
(timbre) is irrelevant to a song’s identity. “Happy Birthday”
is recognizable regardless of whether it is played on a
trombone or a piano. Timbre is typically defined by what
it is not: characteristics of sounds other than pitch, dura-
tion, or amplitude (see, e.g., Dowling & Harwood, 1986;
Hajda, Kendall, Carterette, & Harshberger, 1997). Whereas
these parameters can be measured on ordinal scales, tim-
bre is multidimensional and difficult to define (Hajda
et al., 1997). Nonetheless, we know that listeners’ per-
ception of timbre is a function of static attributes of tones,
such as the steady state frequency distribution of har-
monics, and of dynamic or time-varying attributes, such
as changes in harmonics at tone onsets (see, e.g., Grey,
1977; Iverson & Krumhansl, 1993; McAdams, Wins-
berg, Donnadieu, De Soete, & Krimphoff, 1995; Pitt &
Crowder, 1992).
Although a song’s identity is defined by relational in-
formation, this does not preclude the possibility that ab-
solute information about pitch, tempo, or timbre is also
stored in auditory memory. Absolute attributes of voices
(e.g., pitch and timbre) are irrelevant to a word’s identity,
yet talker identity is stored in episodic memory for words
(Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni,
1994; Palmeri, Goldinger, & Pisoni, 1993). In the exper-
iments conducted by Pisoni and his colleagues, partici-
pants typically heard a list of words spoken by different
talkers and were asked to identify words that had been
presented previously in the list. Consistent with the prin-
ciple of encoding specificity (Tulving & Thomson, 1973),
recognition was best if the same talker said the word both
times, but relatively poor when the repeated word was said
by a different talker. Voice recognition may be somewhat
unique, however, in that listeners appear to rely on differ-
ent cues for different speakers; for example, some famous
voices are recognized equally well when they are pre-
sented backward or forward, presumably because listeners
are using cues other than those based on dynamic spectral
information (Van Lancker, Kreiman, & Emmorey, 1985).
Absolute attributes also play an important role in
memory for popular recordings, despite their irrelevance
to a song’s identity. When respondents are asked to sing
short passages from well-known recordings, they tend to
do so at a pitch (Levitin, 1994) and tempo (Levitin &
Cook, 1996) that closely approximate those of the origi-
nal recordings. Anecdotal evidence indicates that listen-
ers can recognize songs rapidly when scanning through
radio stations for a song that they like or when participat-
ing in radio contests (e.g., “Name that Tune”) that require
identification of brief excerpts of recordings. Although
it is possible that the limited relational information avail-
able in these segments is sufficient for recognition, we
suggest that such recognition relies more on absolute in-
formation based primarily on timbre rather than on pitch
or tempo. (Timbre can also refer to the global sound qual-
ity of the recording and orchestration of a particular
song.) Indeed, listeners’ ability to perceive differences in
timbre is remarkable. For example, sequences of 10-msec
tones with identical pitch but different timbres can be dis-
tinguished from comparison sequences with the same
tones played in a different order (Warren et al., 1991).
Moreover, specific musical instruments can be identified
in forced-choice tasks involving tones of similarly short
durations (Robinson & Patterson, 1995a).
In the present investigation, listeners were asked to
identify excerpts from recordings of popular songs that
were too brief to contain any relational information. We
selected five recordings that were highly popular in North
America in the months preceding data collection and,
therefore, likely to be familiar to undergraduates. Our goal
was twofold: (1) to explore the limits of listeners’ abil-
ity to identify recordings from very brief excerpts and
(2) to identify stimulus attributes necessary for success-
ful identification. Although our excerpts contained ab-
solute information about pitch and timbre, their brevity
(100 or 200 msec) precluded the possibility of identify-
ing words or multiple tones presented successively. Our
hypothesis was that listeners would rely on timbre more
than on absolute pitch in these brief contexts. Accord-
ingly, the excerpts were altered in some conditions, to ex-
amine which attributes were important for identification.
Specifically, we altered the distribution of frequencies in
the harmonic spectrum through high-pass (frequencies <
1000 Hz attenuated) and low-pass (frequencies > 1000 Hz
attenuated) filtering and the dynamic information by play-
ing the excerpts backward. These alterations affected the
timbre of the excerpts but had little impact on their per-
ceived pitch. Thus, differential responding across condi-
tions would indicate listeners’ greater reliance on timbre
than on absolute pitch.
METHOD
Participants
The listeners were 100 undergraduates enrolled in psychology
courses at a medium-sized Canadian university located a few miles
from downtown Detroit. Participation in the experiment took ap-
proximately 20 min, for which the students received partial course
credit. An additional 10 listeners were recruited but excluded from
the testing session for failing to meet the inclusion criterion (see the
Procedure section).
Apparatus and Stimulus Materials
We searched through “HOT 100” charts in Billboard magazine to
select five recordings that were highly popular in North America in
the months preceding data collection: (1) “Because You Loved Me,”
performed by Celine Dion; (2) “Exhale (Shoop Shoop),” performed
by Whitney Houston; (3) “Macarena,” performed by Los Del Rios;
(4) “Missing,” performed by Everything But the Girl; and (5) “One
Sweet Day,” performed by Mariah Carey and Boyz II Men. The ex-
tensive airplay accorded these songs ensured that it was likely that
anyone who had listened to popular music during this period had
been exposed to all of them. The recordings were purchased on com-
pact disc. An excerpt from each disc was digitally copied onto the
hard disk of a Macintosh PowerPC 7100/66AV computer in 16-bit
format (sampling rate of 22.05 kHz) using the SoundEdit 16 soft-
ware program. Excerpt onsets were chosen to be maximally repre-
sentative of the recordings (experimenters’ judgment); each started
on a downbeat at the beginning of a bar. One of the excerpts (“Maca-
rena”) contained no vocals.1
There were five experimental conditions. In one condition, the
excerpts were 200 msec in duration; this duration was selected so
that the task would be challenging but not impossible. In a second
condition, the excerpts were shortened to 100 msec by deleting the
second half. Frequency spectra at 50 msec from excerpt onsets are
illustrated in Figure 1. In a third condition, the 100-msec excerpts
were played backward (as in Van Lancker et al., 1985), which dis-
rupted the dynamic information but had no effect on the static
(steady state) information. In the remaining two conditions, the orig-
inal (forward) 100-msec excerpts were high-pass or low-pass filtered
(following D. L. Halpern, Blake, & Hillenbrand, 1986, but with a
cutoff frequency of 1000 Hz, similar to Compton, 1963), using the
SoundEdit program.2 The stimuli were presented to the listeners
binaurally via headphones (Sony CD 550) at a comfortable listening
level. Adding 10-msec onset and offset ramps produced no difference that was detectable to the experimenters, so the excerpts were not ramped.

Figure 1. Relative amplitude of frequencies between 0 and 10 kHz in the unfiltered, forward excerpts. Spectra were derived using linear predictive coding (LPC) at 50 msec after the onset of each excerpt.
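As a rough sketch of how these manipulations could be reproduced today (ours; the paper used SoundEdit 16, whereas the Butterworth filter, its order, and the file names below are our assumptions):

```python
# Stimulus manipulations applied to a 200-msec excerpt: truncation to
# 100 msec, time reversal, and high-/low-pass filtering at 1000 Hz.
# A mono 16-bit file at 22.05 kHz is assumed, as in the paper.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, excerpt = wavfile.read("excerpt_200ms.wav")
excerpt = excerpt.astype(np.float64)

forward = excerpt[: int(0.1 * rate)]  # 100-msec version: delete the second half
backward = forward[::-1]              # disrupts only dynamic (time-varying) cues

def filtered(signal, kind, cutoff_hz=1000, order=4):
    """Zero-phase Butterworth filter; kind is 'highpass' or 'lowpass'."""
    sos = butter(order, cutoff_hz, btype=kind, fs=rate, output="sos")
    return sosfiltfilt(sos, signal)

high_passed = filtered(forward, "highpass")  # frequencies < 1000 Hz attenuated
low_passed = filtered(forward, "lowpass")    # frequencies > 1000 Hz attenuated

for name, sig in [("100ms", forward), ("100ms_backward", backward),
                  ("100ms_highpass", high_passed), ("100ms_lowpass", low_passed)]:
    wavfile.write(f"excerpt_{name}.wav", rate, sig.astype(np.int16))
```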
Procedure
The listeners were tested individually; 20 were assigned to each
of five conditions. They wore headphones and sat in front of the
computer monitor in a quiet room. A SoundEdit file was open on
the computer, which allowed the listeners to see the waveforms for
each of the five excerpts. (None of the listeners reported any famil-
iarity with waveforms.) The order of the waveforms was random-
ized separately for each condition. To hear an excerpt, the listeners
used a mouse connected to the computer and clicked on one of the
waveforms. The listeners were provided with an answer sheet that
listed the five artists and song titles (alphabetical order) and were
required to match the five excerpts with the five songs on the an-
swer sheet. This method differed from multiple-choice tasks in that
the five judgments from any individual listener were not indepen-
dent (e.g., one error ensured another error). The listeners were allowed
to hear the test excerpts repeatedly and in any order they chose.
Prior to the test session, the participants were informed that there
would be a pretest, to verify that they were familiar with the five
songs used in the experiment. Because many of the students might
have been familiar with the recordings but not with the names of the
songs, the pretest also served to familiarize or refamiliarize the par-
ticipants with the song titles and artists, as was required in the sub-
sequent experiment. The pretest involved presenting a single 20-sec
excerpt from each of the recordings and requiring listeners to match
the five excerpts with the five song titles and artists, as in the actual
experiment. The vocals in these excerpts did not reveal the titles of
the songs, and the 20-sec excerpts did not contain the excerpts used
in the actual experiment. Only listeners who scored 100% were in-
cluded in the final sample, but all the participants received course
credit, even if they failed to meet the inclusion criterion. The listen-
ers were tested individually or in small groups during the screening
process. A delay of several minutes between the screening session
and the actual experiment prevented the listeners from retaining a
representation of the excerpts in working memory.
RESULTS
For each condition, there were 120 (5 × 4 × 3 × 2 × 1) possible response combinations, each of which was equally likely if the listeners were guessing. The average number of correct responses for these 120 possibilities was one. Because the distribution of scores (number correct) based on chance levels of responding was not normal, the data were analyzed with nonparametric tests. Individual listeners were classified according to whether or not they performed better than chance (score > 1 or score ≤ 1).
The probability of getting more than one correct response
(two, three, or five correct)3 was 31/120 if listeners were
guessing. Thus, only about 1 in 4 listeners (i.e., 5.17 out
of 20 in each condition) should score better than chance,
if listeners as a group were guessing. Figure 2 illustrates
the number of listeners who performed above chance sep-
arately for each condition. Mean scores for each condition
(provided below the figure) make it clear that dichoto-
mizing the outcome variable did not affect the overall re-
sponse pattern.
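The chance distribution described above is easy to verify by brute force; a minimal sketch (ours) that enumerates all 120 possible assignments:

```python
# Tally the number of correct matches for every way of assigning five
# excerpts to five titles; under guessing, each assignment is equally likely.
from itertools import permutations
from collections import Counter

counts = Counter(sum(guess == answer for guess, answer in zip(perm, range(5)))
                 for perm in permutations(range(5)))
print(counts)  # Counter({1: 45, 0: 44, 2: 20, 3: 10, 5: 1}); 4 correct is impossible (note 3)
print(sum(score * n for score, n in counts.items()) / 120)  # 1.0, the chance mean
print(sum(n for score, n in counts.items() if score > 1))   # 31, so P(score > 1) = 31/120
```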
Chi-square goodness-of-fit tests were used separately for each condition, to examine whether the number of listeners with scores greater than 1 exceeded chance levels. Performance was much better than chance in the 200-msec condition [χ2(1, n = 20) = 49.89, p < .001], with 19 of 20 listeners performing above chance. Group responding remained above chance for the even briefer 100-msec stimuli [χ2(1, n = 20) = 8.87, p < .005]. Performance was also better than chance in the 100-msec high-pass filtered condition [χ2(1, n = 20) = 15.99, p < .001], but not in the low-pass filtered or backward conditions. A chi-square test of independence confirmed that the number of listeners performing above chance differed across conditions [χ2(4, N = 100) = 30.29, p < .001].
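A sketch (ours) of the goodness-of-fit computation for the 200-msec condition; computed this way the statistic is approximately 49.94, which differs from the reported 49.89 presumably only through rounding:

```python
# 19 of 20 listeners scored above chance; under guessing we expect
# 20 * (31/120) = 5.17 listeners above chance and 14.83 at or below it.
from scipy.stats import chisquare

p_above = 31 / 120
observed = [19, 1]
expected = [20 * p_above, 20 * (1 - p_above)]
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 2), p_value)  # approx. 49.94, p < .001
```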
Performance in the 200-msec condition was superior to levels observed in the 100-msec condition [χ2(1, n = 40) = 8.53, p < .005]. This effect was evident for each of the five recordings and implies that successful identification of the recordings required the presence of dynamic information in the frequency spectrum, because the static (steady state) information and the absolute pitch of the excerpts would have been very similar for the 200-
and the 100-msec excerpts. This hypothesis was tested
directly in the next comparison, which showed that per-
formance was poorer in the backward 100-msec condition
than it was in the forward 100-msec condition [χ2(1, n = 40) = 5.23, p < .05]. This decrement was evident for four
of the five songs (all but “One Sweet Day”). Because sta-
tic spectral information and absolute pitch were exactly
the same in these two conditions, inferior performance
with the backward excerpts provides confirmation of lis-
teners’ reliance on dynamic information in the frequency
spectrum.
In the next set of analyses, differences in performance
as a function of the presence of low-frequency or high-
frequency information were examined. Performance in the
high-pass filtered condition was no different from levels
observed in the original 100-msec condition; the number
of listeners scoring above chance increased for two songs
(“Because You Loved Me” and “Missing”), decreased for
two songs (“Exhale” and “Macarena”), and remained un-
changed for one song (“One Sweet Day”). Significant performance decrements were observed, however, in the low-pass condition, as compared with the original 100-msec and the high-pass conditions [χ2(1, n = 60) = 6.54,
p < .05]; indeed, the low-pass condition had the fewest
above-chance listeners for all the songs but one (“One
Sweet Day”). Thus, successful identification of the ex-
cerpts depended on the presence of high-frequency, but
not on low-frequency, spectral information.
To examine the possibility that listeners were relying
solely on vocal cues, rather than the timbre of the overall
recordings, we examined song-by-song responding for
each of the three conditions in which performance was bet-
ter than chance. In each condition, absolute levels of per-
formance were highest for the excerpt that did not con-
tain any vocals (“Macarena”).
DISCUSSION
Our listeners were able to identify recordings of pop-
ular songs from excerpts as brief as 0.1 sec, provided that
dynamic, high-frequency information from the record-
ings was present in the excerpts. The observed pattern of
findings cannot be attributed to absolute-pitch cues or to
recognition of specific voices. Rather, the spectra in Fig-
ure 1 show that the excerpt with the highest levels of per-
formance (“Macarena,” no vocals) had the densest concen-
tration of energy between 1000 and 8000 Hz, which may
have contributed to its relative distinctiveness. Listeners
may also have been more familiar with “Macarena” than
with the other recordings.
Listeners’ ability to identify complex musical stimuli
from a minimal amount of perceptual information is sim-
ilar to their abilities with speech. For example, 10-msec
vowels can be identified reliably (Robinson & Patterson,
1995b; Suen & Beddoes, 1972), as can individual voices
from vowel samples as brief as 25 msec (Compton, 1963).
[Figure 2: bar graph showing the number of listeners above chance (0 to 20) in each condition (200 ms, 100 ms, 100 ms backward, 100 ms high-pass, 100 ms low-pass), with mean scores of 3.45, 1.70, 1.00, 1.90, and 1.20, respectively.]
Figure 2. Number of listeners exceeding chance levels (>1 correct response) for each testing condition (ns = 20). Hatched bars indicate conditions in which group performance was significantly better than chance. Mean scores (number of songs identified correctly) are provided below the figure.
When respondents are asked to identify famous voices
from a set of 60 different voices, performance starts to ex-
ceed chance levels with samples of 250 msec (Schwein-
berger, Herholz, & Sommer, 1997). The capacity to iden-
tify speech stimuli from a minimal amount of information
appears to be general enough to extend to other auditory
domains—such as music—where the adaptive signifi-
cance is much less obvious (Roederer, 1984). Although
our findings do not imply that recognition of popular songs
typically occurs in 100 msec, they provide unequivocal
evidence that excerpts this brief contain information that
can be used for identification. Moreover, our results re-
veal that such information is timbral in nature and inde-
pendent of absolute-pitch cues or changes in pitch and
tone durations.
Our results extend those of Levitin (1994; Levitin &
Cook, 1996; see also A. R. Halpern, 1989), who reported
that memory representations for popular recordings con-
tain absolute information about pitch and tempo. With
very brief presentations, however, identification of re-
cordings is primarily a function of timbre rather than of
absolute pitch or tempo. Although information about
tempo was unavailable in our brief excerpts, pitch is per-
ceptible from tones as brief as 10 msec (Warren et al.,
1991). Nonetheless, performance was at chance when our
100-msec excerpts were played backward or low-pass fil-
tered. Because both manipulations would have dramati-
cally disrupted attributes that are critical to timbre (dy-
namic and static information, respectively) while having
little impact on perceived pitch, it appears that timbre is
more important than absolute pitch for identifying pop-
ular recordings from very brief excerpts. This finding con-
verges with others involving music and speech, which
show that timbre (i.e., a specific musical instrument or
vowel) is better identified than is pitch when stimuli are
extremely brief (Robinson & Patterson, 1995a, 1995b).
The listeners’ dependence on timbre rather than on ab-
solute pitch in the present investigation could stem from
(1) the importance of timbral cues (i.e., voice qualities
other than pitch) in speech, (2) the relative unimportance
of absolute, as compared with relative pitch in music lis-
tening, or (3) both of these factors. Although voices vary
in pitch as well as in timbre, differences in pitch (i.e., av-
erage fundamental frequency) between talkers of the same
sex are relatively small; in a group of 12 women tested by
Miller (1983), the SD was less than 2.5 semitones. None-
theless, most people can rapidly identify many different
female (or male) voices, despite similarities in pitch. Be-
cause of the multidimensional nature of timbre, voice-
quality cues are more distinctive than those based on pitch.
Extensive experience discriminating voices on the basis
of timbre could, in turn, influence processing in the musi-
cal domain.
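To put the 2.5-semitone figure in perspective, here is a brief sketch (ours; the 220-Hz reference value is illustrative) of the conversion between semitones and frequency ratios:

```python
# Semitones measure frequency on a log scale: n = 12 * log2(f2 / f1).
# An SD of 2.5 semitones therefore corresponds to a frequency ratio of
# 2 ** (2.5 / 12), i.e., only about a 16% difference in fundamental frequency.
import math

def semitones(f1, f2):
    """Signed semitone distance from f1 to f2 (both in Hz)."""
    return 12 * math.log2(f2 / f1)

ratio = 2 ** (2.5 / 12)
print(round(ratio, 3))                            # 1.155
print(round(semitones(220.0, 220.0 * ratio), 3))  # 2.5 (220 Hz is illustrative)
```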
We also know that the ability to perceive musical pitch
in an absolute manner is limited to a relatively small pro-
portion of the population (approximately 1 in 10,000; see
Takeuchi & Hulse, 1993). Absolute-pitch possessors can
identify a note by name (e.g., C, F♯, etc.) when it is played
in isolation (an ability that is qualitatively different than
remembering the pitch of a recording). Because such ab-
solute-identification abilities tend to be automatic, they
can interfere with relational processing strategies that are
more relevant to music listening (Miyazaki, 1993). More-
over, other evidence implies that absolute-pitch process-
ing is actually a relatively primitive auditory strategy. For
example, elevated prevalence levels have been reported
among mentally retarded individuals, and absolute- rather
than relative-pitch processing is the norm for nonhuman
vertebrates (Ward & Burns, 1982).
At present, it is unclear why the portion of the spectrum
above 1000 Hz is more important for song recognition
than the portion below 1000 Hz. The high-pass filtered
excerpts differed quantitatively from the low-pass ex-
cerpts (e.g., they had more spectral information, because
most of the harmonics in the excerpts were above 1000 Hz;
see Figure 1), and qualitative differences may also have
played a role (e.g., the high frequencies may have been
more distinctive). It is also possible that high-frequency
timbral information is either perceived or encoded in mem-
ory with better detail, as compared with low-frequency
information. Interestingly, Compton (1963) used speech
samples that were low-pass and high-pass filtered much
like our musical excerpts (cutoff frequency of 1020 Hz,
rather than 1000 Hz) and reported results similar to ours.
His respondents, who were asked to identify the talker,
showed marked deficits in performance for low-pass fil-
tered samples, but not for high-pass samples.
Performance levels in the present study were undoubt-
edly inflated by two factors: (1) allowing the excerpts to
be heard repeatedly, which would have enhanced percep-
tual fluency for the repeated items (Jacoby & Dallas,
1981), and (2) the pretest session, which would have
primed listeners’ memories of the songs. Indeed, exposure
to the pretest excerpts could have allowed above-chance
levels of performance to emerge even among listeners
who had limited familiarity with the songs prior to the
experiment. These listeners may have met the pretest in-
clusion criterion by recognizing one or two of the singers,
by a process of elimination, by luck, or by a combination
of these factors, all of which may have influenced perfor-
mance in the subsequent test session as well. Because the
listeners received course credit even if they failed to meet
the inclusion criterion (which excused them from the test
session), however, it is unlikely that they falsely claimed
familiarity with the tunes. Moreover, the time frame of
the experiment prevented the listeners from retaining one
or more of the excerpts in working memory. By defini-
tion, then, the task required the listeners to rely primarily
on representations in long-term memory of greater or
lesser permanence. For example, such representations
would be relatively permanent (or consolidated) for lis-
teners with extensive familiarity with the tunes, but more
temporary (or less consolidated) for other listeners, being
retrievable only for the length of the experiment. Regard-
less, the results make it clear that (1) the brief stimuli
contained information that listeners could compare with
their representations of the recordings and (2) this infor-
mation was primarily timbral in nature. Future research
could examine the generalizability of these findings with
a broader selection of excerpts and a less constrained task.
For example, different results might be obtained with re-
cordings of soft-rock tunes or orchestral symphonies or
with individual recordings in which the overall timbre is
less distinctive. Representations that vary in degree of con-
solidation could also differ in the way timbre is encoded.
It is important to clarify that absolute attributes in mem-
ory representations for popular songs would be stored in
combination with the relational information that defines
the songs. Adult, child, and infant listeners recognize sim-
ilarities between sequences of pure tones presented in
transposition (different absolute pitch, same pitch and tem-
poral relations; Schellenberg & Trehub, 1996a, 1996b).
It is safe to assume, then, that our listeners would recog-
nize previously unheard versions of, say, “Macarena,” per-
formed by different singers, on different instruments, and
in a key and tempo different from the original recording.
Nonetheless, our results provide converging evidence that
memory representations for complex auditory stimuli
contain information about the absolute properties of the
stimuli, in addition to more meaningful information ab-
stracted from the relations between stimulus components.
Indeed, in contexts with an extremely limited amount of
information, listeners may rely primarily on the sound
quality of the stimuli for successful identification and
recognition.
REFERENCES
Compton, A. J. (1963). Effects of filtering and vocal duration upon the
identification of speakers, aurally. Journal of the Acoustical Society
of America, 35, 1748-1752.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. San Diego:
Academic Press.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical tim-
bres. Journal of the Acoustical Society of America, 61, 1270-1277.
Hajda, J. M., Kendall, R. A., Carterette, E. C., & Harshberger,
M. L. (1997). Methodological issues in timbre research. In I. Deliège
& J. Sloboda (Eds.), Perception and cognition of music (pp. 253-306).
Hove, U.K.: Psychology Press.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar
songs. Memory & Cognition, 17, 572-581.
Halpern, D. L., Blake, R., & Hillenbrand, J. (1986). Psychoacous-
tics of a chilling sound. Perception & Psychophysics, 39, 77-80.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attrib-
utes of musical timbre. Journal of the Acoustical Society of America,
94, 2595-2603.
Jacoby, L. L., & Dallas, M. (1981). On the relationship between auto-
biographical memory and perceptual learning. Journal of Experimen-
tal Psychology: General, 110, 306-340.
Jones, M. R., & Yee, W. (1993). Attending to auditory events: The role
of temporal organization. In S. McAdams & E. Bigand (Eds.), Think-
ing in sound: The cognitive psychology of human audition (pp. 69-
112). Oxford: Oxford University Press, Clarendon Press.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New
York: Oxford University Press.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence
from the production of learned melodies. Perception & Psychophysics,
56, 414-423.
Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: Addi-
tional evidence that auditory memory is absolute. Perception &
Psychophysics, 58, 927-935.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimp-
hoff, J. (1995). Perceptual scaling of synthesized musical timbres:
Common dimensions, specificities, and latent subject classes. Psy-
chological Research, 58, 177-192.
Miller, C. L. (1983). Developmental changes in male/female classifi-
cation by infants. Infant Behavior & Development, 6, 313-330.
Miyazaki, K. (1993). Absolute pitch as an inability: Identification of
musical intervals in a tonal context. Music Perception, 11, 55-72.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in
speech perception. Perception & Psychophysics, 60, 355-376.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech percep-
tion as a talker-contingent process. Psychological Science, 5, 42-46.
Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic en-
coding of voice attributes and recognition memory for spoken words.
Journal of Experimental Psychology: Learning, Memory, & Cogni-
tion, 19, 309-328.
Pitt, M. A., & Crowder, R. G. (1992). The role of spectral and dynamic
cues in imagery for musical timbre. Journal of Experimental Psychol-
ogy: Human Perception & Performance, 18, 728-738.
Robinson, K., & Patterson, R. D. (1995a). The duration required to
identify an instrument, the octave, or the pitch chroma of a musical note.
Music Perception, 13, 1-15.
Robinson, K., & Patterson, R. D. (1995b). The stimulus duration re-
quired to identify vowels, their octave, and their pitch chroma. Jour-
nal of the Acoustical Society of America, 98, 1858-1865.
Roederer, J. G. (1984). The search for a survival value of music. Music
Perception, 1, 350-356.
Schellenberg, E. G., & Trehub, S. E. (1996a). Children’s discrimi-
nation of melodic intervals. Developmental Psychology, 32, 1039-1050.
Schellenberg, E. G., & Trehub, S. E. (1996b). Natural intervals in
music: A perspective from infant listeners. Psychological Science, 7,
272-277.
Schweinberger, S. R., Herholz, A., & Sommer, W. (1997). Recog-
nizing familiar voices: Influence of stimulus duration and different
types of retrieval cues. Journal of Speech, Language, & Hearing Re-
search, 40, 453-463.
Suen, C. Y., & Beddoes, M. P. (1972). Discrimination of vowel sounds
of very short duration. Perception & Psychophysics, 11, 417-419.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychologi-
cal Bulletin, 113, 345-361.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and re-
trieval processes in episodic memory. Psychological Review, 80, 352-
373.
Van Lancker, D., Kreiman, J., & Emmorey, K. (1985). Familiar voice
recognition: Patterns and parameters: Part I. Recognition of backward
voices. Journal of Phonetics, 13, 19-38.
Ward, W. D., & Burns, E. M. (1982). Absolute pitch. In D. Deutsch (Ed.),
The psychology of music (pp. 431-451). New York: Academic Press.
Warren, R. M., Gardner, D. A., Brubaker, B. S., & Bashford, J. A.
(1991). Melodic and nonmelodic sequences of tones: Effects of du-
ration on perception. Music Perception, 8, 277-290.
NOTES
1. Although the recording of “Macarena” contained vocals, the ex-
cerpt did not.
2. Filtering is actually gradual rather than absolute; some frequencies
on the unwanted side of the cutoff point are present with monotonically de-
creasing amplitude (D. J. Levitin, personal communication, August 1998).
3. A score of four correct was impossible: In the present matching task,
one error ensured another error.
(Manuscript received June 23, 1998;
revision accepted for publication December 11, 1998.)
Ann. N.Y. Acad. Sci. 1060: 6–16 (2005). © 2005 New York Academy of Sciences.
doi: 10.1196/annals.1360.002
Probing the Evolutionary Origins
of Music Perception
JOSH MCDERMOTTa AND MARC D. HAUSERb
aPerceptual Science Group, Department of Brain and Cognitive Sciences,
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
bCognitive Evolution Laboratory, Department of Psychology,
Harvard University, Cambridge, Massachusetts 02138
ABSTRACT: Empirical data have recently begun to inform debates on the evo-
lutionary origins of music. In this paper we discuss some of our recent findings
and related theoretical issues. We claim that theories of the origins of music will
be usefully constrained if we can determine which aspects of music perception
are innate, and, of those, which are uniquely human and specific to music.
Comparative research in nonhuman animals, particularly nonhuman pri-
mates, is thus critical to the debate. In this paper we focus on the preferences
that characterize most humans’ experience of music, testing whether similar
preferences exist in nonhuman primates. Our research suggests that many
rudimentary acoustic preferences, such as those for consonant over dissonant
intervals, may be unique to humans. If these preferences prove to be innate in
humans, they may be candidates for music-specific adaptations. To establish
whether such preferences are innate in humans, one important avenue for fu-
ture research will be the collection of data from different cultures. This may be
facilitated by studies conducted over the internet.
KEYWORDS: music; preferences; monkey; consonance; evolution; adaptation
INTRODUCTION
From the standpoint of evolutionary theory, music is among the most puzzling
things that people do. As far as we know, music is universal, playing a significant
role in every human culture that has ever been documented. People everywhere love
music and expend valuable resources in order to produce and listen to it. Yet despite
its central role in human culture, the evolutionary origins of music remain a great
mystery. Unlike many other things that humans enjoy (e.g., food, sex, and sleep),
music confers no obvious value to an organism, and for this reason music has
puzzled evolutionary theorists since the time of Darwin.1
Although the adaptive function of music, if any, remains unknown, there is no
shortage of proposals for how it might have evolved. Some have noted that music
Address for correspondence: Josh McDermott, Perceptual Science Group, Department of
Brain and Cognitive Sciences, Massachusetts Institute of Technology, NE20-444, 3 Cambridge
Center, Cambridge, MA 02139. Voice: 617-258-9412; fax: 617-253-8335.
jhm@mit.edu
might promote social cohesion in group activities like war or religion; others have
proposed a sexually selected role in courtship.1–6 Developmental psychologists have
drawn attention to the pacifying effect music has on infant listeners, which could
constitute an adaptive function.7 Still others suggest that music was not a product of
natural selection and, instead, is a side effect of mechanisms that evolved for other
functions.8 Despite the longstanding interest in music’s origins, there has thus far
been little empirical data with which to decide between these and other theories (see
McDermott and Hauser9 for a review).
Rather than continue to speculate on putative adaptive functions, we have focused
on gathering further empirical constraints on music’s origins. Our approach is to ex-
amine aspects of human music perception, and for each of them attempt to answer
three questions: (1) Is the feature in question innate in humans? (2) Is it unique to
humans? and (3) Is it specific to music?
Each of these questions plays an important role in thinking about the evolution of
music. Capacities that are innate, that is, determined from properties present in an
organism at birth, are potential targets for evolutionary explanations, unlike capaci-
ties that are learned. The question of uniqueness plays an equally important role, par-
ticularly for music, because music is something that only humans do (see recent
reviews10,29 for a discussion of animal song). If some feature of human music per-
ception is found to be shared by a nonhuman animal, and that feature is assumed to
be homologous to the human feature, then the feature in question must not have
evolved for the purpose of making music. Testing for aspects of human music per-
ception (e.g., octave equivalence,11–13 or relative pitch perception11,12,14) can thus
place useful constraints on music’s origins. The third question of music specificity
is most relevant for features of music perception that have been found to be uniquely
human. If some aspect of music perception in humans is found to be innate and
uniquely human, the possibility remains that it evolved to serve some uniquely
human function other than music, such as language or mathematics. In contrast,
perceptual capacities that are innate, unique, and specific to music are strong candi-
dates for adaptations for music. We thus suggest that evolutionary theories of music
perception would be well served by posing these three questions about different
aspects of music perception.
PREFERENCES
In this paper we will discuss one particular aspect of music perception—prefer-
ences—framed by the three questions about innateness, uniqueness, and specificity.
Clearly, many preferences that humans have for music are culture specific, as
humans tend to prefer the music of their own culture. Preferences for entire pieces
or genres of music may, however, be built on more elementary preferences that could
themselves be universal and innate in humans. One simple preference that has re-
ceived great attention in music literature is that for consonance over dissonance. It
has been widely appreciated since at least the time of the Greeks that some combi-
nations of musical notes are more pleasing than others. Although the fact that con-
sonant and dissonant intervals are perceptually distinct seems to follow from what is
known about the peripheral auditory system,15–17 it remains unclear why conso-
nance is preferable to dissonance. This preference is generally acknowledged to be
widespread among Westerners, but there is surprisingly little data from other cul-
tures to support a claim of universality.18,19 Recent work in developmental psycho-
logy, however, suggests that the preference for consonance is either innate or
acquired very early, as infants as young as two months seem to exhibit the prefer-
ence.20–22 There is thus some evidence that the preference is present independent
from musical experience, although a larger cross-cultural database would help to
augment the existing case.
Given the possibility that this and perhaps other elementary preferences are in-
nate, our research has focused on the question of whether such preferences are
unique to humans by testing for them in nonhuman primates. A consonance prefer-
ence in a nonhuman primate would provide evidence that the preference did not
evolve for the purpose of making and/or appreciating music, as nonhuman primates
do not naturally make music. Conversely, any feature of music found to be uniquely
human becomes a candidate for part of an adaptation for music, particularly if there
is evidence that it is specific to music. Nonhuman subjects have the additional ad-
vantage of being reared in a laboratory setting, in which their exposure to music can
be controlled to an extent not possible in humans for practical and ethical reasons.
As a result of this high level of control, many of the concerns often voiced about the
role of musical exposure in experimental results from human infants can be decisive-
ly addressed. We thus tested for various acoustic preferences, including that for
consonance over dissonance, in nonhuman primates.
Our subjects in the experiments to be described are two species of new world
monkey—cotton-top tamarins and common marmosets. Both species are native to
the South American rain forest; their lineage diverged from that of humans approx-
imately 48 million years ago (FIG. 1). They are generally regarded as the most primitive species of monkey, but are small (weighing about one pound) and harmless, making them useful experimental subjects. Their hearing characteristics have not been well explored, but the audiograms that have been measured in marmosets are similar to those of humans,23 and recent auditory physiology work suggests there may be higher-level similarities as well.24 Recent behavioral work in Japanese monkeys suggests that nonhuman primates can readily discriminate between consonance and dissonance,25 as one would expect given Helmholtzian theory and the recent physiological results that support it. What is unknown is whether nonhuman primates would also prefer consonance over dissonance as many humans do. All the animals used in our experiments were reared in captivity, and none had ever heard human music prior to the onset of the experiments.

FIGURE 1. Divergence times of some of the relevant taxonomic groups used in studies of the origins and evolution of music. The cotton-top tamarins and common marmosets used in our studies are New World monkeys. (Reproduced with permission from Hauser and McDermott.10)
A METHOD TO MEASURE PREFERENCES
To measure preferences in animals, we used a behavioral method in which sub-
jects were placed in a V-shaped maze26 (FIG. 2); related methods have been devel-
oped to test for preferences in birds.27 Each branch of the maze had a speaker at its
end, and a subject’s position in the apparatus controlled their auditory environ-
ment—one sound was played out of the left speaker when they were in the left
branch of the maze, and another out of the right speaker when they were in the right
branch. The stimulus for a particular side played continuously as long as the animal
was on that side, and switched as soon as they switched sides. If a subject preferred
one of the two sounds over the other, one might expect them to spend more time in
the corresponding side of the apparatus, so as to increase their exposure to the pre-
ferred sound. We left animals in the apparatus for five-minute sessions and measured
the proportion of time they spent on the left and right.

FIGURE 2. Photo of the apparatus used in nonhuman primate experiments. (Reproduced with permission from McDermott and Hauser.26)
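The dependent measure reduces to a simple computation over the session record. A minimal sketch (ours; the timestamped event format is an assumption, not a detail from the paper):

```python
# Given a log of (time_in_seconds, side) entries marking each time the
# animal entered a branch, compute the proportion of a five-minute
# session spent on each side.
def side_proportions(entries, session_length=300.0):
    totals = {"L": 0.0, "R": 0.0}
    for (t, side), (t_next, _) in zip(entries, entries[1:]):
        totals[side] += t_next - t
    last_t, last_side = entries[-1]
    totals[last_side] += session_length - last_t  # stay until session end
    return {side: seconds / session_length for side, seconds in totals.items()}

# A subject that starts on the left but settles on the right:
print(side_proportions([(0.0, "L"), (40.0, "R"), (65.0, "L"), (80.0, "R")]))
# {'L': 0.1833..., 'R': 0.8166...}
```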
To verify that the method was appropriate for measuring preferences for sounds,
we began by conducting two control experiments. In the first, we presented subjects
with a choice between loud (90 dB) and soft (60 dB) white noise. We expected the
animals to find the high amplitude noise aversive, and to thus spend more time on
the side of the soft noise. The average results from six tamarins over four sessions
are shown in FIGURE 3. The animals exhibited a pronounced bias toward the soft side
as early as the first session, an effect that increased in the second session. Between
the second and third sessions the side–sound pairings were reversed, to rule out ef-
fects due to side biases. Following the reversal, the animals spent an average of 50%
of the time on each side. Coupled with the increase in the effect from the first session
to the second, this indicates that the animals had acquired a side–sound association
that took time to be unlearned. By the fourth session (the second after the reversal),
however, the effect had reversed, such that they again spent most of the time on the
side with the soft noise. The results suggest that the animals learn to associate a side
with a sound and modulate their position in the apparatus to reflect their preferences.

FIGURE 3. Results of the first control experiment, in which animals were presented with a choice between loud and soft white noise. Each bar plots the average data from 6 subjects, as a proportion of the total time spent in the apparatus. Error bars here and elsewhere denote standard errors. The dashed line denotes the reversal of the side assignment that occurred after the second session. (Reproduced with permission from McDermott and Hauser.26)
In a second control experiment, we presented tamarins with a choice between two
classes of species-specific vocalizations: chirps that they emit in the presence of
food, and screams that they make when being held by a veterinarian. We reasoned
that they would be likely to have negative associations with the screams and positive
associations with the chirps, and thus might spend more time on the side with the
chirps than that with the screams. Recordings of the two types of vocalizations were
equated in amplitude to minimize loudness differences. The same six tamarins were
again run in several five-minute sessions. As shown in FIGURE 4, the tamarins spent
more time on average with the chirps than with the screams, providing additional ev-
idence that our method provides an appropriate behavioral assay for measuring pref-
erences for sounds.

FIGURE 4. Results from the second control experiment, comparing tamarin food chirps with distress screams. Data are averages across sessions. (Reproduced with permission from McDermott and Hauser.26)
CONSONANCE AND DISSONANCE
We next proceeded to test for preferences for consonance over dissonance. Before
testing our animal subjects with such stimuli, we ran an analogous experiment in
humans to confirm that a behavioral method such as ours would demonstrate the
consonance preference believed to be widespread in humans. Our human subjects
were placed in a room divided in half with a strip of tape (FIG. 5). A concealed speak-
er was situated on each side of the room, and as in the animal apparatus, each speaker
was assigned a particular stimulus. Only one speaker was on at a time, triggered by a
subject’s position in the room. Our human subjects were given no instructions and
were merely told they would be left in the room for five minutes and videotaped. All
subjects were naive as to the purpose of the experiment and were involved for a single
session. As with the tamarins, we measured the proportion of time spent on each side.

FIGURE 5. Schematic of setup for human control experiments.
The consonant stimulus in this experiment was a random sequence of two-note
chords, the notes of which were separated by an octave, a fifth, or a fourth.
The dissonant stimulus was a similar sequence of minor seconds, tritones, and minor
ninths. The notes composing the intervals were synthesized complex tones with ten
harmonics. The bass note was always middle C. Each interval was 1.5 s in duration.
FIGURE 6 (left) plots the average results for four human subjects, all of whom
spent most of their time on the side with the consonant intervals. Typically a human
subject would wander around the room until by chance they crossed over the divid-
ing line, thus changing the sound. After moving back and forth across the line a few
times, they quickly realized that their position controlled the sound, and thereafter
typically spent most of their time on the side of the sound they preferred. These re-
sults suggested that our method would be sufficient to demonstrate a consonance
preference in nonhuman primates, were they to share it with humans.
FIGURE 6 (right) plots the average results for five tamarins. In contrast to the
humans, they showed no effect. Note that the animals used in these experiments were
the same ones used in the two control experiments, both of which yielded significant
effects. Moreover, all five animals again showed a preference for soft over loud noise
when tested at the conclusion of the consonance experiment, confirming that they
had not somehow habituated to the apparatus or method. Rather, it seems that
tamarins do not exhibit the preference for consonance over dissonance found in
humans, even when tested with analogous methods.

FIGURE 6. Results from experiment comparing consonant and dissonant musical intervals. (Left) Results for human subjects. (Right) Results for tamarin subjects. (Reproduced with permission from McDermott and Hauser.26)
SCREECHING
For a second test of whether nonhuman primates might share timbral preferences
with humans, we turned to a sound that many humans find highly aversive—the
sound of fingernails on a blackboard. We made recordings of a very similar sound
produced by scraping a metal garden tool down a glass window; many listeners in-
formally reported the sounds to be very unpleasant. Spectrograms of the sounds we
recorded revealed harmonic structure superimposed on broadband noise, similar to
what has been previously described.28 Little is known about why such sounds are so
unpleasant, or about the relationship between the perceptual effect they have and that
of musical stimuli, but given the strength of the reaction evoked in humans, they
seemed a promising stimulus with which to test for timbral preferences in nonhuman
primates.
We used a concatenation of several screech recordings as an experimental stimu-
lus. For a control stimulus, we generated white noise with the same amplitude enve-
lope as the screech stimulus. This control stimulus was as loud as the screech
stimulus, but otherwise sounded quite different, and we intended it to be much less
annoying to human listeners. We again began by running an experiment with human
subjects, using the same method as for the consonance experiment. As expected, our
method revealed a pronounced preference in humans for the white noise over the
screech, shown in FIGURE 7a. In contrast, the tamarins showed no evidence of a pref-
erence one way or the other, even when run over many sessions (FIG. 7b). Evidently
the screeching sounds that are so annoying to most humans are not particularly aver-
sive for tamarins, at least no more so than our amplitude-matched control stimulus.

FIGURE 7. Results from experiment comparing a screeching sound to amplitude-matched white noise. (Left) Results for human subjects. (Right) Results for tamarin subjects. (Reproduced with permission from McDermott and Hauser.26)
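One standard way to construct such an envelope-matched control is sketched below (ours; the paper does not say how the envelope was extracted, so the Hilbert-envelope approach and the smoothing cutoff are assumptions):

```python
# Build a white-noise control sharing the amplitude envelope of a screech
# recording, matched in overall RMS level so it is equally loud.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def envelope_matched_noise(screech, rate, smooth_hz=50, seed=0):
    env = np.abs(hilbert(screech))  # instantaneous amplitude
    sos = butter(4, smooth_hz, btype="lowpass", fs=rate, output="sos")
    env = sosfiltfilt(sos, env)     # smooth the envelope
    noise = np.random.default_rng(seed).standard_normal(len(screech))
    matched = env * noise
    return matched * np.sqrt(np.mean(screech ** 2) / np.mean(matched ** 2))
```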
ARE HUMAN ACOUSTIC PREFERENCES UNIQUELY HUMAN?
Two timbral preferences that are pronounced in humans thus appear to be absent
in cotton-top tamarins. We have recently replicated the consonance result in com-
mon marmosets (McDermott and Hauser, unpublished data), and although it would
be ideal to test other species of primates as well, our results raise the possibility that
nonhuman primates may lack the timbral preferences that appear to at least partly
underlie human appreciation of music.
One key difference between our primate subjects and our human subjects, how-
ever, is that the humans all had a lifetime of exposure to music, as do virtually all
humans. The consonance preferences apparently present in young infants suggest
that a lifetime of exposure is not necessary to develop the preference, but whatever
exposure the infants inevitably had may nonetheless be important. Given this, our
results suggest three main possibilities: (1) Simple acoustic preferences for conso-
nance and other stimuli could be innate in humans, and unique to them, given the
absence of such preferences in the nonhuman primates we have tested. (2) Such
preferences might not be unique to humans and could primarily be the result of ex-
posure to musical stimuli, which our nonhuman primate subjects lacked. (3) Such
preferences could require exposure to music but might also involve specialized
learning mechanisms that could be unique to humans, and perhaps specific to music.
A key issue, therefore, involves determining the role of exposure to music. One
important avenue for future research will be to explore the effects of extended mu-
sical exposure on nonhuman animals. If nonhuman animals can develop preferences
given enough exposure to human music, domain-general learning mechanisms might
then also be responsible for human preferences. Conversely, if animals tested after
musical exposure still do not exhibit any of the preferences found in humans, the
case for uniqueness would be bolstered, for even with similar auditory experience,
humans and nonhumans would exhibit different behavior. Further explorations of the
effects of musical exposure on humans could help to determine whether exposure
coupled with uniquely human learning mechanisms is involved, or whether the pref-
erences in question are, in fact, innate.
MUSIC UNIVERSALS STUDY
In an attempt to assess the effect of the varying musical exposure that occurs in
different cultures, one of us (J.M.) has set up an experiment on the internet to mea-
sure aspects of music perception in people all over the world. Anyone can participate
in the Music Universals Study by visiting the study’s Web site; the goal is to collect large amounts of data from people with vastly different musical cultures,
to examine whether any aspects of music perception are invariant across culture. Dif-
ferences across cultures would suggest an important role for learning. Web-based ex-
periments are not a replacement for conventional cross-cultural studies, as the
subject pool is limited to those with internet access, but they are potentially a useful
additional tool with which to ask many questions of interest in music perception.
CONCLUSIONS
The role of musical exposure could also be clarified with a richer cross-cultural
database. We propose that evolutionary theories of music’s origins will be facilitated
by investigating whether aspects of music perception are innate in humans, and, of
those, whether any are unique to humans and specific to music. Our studies of pref-
erences in nonhuman primates suggest that many simple acoustic preferences that
are pronounced in humans are not shared by our primate relatives. Additional re-
search is needed to investigate the role of musical exposure, but such preferences
may thus be innate and unique to humans. Given that some of them appear to be spe-
cific to music, they are candidates for part of an adaptation for music. We believe
that future research investigating the innateness, uniqueness, and specificity of other
aspects of music perception will place strong constraints on the evolutionary origins
of music.
ACKNOWLEDGMENTS
We are grateful to Matt Kamen, Altay Guvench, Fernando Vera, Adam Pearson,
Tory Wobber, Matthew Sussman, and Alex Rosati for their assistance in running the
experiments.
[Competing interests: The authors declare that they have no competing financial
interests.]
REFERENCES
1. DARWIN, C. 1871. The Descent of Man and Selection in Relation to Sex. London. John
Murray.
2. MERKER, B. 2000. Synchronous chorusing and human origins. In The Origins of Music.
B. Merker & N. L. Wallin, Eds.: 315–327. The MIT Press. Cambridge, MA.
3. MILLER, G.F. 2001. The Mating Mind: How Sexual Choice Shaped the Evolution of
Human Nature, 1st ed. Anchor Books. New York.
4. CROSS, I. 2001. Music, cognition, culture, and evolution. Ann. N. Y. Acad. Sci. 930:
28–42.
5. HURON, D. 2001. Is music an evolutionary adaptation? Ann. N. Y. Acad. Sci. 930: 43–61.
6. HAGEN, E.H. & G.A. BRYANT. 2003. Music and dance as a coalition signaling system.
Hum. Nat. 14: 21–51.
7. TREHUB, S.E. 2003. The developmental origins of musicality. Nat. Neurosci. 6: 669–
673.
8. PINKER, S. 1997. How the Mind Works. 1st ed. Norton. New York.
9. MCDERMOTT, J. & M.D. HAUSER. 2005. The origins of music: innateness, uniqueness,
and evolution. Mus. Percept. In press.
10. HAUSER, M.D. & J. MCDERMOTT. 2003. The evolution of the music faculty: a compara-
tive perspective. Nat. Neurosci. 6: 663–668.
11. HULSE, S.H. & J. CYNX. 1985. Relative pitch perception is constrained by absolute
pitch in songbirds (Mimus, Molothrus, and Sturnus). J. Comp. Psychol. 99: 176–196.
12. D’AMATO, M.R. 1988. A search for tonal pattern perception in cebus monkeys: why
monkeys can’t hum a tune. Mus. Percept. 5: 453–480.
13. WRIGHT, A.A., J.J. RIVERA, S.H. HULSE, et al. 2000. Music perception and octave gen-
eralization in rhesus monkeys. J. Exp. Psychol. Gen. 129: 291–307.
14. BROSCH, M., E. SELEZNEVA, C. BUCKS & H. SCHEICH. 2004. Macaque monkeys dis-
criminate pitch relationships. Cognition 91: 259–272.
15. HELMHOLTZ, H.V. & A.J. ELLIS. 1954. On the Sensations of Tone as a Physiological
Basis for the Theory of Music. 2nd English ed. Dover Publications. New York.
16. FISHMAN, Y.I., I.O. VOLKOV, M.D. NOH, et al. 2001. Consonance and dissonance of
musical chords: neural correlates in auditory cortex of monkeys and humans. J.
Neurophysiol. 86: 2761–2788.
17. TRAMO, M.J., P.A. CARIANI, B. DELGUTTE & L.D. BRAIDA. 2001. Neurobiological foun-
dations for the theory of harmony in Western tonal music. Ann. N. Y. Acad. Sci. 930:
92–116.
18. BUTLER, J.W. & P.G. DASTON. 1968. Musical consonance as musical preference: a
cross-cultural study. J. Gen. Psychol. 79: 129–142.
19. MAHER, T.F. 1976. “Need for resolution” ratings for harmonic musical intervals: a
comparison between Indians and Canadians. J. Cross Cultural Psychol. 7: 259–276.
20. ZENTNER, M.R. & J. KAGAN. 1996. Perception of music by infants. Nature 383: 29.
21. TRAINOR, L.J. & B.M. HEINMILLER. 1998. The development of evaluative responses to
music: infants prefer to listen to consonance over dissonance. Infant Behav. Dev. 21:
77–88.
22. TRAINOR, L.J., C.D. TSANG & V.H.W. CHEUNG. 2002. Preference for sensory conso-
nance in two- and four-month-old infants. Mus. Percept. 20: 187–194.
23. SEIDEN, H.R. 1958. Auditory acuity of the marmoset monkey (Hapale jacchus).
Unpublished doctoral dissertation, Princeton University.
24. BENDOR, D. & X. WANG. 2005. The neuronal representation of pitch in primate audi-
tory cortex. Nature 436: 1161–1165.
25. IZUMI, A. 2000. Japanese monkeys perceive sensory consonance of chords. J. Acoust.
Soc. Am. 108: 3073–3078.
26. MCDERMOTT, J. & M.D. HAUSER. 2004. Are consonant intervals music to their ears?
Spontaneous acoustic preferences in a nonhuman primate. Cognition 94: B11–21.
27. WATANABE, S. & M. NEMOTO. 1998. Reinforcing property of music in Java sparrows
(Padda oryzivora). Behav. Processes 43: 211–218.
28. HALPERN, D.L., R. BLAKE & J. HILLENBRAND. 1986. Psychoacoustics of a chilling
sound. Percept. Psychophys. 39: 77–80.
29. FITCH, W.T. 2005. The evolution of music in comparative perspective. Ann. N. Y.
Acad. Sci. 1060: 29–49.
Psychological Science
2003, 14 (3), 262-266
GOOD PITCH MEMORY IS WIDESPREAD
E. Glenn Schellenberg and Sandra E. Trehub
University of Toronto at Mississauga, Mississauga, Ontario, Canada
Abstract—Here we show that good pitch memory is widespread among adults with no musical training. We tested unselected college students on their memory for the pitch level of instrumental soundtracks from familiar television programs. Participants heard 5-s excerpts either at the original pitch level or shifted upward or downward by 1 or 2 semitones. They successfully identified the original pitch levels. Other participants who heard comparable excerpts from unfamiliar recordings could not do so. These findings reveal that ordinary listeners retain fine-grained information about pitch level over extended periods. Adults' reportedly poor memory for pitch is likely to be a by-product of their inability to name isolated pitches.
Absolute pitch (AP; also called perfect pitch) is often viewed as a marker of musical giftedness (Takeuchi & Hulse, 1993; Ward, 1999), with an estimated incidence of 1 in 10,000. AP refers to the ability to identify or produce isolated tones in the absence of contextual cues or
reference pitches. Upon awakening, for example, AP possessors can
label or sing middle C (262 Hz) or concert A (440 Hz). In other words,
they have long-term memory for musically relevant pitches, and they
remember those pitches by name (Levitin, 1994). AP is thought to dif-
fer from other human abilities in its bimodal distribution (Takeuchi &
Hulse, 1993): Either you have it or you do not. For people who do not,
memory for isolated pitches is thought to fade quickly with the pas-
sage of time (Burns, 1999). According to Krumhansl (2000), “pitch
memory is approximately equal for possessors and nonpossessors of
AP for delays up to one minute, but only AP possessors perform above
chance for longer delays” (p. 167). AP possessors do not differ from
other musicians in their memory for tone frequencies that are musi-
cally irrelevant (e.g., tones outside the musical range, mistuned tones),
nor do they differ in their ability to discriminate pitches or in most
other musical abilities (Ward, 1999). In short, the uniqueness of AP
possessors is restricted to their rapid and effortless identification and
production of isolated tones.
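As a point of reference (our addition; this is the standard equal-tempered tuning convention, not something stated in the article), note names map onto frequencies by fixing concert A at 440 Hz and spacing adjacent semitones by a factor of 2^{1/12}. With MIDI numbering (A4 = 69), middle C (n = 60) comes out at the 262 Hz quoted above:

$$f(n) = 440 \times 2^{(n-69)/12}, \qquad f(60) = 440 \times 2^{-9/12} \approx 261.6\ \mathrm{Hz} \approx 262\ \mathrm{Hz}.$$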
AP is found almost exclusively among individuals who began music
lessons in early childhood (Takeuchi & Hulse, 1993), which implies a critical period for its acquisition. In one large sample of musicians, 40%
of those who began musical training before 4 years of age had AP, com-
pared with 27% who began between ages 4 and 6 years, and 8% who be-
gan between ages 6 and 9 years (Baharloo, Johnston, Service, Gitschier,
& Freimer, 1998). Although early training is the best predictor of AP, it
does not guarantee AP. Genetic factors also make important contributions.
For example, individuals with AP are considerably more likely than those
without AP to have siblings with AP, even when amount of musical train-
ing and age of onset are taken into account (Baharloo, Service, Risch,
Gitschier, & Freimer, 2000).
For normally developing children, relative pitch processing is
thought to replace absolute pitch processing during the preschool years
(Saffran & Griepentrog, 2001; Takeuchi & Hulse, 1993), with only a
small minority (i.e., AP possessors) retaining both modes of process-
ing. Relative pitch processing—a widespread skill—lies at the heart of
music and its appreciation. For example, identifying a familiar tune
(e.g., “The Star Spangled Banner”), whether it is performed at a high
pitch level (e.g., sung by a soprano, played on a piccolo) or at a low
pitch level (e.g., sung by a baritone, played on a tuba), depends on the
listener’s knowledge of pitch relations. Whereas non-AP musicians
share AP possessors’ explicit knowledge of musical note names and
pitch intervals (i.e., relations between musical notes), they do not
share AP possessors’ accurate memory for individual pitches (Ben-
guerel & Westdal, 1991). Nevertheless, given one musical tone, such
as C, non-AP musicians can use their knowledge of intervals to iden-
tify or generate other musical tones, such as F or G (5 or 7 semitones
from C). AP possessors tend to approach such tasks on a tone-by-tone
basis, reflecting their bias for absolute over relative processing. As a
result, they name intervals more slowly and less accurately than do
non-AP musicians, which implicates AP as a nonmusical mode of pro-
cessing (Miyazaki, 1995).
In contrast to musicians with or without AP, nonmusicians cannot
name any musical intervals or tones. Nonetheless, they can identify fa-
miliar melodies presented at novel pitch levels, and they notice when
such melodies are performed incorrectly (Drayna, Manichaikul, de
Lange, Snieder, & Spector, 2001), which confirms the accuracy of
their implicit memory for pitch relations. There is speculation that the
higher-than-usual incidence of AP in autistic and developmentally de-
layed populations (Heaton, Hermelin, & Pring, 1998; Heaton, Pring,
& Hermelin, 1999; Lenhoff, Perales, & Hickok, 2001a, 2001b; Mot-
tron, Peretz, Belleville, & Rouleau, 1999; Young & Nettlebeck, 1995)
stems from deficient relational processing (Ward, 1999). These atypi-
cally developing individuals may fail to generalize song-defining pitch
relations across pitch levels (e.g., the first four tones of “Twinkle
Twinkle Little Star” can be CCGG, DDAA, EEBB, and so on, with the
last two tones being 7 semitones higher than the first two).
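To make the transposition example concrete, here is a minimal sketch (ours, not part of the original article; the helper names are hypothetical) showing that shifting every pitch by the same number of semitones leaves the song-defining intervals untouched:

```python
# Pitches coded as semitones above an arbitrary C: C=0, D=2, E=4, G=7, A=9.
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def transpose(melody, shift):
    """Shift every pitch by the same number of semitones."""
    return [pitch + shift for pitch in melody]

def intervals(melody):
    """Successive intervals in semitones; these define the tune's identity."""
    return [b - a for a, b in zip(melody, melody[1:])]

twinkle = [0, 0, 7, 7]           # CCGG, the opening of "Twinkle Twinkle"
shifted = transpose(twinkle, 2)  # [2, 2, 9, 9]
print([NOTE_NAMES[p % 12] for p in shifted])     # ['D', 'D', 'A', 'A']
print(intervals(twinkle) == intervals(shifted))  # True: same tune
```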
Our goal in the present investigation was to demystify the phenom-
enon of AP by documenting adults’ memory for pitch under ecologi-
cally valid conditions. We hypothesized that the reportedly poor pitch
memory of ordinary adults is an artifact of conventional test proce-
dures, which involve isolated tones and pitch-naming tasks. Isolated
tones are musically meaningless to all but AP possessors, and pitch
naming necessarily excludes individuals without musical training.
Much recent research focuses on knowledge acquired without explicit
awareness (e.g., Goshen-Gottstein, Moscovitch, & Melo, 2000; Reber
& Allen, 2000; Tillman, Bharucha, & Bigand, 2000). Thus, the ab-
sence of explicit memory for pitch level does not preclude relevant im-
plicit knowledge. We also expected that implicit memory for pitch,
like most other human abilities, would be normally distributed rather
than bimodally distributed.
Previous indications that nonmusicians retain in memory some
sensory attributes of music arise from studies that have included
meaningful test materials (Bergeson & Trehub, 2002; Halpern, 1989;
Levitin & Cook, 1996; Palmer, Jungers, & Jusczyk, 2001; Schellenberg,
Iverson, & McKinnon, 1999). For example, college students with lim-
ited musical training can identify familiar recordings of popular songs
(i.e., songs heard previously at the same pitch level, tempo, and tim-
bre) from excerpts as short as 100 ms (Schellenberg et al., 1999). Such
brief excerpts preclude the use of relational cues, forcing listeners to
rely on absolute features from the overall timbre or frequency spec-
trum. When adults sing hit songs from recordings heard repeatedly, al-
most two thirds of these productions are within 2 semitones of the
recorded versions (Levitin, 1994), and their tempo (speed) is within
8% of the originals (Levitin & Cook, 1996). Adults show similar con-
sistency in pitch level and tempo when they sing familiar songs from
the folk repertoire (e.g., “Yankee Doodle”) on different occasions,
even though they would have heard these songs at several pitch levels
and tempi (Bergeson & Trehub, 2002; Halpern, 1989).
Although the song-production data (Bergeson & Trehub, 2002;
Halpern, 1989; Levitin, 1994) imply accurate pitch memory, the con-
tributions of cognitive and motor factors are inseparable in these stud-
ies. For example, movement patterns associated with song production
(i.e., motor memory) may be implicated. Moreover, the limited pitch
range of musically untrained individuals may generate pitch consis-
tency that has little to do with memory. Nonetheless, the findings
highlight the potential of familiar materials to reveal nonmusicians’
memory for acoustic features.
We tested memory for the pitch level of musical recordings heard
frequently at one pitch level only. We expected that contextually rich
materials would reveal the generality of long-term memory for pitch
and the normal distribution of this ability. In Experiment 1, adult lis-
teners heard excerpts from highly familiar recordings. On each trial,
the same instrumental excerpt was presented twice, once at the origi-
nal pitch level and once shifted upward or downward in pitch by 1 or 2
semitones. Participants attempted to identify which excerpt (the first
or the second) was presented at the correct pitch level, that is, the only
pitch level at which they had heard the recording previously. Experi-
ment 2 was identical except that a different group of listeners made
judgments about unfamiliar recordings that were pitch-shifted by 2
semitones. In other words, it was a “control” experiment designed to
ascertain whether factors other than pitch memory (e.g., the audio ma-
nipulation, composers’ use of particular keys) contribute to successful
identification.
EXPERIMENT 1
Method
Participants
The participants in Experiment 1 were 48 college students. Re-
cruitment was limited to students familiar with the six television pro-
grams from which the stimuli were excerpted. The skewed distribution
of musical training (i.e., years of music lessons) was typical of college
populations, with a mean of 5.1, a median of 3, and a mode of zero.
None of the participants reported having AP.
Stimuli and apparatus
The recordings were instrumental excerpts from six popular tele-
vision programs: “E.R.,” “Friends,” “Jeopardy,” “Law & Order,”
“The Simpsons,” and “X-Files” (keys of B minor, A major, E-flat ma-
jor, G minor, C-sharp major, and A minor, respectively). Each recording
had multiple instruments, each with multiple pure-tone components. The
selection criteria were as follows: (a) popularity with undergraduates, as
estimated in a pilot study, and (b) a musical theme with at least 5 s of in-
strumental music. The theme music was saved as CD-quality sound files
on an iMac computer. For five of the six programs, the 5-s instrumental
excerpt was from the beginning of the program. For “Jeopardy,” the ex-
cerpt was from Final Jeopardy. In all cases, the excerpt was selected to be
maximally representative of the overall recording.
The excerpts were shifted in pitch by 1 or 2 semitones with Pro
Tools (DigiDesign) digital-editing software, which is used commonly
in professional recording studios.¹ Pitch shifting had no discernible ef-
fect on tempo (speed) or overall sound quality. Within each semitone
condition, the “incorrect” excerpt for a given musical selection was al-
ways shifted in one direction (upward for three, downward for three),
to eliminate the option of selecting the middle pitch level and to en-
sure that correct and incorrect excerpts were presented equally often;
the participants were divided into two equal groups, and the direction
of pitch shifts was reversed for the two groups. Pitch shifts involved
multiplying (for upward shifts) or dividing (for downward shifts) all
frequencies in an excerpt by a factor of 1.12 for 2-semitone shifts and
1.06 for 1-semitone shifts. For example, a 2-semitone upward shift in-
volved a change from 262 Hz to 294 Hz.
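The frequency arithmetic can be verified directly. The sketch below is ours; the factors follow from equal-tempered semitones being spaced by 2^{1/12}, which matches the 1.06 and 1.12 values reported above:

```python
# An n-semitone shift multiplies every frequency by 2**(n/12).
def shift_factor(n_semitones: float) -> float:
    return 2 ** (n_semitones / 12)

print(round(shift_factor(1), 2))     # 1.06 (1-semitone shifts)
print(round(shift_factor(2), 2))     # 1.12 (2-semitone shifts)
print(round(262 * shift_factor(2)))  # 294, the worked example in the text

# The "correct" excerpts were shifted up and then back down by the same
# factor, leaving frequency unchanged while equating processing artifacts.
print(round(262 * shift_factor(1) / shift_factor(1)))  # 262
```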
To eliminate potential cues from the electronic manipulation, we
also shifted the pitch level of the correct excerpts. The original excerpts
were shifted upward and then downward by 1 semitone (all frequen-
cies multiplied and subsequently divided by 1.06) in the 2-semitone
condition and by half a semitone (frequencies multiplied and divided
by 1.03) in the 1-semitone condition. The monaural excerpts were pre-
sented binaurally over lightweight headphones while participants sat
in a sound-attenuating booth. (Sample stimuli are available on the
Web at www.erin.utoronto.ca/~w3psygs.)
Procedure
Participants were tested in two test sessions on different days no
more than 1 week apart. The incorrect excerpts were shifted by 2 semi-
tones in one session and by 1 semitone in the other, with order of ses-
sions counterbalanced. The 2-semitone pitch shifts were orthogonal to
the 1-semitone shifts, such that the direction of shift was reversed for
half of the excerpts across sessions. Each session consisted of five
blocks of six trials. Each block had one trial for each excerpt, with trials
presented in random order. The first block served as a practice block. On
each trial, listeners heard one version of a 5-s excerpt at the original
pitch level and another version at the altered (upward or downward)
pitch, with the two excerpts separated by 2 s. Order (original-altered or
altered-original) was counterbalanced. Participants were told that they
would hear two versions of the same theme song on each trial, with one
version at the correct pitch and the other version shifted higher or lower.
Their task was to identify the excerpt (first or second) at the correct (i.e.,
usual) pitch level. They received no feedback for correct or incorrect re-
sponses. Participants also completed a brief questionnaire about their
musical background, and they provided cumulative viewing estimates
for each program (i.e., lifetime viewing estimates).
¹ A free version of the software (Pro Tools Free) that includes the pitch-shifting function can be downloaded from the Internet (http://www.digidesign.com).
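As a schematic of the trial structure just described, here is a short sketch (ours; the function and variable names are hypothetical, and the real study counterbalanced the original-versus-altered order rather than randomizing it per trial):

```python
import random

EXCERPTS = ["E.R.", "Friends", "Jeopardy", "Law & Order",
            "The Simpsons", "X-Files"]

def build_session(seed=0):
    """One session: 5 blocks of 6 trials, each block presenting every
    excerpt once in random order; the first block serves as practice."""
    rng = random.Random(seed)
    trials = []
    for block in range(5):
        order = EXCERPTS[:]
        rng.shuffle(order)
        for name in order:
            first = rng.choice(["original", "altered"])  # illustrative only
            trials.append((block, name, first))
    return trials

print(build_session()[:3])
```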
Results
The outcome measure was the percentage of correct responses. Be-
cause order of presentation (1- or 2-semitone change first) and stimulus
set (i.e., excerpts shifted upward or downward) did not affect performance
or interact with other variables, they were excluded from further consider-
ation. Performance exceeded chance levels (50% correct) for the 1-semitone comparisons (58% correct), t(47) = 4.00, p < .001, and for the 2-semitone comparisons (70% correct), t(47) = 9.40, p < .001, with superior performance on the larger shifts, t(47) = 4.46, p < .001 (see Fig. 1).
(This finding was replicated with different listeners and a slightly different
task: yes/no judgments for single excerpts rather than selection of one of
two alternatives. Performance remained significantly above chance and
commensurate with the levels in the main study reported here.) Perfor-
mance on the first trial of each excerpt significantly exceeded perfor-
mance on subsequent trials, which implies that increasing exposure to
pitch-shifted excerpts interfered with memory for the original pitch level.
For subsequent analyses, performance was calculated across the
1- and 2-semitone conditions. As can be seen in Figure 2, the fre-
quency distribution for performance accuracy approximated a normal
curve. Performance was far from perfect, but it was remarkably con-
sistent, with only 3 of 48 participants performing below 50% correct
(binomial test, p < .001). Performance was not significantly correlated with musical training, r = −.242, p = .952 (one-tailed).
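For readers who want to see the shape of these analyses, here is a hedged sketch (ours), using randomly generated placeholder scores because the raw data are not reproduced in the article:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(64, 10, size=48)  # hypothetical percent-correct scores

# One-sample t test of mean performance against chance (50% correct).
t, p = stats.ttest_1samp(scores, popmean=50)
print(f"t(47) = {t:.2f}, p = {p:.4f}")

# Binomial test of the number of participants below 50% correct, against
# the null that falling above or below chance is a coin flip.
n_below = int((scores < 50).sum())
print(stats.binomtest(n_below, n=48, p=0.5).pvalue)
```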
Differences in performance among the six excerpts were examined
with a one-way repeated measures analysis of variance. The analysis re-
vealed that some musical excerpts were identified better than others,
F(5, 235) = 5.59, p < .001, with performance at 60% correct or below for some excerpts (“The Simpsons”—57%; “E.R.”—60%) and above 70% for others (“Friends”—71%, “X-Files”—71%). Pairwise comparisons revealed better performance for “Friends” and “X-Files” than for the other four excerpts, ps < .03, but no differences between other pairs of excerpts. For all six excerpts, performance exceeded chance levels, ps < .03.
Lifetime-viewing estimates for the TV programs are summarized in
Table 1. For each program, the distribution of estimates was positively
skewed because some individual estimates for particular programs
were extremely high. (For example, “Friends” and “The Simpsons” are
broadcast several times daily.) We evaluated the possibility that viewing
estimates for a particular TV program predicted performance for that
excerpt better than for the other five excerpts. Although the six within-
program correlations (e.g., exposure to “X-Files” and pitch memory for
“X-Files”) were low (highest r = .372, p < .005, for “The Simpsons”; lowest r = .074, p > .5, for “Law & Order”), they were significantly higher than the 30 cross-program correlations (e.g., exposure to “X-Files” and pitch memory for “Friends”), p < .010 (Mann-Whitney test).
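The within- versus cross-program comparison can be expressed compactly. In this sketch (ours), only the two correlation values quoted above are real; the rest are placeholders standing in for the unreported ones:

```python
from scipy import stats

# Six within-program exposure/accuracy correlations vs. the 30 cross-program
# ones; only .372 and .074 are quoted in the text.
within_r = [0.372, 0.30, 0.25, 0.20, 0.15, 0.074]  # four placeholders
cross_r = [0.05] * 30                              # all placeholders

u, p = stats.mannwhitneyu(within_r, cross_r, alternative='greater')
print(f"U = {u}, p = {p:.4f}")
```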
EXPERIMENT 2
We conducted a further experiment to rule out the possibility that
the results of Experiment 1 were due to singularly appropriate pitch
levels for the original excerpts or to obscure electronic cues from the
pitch-shifting manipulation. If either of these factors accounted for
adults’ ability to identify the familiar recordings in Experiment 1, then
similar findings should be obtained with unfamiliar recordings.
Fig. 1. Performance on excerpts pitch-shifted by 1 or 2 semitones (chance = 50%). The excerpts were familiar in Experiment 1 and unfamiliar in Experiment 2. Error bars represent standard errors.
Fig. 2. Distribution of performance levels in Experiment 1 collapsed across 1- and 2-semitone pitch shifts. The distribution, which is centered markedly to the right of chance levels (50% correct), approximates the normal curve (mean = 64%, median = 65%). Scores at the boundary between two categories were grouped in the higher category.
Table 1. Mean estimates of total number of episodes viewed for the television programs in Experiment 1

Program          Viewing estimate
E.R.             55 (114)
Friends          337 (631)
Jeopardy         370 (1,137)
Law & Order      319 (824)
The Simpsons     1,094 (2,258)
X-Files          127 (226)

Note. Standard deviations are in parentheses.
Method
The participants were 48 college students who did not take part in
Experiment 1. Recruitment was limited to students who were familiar
with the six programs from the previous experiment. The method was
the same as in Experiment 1 with two exceptions: (a) Musical excerpts
were taken from unfamiliar recordings, and (b) there was a single test
session in which excerpts at the original pitch level were paired with
excerpts pitch-shifted upward or downward by 2 semitones. Partici-
pants were told that they would hear a series of trials in which a musi-
cal excerpt would be played twice, once at the original pitch level and
once shifted upward or downward in pitch. Their task was to identify
the correct, unshifted excerpt (first or second).
In two cases, we replaced the familiar recording from Experiment 1
with an unfamiliar recording by the same composer: The theme from
“Silk Stalkings” (in C minor), an HBO police drama from the 1980s, re-
placed the theme from “Law & Order”; and the theme from “Gremlins”
(E-flat major), a 1984 film, replaced the theme from “The Simpsons.” In
the other instances, the unfamiliar recordings retained the style and in-
strumentation of the original. A pop song, “Circle of Friends” (A major,
by Better Than Ezra), replaced music from “Friends”; theme music from
“Match Game” (C major), a game show from the 1980s, replaced the
“Jeopardy” theme; music from Ninja Gaiden (B-flat minor), a Nintendo
video game, replaced the “E.R.” theme; and music from Tenchu Stealth
Assassin (E-flat minor), from SONY PlayStation, replaced the theme
from “X-Files.”
Results
Overall performance did not differ from chance levels (49% cor-
rect) and was significantly poorer than performance in the 2-semitone
condition of Experiment 1, t(94) = 8.18, p < .001 (see Fig. 1). Performance ranged from 40.1% to 55.7% correct across excerpts but did not exceed chance levels for any excerpt (ps > .2). Clearly, the pitch-
shifting procedure did not generate cues that enabled listeners to dis-
tinguish unfamiliar, pitch-shifted excerpts from the original versions.
Moreover, there was no indication that any pitch level or key, includ-
ing common keys (e.g., C major, A major), was considered more ap-
propriate than any other.
DISCUSSION
Our results provide unequivocal evidence that adults with little mu-
sical training remember the pitch level of familiar instrumental record-
ings, as reflected in their ability to distinguish the correct version from
versions shifted upward or downward by 1 or 2 semitones. Their fail-
ure to identify the correct pitch level of unfamiliar musical recordings
rules out contributions from potential artifacts of the pitch-shifting
process. Long-term memory for pitch that permits successful identifi-
cation of 1-semitone alterations is especially interesting for two rea-
sons. First, musicians with AP often make 1-semitone errors (Lockhead
& Byrd, 1981; Miyazaki, 1988), which raises the possibility that ordi-
nary adults’ memory for the pitch level of highly familiar music is
similar to AP possessors’ memory for isolated pitches. Second, 1 semi-
tone is the smallest meaningful difference in Western music, as well as
the smallest difference specified in standard musical notation. Per-
formers may use smaller pitch deviations for expressive purposes, but
no musical culture makes systematic use of intervals smaller than a
semitone (Burns, 1999).
Thus, contrary to scholarly wisdom, adults with little musical back-
ground retain fine-grained information about pitch level over extended
periods. This finding advances the case that music listeners construct
precise memory representations of music that include absolute as well
as relational features (Dowling, 1999). It also demystifies aspects of AP
such as its rarity, its bimodal distribution, and the reported critical pe-
riod for AP acquisition. Once pitch-naming or reproduction require-
ments are eliminated and familiar materials are used, memory for
specific pitch levels seems to be widespread and normally distributed. It
is likely that pitch naming rather than pitch memory underlies much
about AP, including its apparent bimodal distribution. Pitch naming may
be an all-or-none ability, but pitch memory is not. Similarly, the unique
pattern of cortical activity in AP possessors (Hirata, Kuriki, & Pantev,
1999; Ohnishi et al., 2001) may reflect distinctive auditory-verbal associa-
tions—a consequence of naming—rather than distinctive pitch process-
ing (Zatorre, Perry, Beckett, Westbury, & Evans, 1998).
Although our findings are consistent with some previous accounts,
they are notable for demonstrating considerably greater accuracy in
pitch memory. For example, Levitin (1994) reported that 44% of
adults’ sung performances were within 2 semitones of the original re-
cording on both of two test trials. He quantized responses to the near-
est semitone, however, which means that 44% of his participants were
within 2.5 semitones of the original pitch. He also ignored pitch height
(e.g., Cs in different octaves were considered equivalent), such that
performance could deviate from the original by no more than 6 semi-
tones. In effect, more than half of his participants were performing at
chance levels (deviations of 3 semitones or more) on at least one of
two trials. Our more sensitive measure of pitch memory avoids these
and other limitations of production tasks.
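The chance-level argument can be checked with a quick simulation (ours, illustrating the scoring logic rather than Levitin's actual procedure): when sung pitch is unrelated to the target, octave-equivalent scoring still counts it as "within 2.5 semitones" about 42% of the time.

```python
import random

def pitch_class_distance(a: float, b: float) -> float:
    """Octave-equivalent distance in semitones (range 0 to 6)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

random.seed(0)
draws = [pitch_class_distance(random.uniform(0, 12), 0.0)
         for _ in range(100_000)]
hit_rate = sum(d <= 2.5 for d in draws) / len(draws)
print(round(hit_rate, 2))  # ~0.42: near "chance" by this scoring
```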
If repeated exposure to a recording enhances memory for its pitch
level, what accounts for the weak associations between exposure and
pitch memory? First, self-reports of television viewing may be inaccu-
rate. Second, individual differences associated with amount of television
viewing (e.g., motivation) would complicate matters, as would factors
that influence performance on standard AP tasks, such as timbre, pitch
register, and pitch class (Takeuchi & Hulse, 1993). The complex associ-
ation between exposure and pitch memory is exemplified by results for
“The Simpsons,” which had greater exposure than any other program
(Table 1), the strongest association between accuracy and individual dif-
ferences in exposure, and the poorest overall levels of performance.
“The Simpsons” differs from the other familiar programs in that its
theme music incorporates upward and downward pitch shifts (transposi-
tions) in 2-semitone steps, which could obscure differences between the
original pitch and shifts of 2 semitones or less.
We are not suggesting that the pitch-perception skills of typical college
students are equivalent to those of musicians with or without AP. In gen-
eral, pitch memory and the perception of pitch relations are a function of
musical experience (Krumhansl, 2000). Musical training results in en-
hanced representation of musical features, which is reflected not only in
superior performance on tests of explicit musical knowledge (e.g., naming
tones, intervals, or keys; playing an instrument), but also in differential
neural processing (Ohnishi et al., 2001; Pantev et al., 1998; Schneider et
al., 2002). For tasks that do not depend on explicit knowledge, however,
nonmusicians’ abilities are surprisingly similar to those of musicians. As
unfamiliar melodies unfold, trained and untrained listeners have similar
expectancies about which tones will follow (Schellenberg, 1996), as do
listeners of different ages (Schellenberg, Adachi, Purdy, & McKinnon,
2002). Our measure of memory for pitch level appears to be another test
of implicit musical knowledge that is unrelated to musical training.
The hypothesized shift from absolute to relative pitch processing in
early childhood (Saffran & Griepentrog, 2001; Takeuchi & Hulse, 1993)
is at odds with our results and with considerable evidence of relative-pitch
processing in infancy (Trehub, 2000). The discrepancy in experimental re-
sults could stem from the divergent procedures used for evaluating pitch
processing in children and adults. Aside from the criteria for AP being ap-
plied less stringently to children than to adults, isolated pure tones—the
stimuli of choice for adults—are used rarely with children. For example,
children’s reproduction of songs at a consistent pitch level is often offered
as evidence of AP (Takeuchi & Hulse, 1993), but similar adult skills
(Bergeson & Trehub, 2002; Halpern, 1989) are regarded as residual AP or pseudo-AP (Takeuchi & Hulse, 1993; Ward, 1999) rather than genuine
AP. In short, the absolute-to-relative shift in pitch processing may be ex-
aggerated or absent altogether.
On the one hand, adults outperform children on relative-pitch tasks
(Schellenberg & Trehub, 1996), and adults with AP identify musical
tones more accurately than do children with AP (Miller & Clausen,
1997). On the other hand, preschoolers outperform older children and
adults on the acquisition and retention of labels for specific pitches
(Crozier, 1997). In this respect, preschoolers are more like older autistic
or developmentally delayed individuals whose cognitive inflexibility,
language limitations, or focus on local rather than global details may fa-
cilitate the acquisition of pitch labels (Heaton et al., 1998, 1999). The
prolonged critical period for the acquisition of AP among developmen-
tally delayed children (Lenhoff et al., 2001b) could stem from similar
factors. It is intriguing that the critical period for acquiring AP among
normally developing children (before age 6 or 7) is the optimal age
range for achieving nativelike phonological proficiency in a second lan-
guage (Flege & Fletcher, 1992). The attentional and cognitive profile of
young children may be ideally suited to rote learning, sound reproduc-
tion, and the acquisition of word-object or pitch-name associations.
Why some children with early musical training acquire AP, as usually
defined, and others do not may stem from genetic variations in associa-
tive abilities and from unidentified environmental factors.
In conclusion, adults with little explicit knowledge of music differ
from musicians with AP, who can label isolated tones, and from musi-
cians without AP, who can label isolated intervals. Nonetheless, the
average person has rich representations of familiar music that include
implicit memory for pitch level.
Acknowledgments—This research was supported by the Natural Sciences and Engineering Research Council of Canada. We thank Keira Stockdale and Will Huggon for assistance in stimulus preparation and data collection, and Mari Jones, Morris Moscovitch, Bruce Schneider, Laurel Trainor, Bill Thompson, and Lawrence Ward for helpful comments on an earlier draft.

REFERENCES

Baharloo, S., Johnston, P.A., Service, S.K., Gitschier, J., & Freimer, N.B. (1998). Absolute pitch: An approach for identification of genetic and nongenetic components. American Journal of Human Genetics, 62, 224–231.
Baharloo, S., Service, S.K., Risch, N., Gitschier, J., & Freimer, N.B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics, 67, 755–758.
Benguerel, A.-P., & Westdal, C. (1991). Absolute pitch and the perception of sequential musical intervals. Music Perception, 9, 105–120.
Bergeson, T.R., & Trehub, S.E. (2002). Absolute pitch and tempo in mothers’ songs to infants. Psychological Science, 13, 72–75.
Burns, E.M. (1999). Intervals, scales, and tuning. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 215–264). San Diego, CA: Academic Press.
Crozier, J.B. (1997). Absolute pitch: Practice makes perfect, the earlier the better. Psychology of Music, 25, 110–119.
Dowling, W.J. (1999). Development of music perception and cognition. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 603–625). San Diego, CA: Academic Press.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science, 291, 1969–1972.
Flege, J.E., & Fletcher, K.L. (1992). Talker and listener effects on degree of perceived foreign accent. Journal of the Acoustical Society of America, 91, 370–389.
Goshen-Gottstein, Y., Moscovitch, M., & Melo, B. (2000). Intact implicit memory for newly formed verbal associations in amnesic patients following single study trials. Neuropsychology, 14, 570–578.
Halpern, A.R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572–581.
Heaton, P., Hermelin, B., & Pring, L. (1998). Autism and pitch processing: A precursor for savant musical ability? Music Perception, 15, 291–305.
Heaton, P., Pring, L., & Hermelin, B. (1999). A pseudo-savant: A case of exceptional musical splinter skills. Neurocase, 5, 503–509.
Hirata, Y., Kuriki, S., & Pantev, C. (1999). Musicians with absolute pitch show distinct neural activities in the auditory cortex. NeuroReport, 10, 999–1002.
Krumhansl, C.L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126, 159–179.
Lenhoff, H.M., Perales, O., & Hickok, G. (2001a). Absolute pitch in Williams syndrome. Music Perception, 18, 491–503.
Lenhoff, H.M., Perales, O., & Hickok, G. (2001b). Preservation of a normally transient critical period in a cognitively impaired population: Window of opportunity for acquiring absolute pitch in Williams syndrome. In C.A. Shaw & J.C. McEachern (Eds.), Toward a theory of neuroplasticity (pp. 275–287). Philadelphia: Psychology Press.
Levitin, D.J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics, 56, 414–423.
Levitin, D.J., & Cook, P.R. (1996). Memory for musical tempo: Additional evidence that auditory memory is absolute. Perception & Psychophysics, 58, 927–935.
Lockhead, G.R., & Byrd, R. (1981). Practically perfect pitch. Journal of the Acoustical Society of America, 70, 387–389.
Miller, L.K., & Clausen, H. (1997). Pitch identification in children and adults: Naming and discrimination. Psychology of Music, 25, 4–17.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Perception & Psychophysics, 44, 501–512.
Miyazaki, K. (1995). Perception of relative pitch with different references: Some absolute-pitch listeners can’t tell musical interval names. Perception & Psychophysics, 57, 962–970.
Mottron, L., Peretz, I., Belleville, S., & Rouleau, N. (1999). Absolute pitch in autism: A case study. Neurocase, 5, 485–501.
Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., Katoh, A., & Imabayashi, E. (2001). Functional anatomy of musical perception in musicians. Cerebral Cortex, 11, 754–760.
Palmer, C., Jungers, M.K., & Jusczyk, P.W. (2001). Episodic memory for musical prosody. Journal of Memory and Language, 45, 526–545.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L.E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392, 811–814.
Reber, A.S., & Allen, R. (2000). Individual differences in implicit learning: Implication for the evolution of consciousness. In R.G. Kunzendorf & B. Wallace (Eds.), Advances in consciousness research: Vol. 20. Individual differences in conscious experience (pp. 227–247). Amsterdam: John Benjamins.
Saffran, J.R., & Griepentrog, G.J. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37, 74–85.
Schellenberg, E.G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, 58, 75–125.
Schellenberg, E.G., Adachi, M., Purdy, K.T., & McKinnon, M.C. (2002). Expectancy in melody: Tests of children and adults. Journal of Experimental Psychology: General, 131, 511–537.
Schellenberg, E.G., Iverson, P., & McKinnon, M.C. (1999). Name that tune: Identifying popular recordings from brief excerpts. Psychonomic Bulletin & Review, 6, 641–646.
Schellenberg, E.G., & Trehub, S.E. (1996). Children’s discrimination of melodic intervals. Developmental Psychology, 32, 1039–1050.
Schneider, P., Scherg, M., Dosch, H.G., Specht, H.J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience, 5, 688–694.
Takeuchi, A.H., & Hulse, S.H. (1993). Absolute pitch. Psychological Bulletin, 113, 345–361.
Tillman, B., Bharucha, J.J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review, 107, 885–913.
Trehub, S.E. (2000). Human processing predispositions and musical universals. In N.L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 427–448). Cambridge, MA: MIT Press.
Ward, W.D. (1999). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (2nd ed.,
pp. 265–298). San Diego, CA: Academic Press.
Young, R.L., & Nettlebeck, T. (1995). The abilities of a musical savant and his family.
Journal of Autism and Developmental Disorders, 25, 231–248.
Zatorre, R.J., Perry, D.W., Beckett, C.A., Westbury, C.F., & Evans, A.C. (1998). Func-
tional anatomy of musical processing in listeners with absolute and relative pitch.
Proceedings of the National Academy of Sciences, USA, 95, 3172–3177.
(RECEIVED 6/3/02; REVISION ACCEPTED 8/31/02)