Read the following three research articles and complete a written response to the readings. Write a page-and-a-half synthesis of the three articles plus one discussion question per article.
The following factors will be considered in grading: relevance, accuracy, synthesis of the reading materials, degree to which the responses show understanding/comprehension of the material, and quality of writing.
· Questions must be original, thoughtful and not easily found in the readings.
· Follow APA rules
· Use proper citations
· Use past tense when discussing the studies (the research was already conducted).
· Avoid the use of the following words: me, you, I, we, prove, proof
· Refer to the articles by their authors (year of publication) (not by the title of the article or the words first, second, or third)
· Do not just summarize the articles. Dig deeper!
Two Factor Model of ASD Symptoms
One of the key factors in determining whether an individual has Autism Spectrum Disorder (ASD) is the individual’s social and communication skills. Individuals diagnosed with ASD showed delays in joint attention, eye gaze, and other social behaviors such as pointing (Swain et al., 2014).
Joint attention is an important social skill to master because it is a building block for developing theory of mind, which supports understanding of others’ perspectives. Korhonen et al. (2014) found that individuals with autism had impaired joint attention. However, some participants did not show impairment, which suggested that joint attention follows different trajectories. One possible explanation for the mixed results is that impaired joint attention may not be specific to ASD, since Korhonen et al. (2014) were unable to find a difference in joint attention between individuals with ASD and individuals with developmental delay (DD). Another possible explanation is that individual interest in the task varied. Although individualized studies are beneficial in detecting personal potential and abilities, their results are difficult to generalize to ASD as a whole (Korhonen et al., 2014). In addition to joint attention, atypical gaze shifting was a distinguishing factor in individuals with ASD. Swain et al. (2014) found that the main difference between typically developing (TD) individuals and individuals with ASD in the first 12 months of life was in gaze shifts. Individuals who were diagnosed with ASD earlier had lower scores on positive affect, joint attention, and gaze shifts; however, those diagnosed later differed from TD individuals only in gaze shifts. It was not until 24 months that individuals with later-onset ASD differed significantly from their TD peers, displaying lower positive affect and fewer gestures (Swain et al., 2014). These findings may point to additional ASD trajectories.
Another defining characteristic of ASD is an excess of restricted patterns of interest and repetitive motor movements. These patterns and movements often impair the individual’s ability to complete daily tasks. Like joint attention and gaze shifts, these repetitive movements and patterns of interest followed different trajectories (Joseph et al., 2013). Joseph et al. (2013) found that individuals with ASD and higher cognitive functioning engaged in more distinct and specific interests, and in fewer repetitive motor movements, than individuals with ASD and lower cognitive functioning. Another finding showed that at the age of two, repetitive motor and play patterns were more common than compulsions. By the age of four, all of these behaviors had increased; however, repetitive use of specific objects was less frequent in older children than in younger children. This finding suggested that ritualistic behaviors and motor movements may present differently depending on the age of the individual (Joseph et al., 2013).
Joseph et al. (2013), Korhonen et al. (2014), and Swain et al. (2014) all defined key characteristics of individuals with ASD and explained the different trajectories of each characteristic. The difficulty with these trajectories is that they are specific to each individual: some symptoms may worsen while others remain stable. It is also difficult to generalize findings from small sample sizes (Joseph et al., 2013).
Discussion Questions:
1. Korhonen et al. (2014) did not use preference-based stimuli to look for joint attention and did not separate high- from low-functioning ASD individuals. Do you think that there could be a difference in level of motivation from each group? If so, how do you think this could change the results?
2. Swain et al. (2014) found that children with early- and late-onset ASD did not differ in their social skills scores at the age of 12 months. If social skills do not differ at that age, is there another factor that would allow late-onset ASD to be diagnosed at an earlier point in development?
3. Joseph et al. (2013) explained that it is difficult to assess the trajectories of ASD with a small sample size. How might their findings nonetheless help advance research on ASD?
BRIEF REPORT
Brief Report: Concurrent Validity of Autism Symptom Severity Measures
Stephanie S. Reszka • Brian A. Boyd •
Matthew McBee • Kara A. Hume • Samuel L. Odom
Published online: 27 June 2013
© Springer Science+Business Media New York 2013
Abstract The autism spectrum disorder (ASD) diagnos-
tic classifications, according to the DSM-5, include a
severity rating. Several screening and/or diagnostic mea-
sures, such as the autism diagnostic and observation
schedule (ADOS), Childhood Autism Rating Scale (CARS)
and social responsiveness scale (SRS) (teacher and parent
versions), include an assessment of symptom severity. The
purpose of this study was to examine whether symptom
severity and/or diagnostic status of preschool-aged children
with ASD (N = 201) were similarly categorized on these
measures. For half of the sample, children were similarly
classified across the four measures, and scores on most
measures were correlated, with the exception of the ADOS
and SRS-P. While the ADOS, CARS, and SRS are reliable
and valid measures, there is some disagreement between
measures with regard to child classification and the cate-
gorization of autism symptom severity.
Keywords Concurrent validity · Autism · Severity ·
Diagnostic classification
Introduction
The proposed changes to the forthcoming diagnostic and
statistical manual of mental disorders, DSM-5 (http://
www.dsm5.org) would include severity criteria for the
autism spectrum disorders (ASD) category. This new cri-
teria would combine autism disorder, Asperger syndrome,
and pervasive developmental disorder—not otherwise
specified (PDD-NOS) into one larger ASD category. As a
result of this collapse, reliable and valid measurement of
autism severity will be even more important in the deter-
mination of services for children with a diagnosis of ASD
(Matson et al. 2012).
Currently, the Childhood Autism Rating Scale (CARS;
Schopler et al. 1986) and Social Responsiveness Scale
(SRS; Constantino 2002) are two commonly used measures
that include a symptom severity estimate. Previously,
higher raw scores on the autism diagnostic and observation
schedule (ADOS; Lord et al. 1999) indicated the presence
of more deficits that are characteristic of individuals with
ASD, suggesting a greater level of impairment, but the raw
scores were not normalized to indicate severity (Gotham
et al. 2009). A recent calibrated severity metric provides
estimations of ASD symptom severity using ADOS scores
(see Gotham et al. 2009). Generally, severity is measured
in several areas for children with ASD: language delay,
cognitive functioning, and behavioral issues (Gotham et al.
2009), however these are not necessarily considered the
core features of ASD. Each of these measures, the CARS,
SRS, and ADOS utilizes slightly different methods of
evaluating the severity of ASD symptoms and have varied
diagnostic cut-offs along the ASD spectrum.
The primary purpose of this study was to examine
whether children’s symptom severity and/or diagnostic
status were similarly categorized across the four measures.
S. S. Reszka (✉) · B. A. Boyd
Department of Allied Health, Division of Occupational Science
and Occupational Therapy, University of North Carolina, 321 S.
Columbia Street, Bondurant Hall CB #7122, Chapel Hill,
NC 27599-7122, USA
e-mail: stephanie.reszka@unc.edu
Present Address:
M. McBee
East Tennessee State University, Johnson City, TN, USA
M. McBee · K. A. Hume · S. L. Odom
Frank Porter Graham Child Development Institute, University
of North Carolina, Chapel Hill, NC, USA
J Autism Dev Disord (2014) 44:466–470
DOI 10.1007/s10803-013-1879-7
The two study goals were to examine: (1) the concurrent
validity of the ADOS, CARS, and SRS (parent and teacher
versions) and (2) the categorization of children’s diagnostic
status and symptom severity.
Methods
Data for this study were collected on 201 children as part of a
larger study comparing the efficacy of school-based, com-
prehensive treatment models for preschoolers with ASD.
Data were collected across four states (CO, NC, FL, and
MN), and at the beginning of the school year. For each child,
all measures were collected within a 6-week time window.
Participants
Children
At enrollment, the mean child age was 3.59 years (SD = 0.56,
range 2.24–5.04). Most participating children were male
(83.3 %) and ethnically non-Hispanic (64.6 %). In terms of
racial status, 5.1 % were identified as Asian, 12.1 % were
Black, 78.3 % were White, and 4.0 % were multiracial. To be
eligible for the larger study, each child was required to have a
clinical or school diagnosis of autism, PDD-NOS, or Asper-
ger’s Syndrome, or meet the autism spectrum cut-off score on
the ADOS and Social Communication Questionnaire (SCQ;
Rutter et al. 2003). If the child had an educational label of
developmental delay (DD) instead of ASD, which is consis-
tent with federal and state policy for children in this age range,
then s/he must have met diagnostic criteria on both the ADOS
and SCQ to be eligible for the study. It was not the point of our
study to diagnose children, but rather screen them for potential
eligibility and a DD educational label is reflective of the real-
world heterogeneity when recruiting children through local
school systems. The other study measures included the fol-
lowing: (1) Mullen Scales of Early Learning (Mullen 1995),
which is a measure of children’s cognitive and motor devel-
opment. Trained research staff administered the visual
reception, fine motor, expressive language, and receptive
language subscales to the child. The mean standard score on
the Mullen was 64.40 (N = 193, SD = 19.6, range 49–136).
And (2) Preschool Language Scale, fourth edition (PLS-4;
Zimmerman et al. 2003), which is a measure of children’s
auditory comprehension and expressive communication
skills. The mean standard score on the PLS-4 was 68.23
(N = 198, SD = 68.23, range 50–134).
Parents
Most participating parents were female (88.2 %) and
non-Hispanic (66.8 %). Additionally, 5.2 % were identified as
Asian, 13.0 % were black, 78.7 % were white, and 3.1 %
were multiracial. Household annual income ranged from
less than $20,000 (12.8 %) to over $100,000 (26.7 %).
Parents completed the parent version of the SRS (SRS-P).
Teachers
Teachers completed the teacher version of the SRS (SRS-T).
Participating teachers were almost exclusively female
(98.6 %) and non-Hispanic (83.6 %), and identified themselves
as white (97.3 %), with the remaining 2.7 % identifying
themselves as black. Most held a master’s degree
(56.2 %), while 37 % had a bachelor’s, 2.7 % had an
associate’s, and 4.1 % had a degree above the master’s
level.
Diagnostic and Severity Measures
The measures examined in this study included the ADOS,
CARS, and SRS parent and teacher versions. Both the
ADOS and CARS were administered by trained and reli-
able project staff. The ADOS was administered by a
research-trained and/or research reliable staff member at
each site, and staff across sites met reliability criterion on a
series of CARS training tapes prior to administration.
The ADOS is a semi-structured assessment of children’s
communication, social, and play skills. Module 1 is for
children who are non-verbal or who have a few words.
Module 2 is for children with phrase speech, while Module
3 is intended for children who are verbally fluent. In
accordance with the suggested severity ratings, ADOS
severity scores of 4–5 indicated autism spectrum disorder
and scores from 6 to 10 indicated autism (Gotham et al.
2009). In this sample, 125 children were administered
Module 1 of the ADOS, 57 were administered Module 2,
and 15 were administered Module 3, while 4 children had
missing data for the ADOS.
Using the CARS, the child is rated on 15 subscales
based on observation (during the Mullen administration, in
this case). To ensure consistency in CARS scoring across
study sites and classrooms, the measure was completed
based on observations of children’s behavior during the
structured administration of the Mullen and 15 min of
unstructured time post-Mullen administration. The CARS
includes items on socialization, communication, emotional
response, and sensory issues. Each of the 15 items is rated
on a scale from 0 to 4, with 4 indicating severe impair-
ments. A CARS cutoff raw score of 25.5 was used to
indicate autism spectrum disorder, with raw scores over 30
indicating autism (Chlebowski et al. 2010). The original
CARS was used, as opposed to the newly released CARS2
(Schopler et al. 2010), because the CARS2 only became
publicly available after the study was already underway.
This study used the original CARS, which is aligned with
the currently available CARS2-ST, for children younger
than 6 years of age.
The SRS is a 65-item rating scale that was completed by
parents and teachers. The SRS provides information about
children’s social functioning including social awareness,
social information processing, social reciprocal communi-
cation, social anxiety/avoidance behaviors, and stereotypic
behavior/restricted interests. Each item is rated on a scale
of 1 (not true) to 4 (almost always true). T-scores (mean of
50, standard deviation of 10) were used in the analyses,
with a T-score of 60–75 indicating mild to moderate
symptoms of ASD, and scores over 75 indicating severe
symptoms. The SRS was normed with T-scores for parent
and teacher versions, with separate norms within each for
child gender. The appropriate scoring norms were used for
each measure, as specified by the SRS manual. The pre-
school version of the SRS was used for children aged
36–47 months, and the standard version was used for
children 48 months and older.
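The cutoffs described in this section can be summarized in a short Python sketch. The function names and category labels are hypothetical, and the handling of boundary values (treating a CARS raw score of exactly 25.5 as on the spectrum, and an SRS T-score of exactly 75 as mild to moderate) is an assumption made for illustration; the text states only the ranges themselves.

```python
def classify_ados(severity: float) -> str:
    """ADOS calibrated severity: 4-5 = autism spectrum disorder, 6-10 = autism."""
    if severity >= 6:
        return "autism"
    if severity >= 4:
        return "autism spectrum disorder"
    return "not on spectrum"


def classify_cars(raw_total: float) -> str:
    """CARS raw total: 25.5 is the spectrum cutoff, scores over 30 indicate autism."""
    if raw_total > 30:
        return "autism"
    if raw_total >= 25.5:  # assumption: the cutoff value itself counts as on spectrum
        return "autism spectrum disorder"
    return "not on spectrum"


def classify_srs(t_score: float) -> str:
    """SRS T-score: 60-75 = mild to moderate symptoms, over 75 = severe symptoms."""
    if t_score > 75:
        return "severe"
    if t_score >= 60:
        return "mild to moderate"
    return "below cutoff"


# Sample means reported in Table 1 of this report.
print(classify_ados(7.19))   # ADOS severity mean
print(classify_cars(33.37))  # CARS total score mean
print(classify_srs(66.27))   # SRS-Teacher mean
print(classify_srs(73.70))   # SRS-Parent mean
```

Applied to the sample means, the sketch places the ADOS and CARS means in the autism range and both SRS means in the mild to moderate range.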
Results
Autism diagnostic and observation schedule scores ranged
from 2 to 10, with a mean of 7.19 (SD = 1.64) suggesting
that children in the sample tended to score in the milder
end of the ASD category, but represented the full range of
severity across the spectrum. The mean score on the CARS
was 33.37 (SD = 7.31) with a range of 15–55.5. Similarly
to the ADOS mean score, the mean score of 33.37 corre-
sponds to the autism category for the CARS. The SRS-
Teacher (SRS-T) version and SRS-Parent (SRS-P) versions
both showed mean scores in the mild to moderate symptom
category (66.27 and 73.70, respectively). Descriptive
information for each measure is available in Table 1.
Question 1: Concurrent Validity at Pretest
The ADOS severity scores were significantly correlated
with the CARS total score (r = 0.432, p < 0.001) and the
total score on the teacher version of the SRS (r = 0.418,
p < 0.001). The ADOS severity scores were not significantly
correlated with scores on the SRS-P (r = 0.088,
p = 0.236). The CARS was significantly correlated with
both versions of the SRS (r = 0.558, p < 0.001 for the
teacher version; r = 0.292, p < 0.001 for the parent version).
The SRS-Teacher and SRS-Parent scores were significantly
correlated (r = 0.275, p < 0.001). The
correlation matrix for these measures is shown in Table 2.
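As a rough illustration of how sample size affects the significance of these correlations, the sketch below converts a Pearson r into a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²), and compares it with a two-tailed critical value. Several things here are assumptions rather than details from the study: the pairwise sample size is not reported, so 185 (the SRS-Parent N in Table 1) is used as a stand-in; the critical value uses a normal approximation to the t distribution; and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def is_significant(r: float, n: int, alpha: float = 0.05) -> bool:
    """Two-tailed test using a normal approximation to the t distribution."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(t_statistic(r, n)) > critical


# Assumption: pairwise n is not reported; 185 is the SRS-Parent N from Table 1.
N_ASSUMED = 185

for r in (0.088, 0.275, 0.432):
    print(r, round(t_statistic(r, N_ASSUMED), 2), is_significant(r, N_ASSUMED))
```

Under these assumptions, the ADOS and SRS-P correlation (r = 0.088) falls short of the critical value, while a correlation as modest as r = 0.275 clears it comfortably at this sample size.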
Question 2: Categorization of Diagnostic Status/
Severity
Nearly 98 % of the children scored on the spectrum
according to the ADOS. The CARS scores classified
64.7 % of children as being on the spectrum. The SRS-
Teacher and SRS-Parent scores classified 76.6 and 82.1 %
of children as being on the spectrum, respectively. Diag-
nostic classification charts for each measure are available
in Fig. 1.
A summary of children’s diagnostic classifications
across all measures is available in Table 3. Ratings were
collapsed so that a score of 0 indicated that the child did
not score on the autism spectrum, while a score of 1
indicated that a child would score in the autism spectrum
range (mild/moderate/severe autism symptoms). As shown,
for 92 cases (50 % of the sample) children were classified
similarly across all measures. For another 25 cases
(13.59 % of the sample), children were classified similarly
on the ADOS and both versions of the SRS, but not the
CARS. The remaining children scored on the spectrum on
one or more of the measures. Almost 14 % scored on the
spectrum according to the ADOS and both SRS versions,
but not the CARS, followed by 10.33 % on the ADOS and
SRS-Parent only. Another 6.52 % of children scored on the
spectrum on the ADOS, CARS, and SRS-Parent. Approx-
imately 6 % scored on the spectrum on both the ADOS and
SRS-Teacher and another nearly 6 % on the ADOS,
CARS, and SRS-Teacher. Almost 4 % scored on the
spectrum only on the ADOS. Just over 2 % scored on the
spectrum only on the ADOS and CARS. Finally, 1 % of
children scored on the spectrum according to the SRS-
Parent and SRS-Teacher forms only, and 0.54 % scored on
the spectrum only on the SRS-Parent. For approximately
76 % of the sample (140 cases), children were similarly
classified on at least three of the four measures.
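The percentages in this section can be recomputed from the case counts in Table 3. The Python sketch below is illustrative only: the tuple layout and variable names are hypothetical, while the counts are copied from Table 3. It suggests that the 140-case figure corresponds to children who scored on the spectrum on at least three of the four measures.

```python
# Rows of Table 3: (ADOS, CARS, SRS-Teacher, SRS-Parent, number of children)
# 0 = not on spectrum, 1 = on spectrum
TABLE_3 = [
    (0, 0, 0, 1, 1),
    (0, 0, 1, 1, 2),
    (1, 0, 0, 0, 7),
    (1, 0, 0, 1, 19),
    (1, 0, 1, 0, 11),
    (1, 0, 1, 1, 25),
    (1, 1, 0, 0, 4),
    (1, 1, 0, 1, 12),
    (1, 1, 1, 0, 11),
    (1, 1, 1, 1, 92),
]
MEASURES = ("ADOS", "CARS", "SRS-Teacher", "SRS-Parent")

total = sum(row[4] for row in TABLE_3)

# Number of children each measure places on the spectrum.
on_spectrum = {
    name: sum(row[4] for row in TABLE_3 if row[i] == 1)
    for i, name in enumerate(MEASURES)
}

# Children on the spectrum on all four measures, and on at least three.
all_four = sum(row[4] for row in TABLE_3 if sum(row[:4]) == 4)
three_or_more = sum(row[4] for row in TABLE_3 if sum(row[:4]) >= 3)


def pct(count: int) -> float:
    """Percentage of the analyzed sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


print("analyzed cases:", total)
for name, count in on_spectrum.items():
    print(name, count, pct(count))
print("all four measures:", all_four, pct(all_four))
print("at least three measures:", three_or_more, pct(three_or_more))
```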
Table 1 Descriptives for measures
Measure N Mean (SD) Range
ADOS severity 198 7.19 (1.64) 2.00–10.00
CARS total score 200 33.37 (7.31) 15.00–55.50
SRS-Teacher total score 200 66.27 (9.66) 42.00–90.00
SRS-Parent total score 185 73.70 (14.27) 42.00–111.00
Table 2 Bivariate correlations of measures
                           ADOS severity score   CARS total score   SRS-Teacher
CARS total score           0.432 (<.001)         –                  –
SRS-Teacher total score    0.418 (<.001)         0.558 (<.001)      –
SRS-Parent total score     0.088 (.236)          0.292 (<.001)      0.275 (.001)
p values in parentheses
Discussion
Generally, children’s severity scores on the measures were
correlated, indicating that the severity of autism symptoms
was rated similarly across all measures, with the exception
of the ADOS and SRS-Parent version. There were mod-
erate to strong correlations between the CARS and all other
measures, and between the SRS-T and all other measures.
The ADOS was moderately correlated with both the CARS
and SRS-T, but not with the SRS-P. Research suggests that
scores on the SRS agree with clinical diagnosis a signifi-
cant portion of the time and the SRS teacher and parent
versions have shown correlations ranging from 0.75 to 0.91
in a clinical sample (Constantino et al. 2003), while this
sample showed a weaker, but still significant, correlation of
0.275. Interestingly, the parent version of the SRS was
correlated, albeit moderately, with all other measures with
the exception of the ADOS. However, the statistical
significance of some of the more modest correlations may
be an artifact of the relatively large sample size used in this
study.
The differences in ADOS and SRS-Parent scores seen in
this study may reflect potential variations in child behaviors
across different contexts; all measures except the SRS-P
were completed in the school context, while the SRS-P
reflects parental views of child behaviors at home. It is
important to consider the context under which these mea-
sures of symptom severity were collected. The parent
measures were not always correlated with measures taken
in the school context by teachers or research staff, and
children may display different behaviors at home than they
would in a classroom or research setting. Thus the context
may be a factor in potential disagreements between par-
ents’ and clinicians’ or practitioners’ interpretations of
symptom severity or autism diagnosis.
For half of the sample, children were similarly classified
across all measures. About three quarters (76 %) of the
sample were similarly classified on at least three of the four
measures. Ratings on the CARS appear to be the most
conservative regarding diagnosis, as only 64.7 % (119
children) were rated as having an ASD diagnosis using the
CARS, while nearly all (98.4 %; 181 out of 184) of the
children were rated as having a diagnosis on the spectrum
according to the ADOS. However the ADOS, along with
the SCQ, was used to determine children’s study eligibility,
and was selected because it is considered a gold-standard
measure for ASD diagnosis.
Fig. 1 Diagnostic classification pie charts by measure
Table 3 Collapsed summary of diagnostic ratings
ADOS   CARS   SRS-Teacher   SRS-Parent   N     %
0      0      0             1            1     0.54
0      0      1             1            2     1.09
1      0      0             0            7     3.80
1      0      0             1            19    10.33
1      0      1             0            11    5.98
1      0      1             1            25    13.59
1      1      0             0            4     2.17
1      1      0             1            12    6.52
1      1      1             0            11    5.98
1      1      1             1            92    50.00
Total                                    184   100.00
17 cases were missing and not included in the analysis. 0 = not
autistic/not on spectrum, 1 = on spectrum/mild autism/severe autism
While the children in this study were between the ages
of 3 and 5, previous research comparing the ADOS and
CARS for diagnosing toddlers with ASD suggests that
there is a significant agreement between the two for
diagnosing ASD in toddlers, matching clinical judgment
(Ventola et al. 2006). Children in this study tended to have
mild to moderate symptoms of autism. The CARS is better
at diagnosing children who tend to be lower functioning
than those who are higher functioning (Mayes et al. 2009),
which may explain some of the discrepancy between
CARS classification and the other measures. A newly
released version of the CARS (CARS2-HF) assesses ver-
bally fluent, more high-functioning children, but currently
is only available for children age 6 and older.
The proposed changes to the DSM-5 include severity
criteria for the ASD category, allowing ratings of symptoms
along “a continuum from mild to severe rather than a
simple yes or no diagnosis to a specific disorder” (APA
2012). Given these changes, measures of symptom severity
may become more critical in autism research and clinical
practice. While the severity measures used in this study
may not match the severity criteria in the proposed DSM-5,
this study is a first step toward examining the agreement, or
lack thereof, of commonly used measures of autism
symptom severity. Additional future studies should exam-
ine the relationships between the current measures of
severity described in this study with the severity classifi-
cations that will be found in the DSM-5.
While there are instruments that can produce reliable
and valid assessments of autism severity available, this
study demonstrates that there is some disagreement among
several of these measures with regard to child classifica-
tions and the categorization of symptom severity. The type
of measure used could affect child classifications, and by
extension, services provided to these children.
Acknowledgments This research was supported by the Institute of
Education Sciences (R324B070219).
References
American Psychiatric Association (2012). DSM-5 proposed criteria for
autism spectrum disorder designed to provide more accurate
diagnosis and treatment. http://www.dsm5.org/Documents/12-03%20Autism%20Spectrum%20Disorders%20-%20DSM5.
Accessed 12 Mar 2013.
Chlebowski, C., Green, J., Barton, M., & Fein, D. (2010). Using the
Childhood Autism Rating Scale to diagnose autism spectrum
disorders. Journal of Autism and Developmental Disorders, 40, 787–799.
Constantino, J. N. (2002). Social Responsiveness Scale (SRS). Los
Angeles: Western Psychological Services.
Constantino, J. N., Davis, S. A., Todd, R. D., Schindler, M. K., Gross,
M. M., Brophy, S. L., et al. (2003). Validation of a brief measure
of autistic traits: Comparison of the social responsiveness scale
with the autism diagnostic interview-revised. Journal of Autism
and Developmental Disorders, 33, 427–433.
Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing the ADOS
scores for a measure of severity in autism spectrum disorders.
Journal of Autism and Developmental Disorders, 39, 695–705.
Lord, C., Rutter, M., DiLavore, P., & Risi, S. (1999). Autism
diagnostic observation schedule (ADOS). Los Angeles, CA:
Western Psychological Services.
Matson, J. L., Beighley, J., & Turygin, N. (2012). Autism diagnosis
and screening: Factors to consider in differential diagnosis.
Research in Autism Spectrum Disorders, 6, 19–24.
Mayes, S. D., Calhoun, S. L., Murray, M. J., Morrow, J. D., Yurich,
K. K. L., Mahr, F., et al. (2009). Comparison of scores on the
checklist for autism spectrum disorder, childhood autism rating
scale, and Gilliam Asperger’s disorder scale for children with
low functioning autism, high functioning autism, Asperger’s
disorder, ADHD, and typical development. Journal of Autism
and Developmental Disorders, 39, 1682–1693.
Mullen, E. (1995). The Mullen Scales of early learning. Circle Pines,
MN: American Guidance Service.
Rutter, M., Bailey, A., & Lord, C. (2003). Social Communication
Questionnaire (SCQ). Los Angeles: Western Psychological Services.
Schopler, E., Reichler, R. J., & Renner, B. R. (1986). The Childhood
Autism Rating Scale (CARS). Los Angeles, CA: Western
Psychological Services.
Schopler, E., Van Bourgondien, M. E., Wellman, G. J., & Love, S. R.
(2010). Childhood autism rating scale, second edition (CARS2).
Los Angeles: Western Psychological Services.
Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green,
J., et al. (2006). Agreement among four diagnostic instruments
for autism spectrum disorder in toddlers. Journal of Autism and
Developmental Disorders, 36, 839–847.
Zimmerman, I., Steiner, V., & Pond, R. (2003). Preschool Language
Scale-IV. San Antonio: Psychological Corporation.
Reproduced with permission of the copyright owner. Further reproduction prohibited without
permission.
Original Article
Factors associated with DSM-5 severity
level ratings for autism spectrum disorder
Micah O Mazurek¹, Frances Lu², Eric A Macklin²,³
and Benjamin L Handen⁴
Autism
2019, Vol. 23(2) 468–476
© The Author(s) 2018
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/1362361318755318
journals.sagepub.com/home/aut
Abstract
The newest edition of the Diagnostic and Statistical Manual of Mental Disorders (5th ed., DSM-5) introduced substantial
changes to the diagnostic criteria for autism spectrum disorder, including new severity level ratings for social
communication and restricted and repetitive behavior domains. The purpose of this study was to evaluate the use of
these new severity ratings and to examine their relation to other measures of severity and clinical features. Participants
included 248 children with autism spectrum disorder who received diagnostic evaluations at one of six Autism Treatment
Network sites. Higher severity ratings in both domains were associated with younger age, lower intelligence quotient,
and greater Autism Diagnostic Observation Schedule–Second Edition domain-specific symptom severity. Greater
restricted and repetitive behavior severity was associated with higher parent-reported stereotyped behaviors. Severity
ratings were not associated with emotional or behavioral problems. The new DSM-5 severity ratings in both domains
were significantly associated with behavioral observations of autism severity but not with measures of other behavioral
or emotional symptoms. However, the strong associations between intelligence quotient and DSM-5 severity ratings
in both domains suggest that clinicians may be including cognitive functioning in their overall determination of severity.
Further research is needed to examine clinician decision-making and interpretation of these specifiers.
Keywords
autism spectrum disorder, diagnosis, DSM-5, need for support, severity level
¹University of Virginia, USA
²Massachusetts General Hospital, USA
³Harvard Medical School, USA
⁴University of Pittsburgh School of Medicine, USA
Corresponding author:
Micah O Mazurek, Curry School of Education, University of Virginia,
417 Emmet Street South, P.O. Box 400267, Charlottesville, VA 22904,
USA.
Email: mazurekm@virginia.edu
Introduction
The most recent edition of the Diagnostic and Statistical
Manual of Mental Disorders (5th ed.; DSM-5) introduced
substantial revisions to the diagnostic criteria for autism
(American Psychiatric Association, 2013). Key changes
included a shift from triadic to dyadic symptom groupings,
and a consolidation of previously separate diagnostic
subcategories (i.e. autistic disorder, Asperger’s
disorder, and pervasive developmental disorder not otherwise
specified) into a single category of autism spectrum
disorder (ASD). These primary changes have received a
great deal of attention from scientific, clinical, and lay
communities, primarily focused on concern about potential
effects on prevalence estimates and service eligibility
(Buxbaum and Baron-Cohen, 2013; Grzadzinski et al.,
2013; Halfon and Kuo, 2013; Volkmar and Reichow,
2013). As a result, a number of studies of sensitivity,
specificity, and diagnostic concordance between DSM-IV and
DSM-5 have been conducted since the draft criteria were
first released (see, for review, Kulage et al., 2014; Smith
et al., 2015). By contrast, the addition of severity level
ratings, an equally significant change to the diagnostic
criteria for ASD, has received little scientific attention.
As noted above, changes to DSM-5 were intended, in
part, to address problems with inter-rater agreement on
DSM-IV subcategories (Lord and Bishop, 2015; Ozonoff,
2012a, 2012b). Moving from three subcategories to a
single ASD category with two domain-specific severity
ratings allowed the diagnostic system to retain strong reli-
ability for overall ASD category while allowing for a
multi-dimensional assessment of severity. These new rat-
ings provide a system for documenting an individual’s
symptom severity in the areas of social communication
and restricted and repetitive behavior (RRB). Each domain
receives a rating of 1 (requiring support), 2 (requiring sub-
stantial support), or 3 (requiring very substantial support)
(American Psychiatric Association, 2013). Additional text
explanation and some examples are provided for each
level, but a lack of clear-cut operational definitions means
that rating determinations remain somewhat subjective. As
described in a recent review (Mehling and Tassé, 2016), it
is not clear how clinicians will make these determinations
or whether their ratings will reflect symptom severity
alone or be influenced by other indices of impairment
(such as cognitive functioning) or co-occurring symptoms
(such as internalizing symptoms or challenging behavior).
Symptom-related functional impact is an important ele-
ment of overall severity of psychopathology. The inclusion
of this dimensional coding of symptom-specific severity
has intuitive appeal in that it may help guide treatment
planning and allow for examination of an individual’s pro-
gress over time within a particular symptom domain.
However, to date, the new severity scales have not been
empirically validated against other indicators of severity.
As a result, clinicians have little overt guidance in making
these determinations. To our knowledge, there has been no
published study examining how these severity ratings will
be used in clinical practice, how they relate to other meas-
ures of symptom severity, or how they relate to other child
characteristics. This information is necessary for interpret-
ing the clinical significance, validity, and utility of these
ratings. The relationship between severity ratings and cog-
nitive functioning may be particularly important to exam-
ine, as intellectual impairment may be conflated with
overall severity. In addition, determining whether ASD
severity ratings are distinct from measures of general emo-
tional and behavioral symptomatology will be important
for evaluating discriminant validity.
Current study
The purpose of this study was to evaluate the DSM-5
severity level ratings in a large sample of children and
adolescents with ASD. Our primary aims were to (1)
describe the distribution of DSM-5 defined social com-
munication and RRB severity ratings across our sample,
(2) assess the relationship between DSM-5 severity rat-
ings and a standardized measure of ASD severity, and (3)
assess the relationship between DSM-5 severity ratings
and other clinical features, particularly cognitive and
behavioral functioning.
Methods
Participants and procedures
Participants consisted of 248 children and adolescents
(ages 2–17 years, M = 6.4 years, SD = 4.0 years) with ASD
enrolled in a larger study focused on DSM-5 criteria for
ASD (Mazurek et al., 2017). All children received a com-
prehensive diagnostic evaluation for autism at one of six
Autism Treatment Network (ATN) sites: Children’s
Hospital Los Angeles, Cincinnati Children’s Hospital
Medical Center, Nationwide Children’s Hospital,
University of Missouri, University of Pittsburgh Medical
Center, and Vanderbilt University Medical Center. Each
clinical diagnostic assessment was conducted in accord-
ance with the standard ATN diagnostic process and
included a review of records, a non-standardized diagnos-
tic clinical interview, standardized observation using the
Autism Diagnostic Observation Schedule–Second Edition
(ADOS-2), cognitive assessment, and assessment of
behavioral functioning. Additional measures were included
when necessary on a case-by-case basis to further inform
diagnostic determination. In total, 52% of the participants
were assessed by a psychologist, 5.7% were assessed by a
physician (i.e. developmental behavioral pediatrician, neu-
rologist, pediatrician, or psychiatrist), and 42.3% were
assessed by an interdisciplinary team (all teams included a
psychologist and/or physician).
The study was approved by the Institutional Review
Board at the clinical and data coordinating center at
Massachusetts General Hospital and at each clinical site,
and informed written consent from each family was
obtained prior to participation. Families whose children
were between the ages of 2 and 17 years 11 months and
who were seen for an autism diagnostic evaluation were
recruited for participation. Recruitment and enrollment
continued until the target sample size was met. Only those
meeting DSM-5 criteria for ASD were included in this
study. Most children were male (82%) and Caucasian
(76%), and most primary caregivers had received some
post-secondary education (66%).
Measures
Demographics. Primary caregivers completed a demo-
graphic questionnaire to report child age, sex, ethnicity,
race, caregiver education level, and household income.
Autism symptom severity. The ADOS-2 (Lord et al., 2012)
is a standardized diagnostic observational tool that assesses
communicative behavior, social interaction skills, and
repetitive behaviors and restricted interests. The ADOS-2
was administered at all sites by assessors with extensive
experience and formal training on administration and scor-
ing of the measure. The ADOS-2 comprises five different
470 Autism 23(2)
modules, one of which is selected for administration based
on the child’s age and verbal ability. A continuous 10-point
metric, the ADOS-2 calibrated severity score (CSS), has
been developed as a measure of overall autism symptom
severity (Esler et al., 2015; Gotham et al., 2009; Hus and
Lord, 2014). The CSS was standardized to account for
individual differences in age and language level. Higher
CSS scores indicate greater symptom severity. Separate
scores were calculated by domain: the social affect cali-
brated severity score (SA-CSS) and the restricted and
repetitive behavior calibrated severity score (RRB-CSS).
Each domain score represents a continuous 10-point score
that accounts for individual differences in age and lan-
guage (Hus et al., 2014).
Two subscales from the Aberrant Behavior Checklist
(ABC) (Aman and Singh, 1986) were included to assess
parent-reported severity in social and repetitive behav-
ior domains. The ABC is a 58-item caregiver-report
questionnaire that measures current behavioral func-
tioning across five empirically derived subscales. For
the purpose of this study, the Social Withdrawal sub-
scale (comprising 16 items assessing social isolation,
withdrawal, and lack of social reciprocity) and the
Stereotypic Behavior subscale (comprising seven items
assessing repetitive behaviors and stereotyped move-
ments) were examined as parent-report measures of
symptom severity.
Intellectual ability. A range of measures were used across
ATN sites to assess overall intelligence (Full Scale IQ),
verbal intelligence (VIQ), and nonverbal intelligence
(NVIQ). A small portion (10.9%) of the sample was
administered a nonverbal measure of intelligence, the
Leiter International Performance Scale–Third Edition
(Roid et al., 2013); therefore, only NVIQ scores were
available for this subset of the sample. Intellectual testing
could not be completed for 16.1% of the sample due to dif-
ficulties participating or understanding task demands. As a
result, valid Full Scale IQ scores were available for 181
children (73% of the total sample). Measures included the
Stanford Binet Scales of Intelligence–Fifth Edition
(24.6%) (Roid, 2003), the Wechsler Intelligence Scale for
Children–Fourth Edition (3.6%) (Wechsler, 2003), the
Wechsler Intelligence Scale for Children–Fifth Edition
(6.9%) (Wechsler, 2014), the Wechsler Preschool and Pri-
mary Scale of Intelligence–Third Edition (1.2%)
(Wechsler, 2002), the Wechsler Abbreviated Scale of Intel-
ligence–Second Edition (8.5%) (Wechsler, 2011), the
Wechsler Adult Intelligence Scale–Fourth Edition (0.4%)
(Wechsler, 2008), the Differential Ability Scales–Second
Edition (3.6%) (Elliot, 2007), the Bayley Scales of Infant
and Toddler Development–Third Edition (2%) (Bayley,
2006), or the Mullen Scales of Early Learning (MSEL,
22.2%) (Elliot, 2007). For those receiving the MSEL, the
Early Learning Composite Standard Score was used as a
measure of Full Scale IQ.
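The coverage figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is illustrative only: the instrument abbreviations used as dictionary keys are shorthand introduced here, and the percentages and counts are copied from the text above.

```python
# Percentage of the total sample assessed with each measure that yielded
# a Full Scale IQ (values copied from the paragraph above).
fsiq_measure_pct = {
    "Stanford-Binet-5": 24.6,
    "WISC-IV": 3.6,
    "WISC-V": 6.9,
    "WPPSI-III": 1.2,
    "WASI-II": 8.5,
    "WAIS-IV": 0.4,
    "DAS-II": 3.6,
    "Bayley-III": 2.0,
    "MSEL": 22.2,
}
leiter_nviq_only_pct = 10.9  # nonverbal measure only, so no Full Scale IQ
not_testable_pct = 16.1      # testing could not be completed
n_total, n_fsiq = 248, 181

# Share of the sample with a Full Scale IQ, computed two ways.
fsiq_pct_from_measures = round(sum(fsiq_measure_pct.values()), 1)
fsiq_pct_from_counts = round(100 * n_fsiq / n_total, 1)

# All three groups together should account for the whole sample.
accounted_pct = round(
    fsiq_pct_from_measures + leiter_nviq_only_pct + not_testable_pct, 1
)

print(fsiq_pct_from_measures, fsiq_pct_from_counts, accounted_pct)
```

Both routes give about 73% of the sample with a valid Full Scale IQ, and the three groups together account for the full sample.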
Emotional and behavioral functioning. The Child Behavior
Checklist (CBCL) (Achenbach and Rescorla, 2001) was
administered to assess emotional and behavioral difficul-
ties. The CBCL is a broad-band parent-report question-
naire providing an overall assessment of symptoms (i.e.
Total Problems score) as well as more specific summary
and syndrome scales. Items are rated on a three-point
scale (Not True to Very True). Two separate versions are
available based on the child’s age, including younger
(ages 1.5–5 years) and older (ages 6–18 years) versions.
Although the specific syndrome scales differ across versions,
the Total, Internalizing, and Externalizing Scale
T-scores are comparable across versions. For this study,
overall levels of both internalizing and externalizing
problems were examined using Internalizing and Exter-
nalizing composite T-scores. The Internalizing domain
comprises mood and anxiety symptoms, while the Exter-
nalizing domain includes behavioral problems, such as
aggression and noncompliance.
Three additional subscales from the ABC (Aman and
Singh, 1986) were included to assess additional challeng-
ing behaviors, specifically: Irritability, Hyperactivity/
Noncompliance, and Inappropriate Speech.
DSM-5 checklist. After all diagnostic assessment proce-
dures were conducted, clinicians completed a DSM-5
diagnostic checklist for each participant. The checklist
contained seven symptoms grouped in two areas: (1) social
communication deficits (three symptoms), and (2) RRBs
(four symptoms). The clinician noted whether each symp-
tom was “absent,” “present by history,” or “currently pre-
sent,” consistent with DSM-5 descriptions (American
Psychiatric Association, 2013). Additional checklist sec-
tions included whether symptoms were present or absent
in the early developmental period and whether impairment
was present or absent. The checklist also included severity
level ratings for both social communication and RRB on a
three-point scale, consistent with DSM-5 criteria (Ameri-
can Psychiatric Association, 2013).
Data analysis plan
Descriptive statistics (mean, standard deviation, range,
and percentage) were calculated for demographic and pri-
mary variables. To examine the distribution of DSM-5-
defined social communication and RRB severity levels,
cross-tabulation of the percentages at each severity level
was calculated. The second and third research questions
were addressed by first conducting bivariate analyses
to examine whether DSM-5 severity ratings were associ-
ated with individual demographic (i.e. age and sex) or
clinical features (i.e. ADOS-2 CSS domain scores, IQ
score, internalizing symptoms, externalizing behaviors,
and aberrant behaviors). DSM-5 severity scores are formally
ordinal metrics with potentially unequal intervals between
levels. We ran three models for each bivariate analysis and
looked for agreement across the three models. The first
model was a cumulative logistic regression model, which
properly accounts for the variable intervals between levels
but also assumes parallel cumulative odds across each pre-
dictor. We tested the parallel cumulative odds assumption
with the proportional odds test. The second model was a
binary logistic regression model, which dichotomized
DSM-5 severity scores between requiring support versus
requiring substantial or very substantial support. This divi-
sion was selected because of the low prevalence of partici-
pants scored as requiring very substantial support. The
third model was a linear regression model, which assigned
the values 1 through 3 to the three severity levels as a con-
tinuous scale. The binary logistic model is correct but
potentially less powerful than the cumulative logistic
model. The cumulative logistic model is appropriate if the
proportional odds assumption is met, but cumulative odds
are difficult to communicate. The linear model is not for-
mally correct, but the interpretation is easy. We focused on
results for which there was agreement across all three
models and thus an unambiguous conclusion of significant
association. Future studies where power is more limited
might choose to focus on inference from the cumulative
logistic model for analyses of ordinal severity scales where
the proportional odds assumption is met. Finally, we used
cumulative and binary logistic and linear multiple regres-
sion models to determine which clinical and demographic
features were independent predictors of DSM-5 severity
scores. For each model, we included all significant varia-
bles from the bivariate models for each DSM-5 severity
score.
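The three codings of the ordinal outcome and the agreement rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the 0.05 threshold follows the reporting convention used in the Results, and the p-values are copied from the bivariate tables (Tables 4 and 5). No model is fitted here.

```python
ALPHA = 0.05  # significance threshold used when reporting results


def binary_outcome(level):
    """Binary logistic coding: level 1 versus level 2 or 3."""
    return 1 if level >= 2 else 0


def cumulative_indicators(level):
    """Cumulative logit view of a 3-level outcome: (level >= 2, level >= 3)."""
    return (int(level >= 2), int(level >= 3))


def linear_score(level):
    """Linear model coding: treat levels 1 to 3 as a continuous score."""
    return float(level)


def agreement(p_values, alpha=ALPHA):
    """Summarize whether all fitted models agree on significance."""
    n_sig = sum(p < alpha for p in p_values)
    if n_sig == len(p_values):
        return "significant in all models"
    if n_sig == 0:
        return "not significant in any model"
    return f"mixed ({n_sig} of {len(p_values)} models)"


# p-values in the order (cumulative logit, binary logit, linear model).
examples = {
    "SC severity ~ ADOS-2 RRB-CSS": (0.018, 0.028, 0.014),
    "SC severity ~ CBCL Internalizing": (0.169, 0.100, 0.193),
    "RRB severity ~ Sex": (0.017, 0.021, 0.018),
    "RRB severity ~ ABC Stereotypic Behavior": (0.036, 0.074, 0.026),
}

for label, p_values in examples.items():
    print(f"{label}: {agreement(p_values)}")
```

Under this rule, only associations that reach significance in every model count as unambiguous. A predictor that is significant in two of the three models is flagged as mixed.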
Results
Demographic and clinical characteristics of the sample are
presented in Table 1.
For DSM-5 social communication severity, 30% of
the sample were rated as requiring support, 45% as
requiring substantial support, and 25% as requiring very
substantial support (Table 2). For DSM-5 RRB severity,
44% of the sample were rated as requiring support, 39%
as requiring substantial support, and 17% as requiring
very substantial support. In the cross-tabulation, 26% of
the sample were rated as requiring support in both social
communication and RRB domains and 28% were rated as
requiring substantial support in both social communica-
tion and RRB domains. Overall, social communication
severity was greater than RRB severity (test for symme-
try, p < 0.001), although there was substantial concord-
ance between the two metrics (simple kappa = 0.52; 95%
confidence interval (CI) 0.43, 0.60; p < 0.001). Basic
sample characteristics across severity level are shown in
Table 3.
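The concordance figures above can be reproduced from the Table 2 cell counts. The sketch below assumes that "simple kappa" refers to unweighted Cohen's kappa and that the "test for symmetry" is Bowker's test. The text does not name the test, so the second point is an assumption. The helper names are illustrative.

```python
import math

# Rows: social communication level 1-3; columns: RRB level 1-3 (Table 2 counts).
table = [
    [64, 10, 1],
    [39, 69, 3],
    [7, 18, 37],
]


def cohen_kappa(t):
    """Unweighted Cohen's kappa for a square contingency table."""
    k = len(t)
    n = sum(sum(row) for row in t)
    row_tot = [sum(t[i]) for i in range(k)]
    col_tot = [sum(t[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(t[i][i] for i in range(k)) / n
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def bowker_symmetry(t):
    """Bowker's chi-square statistic and degrees of freedom."""
    stat, df = 0.0, 0
    k = len(t)
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = t[i][j] + t[j][i]
            if pair_total > 0:
                stat += (t[i][j] - t[j][i]) ** 2 / pair_total
                df += 1
    return stat, df


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


kappa = cohen_kappa(table)
stat, df = bowker_symmetry(table)
p_value = chi2_sf_df3(stat)  # valid here because df == 3

# Off-diagonal counts show the direction of the asymmetry.
sc_higher = table[1][0] + table[2][0] + table[2][1]   # SC level above RRB level
rrb_higher = table[0][1] + table[0][2] + table[1][2]  # RRB level above SC level

print(round(kappa, 2), round(stat, 2), df, p_value < 0.001, sc_higher, rrb_higher)
```

With these counts, kappa rounds to 0.52. The symmetry statistic is about 32.4 on 3 degrees of freedom, and 64 children had a higher social communication level than RRB level against 14 with the reverse pattern. These figures are consistent with the reported concordance and with greater severity in the social communication domain.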
Bivariate analyses
Greater social communication severity was associated
with younger age, lower IQ, and higher ADOS-2 CSS
scores in both social affect and RRB domains (Table 3).
Inferences of significant association were consistent
(p < 0.05) for these variables across all three model types.
Greater RRB severity was associated with younger age,
lower IQ, higher ABC Stereotypic Behavior subscale
scores, higher ADOS-2 SA-CSS and RRB-CSS scores,
and being male (Table 4). Inferences of significant
association were consistent (p < 0.05) across all models for most
of these variables, with the exception of ABC Stereotypic
Behavior, for which two out of three models indicated a
statistically significant association (Table 5).
Multivariate analyses
Final multiple regression models indicated that age, IQ,
and ADOS-2 SA-CSS were significant independent
predictors of social communication severity (p < 0.05 across
all models), and that age, IQ, and ADOS RRB-CSS were
Table 1. Demographic and clinical features.
                                     % (n)
Sex
  Female                             18.1 (45)
  Male                               81.9 (203)
Race
  Asian                              1.2 (3)
  Black or African American          10.1 (25)
  Caucasian/White                    76.2 (189)
  Other                              7.3 (18)
Ethnicity
  Hispanic/Latino                    8.1 (20)
  Not Hispanic/Latino                85.9 (213)
Parental education
                                     Mean (SD)
Age                                  6.4 (4.0); range: 2.0–17.6
IQ                                   76.1 (22.5); range: 33–127
Child Behavior Checklist (CBCL)
  Externalizing T-score              61.7 (11.6)
  Internalizing T-score              65.2 (9.6)
Aberrant Behavior Checklist (ABC)
  Irritability                       15.0 (10.0)
  Social Withdrawal                  12.8 (8.7)
  Stereotypic Behavior               5.9 (4.9)
  Hyperactivity/Noncompliance        20.0 (11.8)
  Inappropriate Speech               3.7 (3.1)
ADOS-2 SA-CSS                        7.4 (1.9)
ADOS-2 RRB-CSS                       7.3 (2.1)
SA-CSS: social affect calibrated severity score, RRB-CSS: restricted
and repetitive behavior calibrated severity score. ADOS-2: Autism
Diagnostic Observation Schedule–Second Edition.
Table 2. Distribution of DSM-5 severity ratings across the total sample, n (%).
                                     Restricted and repetitive behavior (RRB) severity
Social communication (SC) severity   RRB Level 1   RRB Level 2   RRB Level 3   Total
SC Level 1                           64 (25.8)     10 (4.0)      1 (0.4)       75 (30.2)
SC Level 2                           39 (15.7)     69 (27.8)     3 (1.2)       111 (44.8)
SC Level 3                           7 (2.8)       18 (7.3)      37 (14.9)     62 (25)
Total                                110 (44.4)    97 (39.1)     41 (16.5)     248 (100)
Level 1 = requiring support, Level 2 = requiring substantial support, Level 3 = requiring very substantial support.
Table 3. Sample characteristics by severity level rating.
                                     Restricted and repetitive behavior (RRB) severity
Social communication (SC) severity   RRB Level 1     RRB Level 2     RRB Level 3
SC Level 1
  FSIQ                               91.5 (18.2)     95.8 (7.7)      61.0 (–)
  %FSIQ                              89 (57/64)      80 (8/10)       100 (1/1)
  VIQ                                93.5 (19.3)     101.3 (8.3)     No VIQ (0/1)
  %VIQ                               63 (40/64)      70 (7/10)
  NVIQ                               93.4 (19.6)     96.6 (8.4)      No NVIQ (0/1)
  %NVIQ                              72 (46/64)      90 (9/10)
  Age, years                         9.5 (4.3)       7.4 (3.0)       10.8 (–)
  % Module 1                         6 (4/64)        0 (0/10)        0 (0/1)
SC Level 2
  FSIQ                               78.1 (22.5)     69.6 (20.9)     73.7 (12.0)
  %FSIQ                              69 (27/39)      71 (49/69)      100 (3/3)
  VIQ                                76.3 (24.4)     62.5 (16.3)     77.0 (18.4)
  %VIQ                               46 (18/39)      52 (36/69)      67 (2/3)
  NVIQ                               83.5 (21.8)     76.9 (21.0)     83.0 (16.8)
  %NVIQ                              64 (25/39)      68 (47/69)      100 (3/3)
  Age, years                         7.4 (4.3)       5.3 (2.9)       4.9 (1.2)
  % Module 1                         21 (8/39)       45 (31/69)      0 (0/3)
SC Level 3
  FSIQ                               60.6 (15.1)     52.4 (6.3)      54.8 (9.2)
  VIQ                                54.5 (7.8)      46.7 (7.0)      52.8 (10.2)
  NVIQ                               53.3 (6.6)      68.6 (20.8)     55.0 (10.9)
  Age, years                         3.6 (1.2)       4.7 (2.7)       3.5 (1.8)
IQ = M (SD) of Full Scale IQ (FSIQ), Verbal IQ (VIQ), and Nonverbal IQ (NVIQ) for each cell; % IQ = percentage and frequency of children for
Discussion
In our analysis of DSM-5 severity level ratings for ASD in
and repetitive behavior domains; 27% were rated as
Table 4. DSM-5 social communication severity levels and clinical features: bivariate analyses.
Cumulative logit Binary logit Linear model
Odds 95% CI p Odds 95% CI p Slope 95% CI p
Age 0.75 (0.70, 0.81) <0.001 0.77 (0.71, 0.83) <0.001 −0.093 (–0.113, –0.073) <0.001
Sex 0.90 (0.50, 1.63) 0.742 1.08 (0.54, 2.27) 0.827 −0.045 (–0.286, 0.197) 0.717
IQ 0.94 (0.92, 0.95) <0.001 0.94 (0.93, 0.96) <0.001 −0.019 (–0.023, –0.016) <0.001
CBCL Externalizing T-score 1.00 (0.98, 1.02) 0.965 0.99 (0.97, 1.02) 0.607 0.000 (–0.008, 0.009) 0.923
CBCL Internalizing T-score 0.98 (0.96, 1.01) 0.169 0.98 (0.95, 1.00) 0.100 −0.007 (–0.017, 0.003) 0.193
ABC Irritability 1.01 (0.98, 1.04) 0.644 1.00 (0.97, 1.03) 0.840 0.003 (–0.008, 0.014) 0.573
ABC Social Withdrawal 1.01 (0.97, 1.04) 0.689 1.00 (0.96, 1.03) 0.852 0.003 (–0.009, 0.015) 0.624
ABC Stereotypic Behavior (a) 1.05 (0.99, 1.12) 0.079 1.01 (0.95, 1.08) 0.758 0.020 (–0.001, 0.042) 0.063
ABC Hyperactivity 1.01 (0.99, 1.04) 0.343 1.01 (0.98, 1.03) 0.688 0.004 (–0.004, 0.013) 0.326
ABC Inappropriate Speech 0.94 (0.85, 1.03) 0.169 0.95 (0.86, 1.06) 0.368 −0.024 (–0.058, 0.010) 0.164
ADOS-2 SA-CSS 1.27 (1.11, 1.46) <0.001 1.31 (1.12, 1.54) <0.001 0.092 (0.042, 0.142) <0.001
ADOS-2 RRB-CSS 1.15 (1.03, 1.29) 0.018 1.16 (1.02, 1.32) 0.028 0.056 (0.012, 0.100) 0.014
CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–
(a) ABC Stereotypic Behavior failed to meet the proportional odds assumption (p = 0.007).
Table 5. DSM-5 restricted and repetitive behavior severity levels and clinical features: bivariate analyses.
Cumulative logit Binary logit Linear model
Odds 95% CI p Odds 95% CI p Slope 95% CI p
Age 0.76 (0.70, 0.82) <0.001 0.78 (0.72, 0.84) <0.001 –0.081 (–0.101, –0.060) <0.001
Sex 0.46 (0.24, 0.86) 0.017 0.46 (0.24, 0.88) 0.021 –0.285 (–0.519, –0.050) 0.018
IQ 0.96 (0.94, 0.97) <0.001 0.96 (0.94, 0.97) <0.001 –0.015 (–0.019, –0.011) <0.001
CBCL Externalizing T-Score 1.01 (0.99, 1.03) 0.533 1.00 (0.98, 1.02) 0.869 0.003 (–0.005, 0.011) 0.429
CBCL Internalizing T-Score (a) 0.99 (0.96, 1.01) 0.388 0.98 (0.95, 1.00) 0.091 –0.002 (–0.012, 0.007) 0.643
ABC Irritability 1.02 (0.99, 1.05) 0.195 1.01 (0.98, 1.05) 0.354 0.008 (–0.002, 0.018) 0.131
ABC Social Withdrawal 0.99 (0.96, 1.02) 0.593 0.99 (0.95, 1.02) 0.408 –0.002 (–0.014, 0.010) 0.707
ABC Stereotypic Behavior 1.07 (1.00, 1.13) 0.036 1.06 (1.00, 1.13) 0.074 0.024 (0.003, 0.045) 0.026
ABC Hyperactivity 1.02 (0.99, 1.04) 0.134 1.02 (0.99, 1.04) 0.186 0.007 (–0.002, 0.016) 0.127
ABC Inappropriate Speech 1.02 (0.93, 1.11) 0.749 1.04 (0.94, 1.14) 0.468 0.002 (–0.032, 0.035) 0.919
ADOS-2 SA-CSS 1.20 (1.05, 1.38) 0.007 1.19 (1.03, 1.38) 0.019 0.071 (0.021, 0.121) 0.006
ADOS-2 RRB-CSS (b) 1.40 (1.22, 1.62) <0.001 1.45 (1.25, 1.70) <0.001 0.100 (0.058, 0.143) <0.001
CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–
Our findings indicate that clinician ratings of severity
The results also revealed that intellectual functioning
symptom severity alone (more consistent with text exam-
Age was also found to be inversely associated with
Table 6. DSM-5 severity levels and clinical features: final multiple regression models.
Cumulative logit Binary logit Linear model
Outcome variable: social communication severity level
CI: confidence interval; CBCL: Child Behavior Checklist; ABC: Aberrant Behavior Checklist; ADOS-2: Autism Diagnostic Observation Schedule–
Limitations and future directions
As the first study of this type, the current findings provide an
Additional measurement limitations should also be con-
Acknowledgements
The authors are extremely grateful to all the families and clinicians
Declaration of conflicting interests
Dr M.O.M has received research support from National Institute
support from Curemark, Neuropharm, Lilly, Forest, Bristol Myers
Funding
The author(s) disclosed receipt of the following financial support
ORCID iD
Micah O Mazurek https://orcid.org/0000-0001-7715-6538
References
Achenbach TM and Rescorla L (2001) Manual for the ASEBA
Aman M and Singh N (1986) Aberrant Behavior Checklist:
American Psychiatric Association (2013) Diagnostic and
Bayley N (2006) Bayley Scales of Infant and Toddler Development.
Buxbaum JD and Baron-Cohen S (2013) DSM-5: the debate con-
Centers for Disease Control and Prevention (CDC) (2014)
Elliot C (2007) Differential Abilities Scale–2nd Edition (DAS-II)
Esler AN, Bal VH, Guthrie W, et al. (2015) The autism diagnostic
Gotham K, Pickles A and Lord C (2009) Standardizing ADOS
Grzadzinski R, Huerta M and Lord C (2013) DSM-5 and autism
Halfon N and Kuo AA (2013) What DSM-5 could mean to chil-
Hus V, Gotham K and Lord C (2014) Standardizing ADOS
Hus V and Lord C (2014) The autism diagnostic observation
severity scores. Journal of Autism and Developmental
Kulage KM, Smaldone AM and Cohn EG (2014) How will DSM-5
Lord C and Bishop SL (2015) Recent advances in autism
Lord C, Rutter M, DiLavore PC, et al. (2012) Autism Diagnostic
Mazurek MO, Handen BL, Wodka EL, et al. (2014) Age
Mazurek MO, Lu, Symecko H, et al. (2017) A prospective study
Mehling MH and Tassé MJ (2016) Severity of autism spectrum
Ozonoff S (2012a) Editorial: DSM-5 and autism spectrum
Ozonoff S (2012b) Editorial perspective: autism spectrum dis-
for change. Journal of Child Psychology and Psychiatry
Roid GH (2003) Stanford-Binet Intelligence Scales. 5th ed.
Roid GH, Miller LJ, Pomplun M, et al. (2013) Leiter-3: Leiter
Smith IC, Reichow B and Volkmar FR (2015) The effects of
Volkmar RF and Reichow B (2013) Autism in DSM-5: progress
Wechsler D (2002) Wechsler Preschool and Primary Scale
Wechsler D (2003) Wechsler Intelligence Scale for Children. 4th
Wechsler D (2008) Wechsler Adult Intelligence Scale. 4th ed.
Wechsler D (2011) Wechsler Abbreviated Scale of Intelli-
Wechsler D (2014) Wechsler Intelligence Scale for Children. 5th
Weitlauf AS, Gotham K, Vehorn AC, et al. (2014) Brief report:
Wiggins LD, Baio J and Rice C (2006) Examination of the time
Journal of Autism and Developmental Disorders
https://doi.org/10.1007/s10803-020-04839-z
Original Paper
Systematic Review and Meta-Analysis of the Clinical Utility
Jenna B. Lebersfeld1 · Marissa Swanson1 · Christian D. Clesi1 · Sarah E. O’Kelley1
Accepted: 9 December 2020
Abstract
Keywords Autism spectrum disorder · ADOS-2 · ADI-R · Meta-analysis · Diagnosis · HSROC
Introduction
Diagnostic evaluations are crucial for children with autism
et al. 2003; Howes et al. 2017; Penner et al. 2017) have high
Supplementary Information The online version of this article
* Jenna B. Lebersfeld
1 University of Alabama at Birmingham, 1720 7th Ave S,
these measures when used in clinical practice compared to
Several statistical approaches are available and accepted
Methods
This systematic review and meta-analysis utilized methods
Measures for Index Tests
Autism Diagnostic Observation Schedule, Second Edition
The ADOS-2 is a semi-structured, 45- to 60-minute obser-
Autism Diagnostic Interview, Revised (ADI-R; Rutter et al.
The ADI-R is a semi-structured diagnostic interview given
Table 1 Sensitivity and specificity of published ADI-R algorithms
Se sensitivity, Sp specificity
Article n Se Sp
Cox 1999
https://www.crd.york.ac.uk/prospero/display_record.php?ID
Eligibility Criteria
Studies administering either one or both of the ADI-R and
Reference Standard for Diagnosis
The reference standard for diagnosis was the final consensus
Studies which used another method for determining ASD
Study Design
Article eligibility included peer-reviewed original research
Search Strategy
Searches were conducted in September 2018 from Psy-
Assessment of Methodological Quality
The QUADAS-2 (Quality Assessment of Diagnostic Accu-
Study Selection
Figure 1 reviews the process by which articles were selected
Fig. 1 PRISMA flow diagram
Data Extraction
True positives (TP), false positives (FP), true negatives
Data Analysis
Articles were organized using the RevMan software. The
Statistical analyses were conducted separately for the
For the ADI-R analysis, the model converged using these
ADOS-2 analyses of the setting covariate. Combining the
Outliers and Sensitivity Analysis
Studies were plotted graphically on HSROC plots and
The Gotham et al. (2007, 2008) papers provided sepa-
Results
Table 3 outlines study characteristics for the 22 articles
Quality of the Included Studies
Figure 2 displays metrics used for evaluating quality of the
Risk of bias was unclear or high risk for 12 of the 22 papers
Diagnostic Accuracy of Measures
ADOS-2
Estimates of overall Se (.89–.92) and Sp (.81–.85) of the
Table 2 ADOS-2 Analytical Approaches
Approach Outlier article Gotham et al. 2007,
1 Included ASD vs. NS
Table 3 Study characteristics
a 57 to 86% male
Study Test(s) Total N Sex Age Diagnosis
ADOS-2 ADI-R Male n Female n M or Range ASD n Non-ASD n
1. Baird 2006 X 255 223 32 12 years 158 97
Fig. 2 QUADAS-2 risk of bias
(Se = .85–1.00; Sp = .44–1.00) are presented in Table 4 and
was significant (−2LL = 7.87, p < .05) when the Gotham
et al. (2007, 2008) papers were excluded (Table 4). The
highest DOR was reported within clinical samples when
the outlier was excluded and the Gotham et al. (2007, 2008)
papers utilized the Autism vs. Non-Spectrum algorithms
(Table 4). When all articles were included, the DOR was
higher for research compared with clinical samples; however,
inclusion of the setting covariate was not significant
(p = .071). Exclusion of the outlier had little effect on Se of the
clinical sample but increased the Sp of the clinical sample
from .80 to .90, which is higher than specificities reported
in research samples (.81 and .83; Table 4).
Interpretation of the SROC plot (Fig. 3) for all three set-
ADI-R
The ADI-R pooled Se was .75, Sp was .82, and individual
Inclusion of the setting covariate in the model compared
Discussion
This study utilized a systematic review and meta-analysis
Fig. 3 SROC plot of ADOS-2 by setting for Approach 1 (outlier
Fig. 4 SROC plot of ADOS-2 by setting for Approach 3 (outlier
to accuracy reported in the published manual and was more
For the ADOS-2, when comparing samples of children
Table 4 Sensitivity and
Se sensitivity, Sp specificity, DOR diagnostic odds ratio, −2LL −2 log likelihood difference, “–-” data not
Approach Overall Research or both Clinical −2LL p
n Se Sp DOR n Se Sp DOR n Se Sp DOR
1 14 .89 .81 36.5 11 .89 .80 34.8 3 .89 .80 31.0 7.02 .071
Fig. 5 Forest plot of ADOS-2 by setting using the Gotham ASD vs. NS estimates
Table 5 Sensitivity and
NVMA nonverbal mental age, ASD autism spectrum disorder, NS non-spectrum, AUT autism, Se sensitivity,
Gotham et al. 2007 Gotham et al. 2008
Module and algorithm AUT vs. NS ASD vs. NS AUT vs. NS ASD vs. NS
Se Sp Se Sp Se Sp Se Sp
Module 1, no words, .95 .94 .82 .79 .86 .80 – –
Module 1, some words .97 .91 .77 .82 .89 .91 .95 1.00
that given the small number of studies identified that were
Sources of Heterogeneity
One limitation of this meta-analysis is that additional
Fig. 6 ADI-R forest plot by setting
Table 6 Sensitivity and specificity of ADI-R Overall and by evalua-
Setting n Sens Spec DOR
Overall 13 .75 .82 13.6
Fig. 7 ADI-R SROC plot by setting. Note: size of shape indicates
these measures function in clinical settings, articles utiliz-
Risk and Sources of Bias
The QUADAS-2 identified no studies with concerns of the
Additionally, only peer-reviewed, published articles were
by providers administering the ASD diagnostic measures.
An additional consideration is the decision to include
Conclusion
This systematic review and meta-analysis of the ADOS-2
Acknowledgements The authors would like to thank the UAB
Cassandra Newsom, Ph.D. for serving as article reviewers, and Dustin
Author Contributions JL designed the project and wrote the protocol
References
Review Manager (RevMan) [Computer program]. Version 5.3. Copen-
American Psychiatric Association (2000). Diagnostic and statistical
American Psychiatric Association. (2013). Diagnostic and statistical
Baird, G., Simonoff, E., Pickles, A., Chandler, S., Loucas, T., Meldrum,
Camodeca, A. (2018). Utility of three N-Item scales of the child
Bishop, S. L., Huerta, M., Gotham, K., Havdahl, K. A., Pickles, A.,
Cox, A., Klein, K., Charman, T., Baird, G., Baron-Cohen, S., Swetten-
De Bildt, A., Sytema, S., Ketelaars, C., Kraijer, D., Mulder, E., Volk-
De Bildt, A., Sytema, S., van Lang, N. D. J., Minderaa, R. B., van
Deeks, J. J. (2001). Systematic reviews of evaluations of diagnostic and
Deeks J. J., Wisniewski S., & Davenport C. (2013). Cochrane Hand-
1.0.0. The Cochrane Collaboration, 2013. http://srdta.cochrane.
Dereu, M., Roeyers, H., Raymaekers, R., Meirsschaut, M., & Warreyn,
DiLavore, P. C., Lord, C., & Rutter, M. (1995). The pre-linguistic
Dorlack, T. P., Myers, O. B., & Kodituwakku, P. W. (2018). A com-
Dykens, E. M., Roof, E., Hunt-Hawkins, J., Dankner, N., Lee, E. B.,
Falkmer, T., Anderson, K., Falkmer, M., & Horlin, C. (2013). Diag-
Gilchrist, A., Green, J., Cox, A., Burton, D., Rutter, M., & Le Couteur,
Gillentine, M. A., Berry, L. N., Goin-Kochel, R. P., Ali, M. A., Ge,
Gotham, K., Risi, S., Dawson, G., Tager-Flusberg, H., Joseph, R.,
Gotham, K., Risi, S., Pickles, A., & Lord, C. (2007). The Autism Diag-
Gray, K. M., Tonge, B. J., & Sweeney, D. J. (2008). Using the Autism
Grzadzinski, R., Dick, C., Lord, C., & Bishop, S. (2016). Parent-
Guthrie, W., Swineford, L. B., Nottke, C., & Wetherby, A. M. (2013).
Harris, S. W., Hess, D., Goodlin-Jones, B., Ferranti, J., Bacal-
Havdahl, K. A., von Tetzchner, S., Huerta, M., Lord, C., & Bishop, S.
Howes, O. D., Rogdaki, M., Findon, J. L., Wichers, R. H., Charman, T.,
Kamp-Becker, I., Albertowski, K., Becker, J., Ghahreman, M., Lang-
Kim, S. H., & Lord, C. (2012). Combining information from multi-
Langmann, A., Becker, J., Poustka, L., Becker, K., & Kamp-Becker,
Le Couteur, A., Haden, G., Hammal, D., & McConachie, H. (2008).
Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L.,
Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop,
Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop,
Luyster, R., Gotham, K., Guthrie, W., Coffing, M., Petrak, R., Pierce,
Maddox, B. B., Brodkin, E. S., Calkins, M. E., Shea, K., Mullan,
Mazefsky, C., & Oswald, D. P. (2006). The discriminative ability
McInnes, M. D., Moher, D., Thombs, B. D., McGrath, T. A.,
Molloy, C., Murray, D. S., Akers, R., Mitchell, T., & Manning-
Neuhaus, E., Beauchaine, T. P., Bernier, R. A., & Webb, S. J. (2017).
Oosterling, I. J., Roos, S., de Bildt, A., Rommelse, N., de Jonge,
sample. Journal of Autism and Developmental Disorders, 40(6),
Papanikolaou, K., Paliokosta, E., Houliaras, G., Vgenopoulou, S.,
Penner, M., Anagnostou, E., Andoni, L. Y., & Ungar, W. J. (2017).
Reaven, J. A., Hepburn, S. L., & Ross, R. G. (2008). Use of the
Risi, S., Lord, C., Gotham, K., Corsello, C., Chrysler, C., Szatmari,
Rutter, C. M. (1995). Regression methods for meta-analysis of diag-
Rutter, C. M., & Gatsonis, C. A. (2001). A hierarchical regression
Rutter, M., Le Couteur, A., Lord, C., et al. (2003). Autism diagnos-
Stewart, J. R., Vigil, D. C., Ryst, E., & Yang, W. (2014). Refin-
Takwoingi, Y. & Deeks, J. (2010). MetaDAS: A SAS macro for meta-
Tomanik, S. S., Pearson, D. A., Loveland, K. A., Lane, D. M., &
Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green,
Vllasaliu, L., Jensen, K., Hoss, S., Landenberger, M., Menze, M.,
Whiting, P. F., Rutjes, A. W. S., Westwood, M. E., Mallett, S.,
Wiggins, L. D., Reynolds, A., Rice, C. E., Moody, E. J., Bernal,
Wiggins, L. D., & Robins, D. L. (2008). Brief Report: excluding the
Zander, E., Sturm, H., & Bölte, S. (2015). The added value of the
autism diagnostic observation schedule: Diagnostic validity in
Zander, E., Willfors, C., Berggren, S., Choque-Olsson, N., Coco, C.,
Zander, E., Willfors, C., Berggren, S., Coco, C., Holm, A., Jifält, I.,
Ziats, M. N., Goin-Kochel, R. P., Berry, L. N., Ali, M., Ge, J.,
Zwaigenbaum, L., Bryson, S. E., Brian, J., Smith, I. M., Roberts, W.,
Publisher’s Note Springer Nature remains neutral with regard to
%FSIQ = 71 (5/7)
%FSIQ = 39 (7/18)
%FSIQ = 65 (24/37)
%VIQ = 29 (2/7)
%VIQ = 39 (7/18)
%VIQ = 49 (18/37)
%NVIQ = 57 (4/7)
%NVIQ = 67 (12/18)
%NVIQ = 54 (20/37)
% Module 1 = 57 (4/7) % Module 1 = 56 (10/18) % Module 1 = 57 (20/37)
whom IQ was available; age = M (SD) of age for each cell; % Module 1 = percentage and frequency of children who were administered Module 1 of
the ADOS-2 (intended for children with minimal verbal abilities).
significant independent predictors of RRB severity
(p < 0.05 across all models) (Table 6).
a large sample of children and adolescents with ASD, we
observed that 25% of children were rated as requiring
support, the lowest severity, in both social communication
requiring substantial support, the intermediate severity, in
both domains; and 15% were rated as requiring very sub-
stantial support, the most severe symptoms, in both
domains. Severity was largely consistent across domains,
with only a handful of children receiving the lowest sever-
ity ratings in one domain and most severe in the other. In
general, social communication symptoms were rated at a
higher level of severity than repetitive behaviors across
the sample.
Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.
Odds
ratio
95% CI p Odds
ratio
95% CI p Slope
estimate
95% CI p
Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.
aCBCL Internalizing Problems T-score failed to meet the proportional odds assumption (p = 0.005).
bADOS-2 RRB-CSS failed to meet the proportional odds assumption (p = 0.049).
are consistent to some degree with both behavioral obser-
vations and parental ratings of severity. Specifically, the
results revealed significant associations between both
social communication and RRB severity ratings and
respective ADOS-2 domain scores. Significant associa-
tions were also observed between RRB severity ratings
and parent-reported symptoms of stereotyped behavior on
the ABC. By contrast, parental ratings of social with-
drawal on the ABC were not associated with DSM-5 rat-
ings of social communication severity. The ABC subscales
included in this study provide a narrow assessment of
very specific types of RRB and social communication.
Thus, future studies should include more comprehensive
parent-report measures of the full range of both RRB and
social communication functioning. It is also noteworthy
that parent-reported behavioral and emotional problems
were not significantly associated with DSM-5 symptom
severity in either domain, providing some evidence that
clinicians are not basing their severity ratings on general
behavioral or emotional problems.
was strongly associated with both social communication
and RRB severity ratings. Children with lower IQ had sig-
nificantly greater clinician-rated severity in both domains.
It could be the case that children who were more signifi-
cantly affected by autism were also more likely to have
global cognitive or developmental impairment. Alterna-
tively, intellectual impairment may contribute indepen-
dently to social communication deficits and repetitive
behaviors above and beyond the effects of core ASD
symptoms alone. Thus, the DSM-5 symptom severity rat-
ings may reflect the combined manifestation of both symp-
tom-specific and global developmental impairment. Given
the wording of the new DSM-5 severity level descriptors
(e.g. “requiring support”), clinicians may also have diffi-
culty determining whether to assign ratings based on ASD
ples) or based largely on need for support (more consistent
with the level descriptors). If clinicians adhere to the latter
interpretation, there may be greater potential for confla-
tion of intellectual and symptom-related impairment. This
poses problems for both inter-rater reliability and con-
struct validity. Without more specific guidance, clinicians
are likely to vary in the extent to which they classify
severity based on domain-specific deficits, cognitive
impairments, or need for support in activities of daily liv-
ing. As shown in a recent descriptive study of children
with ASD, there is a potential for significant discrepancy
in severity classification depending on the measure and
construct (Weitlauf et al., 2014). Further research is
needed to better understand clinician decision-making and
interpretation of the intended construct assessed by these
new DSM-5 specifiers.
DSM-5 symptom severity in both social communication
and RRB domains. This is difficult to interpret within the
context of this study because of the potential for sampling
bias and may not reflect a true decrease in ASD severity
with age. Children were recruited and enrolled into this
study based on referral for autism diagnostic assessments.
It is likely that individuals who were referred for an initial
diagnostic assessment in adolescence generally had more
subtle symptom presentation than those who were referred
in early childhood. This would be consistent with prior
research finding an inverse relationship between autism
symptom severity and age at first diagnosis (Mazurek et al.,
2014; Wiggins et al., 2006). Because of this, it is likely that
the adolescents in our study had less severe symptoms than
the larger population of adolescents with ASD. To fully
examine the associations between age and DSM-5 symp-
tom severity indicators, it would be most informative to
enroll a broader population of individuals with ASD, not
only those seen at the time of initial diagnosis.
                 Odds ratio  95% CI        p        Odds ratio  95% CI        p        Slope estimate  95% CI            p
Age              0.82        (0.73, 0.91)  <0.001   0.82        (0.72, 0.93)  0.002    –0.054          (–0.081, –0.028)  <0.001
IQ               0.95        (0.93, 0.96)  <0.001   0.95        (0.93, 0.97)  <0.001   –0.015          (–0.019, –0.010)  <0.001
ADOS-2 SA-CSS    1.30        (1.08, 1.57)  0.006    1.47        (1.16, 1.89)  0.002    0.065           (0.018, 0.113)    0.007
Outcome variable: restricted and repetitive behavior severity level
Age              0.83        (0.74, 0.91)  <0.001   0.83        (0.74, 0.93)  0.002    –0.046          (–0.072, –0.021)  <0.001
IQ               0.97        (0.96, 0.99)  0.002    0.98        (0.96, 0.99)  0.014    –0.008          (–0.013, –0.004)  <0.001
ADOS-2 RRB-CSS   1.69        (1.37, 2.12)  <0.001   1.80        (1.42, 2.38)  <0.001   0.109           (0.066, 0.153)    <0.001
Second Edition; SA-CSS: social affect calibrated severity score; RRB-CSS: restricted and repetitive behavior calibrated severity score.
important first examination of the clinical application of
DSM-5 ASD severity ratings across a large and well-charac-
terized sample. The sample spanned a wide range of func-
tioning and was typical of the male:female ratio found in
population studies of ASD (Centers for Disease Control and
Prevention (CDC), 2014). However, several factors may
limit generalizability to the larger ASD population. First, the
centers participating in our study were all located at aca-
demic medical centers and specialize in ASD diagnosis,
treatment, and research. Thus, the clinicians in our study may
not be representative of the larger population of clinicians
practicing in community-based or other settings. Future
research should examine how clinicians in different settings
may be using these DSM-5 severity ratings. It would also be
informative to evaluate potential differences in clinical deci-
sion-making across professional disciplines, as well as inter-
rater reliability in assignment of severity level ratings.
sidered. First, we chose to include ADOS-2 CSS scores
rather than raw scores because they were specifically
designed to account for individual differences in age and
language level. However, it should be noted that these CSS
scores still do not fully account for the associations between
autism symptoms and age and language. In addition,
although ADOS-2 assessments were overseen by research-
reliable clinicians at each site, we did not specifically track
whether all assessments were directly administered by
research-reliable clinicians. Another limitation is that we
did not collect data related to adaptive functioning. Although
many clinicians administered adaptive measures as part of
their clinical evaluations, these data were not collected dur-
ing this study. In the future, it would be informative to eval-
uate the extent to which adaptive functioning correlates with
clinician ratings of symptom severity. Overall, the current
findings suggest that further guidance and more specific
operational definitions may be helpful for clinicians assign-
ing these new DSM-5 severity level ratings.
who participated in this study.
of Mental Health (NIMH), Autism Speaks, and Health Resources
and Services Administration (HRSA). Ms F.L. has received
research support from Autism Speaks and HRSA. Dr E.A.M.
serves as a DSMB member for Acorda Therapeutics and Shire
Human Genetic Therapies and receives research support from
Adolph Coors Foundation, ALS Association, ALS Finding a
Cure, Autism Speaks, Biotie Therapies, Michael J Fox Foundation,
FDA, HRSA, NIH, and PCORI. Dr B.L.H. has received research
Squibb, Roche, Pediamed, Pfizer, and Autism Speaks.
for the research, authorship, and/or publication of this article: This
network activity was supported by Autism Speaks and coopera-
tive agreement UA3 MC11054 through the US Department of
Health and Human Services, Health Resources and Services
Administration, Maternal and Child Health Research Program to
the Massachusetts General Hospital. This work was conducted
through the Autism Speaks Autism Treatment Network.
school-age forms & profiles: an integrated system of
multi-informant assessment. Burlington, VT: University of
Vermont, Research Center for Children, Youth & Families.
Manual. East Aurora, NY: Slosson Educational Publications.
Statistical Manual of Mental Disorders (DSM-5). 5th ed.
Washington, DC: APA.
3rd ed. San Antonio, TX: Harcourt Assessment, Inc.
tinues. Molecular Autism 4(1): 11.
Prevalence of autism spectrum disorder among children
aged 8 years—autism and developmental disabilities moni-
toring network, 11 sites, United States, 2010. MMWR
Surveill Summ 63(2): 1–21.
Manual. 2nd ed. San Antonio, TX: Harcourt Assessment, Inc.
observation schedule, toddler module: Standardized sever-
ity scores. Journal of Autism and Developmental Disorders
45(9): 2704–2720.
scores for a measure of severity in autism spectrum disor-
ders. Journal of Autism and Developmental Disorders 39(5):
693–705.
spectrum disorders (ASDs): an opportunity for identifying
ASD subtypes. Molecular Autism 4(1): 12.
dren with autism and their families. JAMA Pediatrics 167(7):
608–613.
domain scores: separating severity of social affect and
restricted and repetitive behaviors. Journal of Autism and
Developmental Disorders 44: 2400–2412.
schedule, module 4: Revised algorithm and standardized
Disorders 44(8): 1996–2012.
affect autism diagnosis? A systematic literature review
and meta-analysis. Journal of Autism and Developmental
Disorders 44(8): 1918–1932.
research as reflected in DSM-5 criteria for autism spec-
trum disorder. Annual Review of Clinical Psychology 11:
53–70.
Observation Schedule, Second Edition (ADOS-2) Manual
(Part 1): Modules 1–4. 2nd ed. Torrance, CA: Western
Psychological Services.
at first autism spectrum disorder diagnosis: The role of
birth cohort, demographic factors, and clinical features.
Journal of Developmental and Behavioral Pediatrics 35(9):
561–569.
of the concordance of DSM-IV and DSM-5 diagnostic cri-
teria for autism spectrum disorder. Journal of Autism and
Developmental Disorders 47(9): 2783–2794.
disorders: current conceptualization, and transition to DSM-
5. Journal of Autism and Developmental Disorders 46(6):
2000–2016.
disorders—two decades of perspectives from the JCPP.
Journal of Child Psychology and Psychiatry 53(9):
e4–e6.
orders in DSM-5—an historical perspective and the need
53(10): 1092–1094.
Itasca, IL: Riverside Publishing.
International Performance Scale. Torrance, CA: Western
Psychological Services.
DSM-5 criteria on number of individuals diagnosed with
autism spectrum disorder: a systematic review. Journal of
Autism and Developmental Disorders 45(8): 2541–2552.
and challenges. Molecular Autism 4(1): 13.
of Intelligence. 3rd ed. San Antonio, TX: Psychological
Corporation.
ed. San Antonio, TX: Psychological Corporation.
San Antonio, TX: Psychological Corporation.
gence (WASI-II). 2nd ed. San Antonio, TX: Psychological
Corporation.
ed. San Antonio, TX: NCS Pearson.
DSM-5 “levels of support”: a comment on discrepant con-
ceptualizations of severity in ASD. Journal of Autism and
Developmental Disorders 44(2): 471–476.
between first evaluation and first autism spectrum diagno-
sis in a population-based sample. Journal of Developmental
and Behavioral Pediatrics 27(2): S79–S87.
of the ADOS‑2 and the ADI‑R in Diagnosing Autism Spectrum
Disorders in Children
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021
The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and the Autism Diagnostic Interview, Revised
(ADI-R) have high accuracy as diagnostic instruments in research settings, while evidence of accuracy in clinical settings is
less robust. This meta-analysis focused on efficacy of these measures in research versus clinical settings. Articles (n = 22)
were analyzed using a hierarchical summary receiver operating characteristics (HSROC) model. ADOS-2 performance was
stronger than the ADI-R. ADOS-2 sensitivity and specificity ranged from .89-.92 and .81-.85, respectively. ADOS-2 accuracy
in research compared with clinical settings was mixed. ADI-R sensitivity and specificity were .75 and .82, respectively, with
higher specificity in research samples (Research = .85, Clinical = .72). A small number of clinical studies were identified,
indicating ongoing need for investigation outside research settings.
spectrum disorder (ASD) to access early intervention ser-
vices and therapies. Determining the accuracy of meas-
ures commonly used for ASD assessment is necessary to
aid clinicians in making better and more accurate clinical
diagnoses. A comprehensive evaluation for autism spectrum
disorder (ASD) is most accurately conducted by a multidis-
ciplinary team through the use of information from multi-
ple sources, including a clinical observation of the child, an
ASD-focused clinical interview with caregivers, and child
and family history (Risi et al. 2006; Kim and Lord 2012;
Stewart et al. 2014). The Autism Diagnostic Observation
Schedule, Second Edition (ADOS-2; Lord et al. 2012a) and
the Autism Diagnostic Interview, Revised (ADI-R; Rutter
levels of diagnostic accuracy; however, both instruments
require specialized training and experience to administer and
score. Using both instruments together improves diagnostic
accuracy (sensitivity .70-.98, specificity .80-.96) compared
to each measure alone (Risi et al. 2006; Ventola et al. 2006;
Kim and Lord 2012). The multidisciplinary team, often led
by a clinical psychologist or physician (e.g., developmental/
behavioral pediatrician), takes the results of these measures
as well as other information gathered during the evaluation
and uses clinical judgement to render a final diagnosis. The
ADOS-2 and ADI-R were initially developed as research
tools and have been studied at length in the research litera-
ture, with subsequent publication for use in clinical settings
to aid diagnosis. Much of the literature published on the
accuracy of the ADOS-2 and the ADI-R utilized evaluations
from populations recruited specifically for research, and
these are the studies on which the published psychometrics
were based. However, research samples often utilize strict
exclusion criteria, such as excluding children with behavio-
ral challenges, intellectual disability, and genetic disorders,
to create a more homogenous research sample. Therefore,
results may not generalize to a clinical community sample
(de Bildt et al. 2004; Tomanik et al. 2007; Neuhaus et al.
2017), and it is important to understand the accuracy of
(https://doi.org/10.1007/s10803-020-04839-z) contains
supplementary material, which is available to authorized users.
JBL1@uab.edu
Birmingham, AL 35233, USA
research settings.
in evaluating the accuracy of these measures in diagnosing
ASD. Sensitivity (Se) is the likelihood that a child with a
clinical diagnosis of ASD will score in the ASD range on
the measure, and specificity (Sp) indicates the likelihood
that a child without ASD will score in the non-ASD range
on the measure. Positive predictive value (PPV) is the likeli-
hood that a child who received an ASD classification on a
measure truly has a diagnosis of ASD, and negative predic-
tive value (NPV) is the likelihood that a child who scores in
the non-ASD range on a measure will not receive a clinical
ASD diagnosis. PPV and NPV are influenced by the preva-
lence of the disorder in the sample whereas Se and Sp are
not; therefore, Se and Sp are used to measure diagnostic test
accuracy when comparing across samples. The accuracy of
the ADOS-2 and ADI-R has been shown to be lower in clini-
cal settings compared to the research context (de Bildt et al.
2009; Zander et al. 2016; Langmann et al. 2017; Zander
et al. 2017; Kamp-Becker et al. 2018); however, the majority
of these clinical studies were conducted in countries outside
of the United States, including the Netherlands (de Bildt
et al. 2009; Oosterling et al. 2010), Greece (Papanikolaou
et al. 2009), Australia (Dereu et al. 2012; Gray et al. 2008),
Germany (Kamp-Becker et al. 2018) and Sweden (Zander
et al. 2015; Zander et al. 2017). Differences in sociocultural
norms as well as translations of the originally published
measures may have increased the error associated with these
measures in clinical settings. Given that children referred for
clinical evaluations in the community often have more com-
plex presentations than samples recruited for and included
in research studies, it was hypothesized that these two diag-
nostic tools may be less accurate in clinical settings than the
reported psychometrics from large scale studies conducted
in the research setting. Therefore, the purpose of this system-
atic review and meta-analysis was to determine the accuracy
and clinical utility of the ADOS-2 and the ADI-R.
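The four accuracy metrics defined above can be illustrated with a short Python sketch. It is not taken from the article; the function names, the `ConfusionCounts` class, and the example counts are hypothetical and chosen only for illustration. The second part shows why PPV and NPV shift with prevalence while Se and Sp stay fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 counts: index test result vs. clinical diagnosis."""
    tp: int  # scored in ASD range, clinical diagnosis of ASD
    fp: int  # scored in ASD range, no clinical diagnosis of ASD
    tn: int  # scored in non-ASD range, no clinical diagnosis of ASD
    fn: int  # scored in non-ASD range, clinical diagnosis of ASD


def sensitivity(c: ConfusionCounts) -> float:
    # Proportion of children with ASD who score in the ASD range.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Proportion of children without ASD who score in the non-ASD range.
    return c.tn / (c.tn + c.fp)


def predictive_values(se: float, sp: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given Se, Sp, and sample prevalence (Bayes' rule)."""
    ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return ppv, npv


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)  # illustrative counts only
    se, sp = sensitivity(counts), specificity(counts)
    for prev in (0.5, 0.1):
        ppv, npv = predictive_values(se, sp, prev)
        print(f"prevalence={prev:.2f}  Se={se:.2f}  Sp={sp:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With Se and Sp held constant, lowering the assumed prevalence reduces PPV and raises NPV, which is the reason Se and Sp are preferred when comparing across samples.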
outlined in the Preferred Reporting Items for Systematic
Review and Meta-Analysis (PRISMA) of Diagnostic Test
Accuracy studies guidelines (McInnes et al. 2018) and the
Handbook for Diagnostic Test Accuracy Reviews (Deeks
2013) and was approved by the university Institutional
Review Board. This protocol was registered with PROSPERO
2018 (https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42018111589;
Registration number: CRD42018111589).
(ADOS‑2; Lord et al. 2012a)
vation and interaction session with an evaluator and the
child which is used to aid in the diagnosis of ASD. Only the
ADOS-2 and its direct precursors were considered as acceptable
index tests (i.e., ADOS-Toddler (Luyster et al. 2009);
ADOS-G with revised algorithms (Gotham et al. 2007)), as
they formed the basis for the WPS ADOS-2 publication. For
ease of reference, these will be referred to collectively as the
“ADOS-2.” Older ADOS versions were not considered eli-
gible index tests for the purpose of this study (i.e., ADOS-G
without the revised algorithms (Lord et al. 2000); PL-ADOS
(DiLavore et al. 1995)). Published ADOS-2 sensitivity (Se)
ranges from .60 to .95 and specificity (Sp) ranges from .75 to
1.00 (Lord et al. 2012a, b). A recent meta-analysis indicated
pooled Se ranging from .77 to .90 and Sp ranging from .62
to .90 for the ADOS-2 (Dorlack et al. 2018).
2003).
to a parent or caregiver by a trained clinician asking detailed
questions about development and underlying behaviors asso-
ciated with ASD. Each section of the algorithm has a raw
score cut-off, and a child must meet or exceed the cut-off
in all four sections to receive a classification of autism; the
classification is not autism if any domain cut-off is not met. Se and Sp
were not published in the ADI-R manual; however, origi-
nal research literature conducted prior to measure publica-
tion indicated that Se varied widely and ranged from .19
to .88, and Sp was 1.00 (Cox et al. 1999; Gilchrist et al.
2001; Table 1). More recent literature suggests the Se of the
ADI-R ranges from .53 to .92, and Sp ranges from .62 to .95
(Risi et al. 2006; Falkmer et al. 2013). Studies which used
other algorithms such as the ADI-R Toddler diagnostic algo-
rithms or those developed by the Autism Genetic Research
Exchange (AGRE) were excluded given these algorithms are
not yet published for clinical use.
ADI-R at 20 months, diagnosis at 42 months 45 .19 1.00
ADI-R at 42 months, diagnosis at 42 months 45 .48 1.00
Gilchrist 2001 53 .88 1.00
the ADOS-2 to children under 18 years for the purpose of an
initial diagnostic evaluation in a clinical or research setting
were eligible. A clinical setting was defined as a commu-
nity setting where participants were not recruited specifically
for research. A research setting included any studies which
recruited participants for research. Some studies included
participants from both community and research settings and
were classified as such. Studies in which diagnostic tests
were administered to confirm ASD diagnosis or assess treat-
ment outcome were excluded. Data for all included articles
were collected in the United States, Canada, or the United
Kingdom and were published in English.
diagnosis of a comprehensive evaluation for ASD (i.e., ASD
or non-ASD), using the following conservative approach.
The comprehensive evaluation must have included any ver-
sion of the ADOS and any ASD-focused clinical interview.
Papers using the ADOS-2 as the index test were required to
include some type of ASD-focused clinical interview in the
evaluation, but this interview did not necessarily have to be
the ADI-R. For studies in which the ADI-R served as the
index test, the evaluation must have included the administra-
tion of any version of the ADOS (i.e., PL-ADOS, ADOS-G,
ADOS-G with revised algorithms, ADOS-T, or ADOS-2) but
did not need to include the ADOS-2 specifically. Papers were
excluded in which the ADI-R was administered but no ver-
sion of the ADOS was administered.
or non-ASD diagnosis (e.g., pre-determined algorithm based
on a combination of ADOS and ADI-R results) were not
included, given that this type of methodology for determin-
ing ASD diagnosis does not reflect clinical practice. Studies
which did not report a final consensus clinical diagnosis and
included only diagnoses reported by a parent, pediatrician,
or educator, and/or other forms of ASD diagnosis were not
included.
with prospective, retrospective, cross-sectional, or longi-
tudinal study designs. Case studies and case series were
excluded. Review articles, meta-analyses, and grey litera-
ture were not included, but citations within were reviewed.
cINFO, ERIC, PubMed/MEDLINE, Cochrane Database of
Systematic Reviews (including Cochrane Central Register
of Controlled Trials (CENTRAL)), Journal of Autism and
Developmental Disorders, Research in Autism Spectrum
Disorders, Autism Research, and Autism. Google Scholar
was used informally to identify keywords but not included
in the formal search strategy. All articles published since the
original publication date of each measure were considered
(ADI-R: 2003; ADOS-2 with revised algorithms: 2007).
Detailed search terms are included in Online Appendix A.
racy Studies-2, Whiting et al. 2011) is a tool used in sys-
tematic reviews to evaluate risk of bias and applicability
concerns in diagnostic test accuracy studies related to patient
selection, index tests, reference standard, and flow and tim-
ing. The QUADAS-2 tool for this study was adapted and
operationalized from Vllasaliu et al. (2016).
for inclusion in the study. Citations from searches (n =
11,672) were exported into EndNote and duplicate articles
(n = 2,591) were eliminated automatically. An additional
949 duplicate articles were identified manually. Therefore,
8,132 unique citations were reviewed. All titles, abstracts,
and possibly relevant full-text articles were reviewed by
two authors (JL and MS). Given differences in initial article
eligibility identification between the two authors resulting
in low initial agreement (i.e., 48 articles, 23% agreement),
the inclusion criteria were clarified, and articles were re-
reviewed by the same two authors yielding 62% agreement.
Remaining discrepancies were rectified through discussion
between these two authors and the last author (SO) as well
as via outside review by two clinical psychologists with
research backgrounds and expertise in ASD. These outside
reviewers had 100% agreement with one another. These
procedures resulted in 22 articles deemed appropriate for
inclusion in the meta-analysis, with 14 articles included in
the ADOS-2 analyses and 13 papers included in the ADI-R
analyses. Despite the complexity of the inclusion criteria,
the additional steps taken to rectify initial low agreement
likely resulted in the inclusion of all appropriate papers in
the meta-analysis.
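The citation counts reported in the screening step above can be checked with a few lines of Python. This is only an arithmetic sketch; the function name `unique_citations` is hypothetical, and the numbers are the ones stated in the text.

```python
def unique_citations(exported: int, auto_duplicates: int, manual_duplicates: int) -> int:
    """Citations left for review after automatic and manual de-duplication."""
    remaining = exported - auto_duplicates - manual_duplicates
    if remaining < 0:
        raise ValueError("duplicate counts exceed the number of exported citations")
    return remaining


if __name__ == "__main__":
    # 11,672 exported; 2,591 removed automatically; 949 removed manually.
    print(unique_citations(11_672, 2_591, 949))
```

The result matches the 8,132 unique citations that the authors report reviewing.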
(TN), false negatives (FN), Se, and Sp for the ADOS-2 and/
or the ADI-R classifications were extracted by two authors
(JL and CC). For some articles, these metrics were stated
directly in the text or presented in supplementary materials.
For articles in which these numbers were not directly stated,
these statistics were calculated using the Review Manager
(RevMan) software provided by Cochrane Library (Review
Manager 2014). A total of 116 data points was extracted by
JL and CC, and 112 data points were agreed upon (97%)
across the 22 articles. Discrepancies were identified as errors
due to referring to wrong text in the table (n = 2) or typo-
graphical or calculation errors (n = 2).
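The extraction agreement figure above is a simple percent agreement, which can be reproduced as follows. The function name `percent_agreement` is hypothetical, and the sketch assumes agreement is computed as agreed data points divided by total data points.

```python
def percent_agreement(agreed: int, total: int) -> float:
    """Percentage of data points on which two extractors agreed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= agreed <= total:
        raise ValueError("agreed must be between 0 and total")
    return 100.0 * agreed / total


if __name__ == "__main__":
    # 112 of 116 extracted data points were agreed upon.
    print(f"{percent_agreement(112, 116):.1f}%")
```

The four discrepancies (two table-reading errors and two typographical or calculation errors) account for the difference between 112 and 116, and the ratio rounds to the reported 97%.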
hierarchical summary receiver operating characteristic
(HSROC) model of Rutter and Gatsonis (Rutter 1995; Rut-
ter and Gatsonis 2001) was conducted using the MetaDAS
SAS macro (Takwoingi and Deeks 2010). This model pro-
duces pooled Se and Sp and accounts for the correlation
between Se and Sp across studies. Separate pooling of Se
and Sp results in underestimation of these statistics, since
it does not take into account the inherent trade-off between
these statistics (Deeks 2001). Positive and negative predictive
values are influenced by prevalence in the sample, which
introduces heterogeneity and uncertainty. The chosen method
for statistical analysis uses a Bayesian model to determine
random effects and was preferred to fixed effects due to
the large amount of heterogeneity commonly seen among
diagnostic test accuracy studies. Additionally, the HSROC
method is recommended when covariates are included in the
model. This model also produces the Diagnostic Odds Ratio
(DOR), a global estimate of overall test accuracy. The DOR
is a summary of the diagnostic accuracy of a test and can be
interpreted as how many times higher the odds are of a per-
son with ASD to score in the ASD range on the diagnostic
test compared to someone without ASD. DOR can be used
to interpret and compare across tests and models.
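The interpretation of the DOR given above can be made concrete with a small Python sketch. It uses the standard definition of the DOR as the odds of a positive test among those with the condition divided by the odds of a positive test among those without it. This is a naive point calculation from a single Se/Sp pair, not the HSROC estimate produced by the authors' model; the function name is hypothetical, and the example inputs are the pooled ADI-R values quoted in the abstract (.75 and .82).

```python
def diagnostic_odds_ratio(se: float, sp: float) -> float:
    """DOR = [Se / (1 - Se)] / [(1 - Sp) / Sp]."""
    if not (0.0 < se < 1.0 and 0.0 < sp < 1.0):
        raise ValueError("Se and Sp must lie strictly between 0 and 1")
    odds_positive_with_asd = se / (1.0 - se)
    odds_positive_without_asd = (1.0 - sp) / sp
    return odds_positive_with_asd / odds_positive_without_asd


if __name__ == "__main__":
    # Illustrative only: pooled ADI-R Se and Sp from the abstract.
    print(f"DOR = {diagnostic_odds_ratio(0.75, 0.82):.1f}")
```

Under these inputs the odds of scoring in the ASD range are roughly 13.7 times higher for a child with ASD than for a child without ASD. A DOR of 1 would indicate a test that does not discriminate at all.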
ADOS-2 and the ADI-R. The HSROC model was computed
with and without the setting covariate to determine whether
setting had an effect on diagnostic test accuracy. The setting
covariate included three groups: clinical, research, and both.
three groups. For the ADOS-2 analyses, having three groups
did not allow the model to converge. Therefore, the “both”
group was combined with the “research” group for the
“both” group with the “research group” was viewed as the
more conservative approach compared with combining the
“both” and “clinical” groups. If, as hypothesized, the admin-
istration of the ADOS-2 in research settings was more accu-
rate than clinical settings, including articles with clinical
evaluations in the “research” group would dilute the accu-
racy of the ASD diagnostic measures within the “research”
setting and reduce the difference in accuracy of the ADOS-2
in clinical and research settings in this study.
visually inspected for outliers, with one article with low
Sp identified as an outlier in the ADOS-2 analysis. Study
characteristics were reviewed, and low specificity was
likely due to the clinical population, which included many
children with severe developmental and behavioral chal-
lenges, resulting in many false positives on the ADOS-2.
Although these children are often excluded from research
studies, they present for clinical evaluations, and it is
important to investigate the accuracy of diagnostic meas-
ures in these populations. However, these study results
may not generalize to other clinical settings given the
sample characteristics. Therefore, the authors conducted
analyses both with and without the outlier. Sp analyses
were conducted by removing the outlier article and repeat-
ing the analyses. Results were compared with and without
the outlier to determine the effect of this specific study on
the results, as discussed below. No outliers were identified
for the ADI-R analysis.
rate Se and Sp estimates based on differing criteria from
the Diagnostic and Statistical Manual of Mental Disorders,
Fourth Edition (DSM-IV, American Psychiatric Association
2000) for two instances: Autism (i.e., Autistic Disorder) vs.
Non-spectrum (NS) and ASD vs. NS. In the Autism vs.
NS analysis, PDD-NOS and Asperger Disorder cases were
excluded and ADOS-2 classifications of ASD were classified
as non-spectrum. In the ASD vs. NS condition, children with
Autistic Disorder were excluded and ADOS-2 classifications
of “autism spectrum” and “autism” were both considered
classifications of ASD. For the purposes of the current study,
including both estimates in a single analysis would result
in the inclusion of the non-spectrum cases more than once,
thus separate analyses were conducted for the Autism vs.
NS and ASD vs. NS estimates for the Gotham et al. articles.
Additionally, results were analyzed both with and without
the outlier. For clarity, analytic approaches for the ADOS-2
are defined in Table 2.
included in the meta-analysis.
studies including risk of bias and applicability concerns.
(54%), and there were concerns regarding the use of the
reference standard (i.e., unclear or high risk for all articles).
This was primarily due to the clinicians’ knowledge of the
results of the index tests prior to the implementation of
the reference standard, as opposed to using blind raters to
come to a diagnostic conclusion. This is common practice
in clinical settings, as the index tests (i.e., the ASD diagnos-
tic measures) are inextricably linked and used as a primary
source of information in the reference standard (i.e., ASD
diagnostic evaluation and final clinical diagnosis) (Figs 3
and 4). Overall, there was low risk of bias from the index
tests, flow and timing, and applicability of the findings to
practice.
ADOS-2 as well as individual estimates for identified articles
2008
Approach   Outlier article   Gotham et al. (2007, 2008) estimates
2          Included          Autism vs. NS
3          Excluded          ASD vs. NS
4          Excluded          Autism vs. NS
5          Included          Excluded
6          Excluded          Excluded
b Diagnosis deferred n = 14
c 4 years for younger group, 9 years for older group
d One participant diagnosis not reported
2. Bishop et al. 2017 X X 289 203 86 8 years 142 126
3. Camodeca 2018 X 483 355 128 10 years 127 356
4. Dykens 2017 X 146 72 74 11 years 32 114
5. Gillentine 2017 X X 18 12 6 9 years 7 10
6. Gotham et al. 2007 X 1630 a a 41 to 104 months 1,351 279
7. Gotham et al. 2008 X 1282 923 359 37 to 118 months 1,068 214
8. Grzadzinski 2016 X X 212 176 36 9 years 164 48
9. Guthrie 2013 X 82b 64 18 19 months 56 12
10. Harris 2008 X 63 63 0 8 years 38 25
11. Havdahl 2016 X 389 288 101 c 255 163
12. Kim 2012 X 695d 353 160 33 months 491 203
13. Le Couteur 2008 X 101 81 20 36 months 77 24
14. Luyster 2009 X 206 158 48 15 to 26 months 59 147
15. Mazefsky 2006 X 78 56 22 4 years 59 19
16. Molloy 2011 X 584 507 77 3 to 9 years 329 255
17. Risi 2006 X 1039 818 221 27 to 94 months 881 158
18. Ventola 2006 X 45 37 8 26 months 36 9
19. Wiggins 2008 X 142 112 30 26 months 73 69
20. Wiggins 2015 X X 922 581 341 59 months 584 338
21. Ziats 2016 X X 18 14 4 14 years 8 10
22. Zwaigenbaum 2016 X 381 215 166 39 months 103 278
and applicability concerns
Fig. 5. These estimates were generally comparable to pub-
lished algorithms (Table 5). Addition of the setting covariate
ting types (clinical, research, and both) when all articles
were included in the analysis and the Gotham et al. (2007,
2008) ASD vs. NS accuracy estimates were used (Approach
1) suggests research samples have higher levels of accuracy
compared with clinical samples and combined clinical and
research samples. When the outlier (Sp = .44) was removed
from the analysis (Fig. 4), and the ASD vs. NS accuracy
estimates were used (Approach 3), visual inspection of the
SROC curve suggests there was not a difference between
accuracy of the ADOS-2 in research and clinical settings, and
accuracy of the ADOS-2 for studies including both research
and clinical evaluations was lower than either research or
clinical settings individually.
articles ranged widely (Se = .33–1.00, Sp = .61–1.00, see
Fig. 6 and Table 6).
to the model without the covariate trended toward signif-
icance (−2LL difference = 11.788, p = .067, see Fig. 7).
Clinical and research samples had comparable Se (clinical
= .71, research = .73) but articles utilizing both research
and clinical samples had higher Se (.82). Sp was higher for
research samples (.85) compared to clinical samples (.72)
and those including both research and clinical evaluations
in the study (.76, see Table 6 and Fig. 6).
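As a rough check on the likelihood ratio comparison reported above, the sketch below converts a −2LL difference into a p value using the chi-square upper tail. The degrees of freedom are not stated in the text; 6 is an assumption chosen here only because it reproduces the reported p value. The helper `chi2_sf_even_df` is a hypothetical name and relies on the closed form that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


neg2ll_difference = 11.788  # value reported in the text
assumed_df = 6              # ASSUMPTION: df is not reported in the text

p_value = chi2_sf_even_df(neg2ll_difference, assumed_df)
print(round(p_value, 3))
```

Under that assumption the result rounds to the reported p = .067, which sits above the conventional .05 threshold and is consistent with the description of a trend rather than a significant effect.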
to investigate the accuracy of the ADOS-2 and the ADI-R
in clinical settings compared to research settings, and it was
hypothesized that these measures would perform better in
research settings given the heterogeneity and complexity of
children referred for an ASD evaluation in clinical samples.
ADOS-2 accuracy from the meta-analysis was comparable
included). Note: Size of shape indicates sample size
excluded). Note: Size of shape indicates sample size
accurate than the ADI-R in both research and clinical set-
tings. For the ADI-R, the current meta-analysis painted a
more nuanced picture than the literature cited in the pub-
lished manual with overall Se of .75 and Sp of .82, and the
ADI-R was less accurate in clinical studies compared to
research-only studies or those utilizing both research and
clinical samples.
evaluated in clinical settings with those whose evalua-
tions were completed in research settings (or which used a
combination of clinical and research evaluations), analyses
indicated Se was comparable across settings and Sp results
were mixed. Some analyses indicated comparable or slightly
lower Sp in clinical compared to research samples, whereas
when an outlier was excluded, results showed that Sp in clinical
samples was higher than in research samples. This suggests
specificity of ADOS-2 overall
and by evaluation setting
available
* p < .05
2 14 .92 .83 52.7 11 .92 .83 59.2 3 .89 .80 30.9 5.81 .120
3 13 .89 .83 42.3 11 .89 .81 36.3 2 .88 .90 71.1 3.23 .357
4 13 .92 .85 61.9 11 .92 .83 59.7 2 .88 .90 70.8 2.83 .418
5 12 .91 .81 47.0 9 .93 .81 53.8 3 .89 .80 31.0 7.87 .049*
6 11 .92 .84 55.7 – – – – – – – – 7.53 .057
specificity of published ADOS
algorithms
Sp specificity, “–”data not available
NVMA > 15 mo.
Module 2, younger .98 .93 .84 .77 .94 1.00 .65 .88
Module 2, older .98 .90 .83 .83 – – – –
Module 3 .91 .84 .72 .76 .82 .92 .60 .75
conducted in solely clinical settings, a single article can
have a large effect on results. Therefore, more research is
needed to further examine ADOS-2 performance in clinical
evaluations. Given current findings, Sp of the ADOS-2 may
be more variable across clinical settings, whereas Se may
remain relatively stable.
sources of heterogeneity were not investigated due to the
limited number of eligible articles identified for inclusion.
One consideration is the shifting definition of autism spec-
trum disorder over time. Current diagnoses are based on cri-
teria for Autism Spectrum Disorder outlined in the Diagnos-
tic and Statistical Manual of Mental Disorders, Fifth Edition
(DSM-5, American Psychiatric Association 2013), which
conceptualizes ASD as a single disorder with differing levels
of severity. The DSM-IV defined multiple types of autism
spectrum disorders including Asperger’s Disorder; Autis-
tic Disorder; and Pervasive Developmental Disorder, Not
Otherwise Specified (PDD-NOS). However, these disorders
could not be reliably differentiated, which led to the revi-
sion of the diagnostic criteria in the DSM-5. The ADOS-2
revised algorithms reflect this change in conceptualization of
ASD. However, the ADI-R has not yet been updated, and the
ADI-R manual states the measure only reliably differentiates
between those with Autistic Disorder (DSM-IV) and other
non-spectrum conditions, not those with milder ASD symp-
toms. These factors further complicate the already multifac-
eted ASD diagnostic process. Within the research literature,
ADI-R algorithms have been developed to capture milder
presentations of ASD. However, these types of algorithms
are used predominately for research, have not been published
for clinical use, and are not widely used clinically. Given
that a primary aim of this study was to investigate how
tion setting
Setting    Articles   Se    Sp
Research   9          .73   .85   15.8
Both       2          .82   .76   15.9
Clinical   2          .71   .72   6.2
sample size
ing alternative and less disseminated diagnostic algorithms
were not included in the meta-analysis. Wider clinical use of
these types of algorithms may be beneficial in improving the
diagnostic accuracy of the ADI-R. This distinction further
emphasizes the importance of clinical expertise in accurate
differential diagnosis of ASD from other non-ASD condi-
tions and neurodevelopmental disorders which impact social
communication (Maddox et al. 2017; Reaven et al. 2008).
applicability of the results to practice. This is likely due to
the eligibility criteria of the studies included in the meta-
analysis, which specified the types of measures and evalu-
ations which were considered acceptable based on current
clinical practice. However, nearly half of the articles had
high risk of bias regarding patient selection, most often due
to not enrolling a consecutive or random sample of participants
in the study. Additionally, risk of bias for the reference standard
was rated high or unclear for all studies. This is inherent in the
nature of conducting ASD evaluations, as the reference standard
(i.e., outcome
diagnosis) was almost always interpreted with knowledge of
the index tests (e.g., ADOS-2), as is true in clinical practice.
Clinicians making ASD diagnoses were therefore not blind
to the results of the index tests; in fact, clinicians utilize the
results of the index tests as part of the information used to
make the final clinical diagnosis. Therefore, the reference
standard is inherently influenced by the results of the index
tests and cannot be interpreted separately. Although some
research studies may consider utilizing techniques to miti-
gate these concerns of bias, including having outside video
reviewers or independent re-evaluations, this does not occur
clinically. The accuracy of these measures in clinical prac-
tice is predicated on the clinician administering and scor-
ing the measures accurately, without outside confirmation.
Given that a primary goal of this study was to investigate the
utility of these measures in clinical practice, and the index
test results and reference standard are inextricably linked in
this type of evaluation, this bias is considered inherent in any
comprehensive clinical evaluation for ASD.
considered for inclusion in this meta-analysis since the reli-
ability of the information presented from other types of
sources can be variable and difficult to determine. However,
many other sources of potentially useful information were
excluded. There is a clear publication bias within the inter-
vention literature wherein studies with negative findings are
often not accepted for publication, but this bias is less often
observed in studies focused on diagnostics. There may be a
publication bias regarding the level of training completed
Two levels of training are available for the ADOS-2 and the
ADI-R: clinical training, which is for professionals using the
measure in clinical practice, and research training, which
is designed for those who use the instrument for research.
The clinical training is a prerequisite for the research train-
ing. The majority of professionals utilizing the ADOS-2
and the ADI-R in clinical practice likely have completed the
clinical training but have not attended the research training.
However, peer-reviewed journals may favor publication of
studies utilizing research-reliable clinicians. Therefore, the
identified sensitivity and specificity in clinical settings in
this study may overestimate the accuracy of these meas-
ures when conducted by providers who only have completed
the clinical training, but the effect of excluding non-peer-
reviewed articles is not known.
only articles conducted in the United States, Canada, and
the United Kingdom. Sociocultural and language factors
are crucial to consider when conducting ASD evaluations.
Although many language translations are available for both
the ADOS-2 and the ADI-R, these measures were initially
designed in English using Western sociocultural norms, and
the vast majority of research and development of these meas-
ures was conducted under similar parameters. Notably, the
language of test administration was reported for only one of
the studies included in this meta-analysis. Therefore,
restricting the inclusion criteria to research conducted in
the United States, Canada, and the United Kingdom was
determined to be the best method available as a proxy for
representing the sample for which these measures were initially
developed. It would be beneficial for future articles to
directly specify the language in which the evaluations were
conducted and the sociocultural background of the partici-
pants and their families.
and the ADI-R determined that the ADOS-2 is more accurate
than the ADI-R. The ADOS-2 demonstrated high levels of
sensitivity and specificity across settings, and it should be
considered for any ASD evaluation. ASD diagnostic meas-
ures may be less accurate in clinical compared to research
settings, but more research utilizing solely clinical popula-
tions is needed.
Libraries Reference Department for their support in formalizing and
improving the search strategy for this project, Sarah Ryan, Ph.D. and
Long, Ph.D. for assistance with biostatistical analyses. This research
was supported in part by the Health Resources and Services Adminis-
tration (HRSA) Maternal and Child Health Bureau (MCH) Leadership
Education in Neurodevelopmental and Related Disabilities (LEND; PI:
Biasini), UAB Civitan International Science Center and Foundation for
Children with Intellectual and Developmental Disabilities McNulty
Scientist Award (O’Kelley), and the UAB Civitan-Sparks Clinics.
supervised by SO. JL and MS conducted the literature searches and
determined article eligibility. JL and CD completed data extraction,
and JL conducted the statistical analysis. JL wrote the first draft of the
manuscript and SO provided substantial edits and guidance. All authors
have approved the final manuscript.
hagen: The Nordic Cochrane Centre, The Cochrane Collaboration,
2014.
manual of mental disorders (4th ed., Text Revision). Washington,
DC: Author.
manual of mental disorders (5th ed.). Arlington, VA: Author.
D., & Charman, T. (2006). Prevalence of disorders of the autism
spectrum in a population cohort of children in South Thames: the
Special Needs and Autism Project (SNAP). Lancet, 368, 210–15.
behavior checklist 6–18 in autism diagnosis. Research in
Autism Spectrum Disorders, 51, 75–85.
https://doi.org/10.1016/j.rasd.2018.04.004.
Duncan, A., et al. (2017). The Autism Symptom Interview,
School-Age: A brief telephone interview to identify autism spec-
trum disorders in 5-to-12-year-old children. Autism Research,
10(1), 78–88. https://doi.org/10.1002/aur.1645.
ham, J., et al. (1999). Autism spectrum disorders at 20 and 42
months of age: stability of clinical and ADI-R diagnosis. Journal
of Child Psychology and Psychiatry, and Allied Disciplines, 40(5),
719–32.
mar, F., & Minderaa, R. (2004). Interrelationship between autism
diagnostic observation schedule-generic (ADOS-G), autism diag-
nostic interview-revised (ADI-R), and the diagnostic and statis-
tical manual of mental disorders (DSM-IV-TR) classification
in children and adolescents with mental retardation. Journal of
Autism and Developmental Disorders, 34(2), 129–137.
Engeland, H., & de Jonge, M. V. (2009). Evaluation of the ADOS
revised algorithm: the applicability in 558 Dutch children and
adolescents. Journal of Autism and Developmental Disorders,
39(9), 1350–8. https://doi.org/10.1007/s10803-009-0749-9.
screening tests. British Medical Journal, 323, 157–162.
book for Systematic Reviews of Diagnostic Test Accuracy Version
org/.
P. (2012). How useful are screening instruments for toddlers to
predict outcome at age 4? General development, language skills,
and symptom severity in children with a false positive screen for
autism spectrum disorder. European Child & Adolescent Psychiatry,
21(10), 541–551.
autism diagnostic observation schedule. Journal of Autism and
Developmental Disorders, 25(4), 355–379.
parative analysis of the ADOS-G and ADOS-2 algorithms:
preliminary findings. Journal of Autism and Developmental
Disorders, 1–12.
Shivers, C. M., et al. (2017). Diagnoses and characteristics of
autism spectrum disorders in children with Prader-Willi syn-
drome. Journal of Neurodevelopmental Disorders, 9(18), 1–12.
https://doi.org/10.1186/s11689-017-9200-2.
nostic procedures in autism spectrum disorders: A systematic
literature review. European Child & Adolescent Psychiatry,
22(6), 329–40. https://doi.org/10.1007/s00787-013-0375-0.
A. (2001). Development and current functioning in adolescents
with Asperger syndrome: a comparative study. Journal of Child
Psychology and Psychiatry, and Allied Disciplines, 42(2), 227–40.
J., Guffey, D., et al. (2017). The cognitive and behavioral phe-
notypes of individuals with CHRNA7 duplications. Journal of
Autism and Developmental Disorders, 47(3), 549–562.
https://doi.org/10.1007/s10803-016-2961-8.
Carter, A., et al. (2008). A Replication of the autism diagnos-
tic observation schedule (ADOS) revised algorithms. Journal of
the American Academy of Child & Adolescent Psychiatry, 47(6),
642–651. https://doi.org/10.1097/CHI.0b013e31816bffb7.
nostic Observation Schedule: Revised algorithms for improved
diagnostic validity. Journal of Autism and Developmental Dis-
orders, 37(4), 613.
Diagnostic Interview-Revised and the Autism Diagnostic Obser-
vation Schedule with young children with developmental delay:
evaluating diagnostic validity. Journal of Autism and Develop-
mental Disorders, 38(4), 657–667.
reported and clinician-observed autism spectrum disorder (ASD)
symptoms in children with attention deficit/hyperactivity disor-
der (ADHD): implications for practice under DSM-5. Molecular
Autism, 7(7), 1–12. https://doi.org/10.1186/s13229-016-0072-1.
Early diagnosis of autism spectrum disorder: stability and change
in clinical diagnosis and symptom presentation. Journal of Child
Psychology and Psychiatry, 54(5), 582–590. https://doi.org/10.1111/jcpp.12008.
man, S., Barbato, I., et al. (2008). Autism profiles of males
with fragile X syndrome. American Journal on Intellectual
and Developmental Disabilities, 113(6), 427–438.
https://doi.org/10.1352/2008.113:427-438.
L. (2016). Utility of the child behavior checklist as a screener for
autism spectrum disorder. Autism Research, 9(1), 33–42.
https://doi.org/10.1002/aur.1515.
King, B. H., et al. (2017). Autism spectrum disorder: Consensus
guidelines on assessment, treatment and research from the British
Association for Psychopharmacology. Journal of Psychopharmacology,
32(1), 3–29. https://doi.org/10.1177/0269881117741766.
mann, A., Mingebach, T., Poustka, L., Weber, L., Schmidt, H.,
Smidt, J., Stehr, T., Roessner, V., Kucharczyk, K., Wolff, N.,
& Stroth, S., (2018). Diagnostic accuracy of the ADOS and
ADOS-2 in clinical practice. European Child & Adolescent
Psychiatry, 1–15.
ple sources for the diagnosis of autism spectrum disorders for
toddlers and young preschoolers from 12 to 47 months of age.
Journal of Child Psychology and Psychiatry, 53(2), 143–151.
I. (2017). Diagnostic utility of the autism diagnostic observa-
tion schedule in a clinical sample of adolescents and adults.
Research in Autism Spectrum Disorders, 34, 34–43.
Diagnosing autism spectrum disorders in pre-school children
using two standardised assessment instruments: The ADI-R and
the ADOS. Journal of Autism and Developmental Disorders, 38,
362–372. https://doi.org/10.1007/s10803-007-0403-3.
DiLavore, P. C., et al. (2000). The autism diagnostic observation
schedule – generic: A standard measure of social and communi-
cation deficits associated with the spectrum of autism. Journal
of Autism and Developmental Disorders, 30(3), 205–223.
S. (2012). Autism diagnostic observation schedule: ADOS-2.
Los Angeles, CA: Western Psychological Services.
S. (2012b). ADOS-2. Autism Diagnostic Observation Schedule.
Manual (Part I): Modules 1-4. Western Psychological Services
Los Angeles, CA.
K., et al. (2009). The Autism diagnostic observation schedule
– toddler module: A new module of a standardized diagnostic
measure for autism spectrum disorders. Journal of Autism and
Developmental Disorders, 39(9), 1305–1320.
K., Hostager, J., et al. (2017). The accuracy of the ADOS-2
in identifying autism among adults with complex psychiatric
conditions. Journal of Autism and Developmental Disorders,
47(9), 2703–2709. https://doi.org/10.1007/s10803-017-3188-z.
and diagnostic utility of the ADOS–G, ADI–R, and GARS for
children in a clinical setting. Autism, 10(6), 533–549.
https://doi.org/10.1177/1362361306068505.
Bossuyt, P. M., Clifford, T., et al. (2018). Preferred reporting
items for a systematic review and meta-analysis of diagnos-
tic test accuracy studies: the PRISMA-DTA statement. JAMA,
319(4), 388–396.
Courtney, P. (2011). Use of the autism diagnostic observation
schedule (ADOS) in a clinical setting. Autism, 15(2), 143–162.
https://doi.org/10.1177/1362361310379241.
Child and family characteristics moderate agreement between
caregiver and clinician report of autism symptoms. Autism
Research, 11(3), 476–487.
M., Visser, J., et al. (2010). Improved diagnostic validity of the
ADOS revised algorithms: a replication study in an independent
689–703. https://doi.org/10.1007/s10803-009-0915-0.
Giouroukou, E., Pehlivanidis, A., et al. (2009). Using the autism
diagnostic interview-revised and the autism diagnostic obser-
vation schedule-generic for the diagnosis of autism spectrum
disorders in a Greek sample with a wide range of intellectual
abilities. Journal of Autism and Developmental Disorders,
39(3), 414–420.
Systematic review of clinical guidance documents for autism
spectrum disorder diagnostic assessment in select regions.
Autism, 22(5), 517–527.
ADOS and ADI-R in children with psychosis: Importance of
clinical judgment. Clinical Child Psychology and Psychiatry,
13(1), 81–94. https://doi.org/10.1177/1359104507086343.
P., et al. (2006). Combining information from multiple sources
in the diagnosis of autism spectrum disorders. Journal of the
American Academy of Child and Adolescent Psychiatry, 45(9),
1094–103.
nostic test data. Academic Radiology, 2, S48–S56.
approach to meta-analysis of diagnostic test accuracy evalua-
tions. Statistics in Medicine, 20(19), 2865–2884.
tic interview-revised. Los Angeles, CA: Western Psychological
Services, 29, 30.
ing best practices for the diagnosis of autism: A comparison
between individual healthcare practitioner diagnosis and trans-
disciplinary assessment. Nevada Journal of Public Health,
11(1), 1.
analysis of diagnostic accuracy studies. User Guide Version 1.3.
2010 July. http://srdta.cochrane.org/.
Shaw, J. B. (2007). Improving the reliability of autism diag-
noses: Examining the utility of adaptive behavior. Journal of
Autism and Developmental Disorders, 37(5), 921–928.
J., et al. (2006). Agreement among four diagnostic instruments
for autism spectrum disorders in toddlers. Journal of Autism and
Developmental Disorders, 36(7), 839–47.
Schütz, M., et al. (2016). Diagnostic instruments for autism
spectrum disorder (ASD). The Cochrane Library.
Deeks, J. J., Reitsma, J. B., et al. (2011). QUADAS-2: A revised
tool for the quality assessment of diagnostic accuracy studies.
Annals of Internal Medicine, 155(8), 529–536.
https://doi.org/10.7326/0003-4819-155-8-201110180-00009.
P., Blaskey, L., et al. (2015). Using standardized diagnostic
instruments to classify children with autism in the Study to
Explore Early Development. Journal of Autism and Developmental
Disorders, 45, 1271–1280. https://doi.org/10.1007/s10803-014-2287-3.
ADI-R behavioral domain improves diagnostic agreement in
toddlers. Journal of Autism and Developmental Disorders, 38,
972–976. https://doi.org/10.1007/s10803-007-0456-3.
combined use of the autism diagnostic interview-revised and the
a clinical Swedish sample of toddlers and young preschoolers.
Autism, 19(2), 187–199.
Elmund, A., et al. (2016). The objectivity of the Autism Diagnostic
Observation Schedule (ADOS) in naturalistic clinical settings.
European Child & Adolescent Psychiatry, 25(7), 769–780.
et al. (2017). The interrater reliability of the autism diagnostic
interview-revised (ADI-R) in clinical settings. Psychopathol-
ogy, 50(3), 219–227.
Guffey, D., et al. (2016). Genetics in Medicine, 18(11),
1111–1118. https://doi.org/10.1038/gim.2016.9.
Szatmari, P., et al. (2016). Stability of diagnostic assessment for
autism spectrum disorder between 18 and 36 months in a high-risk
cohort. Autism Research, 9, 790–800. https://doi.org/10.1002/aur.1.
Introduction
Methods
Measures for Index Tests
Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al. 2012a)
Autism Diagnostic Interview, Revised (ADI-R; Rutter et al. 2003).
Eligibility Criteria
Reference Standard for Diagnosis
Study Design
Search Strategy
Assessment of Methodological Quality
Study Selection
Data Extraction
Data Analysis
Outliers and Sensitivity Analysis
Quality of the Included Studies
Diagnostic Accuracy of Measures
ADOS-2
ADI-R
Sources of Heterogeneity
Risk and Sources of Bias
Conclusion
Acknowledgements
References