
Using Single Subject Experimental Designs


What are the Characteristics of Single Subject Experimental Designs?

Single-subject designs are the staple of applied behavior analysis research, and those preparing for the BCBA exam or the BCaBA exam must know single-subject terms and definitions. When choosing a single-subject experimental design, ABA researchers look for certain characteristics that fit their study. First, individuals serve as their own control in single-subject research: the results of each condition are compared to the participant's own data. If three people participate in the study, each will act as their own control. Second, researchers are trying to predict, verify, and replicate the outcomes of their intervention. Prediction, verification, and replication are essential to single-subject design research and help demonstrate experimental control.

  • Prediction: the hypothesis about what the outcome will be when measured
  • Verification: showing that baseline data would have remained consistent if the independent variable had not been manipulated
  • Replication: repeating the independent variable manipulation to show similar results across multiple phases

Some experimental designs, such as withdrawal designs, are better suited to demonstrating experimental control than others, but each design has its place. We will now look at the different types of single-subject experimental designs and the core features of each.

Reversal Design/Withdrawal Design/A-B-A

Arguably the simplest single-subject design, the reversal/withdrawal design is excellent for demonstrating experimental control. First, baseline data is recorded. Then, an intervention is introduced and its effects are recorded. Finally, the intervention is withdrawn and the experiment returns to baseline. The researcher or researchers then visually analyze the changes from baseline to intervention and determine whether experimental control was established. Prediction, verification, and replication are also clearly demonstrated in the withdrawal design. Below is a simple example of this A-B-A design.

[Figure: a simple reversal/withdrawal (A-B-A) design]

Advantages: demonstrates experimental control.
Disadvantages: ethical concerns; some behaviors cannot be reversed; not great for high-risk or dangerous behaviors.
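For readers who like to see the logic in numbers as well as on a graph, here is a minimal sketch (invented data, Python) of an A-B-A dataset in which each phase is compared to the participant's own baseline:

    # Illustrative A-B-A (reversal) data: tallies of a target behavior per session.
    # All values are made up for demonstration.
    baseline_1 = [2, 3, 2, 3, 2]      # A: no intervention
    intervention = [6, 7, 8, 8, 9]    # B: intervention in place
    baseline_2 = [3, 2, 3, 2, 2]      # A: intervention withdrawn

    mean = lambda xs: sum(xs) / len(xs)
    print(mean(baseline_1), mean(intervention), mean(baseline_2))
    # A clear rise during B and a return toward the original level in the second A
    # phase is the pattern visual analysis looks for when judging experimental control.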

Multiple Baseline Design/Multiple Probe Design

Multiple baseline designs are used when researchers need to measure across participants, behaviors, or settings. For instance, if you wanted to examine the effects of an independent variable in a classroom, in a home setting, and in a clinical setting, you might use a multiple baseline across settings design. Multiple baseline designs typically involve 3-5 subjects, settings, or behaviors. An intervention is introduced into each segment one at a time while baseline continues in the other conditions. Below is a rough example of what a multiple baseline design typically looks like:

[Figure: a typical multiple baseline design]

Multiple probe designs are identical to multiple baseline designs except that baseline is not continuous. Instead, data is taken only sporadically during the baseline condition. You may use this approach if time and resources are limited, or if you do not anticipate baseline changing.

Advantages: no withdrawal needed; multiple dependent variables can be examined at a time.
Disadvantages: sometimes difficult to demonstrate experimental control.
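As a rough sketch (settings, session numbers, and probe days are invented), the staggered structure of a multiple baseline across settings, and the intermittent measurement of a multiple probe variant, can be written out like this:

    # Hypothetical multiple baseline across settings: the intervention starts at a
    # different session in each setting while baseline continues in the others.
    intervention_start = {"classroom": 6, "home": 11, "clinic": 16}

    # A multiple probe variant measures baseline only on selected sessions
    # instead of continuously (probe sessions below are arbitrary).
    probe_sessions = {"classroom": [1, 3, 5], "home": [1, 5, 9], "clinic": [1, 7, 13]}

    for setting, start in intervention_start.items():
        print(f"{setting}: baseline through session {start - 1}, intervention from session {start}")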

Alternating Treatment Design

The alternating treatment design involves rapidly and semirandomly alternating conditions within the same phase. Each condition has an equal opportunity to be in effect during measurement, and because conditions alternate quickly and randomly, multiple conditions can be tested at once.

[Figure: an alternating treatment design]

Advantages: no withdrawal needed; multiple independent variables can be tried rapidly.
Disadvantages: the multiple treatment effect can impact measurement.

Changing Criterion Design

The changing criterion design is great for reducing or increasing behaviors. The behavior should already be in the subject’s repertoire when using changing criterion designs. Reducing smoking or increasing exercise are two common examples of the changing criterion design. With the changing criterion design, treatment is delivered in a series of ascending or descending phases. The criterion that the subject is expected to meet is changed for each phase. You can reverse a phase of a changing criterion design in an attempt to demonstrate experimental control.

[Figure: a changing criterion design]
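As a rough illustration (all numbers invented), the phase plan of a changing criterion design can be written as a list of criteria with the sessions run under each one, and the data checked against each phase's criterion:

    # Hypothetical descending criteria for cigarettes smoked per day, with
    # invented session data recorded under each criterion phase.
    phases = [
        (15, [16, 15, 14, 15]),   # (criterion, observations in that phase)
        (10, [11, 10, 9, 10]),
        (5, [6, 5, 5, 4]),
    ]
    for criterion, observations in phases:
        met = sum(1 for x in observations if x <= criterion)
        print(f"criterion {criterion}: met on {met}/{len(observations)} sessions")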

Summary of Single Subject Experimental Designs

Single subject designs are popular in both the social sciences and applied behavior analysis. As always, your research question and purpose should dictate your design choice. You will need to know experimental design and the details of single subject designs for the BCBA exam and the BCaBA exam. For BCBA exam study materials, check out our BCBA exam prep. For a full breakdown of the BCBA fifth edition task list, check out our YouTube channel.


A Meta-Analysis of Single-Case Research on Applied Behavior Analytic Interventions for People With Down Syndrome

Affiliation: Nicole Neil, Ashley Amicarelli, Brianna M. Anderson, and Kailee Liesemer, Western University, Canada.

  • PMID: 33651891
  • DOI: 10.1352/1944-7558-126.2.114

This systematic review evaluates single-case research design studies investigating applied behavior analytic (ABA) interventions for people with Down syndrome (DS). One hundred twenty-five studies examining the efficacy of ABA interventions on increasing skills and/or decreasing challenging behaviors met inclusion criteria. The What Works Clearinghouse standards and Risk of Bias in N-of-1 Trials scale were used to analyze methodological characteristics, and Tau-U effect sizes were calculated. Results suggest that the use of ABA-based interventions is promising for behavior change in people with DS. Thirty-six high-quality studies were identified and demonstrated a medium overall effect. A range of outcomes was targeted, primarily involving communication and challenging behavior. These outcomes will guide future research on ABA interventions and DS.

Keywords: Down syndrome; Tau-U; applied behavior analysis; single-case research.





Applied Behavior Analysis: Single Subject Research Design


Terms to Use for Articles

"reversal design" OR "withdrawal design" OR "ABAB design" OR "A-B-A-B design" OR "ABC design" OR "A-B-C design" OR "ABA design" OR "A-B-A design" OR "multiple baseline" OR "alternating treatments design" OR "multi-element design" OR "multielement design" OR "changing criterion design" OR "single case design" OR "single subject design" OR “single case series" or "single subject" or "single case"

Go To Databases

  • ProQuest Education Database This link opens in a new window ProQuest Education Database indexes, abstracts, and provides full-text to leading scholarly and trade publications as well as reports in the field of education. Content includes primary, secondary, higher education, special education, home schooling, adult education, and more.
  • PsycARTICLES This link opens in a new window PsycARTICLES, from the American Psychological Association (APA), provides full-text, peer-reviewed scholarly and scientific articles in the field of psychology. The database is indexed using APA's Thesaurus of Psychological Index Terms®.

Research Hints

Stimming – or self-stimulatory behaviour – is  repetitive or unusual body movement or noises . Stimming might include:

  • hand and finger mannerisms – for example, finger-flicking and hand-flapping
  • unusual body movements – for example, rocking back and forth while sitting or standing
  • posturing – for example, holding hands or fingers out at an angle or arching the back while sitting
  • visual stimulation – for example, looking at something sideways, watching an object spin or fluttering fingers near the eyes
  • repetitive behaviour – for example, opening and closing doors or flicking switches
  • chewing or mouthing objects
  • listening to the same song or noise over and over.

How to Search for a Specific Research Methodology in JABA

Single Case Design (Research Articles)

  • Single Case Design (APA Dictionary of Psychology) an approach to the empirical study of a process that tracks a single unit (e.g., person, family, class, school, company) in depth over time. Specific types include the alternating treatments design, the multiple baseline design, the reversal design, and the withdrawal design. In other words, it is a within-subjects design with just one unit of analysis. For example, a researcher may use a single-case design for a small group of patients with a tic. After observing the patients and establishing the number of tics per hour, the researcher would then conduct an intervention and watch what happens over time, thus revealing the richness of any change. Such studies are useful for generating ideas for broader studies and for focusing on the microlevel concerns associated with a particular unit. However, data from these studies need to be evaluated carefully given the many potential threats to internal validity; there are also issues relating to the sampling of both the one unit and the process it undergoes. Also called N-of-1 design; N=1 design; single-participant design; single-subject (case) design.
  • Anatomy of a Primary Research Article: a document that goes through a research article, highlighting evaluative criteria for every section. From Mohawk Valley Community College; permission to use sought and given.
  • Single Case Design (Explanation) Single case design (SCD), often referred to as single subject design, is an evaluation method that can be used to rigorously test the success of an intervention or treatment on a particular case (i.e., a person, school, community) and to also provide evidence about the general effectiveness of an intervention using a relatively small sample size. The material presented in this document is intended to provide introductory information about SCD in relation to home visiting programs and is not a comprehensive review of the application of SCD to other types of interventions.
  • Single-Case Design, Analysis, and Quality Assessment for Intervention Research: the purpose of this article is to describe single-case studies and contrast them with case studies and randomized clinical trials. Lobo, M. A., Moeyaert, M., Baraldi Cunha, A., & Babik, I. (2017). Single-case design, analysis, and quality assessment for intervention research. Journal of Neurologic Physical Therapy, 41(3), 187–197. https://doi.org/10.1097/NPT.0000000000000187
  • The difference between a case study and single case designs There is a big difference between case studies and single case designs, despite them superficially sounding similar. (This is from a Blog posting)
  • Single Case Design (Amanda N. Kelly, PhD, BCBA-D, LBA, aka Behaviorbabe): Dr. Kelly provides a tutorial and explanation of single case design in simple terms.
  • Lobo et al. Single-Case Design, Analysis, and Quality Assessment for Intervention Research. Lobo, M. A., Moeyaert, M., Cunha, A. B., & Babik, I. (2017). Single-case design, analysis, and quality assessment for intervention research. Journal of Neurologic Physical Therapy, 41(3), 187–197. https://doi.org/10.1097/NPT.0000000000000187

Applied Behavior Analysis


Two Ways to Find Single Subject Research Design (SSRD) Articles

Below are two approaches: finding SSRD articles via the browsing method, and finding SSRD articles via the searching method.


Types of Single Subject Research Design

 Types of SSRDs to look for as you skim abstracts:

  • reversal design
  • withdrawal design
  • ABAB design
  • A-B-A-B design
  • A-B-C design
  • A-B-A design
  • multiple baseline
  • alternating treatments design
  • multi-element design
  • changing criterion design
  • single case design
  • single subject design
  • single case series

Behavior analysts recognize the advantages of single-subject design for establishing intervention efficacy.  Much of the research performed by behavior analysts will use SSRD methods.

When you need to find SSRD articles, there are two methods you can use.

The browsing method:

  • Click on a title from the list of ABA Journal Titles .
  • Scroll down on the resulting page to the View Online section.
  • Choose a link which includes the date range you're interested in.
  • Click on a link to an issue (date) you want to explore.
  • From the resulting Table of Contents, explore titles of interest, reading the abstract carefully for signs that the research was carried out using a SSRD.  (To help, look for the box on this page with a list of SSRD types.)

The searching method: search in APA PsycInfo (BSU credentials required for log in).

  • Description: PsycInfo is a key database in the field of psychology. Includes information of use to psychologists, students, and professionals in related fields such as psychiatry, management, business, education, social science, neuroscience, law, medicine, and social work.
  • Time Period: 1887 to present
  • Sources: Indexes more than 2,500 journals
  • Subject Headings: Education, Mobile, Psychology, Social Sciences (Psychology)
  • Scholarly or Popular: Scholarly
  • Primary Materials: Journal Articles
  • Information Included: Abstracts, Citations, Linked Full Text
  • FindIt@BALL STATE: Yes
  • Print Equivalent: None
  • Publisher: American Psychological Association
  • Updates: Monthly
  • Number of Simultaneous Users: Unlimited


First, go to APA PsycInfo.

Second, copy and paste this set of terms describing different types of SSRDs into an APA PsycInfo search box, and choose "Abstract" in the drop-down menu.

[Screenshot: the field drop-down menu set to "AB Abstract"]

Third, copy and paste this list of ABA journals into another search box in APA PsycInfo, and choose "SO Publication Name" in the drop-down menu.

[Screenshot: the field drop-down menu set to "SO Publication Name"]

Fourth, type in some keywords in another APA PsycInfo search box (or two) describing what you're researching. Use OR and add synonyms or related words for the best results.

Hit SEARCH, and see what kind of results you get!

Here's an example of a search for SSRDs in ABA journals on the topic of fitness:

[Screenshot: APA PsycInfo search with three boxes. First box: "reversal design" OR "withdrawal design" etc.; second box: "Analysis of Verbal Behavior" OR "Behavior Analyst" OR etc.; third box: exercise OR physical activity OR fitness]

Note that the long list of terms in the top two boxes gets cut off in the screenshot, but they're all there!

The reason this works:

  • To find SSRD articles, we can't just search on the phrase "single subject research" because many studies which use SSRD do not include that phrase anywhere in the text of the article; instead such articles typically specify in the abstract (and "Methods" section) what type of SSRD method was used (ex. withdrawal design, multiple baseline, or ABA design).  That's why we string together all the possible descriptions of SSRD types with the word OR in between -- it enables us to search for any sort of SSRD, regardless of how it's described.  Choosing "Abstract" in the drop-down menu ensures that we're focusing on these terms being used in the abstract field (not just popping up in discussion in the full-text).
  • To search specifically for studies carried out in the field of Applied Behavior Analysis, we enter in the titles of the ABA journals, strung together, with OR in between.  The quotation marks ensure each title is searched as a phrase.  Choosing "SO Publication Name" in the drop-down menu ensures that results will be from articles published in those journals (not just references to those journals).
  • To limit the search to a topic we're interested in, we type in some keywords in another search box.  The more synonyms you can think of, the better; that ensures you'll have a decent pool of records to look through, including authors who may have described your topic differently.
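If you keep these term lists in a plain file, a tiny script can assemble the quoted, OR-separated string for pasting into the database. This is purely a convenience sketch; the field codes and the full term lists shown above are what matter:

    # Build a quoted, OR-separated search string from a list of design terms
    # (shortened here; use the full list from the box above).
    design_terms = ["reversal design", "withdrawal design", "ABAB design",
                    "multiple baseline", "alternating treatments design",
                    "changing criterion design", "single case design"]
    search_string = " OR ".join(f'"{term}"' for term in design_terms)
    print(search_string)
    # "reversal design" OR "withdrawal design" OR "ABAB design" OR ...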

Search ideas:

To limit your search to just the top ABA journals, you can use this shorter list in place of the long one above:

"Behavior Analysis in Practice" OR "Journal of Applied Behavior Analysis" OR "Journal of Behavioral Education" OR "Journal of Developmental and Physical Disabilities" OR "Journal of the Experimental Analysis of Behavior"

To get more specific, topic-wise, add another search box with another term (or set of terms), like in this example:

[Screenshot: four search boxes in PsycInfo, the same as above but with a fourth box: autism OR "developmental disorders"]

To search more broadly and include other psychology studies outside of ABA journals, simply remove the list of journal titles from the search, as shown here:

[Screenshot: the same search in PsycInfo without the list of journal titles]



10.2 Single-Subject Research Designs

Learning Objectives

  • Describe the basic elements of a single-subject research design.
  • Design simple single-subject studies using reversal and multiple-baseline designs.
  • Explain how single-subject research designs address the issue of internal validity.
  • Interpret the results of simple single-subject studies based on the visual inspection of graphed data.

General Features of Single-Subject Designs

Before looking at any specific single-subject research designs, it will be helpful to consider some features that are common to most of them. Many of these features are illustrated in Figure 10.1, which shows the results of a generic single-subject study. First, the dependent variable (represented on the  y -axis of the graph) is measured repeatedly over time (represented by the  x -axis) at regular intervals. Second, the study is divided into distinct phases, and the participant is tested under one condition per phase. The conditions are often designated by capital letters: A, B, C, and so on. Thus Figure 10.1 represents a design in which the participant was tested first in one condition (A), then tested in another condition (B), and finally retested in the original condition (A). (This is called a reversal design and will be discussed in more detail shortly.)


Figure 10.1 Results of a Generic Single-Subject Study Illustrating Several Principles of Single-Subject Research

Another important aspect of single-subject research is that the change from one condition to the next does not usually occur after a fixed amount of time or number of observations. Instead, it depends on the participant’s behavior. Specifically, the researcher waits until the participant’s behavior in one condition becomes fairly consistent from observation to observation before changing conditions. This is sometimes referred to as the steady state strategy  (Sidman, 1960) [1] . The idea is that when the dependent variable has reached a steady state, then any change across conditions will be relatively easy to detect. Recall that we encountered this same principle when discussing experimental research more generally. The effect of an independent variable is easier to detect when the “noise” in the data is minimized.
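One simple way to operationalize such a stability check is to require the most recent observations to fall within a narrow band before changing conditions. The window size and tolerance below are arbitrary illustrations, not standards from the text:

    def is_steady(observations, window=5, tolerance=0.10):
        """Return True when the last `window` observations vary by no more than
        `tolerance` times their mean (an illustrative stability rule)."""
        recent = observations[-window:]
        if len(recent) < window:
            return False
        center = sum(recent) / window
        spread = max(recent) - min(recent)
        return spread <= tolerance * abs(center) if center else spread == 0

    print(is_steady([12, 30, 22, 25, 24, 25, 26, 25]))  # True: the last five points cluster tightly
    print(is_steady([3, 9, 4, 8, 2, 10, 3, 9]))         # False: responding is still highly variable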

Reversal Designs

The most basic single-subject research design is the  reversal design , also called the  ABA design . During the first phase, A, a  baseline  is established for the dependent variable. This is the level of responding before any treatment is introduced, and therefore the baseline phase is a kind of control condition. When steady state responding is reached, phase B begins as the researcher introduces the treatment. There may be a period of adjustment to the treatment during which the behavior of interest becomes more variable and begins to increase or decrease. Again, the researcher waits until that dependent variable reaches a steady state so that it is clear whether and how much it has changed. Finally, the researcher removes the treatment and again waits until the dependent variable reaches a steady state. This basic reversal design can also be extended with the reintroduction of the treatment (ABAB), another return to baseline (ABABA), and so on.

The study by Hall and his colleagues employed an ABAB reversal design. Figure 10.2 approximates the data for Robbie. The percentage of time he spent studying (the dependent variable) was low during the first baseline phase, increased during the first treatment phase until it leveled off, decreased during the second baseline phase, and again increased during the second treatment phase.


Figure 10.2 An Approximation of the Results for Hall and Colleagues’ Participant Robbie in Their ABAB Reversal Design

Why is the reversal—the removal of the treatment—considered to be necessary in this type of design? Why use an ABA design, for example, rather than a simpler AB design? Notice that an AB design is essentially an interrupted time-series design applied to an individual participant. Recall that one problem with that design is that if the dependent variable changes after the treatment is introduced, it is not always clear that the treatment was responsible for the change. It is possible that something else changed at around the same time and that this extraneous variable is responsible for the change in the dependent variable. But if the dependent variable changes with the introduction of the treatment and then changes  back  with the removal of the treatment (assuming that the treatment does not create a permanent effect), it is much clearer that the treatment (and removal of the treatment) is the cause. In other words, the reversal greatly increases the internal validity of the study.

There are close relatives of the basic reversal design that allow for the evaluation of more than one treatment. In a  multiple-treatment reversal design , a baseline phase is followed by separate phases in which different treatments are introduced. For example, a researcher might establish a baseline of studying behavior for a disruptive student (A), then introduce a treatment involving positive attention from the teacher (B), and then switch to a treatment involving mild punishment for not studying (C). The participant could then be returned to a baseline phase before reintroducing each treatment—perhaps in the reverse order as a way of controlling for carryover effects. This particular multiple-treatment reversal design could also be referred to as an ABCACB design.

In an  alternating treatments design , two or more treatments are alternated relatively quickly on a regular schedule. For example, positive attention for studying could be used one day and mild punishment for not studying the next, and so on. Or one treatment could be implemented in the morning and another in the afternoon. The alternating treatments design can be a quick and effective way of comparing treatments, but only when the treatments are fast acting.

Multiple-Baseline Designs

There are two potential problems with the reversal design—both of which have to do with the removal of the treatment. One is that if a treatment is working, it may be unethical to remove it. For example, if a treatment seemed to reduce the incidence of self-injury in a child with an intellectual delay, it would be unethical to remove that treatment just to show that the incidence of self-injury increases. The second problem is that the dependent variable may not return to baseline when the treatment is removed. For example, when positive attention for studying is removed, a student might continue to study at an increased rate. This could mean that the positive attention had a lasting effect on the student’s studying, which of course would be good. But it could also mean that the positive attention was not really the cause of the increased studying in the first place. Perhaps something else happened at about the same time as the treatment—for example, the student’s parents might have started rewarding him for good grades. One solution to these problems is to use a  multiple-baseline design , which is represented in Figure 10.3. There are three different types of multiple-baseline designs which we will now consider.

Multiple-Baseline Design Across Participants

In one version of the design, a baseline is established for each of several participants, and the treatment is then introduced for each one. In essence, each participant is tested in an AB design. The key to this design is that the treatment is introduced at a different  time  for each participant. The idea is that if the dependent variable changes when the treatment is introduced for one participant, it might be a coincidence. But if the dependent variable changes when the treatment is introduced for multiple participants—especially when the treatment is introduced at different times for the different participants—then it is unlikely to be a coincidence.


Figure 10.3 Results of a Generic Multiple-Baseline Study. The multiple baselines can be for different participants, dependent variables, or settings. The treatment is introduced at a different time on each baseline.

As an example, consider a study by Scott Ross and Robert Horner (Ross & Horner, 2009) [2] . They were interested in how a school-wide bullying prevention program affected the bullying behavior of particular problem students. At each of three different schools, the researchers studied two students who had regularly engaged in bullying. During the baseline phase, they observed the students for 10-minute periods each day during lunch recess and counted the number of aggressive behaviors they exhibited toward their peers. After 2 weeks, they implemented the program at one school. After 2 more weeks, they implemented it at the second school. And after 2 more weeks, they implemented it at the third school. They found that the number of aggressive behaviors exhibited by each student dropped shortly after the program was implemented at his or her school. Notice that if the researchers had only studied one school or if they had introduced the treatment at the same time at all three schools, then it would be unclear whether the reduction in aggressive behaviors was due to the bullying program or something else that happened at about the same time it was introduced (e.g., a holiday, a television program, a change in the weather). But with their multiple-baseline design, this kind of coincidence would have to happen three separate times—a very unlikely occurrence—to explain their results.

Multiple-Baseline Design Across Behaviors

In another version of the multiple-baseline design, multiple baselines are established for the same participant but for different dependent variables, and the treatment is introduced at a different time for each dependent variable. Imagine, for example, a study on the effect of setting clear goals on the productivity of an office worker who has two primary tasks: making sales calls and writing reports. Baselines for both tasks could be established. For example, the researcher could measure the number of sales calls made and reports written by the worker each week for several weeks. Then the goal-setting treatment could be introduced for one of these tasks, and at a later time the same treatment could be introduced for the other task. The logic is the same as before. If productivity increases on one task after the treatment is introduced, it is unclear whether the treatment caused the increase. But if productivity increases on both tasks after the treatment is introduced—especially when the treatment is introduced at two different times—then it seems much clearer that the treatment was responsible.

Multiple-Baseline Design Across Settings

In yet a third version of the multiple-baseline design, multiple baselines are established for the same participant but in different settings. For example, a baseline might be established for the amount of time a child spends reading during his free time at school and during his free time at home. Then a treatment such as positive attention might be introduced first at school and later at home. Again, if the dependent variable changes after the treatment is introduced in each setting, then this gives the researcher confidence that the treatment is, in fact, responsible for the change.

Data Analysis in Single-Subject Research

In addition to its focus on individual participants, single-subject research differs from group research in the way the data are typically analyzed. As we have seen throughout the book, group research involves combining data across participants. Group data are described using statistics such as means, standard deviations, correlation coefficients, and so on to detect general patterns. Finally, inferential statistics are used to help decide whether the result for the sample is likely to generalize to the population. Single-subject research, by contrast, relies heavily on a very different approach called  visual inspection . This means plotting individual participants’ data as shown throughout this chapter, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable. Inferential statistics are typically not used.

In visually inspecting their data, single-subject researchers take several factors into account. One of them is changes in the  level  of the dependent variable from condition to condition. If the dependent variable is much higher or much lower in one condition than another, this suggests that the treatment had an effect. A second factor is  trend , which refers to gradual increases or decreases in the dependent variable across observations. If the dependent variable begins increasing or decreasing with a change in conditions, then again this suggests that the treatment had an effect. It can be especially telling when a trend changes directions—for example, when an unwanted behavior is increasing during baseline but then begins to decrease with the introduction of the treatment. A third factor is  latency , which is the time it takes for the dependent variable to begin changing after a change in conditions. In general, if a change in the dependent variable begins shortly after a change in conditions, this suggests that the treatment was responsible.
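A rough numerical companion to visual inspection, assuming a straight-line fit is an acceptable summary of trend, might compute the change in level and the within-phase slopes like this (latency is still judged from the graph):

    import numpy as np

    def level_and_trend(baseline, treatment):
        """Change in level (difference in phase means) and within-phase linear
        trends (least-squares slopes); an aid to, not a substitute for,
        visual inspection."""
        baseline = np.asarray(baseline, dtype=float)
        treatment = np.asarray(treatment, dtype=float)
        slope = lambda y: np.polyfit(np.arange(len(y)), y, 1)[0]
        return {"level_change": treatment.mean() - baseline.mean(),
                "baseline_trend": slope(baseline),
                "treatment_trend": slope(treatment)}

    print(level_and_trend([4, 5, 4, 5, 4], [9, 10, 11, 12, 12]))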

In the top panel of Figure 10.4, there are fairly obvious changes in the level and trend of the dependent variable from condition to condition. Furthermore, the latencies of these changes are short; the change happens immediately. This pattern of results strongly suggests that the treatment was responsible for the changes in the dependent variable. In the bottom panel of Figure 10.4, however, the changes in level are fairly small. And although there appears to be an increasing trend in the treatment condition, it looks as though it might be a continuation of a trend that had already begun during baseline. This pattern of results strongly suggests that the treatment was not responsible for any changes in the dependent variable—at least not to the extent that single-subject researchers typically hope to see.


Figure 10.4 Results of a Generic Single-Subject Study Illustrating Level, Trend, and Latency. Visual inspection of the data suggests an effective treatment in the top panel but an ineffective treatment in the bottom panel.

The results of single-subject research can also be analyzed using statistical procedures—and this is becoming more common. There are many different approaches, and single-subject researchers continue to debate which are the most useful. One approach parallels what is typically done in group research. The mean and standard deviation of each participant’s responses under each condition are computed and compared, and inferential statistical tests such as the  t  test or analysis of variance are applied (Fisch, 2001) [3] . (Note that averaging  across  participants is less common.) Another approach is to compute the  percentage of non-overlapping data  (PND) for each participant (Scruggs & Mastropieri, 2001) [4] . This is the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition. In the study of Hall and his colleagues, for example, all measures of Robbie’s study time in the first treatment condition were greater than the highest measure in the first baseline, for a PND of 100%. The greater the percentage of non-overlapping data, the stronger the treatment effect. Still, formal statistical approaches to data analysis in single-subject research are generally considered a supplement to visual inspection, not a replacement for it.
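A small sketch of the PND calculation described above, assuming higher scores indicate improvement (flip the comparison when lower scores are the goal):

    def percentage_of_nonoverlapping_data(baseline, treatment, higher_is_better=True):
        """Percentage of treatment-phase points more extreme than the most
        extreme baseline point (Scruggs & Mastropieri, 2001)."""
        if higher_is_better:
            extreme = max(baseline)
            count = sum(1 for x in treatment if x > extreme)
        else:
            extreme = min(baseline)
            count = sum(1 for x in treatment if x < extreme)
        return 100.0 * count / len(treatment)

    # Every treatment point exceeds the highest baseline point, so PND = 100%,
    # as in the Robbie example described above.
    print(percentage_of_nonoverlapping_data([20, 25, 30, 25], [45, 55, 60, 70]))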

Key Takeaways

  • Single-subject research designs typically involve measuring the dependent variable repeatedly over time and changing conditions (e.g., from baseline to treatment) when the dependent variable has reached a steady state. This approach allows the researcher to see whether changes in the independent variable are causing changes in the dependent variable.
  • In a reversal design, the participant is tested in a baseline condition, then tested in a treatment condition, and then returned to baseline. If the dependent variable changes with the introduction of the treatment and then changes back with the return to baseline, this provides strong evidence of a treatment effect.
  • In a multiple-baseline design, baselines are established for different participants, different dependent variables, or different settings—and the treatment is introduced at a different time on each baseline. If the introduction of the treatment is followed by a change in the dependent variable on each baseline, this provides strong evidence of a treatment effect.
  • Single-subject researchers typically analyze their data by graphing them and making judgments about whether the independent variable is affecting the dependent variable based on level, trend, and latency.
Exercises

  • Practice: Design a simple single-subject study to answer one of the following questions:
  • Does positive attention from a parent increase a child’s tooth-brushing behavior?
  • Does self-testing while studying improve a student’s performance on weekly spelling tests?
  • Does regular exercise help relieve depression?
  • Practice: Create a graph that displays the hypothetical results for the study you designed in Exercise 1. Write a paragraph in which you describe what the results show. Be sure to comment on level, trend, and latency.
  • [1] Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. Boston, MA: Authors Cooperative.
  • [2] Ross, S. W., & Horner, R. H. (2009). Bully prevention in positive behavior support. Journal of Applied Behavior Analysis, 42, 747–759.
  • [3] Fisch, G. S. (2001). Evaluating data from behavioral analysis: Visual inspection or statistical models. Behavioral Processes, 54, 137–154.
  • [4] Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227–244.


Randomized single-case AB phase designs: Prospects and pitfalls

  • Published: 18 July 2018
  • Volume 51, pages 2454–2476 (2019)


  • Bart Michiels
  • Patrick Onghena


Single-case experimental designs (SCEDs) are increasingly used in fields such as clinical psychology and educational psychology for the evaluation of treatments and interventions in individual participants. The AB phase design , also known as the interrupted time series design , is one of the most basic SCEDs used in practice. Randomization can be included in this design by randomly determining the start point of the intervention. In this article, we first introduce this randomized AB phase design and review its advantages and disadvantages. Second, we present some data-analytical possibilities and pitfalls related to this design and show how the use of randomization tests can mitigate or remedy some of these pitfalls. Third, we demonstrate that the Type I error of randomization tests in randomized AB phase designs is under control in the presence of unexpected linear trends in the data. Fourth, we report the results of a simulation study investigating the effect of unexpected linear trends on the power of the randomization test in randomized AB phase designs. The implications of these results for the analysis of randomized AB phase designs are discussed. We conclude that randomized AB phase designs are experimentally valid, but that the power of these designs is sufficient only for large treatment effects and large sample sizes. For small treatment effects and small sample sizes, researchers should turn to more complex phase designs, such as randomized ABAB phase designs or randomized multiple-baseline designs.


Introduction

Single-case experimental designs (SCEDs) can be used to evaluate treatment effects for specific individuals or to assess the efficacy of individualized treatments. In such designs, repeated observations are recorded for a single person on a dependent variable of interest, and the treatment can be considered as one of the levels of the independent variable (Barlow, Nock, & Hersen, 2009 ; Kazdin, 2011 ; Onghena, 2005 ). SCEDs are widely used as a methodological tool in various domains of science, including clinical psychology, school psychology, special education, and medicine (Alnahdi, 2015 ; Chambless & Ollendick, 2001 ; Gabler, Duan, Vohra, & Kravitz, 2011 ; Hammond & Gast, 2010 ; Kratochwill & Stoiber, 2000 ; Leong, Carter, & Stephenson, 2015 ; Shadish & Sullivan, 2011 ; Smith, 2012 ; Swaminathan & Rogers, 2007 ). The growing interest in these types of designs can be inferred from the recent publication of guidelines for reporting the results of SCEDs in various fields of the educational, behavioral, and health sciences (Shamseer et al., 2015 ; Tate et al., 2016 ; Vohra et al., 2015 ).

SCEDs are often confused with case studies or other nonexperimental research, but these types of studies should be clearly distinguished from each other (Onghena & Edgington, 2005 ). More specifically, SCEDs involve the deliberate manipulation of an independent variable, whereas such a manipulation is absent in nonexperimental case studies. In addition, the reporting of results from SCEDs usually involves visual and statistical analyses, whereas case studies are often reported in a narrative way.

SCEDs should also be distinguished from experimental designs that are based on comparing groups. The principal difference between SCEDs and between-subjects experimental designs concerns the definition of the experimental units. Whereas the experimental units in group-comparison studies refer to participants assigned to different groups, the experimental units in SCEDs refer to repeated measurements of specific entities under investigation (e.g., a person) that are assigned to different treatments (Edgington & Onghena, 2007 ). Various types of SCEDs exist. In the following section we will discuss the typology of single-case designs.

Typology of single-case experimental designs

A comprehensive typology of SCEDs can be constructed using three dimensions: (1) whether the design is a phase or an alternation design, (2) whether or not the design contains random assignment, and (3) whether or not the design is replicated. We will discuss each of these dimensions in turn.

Design type

Various types of SCEDs can be broadly categorized into two main types: phase designs and alternation designs (Heyvaert & Onghena, 2014 ; Onghena & Edgington, 2005 ; Rvachew & Matthews, 2017 ), although hybrids of both types are possible (see, e.g., Levin, Ferron, & Gafurov, 2014 ; Onghena, Vlaeyen, & de Jong, 2007 ). Phase designs divide the sequence of measurement occasions in a single-case experiment (SCE) into separate treatment phases, with each phase containing multiple measurements (Edgington, 1975a , 1980 ; Onghena, 1992 ). The basic building block of phase designs is the AB phase design that features a succession of a baseline phase (A) and a treatment phase (B). This basic design can be expanded by including more A phases or B phases leading to more complex phase designs such as ABA and ABAB phase designs. Furthermore, it is also possible to construct phase designs that compare more than two treatments (e.g., an ABC design). In contrast to phase designs, alternation designs do not feature distinct phases but rather involve rapid alternation of the experimental conditions throughout the course of the SCE. Consequently, these designs are intended for research situations in which rapid and frequent alternation of treatments is possible (Barlow & Hayes, 1979 ; Onghena & Edgington, 1994 ). Some common alternation designs include the completely randomized design (CRD), the randomized block design (RBD), and the alternating treatments design (ATD, Onghena, 2005 ). Manolov and Onghena ( 2017 ) provide a recent overview of the use of ATDs in published single-case research and discuss various data-analytical techniques for this type of design.

Random assignment

When treatment labels are randomly assigned to measurement occasions in an SCED, one obtains a randomized SCED. This procedure of random assignment in an SCED is similar to the way in which subjects are randomly assigned to experimental groups in a between-subjects design. The main difference is that in SCEDs repeated measurement occasions for one subject are randomized across two or more experimental conditions, whereas in between-subjects designs individual participants are randomized across two or more experimental groups. The way in which SCEDs can be randomized depends on the type of design. Phase designs can be randomized by listing all possible intervention start points and then randomly selecting one of them for conducting the actual experiment (Edgington, 1975a). Consider, for example, an AB design, consisting of a baseline (A) phase and a treatment (B) phase, with a total of ten measurement occasions and a minimum of three measurement occasions per phase. For this design there are five possible start points for the intervention (the treatment phase can begin at occasion 4, 5, 6, 7, or 8), leading to the following divisions of the measurement occasions:

AAABBBBBBB, AAAABBBBBB, AAAAABBBBB, AAAAAABBBB, AAAAAAABBB

This type of randomization can also be applied to more complex phase designs, such as ABA or ABAB phase designs, by randomly selecting time points for all the moments of phase change in the design (Onghena, 1992). Alternation designs are randomized by imposing a randomization scheme on the set of measurement occasions, in which the treatment conditions are able to alternate throughout the experiment. The CRD is the simplest alternation design, as it features "unrestricted randomization." In this design, only the number of measurement occasions for each level of the independent variable has to be fixed. For example, if we consider a hypothetical SCED with two conditions (A and B) and three measurement occasions per condition, there are 20 possible randomizations (the binomial coefficient "6 choose 3") using a CRD:

AAABBB, BBBAAA, AABABB, BBABAA, AABBAB, BBAABA, AABBBA, BBAAAB, ABAABB, BABBAA,
ABABAB, BABABA, ABABBA, BABAAB, ABBAAB, BAABBA, ABBABA, BAABAB, ABBBAA, BAAABB

The randomization schemes for an RBD or an ATD can be constructed by imposing additional constraints on the CRD randomization scheme. For example, an RBD is obtained by grouping measurement occasions in pairs and randomizing the treatment order within each pair. For the same number of measurement occasions as in the example above, an RBD yields 2^3 = 8 possible randomizations, which are a subset of the CRD randomizations:

ABABAB, BABABA, ABABBA, BABAAB, ABBAAB, BAABBA, ABBABA, BAABAB

This type of randomization can be useful to counter the effect of time-related confounding variables on the dependent variable, as the randomization within pairs (or blocks of a certain size) eliminates any time-related effects that might occur within these pairs. An ATD randomization scheme can be constructed from a CRD randomization scheme with the restriction that only a certain maximum number of successive measurement occasions are allowed to have the same treatment, which ensures rapid treatment alternation. Using the example of our hypothetical SCED, an ATD with a maximum of two consecutive administrations of the same condition yields the following 14 randomizations:

AABABB, BBABAA, AABBAB, BBAABA, ABAABB, BABBAA, ABABAB, BABABA, ABABBA, BABAAB, ABBAAB, BAABBA, ABBABA, BAABAB
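As an aside (this is not code from the article), the three randomization schemes can be enumerated programmatically; the sketch below reproduces the counts of 20, 8, and 14 for the six-occasion example:

    from itertools import permutations

    # Enumerate randomization schemes for six occasions with three A's and three B's.
    crd = sorted({"".join(p) for p in permutations("AAABBB")})   # completely randomized design

    def max_run(seq):
        longest = run = 1
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        return longest

    rbd = [s for s in crd if all(s[i] != s[i + 1] for i in range(0, len(s), 2))]  # blocks of two
    atd = [s for s in crd if max_run(s) <= 2]   # at most two consecutive identical conditions

    print(len(crd), len(rbd), len(atd))                  # 20 8 14
    print(set(rbd) <= set(crd), set(atd) <= set(crd))    # both are subsets of the CRD set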

Note again that all of these randomizations are a subset of the CRD randomizations. Many authors have emphasized the importance of randomizing SCEDs for making valid inferences (e.g., Dugard, 2014 ; Dugard, File, & Todman, 2012 ; Edgington & Onghena, 2007 ; Heyvaert, Wendt, Van den Noortgate, & Onghena, 2015 ; Kratochwill & Levin, 2010 ). The benefits and importance of incorporating random assignment in SCEDs are also stressed in recently developed guidelines for the reporting of SCE results, such as the CONSORT extension for reporting N -of-1 trials (Shamseer et al., 2015 ; Vohra et al., 2015 ) and the single-case reporting guideline in behavioral interventions statement (Tate et al., 2016 ). SCEDs that do not incorporate some form of random assignment are still experimental designs in the sense that they feature a deliberate manipulation of an independent variable, so they must still be distinguished from nonexperimental research such as case studies. That being said, the absence of random assignment in a SCED makes it harder to rule out alternative explanations for the occurrence of a treatment effect, thus weakening the internal validity of the design. In addition, it should be noted that the incorporation of randomization in SCEDs is still relatively rare in many domains of research.

Replication

It should be noted that research projects and single-case research publications rarely involve only one SCED, and that usually replication is aimed at. Kratochwill et al. ( 2010 ) noted that replication also increases the internal validity of an SCED. In this sense it is important to emphasize that randomization and replication should be used concurrently for increasing the internal validity of an SCED. Replication can occur in two different ways: simultaneously or sequentially (Onghena & Edgington, 2005 ). Simultaneous replication designs entail conducting multiple alternation or phase designs at the same time. The most widely used simultaneous replication design is the multiple baseline across participants design, which combines two or more phase designs (usually AB phase designs), in which the treatment is administered in a time-staggered manner across the individual participants (Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ). Sequential replication designs entail conducting individual SCEs sequentially in order to test the generalizability of the results to other participants, settings, or outcomes (Harris & Jenson, 1985 ; Mansell, 1982 ). Also for this part of the typology, it is possible to create hybrid designs by combining simultaneous and sequential features—for example, by sequentially replicating multiple-baseline across-participant designs or using a so-called “nonconcurrent multiple baseline design,” with only partial temporal overlap (Harvey, May, & Kennedy, 2004 ; Watson & Workman, 1981 ). Note that alternative SCED taxonomies have been proposed (e.g., Gast & Ledford, 2014 ). The focus of the present article is on the AB phase design, also known as the interrupted time series design (Campbell & Stanley, 1966 ; Cook & Campbell, 1979 ; Shadish, Cook, & Campbell, 2002 ).

The single-case AB phase design

The AB phase design is one of the most basic and practically feasible experimental designs for evaluating treatments in single-case research. Although widely used in practice, the AB phase design has received criticism for its low internal validity (Campbell, 1969 ; Cook & Campbell, 1979 ; Kratochwill et al., 2010 ; Shadish et al., 2002 ; Tate et al., 2016 ; Vohra et al., 2015 ). Several authors have rated the AB phase design as “quasi-experimental” or even “nonexperimental,” because the lack of a treatment reversal phase leaves the design vulnerable to the internal validity threats of history and maturation (Kratochwill et al., 2010 ; Tate et al., 2016 ; Vohra et al., 2015 ). History refers to the confounding influence of external factors on the treatment effect during the course of the experiment, whereas maturation refers to changes within the subject during the course of the experiment that may influence the treatment effect (Campbell & Stanley, 1966 ). These confounding effects can serve as alternative explanations for the occurrence of a treatment effect other than the experimental manipulation and as such threaten the internal validity of the SCED. Kratochwill et al. argue that the internal validity threats of history and maturation are mitigated when SCEDs contain at least two AB phase pair repetitions. More specifically, their argument is that the probability that history effects (e.g., the participant turns ill during the experiment) occurring simultaneously with the introduction of the treatment is smaller when there are multiple introductions of the treatment than in the situation in which there is only one introduction of the treatment. Similarly, to lessen the impact of potential maturation effects (e.g., spontaneous improvement of the participant yielding an upward or downward trend in the data) on the internal validity of the SCED, Kratochwill et al. argue that an SCED should be able to record at least three demonstrations of the treatment effect. For these reasons, they argue that only phase designs with at least two AB phase pair repetitions (e.g., an ABAB design) are valid SCEDs, and that designs with only one AB phase pair repetition (e.g., an AB phase design) are inadequate for drawing valid inferences. Similarly, Tate et al. and Vohra et al. do not consider the AB phase design as a valid SCED. More specifically, Tate et al. consider the AB phase design as a quasi-experimental design, and Vohra et al. even regard the AB phase design as a nonexperimental design, putting it under the same label as case studies. In contrast, the SCED classification by Logan, Hickman, Harris, and Heriza ( 2008 ) does include the AB phase design as a valid design.

Rather than using discrete classifications, we propose a gradual view of evaluating the internal validity of an SCED. In the remainder of this article we will argue that randomized AB phase designs have an important place in the methodological toolbox of the single-case researcher as valid SCEDs. It is our view that the randomized AB phase design can be used as a basic experimental design for situations in which this design is the only feasible way to collect experimental data (e.g., when evaluating treatments that cannot be reversed due to the nature of the treatment or because of ethical concerns). We will build up this argument in several steps. First, we will explain how random assignment strengthens the internal validity of AB phase designs as compared to AB phase designs without random assignment, and discuss how the internal validity of randomized AB phase designs can be increased further through the use of replication and formal statistical analysis. Second, after mentioning some common statistical techniques for analyzing randomized AB phase designs we will discuss the use of a statistical technique that can be directly derived from the random assignment that is present in randomized AB phase designs: the randomization test (RT). In addition we will discuss some potential data-analytical pitfalls that can occur when analyzing randomized AB phase designs and argue how the use of the RT can mitigate some of these pitfalls. Furthermore, we will provide a worked example of how AB phase designs can be randomized and subsequently analyzed with the RT using the randomization method proposed by Edgington ( 1975a ). Third, we will demonstrate the validity of the RT when analyzing randomized AB phase designs containing a specific manifestation of a maturation effect: An unexpected linear trend that occurs in the data yielding a gradual increase in the scores of the dependent variable that is unrelated to the administration of the treatment. More specifically we will show that the RT controls the Type I error rate when unexpected linear trends are present in the data. Finally, we will also present the results of a simulation study that investigated the power of the RT when analyzing randomized AB phase designs containing various combinations of unexpected linear trends in the baseline phase and/or treatment phase. Apart from controlled Type I error rates, adequate power is another criterion for the usability of the RT for specific types of datasets. Previous research already investigated the effect of different levels of autocorrelation on the power of the RT in randomized AB phase designs but only for data without trend (Ferron & Ware, 1995 ). However, a study by Solomon ( 2014 ) showed that trend is quite common in single-case research, making it important to investigate the implications of trend effects on the power of the RT.

Randomized AB phase designs are valid single-case experimental designs

There are several reasons why the use of randomized AB phase designs should be considered for conducting single-case research. First of all, the randomized AB phase design contains all the required elements to fit the definition of an SCED: a design that involves repeated measurements on a dependent variable and a deliberate experimental manipulation of an independent variable. Second, the randomized AB phase design is the most feasible single-case design for treatments that cannot be withdrawn for practical or ethical reasons, and it is also the most cost-efficient and the most easily implemented of all phase designs (Heyvaert et al., 2017). Third, if isolated randomized AB phase designs were dismissed as invalid, and if only a randomized AB phase design were feasible, given that many psychological and educational interventions cannot be reversed or undone, then practitioners would be discouraged from using an SCED altogether, and potentially important experimental evidence would never be collected.

We acknowledge that the internal validity threats of history and maturation have to be taken into account when drawing inferences from AB phase designs. Moreover, we agree with the view of Kratochwill et al. (2010) that designs with multiple AB phase pairs (e.g., an ABAB design) offer better protection from threats to internal validity than designs with only one AB phase pair (e.g., the AB phase design). However, we also argue that the internal validity of the basic AB phase design can be strengthened in several ways.

First, the internal validity of the AB phase design (as well as of other SCEDs) can be increased considerably by incorporating random assignment into the design (Heyvaert et al., 2015). Random assignment can neutralize potential history effects in SCEDs because the random assignment of measurement occasions to treatment conditions allows us to statistically control confounding variables that may manifest themselves throughout the experiment. In a similar vein, random assignment can also neutralize potential maturation effects, because any behavioral changes that might occur within the subject are unrelated to the random allocation of measurement occasions to treatment conditions (Edgington, 1996). Edgington (1975a) proposed a way to incorporate random assignment into the AB phase design. Because the phase sequence in an AB phase design is fixed, random assignment should respect this phase structure. Therefore, Edgington (1975a) proposed to randomize the start point of the treatment phase. In this approach the researcher initially specifies the total number of measurement occasions to be included in the design, along with the minimum number of measurement occasions to be included in each phase. This results in a range of potential start points for the treatment phase, from which the researcher randomly selects one to conduct the actual experiment. Randomizing the start point of the treatment phase makes it possible to evaluate the treatment effect for each of the hypothetical start points from the randomization process and to compare these hypothetical treatment effects to the observed treatment effect from the start point that was used for the actual experiment. Under the assumption that potential confounding effects such as history and maturation are constant across the various possible start points of the treatment phase, these effects become less plausible as alternative explanations when a statistically significant treatment effect is found. As such, incorporating random assignment into the AB phase design can provide a safeguard against threats to internal validity without the need for adding extra phases to the design. This method of randomizing start points in AB phase designs can easily be extended to more complex phase designs, such as ABA or ABAB designs, by generating random start points for each moment of phase change in the design (Levin et al., 2014; Onghena, 1992).
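To make the start-point randomization concrete, the following minimal sketch (our illustration, not code from the cited authors; the function names are hypothetical) enumerates the admissible start points for a given number of measurement occasions and a minimum phase length, and randomly selects one of them for the actual experiment.

```python
import random

def possible_start_points(n_total, min_per_phase):
    """All admissible start points (1-based measurement occasions) for the
    B phase, given that each phase must contain at least `min_per_phase`
    measurement occasions."""
    return list(range(min_per_phase + 1, n_total - min_per_phase + 2))

def randomize_start_point(n_total, min_per_phase):
    """Randomly select one admissible start point for the treatment phase."""
    return random.choice(possible_start_points(n_total, min_per_phase))

# Example: 26 measurement occasions and at least 3 occasions per phase
# yield start points 4 through 24, i.e., 21 possible assignments.
print(possible_start_points(26, 3))
print(randomize_start_point(26, 3))
```

For designs with more phase changes (e.g., ABA or ABAB designs), one would draw a random start point for each phase change, subject to the same minimum-length restriction.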

Second, the internal validity of randomized AB phase designs can be increased further by replications, and replicated randomized AB phase designs are acceptable by most standards (e.g., Kratochwill et al., 2010 ; Tate et al., 2016 ). When a treatment effect can be demonstrated across multiple replicated randomized AB phase designs, it lowers the probability that this treatment effect is caused by history or maturation effects rather than by the treatment itself. In fact, when multiple randomized AB phase designs are replicated across participants and the treatment is administered in a staggered manner across the participants, one obtains a multiple-baseline across-participant design, which is accepted as a valid SCED according to many standards (Kratochwill et al., 2010 ; Logan et al., 2008 ; Tate et al., 2016 ; Vohra et al., 2015 ).

Third, one can increase the chance of making valid inferences from randomized AB phase designs by analyzing them statistically with adequate statistical techniques. Many data-analytical techniques for single-case research focus mainly on analyzing randomized AB phase designs and strengthening the resulting inferences (e.g., interrupted time series analysis, Borckardt & Nash, 2014 ; Gottman & Glass, 1978 ; nonoverlap effect size measures, Parker, Vannest, & Davis, 2011 ; multilevel modeling, Van den Noortgate & Onghena, 2003 ). Furthermore, one can analyze the randomized AB phase design using a statistical test that is directly derived from the random assignment that is present in the design: the RT (Kratochwill & Levin, 2010 ; Onghena & Edgington, 2005 ).

Data analysis of randomized AB phase designs: techniques and pitfalls

Techniques for analyzing randomized AB phase designs can be broadly categorized into two groups: visual analysis and statistical analysis (Heyvaert et al., 2015). Visual analysis refers to inspecting the observed data for changes in level, phase overlap, variability, trend, immediacy of the effect, and consistency of data patterns across similar phases (Horner, Swaminathan, Sugai, & Smolkowski, 2012). The advantages of visual analysis are that it is quick and intuitive and that it requires little methodological knowledge. The main disadvantages are that small but systematic treatment effects are hard to detect (Kazdin, 2011) and that it is associated with low interrater agreement (e.g., Bobrovitz & Ottenbacher, 1998; Ximenes, Manolov, Solanas, & Quera, 2009). Although visual analysis remains widely used for analyzing randomized AB phase designs (Kazdin, 2011), there is a general consensus that it should be used together with supplementary statistical analyses to corroborate the results (Harrington & Velicer, 2015; Kratochwill et al., 2010).

Techniques for the statistical analysis of randomized AB phase designs can be divided into three groups: effect size calculation, statistical modeling, and statistical inference. Effect size (ES) calculation involves evaluating treatment ESs by calculating formal ES measures. One can discern proposals that are based on calculating standardized mean difference measures (e.g., Busk & Serlin, 1992 ; Hedges, Pustejovsky, & Shadish, 2012 ), proposals that are based on calculating overlap between phases (see Parker, Vannest, & Davis, 2011 , for an overview), proposals that are based on calculating standardized or unstandardized regression coefficients (e.g., Allison & Gorman, 1993 ; Solanas, Manolov, & Onghena, 2010 ; Van den Noortgate & Onghena, 2003 ), and proposals that are based on Bayesian methods (Rindskopf, Shadish, & Hedges, 2012 ; Swaminathan, Rogers, & Horner, 2014 ). Statistical modeling refers to constructing an adequate description of the data by fitting the data to a statistical model. Some proposed modeling techniques include interrupted time series analysis (Borckardt & Nash, 2014 ; Gottman & Glass, 1978 ), generalized mixed models (Shadish, Zuur, & Sullivan, 2014 ), multilevel modeling (Van den Noortgate & Onghena, 2003 ), Bayesian modeling techniques (Rindskopf, 2014 ; Swaminathan et al., 2014 ), and structural equation modeling (Shadish, Rindskopf, & Hedges, 2008 ).

Statistical inference refers to assessing the statistical significance of treatment effects through hypothesis testing or by constructing confidence intervals for the parameter estimates (Heyvaert et al., 2015 ; Michiels, Heyvaert, Meulders, & Onghena, 2017 ). On the one hand, inferential procedures can be divided into parametric and nonparametric procedures, and on the other hand, they can be divided into frequentist and Bayesian procedures. One possibility for analyzing randomized AB phase designs is to use parametric frequentist procedures, such as statistical tests and confidence intervals based on t and F distributions. The use of these procedures is often implicit in some of the previously mentioned data-analytical proposals, such as the regression-based approach of Allison and Gorman ( 1993 ) and the multilevel model approach of Van den Noortgate and Onghena ( 2003 ). However, it has been shown that data from randomized AB phase designs often violate the specific distributional assumptions made by these parametric procedures (Shadish & Sullivan, 2011 ; Solomon, 2014 ). As such, the validity of these parametric procedures is not guaranteed when they are applied to randomized AB phase designs. Bayesian inference can be either parametric or nonparametric, depending on the assumptions that are made for the prior and posterior distributions of the Bayesian model employed. De Vries and Morey ( 2013 ) provide an example of parametric Bayesian hypothesis testing for the analysis of randomized AB phase designs.

An example of a nonparametric frequentist procedure that has been proposed for the analysis of randomized AB phase designs is the RT (e.g., Bulté & Onghena, 2008 ; Edgington, 1967 ; Heyvaert & Onghena, 2014 ; Levin, Ferron, & Kratochwill, 2012 ; Onghena, 1992 ; Onghena & Edgington, 1994 , 2005 ). The RT can be used for statistical inference based on random assignment. More specifically, the test does not make specific distributional assumptions or an assumption of random sampling, but rather obtains its validity from the randomization that is present in the design. When measurement occasions are randomized to treatment conditions according to the employed randomization scheme, a statistical reference distribution for a test statistic S can be calculated. This reference distribution can be used for calculating nonparametric p values or for constructing nonparametric confidence intervals for S by inverting the RT (Michiels et al., 2017 ). The RT is also flexible with regard to the choice of the test statistic (Ferron & Sentovich, 2002 ; Onghena, 1992 ; Onghena & Edgington, 2005 ). For example, it is possible to use an ES measure based on standardized mean differences as the test statistic in the RT (Michiels & Onghena, 2018 ), but also ES measures based on data nonoverlap (Heyvaert & Onghena, 2014 ; Michiels, Heyvaert, & Onghena, 2018 ). This freedom to devise a test statistic that fits the research question makes the RT a versatile statistical tool for various research settings and treatment effects (e.g., with mean level differences, trends, or changes in variability; Dugard, 2014 ).

When using inferential statistical techniques for randomized AB phase designs, single-case researchers can encounter various pitfalls with respect to reaching valid conclusions about the efficacy of a treatment. A first potential pitfall is that single-case data often violate the distributional assumptions of parametric hypothesis tests (Solomon, 2014). When distributional assumptions are violated, parametric tests might inflate or deflate the probability of Type I errors in comparison to the nominal significance level of the test. The use of RTs can provide a safeguard against this pitfall: Rather than invoking distributional assumptions, the RT procedure derives a reference distribution from the observed data. Furthermore, an RT is exactly valid by construction: It can be shown that the probability of committing a Type I error using the RT is never larger than the significance level α, regardless of the number of measurement occasions or the distributional properties of the data (Edgington & Onghena, 2007; Keller, 2012). A second pitfall is the presence of serial dependencies in the data (Shadish & Sullivan, 2011; Solomon, 2014). Serial dependencies can lead to inaccurate variance estimates in parametric hypothesis tests, which in turn can result in tests that are either too liberal or too conservative. The use of RTs can also provide a solution for this pitfall: Although the presence of serial dependencies does affect the power of the RT (Ferron & Onghena, 1996; Ferron & Sentovich, 2002; Levin et al., 2014; Levin et al., 2012), the Type I error rate of the RT will always be controlled at the nominal level, because the serial dependency is identical for each element of the reference distribution (Keller, 2012). A third pitfall when analyzing randomized AB phase designs is that these designs typically employ a small number of measurement occasions (Shadish & Sullivan, 2011), so statistical power is an issue. A fourth pitfall in analyzing single-case data is the presence of an unexpected data trend (Solomon, 2014). One way that unexpected data trends can occur is through maturation effects (e.g., a gradual reduction in the pain scores of a patient due to a desensitization effect). In a subsequent section of this article, we will show that the RT does not inflate the probability of a Type I error above the nominal level for data containing general linear trends, and thus it also mitigates this pitfall.

Analyzing randomized AB phase designs with randomization tests: a hypothetical example

For illustrative purposes, we will discuss the steps involved in constructing a randomized AB phase design and analyzing the results with an RT by means of a hypothetical example. In a first step, the researcher chooses the number of measurement occasions to be included in the design and the minimum number of measurement occasions to be included in each separate phase. For this illustration we will use the hypothetical example of a researcher planning to conduct a randomized AB phase design with 26 measurement occasions and a minimum of three measurement occasions in each phase. In a second step, the design is randomized using the start point randomization proposed by Edgington (1975a). This procedure results in a range of potential start points for the treatment throughout the course of the SCE. Each individual start point gives rise to a unique division of measurement occasions into baseline and treatment occasions (we will refer to each such division as an assignment). The possible assignments for this particular experiment can be obtained by placing the start point at each of the measurement occasions, respecting the restriction of at least three measurement occasions in each phase. There are 21 possible assignments, given this restriction (not all assignments are listed):

AAABBBBBBBBBBBBBBBBBBBBBBB

AAAABBBBBBBBBBBBBBBBBBBBBB

AAAAABBBBBBBBBBBBBBBBBBBBB

AAAAAAAAAAAAAAAAAAAAABBBBB

AAAAAAAAAAAAAAAAAAAAAABBBB

AAAAAAAAAAAAAAAAAAAAAAABBB

Suppose that the researcher randomly selects the assignment with the 13th measurement occasion as the start point of the B phase for the actual experiment: AAAAAAAAAAAABBBBBBBBBBBBBB. In a third step, the researcher chooses a test statistic that will be used to quantify the treatment effect. In this example, we will use the absolute difference between the baseline phase mean and the treatment phase mean as a test statistic. In a fourth step, the actual experiment with the randomly selected start point is conducted, and the data are recorded. Suppose that the recorded data of the experiment are 0, 2, 2, 3, 1, 3, 3, 2, 2, 2, 2, 2, 6, 7, 5, 8, 5, 6, 5, 7, 4, 6, 8, 5, 6, and 7. Figure 1 displays these hypothetical data graphically. In a fifth step, the researcher calculates the randomization distribution, which consists of the value of the test statistic for each of the possible assignments. The randomization distribution for the present example consists of 21 values (not all values are listed; the observed value is marked in bold):

AAABBBBBBBBBBBBBBBBBBBBBBB    3.23

AAAABBBBBBBBBBBBBBBBBBBBBB    2.89

. . .

AAAAAAAAAAAAAAAAAAAAAABBBB    2.73

AAAAAAAAAAAAAAAAAAAAAAABBB    2.04

Figure 1. Data from a hypothetical AB design

In a final step, the researcher calculates a two-sided p value for the observed test statistic by determining the proportion of test statistics in the randomization distribution that are at least as extreme as the observed test statistic. In this example, the observed test statistic is the most extreme value in the randomization distribution. Consequently, the p value is 1/21, or .0476. This p value can be interpreted as the probability of observing the data (or even more extreme data) under the null hypothesis that the outcome is unrelated to the levels of the independent variable. Note that calculating a two-sided p value is preferable if the treatment effect can go in either direction. Alternatively, the randomization test can be inverted in order to obtain a nonparametric confidence interval for the observed treatment effect (Michiels et al., 2017). The benefit of confidence intervals over p values is that they convey the same information, with the added advantage of providing a range of “plausible values” for the test statistic in question (du Prel, Hommel, Röhrig, & Blettner, 2009).
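The worked example can be reproduced with a short script. The sketch below (ours; variable and function names are illustrative) computes the absolute mean difference for each of the 21 admissible assignments and derives the p value for the observed start point at the 13th measurement occasion.

```python
import numpy as np

data = np.array([0, 2, 2, 3, 1, 3, 3, 2, 2, 2, 2, 2,
                 6, 7, 5, 8, 5, 6, 5, 7, 4, 6, 8, 5, 6, 7], dtype=float)
observed_start = 13          # first B phase occasion actually used (1-based)
min_per_phase = 3

def abs_mean_diff(scores, start):
    """Absolute difference between the B phase mean and the A phase mean,
    with the B phase starting at measurement occasion `start` (1-based)."""
    a, b = scores[:start - 1], scores[start - 1:]
    return abs(b.mean() - a.mean())

starts = range(min_per_phase + 1, len(data) - min_per_phase + 2)   # 4..24
randomization_distribution = np.array([abs_mean_diff(data, s) for s in starts])
observed = abs_mean_diff(data, observed_start)

# Two-sided p value: proportion of assignments with a test statistic
# at least as extreme as the observed one.
p_value = np.mean(randomization_distribution >= observed)
print(round(observed, 2), round(p_value, 4))   # MD ≈ 4.07, p = 1/21 ≈ .0476
```

Because the observed start point is itself one of the 21 admissible assignments, the smallest attainable p value in this particular design is 1/21 ≈ .048.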

The Type I error of the randomization test for randomized AB phase designs in the presence of unexpected linear trend

One way in which a maturation effect can manifest itself in an SCED is through a linear trend in the data. Such a linear trend could be the result of a sensitization or desensitization effect that occurs in the participant, yielding an unexpected upward or downward trend throughout the SCE that is totally unrelated to the experimental manipulation of the design. The presence of such an unexpected data trend can seriously diminish the power of hypothesis tests in which the null and alternative hypotheses are formulated in terms of differences in mean level between phases, to the point that they become useless. A convenient property of the start point randomization of the randomized AB phase design in conjunction with the RT analysis is that the RT offers nominal Type I error rate protection for data containing linear trends under the null hypothesis that there is no differential effect of the treatment on the A phase and the B phase observations. Before illustrating this property with a simple derivation, we will demonstrate that, in contrast to the RT, a two-sample t test greatly increases the probability of a Type I error for data with a linear trend. Suppose that we have a randomized AB phase design with ten measurement occasions (with five occasions in the A phase and five in the B phase). Suppose there is no intervention effect and we just have a general linear time trend (“maturation”):

A    A    A    A    A    B    B    B    B    B

1    2    3    4    5    6    7    8    9    10

A t test on these data with a two-sided alternative hypothesis results in a t value of 5 with eight degrees of freedom and a p value of .0011, indicating a statistically significant difference between the means at any conventional significance level. In contrast, an RT on these data produces a p value of 1, quite the opposite of a statistically significant treatment effect. The p value of 1 can be explained by looking at the randomization distribution for this particular example (assuming a minimum of three measurement occasions per phase):

AAABBBBBBB    5

AAAABBBBBB    5

AAAAABBBBB    5

AAAAAABBBB    5

AAAAAAABBB    5

The test statistic values for all randomizations are identical, leading to a maximum p value of 1. The result for the RT in this hypothetical example is reassuring, and it can be shown that the RT with differences between means as the test statistic guarantees Type I error rate control in the presence of linear trends, whereas the rejection rate of the t test increases dramatically with increasing numbers of measurement occasions.
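The contrast between the two analyses can be checked directly. The following sketch (ours; it assumes SciPy is available for the t test) runs both the two-sample t test and the RT on the ten-point linear trend shown above.

```python
import numpy as np
from scipy.stats import ttest_ind

scores = np.arange(1, 11, dtype=float)   # pure linear trend, no treatment effect
a_phase, b_phase = scores[:5], scores[5:]

# Two-sample t test: declares a spurious "treatment effect".
t, p_t = ttest_ind(a_phase, b_phase)
print(round(t, 2), round(p_t, 4))        # t = -5.0 (|t| = 5), p ≈ .0011

# Randomization test with |mean(B) - mean(A)| as the test statistic,
# minimum of three measurement occasions per phase.
def abs_mean_diff(y, start):             # `start` = first B occasion, 1-based
    return abs(y[start - 1:].mean() - y[:start - 1].mean())

starts = range(4, 9)                      # admissible start points 4..8
dist = np.array([abs_mean_diff(scores, s) for s in starts])
observed = abs_mean_diff(scores, 6)       # the start point actually used
p_rt = np.mean(dist >= observed)
print(dist, p_rt)                         # every value equals 5.0, so p = 1
```

Every admissible assignment produces the same value, 5, which is exactly the constant β 1 n/2 derived just below, so no assignment is more extreme than any other and the RT p value is 1.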

The nominal Type I error rate protection of the RT in a randomized AB phase design for data containing a linear trend holds in a general way. If the null hypothesis is true, the data from a randomized AB phase design with a linear trend can be written as

\( Y_t = \beta_0 + \beta_1 T_t + \varepsilon_t , \)

with Y t being the dependent variable score at time t , β 0 being the intercept, β 1 being the slope of the linear trend, ε t being the residual error, T being the time variable, and t being the time index. Assuming that the errors have a zero mean, the expected value for these data is

\( {\widehat{Y}}_t = \beta_0 + \beta_1 T_t . \)

In a randomized AB phase design, these expected scores are divided between an A phase ( \( {\widehat{Y}}_{\mathrm{At}} = \beta_0 + \beta_1 T_t \) for t = 1, . . . , n A ) and a B phase ( \( {\widehat{Y}}_{\mathrm{Bt}} = \beta_0 + \beta_1 T_t \) for t = n A + 1, . . . , n ), with n A + n B = n . The mean of the expected A phase scores ( \( {\widehat{\overline{Y}}}_{\mathrm{A}} \) ) and the mean of the expected B phase scores ( \( {\widehat{\overline{Y}}}_{\mathrm{B}} \) ) are equal to

\( {\widehat{\overline{Y}}}_{\mathrm{A}} = \beta_0 + \beta_1 \frac{n_{\mathrm{A}} + 1}{2} \quad \text{and} \quad {\widehat{\overline{Y}}}_{\mathrm{B}} = \beta_0 + \beta_1 \left( n_{\mathrm{A}} + \frac{n_{\mathrm{B}} + 1}{2} \right). \)

Consequently, the difference between \( {\widehat{\overline{Y}}}_{\mathrm{B}} \) and \( {\widehat{\overline{Y}}}_{\mathrm{A}} \) equals

\( {\widehat{\overline{Y}}}_{\mathrm{B}} - {\widehat{\overline{Y}}}_{\mathrm{A}} = \beta_1 \left( n_{\mathrm{A}} + \frac{n_{\mathrm{B}} + 1}{2} - \frac{n_{\mathrm{A}} + 1}{2} \right), \)

which simplifies to

\( {\widehat{\overline{Y}}}_{\mathrm{B}} - {\widehat{\overline{Y}}}_{\mathrm{A}} = \beta_1 \frac{n}{2}. \)

This derivation shows that, under the null hypothesis, \( {\widehat{\overline{Y}}}_{\mathrm{B}}-{\widehat{\overline{Y}}}_{\mathrm{A}} \) is expected to be a constant for every assignment of the randomized AB phase design. The expected difference between means, \( {\widehat{\overline{Y}}}_{\mathrm{B}}-{\widehat{\overline{Y}}}_{\mathrm{A}} \) , is only a function of the slope of the linear trend, β 1 , and the total number of measurement occasions, n . This implies that the expected value of the test statistic for each random start point is identical if the null hypothesis is true, exactly what is needed for Type I error rate control. In contrast, the rejection rate of the t test will increase with increasing β 1 and increasing n , because the difference between means constitutes the numerator of the t test statistic, and the test will only refer to Student’s t distribution with n – 2 degrees of freedom. The t test will therefore detect a difference between means that is merely the result of a general linear trend.

The result of this derivation can be further clarified by comparing the null hypotheses that are evaluated in both the RT and the t test. The null hypothesis of the t test states that there is no difference in means between the A phase observations and the B phase observations, whereas the null hypothesis of the RT states that there is no differential effect of the levels of the independent variable (i.e., the A and B observations) on the dependent variable. A data set with a perfect linear trend such as the one displayed above yields a mean level difference between the A phase observations and the B phase observations, but no differential effect between the A phase observations and the B phase observations (i.e., the trend effect is identical for both the A phase and the B phase observations). For this reason, the null hypothesis of the t test gets rejected, whereas the null hypothesis of the RT is not. Consequently, we can conclude that the RT is better suited for detecting unspecified treatment effects than is the t test, because its null hypothesis does not specify the nature of the treatment effect. Note that the t test, in contrast to the RT, assumes a normal distribution, homogeneity of variances, and independent errors, assumptions that are often implausible for SCED data. It is also worth noting that, with respect to the prevention of Type I errors, the RT also has a marked advantage over visual analysis, as the latter technique offers no way to prevent such errors when dealing with unexpected treatment effects. Consequently, we argue that statistical analysis using RTs is an essential technique for achieving valid conclusions from randomized AB phase designs.

The effect of unexpected linear trends on the power of the randomization test in randomized AB phase designs: a simulation study

In the previous section, we showed the validity of the randomized AB phase design and the RT with respect to the Type I error for data containing unexpected linear trends. Another criterion for the usability of the RT for specific types of data sets, apart from controlled Type I error rates, is adequate power. In this section we focus on the power of the RT in the randomized AB phase design when the data contain unexpected linear trends. Previous research has not yet examined the effect of unexpected linear data trends on the power of the RT in randomized AB phase designs. However, Solomon ( 2014 ) investigated the presence of linear trends in a large sample of published single-case research and found that the single-case data he surveyed were characterized by moderate levels of linear trend. As such, it is important to investigate the implications of unexpected data trends for the power of the RT in randomized AB phase designs.

When assessing the effect of linear trend on the power of the RT, we should make a distinction between the situation in which a data trend is expected and the situation in which a data trend is not expected. Edgington ( 1975b ) proposed a specific type of RT for the former situation. More specifically, the proposed RT utilizes a test statistic that takes the predicted trend into account, in order to increase its statistical power. Using empirical data from completely randomized designs, Edgington ( 1975b ) illustrated that such an RT can be quite powerful when the predicted trend is accurate. Similarly, a study by Levin, Ferron, and Gafurov ( 2017 ) showed that the power of the RT can be increased for treatment effects that are delayed and/or gradual in nature, by using adjusted test statistics that account for these types of effects. Of course, in many realistic research situations, data trends are either unexpected or are expected but cannot be accurately predicted. Therefore, we performed a Monte Carlo simulation study to investigate the effect of unexpected linear data trends on the power of the RT when it is used to assess treatment effects in randomized AB phase designs. A secondary goal was to provide guidelines for the number of measurement occasions to include in a randomized AB phase design, in order to achieve sufficient power for different types of data patterns containing trends and various treatment effect sizes. Following the guidelines by Cohen ( 1988 ), we defined “sufficient power” as a power of 80% or more.

The Monte Carlo simulation study contained the following factors: mean level change, a trend in the A phase, a trend in the B phase, autocorrelation in the residuals, and the number of measurement occasions for each data set. We used the model of Huitema and McKean (2000) to generate the data. This model uses the following regression equation:

\( Y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + {\beta^{\ast}}_3 \left[ T_t - \left( n_{\mathrm{A}} + 1 \right) \right] D_t + \varepsilon_t , \)

with Y t being the outcome at time t , for t = 1, 2, . . . , n A , n A + 1, . . . , n A + n B ,

n A being the number of observations in the A phase,

n B being the number of observations in the B phase,

β 0 being the regression intercept,

T t being the time variable that indicates the measurement occasions,

D t being the value of the dummy variable indicating the treatment phase at time t ,

[ T t – ( n A +1)] D t being the value of the slope change variable at time t ,

β 1 being the regression coefficient for the A phase trend,

β 2 being the regression coefficient for the mean level treatment effect,

β * 3 being the regression coefficient for the slope change variable, and

ε t being the error at time t .

In this simulation study, we will sample ε t either from a standard normal distribution or from a first-order autoregressive (AR1) model.

The A phase trend, the treatment effect, and the B phase slope change correspond to the β 1 , β 2 , and β * 3 regression coefficients of the Huitema–McKean model, respectively. Note that β * 3 of the Huitema–McKean model indicates the amount of slope change in the B phase relative to the A phase trend. For our simulation study, we defined a new parameter (denoted by β 3 ) that indicates the value of the trend in the B phase independent of the level of trend in the A phase. The relation between β * 3 and β 3 can be written as follows: β 3 = β * 3 + β 1 . To include autocorrelation in the simulated data sets, the ε t s were generated from an AR1 model with different values for the AR parameter. Note that residuals with an autocorrelation of 0 are equivalent to the residuals from a standard normal distribution. The power of the RT was evaluated for two different measures of ES: an absolute mean difference statistic (MD) and an immediate treatment effect index (ITEI).
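Before turning to the two effect size measures, the data-generating setup just described can be sketched as follows (our reconstruction for illustration, not the authors' simulation code; the function name and seed are arbitrary). The sketch generates a single AB data set from the Huitema–McKean equation with AR(1) residuals, using the reparameterization β 3 = β * 3 + β 1 described above.

```python
import numpy as np

def generate_ab_data(n_a, n_b, beta0, beta1, beta2, beta3, ar1, rng):
    """Generate one AB data set from the Huitema-McKean model.
    `beta3` is the B phase trend itself; the model's slope-change coefficient
    is recovered as beta3_star = beta3 - beta1. Residuals follow an AR(1)
    process with standard-normal innovations (ar1 = 0 gives i.i.d. errors)."""
    n = n_a + n_b
    t = np.arange(1, n + 1)                      # time variable T_t
    d = (t > n_a).astype(float)                  # dummy D_t: 1 in the B phase
    slope_change = (t - (n_a + 1)) * d           # [T_t - (n_A + 1)] * D_t
    beta3_star = beta3 - beta1

    # AR(1) residuals
    innovations = rng.standard_normal(n)
    eps = np.empty(n)
    eps[0] = innovations[0]
    for i in range(1, n):
        eps[i] = ar1 * eps[i - 1] + innovations[i]

    return beta0 + beta1 * t + beta2 * d + beta3_star * slope_change + eps

rng = np.random.default_rng(2024)
y = generate_ab_data(n_a=30, n_b=30, beta0=0, beta1=0.25,
                     beta2=4, beta3=-0.25, ar1=0.3, rng=rng)
print(y[:5].round(2), y[-5:].round(2))
```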

The MD is defined as

\( \mathrm{MD} = \left| \overline{A} - \overline{B} \right| , \)

with \( \overline{A} \) being the mean of all A phase observations and \( \overline{B} \) being the mean of all B phase observations. The ITEI is defined as

\( \mathrm{ITEI} = \left| {\overline{A}}_{ITEI} - {\overline{B}}_{ITEI} \right| , \)

with \( {\overline{A}}_{ITEI} \) being the mean of the last three A phase observations before the introduction of the treatment and \( {\overline{B}}_{ITEI} \) being the mean of the first three B phase observations after the introduction of the treatment. A computational sketch of both statistics is given after the list of factor levels below. For each of the simulation factors, the following levels were used in the simulation study:

β 1 : 0, .25, .50

β 2 : – 4, – 1, 0, 1, 4

β 3 : – .50, – .25, 0, .25, .50

AR1: – .6, – .3, 0, .3, .6.

N : 30, 60, 90, 120

ES: MD, ITEI
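As announced above, here is a computational sketch of the two test statistics (the helper names are ours, not the authors'), applied to the hypothetical data of Fig. 1 with n A = 12:

```python
import numpy as np

def md(scores, n_a):
    """Absolute difference between the mean of all A phase observations
    and the mean of all B phase observations."""
    return abs(np.mean(scores[:n_a]) - np.mean(scores[n_a:]))

def itei(scores, n_a):
    """Immediate treatment effect index: absolute difference between the mean
    of the last three A observations and the first three B observations."""
    return abs(np.mean(scores[n_a - 3:n_a]) - np.mean(scores[n_a:n_a + 3]))

y = np.array([0, 2, 2, 3, 1, 3, 3, 2, 2, 2, 2, 2,
              6, 7, 5, 8, 5, 6, 5, 7, 4, 6, 8, 5, 6, 7], dtype=float)
print(round(md(y, 12), 2), round(itei(y, 12), 2))   # overall vs. immediate effect
```

For these data the MD is about 4.07 and the ITEI is 4.00; either statistic can be plugged into the RT described earlier.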

The β 1 and β 3 values were based on a survey by Solomon (2014), who calculated trend values through linear regression for a large number of single-case studies. A random-effects meta-analysis showed that the mean standardized trend regression weight for all analyzed data was .37, with a 95% confidence interval of [.28, .43]. On the basis of these results, we defined a “small” trend as a standardized regression weight of .25 and a “large” trend as a standardized regression weight of .50. Note that we included upward trends (i.e., β 3 values with a positive sign) as well as downward trends in the B phase (i.e., β 3 values with a negative sign), in order to account for data patterns with A phase trends and B phase trends that go in opposite directions. It was not necessary to also include downward trends in the A phase, because in the full factorial crossing of all included parameter values this would merely produce mirror images of data patterns already included (when only the direction of the A phase trend relative to the B phase trend is considered). The full factorial combination of these three β 1 values and five β 3 values resulted in 15 different data patterns containing an A phase trend and/or a B phase trend. Table 1 provides an overview of these 15 data patterns, and Fig. 2 illustrates them visually. Note that the data patterns in Fig. 2 only serve to illustrate the described A phase trends and/or B phase trends; they contain neither data variability nor a mean level treatment effect. Hereafter, we will use the numbering in Table 1 to refer to each of the 15 data patterns individually.

Figure 2. Fifteen AB data patterns containing an A phase trend and/or a B phase trend

The values for β 2 were based on the standardized treatment effects reported by Harrington and Velicer (2015), who used interrupted time series analyses on a large number of empirical single-case data sets published in the Journal of Applied Behavior Analysis. The Huitema–McKean model is identical to the interrupted time series model of Harrington and Velicer when the autoregressive parameter of the latter model is zero. We collected the d values (which correspond to standardized β 2 values in the Huitema–McKean model) reported in Table 1 of Harrington and Velicer's study, and defined β 2 = 1 as a “small” treatment effect and β 2 = 4 as a “large” treatment effect. These values were the 34th and 84th percentiles of the empirical d distribution, respectively. The AR1 parameter values were based on a survey by Solomon (2014), who reported a mean absolute autocorrelation of .36 across a large number of single-case data sets. On the basis of this value, we defined .3 as a realistic AR1 parameter value. To obtain an additional “bad case scenario” condition with respect to autocorrelation, we doubled the empirical value of .3. Both the AR1 values of .3 and .6 were included with negative and positive signs in the simulation study, in order to assess the effects of both negative and positive autocorrelation. The numbers of measurement occasions of the simulated data sets were 30, 60, 90, or 120. We chose a lower limit of 30 measurement occasions because this is the minimum number of measurement occasions needed in a randomized AB phase design with at least five measurement occasions in each phase to achieve a p value smaller than .05. The upper limit of 120 measurement occasions was chosen on the basis of a survey by Harrington and Velicer showing that SCEDs rarely contain more than 120 measurement occasions.

The ES measures used in this simulation study are designed to quantify two important aspects of evaluating treatment effects of single-case data, according to the recommendations of the What Works Clearinghouse (WWC) Single-Case Design Standards (Kratochwill et al., 2010 ). The first aspect is the overall difference in level between phases, which we quantified using the absolute mean difference between all A phase observations and all B phase observations. Another important indicator for treatment effectiveness in randomized AB phase designs is the immediacy of the treatment effect (Kratochwill et al., 2010 ). For this aspect of the data, we calculated an immediate treatment effect index (ITEI). On the basis of the recommendation by Kratochwill et al., we defined the ITEI in a randomized AB phase design as the average difference between the last three A observations and the first three B observations. Both ESs were used as the test statistic in the RT for this simulation study. In accordance with the WWC standards’ recommendation that a “phase” should consist of five or more measurement occasions (Kratochwill et al., 2010 ), we took a minimum limit of five measurement occasions per phase into account for the start point randomization in the RT. A full factorial crossing of all six simulation factors yielded 3,750 simulation conditions. The statistical power of the RT for each condition was calculated by generating 1,000 data sets and calculating the proportion of rejected null hypotheses at a 5% significance level across these 1,000 replications.
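The power computation for a single simulation condition can be sketched as follows (a simplified illustration under the stated design choices: the MD as the test statistic, a five-occasion minimum per phase, and AR(1) residuals; this is not the authors' actual simulation code, and fewer replications are used here to keep the example fast).

```python
import numpy as np

def md(scores, n_a):
    """Absolute difference between the B phase mean and the A phase mean."""
    return abs(np.mean(scores[n_a:]) - np.mean(scores[:n_a]))

def rt_p_value(scores, n_a, statistic, min_per_phase=5):
    """Randomization-test p value over all admissible start points."""
    n = len(scores)
    candidate_n_a = range(min_per_phase, n - min_per_phase + 1)
    dist = np.array([statistic(scores, k) for k in candidate_n_a])
    return np.mean(dist >= statistic(scores, n_a))

def estimate_power(n, beta1, beta2, beta3, ar1, n_reps=200, alpha=.05,
                   min_per_phase=5, seed=2024):
    """Proportion of replications in which the RT rejects H0 at level alpha.
    Each replication draws a random start point, generates data from the
    Huitema-McKean model for that start point, and runs the RT with the MD."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        n_a = int(rng.integers(min_per_phase, n - min_per_phase + 1))
        t = np.arange(1, n + 1)
        d = (t > n_a).astype(float)
        innov = rng.standard_normal(n)
        eps = np.empty(n)
        eps[0] = innov[0]
        for i in range(1, n):
            eps[i] = ar1 * eps[i - 1] + innov[i]
        y = beta1 * t + beta2 * d + (beta3 - beta1) * (t - (n_a + 1)) * d + eps
        rejections += rt_p_value(y, n_a, md, min_per_phase) <= alpha
    return rejections / n_reps

# One condition: 60 occasions, small A and B phase trends, large effect, no AR.
print(estimate_power(n=60, beta1=.25, beta2=4, beta3=.25, ar1=0))
```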

The results will be presented in two parts. First, to evaluate the effect of the simulation factors on the power of the RT, we will present the main effects of each simulation factor. Apart from a descriptive analysis of the statistical power in the simulation conditions, we will also examine the variation between conditions using a multiway analysis of variance (ANOVA). We will limit the ANOVA to main effects, because the interaction effects between the simulation factors were small and difficult to interpret. For each main effect, we will calculate eta-squared ( η 2 ) in order to identify the most important determinants of the results. Second, we will report the power for each specific AB data pattern that was included in the simulation study, for both the MD and the ITEI.

Main effects

The results from the multiway ANOVA indicated that all simulation factors had a statistically significant effect on the power of the RT at the .001 significance level. Table 2 displays the η 2 values for the main effect of each simulation factor, indicating the relative importance of these factors in determining the power of the RT, in descending order.

Table 2 shows that by far the largest amount of variance was explained by the size of the treatment effect ( β 2 ). This result is to be expected, because the size of the treatment effect ranged from 0 to 4 (in absolute value), which is a very large difference. The large amount of variance explained by the treatment effect size also accounts for the large standard deviations of the power levels for the other main effects (displayed in Tables 4 – 8 in the Appendix). To visualize the effect of the simulation factors on the RT's power, Fig. 3 plots each simulation factor in interaction with the size of the treatment effect ( β 2 ), with the power averaged across all other simulation factors. The means and standard deviations of the levels of the main effect for each experimental factor (averaged across all other simulation factors, including the size of the treatment effect) can be found in Tables 4 – 8 in the Appendix.

Figure 3. Effects of the simulation factors of the simulation study in interaction with the size of the treatment effect: (1) the number of measurement occasions, (2) the level of autocorrelation, (3) the A phase trend, (4) the B phase trend, and (5) the test statistic used in the randomization test. The proportions of rejections for the conditions in which the treatment effect is zero are the Type I error rates. N = number of measurement occasions, AR = autoregression parameter, β 1 = A phase trend regression parameter, β 3 = B phase trend regression parameter, ES = effect size measure

Panels 1–5 in Fig. 3 show the main effects of the number of measurement occasions, the level of autocorrelation, the size of the A phase trend, the size of the B phase trend, and the effect size measure used, respectively, on the power of the RT. We will summarize the results concerning the main effects for each of these experimental factors in turn.

Number of measurement occasions

Apart from the obvious result that an increase in the number of measurement occasions increases the power of the RT, we can also see that the largest substantial increase in average power occurs when increasing the number of measurement occasions from 30 to 60. In contrast, increasing the number of measurement occasions from 60 to 90, or even from 90 to 120, yields only very small increases in average power.

Level of autocorrelation

The main result for this experimental factor is that the presence of positive autocorrelation in the data decreases the power, whereas the presence of negative autocorrelation increases the power. However, Table 2 shows that the magnitude of this effect is relatively small as compared to the other effects in the simulation study.

Effect size measure

The results show that the ITEI on average yields larger power than does the MD for the types of data patterns that were used in this simulation study.

A phase trend (β 1 )

On average, the power of the randomized AB phase design is reduced when there is an A phase trend in the data, and this reduction increases when the A phase trend gets larger.

B phase trend (β 3 )

The presence of a B phase trend in the data reduces the power of the RT, as compared to data without a B phase trend, and this reduction increases as the B phase trend gets larger. Furthermore, when the data also contain an upward A phase trend, the power reduction is larger for downward B phase trends than for upward B phase trends. Because the A phase trends in this simulation study were all upward trends, we can conclude that the power reduction associated with a B phase trend is larger when the B phase trend runs in the direction opposite to the A phase trend than when both trends have the same direction. Similarly, it is evident across all panels of Fig. 3 that the power of the RT is lower for treatment effects that run in the direction opposite to the A phase trend.

Finally, the conditions in Fig. 3 in which the treatment effect is zero show that the manipulation of each experimental factor did not inflate the Type I error rate of the RT above the nominal significance level. However, this result is to be expected, as the RT provides guaranteed nominal Type I error control.

Trend patterns

In this section we will discuss the power differences between the different types of data patterns in the simulation study. In addition, we will pay specific attention to the differences between the MD and the ITEI in the different data patterns, as the ES measure that was used in the RT was the experimental factor that explained the most variance in the ANOVA apart from the size of the treatment effect. Figure 4a contains the power graphs for Data Patterns 1–5, Fig. 4b contains the power graphs for Data Patterns 6–10, and Fig. 4c contains the power graphs for Data Patterns 11–15.

Data patterns with no A phase trend (Data Patterns 1–5): The most important results regarding Data Patterns 1–5 can be summarized in the following bullet points:

For data patterns without any trend (Data Pattern 1), the average powers of the MD and the ITEI are similar.

The average power of the ITEI is substantially larger than the average power of the MD for data patterns with any type of B phase trend (Data Patterns 2–5).

Comparison of Data Patterns 2 and 3 shows that the average power advantage of the ITEI as compared to the MD in data patterns with an upward B phase trend increases as the B phase trend grows larger.

The average power of the MD in Data Patterns 2–5 is very low.

The average power graphs for Data Patterns 1–5 are symmetrical, which means that the results for negative and positive mean level treatment effects are similar.

Data patterns with an A phase trend of .25 (Data Patterns 6–10):

For all five of these data patterns, the ITEI has a large average power advantage as compared to the MD, for both positive and negative treatment effects.

The average powers of both the ITEI and the MD are higher when the treatment effect has the same direction as the A phase trend, as compared to when the effects go in opposite directions.

The average power difference between the MD and the ITEI is larger when the A phase trend and the treatment effect go in opposite directions than when they have the same direction.

When the A phase trend and the B phase trend have the same value (Data Pattern 7), the average power advantage of the ITEI relative to the MD disappears, but only for positive treatment effects.

The average power of the MD is extremely low in nearly all data patterns.

Data patterns with an A phase trend of .50 (Data Patterns 11–15):

In comparison to Data Patterns 6–10, the overall average power drops due to the increased size of the A phase trend (for both the ITEI and the MD and for both positive and negative treatment effects).

For all five data patterns, the ITEI has a large average power advantage over the MD, for both positive and negative treatment effects.

When the A phase trend and the B phase trend have the same value (Data Pattern 13), the average power advantage of the ITEI relative to the MD disappears, but only for positive treatment effects.

The average power of the MD is extremely low for all types of treatment effects in all data patterns (except for Data Pattern 13). In contrast, the ITEI still has substantial average power, but only for positive treatment effects.

Figure 4. (a) Power graphs for the five AB data patterns without an A phase trend. (b) Power graphs for the five AB data patterns with an upward A phase trend of .25. (c) Power graphs for the five AB data patterns with an upward A phase trend of .50. β 1 and β 3 represent the trends in the A and B phases, respectively

The most important results regarding differences between the individual data patterns and between the MD and the ITEI can be summarized as follows:

The presence of A phase trend and/or B phase trend in the data decreases the power of the RT, as compared to data without such trends, and the decrease is proportional to the magnitude of the trend.

Treatment effects that go in the same direction as the A phase trend can be detected with higher power than treatment effects that go in the opposite direction from the A phase trend.

The ITEI yields higher power than does the MD in data sets with trends, especially for large trends and trends that have a direction opposite from the direction of the treatment effect.

An additional result regarding the magnitude of the power in the simulation study is that none of the conditions using 30 measurement occasions reached a power of 80% or more. Also, all conditions that reached a power of 80% or more involved large treatment effects ( β 2 = 4). The analysis of the main effects showed that designs with 90 or 120 measurement occasions yielded only very small increases in power as compared to designs with 60 measurement occasions. Table 3 contains an overview of the average power for large positive and large negative mean level treatment effects (| β 2 | = 4) for each of the 15 different data patterns with 60 measurement occasions, for both the MD and the ITEI (averaged over the levels of autocorrelation in the data).

Upon inspecting Table 3 , one can see that for detecting differences in mean level (i.e., the simulation conditions using the MD as the test statistic), the randomized AB phase design only has sufficient power for data patterns without any trend (Data Pattern 1) or for data patterns in which the A phase trend and the B phase trend are equal (Data Patterns 7 and 13) and in which the treatment effect is in the same direction as the A phase trend. With respect to detecting immediate treatment effects, one can see that the randomized AB phase design had sufficient power for all the data patterns with no A phase trend included in the simulation study, provided that the treatment effect was large (Data Patterns 1–5). For data patterns with A phase trend, the randomized AB phase design also has sufficient power, provided that the treatment effect is in the same direction as the A phase trend. When the treatment effect is in the opposite direction from the A phase trend, the randomized AB phase design only has sufficient power when both the A phase trend and the B phase trend are small (Data Patterns 6, 7, and 9). It is also important to note that the RT only has sufficient power for large treatment effects.

Discussion and future research

In this article we have argued that randomized AB phase designs are an important part of the methodological toolbox of the single-case researcher. We discussed the advantages and disadvantages of these designs in comparison with more complex phase designs, such as ABA and ABAB designs. In addition, we mentioned some common data-analytical pitfalls when analyzing randomized AB phase designs and discussed how the RT as a data-analytical technique can lessen the impact of some of these pitfalls. We demonstrated the validity of the RT in randomized AB phase designs containing unexpected linear trends and investigated the implications of unexpected linear data trends for the power of the RT in randomized AB phase designs. To cover a large number of potential empirical data patterns with linear trends, we used the model of Huitema and McKean ( 2000 ) for generating data sets. The power was assessed for both the absolute mean phase difference (MD, designed to evaluate differences in level) and the immediate treatment effect index (ITEI, designed to evaluate the immediacy of the effect) as the test statistic in the RT. In addition, the effect of autocorrelation on the power of the RT in randomized AB phase designs was investigated by incorporating residual errors with different levels of autocorrelation into the Huitema–McKean model.

The results showed that the presence of any combination of A phase trend and/or B phase trend reduced the power of the RT in comparison to data patterns without trend. In addition, the results showed that the ITEI yielded substantially higher power in the RT than did the MD for randomized AB phase designs containing linear trend. Autocorrelation only had a small effect on the power of the RT, with positive autocorrelation diminishing the power of the RT and negative autocorrelation increasing its power. Furthermore, the results showed that none of the conditions using 30 measurement occasions reached a power of 80% or more. However, the power increased dramatically when the number of measurement occasions was increased to 60. The main effect of number of measurement occasions showed that the power of randomized AB phase designs with 60 measurement occasions hardly benefits from an increase to 90 or even 120 measurement occasions.

The overarching message of this article is that the randomized AB phase design is a potentially valid experimental design. More specifically, the use of repeated measurements, a deliberate experimental manipulation, and random assignment all increase the probability that a valid inference regarding the treatment effect of an intervention for a single entity can be made. In this respect, it should be noted that the internal validity of an experimental design is also dependent on all plausible rival hypotheses, and that it is difficult to make general statements regarding the validity of a design, regardless of the research context. As such, we recommend that single-case researchers should not reject randomized AB phase designs out of hand, but consider how such designs can be used in a valid manner for their specific purposes.

The results from this simulation study showed that the randomized AB phase design has relatively low power: A power of 80% or more is only reached when treatment effects are large and the design contains a substantial number of measurement occasions. These results echo the conclusions of Onghena ( 1992 ), who investigated the power of randomized AB phase designs for data without trend or autocorrelation. That being said, this simulation study also showed that it is possible to achieve a power of 80% or more for specific data patterns containing unexpected linear trends and/or autocorrelation, at least for large effect sizes.

One possibility for increasing the power of the RT for data sets with trends may be the use of adjusted test statistics that accurately predict the trend (Edgington, 1975b ; Levin et al., 2017 ). Rather than predicting the trend before the data are collected, another option might be to specify an adjusted test statistic after data collection using masked graphs (Ferron & Foster-Johnson, 1998 ).

Recommendations with regard to an appropriate number of measurement occasions for conducting randomized AB phase designs should be made cautiously, for several reasons. First, the manipulation of the treatment effect in this simulation study was very large and accounted for most of the variability in the power. Consequently, the expected size of the treatment effect is an important factor in selecting the number of measurement occasions for the randomized AB phase design. Of course, the size of the treatment effect cannot be known beforehand, but it is plausible that effect size magnitudes vary depending on the specific domain of application. Second, we did not investigate possible interactions between the various experimental factors, because these would be very difficult to interpret. However, these potential interactions might have an effect on the power of different types of data patterns, making it more difficult to formulate general recommendations. Taking the previous disclaimers into account, we can state that randomized AB phase designs in any case should contain more than 30 measurement occasions to achieve adequate power. Note that Shadish and Sullivan ( 2011 ) reported that across a survey of 809 published SCEDs, the median number of measurement occasions was 20, and that 90.6% of the included SCEDs had fewer than 50 data points. It is possible that randomized AB phase designs with fewer than 60 measurement occasions may also have sufficient power in specific conditions we simulated, but we cannot verify this on the basis of the present results. As we previously mentioned, we do not recommend implementing randomized AB phase designs with more than 60 measurement occasions, since the extra practical burden this entails does not outweigh the very small increase in power it yields.

Although we advocate the use of randomization in SCEDs, readers should note that some authors oppose this practice, as well as the use of RTs, because it conflicts with response-guided experimentation (Joo, Ferron, Beretvas, Moeyaert, & Van den Noortgate, 2017; Kazdin, 1980). In this approach, decisions to implement, withdraw, or alter treatments are often based on the observed data patterns during the course of the experiment (e.g., starting the treatment only after the baseline phase has stabilized). Response-guided experimentation conflicts with the use of RTs, because RTs require prespecifying the start of the treatment in a random fashion. In response to this criticism, Edgington (1980) proposed an RT in which only part of the measurement occasions of the SCE are randomized, thus giving the researcher control over the nonrandomized part.

Some additional remarks concerning the present simulation study are in order. First, although this simulation study showed that the randomized AB phase design has relatively low power, we should mention that multiple randomized AB phase designs can be combined in a multiple-baseline, across-participant design that increases the power of the RT considerably (Onghena & Edgington, 2005 ). More specifically, a simulation study has shown that under most conditions, the power to detect a standardized treatment effect of 1.5 for designs with four participants and a total of 20 measurement occasions per participant is already 80% or more (Ferron & Sentovich, 2002 ). A more recent simulation study by Levin, Ferron, and Gafurov ( 2018 ) investigating several different randomization test procedures for multiple-baseline designs showed similar results. Another option to obtain phase designs with more statistical power would be to extend the basic AB phase design to an ABA or ABAB design. Onghena ( 1992 ) has developed an appropriate randomization test for such extended phase designs.

Second, it is important to realize that the MD and ITEI analyses used in this simulation study quantify two different aspects of the difference between the phases. The MD aims to quantify overall level differences between the A phase and the B phase, whereas the ITEI aims to quantify the immediate treatment effect after the implementation of the treatment. The fact that the power of the RT in randomized AB phase designs is generally higher for the ITEI than for the MD indicates that the randomized AB phase design is mostly sensitive to immediate changes in the dependent variable after the treatment has started. Kratochwill et al. ( 2010 ) argued that immediate treatment effects are more reliable indicators of a functional relation between the outcome variable and the treatment than are gradual or delayed treatment effects. In this sense, the use of a randomized AB phase design is appropriate to detect such immediate treatment effects.

Third, in this article we assumed a research situation in which a researcher is interested in analyzing immediate treatment effects and differences in mean level, but in which unexpected linear trends in the data hamper such analyses. In this context it is important to mention that over the years multiple proposals have been made for dealing with the presence of trends in the statistical analysis of single-case data. These proposals include RTs for predicted trends (Edgington, 1975b), measures of ES that control for trend (e.g., the percentage of data points exceeding the baseline median; Ma, 2006), ESs that incorporate the trend into the treatment effect itself (e.g., Tau-U; Parker, Vannest, Davis, & Sauber, 2011), and approaches that quantify trend separately from a mean level shift effect, as is done by most regression-based techniques (e.g., Allison & Gorman, 1993; Van den Noortgate & Onghena, 2003) and by slope and level change (SLC; Solanas et al., 2010), a nonparametric technique that isolates the trend from the mean level shift effect in SCEDs. The possibilities for dealing with trends in single-case data are numerous and beyond the scope of the present article.

The present study has a few limitations that we will now mention. First of all, the results and conclusions of this simulation study are obviously limited to the simulation conditions that were included. Because we simulated a large number of data patterns, we had to compromise on the number of levels of some simulation factors in order to keep the simulation study computationally manageable. For example, we only used three different treatment effect sizes (in absolute value) and four different numbers of measurement occasions. Moreover, the incremental differences between the different values of these factors were quite large. Second, this simulation study only considered the 15 previously mentioned data patterns generated from the Huitema–McKean model, featuring constant and immediate treatment effects and linear trends. We did not simulate data patterns with delayed or gradual treatment effects or nonlinear trends. An interesting avenue for future research would be to extend the present simulation study to delayed and/or gradual treatment effects and nonlinear trends. Third, in this simulation study we only investigated randomized AB phase designs. Future simulation studies could investigate the effect of unexpected trends in more complex phase designs, such as ABA and ABAB designs or multiple-baseline designs. Fourth, we only used test statistics designed to evaluate two aspects of single-case data: level differences and the immediacy of the effect. Although these are important indicators of treatment effectiveness, other aspects of the data might provide additional information regarding treatment efficacy. More specifically, data aspects such as variability, nonoverlap, and consistency of the treatment effect must also be evaluated in order to achieve a fuller understanding of the data (Kratochwill et al., 2010 ). In this light, more research needs to be done evaluating the power of the RT using test statistics designed to quantify trend, variability, and consistency across phases. Future research could focus on devising an RT test battery consisting of multiple RTs with different test statistics, each aimed at quantifying a different aspect of the data at hand. In such a scenario, the Type I error rate across multiple RTs could be controlled at the nominal level using multiple testing corrections. A final limitation of this simulation study is that the data were generated using a random-sampling model with the assumption of normally distributed errors. It is also possible to evaluate the power of the RT in a random assignment model (cf. conditional power; Keller, 2012 ; Michiels et al., 2018 ). Future research could investigate whether the results of the present simulation study would still hold in a conditional power framework.
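
For readers who want to generate data patterns of the same general form, the sketch below simulates a single AB series from a Huitema–McKean-type model with a linear trend, an immediate level change, and normal errors. The parameter values are illustrative assumptions, not the conditions of the reported simulation study.

```python
import numpy as np

def huitema_mckean_series(n1, n2, b0=0.0, b1=0.0, b2=1.5, b3=0.0, sigma=1.0, seed=None):
    """Generate one AB series from a Huitema-McKean-type model:
        Y_t = b0 + b1*T_t + b2*D_t + b3*(T_t - (n1 + 1))*D_t + e_t,
    with T_t = 1..n, D_t = 0 in the A phase and 1 in the B phase, and
    normally distributed errors."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    t = np.arange(1, n + 1)
    d = (t > n1).astype(float)
    errors = rng.normal(scale=sigma, size=n)
    return b0 + b1 * t + b2 * d + b3 * (t - (n1 + 1)) * d + errors

# Hypothetical data pattern: baseline trend plus an immediate treatment effect
y = huitema_mckean_series(n1=15, n2=15, b1=0.1, b2=1.5, seed=7)
print(y.round(2))
```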

The AB phase design has been commonly dismissed as inadequate for research purposes because it allegedly cannot control for maturation and history effects. However, this blanket dismissal of AB phase designs fails to distinguish between randomized and nonrandomized versions of the design. The present article has demonstrated that the randomized AB phase design is a potentially internally valid experimental design that can be used for assessing the effect of a treatment in a single participant when the treatment is irreversible or cannot be withdrawn for ethical reasons. We showed that randomized AB phase designs can be analyzed with randomization tests to assess the statistical significance of mean level changes and immediate changes in the outcome variable, using an appropriate test statistic for each type of effect. The results of a simulation study showed that the power with which mean level changes and immediate changes can be evaluated depends on the specific type of data pattern that is analyzed. We concluded that for nearly every data pattern in this simulation study that included an upward A phase trend, a positive treatment effect, and/or a downward or upward B phase trend, it was possible to detect immediate treatment effects with sufficient power using the RT. In any case, randomized AB phase designs should contain more than 30 measurement occasions to provide adequate power in the RT. Researchers should be aware that the randomized AB phase design generally has low power, even for large sample sizes. For this reason, we recommend that researchers use single-case phase designs with more power (such as randomized multiple-baseline designs or a serially replicated randomized AB phase design) whenever possible, because they have higher statistical-conclusion validity. When an AB phase design is the only feasible option, researchers should consider the benefits of randomly determining the intervention point. It is far better to perform the randomized AB phase design, which can provide tentative information about a treatment effect, than not to perform an SCED study at all.

Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy , 31 , 621–631.

Alnahdi, G. H. (2015). Single-subject design in special education: Advantages and limitations. Journal of Research in Special Educational Needs , 15 , 257–265.

Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12, 199–210.

Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change (3rd ed.). Boston, MA: Pearson.

Bobrovitz, C. D., & Ottenbacher, K. J. (1998). Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. American Journal of Physical Medicine and Rehabilitation, 77, 94–102.

Borckardt, J. J., & Nash, M. R. (2014). Simulation modelling analysis for small sets of single-subject data collected over time. Neuropsychological Rehabilitation , 24 , 492–506.

Bulté, I., & Onghena, P. (2008). An R package for single-case randomization tests. Behavior Research Methods , 40 , 467–478. https://doi.org/10.3758/BRM.40.2.467

Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In T. R. Kratochwill, J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 187–212). Hillsdale, NJ: Erlbaum.

Campbell, D. T. (1969). Reforms as experiments. American Psychologist , 24 , 409–429. https://doi.org/10.1037/h0027982

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi- experimental designs for research. Boston, MA: Houghton Mifflin.

Chambless, D. L., & Ollendick, T. H. (2001). Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology , 52 , 685–716.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago, IL: Rand McNally.

de Vries, R. M., & Morey, R. D. (2013). Bayesian hypothesis testing for single-subject designs. Psychological Methods , 18 , 165–185. https://doi.org/10.1037/a0031037

du Prel, J., Hommel, G., Röhrig, B., & Blettner, M. (2009). Confidence interval or p -value? Deutsches Ärzteblatt International , 106 , 335–339.

Dugard, P. (2014). Randomization tests: A new gold standard? Journal of Contextual Behavioral Science , 3 , 65–68.

Dugard, P., File, P., & Todman, J. (2012). Single-case and small-n experimental designs: A practical guide to randomization tests (2nd ed.). New York, NY: Routledge.

Edgington, E. S. (1967). Statistical inference from N = 1 experiments. Journal of Psychology , 65 , 195–199.

Edgington, E. S. (1975a). Randomization tests for one-subject operant experiments. Journal of Psychology , 90 , 57–68.

Edgington, E. S. (1975b). Randomization tests for predicted trends. Canadian Psychological Review , 16 , 49–53.

Edgington, E. S. (1980). Overcoming obstacles to single-subject experimentation. Journal of Educational Statistics , 5 , 261–267.

Edgington, E. S. (1996). Randomized single-subject experimental designs. Behaviour Research and Therapy , 34 , 567–574.

Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.

Ferron, J., & Foster-Johnson, L. (1998). Analyzing single-case data with visually guided randomization tests. Behavior Research Methods, Instruments, & Computers , 30 , 698–706. https://doi.org/10.3758/BF03209489

Ferron, J., & Onghena, P. (1996). The power of randomization tests for single-case phase designs. Journal of Experimental Education , 64 , 231–239.

Ferron, J., & Sentovich, C. (2002). Statistical power of randomization tests used with multiple-baseline designs. Journal of Experimental Education , 70 , 165–178.

Ferron, J., & Ware, W. (1995). Analyzing single-case data: The power of randomization tests. Journal of Experimental Education , 63 , 167–178.

Gabler, N. B., Duan, N., Vohra, S., & Kravitz, R. L. (2011). N -of-1 trials in the medical literature: A systematic review. Medical Care , 49 , 761–768.

Gast, D. L., & Ledford, J. R. (2014). Single case research methodology: Applications in special education and behavioral sciences (2nd ed.). New York, NY: Routledge.

Gottman, J. M., & Glass, G. V. (1978). Analysis of interrupted time-series experiments. In T. R. Kratochwill (Ed.), Single-subject research: Strategies for evaluating change (pp. 197–237). New York, NY: Academic Press.

Hammond, D., & Gast, D. L. (2010). Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities , 45 , 187–202.

Harrington, M., & Velicer, W. F. (2015). Comparing visual and statistical analysis in single-case studies using published studies. Multivariate Behavioral Research , 50 , 162–183.

Harris, F. N., & Jenson, W. R. (1985). Comparisons of multiple- baseline across persons designs and AB designs with replications: Issues and confusions. Behavioral Assessment , 7 , 121–127.

Harvey, M. T., May, M. E., & Kennedy, C. H. (2004). Nonconcurrent multiple baseline designs and the evaluation of educational systems. Journal of Behavioral Education , 13 , 267–276.

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3, 224–239.

Heyvaert, M., Moeyaert, M.,Verkempynck, P., Van den Noortgate, W., Vervloet, M., Ugille M., & Onghena, P. (2017). Testing the intervention effect in single-case experiments: A Monte Carlo simulation study. Journal of Experimental Education , 85 , 175–196.

Heyvaert, M., & Onghena, P. (2014). Analysis of single-case data: Randomisation tests for measures of effect size. Neuropsychological Rehabilitation , 24 , 507–527.

Heyvaert, M., Wendt, O., Van den Noortgate, W., & Onghena, P. (2015). Randomization and data-analysis items in quality standards for single-case experimental studies. Journal of Special Education , 49 , 146–156.

Horner, R. H., Swaminathan, H., Sugai, G., & Smolkowski, K. (2012). Considerations for the systematic analysis and use of single-case research. Education & Treatment of Children , 35 , 269–290.

Huitema, B. E., & McKean, J. W. (2000). Design specification issues in time- series intervention models. Educational and Psychological Measurement , 60 , 38–58.

Joo, S.-H., Ferron, J. M., Beretvas, S. N., Moeyaert, M., & Van den Noortgate, W. (2017). The impact of response-guided baseline phase extensions on treatment effect estimates. Research in Developmental Disabilities . https://doi.org/10.1016/j.ridd.2017.12.018

Kazdin, A. E. (1980). Obstacles in using randomization tests in single-case experimentation. Journal of Educational Statistics , 5 , 253–260.

Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York, NY: Oxford University Press.

Keller, B. (2012). Detecting treatment effects with small samples: The power of some tests under the randomization model. Psychometrika , 2 , 324–338.

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. Retrieved from the What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods , 15 , 124–144. https://doi.org/10.1037/a0017736

Kratochwill, T. R., & Stoiber, K. C. (2000). Empirically supported interventions and school psychology: Conceptual and practical issues: Part II. School Psychology Quarterly , 15 , 233–253.

Leong, H. M., Carter, M., & Stephenson, J. (2015). Systematic review of sensory integration therapy for individuals with disabilities: Single case design studies. Research in Developmental Disabilities , 47 , 334–351.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2014). Improved randomization tests for a class of single-case intervention designs. Journal of Modern Applied Statistical Methods , 13 , 2–52.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2017). Additional comparisons of randomization-test procedures for single-case multiple-baseline designs: Alternative effect types. Journal of School Psychology , 63 , 13–34.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2018). Comparison of randomization-test procedures for single-case multiple-baseline designs. Developmental Neurorehabilitation , 21 , 290–311. https://doi.org/10.1080/17518423.2016.1197708

Levin, J. R., Ferron, J. M., & Kratochwill, T. R. (2012). Nonparametric statistical tests for single-case systematic and randomized ABAB … AB and alternating treatment intervention designs: New developments, new directions. Journal of School Psychology , 50 , 599–624.

Logan, L. R., Hickman, R. R., Harris, S. R., & Heriza, C. B. (2008). Single-subject research design: Recommendations for levels of evidence and quality rating. Developmental Medicine and Child Neurology , 50 , 99–103.

Ma, H. H. (2006). An alternative method for quantitative synthesis of single-subject research: Percentage of data points exceeding the median. Behavior Modification , 30 , 598–617.

Manolov, R., & Onghena, P. (2017). Analyzing data from single-case alternating treatments designs. Psychological Methods . Advance online publication. https://doi.org/10.1037/met0000133

Mansell, J. (1982). Repeated direct replication of AB designs. Journal of Behavior Therapy and Experimental Psychiatry , 13 , 261–262.

Michiels, B., Heyvaert, M., Meulders, A., & Onghena, P. (2017). Confidence intervals for single-case effect size measures based on randomization test inversion. Behavior Research Methods , 49 , 363–381. https://doi.org/10.3758/s13428-016-0714-4

Michiels, B., Heyvaert, M., & Onghena, P. (2018). The conditional power of randomization tests for single-case effect sizes in designs with randomized treatment order: A Monte Carlo simulation study. Behavior Research Methods , 50 , 557–575. https://doi.org/10.3758/s13428-017-0885-7

Michiels, B., & Onghena, P. (2018). Nonparametric meta-analysis for single-case research: Confidence intervals for combined effect sizes. Behavior Research Methods . https://doi.org/10.3758/s13428-018-1044-5

Onghena, P. (1992). Randomization tests for extensions and variations of ABAB single-case experimental designs: A rejoinder. Behavioral Assessment , 14 , 153–171.

Onghena, P. (2005). Single-case designs. In B. Everitt & D. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 4, pp. 1850–1854). Chichester, UK: Wiley.

Onghena, P., & Edgington, E. S. (1994). Randomization tests for restricted alternating treatments designs. Behaviour Research and Therapy , 32 , 783–786.

Onghena, P., & Edgington, E. S. (2005). Customization of pain treatments: Single-case design and analysis. Clinical Journal of Pain , 21 , 56–68.

Onghena, P., Vlaeyen, J. W. S., & de Jong, J. (2007). Randomized replicated single-case experiments: Treatment of pain-related fear by graded exposure in vivo. In S. Sawilowsky (Ed.), Real data analysis (pp. 387–396). Charlotte, NC: Information Age.

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: a review of nine nonoverlap techniques. Behavior Modification , 35 , 303–322.

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy , 42 , 284–299.

Rindskopf, D. (2014). Nonlinear Bayesian analysis for single case designs. Journal of School Psychology , 52 , 179–189.

Rindskopf, D., Shadish, W. R., & Hedges, L. V. (2012). A simple effect size estimator for single-case designs using WinBUGS. Washington DC: Society for Research on Educational Effectiveness.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders , 67 , 1–13.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin.

Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention, 2, 188–196.

Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods , 43 , 971–980. https://doi.org/10.3758/s13428-011-0111-y

Shadish, W. R., Zuur, A. F., & Sullivan, K. J. (2014). Using generalized additive (mixed) models to analyze single case designs. Journal of School Psychology , 52 , 149–178.

Shamseer, L., Sampson, M., Bukutu, C., Schmid, C. H., Nikles, J., Tate, R., … the CENT Group. (2015). CONSORT extension for reporting N-of-1 trials (CENT) 2015: Explanation and elaboration. British Medical Journal, 350, h1793.

Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods , 17 , 510–550. https://doi.org/10.1037/a0029312

Solanas, A., Manolov, R., & Onghena, P. (2010). Estimating slope and level change in N = 1 designs. Behavior Modification , 34 , 195–218.

Solomon, B. G. (2014). Violations of assumptions in school-based single-case data: Implications for the selection and interpretation of effect sizes. Behavior Modification , 38 , 477–496.

Swaminathan, H., & Rogers, H. J. (2007). Statistical reform in school psychology research: A synthesis. Psychology in the Schools , 44 , 543–549.

Swaminathan, H., Rogers, H. J., & Horner, R. H. (2014). An effect size measure and Bayesian analysis of single-case designs. Journal of School Psychology , 52 , 213–230.

Tate, R. L., Perdices, M., Rosenkoetter, U., Shadish, W., Vohra, S., Barlow, D. H., … Wilson, B. (2016). The Single-Case Reporting guideline In Behavioural interventions (SCRIBE) 2016 statement. Aphasiology, 30, 862–876.

Van den Noortgate, W., & Onghena, P. (2003). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers , 35 , 1–10. https://doi.org/10.3758/BF03195492

Vohra, S., Shamseer, L., Sampson, M., Bukutu, C., Schmid, C. H., Tate, R., … the CENT Group. (2015). CONSORT extension for reporting N-of-1 trials (CENT) 2015 Statement. British Medical Journal, 350, h1738.

Watson, P. J., & Workman, E. A. (1981). The non-concurrent multiple baseline across-individuals design: An extension of the traditional multiple baseline design. Journal of Behavior Therapy and Experimental Psychiatry , 12 , 257–259.

Ximenes, V. M., Manolov, R., Solanas, A., & Quera, V. (2009). Factors affecting visual inference in single-case designs. Spanish Journal of Psychology , 12 , 823–832.

Author note

This research was funded by the Research Foundation–Flanders (FWO), Belgium (Grant ID: G.0593.14). The authors assure that all research presented in this article is fully original and has not been presented or made available elsewhere in any form.

Author information

Authors and Affiliations

Faculty of Psychology and Educational Sciences, KU Leuven–University of Leuven, Leuven, Belgium

Bart Michiels & Patrick Onghena

Methodology of Educational Sciences Research Group, Tiensestraat 102, Box 3762, B-3000, Leuven, Belgium

Bart Michiels

Corresponding author

Correspondence to Bart Michiels.

Electronic supplementary material

Appendix: Descriptive results (means and standard deviations) of the main effects in the simulation study

About this article

Michiels, B., & Onghena, P. (2019). Randomized single-case AB phase designs: Prospects and pitfalls. Behavior Research Methods, 51, 2454–2476. https://doi.org/10.3758/s13428-018-1084-x

Published: 18 July 2018

Issue Date: December 2019

DOI: https://doi.org/10.3758/s13428-018-1084-x

Keywords: Single-case experimental design; Interrupted time series design; Linear trend; Randomization test; Power analysis

Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards

Justin D. Smith

Child and Family Center, University of Oregon

This article systematically reviews the research design and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010. SCEDs provide researchers with a flexible and viable alternative to group designs with large sample sizes. However, methodological challenges have precluded widespread implementation and acceptance of the SCED as a viable complementary methodology to the predominant group design. This article includes a description of the research design, measurement, and analysis domains distinctive to the SCED; a discussion of the results within the framework of contemporary standards and guidelines in the field; and a presentation of updated benchmarks for key characteristics (e.g., baseline sampling, method of analysis), and overall, it provides researchers and reviewers with a resource for conducting and evaluating SCED research. The results of the systematic review of 409 studies suggest that recently published SCED research is largely in accordance with contemporary criteria for experimental quality. Analytic method emerged as an area of discord. Comparison of the findings of this review with historical estimates of the use of statistical analysis indicates an upward trend, but visual analysis remains the most common analytic method and also garners the most support amongst those entities providing SCED standards. Although consensus exists along key dimensions of single-case research design and researchers appear to be practicing within these parameters, there remains a need for further evaluation of assessment and sampling techniques and data analytic methods.

The single-case experiment has a storied history in psychology dating back to the field’s founders: Fechner (1889) , Watson (1925) , and Skinner (1938) . It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see Morgan & Morgan, 2001 ). In recent years the single-case experimental design (SCED) has been represented in the literature more often than in past decades, as is evidenced by recent reviews ( Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ), but it still languishes behind the more prominent group design in nearly all subfields of psychology. Group designs are often professed to be superior because they minimize, although do not necessarily eliminate, the major internal validity threats to drawing scientifically valid inferences from the results ( Shadish, Cook, & Campbell, 2002 ). SCEDs provide a rigorous, methodologically sound alternative method of evaluation (e.g., Barlow, Nock, & Hersen, 2008 ; Horner et al., 2005 ; Kazdin, 2010 ; Kratochwill & Levin, 2010 ; Shadish et al., 2002 ) but are often overlooked as a true experimental methodology capable of eliciting legitimate inferences (e.g., Barlow et al., 2008 ; Kazdin, 2010 ). Despite a shift in the zeitgeist from single-case experiments to group designs more than a half century ago, recent and rapid methodological advancements suggest that SCEDs are poised for resurgence.

Single case refers to the participant or cluster of participants (e.g., a classroom, hospital, or neighborhood) under investigation. In contrast to an experimental group design in which one group is compared with another, participants in a single-subject experiment provide their own control data for the purpose of comparison in a within-subject rather than a between-subjects design. SCEDs typically involve a comparison between two experimental time periods, known as phases. This approach usually includes collecting a representative baseline phase to serve as a comparison with subsequent phases. In studies examining single subjects that are actually groups (i.e., classroom, school), there are additional threats to the internal validity of the results, as noted by Kratochwill and Levin (2010), which include setting or site effects.

The central goal of the SCED is to determine whether a causal or functional relationship exists between a researcher-manipulated independent variable (IV) and a meaningful change in the dependent variable (DV). SCEDs generally involve repeated, systematic assessment of one or more IVs and DVs over time. The DV is measured repeatedly across and within all conditions or phases of the IV. Experimental control in SCEDs includes replication of the effect either within or between participants ( Horner et al., 2005 ). Randomization is another way in which threats to internal validity can be experimentally controlled. Kratochwill and Levin (2010) recently provided multiple suggestions for adding a randomization component to SCEDs to improve the methodological rigor and internal validity of the findings.

Examination of the effectiveness of interventions is perhaps the area in which SCEDs are most well represented ( Morgan & Morgan, 2001 ). Researchers in behavioral medicine and in clinical, health, educational, school, sport, rehabilitation, and counseling psychology often use SCEDs because they are particularly well suited to examining the processes and outcomes of psychological and behavioral interventions (e.g., Borckardt et al., 2008 ; Kazdin, 2010 ; Robey, Schultz, Crawford, & Sinner, 1999 ). Skepticism about the clinical utility of the randomized controlled trial (e.g., Jacobsen & Christensen, 1996 ; Wachtel, 2010 ; Westen & Bradley, 2005 ; Westen, Novotny, & Thompson-Brenner, 2004 ) has renewed researchers’ interest in SCEDs as a means to assess intervention outcomes (e.g., Borckardt et al., 2008 ; Dattilio, Edwards, & Fishman, 2010 ; Horner et al., 2005 ; Kratochwill, 2007 ; Kratochwill & Levin, 2010 ). Although SCEDs are relatively well represented in the intervention literature, it is by no means their sole home: Examples appear in nearly every subfield of psychology (e.g., Bolger, Davis, & Rafaeli, 2003 ; Piasecki, Hufford, Solham, & Trull, 2007 ; Reis & Gable, 2000 ; Shiffman, Stone, & Hufford, 2008 ; Soliday, Moore, & Lande, 2002 ). Aside from the current preference for group-based research designs, several methodological challenges have repressed the proliferation of the SCED.

Methodological Complexity

SCEDs undeniably present researchers with a complex array of methodological and research design challenges, such as establishing a representative baseline, managing the nonindependence of sequential observations (i.e., autocorrelation, serial dependence), interpreting single-subject effect sizes, analyzing the short data streams seen in many applications, and appropriately addressing the matter of missing observations. In the field of intervention research for example, Hser et al. (2001) noted that studies using SCEDs are “rare” because of the minimum number of observations that are necessary (e.g., 3–5 data points in each phase) and the complexity of available data analysis approaches. Advances in longitudinal person-based trajectory analysis (e.g., Nagin, 1999 ), structural equation modeling techniques (e.g., Lubke & Muthén, 2005 ), time-series forecasting (e.g., autoregressive integrated moving averages; Box & Jenkins, 1970 ), and statistical programs designed specifically for SCEDs (e.g., Simulation Modeling Analysis; Borckardt, 2006 ) have provided researchers with robust means of analysis, but they might not be feasible methods for the average psychological scientist.
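
As a small illustration of one of these challenges, the sketch below estimates the lag-1 autocorrelation of a short, hypothetical single-case data stream. It is only a minimal diagnostic for serial dependence, not a substitute for the modeling approaches cited above.

```python
import numpy as np

def lag1_autocorrelation(y):
    """Lag-1 autocorrelation: the correlation between a data stream and
    itself shifted by one observation (serial dependence check)."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    return np.sum(centered[1:] * centered[:-1]) / np.sum(centered ** 2)

# Hypothetical short single-case data stream
print(lag1_autocorrelation([3, 4, 4, 5, 6, 6, 7, 8, 8, 9]))
```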

Application of the SCED has also expanded. Today, researchers use variants of the SCED to examine complex psychological processes and the relationship between daily and momentary events in peoples’ lives and their psychological correlates. Research in nearly all subfields of psychology has begun to use daily diary and ecological momentary assessment (EMA) methods in the context of the SCED, opening the door to understanding increasingly complex psychological phenomena (see Bolger et al., 2003 ; Shiffman et al., 2008 ). In contrast to the carefully controlled laboratory experiment that dominated research in the first half of the twentieth century (e.g., Skinner, 1938 ; Watson, 1925 ), contemporary proponents advocate application of the SCED in naturalistic studies to increase the ecological validity of empirical findings (e.g., Bloom, Fisher, & Orme, 2003 ; Borckardt et al., 2008 ; Dattilio et al., 2010 ; Jacobsen & Christensen, 1996 ; Kazdin, 2008 ; Morgan & Morgan, 2001 ; Westen & Bradley, 2005 ; Westen et al., 2004 ). Recent advancements and expanded application of SCEDs indicate a need for updated design and reporting standards.
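
A minimal sketch of a signal-contingent EMA sampling schedule is shown below; the waking-day window and the number of prompts are arbitrary illustrative choices rather than recommended settings.

```python
import random
from datetime import datetime, timedelta

def ema_prompt_times(day_start="09:00", day_end="21:00", n_prompts=6, seed=None):
    """Draw random signal-contingent EMA prompt times within a waking-day
    window (window and prompt count are illustrative)."""
    rng = random.Random(seed)
    start = datetime.strptime(day_start, "%H:%M")
    end = datetime.strptime(day_end, "%H:%M")
    window_seconds = (end - start).total_seconds()
    offsets = sorted(rng.uniform(0, window_seconds) for _ in range(n_prompts))
    return [(start + timedelta(seconds=s)).strftime("%H:%M") for s in offsets]

print(ema_prompt_times(seed=3))
```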

Many current benchmarks in the literature concerning key parameters of the SCED were established well before recent advancements and innovations, such as the suggested minimum number of data points in the baseline phase(s), which remains a disputed area of SCED research (e.g., Center, Skiba, & Casey, 1986; Huitema, 1985; R. R. Jones, Vaught, & Weinrott, 1977; Sharpley, 1987). This article comprises (a) an examination of contemporary SCED methodological and reporting standards; (b) a systematic review of select design, measurement, and statistical characteristics of published SCED research during the past decade; and (c) a broad discussion of the critical aspects of this research to inform methodological improvements and study reporting standards. The reader will garner a fundamental understanding of what constitutes appropriate methodological soundness in single-case experimental research according to the established standards in the field, which can be used to guide the design of future studies, improve the presentation of publishable empirical findings, and inform the peer-review process. The discussion begins with the basic characteristics of the SCED, including an introduction to time-series, daily diary, and EMA strategies, and describes how current reporting and design standards apply to each of these areas of single-case research. Interwoven within this presentation are the results of a systematic review of SCED research published between 2000 and 2010 in peer-reviewed outlets and a discussion of the way in which these findings support, or differ from, existing design and reporting standards and published SCED benchmarks.

Review of Current SCED Guidelines and Reporting Standards

In contrast to experimental group comparison studies, which conform to generally well agreed upon methodological design and reporting guidelines, such as the CONSORT ( Moher, Schulz, Altman, & the CONSORT Group, 2001 ) and TREND ( Des Jarlais, Lyles, & Crepaz, 2004 ) statements for randomized and nonrandomized trials, respectively, there is comparatively much less consensus when it comes to the SCED. Until fairly recently, design and reporting guidelines for single-case experiments were almost entirely absent in the literature and were typically determined by the preferences of a research subspecialty or a particular journal’s editorial board. Factions still exist within the larger field of psychology, as can be seen in the collection of standards presented in this article, particularly in regard to data analytic methods of SCEDs, but fortunately there is budding agreement about certain design and measurement characteristics. A number of task forces, professional groups, and independent experts in the field have recently put forth guidelines; each has a relatively distinct purpose, which likely accounts for some of the discrepancies between them. In what is to be a central theme of this article, researchers are ultimately responsible for thoughtfully and synergistically combining research design, measurement, and analysis aspects of a study.

This review presents the more prominent, comprehensive, and recently established SCED standards. Six sources are discussed: (1) Single-Case Design Technical Documentation from the What Works Clearinghouse (WWC; Kratochwill et al., 2010 ); (2) the APA Division 12 Task Force on Psychological Interventions, with contributions from the Division 12 Task Force on Promotion and Dissemination of Psychological Procedures and the APA Task Force for Psychological Intervention Guidelines (DIV12; presented in Chambless & Hollon, 1998 ; Chambless & Ollendick, 2001 ), adopted and expanded by APA Division 53, the Society for Clinical Child and Adolescent Psychology ( Weisz & Hawley, 1998 , 1999 ); (3) the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16; Members of the Task Force on Evidence-Based Interventions in School Psychology. Chair: T. R. Kratochwill, 2003); (4) the National Reading Panel (NRP; National Institute of Child Health and Human Development, 2000 ); (5) the Single-Case Experimental Design Scale ( Tate et al., 2008 ); and (6) the reporting guidelines for EMA put forth by Stone & Shiffman (2002) . Although the specific purposes of each source differ somewhat, the overall aim is to provide researchers and reviewers with agreed-upon criteria to be used in the conduct and evaluation of SCED research. The standards provided by WWC, DIV12, DIV16, and the NRP represent the efforts of task forces. The Tate et al. scale was selected for inclusion in this review because it represents perhaps the only psychometrically validated tool for assessing the rigor of SCED methodology. Stone and Shiffman’s (2002) standards were intended specifically for EMA methods, but many of their criteria also apply to time-series, daily diary, and other repeated-measurement and sampling methods, making them pertinent to this article. The design, measurement, and analysis standards are presented in the later sections of this article and notable concurrences, discrepancies, strengths, and deficiencies are summarized.

Systematic Review Search Procedures and Selection Criteria

Search strategy.

A comprehensive search strategy of SCEDs was performed to identify studies published in peer-reviewed journals meeting a priori search and inclusion criteria. First, a computer-based PsycINFO search of articles published between 2000 and 2010 (search conducted in July 2011) was conducted that used the following primary key terms and phrases that appeared anywhere in the article (asterisks denote that any characters/letters can follow the last character of the search term): alternating treatment design, changing criterion design, experimental case*, multiple baseline design, replicated single-case design, simultaneous treatment design, time-series design. The search was limited to studies published in the English language and those appearing in peer-reviewed journals within the specified publication year range. Additional limiters of the type of article were also used in PsycINFO to increase specificity: The search was limited to include methodologies indexed as either quantitative study OR treatment outcome/randomized clinical trial and NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study.

Study selection

The author used a three-phase study selection, screening, and coding procedure to select the highest number of applicable studies. Phase 1 consisted of the initial systematic review conducted using PsycINFO, which resulted in 571 articles. In Phase 2, titles and abstracts were screened: Articles appearing to use a SCED were retained (451) for Phase 3, in which the author and a trained research assistant read each full-text article and entered the characteristics of interest into a database. At each phase of the screening process, studies that did not use a SCED or that either self-identified as, or were determined to be, quasi-experimental were dropped. Of the 571 original studies, 82 studies were determined to be quasi-experimental. The definition of a quasi-experimental design used in the screening procedure conforms to the descriptions provided by Kazdin (2010) and Shadish et al. (2002) regarding the necessary components of an experimental design. For example, reversal designs require a minimum of four phases (e.g., ABAB), and multiple baseline designs must demonstrate replication of the effect across at least three conditions (e.g., subjects, settings, behaviors). Sixteen studies were unavailable in full text in English, and five could not be obtained in full text and were thus dropped. The remaining articles that were not retained for review (59) were determined not to be SCED studies meeting our inclusion criteria, but had been identified in our PsycINFO search using the specified keyword and methodology terms. For this review, 409 studies were selected. The sources of the 409 reviewed studies are summarized in Table 1 . A complete bibliography of the 571 studies appearing in the initial search, with the included studies marked, is available online as an Appendix or from the author.

Journal Sources of Studies Included in the Systematic Review (N = 409)

[Table 1: per-journal study counts for the most frequently represented journals (45, 15, 14, 14, 13, 12, 12, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 6, 6, 5, 5, 4, 4, 4, 4); the corresponding journal-name column is not recoverable in this version. Journals contributing one to three studies each are listed in the note below.]
Note: Each of the following journal titles contributed 1 study unless otherwise noted in parentheses: Augmentative and Alternative Communication; Acta Colombiana de Psicología; Acta Comportamentalia; Adapted Physical Activity Quarterly (2); Addiction Research and Theory; Advances in Speech Language Pathology; American Annals of the Deaf; American Journal of Education; American Journal of Occupational Therapy; American Journal of Speech-Language Pathology; The American Journal on Addictions; American Journal on Mental Retardation; Applied Ergonomics; Applied Psychophysiology and Biofeedback; Australian Journal of Guidance & Counseling; Australian Psychologist; Autism; The Behavior Analyst; The Behavior Analyst Today; Behavior Analysis in Practice (2); Behavior and Social Issues (2); Behaviour Change (2); Behavioural and Cognitive Psychotherapy; Behaviour Research and Therapy (3); Brain and Language (2); Brain Injury (2); Canadian Journal of Occupational Therapy (2); Canadian Journal of School Psychology; Career Development for Exceptional Individuals; Chinese Mental Health Journal; Clinical Linguistics and Phonetics; Clinical Psychology & Psychotherapy; Cognitive and Behavioral Practice; Cognitive Computation; Cognitive Therapy and Research; Communication Disorders Quarterly; Developmental Medicine & Child Neurology (2); Developmental Neurorehabilitation (2); Disability and Rehabilitation: An International, Multidisciplinary Journal (3); Disability and Rehabilitation: Assistive Technology; Down Syndrome: Research & Practice; Drug and Alcohol Dependence (2); Early Childhood Education Journal (2); Early Childhood Services: An Interdisciplinary Journal of Effectiveness; Educational Psychology (2); Education and Training in Autism and Developmental Disabilities; Electronic Journal of Research in Educational Psychology; Environment and Behavior (2); European Eating Disorders Review; European Journal of Sport Science; European Review of Applied Psychology; Exceptional Children; Exceptionality; Experimental and Clinical Psychopharmacology; Family & Community Health: The Journal of Health Promotion & Maintenance; Headache: The Journal of Head and Face Pain; International Journal of Behavioral Consultation and Therapy (2); International Journal of Disability; Development and Education (2); International Journal of Drug Policy; International Journal of Psychology; International Journal of Speech-Language Pathology; International Psychogeriatrics; Japanese Journal of Behavior Analysis (3); Japanese Journal of Special Education; Journal of Applied Research in Intellectual Disabilities (2); Journal of Applied Sport Psychology (3); Journal of Attention Disorders (2); Journal of Behavior Therapy and Experimental Psychiatry; Journal of Child Psychology and Psychiatry; Journal of Clinical Psychology in Medical Settings; Journal of Clinical Sport Psychology; Journal of Cognitive Psychotherapy; Journal of Consulting and Clinical Psychology (2); Journal of Deaf Studies and Deaf Education; Journal of Educational & Psychological Consultation (2); Journal of Evidence-Based Practices for Schools (2); Journal of the Experimental Analysis of Behavior (2); Journal of General Internal Medicine; Journal of Intellectual and Developmental Disabilities; Journal of Intellectual Disability Research (2); Journal of Medical Speech-Language Pathology; Journal of Neurology, Neurosurgery & Psychiatry; Journal of Paediatrics and Child Health; Journal of Prevention and Intervention in the Community; Journal of Safety Research; 
Journal of School Psychology (3); The Journal of Socio-Economics; The Journal of Special Education; Journal of Speech, Language, and Hearing Research (2); Journal of Sport Behavior; Journal of Substance Abuse Treatment; Journal of the International Neuropsychological Society; Journal of Traumatic Stress; The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences; Language, Speech, and Hearing Services in Schools; Learning Disabilities Research & Practice (2); Learning Disability Quarterly (2); Music Therapy Perspectives; Neurorehabilitation and Neural Repair; Neuropsychological Rehabilitation (2); Pain; Physical Education and Sport Pedagogy (2); Preventive Medicine: An International Journal Devoted to Practice and Theory; Psychological Assessment; Psychological Medicine: A Journal of Research in Psychiatry and the Allied Sciences; The Psychological Record; Reading and Writing; Remedial and Special Education (3); Research and Practice for Persons with Severe Disabilities (2); Restorative Neurology and Neuroscience; School Psychology International; Seminars in Speech and Language; Sleep and Hypnosis; School Psychology Quarterly; Social Work in Health Care; The Sport Psychologist (3); Therapeutic Recreation Journal (2); The Volta Review; Work: Journal of Prevention, Assessment & Rehabilitation.

Coding criteria amplifications

A comprehensive description of the coding criteria for each category in this review is available from the author by request. The primary coding criteria are described here and in later sections of this article.

  • Research design was classified into one of the types discussed later in the section titled Predominant Single-Case Experimental Designs on the basis of the authors’ stated design type. Secondary research designs were then coded when applicable (i.e., mixed designs). Distinctions between primary and secondary research designs were made based on the authors’ description of their study. For example, if an author described the study as a “multiple baseline design with time-series measurement,” the primary research design would be coded as being multiple baseline, and time-series would be coded as the secondary research design.
  • Observer ratings were coded as present when observational coding procedures were described and/or the results of a test of interobserver agreement were reported.
  • Interrater reliability for observer ratings was coded as present in any case in which percent agreement, alpha, kappa, or another appropriate statistic was reported, regardless of the amount of the total data that were examined for agreement.
  • Daily diary, daily self-report, and EMA codes were given when authors explicitly described these procedures in the text by name. Coders did not infer the use of these measurement strategies.
  • The number of baseline observations was either taken directly from the figures provided in text or was counted in graphical displays of the data when this was determined to be a reliable approach. In some cases, it was not possible to reliably determine the number of baseline data points from the graphical display of data, in which case the “unavailable” code was assigned. The “unavailable” code was also assigned when the number of observations was unreported or ambiguous, or when only a range was provided and thus no mean could be determined. Finally, the mean number of baseline observations was calculated for each study prior to further descriptive statistical analyses because a number of studies reported means only.
  • The coding of the analytic method used in the reviewed studies is discussed later in the section titled Discussion of Review Results and Coding of Analytic Methods .

Results of the Systematic Review

Descriptive statistics of the design, measurement, and analysis characteristics of the reviewed studies are presented in Table 2 . The results and their implications are discussed in the relevant sections throughout the remainder of the article.

Descriptive Statistics of Reviewed SCED Characteristics

Research design | n | Subjects: M (SD), range | Observer ratings (%) | IRR (%) | Diary/EMA (%) | Baseline observations: M (SD), range | Visual (%) | Statistical (%) | Visual & statistical (%) | Not reported (%)
Alternating condition | 26 | 4.77 (3.34), 1–17 | 84.6 | 95.5 | 3.8 | 8.44 (9.50), 2–39 | 23.1 | 7.7 | 19.2 | 46.2
Changing/shifting criterion | 18 | 1.94 (1.06), 1–4 | 77.8 | 85.7 | 0.0 | 5.29 (2.93), 2–10 | 27.8 | — | — | —
Multiple baseline/combined series | 283 | 7.29 (18.08), 1–200 | 75.6 | 98.1 | 7.1 | 10.40 (8.84), 2–89 | 21.6 | 13.4 | 6.4 | 55.8
Reversal | 70 | 6.64 (10.64), 1–75 | 78.6 | 100.0 | 4.3 | 11.69 (13.78), 1–72 | 17.1 | 12.9 | 5.7 | 62.9
Simultaneous condition | 2 | 8 (—), — | 50.0 | 100.0 | 0.0 | 2.00 (—), — | 50.0 | 50.0 | 0.0 | 0.0
Time-series | 10 | 26.78 (35.43), 2–114 | 50.0 | 40.0 | 10.0 | 6.21 (2.59), 3–10 | 0.0 | 70.0 | 30.0 | 0.0
Mixed: multiple baseline with reversal | 12 | 6.89 (8.24), 1–32 | 92.9 | 100.0 | 7.1 | 13.01 (9.59), 3–33 | 14.3 | 21.4 | 0.0 | 64.3
Mixed: multiple baseline with changing criterion | 6 | 3.17 (1.33), 1–5 | 83.3 | 80.0 | 16.7 | 11.00 (9.61), 5–30 | — | — | — | —
Mixed: multiple baseline with time-series | 6 | 5.00 (1.79), 3–8 | 16.7 | 100.0 | 50.0 | 17.30 (15.68), 4–42 | 0.0 | 66.7 | 16.7 | 16.7
Total of reviewed studies | 409 | 6.63 (14.61), 1–200 | 76.0 | 97.1 | 6.1 | 10.22 (9.59), 1–89 | 20.8 | 13.9 | 7.3 | 52.3

Note. % refers to the proportion of reviewed studies that satisfied criteria for this code: For example, the percent of studies reporting observer ratings.

Discussion of the Systematic Review Results in Context

The SCED is a very flexible methodology and has many variants. Those mentioned here are the building blocks from which other designs are then derived. For those readers interested in the nuances of each design, Barlow et al., (2008) ; Franklin, Allison, and Gorman (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992) , among others, provide cogent, in-depth discussions. Identifying the appropriate SCED depends upon many factors, including the specifics of the IV, the setting in which the study will be conducted, participant characteristics, the desired or hypothesized outcomes, and the research question(s). Similarly, the researcher’s selection of measurement and analysis techniques is determined by these factors.

Predominant Single-Case Experimental Designs

Alternating/simultaneous designs (6%; primary design of the studies reviewed).

Alternating and simultaneous designs involve an iterative manipulation of the IV(s) across different phases to show that changes in the DV vary systematically as a function of manipulating the IV(s). In these multielement designs, the researcher has the option to alternate the introduction of two or more IVs or present two or more IVs at the same time. In the alternating variation, the researcher is able to determine the relative impact of two different IVs on the DV, when all other conditions are held constant. Another variation of this design is to alternate IVs across various conditions that could be related to the DV (e.g., class period, interventionist). Similarly, the simultaneous design would occur when the IVs were presented at the same time within the same phase of the study.

Changing criterion design (4%)

Changing criterion designs are used to demonstrate a gradual change in the DV over the course of the phase involving the active manipulation of the IV. The criterion for demonstrating change shifts in a stepwise manner, with each new criterion set as the participant responds to the presence of the manipulated IV (see the toy sketch below). The changing criterion design is particularly useful in applied intervention research for a number of reasons. The IV is continuous and never withdrawn, unlike the strategy used in a reversal design. This is particularly important in situations where removal of a psychological intervention would be either detrimental or dangerous to the participant, or would be otherwise unfeasible or unethical. The multiple baseline design also does not withdraw intervention, but it requires replicating the effects of the intervention across participants, settings, or situations. A changing criterion design can be accomplished with one participant in one setting without withholding or withdrawing treatment.
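
A toy sketch of the stepwise logic is shown below; the criteria and subphase means are hypothetical.

```python
# Hypothetical changing criterion data: the performance criterion is raised
# in steps, and the DV is expected to track each new criterion.
criteria = [5, 8, 11, 14]                       # illustrative stepwise criteria
subphase_means = {5: 5.4, 8: 8.2, 11: 11.6, 14: 14.1}   # mean DV per subphase

for criterion in criteria:
    met = subphase_means[criterion] >= criterion
    print(f"criterion {criterion}: mean responding {subphase_means[criterion]} "
          f"-> {'met' if met else 'not met'}")
```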

Multiple baseline/combined series design (69%)

The multiple baseline or combined series design can be used to test within-subject change across conditions and often involves multiple participants in a replication context. The multiple baseline design is quite simple in many ways, essentially consisting of a number of repeated, miniature AB experiments or variations thereof. Introduction of the IV is staggered temporally across multiple participants or across multiple within-subject conditions, which allows the researcher to demonstrate that changes in the DV reliably occur only when the IV is introduced, thus controlling for the effects of extraneous factors. Multiple baseline designs can be used both within and across units (i.e., persons or groups of persons). When the baseline phase of each subject begins simultaneously, it is called a concurrent multiple baseline design. In a nonconcurrent variation, baseline periods across subjects begin at different points in time. The multiple baseline design is useful in many settings in which withdrawal of the IV would not be appropriate or when introduction of the IV is hypothesized to result in permanent change that would not reverse when the IV is withdrawn. The major drawback of this design is that the IV must be initially withheld for a period of time to ensure different starting points across the different units in the baseline phase. Depending upon the nature of the research questions, withholding an IV, such as a treatment, could be potentially detrimental to participants.
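
The staggering logic can be illustrated with simulated data; the start points, effect size, and number of participants below are hypothetical.

```python
import numpy as np

# Hypothetical multiple baseline data: the IV is introduced at staggered
# points across three participants; each shows a level change in the DV
# only after their own start point.
starts = {"participant_1": 5, "participant_2": 9, "participant_3": 13}
rng = np.random.default_rng(0)

for name, start in starts.items():
    y = rng.normal(loc=2.0, scale=0.5, size=18)
    y[start:] += 3.0                      # change in the DV only after the IV begins
    print(f"{name}: baseline mean = {y[:start].mean():.2f}, "
          f"intervention mean = {y[start:].mean():.2f}")
```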

Reversal designs (17%)

Reversal designs are also known as introduction and withdrawal designs and are denoted as ABAB designs in their simplest form. As the name suggests, the reversal design involves collecting a baseline measure of the DV (the first A phase), introducing the IV (the first B phase), removing the IV while continuing to assess the DV (the second A phase), and then reintroducing the IV (the second B phase). This pattern can be repeated as many times as is necessary to demonstrate an effect or otherwise address the research question. Reversal designs are useful when the manipulation is hypothesized to result in changes in the DV that are expected to reverse or discontinue when the manipulation is not present. Maintenance of an effect is often necessary to uphold the findings of reversal designs. The demonstration of an effect is evident in reversal designs when improvement occurs during the first manipulation phase, compared to the first baseline phase, then reverts to or approaches original baseline levels during the second baseline phase when the manipulation has been withdrawn, and then improves again when the manipulation is reinstated. This pattern of reversal, when the manipulation is introduced and then withdrawn, is essential to attributing changes in the DV to the IV. However, demonstrating maintenance of an effect is not incompatible with a reversal design, even though the DV is hypothesized to reverse when the IV is withdrawn (Kazdin, 2010). Maintenance is demonstrated by repeating introduction–withdrawal segments until improvement in the DV becomes permanent even when the IV is withdrawn. There is not always a need to demonstrate maintenance in all applications, nor is it always possible or desirable, but it is paramount in learning and intervention research contexts.
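
A minimal sketch of the expected ABAB pattern, using hypothetical data and simple phase means, is shown below.

```python
import numpy as np

def phase_means(y, phase_labels):
    """Mean of the DV within each phase of an ABAB (reversal) series."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(phase_labels)
    return {phase: y[labels == phase].mean() for phase in ("A1", "B1", "A2", "B2")}

# Hypothetical ABAB series: the DV improves when the IV is introduced and
# reverts toward baseline levels when it is withdrawn.
y      = [2, 3, 2, 3,  7, 8, 7, 8,  3, 2, 3, 2,  8, 7, 8, 8]
phases = ["A1"] * 4 + ["B1"] * 4 + ["A2"] * 4 + ["B2"] * 4
print(phase_means(y, phases))
```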

Mixed designs (10%)

Mixed designs include a combination of more than one SCED (e.g., a reversal design embedded within a multiple baseline) or an SCED embedded within a group design (i.e., a randomized controlled trial comparing two groups of multiple baseline experiments). Mixed designs afford the researcher even greater flexibility in designing a study to address complex psychological hypotheses, but also capitalize on the strengths of the various designs. See Kazdin (2010) for a discussion of the variations and utility of mixed designs.

Related Nonexperimental Designs

Quasi-experimental designs.

In contrast to the designs previously described, all of which constitute “true experiments” ( Kazdin, 2010 ; Shadish et al., 2002 ), in quasi-experimental designs the conditions of a true experiment (e.g., active manipulation of the IV, replication of the effect) are approximated and are not readily under the control of the researcher. Because the focus of this article is on experimental designs, quasi-experiments are not discussed in detail; instead the reader is referred to Kazdin (2010) and Shadish et al. (2002) .

Ecological and naturalistic single-case designs

For a single-case design to be experimental, there must be active manipulation of the IV, but in some applications, such as those that might be used in social and personality psychology, the researcher might be interested in measuring naturally occurring phenomena and examining their temporal relationships. Thus, the researcher will not use a manipulation. An example of this type of research might be a study about the temporal relationship between alcohol consumption and depressed mood, which can be measured reliably using EMA methods. Psychotherapy process researchers also use this type of design to assess dyadic relationship dynamics between therapists and clients (e.g., Tschacher & Ramseyer, 2009 ).

Research Design Standards

Each of the reviewed standards provides some degree of direction regarding acceptable research designs. The WWC provides the most detailed and specific requirements regarding design characteristics. The guidelines presented in Tables 3, 4, and 5 are consistent with the methodological rigor necessary to meet the WWC distinction “meets standards.” The WWC also provides less-stringent standards for a “meets standards with reservations” distinction. When minimum criteria in the design, measurement, or analysis sections of a study are not met, it is rated “does not meet standards” (Kratochwill et al., 2010). Many SCEDs are acceptable within the standards of DIV12, DIV16, NRP, and the Tate et al. SCED scale. DIV12 specifies that replication occurs across a minimum of three successive cases, which differs from the WWC specifications, which allow for three replications within a single-subject design that need not be across multiple subjects. DIV16 does not require, but seems to prefer, a multiple baseline design with a between-subject replication. Tate et al. state that the “design allows for the examination of cause and effect relationships to demonstrate efficacy” (p. 400, 2008). Determining whether or not a design meets this requirement is left up to the evaluator, who might then refer to one of the other standards or another source for direction.

Research Design Standards and Guidelines

Sources compared: the What Works Clearinghouse (WWC); the APA Division 12 Task Force on Psychological Interventions (DIV12); the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16); the National Reading Panel (NRP); the Single-Case Experimental Design Scale (Tate et al., 2008); and the Ecological Momentary Assessment reporting guidelines (EMA; Stone & Shiffman, 2002).

1. Experimental manipulation (independent variable; IV). WWC: the independent variable (i.e., the intervention) must be systematically manipulated as determined by the researcher. DIV12: a well-defined and replicable intervention for a specific disorder, problem behavior, or condition is needed. DIV16: specified intervention according to the classification system. NRP: specified intervention. Tate et al. scale: the scale was designed to assess the quality of interventions; thus, an intervention is required. EMA: manipulation in EMA is concerned with the sampling procedure of the study (see the Measurement and Assessment table for more information).

2. Research designs.

General guidelines. WWC: at least 3 attempts to demonstrate an effect at 3 different points in time or with 3 different phase repetitions. DIV12: many research designs are acceptable beyond those mentioned. DIV16: the stage of the intervention program must be specified. Tate et al. scale: the design allows for the examination of cause and effect to demonstrate efficacy. EMA: EMA is almost entirely concerned with measurement of the variables of interest; thus, the design of the study is determined solely by the research question(s).

Reversal (e.g., ABAB). WWC: minimum of 4 A and B phases. DIV12: mentioned as acceptable (see the Analysis table for specific guidelines). DIV16: mentioned as acceptable. NRP: N/A. Tate et al. scale: mentioned as acceptable. EMA: N/A.

Multiple baseline/combined series. WWC: at least 3 baseline conditions. DIV12: at least 3 different, successive subjects. DIV16: both within and between subjects; considered the strongest because replication occurs across individuals. NRP: single-subject or aggregated subjects. Tate et al. scale: mentioned as acceptable. EMA: N/A.

Alternating treatment. WWC: at least 3 alternating treatments compared with a baseline condition, or two alternating treatments compared with each other. DIV12: N/A. DIV16: mentioned as acceptable. NRP: N/A. Tate et al. scale: mentioned as acceptable. EMA: N/A.

Simultaneous treatment. WWC: same as for alternating treatment designs. DIV12: N/A. DIV16: mentioned as acceptable. NRP: N/A. Tate et al. scale: mentioned as acceptable. EMA: N/A.

Changing/shifting criterion. WWC: at least 3 different criteria. All other sources: N/A.

Mixed designs. DIV16: mentioned as acceptable. All other sources: N/A.

Quasi-experimental. NRP: mentioned as acceptable. All other sources: N/A.

3. Baseline (see also the Measurement and Assessment Standards). WWC: minimum of 3 data points. DIV12: minimum of 3 data points. DIV16: minimum of 3 data points, although more observations are preferred. NRP: no minimum specified. Tate et al. scale: no minimum ("sufficient sampling of behavior occurred pretreatment"). EMA: N/A.

4. Randomization specifications provided. DIV16: yes. NRP: yes. All other sources: N/A.

Measurement and Assessment Standards and Guidelines

Sources compared: WWC; DIV12; DIV16; NRP; the Single-Case Experimental Design Scale (Tate et al., 2008); and the EMA reporting guidelines (Stone & Shiffman, 2002).

1. Dependent variable (DV).

Selection of the DV. WWC: N/A. DIV12: ≥ 3 clinically important behaviors that are relatively independent. DIV16: outcome measures that produce reliable scores (validity of the measure reported). NRP: standardized or investigator-constructed outcome measures (report reliability). Tate et al. scale: measure behaviors that are the target of the intervention. EMA: determined by the research question(s).

Assessor(s)/reporter(s). WWC: more than one (self-report not acceptable). DIV12: N/A. DIV16: multisource (not always applicable). NRP: N/A. Tate et al. scale: independent (implied minimum of 2). EMA: determined by the research question(s).

Interrater reliability. WWC: assessed on at least 20% of the data in each phase and in each condition, and must meet minimal established thresholds. DIV12: N/A. DIV16: N/A. NRP: N/A. Tate et al. scale: interrater reliability is reported. EMA: N/A.

Method(s) of measurement/assessment. WWC: N/A. DIV12: N/A. DIV16: multimethod (e.g., at least 2 assessment methods to evaluate primary outcomes; not always applicable). NRP: quantitative or qualitative measure. Tate et al. scale: N/A. EMA: description of prompting, recording, participant-initiated entries, and the data acquisition interface (e.g., diary).

Interval of assessment. WWC: must be measured repeatedly over time (no minimum specified) within and across different conditions and levels of the IV. DIV12: N/A. DIV16: N/A. NRP: list time points when dependent measures were assessed. Tate et al. scale: sampling of the targeted behavior (i.e., the DV) occurs during the treatment period. EMA: density and schedule of assessment are reported and consistent with addressing the research question(s); define "immediate and timely response."

Other guidelines. Tate et al. scale: raw data record provided (to represent the variability of the target behavior).

2. Baseline measurement (see also the Research Design Standards). WWC: minimum of 3 data points across multiple phases of a reversal or multiple baseline design, with 5 data points in each phase for the highest rating; 1 or 2 data points can be sufficient in alternating treatment designs. DIV12 and DIV16: minimum of 3 data points (to establish a linear trend). NRP: no minimum specified. Tate et al. scale: no minimum ("sufficient sampling of behavior [i.e., DV] occurred pretreatment"). EMA: N/A.

3. Compliance and missing data guidelines. EMA: rationale for compliance decisions, compliance rates reported, and missing data criteria and actions. All other sources: N/A.

Analysis Standards and Guidelines

Sources compared: WWC; DIV12; DIV16; NRP; the Single-Case Experimental Design Scale (Tate et al., 2008); and the EMA reporting guidelines (Stone & Shiffman, 2002).

1. Visual analysis. WWC: a 4-step, 6-variable procedure. DIV12: acceptable (no specific guidelines or procedures offered). DIV16: acceptable. NRP: N/A. Tate et al. scale: not acceptable ("use statistical analyses or describe effect sizes," p. 389). EMA: N/A.

2. Statistical analysis procedures. WWC: estimating effect sizes via nonparametric and parametric approaches, multilevel modeling, and regression (recommended). DIV12: preferred when the number of data points warrants statistical procedures (no specific guidelines or procedures offered). DIV16: rely on the guidelines presented by Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999). NRP: type not specified; report the value of the effect size, the type of summary statistic, and the number of people providing the effect size information. Tate et al. scale: specific statistical methods are not specified; only their presence or absence is of interest in completing the scale.

3. Demonstrating an effect. DIV12 (for the ABAB design): a stable baseline established during the first A phase, improvement during the first B phase, reversal or leveling of improvement during the second A phase, and resumed improvement in the second B phase (no other guidelines offered). The remaining sources: N/A.

4. Replication. Tate et al. scale: replication occurs across subjects, therapists, or settings. The remaining sources: N/A.

The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
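
To make the three sampling schedules concrete, the sketch below (hypothetical dates, times, and prompt densities, not drawn from any of the reviewed standards) generates one week of fixed-schedule and random-schedule prompt times; an event-based schedule would instead be triggered by the occurrence of the target event.

```python
import random
from datetime import datetime, timedelta

# Fixed time schedule: one end-of-day report at 9:00 p.m. for seven days.
first_day = datetime(2024, 1, 1, 21, 0)
fixed_prompts = [first_day + timedelta(days=d) for d in range(7)]

# Random time schedule: four prompts per day at random minutes between
# 9:00 a.m. and 9:00 p.m. for the same seven days.
random.seed(1)
random_prompts = []
for d in range(7):
    day_start = datetime(2024, 1, 1, 9, 0) + timedelta(days=d)
    minutes = sorted(random.randint(0, 12 * 60) for _ in range(4))
    random_prompts.extend(day_start + timedelta(minutes=m) for m in minutes)

print("first fixed prompt:", fixed_prompts[0])
print("first day of random prompts:", [t.strftime("%H:%M") for t in random_prompts[:4]])
```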

Measurement

The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).

Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the amount of time elapsed between an experience and the account of this experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and a tendency to report peak levels of the experience instead of giving credence to temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Deles-Paul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences, which can be attributed to reductions in measurement error resulting in increased validity and reliability of the daily reports.

The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and be capable of reliably and validly capturing change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, because dimensional measures can more readily detect effects than categorical or binary measures. Although using an established measure or scale, such as the Outcome Questionnaire System (M. J. Lambert, Hansen, & Harmon, 2010), provides empirically validated items for assessing various outcomes, most measure validation studies conducted on this type of instrument involve between-subject designs, so there is no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures should consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but additional assessment methods or informants are generally necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies include observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.

Time-series

Time-series designs are defined by repeated measurement of variables of interest over a period of time (Box & Jenkins, 1970). Time-series measurement most often occurs in uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001). Although uniform interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications (Scollon, Kim-Prieto, & Diener, 2003) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003; Piasecki et al., 2007; for a review, see Reis & Gable, 2000; Soliday et al., 2002). The basic time-series formula for a two-phase (AB) data stream is presented in Equation 1. In this formula, α represents the step function of the data stream, equal to 0 at times i = 1, 2, 3, …, n1 and 1 at times i = n1 + 1, n1 + 2, n1 + 3, …, n; S represents the change between the first and second phases, which also serves as the intercept in a two-phase data stream; n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i indexes time; and ε_i = ρε_{i−1} + e_i, which describes the autoregressive relationship (ρ) among the errors in the data stream.
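
Because Equation 1 itself is not reproduced above, the following display gives one form consistent with these definitions; it is a reconstruction of the implied model rather than the original notation.

$$
y_i = S\,\alpha_i + \varepsilon_i,
\qquad
\alpha_i =
\begin{cases}
0, & i = 1, 2, \dots, n_1\\
1, & i = n_1 + 1, \dots, n
\end{cases}
\qquad
\varepsilon_i = \rho\,\varepsilon_{i-1} + e_i,
$$

where y_i is the DV at time i, S is the level change between the baseline (A) and treatment (B) phases, and e_i is a white-noise error term.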

Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .

Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply if it occurs. This distinction is what most interested Skinner (1938), but it often falls outside the purview of today's researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change (Doss & Atkins, 2006), treatment processes (Stout, 2007; Tschacher & Ramseyer, 2009), and the relationship between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997; Hanson & Chen, 2010; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009), and might be capable of revealing mechanisms of change (Kazdin, 2007, 2009, 2010). Between- and within-subject SCEDs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science (Bolger et al., 2003): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.

Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is a semantic factor: Time-series is a specific term reserved for measurement occurring at a uniform interval, but SCED research appears not yet to have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the matter of measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies but did not label it as such. This is important because it would indicate how many SCEDs could be analyzed with time-series statistical methods.

Daily diary and ecological momentary assessment methods

EMA and daily diary approaches, also known as experience sampling, are methodological procedures for collecting repeated measurements in time-series and non-time-series experiments. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper. The reader is referred to the following review articles: daily diary (Bolger et al., 2003; Reis & Gable, 2000; Thiele, Laireiter, & Baumann, 2002) and EMA (Shiffman et al., 2008). Experience sampling in psychology has burgeoned in the past two decades as technological advances (e.g., Internet-based reporting, two-way pagers, cellular telephones, handheld computers) have permitted more precise and immediate reporting by participants than paper and pencil methods allow (for reviews see Barrett & Barrett, 2001; Shiffman & Stone, 1998). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating in the study, either because they do not have access to the necessary technology or because they do not have the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and also to accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents' compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, and Reis (2006) demonstrated the equivalence of the psychometric data structure across the two methods, suggesting that data collected by either method will yield similar statistical results given comparable compliance rates.

Daily diary/daily self-report and EMA measurement were somewhat rarely represented in this review, occurring in only 6.1% of the total studies. EMA methods had been used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which could in part have resulted from the long-held supremacy of observational measurement in fields that commonly practice single-case research.

Measurement Standards

As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.

The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Some researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986; Huitema, 1985; R. R. Jones et al., 1977; Sharpley, 1987); Huitema's (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis resulted in a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods increase the likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation (Huitema & McKean, 1994). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was 10.22 (SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of baseline data points. Similarly, the number of data points assessed across all phases of a study was often not easily identified.

The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating by the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of a trend, and that the level of the baseline measurement is severe enough to warrant intervention; each of these aspects of the data is important in inferential accuracy. Detrending techniques can be used to address baseline data trend. The integration option in ARIMA-based modeling and the empirical mode decomposition method ( Wu, Huang, Long, & Peng, 2007 ) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
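
As a concrete illustration of the regression-based detrending described above, the following sketch (hypothetical data; variable names are illustrative) regresses a DV on time and retains the residuals as the detrended series.

```python
import numpy as np

# Hypothetical baseline-plus-intervention stream with a gradual upward drift.
y = np.array([3.0, 3.5, 4.1, 4.4, 5.0, 5.2, 5.9, 6.3, 6.8, 7.1])
t = np.arange(len(y))  # time index: 0, 1, 2, ...

# Regress the DV on time (a degree-1 polynomial fit) to estimate the trend.
slope, intercept = np.polyfit(t, y, deg=1)
trend = intercept + slope * t

# The residuals form the detrended series used in subsequent analyses.
detrended = y - trend
print(np.round(detrended, 2))
```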

NRP does not provide a minimum for data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal number of baseline observations is well within these parameters, seven (1.7%) studies reported mean baselines of less than three data points.

Establishing a uniform minimum number of required baseline observations would provide researchers and reviewers with only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can then be compared with that of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a stable pattern when the baseline data are variable or show a trend in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations. This is discussed further in the Analysis section.

Reporting of repeated measurements

Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper and pencil diary, PDA, cellular telephone) ought to be provided with sufficient detail for replication. Because EMA specifically, and time-series/daily diary methods similarly, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data concerns are present in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount, and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press ; Velicer & Colby, 2005a , 2005b ). The results and implications of these and other missing data studies are discussed in the next section.

Analysis of SCED Data

Visual analysis.

Experts in the field generally agree about the majority of critical single-case experiment design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, yet it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978) , remains the standard by which SCED data are most commonly analyzed ( Parker, Cryer, & Byrns, 2006 ). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable. The baseline phase must be relatively stable; free of significant trend, particularly in the hypothesized direction of the effect; have minimal overlap of data with subsequent phases; and have a sufficient sampling of behavior to be considered representative ( Franklin, Gorman, et al., 1997 ; Parsonson & Baer, 1978 ). The effect of baseline trend on visual analysis, and a technique to control baseline trend, are offered by Parker et al. (2006) . Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008 ; Kratochwill, Levin, Horner, & Swoboda, 2011 ).

However, visual analysis has its detractors. It has been shown to be inconsistent, to be affected by autocorrelation, and to result in overestimation of effects (e.g., Matyas & Greenwood, 1990). Relying on visual analysis to estimate an effect also precludes the results of SCED research from being included in meta-analyses and makes it very difficult to compare results with the effect sizes generated by statistical methods. Yet visual analysis persists, in large part because SCED researchers are familiar with these methods, are generally unfamiliar with statistical approaches, and lack agreement about which statistical approaches are appropriate. Still, top experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so (Kratochwill et al., 2011).

Statistical analysis

Statistical analysis of SCED data consists generally of an attempt to address one or more of three broad research questions: (1) Does introduction/manipulation of the IV result in statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction/manipulation of the IV result in statistically significant change in the slope of the DV over time (slope-change analysis)? and (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to result in changes in the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010 ; Kratochwill et al., 2011 ; Matyas & Greenwood, 1990 ). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section. However, a discussion about the nuances of this type of analysis and all their possible methods is well beyond the scope of this article.
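
For research questions about level and slope (Questions 1 and 2), the standardized computations mentioned above reduce to simple summaries within each phase. The sketch below (hypothetical data) computes the mean level and the least-squares trend slope for an A and a B phase.

```python
import numpy as np

# Hypothetical A (baseline) and B (intervention) observations for one subject.
phase_a = np.array([4.0, 5.0, 4.5, 5.5, 4.8, 5.2])
phase_b = np.array([7.0, 7.5, 8.2, 8.0, 8.8, 9.1])

def level_and_trend(phase):
    """Return the mean level and the slope of a least-squares trend line."""
    t = np.arange(len(phase))
    slope, _ = np.polyfit(t, phase, deg=1)
    return phase.mean(), slope

for label, data in (("A", phase_a), ("B", phase_b)):
    level, slope = level_and_trend(data)
    print(f"Phase {label}: mean level = {level:.2f}, trend = {slope:.2f} per observation")
```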

The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008). Trends in the prevalence of statistical analysis among SCED researchers are revealing: Busk and Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, and Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004; Cohen, 1994; Ferron & Sentovich, 2002; Ferron & Ware, 1995; Kirk, 1996; Manolov & Solanas, 2008; Olive & Smith, 2005; Parker & Brossart, 2003; Robey et al., 1999; Smith et al., in press; Velicer & Fava, 2003). One concern is the lack of a clearly superior method across datasets. Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance with the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006), Campbell (2004), Parker and Brossart (2003), and Parker and Vannest (2009), found that the more promising available statistical analysis methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select an appropriate model for the research questions and data structure, being mindful of how modeling results can be influenced by extraneous factors.

The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to available statistical methods; many others are not covered in this review. The following articles provide more in-depth discussion and description of other methods: Barlow et al. (2008) ; Franklin et al., (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992 , 2010 ). Shadish et al. (2008) summarize more recently developed methods. Similarly, a Special Issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this time to discuss the implications of missing data.

Autocorrelation

Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, that is, the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008), Huitema (1985), and Marshall (1980), and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001). Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in the fields of physiological psychology, econometrics, and finance, where each phase of interest has potentially hundreds or even thousands of observations that are tightly packed across time (e.g., electroencephalography data, actuarial data, financial market indices). Applied SCED research in most areas of psychology is more likely to have measurement intervals of a day, week, or hour.

Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. Procedures for removing autocorrelation in the data stream prior to calculating effect sizes are offered as one option: One of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality ( Box & Jenkins, 1970 ; Tiao & Box, 1981 ).
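
A minimal sketch of how the lag-1 autocorrelation discussed above can be estimated for a short single-subject data stream follows (hypothetical data; the simple moment estimator shown is one of several in use).

```python
import numpy as np

def lag1_autocorrelation(series):
    """Estimate the lag-1 autocorrelation (serial dependence) of a data stream."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

# Hypothetical stream of 12 daily observations from one subject.
stream = [5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10]
print(f"lag-1 autocorrelation = {lag1_autocorrelation(stream):.2f}")
```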

Missing observations

Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).

Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007; Ibrahim, Chen, Lipsitz, & Herring, 2005). Raghunathan (2004) and others concluded that full information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a, 2005b) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure (Dempster, Laird, & Rubin, 1977), a maximum likelihood algorithm, did not affect one's ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, are preferable to post hoc statistical remedies.
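
The specific EM routine evaluated by Smith et al. is not reproduced here; as one illustration of a maximum likelihood route that tolerates missing observations, the sketch below (hypothetical data) fits a simple AR(1) state-space model in which the Kalman filter accommodates the NaN values without deletion or substitution.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical daily stream with two missing observations coded as NaN.
y = np.array([4.0, 4.5, np.nan, 5.0, 5.5, 5.0, np.nan, 6.0, 6.5, 6.0, 7.0, 6.5])

# AR(1) model with a constant; the Kalman filter used for maximum likelihood
# estimation handles the missing values directly.
model = SARIMAX(y, order=(1, 0, 0), trend="c")
result = model.fit(disp=False)
print(result.params)
```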

Nonnormal distribution of data

In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This is often not satisfied in SCED research when measurements involve count data, observer-rated behaviors, and other, similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression ( D. Lambert, 1992 ) and negative binomial regression ( Gardner, Mulvey, & Shaw, 1995 ), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.
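
As a brief sketch of the negative binomial option named above (hypothetical count data; the phase indicator and the default dispersion parameter are illustrative assumptions), a phase effect on an overdispersed count DV can be estimated as follows.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical counts of a target behavior per session and a phase indicator
# (0 = baseline, 1 = intervention).
counts = np.array([8, 7, 9, 10, 8, 9, 4, 3, 2, 3, 1, 2])
phase = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
X = sm.add_constant(phase)

# A negative binomial GLM accommodates skewed, overdispersed count outcomes.
model = sm.GLM(counts, X, family=sm.families.NegativeBinomial())
result = model.fit()
print(result.params)  # intercept and phase effect on the log scale
```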

Available statistical analysis methods

Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.

Multilevel and structural equation modeling

Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results (Shadish et al., 2008). MLM and related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005; B. O. Muthén & Curran, 1997) are particularly effective for evaluating trajectories and slopes in longitudinal data and relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether or not these relationships differ among the subjects in the study. Time-series and cross-lag analyses can also be used in MLM and SEM (Chow, Ho, Hamaker, & Dolan, 2010; du Toit & Browne, 2007). However, they generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement, and the structure (autocorrelation) and trend of the data can complicate many MLM methods. The short data streams and small numbers of subjects common in SCED research also present problems for MLM and SEM approaches, which were developed for data with considerably more observations per subject, or with considerably more subjects when observations per subject are few, in order to support model fitting. Still, MLM and related techniques arguably represent the most promising analytic methods.
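
A minimal sketch of a random-intercept multilevel model of the kind described above follows; the data, variable names, and the simple intercept-only random structure are illustrative assumptions rather than a recommended specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: repeated observations nested within subjects,
# with a phase indicator (0 = baseline, 1 = intervention).
data = pd.DataFrame({
    "subject": ["s1"] * 8 + ["s2"] * 8 + ["s3"] * 8,
    "time": list(range(8)) * 3,
    "phase": [0, 0, 0, 0, 1, 1, 1, 1] * 3,
    "y": [4, 5, 4, 5, 7, 8, 8, 9,
          3, 3, 4, 4, 6, 6, 7, 7,
          5, 5, 6, 5, 8, 9, 9, 10],
})

# Random intercept for each subject; fixed effects for time and phase.
model = smf.mixedlm("y ~ time + phase", data, groups=data["subject"])
result = model.fit()
print(result.params)
```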

A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS (SAS Institute Inc., 2008), the AMOS module (Arbuckle, 2006) of SPSS (SPSS Statistics, 2011), and the sem package for R (R Development Core Team, 2005), the use of which is described by Fox (2006). A number of stand-alone software options are also available for SEM applications, including Mplus (L. K. Muthén & Muthén, 2010) and Stata (StataCorp., 2011). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program (Raudenbush, Bryk, & Congdon, 2011).

Autoregressive moving averages (ARMA; e.g., Browne & Nesselroade, 2005 ; Liu & Hudack, 1995 ; Tiao & Box, 1981 )

Two primary points have been raised regarding ARMA modeling: the length of the data stream and the feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when analyzing a single-subject experiment (e.g., Borckardt et al., 2008; Box & Jenkins, 1970), which is often difficult to satisfy in applied psychological research applications. However, ARMA models in an SEM framework, such as those described by du Toit and Browne (2001), are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are also applicable under similar conditions. Model-fitting options are available in SPSS, R, and SAS (via PROC ARIMA).

ARMA modeling also requires considerable training in the method and rather advanced knowledge about statistical methods (e.g., Kratochwill & Levin, 1992 ). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when there is no “model finding” and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken for many SCED applications when phase- or slope-change analyses are of interest with a single, or very few, subjects. As already mentioned, this method is particularly useful when one is seeking to account for autocorrelation or other over-time variations that are not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be managed prior to analysis by means of options such as full information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970 ; Hamilton, 1994 ; Shumway & Stoffer, 1982 ) because listwise deletion has been shown to result in inaccurate time-series parameter estimates ( Velicer & Colby, 2005a ).
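
The following sketch illustrates the simple lag-1, no-differencing, no-moving-average approach described above, with the phase change entered as an exogenous step regressor; the data and variable names are hypothetical.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical AB data stream: 10 baseline and 10 intervention observations.
y = np.array([5, 6, 5, 6, 7, 6, 5, 6, 6, 5,
              8, 9, 9, 10, 9, 10, 11, 10, 11, 12], dtype=float)
phase = np.array([0] * 10 + [1] * 10)  # step function marking the phase change

# AR(1) model with no differencing and no moving-average term; the phase
# indicator captures the level change while the AR term absorbs autocorrelation.
model = ARIMA(y, exog=phase, order=(1, 0, 0))
result = model.fit()
print(result.params)
```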

Standardized mean differences

Standardized mean differences approaches include the common Cohen's d, Glass's delta, and Hedges' g that are used in the analysis of group designs. The computational properties of mean differences approaches to SCEDs are identical to those used for group comparisons, except that the results represent within-case variation instead of the variation between groups, which means that the obtained effect sizes are not interpretively equivalent. The advantage of the mean differences approach is its simplicity of calculation and its familiarity to social scientists. The primary drawback of these approaches is that they were not developed to contend with autocorrelated data; however, Manolov and Solanas (2008) reported that autocorrelation least affected effect sizes calculated using standardized mean differences approaches. To the applied-research scientist this likely represents the most accessible analytic approach, because statistical software is not required to calculate these effect sizes. The resultant effect sizes of single-subject standardized mean differences analyses must be interpreted cautiously because their relation to standard effect size benchmarks, such as those provided by Cohen (1988), is unknown. Standardized mean differences approaches are appropriate only when examining significant differences between phases of the study and cannot illuminate trajectories or relationships between variables.
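
A short sketch of the within-case computation follows (hypothetical data); it mirrors the usual pooled-standard-deviation form of Cohen's d, with the caveat noted above that the resulting value is not interpretively equivalent to a between-groups effect size.

```python
import numpy as np

# Hypothetical baseline (A) and intervention (B) observations for one case.
phase_a = np.array([4.0, 5.0, 4.5, 5.5, 4.8, 5.2])
phase_b = np.array([7.0, 7.5, 8.2, 8.0, 8.8, 9.1])

# Mean difference divided by the pooled standard deviation of the two phases.
n_a, n_b = len(phase_a), len(phase_b)
pooled_var = ((n_a - 1) * phase_a.var(ddof=1) + (n_b - 1) * phase_b.var(ddof=1)) / (n_a + n_b - 2)
d = (phase_b.mean() - phase_a.mean()) / np.sqrt(pooled_var)
print(f"within-case standardized mean difference = {d:.2f}")
```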

Other analytic approaches

Researchers have offered other analytic methods to deal with the characteristics of SCED data, and a number of methods for analyzing N-of-1 experiments have been developed. Borckardt's (2006) Simulation Modeling Analysis (SMA) program provides a method for analyzing level- and slope-change in short (< 30 observations per phase; see Borckardt et al., 2008), autocorrelated data streams that is statistically sophisticated yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, and Nash (2010) provides an example of SMA application. The Singwin package, described in Bloom et al. (2003), is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches that emerged from the visual analysis tradition have also been developed: Some examples include percent nonoverlapping data (Scruggs, Mastropieri, & Casto, 1987) and nonoverlap of all pairs (Parker & Vannest, 2009); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears to be well suited for managing specific data characteristics, but they should not be used to analyze data streams beyond their intended purpose until additional empirical research is conducted.
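
As an illustration of one of the nonoverlap indices named above, the sketch below computes nonoverlap of all pairs for hypothetical A and B phases, assuming that improvement corresponds to higher values; ties are counted as half, following the usual convention for this index.

```python
import numpy as np

def nonoverlap_of_all_pairs(phase_a, phase_b):
    """Share of all (baseline, treatment) pairs in which the treatment value
    exceeds the baseline value, with ties counted as 0.5."""
    a = np.asarray(phase_a, dtype=float)
    b = np.asarray(phase_b, dtype=float)
    wins = (b[None, :] > a[:, None]).sum()
    ties = (b[None, :] == a[:, None]).sum()
    return float((wins + 0.5 * ties) / (len(a) * len(b)))

phase_a = [4, 5, 4, 5, 6, 5]
phase_b = [7, 8, 8, 9, 8, 9]
print(f"NAP = {nonoverlap_of_all_pairs(phase_a, phase_b):.2f}")  # 1.00 indicates complete nonoverlap
```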

Combining SCED Results

Beyond the issue of single-case analysis is the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of the meta-analytic literature (Littell, Corcoran, & Pillai, 2008; Shadish et al., 2008), with only a few exceptions (Carr et al., 1999; Horner & Spaulding, 2010). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003), Manolov and Solanas (2008), Scruggs and Mastropieri (1998), and Shadish et al. (2008) offer four different potential statistical solutions to this problem, none of which appears to have achieved consensus among researchers. The ability to synthesize single-case effect sizes and to compare them with effect sizes garnered through group design research is undoubtedly necessary to increase SCED proliferation.

Discussion of Review Results and Coding of Analytic Methods

The coding criteria for this review were quite stringent in terms of what was considered to be either visual or statistical analysis. For visual analysis to be coded as present, it was necessary for the authors to self-identify as having used a visual analysis method. In many cases, it could likely be inferred that visual analysis had been used, but it was often not specified. Similarly, statistical analysis was reserved for analytic methods that produced an effect size. 3 Analyses that involved comparing the magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions of visual and statistical analysis contributed to the high rate of unreported analytic methods shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis is likely the percentage of studies among those that reported a method of analysis: Under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis. Included in these figures are studies that used both visual and statistical methods (11%). These findings are slightly higher than the estimate of Brossart et al. (2006), who suggested that statistical analysis is used in about 20% of SCED studies. Visual analysis undoubtedly continues to be the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, which is likely only to gain momentum as innovations continue.

Analysis Standards

The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes analysis-related information provided by the six reviewed sources for SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as complementary or supporting methods for the results of visual analysis. However, the authors of the WWC standards state, "As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed" (Kratochwill et al., 2010, p. 16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis with the reporting of an effect size. Only the WWC and DIV16 provide guidance in the use of statistical analysis procedures: The WWC "recommends" nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used, and DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA methods is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, so that the approach can be replicated and evaluated. They provide direction for analyzing aggregated and disaggregated data, and they aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.

Limitations and Future Directions

This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.

A second concern is the stringent coding criteria used for the analytic methods and the broad categorization into visual and statistical analytic approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although six sources of standards applying to SCED research were reviewed in this article, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non-intervention scientists to weigh in on these standards.

Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines. However, it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cuts across subspecialties but is applicable to many, if not all, areas of psychological research, which is perhaps an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.

Conclusions

The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks in terms of the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article also provides a much-needed synthesis of recent SCED standards that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research, which reveals many areas of consensus as well as areas of significant disagreement. It appears that the question of where to go next is very relevant at this point in time. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice that is in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirement to ensure adequate internal validity of SCED studies.

The lack of consensus regarding the superiority of any one analytic method stands out as an area of divergence. Judging by the current literature, researchers will need to carefully select a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen analytic method, whether it be visual or statistical. In some cases the number of observations and subjects in the study will dictate which analytic methods can and cannot be used. In the case of the true N-of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with shorter data streams (see Borckardt et al., 2008). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further complicate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated its consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address it more explicitly. Missing-data considerations are similarly omitted when they are unnecessary for analytic purposes. As newly devised statistical analysis approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure application of appropriate methods based on the characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages: This is a needed area of future research. Powerful and reliable statistical analyses help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.

Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other forms of data collection. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures within the SCED research reviewed, set against their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time and resource intensive and not feasible in all areas in which psychologists conduct research. Numerous research questions appear to go unasked because of this measurement constraint. SCED researchers developing updated standards in the future should include guidelines for the appropriate measurement requirements of non-observer-reported data. For example, the results of this review indicate that reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.
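
As a purely illustrative aside on why trend deserves explicit attention in high-density repeated measurements, the short Python simulation below shows how an unmodeled linear trend inflates lag-1 autocorrelation estimates. The sample size, AR(1) coefficient, trend slope, and number of replications are arbitrary assumptions, not values drawn from any study reviewed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def lag1(x):
    """Lag-1 autocorrelation estimate of a data stream."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(d[:-1] * d[1:]) / np.sum(d ** 2)

def ar1(n, phi, rng):
    """Simulate n points from an AR(1) process with coefficient phi."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

# Arbitrary illustrative settings: 20 observations (e.g., 20 diary days),
# modest true serial dependence, and a moderate upward linear trend.
n, phi, slope, reps = 20, 0.2, 0.25, 2000

flat = [lag1(ar1(n, phi, rng)) for _ in range(reps)]
trended = [lag1(ar1(n, phi, rng) + slope * np.arange(n)) for _ in range(reps)]

print("mean lag-1 estimate, AR(1) only:      ", round(float(np.mean(flat)), 2))
print("mean lag-1 estimate, AR(1) plus trend:", round(float(np.mean(trended)), 2))
```

The trended streams yield noticeably larger lag-1 estimates even though the true serial dependence is identical, which is one reason reporting standards for diary and EMA data streams should require trend to be described or modeled explicitly.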

Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married, with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that the two rarely intersect today, and I urge SCED researchers to adopt additional assessment strategies informed by time-series, daily-diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.

One limitation of the current SCED standards is their relatively narrow scope. With the exception of the Stone and Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although intervention research is likely to remain the predominant emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate the salient crosscutting characteristics of the SCED. I propose developing broad SCED guidelines that address specific design, measurement, and analysis issues in a manner that makes them useful across applications, as opposed to focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene. Admittedly, this is no small task.

Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.

Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques, which are not often taught in the departments of psychology and education where the vast majority of SCED studies appear to be conducted. It is a conundrum that the best available statistical analytic methods are so often described as inaccessible to the social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply state-of-the-art research designs, measurement techniques, and analytic methods.

Acknowledgments

Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.

Appendix. Results of Systematic Review Search and Studies Included in the Review

PsycINFO search conducted July 2011.

  • Alternating treatment design
  • Changing criterion design
  • Experimental case*
  • Multiple baseline design
  • Replicated single-case design
  • Simultaneous treatment design
  • Time-series design
  • Quantitative study OR treatment outcome/randomized clinical trial
  • NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study
  • Publication range: 2000–2010
  • Published in peer-reviewed journals
  • Available in the English language

Footnotes

1 Autocorrelation estimates in this range can be caused by trends in the data streams, which complicates the detection of level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.

2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.

3 However, it was often very difficult to locate an actual effect size reported in the studies that used statistical analysis. Although this issue likely added little to the present review, it does inhibit the inclusion of those results in meta-analyses.

Bibliography

(* indicates inclusion in study: N = 409)

  • Albright JJ, Marinova DM. Estimating multilevel models using SPSS, Stata, and SAS. Indiana University; 2010. Retrieved from http://www.iub.edu/%7Estatmath/stat/all/hlm/hlm.pdf . [ Google Scholar ]
  • Allison DB, Gorman BS. Calculating effect sizes for meta-analysis: The case of the single case. Behavior Research and Therapy. 1993; 31 (6):621–631. doi: 10.1016/0005-7967(93)90115-B. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Alloy LB, Just N, Panzarella C. Attributional style, daily life events, and hopelessness depression: Subtype validation by prospective variability and specificity of symptoms. Cognitive Therapy Research. 1997; 21 :321–344. doi: 10.1023/A:1021878516875. [ CrossRef ] [ Google Scholar ]
  • Arbuckle JL. Amos (Version 7.0) Chicago, IL: SPSS, Inc; 2006. [ Google Scholar ]
  • Barlow DH, Nock MK, Hersen M. Single case research designs: Strategies for studying behavior change. 3. New York, NY: Allyn and Bacon; 2008. [ Google Scholar ]
  • Barrett LF, Barrett DJ. An introduction to computerized experience sampling in psychology. Social Science Computer Review. 2001; 19 (2):175–185. doi: 10.1177/089443930101900204. [ CrossRef ] [ Google Scholar ]
  • Bloom M, Fisher J, Orme JG. Evaluating practice: Guidelines for the accountable professional. 4. Boston, MA: Allyn & Bacon; 2003. [ Google Scholar ]
  • Bolger N, Davis A, Rafaeli E. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 2003; 54 :579–616. doi: 10.1146/annurev.psych.54.101601.145030. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borckardt JJ. Simulation Modeling Analysis: Time series analysis program for short time series data streams (Version 8.3.3) Charleston, SC: Medical University of South Carolina; 2006. [ Google Scholar ]
  • Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil P. Clinical practice as natural laboratory for psychotherapy research. American Psychologist. 2008; 63 :1–19. doi: 10.1037/0003-066X.63.2.77. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychological Review. 2003; 110 (2):203–219. doi: 10.1037/0033-295X.110.2.203. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bower GH. Mood and memory. American Psychologist. 1981; 36 (2):129–148. doi: 10.1037/0003-066x.36.2.129. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Box GEP, Jenkins GM. Time-series analysis: Forecasting and control. San Francisco, CA: Holden-Day; 1970. [ Google Scholar ]
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006; 30 (5):531–563. doi: 10.1177/0145445503261167. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of autoregressive moving average time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A festschrift for Roderick P McDonald. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 2005. pp. 415–452. [ Google Scholar ]
  • Busk PL, Marascuilo LA. Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ, England: Lawrence Erlbaum Associates, Inc; 1992. pp. 159–185. [ Google Scholar ]
  • Busk PL, Marascuilo RC. Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment. 1988; 10 :229–242. [ Google Scholar ]
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behavior Modification. 2004; 28 (2):234–246. doi: 10.1177/0145445503259264. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Carr EG, Horner RH, Turnbull AP, Marquis JG, Magito McLaughlin D, McAtee ML, Doolabh A. Positive behavior support for people with developmental disabilities: A research synthesis. Washington, DC: American Association on Mental Retardation; 1999. [ Google Scholar ]
  • Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. Journal of Educational Science. 1986; 19 :387–400. doi: 10.1177/002246698501900404. [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Hollon SD. Defining empirically supported therapies. Journal of Consulting and Clinical Psychology. 1998; 66 (1):7–18. doi: 10.1037/0022-006X.66.1.7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Ollendick TH. Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology. 2001; 52 :685–716. doi: 10.1146/annurev.psych.52.1.685. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chow S-M, Ho M-hR, Hamaker EL, Dolan CV. Equivalence and differences between structural equation modeling and state-space modeling techniques. Structural Equation Modeling. 2010; 17 (2):303–332. doi: 10.1080/10705511003661553. [ CrossRef ] [ Google Scholar ]
  • Cohen J. Statistical power analysis for the behavioral sciences. 2. Hillsdale, NJ: Erlbaum; 1988. [ Google Scholar ]
  • Cohen J. The earth is round (p < .05) American Psychologist. 1994; 49 :997–1003. doi: 10.1037/0003-066X.49.12.997. [ CrossRef ] [ Google Scholar ]
  • Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology. 1993; 61 (6):966–974. doi: 10.1037/0022-006X.61.6.966. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dattilio FM, Edwards JA, Fishman DB. Case studies within a mixed methods paradigm: Toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy: Theory, Research, Practice, Training. 2010; 47 (4):427–441. doi: 10.1037/a0021181. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977; 39 (1):1–38. [ Google Scholar ]
  • Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. American Journal of Public Health. 2004; 94 (3):361–366. doi: 10.2105/ajph.94.3.361. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Diggle P, Liang KY. Analyses of longitudinal data. New York: Oxford University Press; 2001. [ Google Scholar ]
  • Doss BD, Atkins DC. Investigating treatment mediators when simple random assignment to a control group is not possible. Clinical Psychology: Science and Practice. 2006; 13 (4):321–336. doi: 10.1111/j.1468-2850.2006.00045.x. [ CrossRef ] [ Google Scholar ]
  • du Toit SHC, Browne MW. The covariance structure of a vector ARMA time series. In: Cudeck R, du Toit SHC, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International; 2001. pp. 279–314. [ Google Scholar ]
  • du Toit SHC, Browne MW. Structural equation modeling of multivariate time series. Multivariate Behavioral Research. 2007; 42 :67–101. doi: 10.1080/00273170701340953. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fechner GT. Elemente der psychophysik [Elements of psychophysics] Leipzig, Germany: Breitkopf & Hartel; 1889. [ Google Scholar ]
  • Ferron J, Sentovich C. Statistical power of randomization tests used with multiple-baseline designs. The Journal of Experimental Education. 2002; 70 :165–178. doi: 10.1080/00220970209599504. [ CrossRef ] [ Google Scholar ]
  • Ferron J, Ware W. Analyzing single-case data: The power of randomization tests. The Journal of Experimental Education. 1995; 63 :167–178. [ Google Scholar ]
  • Fox J. TEACHER’S CORNER: Structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006; 13 (3):465–486. doi: 10.1207/s15328007sem1303_7. [ CrossRef ] [ Google Scholar ]
  • Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997. [ Google Scholar ]
  • Franklin RD, Gorman BS, Beasley TM, Allison DB. Graphical display and visual analysis. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahway, NJ: Lawrence Erlbaum Associates, Publishers; 1997. pp. 119–158. [ Google Scholar ]
  • Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995; 118 (3):392–404. doi: 10.1037/0033-2909.118.3.392. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Green AS, Rafaeli E, Bolger N, Shrout PE, Reis HT. Paper or plastic? Data equivalence in paper and electronic diaries. Psychological Methods. 2006; 11 (1):87–105. doi: 10.1037/1082-989X.11.1.87. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994. [ Google Scholar ]
  • Hammond D, Gast DL. Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities. 2010; 45 :187–202. [ Google Scholar ]
  • Hanson MD, Chen E. Daily stress, cortisol, and sleep: The moderating role of childhood psychosocial environments. Health Psychology. 2010; 29 (4):394–402. doi: 10.1037/a0019879. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge, MA: Cambridge University Press; 2001. [ Google Scholar ]
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Exceptional Children. 2005; 71 :165–179. [ Google Scholar ]
  • Horner RH, Spaulding S. Single-case research designs. In: Salkind NJ, editor. Encyclopedia of research design. Thousand Oaks, CA: Sage Publications; 2010. [ Google Scholar ]
  • Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician. 2007; 61 (1):79–90. doi: 10.1198/000313007X172556. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hser Y, Shen H, Chou C, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Evaluation Review. 2001; 25 (2):233–262. doi: 10.1177/0193841X0102500206. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Huitema BE. Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment. 1985; 7 (2):107–118. [ Google Scholar ]
  • Huitema BE, McKean JW. Reduced bias autocorrelation estimation: Three jackknife methods. Educational and Psychological Measurement. 1994; 54 (3):654–665. doi: 10.1177/0013164494054003008. [ CrossRef ] [ Google Scholar ]
  • Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association. 2005; 100 (469):332–346. doi: 10.1198/016214504000001844. [ CrossRef ] [ Google Scholar ]
  • Institute of Medicine. Reducing risks for mental disorders: Frontiers for preventive intervention research. Washington, DC: National Academy Press; 1994. [ PubMed ] [ Google Scholar ]
  • Jacobsen NS, Christensen A. Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist. 1996; 51 :1031–1039. doi: 10.1037/0003-066X.51.10.1031. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones RR, Vaught RS, Weinrott MR. Time-series analysis in operant research. Journal of Applied Behavior Analysis. 1977; 10 (1):151–166. doi: 10.1901/jaba.1977.10-151. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones WP. Single-case time series with Bayesian analysis: A practitioner’s guide. Measurement and Evaluation in Counseling and Development. 2003; 36 :28–39. [ Google Scholar ]
  • Kanfer H. Self-monitoring: Methodological limitations and clinical applications. Journal of Consulting and Clinical Psychology. 1970; 35 (2):148–152. doi: 10.1037/h0029874. [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology. 1981; 49 (2):183–192. doi: 10.1037/0022-006X.49.2.183. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology. 2007; 3 :1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist. 2008; 63 (3):146–159. doi: 10.1037/0003-066X.63.3.146. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Understanding how and why psychotherapy leads to change. Psychotherapy Research. 2009; 19 (4):418–428. doi: 10.1080/10503300802448899. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2010. [ Google Scholar ]
  • Kirk RE. Practical significance: A concept whose time has come. Educational and Psychological Measurement. 1996; 56 :746–759. doi: 10.1177/0013164496056005002. [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR. Preparing psychologists for evidence-based school practice: Lessons learned and challenges ahead. American Psychologist. 2007; 62 :829–843. doi: 10.1037/0003-066X.62.8.829. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case designs technical documentation. 2010. Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .
  • Kratochwill TR, Levin JR. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. [ Google Scholar ]
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods. 2010; 15 (2):124–144. doi: 10.1037/a0017736. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Levin JR, Horner RH, Swoboda C. Visual analysis of single-case intervention research: Conceptual and methodological considerations (WCER Working Paper No. 2011-6) 2011 Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php .
  • Lambert D. Zero-inflated poisson regression, with an application to defects in manufacturing. Technometrics. 1992; 34 (1):1–14. [ Google Scholar ]
  • Lambert MJ, Hansen NB, Harmon SC. Developing and Delivering Practice-Based Evidence. John Wiley & Sons, Ltd; 2010. Outcome Questionnaire System (The OQ System): Development and practical applications in healthcare settings; pp. 139–154. [ Google Scholar ]
  • Littell JH, Corcoran J, Pillai VK. Systematic reviews and meta-analysis. New York: Oxford University Press; 2008. [ Google Scholar ]
  • Liu LM, Hudack GB. The SCA statistical system. Vector ARMA modeling of multiple time series. Oak Brook, IL: Scientific Computing Associates Corporation; 1995. [ Google Scholar ]
  • Lubke GH, Muthén BO. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005; 10 (1):21–39. doi: 10.1037/1082-989x.10.1.21. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Manolov R, Solanas A. Comparing N = 1 effect sizes in presence of autocorrelation. Behavior Modification. 2008; 32 (6):860–875. doi: 10.1177/0145445508318866. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Marshall RJ. Autocorrelation estimation of time series with randomly missing observations. Biometrika. 1980; 67 (3):567–570. doi: 10.1093/biomet/67.3.567. [ CrossRef ] [ Google Scholar ]
  • Matyas TA, Greenwood KM. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis. 1990; 23 (3):341–351. doi: 10.1901/jaba.1990.23-341. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR (Chair), Members of the Task Force on Evidence-Based Interventions in School Psychology. Procedural and coding manual for review of evidence-based interventions. 2003. Retrieved July 18, 2011 from http://www.sp-ebi.org/documents/_workingfiles/EBImanual1.pdf .
  • Moher D, Schulz KF, Altman DG, for the CONSORT Group. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Journal of the American Medical Association. 2001; 285 :1987–1991. doi: 10.1001/jama.285.15.1987. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Morgan DL, Morgan RK. Single-participant research design: Bringing science to managed care. American Psychologist. 2001; 56 (2):119–127. doi: 10.1037/0003-066X.56.2.119. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Muthén BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods. 1997; 2 (4):371–402. doi: 10.1037/1082-989x.2.4.371. [ CrossRef ] [ Google Scholar ]
  • Muthén LK, Muthén BO. Mplus (Version 6.11) Los Angeles, CA: Muthén & Muthén; 2010. [ Google Scholar ]
  • Nagin DS. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods. 1999; 4 (2):139–157. doi: 10.1037/1082-989x.4.2.139. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • National Institute of Child Health and Human Development. Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769) Washington, DC: U.S. Government Printing Office; 2000. [ Google Scholar ]
  • Olive ML, Smith BW. Effect size calculations and single subject designs. Educational Psychology. 2005; 25 (2–3):313–324. doi: 10.1080/0144341042000301238. [ CrossRef ] [ Google Scholar ]
  • Oslin DW, Cary M, Slaymaker V, Colleran C, Blow FC. Daily ratings measures of alcohol craving during an inpatient stay define subtypes of alcohol addiction that predict subsequent risk for resumption of drinking. Drug and Alcohol Dependence. 2009; 103 (3):131–136. doi: 10.1016/J.Drugalcdep.2009.03.009. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Palermo TP, Valenzuela D, Stork PP. A randomized trial of electronic versus paper pain diaries in children: Impact on compliance, accuracy, and acceptability. Pain. 2004; 107 (3):213–219. doi: 10.1016/j.pain.2003.10.005. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parker RI, Brossart DF. Evaluating single-case research data: A comparison of seven statistical methods. Behavior Therapy. 2003; 34 (2):189–211. doi: 10.1016/S0005-7894(03)80013-8. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Cryer J, Byrns G. Controlling baseline trend in single case research. School Psychology Quarterly. 2006; 21 (4):418–440. doi: 10.1037/h0084131. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Vannest K. An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy. 2009; 40 (4):357–367. doi: 10.1016/j.beth.2008.10.006. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parsonson BS, Baer DM. The analysis and presentation of graphic data. In: Kratochwill TR, editor. Single subject research. New York, NY: Academic Press; 1978. pp. 101–166. [ Google Scholar ]
  • Parsonson BS, Baer DM. The visual analysis of data, and current research into the stimuli controlling it. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ; England: Lawrence Erlbaum Associates, Inc; 1992. pp. 15–40. [ Google Scholar ]
  • Piasecki TM, Hufford MR, Solham M, Trull TJ. Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment. 2007; 19 (1):25–43. doi: 10.1037/1040-3590.19.1.25. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2005. [ Google Scholar ]
  • Ragunathan TE. What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health. 2004; 25 :99–117. doi: 10.1146/annurev.publhealth.25.102802.124410. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Raudenbush SW, Bryk AS, Congdon R. HLM 7 Hierarchical Linear and Nonlinear Modeling. Scientific Software International, Inc; 2011. [ Google Scholar ]
  • Redelmeier DA, Kahneman D. Patients’ memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996; 66 (1):3–8. doi: 10.1016/0304-3959(96)02994-6. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Reis HT. Domains of experience: Investigating relationship processes from three perspectives. In: Erber R, Gilmore R, editors. Theoretical frameworks in personal relationships. Mahwah, NJ: Erlbaum; 1994. pp. 87–110. [ Google Scholar ]
  • Reis HT, Gable SL. Event sampling and other methods for studying everyday experience. In: Reis HT, Judd CM, editors. Handbook of research methods in social and personality psychology. New York, NY: Cambridge University Press; 2000. pp. 190–222. [ Google Scholar ]
  • Robey RR, Schultz MC, Crawford AB, Sinner CA. Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology. 1999; 13 (6):445–473. doi: 10.1080/026870399402028. [ CrossRef ] [ Google Scholar ]
  • Rossi PH, Freeman HE. Evaluation: A systematic approach. 5. Thousand Oaks, CA: Sage; 1993. [ Google Scholar ]
  • SAS Institute Inc. The SAS system for Windows, Version 9. Cary, NC: SAS Institute Inc; 2008. [ Google Scholar ]
  • Schmidt M, Perels F, Schmitz B. How to perform idiographic and a combination of idiographic and nomothetic approaches: A comparison of time series analyses and hierarchical linear modeling. Journal of Psychology. 2010; 218 (3):166–174. doi: 10.1027/0044-3409/a000026. [ CrossRef ] [ Google Scholar ]
  • Scollon CN, Kim-Pietro C, Diener E. Experience sampling: Promises and pitfalls, strengths and weaknesses. Assessing Well-Being. 2003; 4 :5–35. doi: 10.1007/978-90-481-2354-4_8. [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA. Summarizing single-subject research: Issues and applications. Behavior Modification. 1998; 22 (3):221–242. doi: 10.1177/01454455980223001. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA, Casto G. The quantitative synthesis of single-subject research. Remedial and Special Education. 1987; 8 (2):24–33. doi: 10.1177/074193258700800206. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002. [ Google Scholar ]
  • Shadish WR, Rindskopf DM, Hedges LV. The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention. 2008; 3 :188–196. doi: 10.1080/17489530802581603. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Sullivan KJ. Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods. 2011; 43 :971–980. doi: 10.3758/s13428-011-0111-y. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sharpley CF. Time-series analysis of behavioural data: An update. Behaviour Change. 1987; 4 :40–45. [ Google Scholar ]
  • Shiffman S, Hufford M, Hickcox M, Paty JA, Gnys M, Kassel JD. Remember that? A comparison of real-time versus retrospective recall of smoking lapses. Journal of Consulting and Clinical Psychology. 1997; 65 :292–300. doi: 10.1037/0022-006X.65.2.292.a. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shiffman S, Stone AA. Ecological momentary assessment: A new tool for behavioral medicine research. In: Krantz DS, Baum A, editors. Technology and methods in behavioral medicine. Mahwah, NJ: Erlbaum; 1998. pp. 117–131. [ Google Scholar ]
  • Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annual Review of Clinical Psychology. 2008; 4 :1–32. doi: 10.1146/annurev.clinpsy.3.022806.091415. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM Algorithm. Journal of Time Series Analysis. 1982; 3 (4):253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x. [ CrossRef ] [ Google Scholar ]
  • Skinner BF. The behavior of organisms. New York, NY: Appleton-Century-Crofts; 1938. [ Google Scholar ]
  • Smith JD, Borckardt JJ, Nash MR. Inferential precision in single-case time-series datastreams: How well does the EM Procedure perform when missing observations occur in autocorrelated data? Behavior Therapy. doi: 10.1016/j.beth.2011.10.001. (in press) [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith JD, Handler L, Nash MR. Therapeutic Assessment for preadolescent boys with oppositional-defiant disorder: A replicated single-case time-series design. Psychological Assessment. 2010; 22 (3):593–602. doi: 10.1037/a0019697. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage; 1999. [ Google Scholar ]
  • Soliday E, Moore KJ, Lande MB. Daily reports and pooled time series analysis: Pediatric psychology applications. Journal of Pediatric Psychology. 2002; 27 (1):67–76. doi: 10.1093/jpepsy/27.1.67. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • SPSS Statistics. Chicago, IL: SPSS Inc; 2011. (Version 20.0.0) [ Google Scholar ]
  • StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP; 2011. [ Google Scholar ]
  • Stone AA, Broderick JE, Kaell AT, Deles-Paul PAEG, Porter LE. Does the peak-end phenomenon observed in laboratory pain studies apply to real-world pain in rheumatoid arthritics? Journal of Pain. 2000; 1 :212–217. doi: 10.1054/jpai.2000.7568. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stone AA, Shiffman S. Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine. 2002; 24 :236–243. doi: 10.1207/S15324796ABM2403_09. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stout RL. Advancing the analysis of treatment process. Addiction. 2007; 102 :1539–1545. doi: 10.1111/j.1360-0443.2007.01880.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S. Rating the methodological quality of single-subject designs and N-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation. 2008; 18 (4):385–401. doi: 10.1080/09602010802009201. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thiele C, Laireiter A-R, Baumann U. Diaries in clinical psychology and psychotherapy: A selective review. Clinical Psychology & Psychotherapy. 2002; 9 (1):1–37. doi: 10.1002/cpp.302. [ CrossRef ] [ Google Scholar ]
  • Tiao GC, Box GEP. Modeling multiple time series with applications. Journal of the American Statistical Association. 1981; 76 :802–816. [ Google Scholar ]
  • Tschacher W, Ramseyer F. Modeling psychotherapy process by time-series panel analysis (TSPA) Psychotherapy Research. 2009; 19 (4):469–481. doi: 10.1080/10503300802654496. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. A comparison of missing-data procedures for ARIMA time-series analysis. Educational and Psychological Measurement. 2005a; 65 (4):596–615. doi: 10.1177/0013164404272502. [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. Missing data and the general transformation approach to time series analysis. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics. A festschrift to Roderick P McDonald. Hillsdale, NJ: Lawrence Erlbaum; 2005b. pp. 509–535. [ Google Scholar ]
  • Velicer WF, Fava JL. Time series analysis. In: Schinka J, Velicer WF, Weiner IB, editors. Research methods in psychology. Vol. 2. New York, NY: John Wiley & Sons; 2003. [ Google Scholar ]
  • Wachtel PL. Beyond “ESTs”: Problematic assumptions in the pursuit of evidence-based practice. Psychoanalytic Psychology. 2010; 27 (3):251–272. doi: 10.1037/a0020532. [ CrossRef ] [ Google Scholar ]
  • Watson JB. Behaviorism. New York, NY: Norton; 1925. [ Google Scholar ]
  • Weisz JR, Hawley KM. Finding, evaluating, refining, and applying empirically supported treatments for children and adolescents. Journal of Clinical Child Psychology. 1998; 27 :206–216. doi: 10.1207/s15374424jccp2702_7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Weisz JR, Hawley KM. Procedural and coding manual for identification of beneficial treatments. Washington, DC: American Psychological Association, Society for Clinical Psychology, Division 12, Committee on Science and Practice; 1999. [ Google Scholar ]
  • Westen D, Bradley R. Empirically supported complexity. Current Directions in Psychological Science. 2005; 14 :266–271. doi: 10.1111/j.0963-7214.2005.00378.x. [ CrossRef ] [ Google Scholar ]
  • Westen D, Novotny CM, Thompson-Brenner HK. The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting controlled clinical trials. Psychological Bulletin. 2004; 130 :631–663. doi: 10.1037/0033-2909.130.4.631. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilkinson L, the Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999; 54 :594–604. doi: 10.1037/0003-066X.54.8.594. [ CrossRef ] [ Google Scholar ]
  • Wolery M, Busick M, Reichow B, Barton EE. Comparison of overlap methods for quantitatively synthesizing single-subject data. The Journal of Special Education. 2010; 44 (1):18–28. doi: 10.1177/0022466908328009. [ CrossRef ] [ Google Scholar ]
  • Wu Z, Huang NE, Long SR, Peng C-K. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences. 2007; 104 (38):14889–14894. doi: 10.1073/pnas.0701020104. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
