Jonathan AC Sterne, Miguel A Hernán, Alexandra McAleenan, Barnaby C Reeves, Julian PT Higgins
This chapter should be cited as: Sterne JAC, Hernán MA, McAleenan A, Reeves BC, Higgins JPT. Chapter 25: Assessing risk of bias in a non-randomized study. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane, 2022. Available from www.training.cochrane.org/handbook.

25.1 Introduction

Cochrane Reviews often include non-randomized studies of interventions (NRSI), as discussed in detail in Chapter 24. Risk of bias should be assessed for each included study (see Chapter 7). The Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool (Sterne et al 2016) is recommended for assessing risk of bias in an NRSI: it provides a framework for assessing the risk of bias in a single result (an estimate of the effect of an experimental intervention compared with a comparator intervention on a particular outcome). Many features of ROBINS-I are shared with the RoB 2 tool for assessing risk of bias in randomized trials (see Chapter 8). Evaluating risk of bias in results of NRSI requires both methodological and content expertise. The process is more involved than for randomized trials, and the participation of both methodologists with experience in the relevant study designs or design features, and health professionals with knowledge of prognostic factors that influence intervention decisions for the target patient or population group, is recommended (see Chapter 24). At the planning stage, the review question must be clearly articulated, and important potential problems in NRSI relevant to the review should be identified. This includes a preliminary specification of important confounders and co-interventions (see Section 25.3.1). Each study should then be carefully examined, considering all the ways in which its results might be put at risk of bias.
In this chapter we summarize the biases that can affect NRSI and describe the main features of the ROBINS-I tool. Since the initial version of the tool was published in 2016 (Sterne et al 2016), developments to it have continued. At the time of writing, a new version is under preparation, with variants for several types of NRSI design. The full guidance documentation for the ROBINS-I tool, including the latest variants for different study designs, is available at www.riskofbias.info.

25.1.1 Defining bias in a non-randomized study

We define bias as the systematic difference between the results of an NRSI and the results of a hypothetical pragmatic randomized trial (both with very large sample sizes) that addresses the same question, is conducted on the same participant group, and has no flaws in its conduct. Defined in this way, bias is distinct from issues of indirectness (applicability, generalizability or transportability to types of individuals who were not included in the study; see Chapter 14) and distinct from chance. For example, restricting the study sample to individuals free of comorbidities may limit the utility of its findings because they cannot be generalized to clinical practice, where comorbidities are common. However, such restriction does not bias the results of the study in relation to individuals free of comorbidities. Evaluations of risk of bias in the results of NRSI are thus facilitated by considering each NRSI as an attempt to emulate (mimic) a hypothetical ‘target’ randomized trial (see also Section 25.3.2). This is the hypothetical pragmatic randomized trial that compares the health effects of the same interventions, conducted on the same participant group and without features putting it at risk of bias (Institute of Medicine 2012, Hernán and Robins 2016). Importantly, a target randomized trial need not be feasible or ethical.
For example, there would be no problem specifying a target trial that randomized individuals to receive tobacco cigarettes or no cigarettes to examine the effects of smoking, even though such a trial would not be ethical in practice. Similarly, there would be no problem specifying a target trial that randomized multiple countries to implement a ban on smoking in public places, even though this would not be feasible in practice.

25.2 Biases in non-randomized studies

When a systematic review includes randomized trials, its results correspond to the causal effects of the interventions studied, provided that the trials have no bias. Randomization is used to avoid an influence of either known or unknown prognostic factors (factors that predict the outcome, such as severity of illness or presence of comorbidities) on intervention group assignment. There is greater potential for bias in NRSI than in randomized trials. A key concern is the possibility of confounding (see Section 25.2.1). NRSI may also be affected by biases that are referred to in the epidemiological literature as selection bias (see Section 25.2.2) and information bias (see Section 25.2.3). Furthermore, we are at least as concerned about reporting biases as we are when including randomized trials (see Section 25.2.4).

25.2.1 Confounding

Confounding occurs when there are common causes of the choice of intervention and the outcome of interest. In the presence of confounding, the association between intervention and outcome differs from its causal effect. This difference is known as confounding bias. A confounding domain (or, more loosely, a ‘confounder’) is a pre-intervention prognostic factor (i.e. a variable that predicts the outcome of interest) that also predicts whether an individual receives one or the other intervention of interest. Some common examples are severity of pre-existing disease, presence of comorbidities, healthcare use, physician prescribing practices, adiposity, and socio-economic status.
Investigators measure specific variables (often also referred to as confounders) in an attempt to control fully or partly for these confounding domains. For example, baseline immune function and recent weight loss may be used to adjust for disease severity; hospitalizations and number of medical encounters in the six months preceding baseline may be used to adjust for healthcare use; geographic measures to adjust for physician prescribing practices; body mass index and waist-to-hip ratio to adjust for adiposity; and income and education to adjust for socio-economic status. The confounding domains that are important in the context of particular interventions may vary across study settings. For example, socio-economic status might be an important confounder in settings where cost or having insurance cover affects access to health care, but might not introduce confounding in studies conducted in countries in which access to the interventions of interest is universal and therefore socio-economic status does not influence intervention received. Confounding may be overcome, in principle, either by design (e.g. by restricting eligibility to individuals who all have the same value of the baseline confounders) or – more commonly – through statistical analyses that adjust (‘control’) for the confounder(s). Adjusting for factors that are not confounders, and in particular adjusting for variables that could be affected by intervention (‘post-intervention’ variables), may introduce bias. In practice, confounding is not fully overcome. First, residual confounding occurs when a confounding domain is not measured, is measured with error, or when the relationship between the confounding domain and the outcome or exposure (depending on the analytic approach being used) is imperfectly modelled. 
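The logic of adjustment for a confounding domain can be illustrated with a small simulation. This is a sketch with hypothetical numbers, not an example from the Handbook: a single binary confounding domain (‘severe disease’) influences both treatment choice and the outcome, the true treatment effect is null, and standardizing the stratum-specific risks over the marginal distribution of the confounder removes the spurious crude association.

```python
# Illustrative simulation (hypothetical numbers): a binary confounder
# ("severe disease") influences both treatment choice and the outcome,
# while the true effect of treatment on the outcome is null.
import random

random.seed(42)
n = 100_000
rows = []
for _ in range(n):
    severe = random.random() < 0.4                         # confounding domain
    treated = random.random() < (0.8 if severe else 0.2)   # severity drives treatment choice
    p_bad = 0.5 if severe else 0.1                         # outcome depends on severity only
    outcome = random.random() < p_bad                      # null treatment effect
    rows.append((severe, treated, outcome))

def risk(rows, treated_value, severe_value=None):
    sel = [r for r in rows
           if r[1] == treated_value and (severe_value is None or r[0] == severe_value)]
    return sum(r[2] for r in sel) / len(sel)

# Crude comparison mixes the severity groups, so it is confounded.
crude_rd = risk(rows, True) - risk(rows, False)

# Adjustment by standardization: average stratum-specific risk differences
# over the marginal distribution of the confounder.
p_severe = sum(r[0] for r in rows) / n
adj_rd = sum(w * (risk(rows, True, s) - risk(rows, False, s))
             for s, w in [(True, p_severe), (False, 1 - p_severe)])

print(f"crude RD = {crude_rd:.3f}, adjusted RD = {adj_rd:.3f}")
```

With these hypothetical probabilities the crude risk difference is large (around 0.23) although the true effect is null; the standardized estimate is close to zero. If ‘severe disease’ were unmeasured, no analysis of these data could recover the null effect, which is the essential concern about unmeasured confounding discussed above.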
For example, in an NRSI comparing two antihypertensive drugs, we would expect residual confounding if pre-intervention blood pressure was measured three months before the start of intervention, but the blood pressures used by clinicians to decide between the drugs at the point of intervention were not available in our dataset. Second, unmeasured confounding occurs when a confounding domain has not been measured at all, or is not controlled for in the analysis. This would be the case if no pre-intervention blood pressure measurements were available, or if the analysis failed to control for pre-intervention blood pressure despite it being measured. Unmeasured confounding can usually not be excluded, because we are seldom certain that we know all the confounding domains. When NRSI are to be included in a review, review authors should attempt to pre-specify important confounding domains in their protocol. The identification of potential confounding domains requires subject-matter knowledge. For example, experts on surgery are best-placed to identify prognostic factors that are likely to be related to the choice of a surgical strategy. We recommend that subject-matter experts be included in the team writing the review protocol, and we encourage the listing of confounding domains in the review protocol, based on initial discussions among the review authors and existing knowledge of the literature.

25.2.2 Selection bias

Selection bias occurs when some eligible participants, or some follow-up time of some participants, or some outcome events, are excluded in a way that leads to the association between intervention and outcome in the NRSI differing from the association that would have been observed in the target trial. This phenomenon is distinct from that of confounding, although the term selection bias is sometimes used to mean confounding.
Selection biases occur in NRSI either due to selection of participants or follow-up time into the study (addressed in the ‘Bias in selection of participants into the study’ domain), or selection of participants or follow-up time out of the study (addressed in the ‘Bias due to missing data’ domain). Our use of the term ‘selection bias’ is intended to refer only to bias that would arise even if the effect of interest were null, that is, biases that are internal to the study, and not to issues of indirectness (generalizability, applicability or transferability to people who were excluded from the study) (Schünemann et al 2013). Selection bias occurs when selection of participants or follow-up time is related to both intervention and outcome. For example, studies of folate supplementation during pregnancy to prevent neural tube defects in children were biased because they only included mothers and children if children were born alive (Hernán et al 2002). The bias arose because having a live birth (rather than a stillbirth or therapeutic abortion, for which outcome data were not available) is related to both the intervention (because folate supplementation increases the chance of a live birth) and the outcome (because the presence of neural tube defects makes a live birth less likely) (Velie and Shaw 1996, Hernán et al 2002). Selection bias can also occur when some follow-up time is excluded from the analysis. For example, there is potential for bias when prevalent users of an intervention (those already receiving the intervention), rather than incident (new) users, are included in analyses comparing them with non-users. This is a type of selection bias that has also been termed inception bias or lead time bias.
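The structure of the live-birth example above can be sketched as a small simulation (illustrative, with hypothetical probabilities): the intervention has no effect on the outcome, but both the intervention and the outcome affect whether a participant is selected, so restricting the analysis to selected participants induces a spurious association.

```python
# Illustrative simulation (hypothetical numbers): the intervention has no
# effect on the outcome, but both affect whether a participant is selected
# (here, whether the birth is live). Restricting the analysis to selected
# participants induces a spurious intervention-outcome association.
import random

random.seed(1)
n = 200_000
data = []
for _ in range(n):
    supplement = random.random() < 0.5     # intervention, assigned independently
    defect = random.random() < 0.2         # outcome: independent of intervention (null effect)
    # Selection (live birth) is made MORE likely by the intervention and
    # LESS likely by the outcome:
    p_live = 0.6 + 0.2 * supplement - 0.3 * defect
    live = random.random() < p_live
    data.append((supplement, defect, live))

def risk(data, supp, selected_only):
    sel = [d for d in data if d[0] == supp and (d[2] or not selected_only)]
    return sum(d[1] for d in sel) / len(sel)

# In the full cohort (which the real studies could not observe, because
# outcome data existed only for live births) there is no association:
rd_all = risk(data, True, False) - risk(data, False, False)

# Among selected (live) births, an association appears despite the null effect:
rd_selected = risk(data, True, True) - risk(data, False, True)
```

With these numbers the full-cohort risk difference is approximately zero while the risk difference among live births is clearly positive, so a genuinely inert supplement would appear harmful. The point is structural, not numerical: any factor affected by both intervention and outcome will behave this way when selection depends on it.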
If participants are not followed from assignment of the intervention (inception), as they would be in a randomized trial, then a period of follow-up has been excluded, and individuals who experienced the outcome soon after starting the intervention will be missing from analyses. Selection bias may also arise because of missing data due to, among other reasons, attrition (loss to follow-up), missed appointments, incomplete data collection and exclusion of participants from analysis by the primary investigators. In NRSI, data may be missing for baseline characteristics (including interventions received or baseline confounders), for pre-specified co-interventions, for outcome measurements, for other variables involved in the analysis, or a combination of these. Specific considerations for missing data broadly follow those established for randomized trials and described in the RoB 2 tool for randomized trials (see Chapter 8).

25.2.3 Information bias

Bias may be introduced if intervention status is misclassified, or if outcomes are misclassified or measured with error. Such bias is often referred to as information bias or measurement bias. Errors in classification (or measurement) may be non-differential or differential, and in general we are more concerned about such errors when they are differential. Differential misclassification of intervention status occurs when misclassifications are related to subsequent outcome or to risk of the outcome. Differential misclassification (or measurement error) in outcomes occurs when it is related to intervention status. Misclassification of intervention status is seldom a problem in randomized trials and other experimental studies, because interventions are actively assigned by the researcher and their accurate recording is a key feature of the study. However, in observational studies information about interventions allocated or received must be ascertained.
To prevent differential misclassification of intervention status it is important that, wherever possible, interventions are defined and categorized without knowledge of subsequent outcomes. A well-known example of differential misclassification, in which knowledge of subsequent outcomes might affect classification of interventions, is recall bias in a case-control study: cases may be more likely than controls to recall potentially important events or report exposure to risk factors they believe to be responsible for their disease. Differential misclassification of intervention status can also occur in cohort studies if intervention status is ascertained retrospectively. This can happen if information (or availability of information) on intervention status is influenced by outcomes: for example, in a cohort study in elderly people in which the outcome is dementia, participants’ recall of past intervention status at study inception may be affected by pre-existing mild cognitive impairment. Such problems can be avoided if information about intervention status is collected at the time of the intervention and the information is complete and accessible to those undertaking the NRSI. Bias in measurement of the outcome is often referred to as detection bias. Examples of situations in which such bias can arise are if (i) outcome assessors are aware of intervention status (particularly when assessment of the outcome is subjective); (ii) different methods (or intensities of observation) are used to assess outcomes in the different intervention groups; and (iii) measurement errors are related to intervention status (or to a confounder of the intervention-outcome relationship). Blinding of outcome assessors aims to prevent systematic differences in measurements between intervention groups, but is frequently not possible or not performed in NRSI.
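The contrast between non-differential and differential outcome misclassification can be made concrete with a deterministic expected-value calculation (a sketch with hypothetical sensitivities and specificities, not data from any study):

```python
# Expected observed risks under outcome misclassification (hypothetical
# sensitivities/specificities; expected values, not a simulation).
def observed_risk(true_risk, sensitivity, specificity):
    # P(recorded event) = sens * P(event) + (1 - spec) * P(no event)
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

true_risk_exp, true_risk_comp = 0.30, 0.20   # true risk difference = 0.10

# Non-differential: the same error rates in both groups.
# The observed risk difference shrinks towards the null (0.075 here).
nd_rd = (observed_risk(true_risk_exp, 0.8, 0.95)
         - observed_risk(true_risk_comp, 0.8, 0.95))

# Differential: outcomes sought more intensively in the experimental group
# (higher sensitivity there). The bias can go in either direction; here the
# observed risk difference is inflated away from the null (0.140).
diff_rd = (observed_risk(true_risk_exp, 0.95, 0.95)
           - observed_risk(true_risk_comp, 0.7, 0.95))
```

This is why point (ii) above, different intensities of observation between intervention groups, is singled out: it makes the measurement error differential, and the resulting bias can exaggerate rather than merely dilute the effect estimate.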
25.2.4 Reporting bias

Concerns over selection of the reported results from NRSI reflect the same concerns as for randomized trials (see Chapter 7 and Chapter 8, Section 8.7). Selective reporting typically arises from a desire for findings to be newsworthy, or sufficiently noteworthy to merit publication: this could be the case if previous evidence (or a prior hypothesis) is either supported or contradicted. Although there is less empirical evidence about selective reporting in NRSI than in randomized trials, it is difficult to imagine that the problem is any less serious for NRSI. Many NRSI do not have written protocols, and many are exploratory, so – by design – involve inspecting many associations between intervention and outcome. Selection of the reported result will lead to bias if it is based on the P value, magnitude or direction of the intervention effect estimate. Bias due to selection of the outcome measure occurs when an effect estimate for a particular outcome is selected from among multiple measurements, for example when a measurement is made at a number of time points or using multiple scales. Bias due to selection of the analysis occurs when the reported results are selected from intervention effects estimated in multiple ways, such as analyses of both change scores and post-intervention scores adjusted for baseline, or multiple analyses with adjustment for different sets of potential confounders. Finally, there may be selective reporting of results for a subgroup of participants, selected from a larger NRSI, on the basis of a more interesting finding. The separate issue of bias due to missing results, where non-reporting of study outcomes or whole studies is related to the P value, magnitude or direction of the intervention effect estimate, is addressed outside the framework of the ROBINS-I tool, and is described in detail in Chapter 13.
25.3 The ROBINS-I tool

25.3.1 At protocol stage: listing the confounding domains and the possible co-interventions

Review authors planning a ROBINS-I assessment should list important confounding domains in their protocol. Relevant confounding domains are the prognostic factors (predictors of the outcome) that also predict whether an individual receives one or the other intervention of interest. Review authors are also encouraged to list important co-interventions in their protocol. Relevant co-interventions are the interventions or exposures that individuals might receive after or with initiation of the intervention of interest, which are related to the intervention received and which are prognostic for the outcome of interest. Co-interventions are therefore a type of confounder, which we consider separately to highlight their importance. Important confounders and co-interventions are likely to be identified both through the knowledge of subject-matter experts who are members of the review team, and through initial (scoping) reviews of the literature. Discussions with health professionals who make intervention decisions for the target patient or population groups may also be helpful. Assessment of risk of bias may, for some domains, rely heavily on expert opinion rather than empirical data: this means that consensus may not be reached among experts with different opinions. Nonetheless, use of ROBINS-I should help structure discussions about risk of bias and make disagreements explicit.

25.3.2 Specifying a target trial specific to the study

ROBINS-I requires that review authors explicitly identify the interventions that would be compared in the hypothetical target trial that the NRSI is trying to emulate (see Section 25.1.1). Often the description of these interventions will require subject-matter knowledge, because information provided by the investigators of the observational study is insufficient to define the target trial.
For example, NRSI authors may refer to ‘use of therapy [A]’, which does not directly correspond to the intervention ‘prescribe therapy [A]’ that would be tested in an intention-to-treat analysis of the target trial. Meaningful assessment of risk of bias is problematic in the absence of well-defined interventions.

25.3.3 Specifying the nature of the effect of interest

In the target trial, the effect of interest will be either the effect of assignment to the interventions at baseline, regardless of the extent to which the interventions were received as intended, or the effect of adhering to the interventions as specified in the study protocol (see Chapter 8, Section 8.2.2). Risk of bias will be assessed in relation to one of these effects. The choice of effect of interest is a decision of the review authors. However, it may be influenced by the analyses that produced the NRSI result being assessed, because the result may correspond more closely to one of the effects of interest and would, therefore, be at greater risk of bias with respect to the alternative effect of interest. In a randomized trial, these two effects may be interpreted as the intention-to-treat (ITT) effect and the per-protocol effect (see also Chapter 8, Section 8.2.2). Analogues of these effects can be defined for NRSI. For example, the ITT effect can be approximated by the effect of prescribing experimental intervention versus prescribing comparator intervention. When prescription information is not available, the ITT effect can be approximated by the effect of starting the experimental intervention versus starting the comparator intervention, which corresponds to the ITT effect in a trial in which participants assigned to an intervention always start the intervention. An analogue of the effect of adhering to the intervention as described in the trial protocol is (starting and) adhering to experimental intervention versus (starting and) adhering to comparator intervention unless medical reasons (e.g.
toxicity) indicate discontinuation. For both NRSI and randomized trials, unbiased estimation of the effect of adhering to sustained interventions (interventions that continue over time, such as daily ingestion of a drug) requires appropriate adjustment for prognostic factors (‘time-varying confounders’) that predict deviations from the intervention after the start of follow-up (baseline). Review authors should seek specialist advice when assessing intervention effects estimated using methods that adjust for time-varying confounding. When the effect of interest is that of assignment to the intervention (or starting intervention at baseline), risk-of-bias assessments need not be concerned with post-baseline deviations from intended interventions that reflect the natural course of events. For example, a departure from an allocated intervention that was clinically necessary because of a sudden worsening of the patient’s condition does not lead to bias. The only post-baseline deviations that may lead to bias are the potentially biased actions of researchers arising from the experimental context. Observational studies estimating the effect of assignment to intervention from routine data should therefore raise no concerns about post-baseline deviations from intended interventions. By contrast, when the effect of interest is that of adhering to the intended intervention, risk-of-bias assessments of both NRSI and randomized trials should consider post-baseline deviations from the intended interventions, including lack of adherence and differences in additional interventions (co-interventions) between intervention groups.

25.3.4 Domains of bias

The domains included in ROBINS-I cover all types of bias that are currently understood to affect the results of NRSI. Each domain is mandatory, and no additional domains should be added. Table 25.3.a lists the bias domains covered by the tool for most types of NRSI.
Versions of the tool are available, or in development, for several types of NRSI, and the variant selected should be appropriate to the key features of the study being assessed (see latest details at www.riskofbias.info). In common with RoB 2 (Chapter 8, Section 8.2.3), the tool comprises, for each domain:

- a series of signalling questions;
- a judgement about risk of bias for the domain;
- free-text boxes to justify responses to the signalling questions and the risk-of-bias judgement; and
- an optional judgement about the predicted direction of bias.
The signalling questions aim to elicit information relevant to the risk-of-bias judgement for the domain, and work in the same way as for RoB 2 (see Chapter 8, Section 8.2.3). The response options are: ‘Yes’; ‘Probably yes’; ‘Probably no’; ‘No’; and ‘No information’.
Based on the responses to the signalling questions, the options for a domain-level risk-of-bias judgement are ‘Low’, ‘Moderate’, ‘Serious’ or ‘Critical’ risk of bias, with an additional option of ‘No information’ (see Table 25.3.b). These differ from the risk-of-bias judgements for the RoB 2 tool (Chapter 8, Section 8.2.3). Note that a judgement of ‘Low risk of bias’ corresponds to the absence of bias in a well-performed randomized trial, with regard to the domain being considered. This category thus provides a reference for risk-of-bias assessment in NRSI, in particular for the ‘pre-intervention’ and ‘at-intervention’ domains. Because of confounding, we anticipate that only rarely will design or analysis features of a non-randomized study lead to a classification of low risk of bias when studying the intended effects of interventions (on the other hand, confounding may be a less serious concern when studying unintended effects of intervention (Institute of Medicine 2012)). By contrast, since randomization does not protect against post-intervention biases, we expect more overlap between assessments of randomized trials and assessments of NRSI for the post-intervention domains. Nonetheless, the absence of features that are usually not feasible in NRSI, such as blinding of participants, health professionals or outcome assessors, may put NRSI at greater risk of post-intervention biases. As for RoB 2, a free-text box alongside the signalling questions and judgements provides space for review authors to present supporting information for each response. Brief, direct quotations from the text of the study report should be used whenever possible. The tool includes an optional component to judge the direction of the bias for each domain and overall. For some domains, the bias is most easily thought of as being towards or away from the null.
For example, suspicion of selective non-reporting of statistically non-significant results would suggest bias away from the null. However, for other domains (in particular confounding, selection bias and forms of measurement bias such as differential misclassification), the bias needs to be thought of as an increase or decrease in the effect estimate, favouring either the experimental intervention or the comparator compared with the target trial, rather than towards or away from the null. For example, confounding bias that decreases the effect estimate would be towards the null if the true risk ratio were greater than 1, and away from the null if the true risk ratio were less than 1. If review authors do not have a clear rationale for judging the likely direction of the bias, they should not attempt to guess it and should leave this response blank.

Table 25.3.a Bias domains included in the ROBINS-I tool
Table 25.3.b Reaching a risk-of-bias judgement for an individual bias domain
25.3.5 Reaching an overall risk-of-bias judgement for a result

The response options for an overall risk-of-bias judgement for a result, across all domains, are the same as for individual domains. Table 25.3.c shows the approach to mapping risk-of-bias judgements within domains to an overall judgement for the outcome. Judging a result to be at a particular level of risk of bias for an individual domain implies that the result has an overall risk of bias at least this severe. For example, a judgement of ‘Serious’ risk of bias within any domain implies that the concerns identified have serious implications for the result overall, irrespective of which domain is being assessed. In practice this means that if the answers to the signalling questions yield a proposed judgement of ‘Serious’ or ‘Critical’ risk of bias, review authors should consider whether any identified problems are of sufficient concern to warrant this judgement for that result overall. If this is not the case, the appropriate action would be to retain the answers to the signalling questions but override the proposed default judgement and provide justification. ‘Moderate’ risk of bias in multiple domains may lead review authors to decide on an overall judgement of ‘Serious’ risk of bias for that outcome or group of outcomes, and ‘Serious’ risk of bias in multiple domains may lead review authors to decide on an overall judgement of ‘Critical’ risk of bias. Once an overall judgement has been reached for an individual study result, this information should be presented in the review and reflected in the analysis and conclusions. For discussion of the presentation of risk-of-bias assessments and how they can be incorporated into analyses, see Chapter 7. Risk-of-bias assessments also feed into one domain of the GRADE approach for assessing certainty of a body of evidence, as discussed in Chapter 14.

Table 25.3.c Reaching an overall risk-of-bias judgement for a specific outcome
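The default ‘at least as severe as the worst domain’ logic described above can be sketched as a small function. This is a simplification: the discretionary escalation of multiple ‘Moderate’ or ‘Serious’ judgements, and the precise handling of ‘No information’ in Table 25.3.c, remain review-author decisions and are not encoded here.

```python
# A sketch of the default mapping from domain-level ROBINS-I judgements to
# an overall judgement: the overall risk of bias is at least as severe as
# the most severe domain-level judgement. The placement of 'No information'
# between 'Moderate' and 'Serious' is a simplification of Table 25.3.c,
# which applies 'No information' overall only when no domain is at Serious
# or Critical risk of bias.
SEVERITY_ORDER = ['Low', 'Moderate', 'No information', 'Serious', 'Critical']

def overall_judgement(domain_judgements):
    for level in domain_judgements:
        if level not in SEVERITY_ORDER:
            raise ValueError(f'unknown judgement: {level!r}')
    return max(domain_judgements, key=SEVERITY_ORDER.index)
```

For example, `overall_judgement(['Low', 'Moderate', 'Low'])` returns `'Moderate'`, and a single ‘Critical’ domain makes the result ‘Critical’ overall regardless of the other domains.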
25.4 Risk of bias in follow-up (cohort) studies

As discussed in Chapter 24 (Section 24.2), labels such as ‘cohort study’ can be inconsistently applied and encompass many specific study designs. For this reason, these terms are generally discouraged in Cochrane Reviews in favour of using specific features to describe how the study was designed and analysed. For the purposes of ROBINS-I, we define a category of studies, referred to as follow-up studies, in which participants are followed from the start of intervention up to a later time for ascertainment of outcomes of interest. This category includes inception cohort studies (in which participants are identified at the start of intervention), non-randomized controlled trials, many analyses of routine healthcare databases, and retrospective cohort studies. The issues covered by ROBINS-I for follow-up studies are summarized in Table 25.4.a. A distinctive feature of a ROBINS-I assessment of follow-up studies is that it addresses both baseline confounding (the most familiar type) and time-varying confounding. Baseline confounding occurs when one or more pre-intervention prognostic factors predict the intervention received at the start of follow-up. A pre-intervention variable is one that is measured before the start of the interventions of interest. For example, a cohort study comparing two antiretroviral drug regimens for HIV should control for CD4 cell count measured before the start of antiretroviral therapy, because this is strongly prognostic for the outcomes AIDS and death, and is also likely to influence choice of regimen. Baseline confounding is likely to be an issue in most NRSI. In some NRSI, particularly those based on routinely collected data, participants switch between the interventions being compared over time, and the follow-up time from these individuals is divided between the intervention groups according to the intervention received at any point in time.
If post-baseline prognostic factors affect the interventions to which the participants switch, then this can lead to time-varying confounding. For example, suppose a study of patients treated for HIV partitions follow-up time into periods during which patients were receiving different antiretroviral regimens and compares outcomes during these periods in the analysis. Post-baseline CD4 cell counts might influence switches between the regimens of interest. When such post-baseline prognostic variables are affected by the interventions themselves (e.g. antiretroviral regimen may influence post-baseline CD4 count), we say that there is treatment-confounder feedback. This implies that conventional adjustment (e.g. Poisson or Cox regression models) is not appropriate as a means of controlling for time-varying confounding. Other post-baseline prognostic factors, such as adverse effects of an intervention, may also predict switches between interventions. Note that a change from the baseline intervention may result in switching to an intervention other than the alternative of interest in the study (i.e. from experimental intervention to something other than the comparator intervention, or from comparator intervention to something other than the experimental intervention). If follow-up time is re-allocated to the alternative intervention in the analysis that produced the result being assessed for risk of bias, then there is a potential for bias arising from time-varying confounding. If follow-up time was not allocated to the alternative intervention, then the potential for bias is considered either (i) under the domain ‘Bias due to deviations from intended interventions’ if interest is in the effect of adhering to intervention and the follow-up time on the subsequent intervention is included in the analysis, or (ii) under ‘Bias due to missing data’ if the follow-up time on the subsequent intervention is excluded from the analysis. 
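Because conventional regression adjustment fails in the presence of treatment-confounder feedback, analyses of this kind typically use inverse probability of treatment weighting, the device behind marginal structural models. The mechanics can be sketched as follows (an illustration, not an analysis: the interval-specific treatment probabilities would in practice be estimated from a model for treatment given covariate and treatment history, whereas here they are supplied directly as hypothetical values).

```python
# A sketch of inverse probability of treatment weighting for time-varying
# confounding. Each person-interval is weighted by the inverse of the
# probability of the treatment actually received in that interval, given
# the participant's covariate and treatment history; a participant's
# weight at any interval is the running product over earlier intervals.
def ip_weights(intervals):
    """intervals: list of (treated, p_treated_given_history) tuples, in
    time order, for one participant. Returns the cumulative weight at the
    end of each interval."""
    weights, w = [], 1.0
    for treated, p_treat in intervals:
        p_received = p_treat if treated else 1.0 - p_treat
        w *= 1.0 / p_received   # up-weight person-time that was unlikely given history
        weights.append(w)
    return weights

# Hypothetical participant who stays on treatment even though a falling
# CD4 count made continuation unlikely (probability 0.25) in interval 2:
w = ip_weights([(True, 0.8), (True, 0.25)])   # cumulative weights 1.25, then 5.0
```

The up-weighting of person-time that was improbable given the evolving prognostic history is what breaks the link between post-baseline confounders and treatment in the weighted data; outcome contrasts are then computed in that weighted population. As noted above, review authors should seek specialist advice when assessing results produced by such methods.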
Table 25.4.a Bias domains included in the ROBINS-I tool for follow-up studies, with a summary of the issues addressed
25.5 Risk of bias in uncontrolled before-after studies (including interrupted time series)

In some studies, measurements of the outcome variable are made both before and after an intervention takes place. The measurements may be made on individuals, clusters of individuals, or administrative entities, according to the unit of analysis of the study. There may be only one unit, several units or many units. Here, we consider only uncontrolled studies in which all units contributing to the analysis received the (same) intervention. Controlled versions of these studies are covered in Section 25.6.

This category of studies includes interrupted time series (ITS) studies (Kontopantelis et al 2015, Polus et al 2017). ITS studies collect longitudinal data measured at an aggregate level (across participants within one or more units), with several measurement times before implementation of the intervention and several measurement times after it. These studies might be characterized as uncontrolled, repeated cross-sectional designs, where the population of interest may be defined geographically or through interaction with a health service, and measures of activity or outcomes may include different individuals at each time point. A specific time point known as the ‘interruption’ defines the distinction between ‘before’ (or ‘pre-intervention’) and ‘after’ (or ‘post-intervention’) time points. Specifying the exact time of this interruption can be challenging, especially when an intervention has many phases or when periods of preparation of the intervention may result in progressive changes in outcomes (e.g. when there are debates and processes leading to a new law or policy). The data from an ITS are typically a single time series, and may be analysed using time series methods (e.g. ARIMA models).
In an ITS analysis, the ‘comparator group’ is constructed by making assumptions about the trajectory of outcomes had there been no intervention (or interruption), based on patterns observed before the intervention. The intervention effect is estimated by comparing the observed outcome trajectory after the intervention with the assumed trajectory had there been no intervention.

This category also includes studies in which multiple individuals are each measured before and after receiving an intervention; there may be several pre- and post-intervention measurements. These studies might be characterized as uncontrolled, longitudinal designs (alternatively, they may be referred to as repeated measures studies, before-after studies, pre-post studies or reflexive control studies). One special case is a study with a single pre-intervention outcome measurement and a single post-intervention outcome measurement for each of multiple participants. Such a study will usually be judged to be at serious or critical risk of bias, because it is impossible to determine whether pre-post changes are due to the intervention rather than to other factors.

The main issues addressed in a ROBINS-I evaluation of an uncontrolled before-after study are summarized below and in Table 25.5.a. We address issues only for the effect of assignment to intervention, since we do not expect uncontrolled before-after studies to examine the effect of starting and adhering to the intended intervention.
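The logic of comparing the observed post-intervention trajectory with an assumed ‘no intervention’ trajectory can be sketched with a minimal segmented regression, one common alternative to the ARIMA models mentioned above. The outcome series below is simulated (noise-free) purely for illustration; the variable names are not from any real study.

```python
# Minimal segmented-regression sketch of an interrupted time series (ITS)
# analysis. All data are simulated for illustration only.
import numpy as np

n_pre, n_post = 12, 12                         # monthly measurements
t = np.arange(n_pre + n_post, dtype=float)     # time index
post = (t >= n_pre).astype(float)              # 1 after the 'interruption'
t_post = np.where(post == 1, t - n_pre, 0.0)   # time since interruption

# Simulated outcome: a stable pre-intervention trend, then an immediate
# drop of 5 units when the intervention is introduced.
y = 50 + 0.5 * t - 5 * post

# Design matrix: intercept, pre-intervention trend, level change, trend
# change. The fitted pre-intervention trend supplies the assumed
# 'no intervention' trajectory against which post-intervention
# observations are compared.
X = np.column_stack([np.ones_like(t), t, post, t_post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change = beta[2]    # recovers the simulated drop of -5
print(round(float(level_change), 2))
```

In a real ITS analysis the pre-intervention pattern is estimated with error, and autocorrelation between successive measurements usually needs to be modelled; the sketch only shows how the counterfactual trajectory enters the comparison.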
Table 25.5.a Bias domains included in the ROBINS-I tool for (uncontrolled) before-after studies, with a summary of the issues addressed
25.6 Risk of bias in controlled before-after studies

Studies in which: (i) units are non-randomly allocated to a group that receives an intervention or to an alternative group that receives nothing or a comparator intervention; and (ii) at least one measurement of the outcome variable is made in both groups before and after implementation of the intervention, are often known as controlled before-after studies (CBAs) (Eccles et al 2003, Polus et al 2017). The comparator group(s) may or may not be contemporaneous. This category also includes controlled interrupted time series (CITS) studies (Lopez Bernal et al 2018). The units included in the study may be individuals, clusters of individuals, or administrative units. The intervention may be at the level of the individual unit or at some aggregate (cluster) level. Studies may follow the same units over time (sometimes referred to as within-person or within-unit longitudinal designs) or look at (possibly) different units at the different time points (sometimes referred to as repeated cross-sectional designs, where the population of interest may be defined geographically or through interaction with a health service, and may include different individuals over time).

A common analysis of CBA studies is a ‘difference in differences’ analysis, in which before-after differences in the outcome (possibly averaged over multiple units) are contrasted between the intervention and comparator groups. The outcome measurements before and after intervention may be single observations, means, or measures of trend or pattern. The assumption underlying such an analysis is that the before-after change in the intervention group is equivalent to the before-after change in the comparator group, except for any causal effects of the intervention; that is, that the pre-post intervention difference in the comparator group reflects what would have happened in the intervention group had the intervention not taken place.
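The difference-in-differences calculation described above can be sketched in a few lines; the group means are hypothetical numbers chosen purely for illustration.

```python
# Minimal difference-in-differences sketch for a controlled before-after
# study. The group means below are invented for illustration only.
pre_intervention, post_intervention = 20.0, 12.0   # intervention group means
pre_comparator, post_comparator = 19.0, 17.0       # comparator group means

change_intervention = post_intervention - pre_intervention  # -8.0
change_comparator = post_comparator - pre_comparator        # -2.0

# The estimated effect is the difference in before-after changes. This
# assumes the comparator group's change reflects what would have happened
# in the intervention group had the intervention not taken place.
did_estimate = change_intervention - change_comparator
print(did_estimate)   # -6.0
```

If that ‘parallel change’ assumption fails (e.g. the groups were on different trajectories before the intervention), the estimate is confounded, which is why ROBINS-I scrutinizes the comparability of the groups’ pre-intervention patterns.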
The main issues addressed in a ROBINS-I evaluation of a controlled before-after study are summarized below and in Table 25.6.a.
Table 25.6.a Bias domains included in the ROBINS-I tool for controlled before-after studies, with a summary of the issues addressed
25.7 Chapter information

Authors: Jonathan AC Sterne, Miguel A Hernán, Alexandra McAleenan, Barnaby C Reeves, Julian PT Higgins

Acknowledgements: ROBINS-I was developed by a large collaborative group, and we acknowledge the contributions of Jelena Savović, Nancy Berkman, Meera Viswanathan, David Henry, Douglas Altman, Mohammed Ansari, Rebecca Armstrong, Isabelle Boutron, Iain Buchan, James Carpenter, An-Wen Chan, Rachel Churchill, Jonathan Deeks, Roy Elbers, Atle Fretheim, Jeremy Grimshaw, Asbjørn Hróbjartsson, Jemma Hudson, Jamie Kirkham, Evan Kontopantelis, Peter Jüni, Yoon Loke, Luke McGuinness, Jo McKenzie, Laurence Moore, Matt Page, Theresa Pigott, Stephanie Polus, Craig Ramsay, Deborah Regidor, Eva Rehfuess, Hannah Rothstein, Lakhbir Sandhu, Pasqualina Santaguida, Holger Schünemann, Beverley Shea, Sasha Shepperd, Ian Shrier, Hilary Thomson, Peter Tugwell, Lucy Turner, Jeffrey Valentine, Hugh Waddington, Elizabeth Waters, George Wells, Penny Whiting and David Wilson.

Funding: Development of ROBINS-I was funded by a Methods Innovation Fund grant from Cochrane and by Medical Research Council (MRC) grant MR/M025209/1. JACS, BCR and JPTH are members of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol, the NIHR Collaboration for Leadership in Applied Health Research and Care West (CLAHRC West) at University Hospitals Bristol NHS Foundation Trust, and the MRC Integrative Epidemiology Unit at the University of Bristol. JACS and JPTH received funding from NIHR Senior Investigator awards NF-SI-0611-10168 and NF-SI-0617-10145, respectively. JPTH and AM are funded in part by Cancer Research UK (grant C18281/A19169). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health, the MRC or Cancer Research UK.

25.8 References

Eccles M, Grimshaw J, Campbell M, Ramsay C.
Research designs for studies evaluating the effectiveness of change and improvement strategies. Quality and Safety in Health Care 2003; 12: 47–52.

Hernán MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. American Journal of Epidemiology 2002; 155: 176–184.

Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology 2016; 183: 758–764.

Institute of Medicine. Ethical and Scientific Issues in Studying the Safety of Approved Drugs. Washington (DC): The National Academies Press; 2012.

Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ 2015; 350: h2750.

Lopez Bernal J, Cummins S, Gasparrini A. The use of controls in interrupted time series studies of public health interventions. International Journal of Epidemiology 2018; 47: 2082–2093.

Polus S, Pieper D, Burns J, Fretheim A, Ramsay C, Higgins JPT, Mathes T, Pfadenhauer LM, Rehfuess EA. Heterogeneity in application, design, and analysis characteristics was found for controlled before-after and interrupted time series studies included in Cochrane reviews. Journal of Clinical Epidemiology 2017; 91: 56–69.

Schünemann HJ, Tugwell P, Reeves BC, Akl EA, Santesso N, Spencer FA, Shea B, Wells G, Helfand M. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Research Synthesis Methods 2013; 4: 49–62.
Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG, Ansari MT, Boutron I, Carpenter JR, Chan AW, Churchill R, Deeks JJ, Hróbjartsson A, Kirkham J, Jüni P, Loke YK, Pigott TD, Ramsay CR, Regidor D, Rothstein HR, Sandhu L, Santaguida PL, Schünemann HJ, Shea B, Shrier I, Tugwell P, Turner L, Valentine JC, Waddington H, Waters E, Wells GA, Whiting PF, Higgins JPT. ROBINS-I: a tool for assessing risk of bias in non-randomized studies of interventions. BMJ 2016; 355: i4919.

Velie EM, Shaw GM. Impact of prenatal diagnosis and elective termination on prevalence and risk estimates of neural tube defects in California, 1989–1991. American Journal of Epidemiology 1996; 144: 473–479.