Do sample size calculations in longitudinal orthodontic trials use the advantages of this study design? A meta-epidemiological study
ABSTRACT
Objectives
To examine whether optimal calculations of the sample size are being used in longitudinal orthodontic trials.
Materials and Methods
Longitudinal orthodontic trials with a minimum of three time points of outcome assessment published between January 1, 2017, and December 30, 2020, were sourced from a single electronic database. Study characteristics at the level of each trial were extracted independently and in duplicate. Descriptive statistics and summary values were calculated. Inferential statistics (Fisher's exact test and logistic regression) were applied to detect associations between reporting of a sample size calculation and the study characteristics.
Results
A total of 147 trials were analyzed; 75.5% of these trials reported a sample size calculation, with none reporting an optimal sample size calculation for longitudinal trials. Most of the longitudinal orthodontic trials did not report the correlation and the number of longitudinal measurements in calculating the sample size. An association between reporting of a sample size calculation (yes or no) and the type of journal (orthodontic and non-orthodontic) was detected, with higher odds of reporting a sample size calculation in orthodontic journals than in non-orthodontic journals (odds ratio, 3.04; 95% confidence interval, 1.4–6.59; P < .01).
Conclusions
The findings of this study highlighted that optimal sample size calculations are underused in longitudinal orthodontic trials. Greater awareness of the variables required for undertaking the correct sample size calculation in these trials is required to reduce suboptimal research practices.
INTRODUCTION
Randomized controlled trials (RCTs) are considered the gold standard study design to assess the clinical efficacy and effectiveness of dental and medical health care interventions. For the results of such trials to be considered as trustworthy, the methodology of the study should be clear, transparent, and reproducible.1 A critical step in the design of the RCT is calculating a sample size that provides the study with adequate power to confirm either the presence or absence of a clinically relevant effect.2–4 The importance of trials having an appropriate sample size is further highlighted by the fact that overestimation of the calculated required sample size can lead to additional study costs and resource waste and can potentially result in participants being exposed to ineffective or harmful treatments.5,6 Alternatively, underestimation of the sample size can result in small trials that are considered less reliable and likely to report equivocal results attributed to a lack of study power to detect differences between intervention groups.7,8
To understand the effect of interventions over time within the same individual, longitudinal orthodontic trials involving the repeated measurement of outcomes over several time points are commonly conducted. From a methodological perspective, longitudinal data analysis strengthens any observations of correlations arising from multiple outcome measurements observed within the same patient and crucially helps to differentiate between intraparticipant and interparticipant variability.9 The additional benefits of longitudinal data analysis also include increased study power10 and more efficient management of missing data.11
Typically, the following variables are required to undertake a sample size calculation for a continuous outcome: the mean response for the treatment and control groups, the standard deviation (SD) within each group, the study power (usually set between 80%–90%), and α error (usually assumed as either 1% or 5%).12 However, sample size calculations for longitudinal studies require the consideration of additional variables as the outcome is measured on multiple occasions. These considerations include the number of measurement occasions, the correlation among the repeated measurements, and the research hypothesis to be tested. For example, in a one-group design, the hypothesis is that the change in the outcome is nonzero; in a multigroup design, the hypothesis is that the change across time is different between the groups (a group × time interaction13,14; Table 1).
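The role of these variables can be illustrated with a minimal sketch. It assumes a two-sided comparison of two group means under a normal approximation, and, for the longitudinal case, a main-effect hypothesis on the average of m equally correlated follow-up measurements (compound symmetry). The function names, the input values (a difference of 1 unit, an SD of 2, four occasions, and a correlation of 0.5), and the choice of hypothesis are illustrative assumptions, not values or methods taken from the trials reviewed here; a group × time interaction hypothesis would need a different calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Unrounded participants per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def repeated_measures_factor(m, rho):
    """Variance factor for the mean of m measurements with a common correlation rho.

    Assumes compound symmetry: Var(mean) = sd^2 * (1 + (m - 1) * rho) / m.
    """
    return (1 + (m - 1) * rho) / m


# Illustrative (hypothetical) inputs: difference 1, SD 2, alpha 5%, power 80%.
single = n_per_group(delta=1.0, sd=2.0)
# Same inputs, but the outcome is the average of 4 occasions with correlation 0.5.
longitudinal = single * repeated_measures_factor(m=4, rho=0.5)

print(ceil(single))        # 63 per group with a single measurement
print(ceil(longitudinal))  # 40 per group when averaging 4 correlated measurements
```

Under these assumptions, the required sample size falls when the repeated measurements are used, and the size of the reduction depends directly on the number of occasions and on the assumed correlation.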

The Consolidated Standards of Reporting Trials statement aims to enhance transparency in both the reporting and conduct of RCTs.15 It has been reported that its recommendations facilitate both the accurate assessment of the quality of the study and the correct interpretation of the results.6 Despite this, concerns regarding the optimal reporting of sample size calculations within the literature have been raised. Within both dental and orthodontic RCTs, the completeness of sample size calculations has been reported at 29.3%16 and 29.5%,12 respectively, and appears to be influenced by certain trial characteristics.16 However, it is unknown whether these same issues exist in trials of longitudinal design. The aim of this investigation was to assess whether optimal methods have been used for sample size calculations and the completeness of the reporting of the required information for sample size calculations undertaken in longitudinal orthodontic trials.
MATERIALS AND METHODS
Eligibility Criteria
Orthodontic trials published between January 1, 2017, and December 30, 2020, were sourced. An orthodontic trial in this study was defined as either an RCT or a controlled nonrandomized clinical trial (CCT) of longitudinal design with three or more outcome collection time points and with two or more treatment intervention arms. Non-orthodontic studies and studies in a language other than English were excluded.
Search and Selection of Studies
A search of a single electronic database (Medline via PubMed) was undertaken by one author (Dr Mheissen) in August 2021 using a search strategy and filters (Supplemental Appendix 1). Initial screening of potentially eligible studies was performed independently by two authors (Drs Mheissen and Khan). During the selection of articles, two authors (Drs Mheissen and Khan) scrutinized the full text of the selected studies against the inclusion criteria. In the presence of any disagreements, a third author (Prof Pandis) was consulted, and a consensus was reached after discussion.
Data Extraction
All study characteristics were extracted independently by two authors (Drs Mheissen and Khan), whereas the information relating to the sample size calculation was extracted after calibration by a single author (Dr Mheissen) and entered into a prepiloted data-collection sheet (Microsoft Excel®, Redmond, Wash). At the trial level, the following characteristics were extracted: continent of the first author (Europe, Americas, or Asia and other), year of publication, journal type (orthodontic or non-orthodontic), study type (RCT, non-RCT), study design (split mouth, parallel, and crossover), and number of trial arms. The variables required for the sample size calculation for longitudinal studies, which have been previously described,13 are shown in Table 1. In addition, the number of groups used in the calculation, number of repeated measurements, software package used for sample size calculation, statistical tests used for sample size calculation, value of the correlation between repeated measures, and involvement of a statistician were recorded. The involvement of a statistician was inferred if there was any mention in the full text of a “statistician” or “data analyst.”
Statistical Analysis
Descriptive statistics were generated for the included studies. Tabulations and inferential statistics using logistic regression were applied to detect associations between the reporting of a sample size calculation (yes or no) and the study characteristics. All statistical analyses were conducted using Stata 16.1 (StataCorp, College Station, Tex).
RESULTS
After excluding the non-orthodontic studies and studies with fewer than three time points, 147 studies were included for full data extraction (Figure 1). In this cohort, 134 (91.2%) were RCTs and 13 (8.8%) were CCTs; 69.4% were parallel designs, whereas 25.6% were split-mouth designs. The greatest number of trials was published in 2019 (33.3%) and in orthodontic specialty journals (61.9%). The median number of time points was four (range, 3–22). Approximately 90% of the studies reported a 1:1 treatment arm allocation ratio. The median number of authors was five (range, 1–11). A sample size calculation was reported in 75.5% of trials (n = 111). In this cohort of 111 trials, 90.1% calculated the sample size based on 5% α error. In 60.4% of these calculations, the selected value for study power was 0.8. The mean difference and SD values were not described in 38.7% of trials. In most trials, the following variables were not reported: number of groups (73.8%), number of measurements (94.6%), software package used (67.6%), statistical tests used for sample size calculation (76.6%), and the value of the correlation between repeated measures (97.3%) (Table 2). Overall, for the study sample, the medians of the effect size (ES), mean difference, and risk difference were 0.55 (interquartile range [IQR], 0.4), 1 (IQR, 4.5), and 0.2 (IQR, 0.23), respectively. The reporting of the involvement of a statistician or statistical advisor was 15.3% and 10.8%, respectively. An association between reporting of a sample size calculation (yes or no) and the type of journal (orthodontic and non-orthodontic) was detected, with the odds of reporting a sample size calculation in orthodontic journals 3.04 times higher compared with non-orthodontic journals (95% confidence interval [CI], 1.4–6.59; P < .01; Table 3). Importantly, none of the trials (0/111) included in this sample were judged to have reported the complete variables to undertake a sample size calculation in longitudinal orthodontic trials.
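The reported odds ratio and its confidence interval can be read together with a short consistency check. The sketch below assumes the interval is a Wald interval from the logistic regression, symmetric on the log-odds scale with a critical value of 1.96; this is an assumption, because the article does not state how the interval was constructed. Only the three reported numbers (3.04, 1.4, and 6.59) are used, and the variable names are illustrative.

```python
from math import exp, log
from statistics import NormalDist

# Values reported in the Results for journal type (orthodontic vs non-orthodontic).
odds_ratio, ci_low, ci_high = 3.04, 1.4, 6.59
z_crit = 1.96  # assumed critical value for a 95% Wald interval

# Under the Wald assumption the interval is symmetric around log(OR).
se_log_or = (log(ci_high) - log(ci_low)) / (2 * z_crit)
midpoint = exp((log(ci_low) + log(ci_high)) / 2)

# Two-sided P value implied by the reported estimate and the recovered standard error.
z_stat = log(odds_ratio) / se_log_or
p_value = 2 * (1 - NormalDist().cdf(z_stat))

print(round(se_log_or, 3), round(midpoint, 2), p_value < 0.01)
```

Under this assumption, the geometric midpoint of the interval lies close to the reported estimate and the implied P value is below .01, in line with the reported result.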



Citation: The Angle Orthodontist 92, 3; 10.2319/091321-707.1


DISCUSSION
The benefits of assessing the effect of interventions over time within the same individual include an increase in study power,9,10 efficient management of missing data,11 and increased confidence in any observations of correlations arising from multiple outcome measurements. However, in relation to the variables (Table 1)13 required to undertake the sample size calculation in longitudinal orthodontic studies, it appears that this calculation is not evident in any of the trial reports included in this study. This leads to the question of whether optimal study design methods are being employed in these study types.
Of the trials, 75.5% (n = 111) reported the sample size calculation. Interestingly, the reporting of sample size calculation was higher in orthodontic specialty journals. Most of the included trials (76.6%) did not report the test used for data analysis, whereas only 3.6% reported using repeated-measures ANOVA and 0.9% reported using ANCOVA. Both tests are considered suitable analyses in longitudinal study designs. Consistent with previous investigations of sample size calculation reporting in orthodontic trials,12 the most frequent α error value and study power were 0.05 and 0.8, respectively. Of the included trials, 22.5% reported using only an arbitrary value of the ES, which is not based on mean or risk difference17; however, increasing the ES value may result in decreasing the sample size.18 As such, the ES should be calculated from a previous study rather than reporting an arbitrary value of ES.
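The sensitivity of the sample size to the chosen ES can be shown with a brief sketch. It assumes a standardized ES (mean difference divided by SD), a two-sided two-group comparison, and a normal approximation; the function name is illustrative. The value 0.55 is the median ES reported in the Results, whereas 0.3 and 0.8 are arbitrary comparison values chosen for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_from_es(es, alpha=0.05, power=0.80):
    """Participants per group for a standardized effect size (normal approximation)."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * z_total ** 2 / es ** 2)


# 0.55 is the median ES in this sample; 0.3 and 0.8 are arbitrary comparison values.
for es in (0.3, 0.55, 0.8):
    print(es, n_per_group_from_es(es))
# 0.3 -> 175, 0.55 -> 52, 0.8 -> 25 participants per group
```

Because the required sample size is inversely proportional to the square of the ES, an optimistic arbitrary ES can reduce the calculated sample size several-fold.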
Despite the perceived benefits of longitudinal study designs, sample size calculations for such study designs present specific challenges as repeated measurements over time from the same individual within a study are correlated. Fundamentally, an assumption has to be made regarding the expected correlation pattern in the repeated measurements and a software package has to be used to allow management of such patterns.19 The first consideration in the sample size calculation is the selection of the appropriate hypothesis of the study. Two possible options that can be considered, although not limited to these, are (1) treatment-by-time interaction hypothesis, which tests whether the trend of the response variable across time is the same between the intervention groups, and (2) main effect hypothesis, which tests the effect of the particular predictor variable averaged across other factors.13
The next challenge relates to the variances (or SDs) and the correlations that are expected for the repeated measurements. For instance, for four repeated measurements, four variance or SD values and six correlation values (nc) are required: nc = p × (p – 1) ÷ 2, where p is the number of measures.19 Importantly, to calculate an accurate sample size, these values should be closely matched to the values expected to be observed in the data.20 Although correlations of the measures are crucial in this calculation and ranged between 0.4 and 0.6,21 only three included trials reported constant values of correlation, whereas none of the included studies reported declines of the correlation over time, indicating no use of the rule-based pattern. The rule-based pattern requires two correlation parameters: the base correlation and the decay rate. Therefore, not all software packages are suitable for calculating sample sizes in repeated-measures designs, as it is imperative that the software can support these assumptions.13 Examples of software packages that are capable of performing this task include PASS (NCSS statistical software, Kaysville, Utah) and the General Linear Mixed Model Power and Sample Size website (GLIMMPSE; University of Colorado Denver, Colo).22 Interestingly, in this study, only one trial cited the use of an appropriate software package in the sample size calculation, with no reporting of correlation values. On this basis, it could be inferred that there is a lack of awareness of the implications, requirements, and importance of sample size calculation of longitudinal orthodontic studies.
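The number of correlations and the difference between a constant and a decaying correlation pattern can be sketched as follows. The decay rule used here, in which the correlation between two occasions equals the base correlation multiplied by the decay rate once for each additional step of separation, is a simple hypothetical rule chosen for illustration. It is not necessarily the rule implemented in PASS or GLIMMPSE, and the function names and the values 0.6 and 0.8 are assumptions.

```python
def n_correlations(p):
    """Number of distinct pairwise correlations among p repeated measures: p(p - 1)/2."""
    return p * (p - 1) // 2


def correlation_matrix(p, base, decay=1.0):
    """Correlation matrix for p occasions under a hypothetical decay rule.

    corr(j, k) = base * decay ** (|j - k| - 1) for j != k.
    decay = 1.0 gives a constant correlation between all occasions.
    """
    return [
        [1.0 if j == k else base * decay ** (abs(j - k) - 1) for k in range(p)]
        for j in range(p)
    ]


print(n_correlations(4))  # 6 correlations for four repeated measurements

constant = correlation_matrix(4, base=0.6)             # every off-diagonal value is 0.6
decaying = correlation_matrix(4, base=0.6, decay=0.8)  # 0.6, 0.48, 0.384 at lags 1 to 3
for row in decaying:
    print([round(value, 3) for value in row])
```

With a constant pattern, a single correlation value describes all six pairs, whereas the decaying pattern needs both parameters, which is why the sample size software must accept them explicitly.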
It is unclear whether the current findings represent a lack of reporting or lack of understanding by researchers of the variables required in the sample size calculation of longitudinal studies. Regarding the lack of reporting, trial authors were not contacted to clarify the sample size calculation undertaken. In addition, journal word count limitations may have also precluded the complete reporting of sample size calculations in longitudinal studies; however, online appendixes can resolve the word count limitation. The importance of evidence-based practice has been emphasized in the literature, but further education, support, and training should be provided, especially during the study design stage to circumvent these issues.23 In the current sample, it appeared that the inclusion of a statistician or methodologist was not routinely employed. This was surprising as the inclusion of a statistician at the trial design stage could be beneficial because this has been correlated with improved reporting of sample size calculations.16 There was evidence that orthodontic studies published in non-orthodontic journals have lower odds of reporting sample size calculations. Perhaps this could be an indication that studies of lower methodological rigor find their way to publication in journals outside orthodontics.
The initial search for relevant studies was undertaken by a single author, which may have introduced selection bias. This is also compounded by the fact that only one database was searched, studies published in a 4-year timeframe were screened, and studies were limited to those published in the English language only. These factors could have resulted in studies that met the eligibility criteria not being identified, hence leading to an underestimation or overestimation of the issue. However, the aim of this investigation was to raise awareness of the problem and not to provide exact estimates of sample size calculation in longitudinal studies. Despite the possible limitations in estimating the problem, the evidence not in favor of optimal sample size calculations in longitudinal studies was overwhelming. To reduce bias, screening of potentially eligible studies, selection of studies, and data extraction were performed independently by two authors with the assistance of a third author to resolve any disagreements.
CONCLUSIONS
- The findings of this study highlighted that the undertaking of optimal sample size calculations in longitudinal orthodontic trials is being underused.
- Greater awareness of the variables required for undertaking the correct sample size calculation in these trials is required to reduce suboptimal research practices.

Figure 1. RCT identification flow diagram (N = 147).