Reproducibility of Characteristics Assessing the Occlusion of Young Adults
ABSTRACT
The aim of the present investigation was to analyze the reproducibility in the assessment of six morphological and three functional characteristics included in a new method evaluating the occlusion in young adults. These characteristics comprised coincidence of midlines, overjet, overbite, canine relationship, crossbite, scissors bite, recurrent deviation on opening, guided lateral excursions, and discrepancy between the centric relation and the intercuspal position. The study was conducted in three stages: (1) five observers assessed the occlusions of five volunteers, (2) seven observers assessed nine volunteers, and (3) five observers assessed nine volunteers. Two calibrated orthodontists were used as references. For numerical variables, the nonparametric method for repeated measurements (Friedman's test) was used to test the significance of differences, while the proportion of agreement was calculated for categorical assessments. The results were analyzed using two precision levels: within a measurement unit/the same category and an acceptable/nonacceptable dichotomy. The magnitude of systematic differences was small and of minor clinical importance except in measurements of recurrent deviation on opening. The proportional agreement for acceptance was good in the assessment of overjet, coincidence of midlines, crossbite, scissors bite, open bite, and discrepancy between the centric relation and the intercuspal position. Moderate agreement was achieved in the assessment of overbite, canine relationship, recurrent deviation on opening, and guided lateral excursions. Among the nonacceptable cases, the agreement ranged from poor to good. The results indicated that noncalibrated observers assess categorical characteristics inconsistently.
INTRODUCTION
Occlusal classifications are descriptive tools used by orthodontists and craniofacial biologists for clinical and research purposes. The usefulness of these classifications has, however, been questioned, mainly because they are found to give inconsistent results.1–3
The reproducibility of classifications has been tested both in clinical settings4–8 and using patient records, such as facial and dental photographs, radiographs, or dental casts.2,9–13 In some studies, clinical data have been combined with data obtained from study models.14,15 Examiners have variously comprised orthodontists,2,3,11–14 orthodontists and other specialists,10 TMD specialists and auxiliary personnel,7,8,16,17 or general practitioners.4,9 In general, the results have shown low consistency in assessments of the tested characteristics,2,4,5,7,11–14,17 but there are findings indicating that specialists can reach an acceptable level of agreement in the assessment of morphological characteristics.3,10 Of functional assessments, on the other hand, only maximal mouth opening has frequently shown high reproducibility.5,6,8,16–20 While some investigators have reported that training and calibration of the examiners results in a high level of agreement,3,7,8,16,17,21 others are sceptical and suggest that these will have only a minor impact on reproducibility.10,14
In Finland, free dental care, including orthodontics, is provided on a population basis up to 18 years of age. The health care system is showing increasing interest in the effectiveness, quality, and efficiency of orthodontic treatment, but there are no satisfactory tools that could be applied in occlusal evaluations. Our research group has been developing a method that could be used to assess the occlusions of young adults when studying the targeting and outcome of orthodontic care. A group of specialists in orthodontics and stomatognathic physiology has selected a set of morphological and functional characteristics that would meet the requirements of the health care system and orthodontic professionals in Finland.22 The aim of the present study was to analyze the reproducibility of the assessment of the selected characteristics.
MATERIALS AND METHODS
The investigation was conducted in three stages. In the first stage, five orthodontists examined five orthodontically treated volunteers. In the second stage, seven observers (three orthodontists, three orthodontically experienced general practitioners, and one inexperienced general practitioner) examined nine orthodontically treated volunteers. In the third stage, five observers (three orthodontists, one experienced general practitioner, and one inexperienced general practitioner) examined a group of nine volunteers including both orthodontically treated and untreated individuals. The examinations were carried out during routine orthodontic follow-up visits or annual dental examinations. In all stages, the volunteers were rated in a random sequence, and informed consent was obtained from all of them.
The reproducibility of the assessment of six morphological and three functional characteristics was evaluated. These characteristics were selected using a modified Delphi process. For each characteristic, a group of specialists in orthodontics and stomatognathic physiology had defined a demarcation line for an acceptable–nonacceptable dichotomy. Overbite, canine relationship, crossbite, scissors bite, and guided lateral excursions were assessed categorically, while numerical measurements were taken for the coincidence of the facial midline and midline of the upper dental arch, overjet, recurrent deviation on opening, and discrepancy between the centric relation (CR) and the intercuspal position (ICP) (Table 1). The CR was defined according to Dawson23 as “the relationship of the mandible to the maxilla when the properly aligned condyle–disk assemblies are in the most superior position against the eminentia, irrespective of tooth position or vertical dimension.” Before each stage, all assessment procedures were demonstrated, and detailed instructions were given to the observers. To achieve the CR, a bimanual manipulation technique of the mandible23 was used during the demonstration. However, the use of this technique was not insisted on; the observers were allowed to use their own methods.

Two orthodontists, who participated in all stages of the study, were calibrated for the assessment of the chosen criteria. During a training session, they independently evaluated 20 dental casts. In case of a disagreement, the cast was reevaluated and the source of disagreement was discussed. Thereafter, both observers clinically assessed the occlusions of 20 randomly selected adolescents. The first five adolescents were assessed together and their recordings were excluded from the analyses. Calibration of other observers was not performed.
Statistical analyses
For the numerical variables, the disagreement between the observers concerning each volunteer was quantified by calculating the average of the absolute values of the differences between every pair of observers. The percentage of pairs in which the absolute value of the difference was not more than 1 mm was also calculated. Because the comparisons involved five to seven observers at a time and the measurements could not be assumed to be normally distributed, the nonparametric method for repeated measurements (Friedman's test) was used to test the significance of differences.24 P-values of less than .05 were interpreted as statistically significant.
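The two pairwise summaries described above can be illustrated with a minimal Python sketch. The function name, the 1 mm tolerance parameter, and the overjet readings are hypothetical and serve only as an example; they are not data or code from the study, whose computations were performed in SAS.

```python
from itertools import combinations


def pairwise_disagreement(measurements_mm, tolerance_mm=1.0):
    """Summarize disagreement among observers for one volunteer.

    measurements_mm: one reading per observer (hypothetical input format).
    Returns (mean absolute pairwise difference in mm,
             percentage of observer pairs differing by <= tolerance_mm).
    """
    # Absolute difference for every unordered pair of observers.
    diffs = [abs(a - b) for a, b in combinations(measurements_mm, 2)]
    mean_abs_diff = sum(diffs) / len(diffs)
    pct_within = 100.0 * sum(d <= tolerance_mm for d in diffs) / len(diffs)
    return mean_abs_diff, pct_within


# Hypothetical overjet readings (mm) from five observers for one volunteer.
readings = [3.0, 3.0, 4.0, 2.0, 5.0]
mean_diff, pct = pairwise_disagreement(readings)
print(f"mean |difference| = {mean_diff:.1f} mm, pairs within 1 mm = {pct:.0f}%")
```

With five observers there are ten pairs per volunteer; in this invented example six of them differ by no more than 1 mm. The significance test itself is not reproduced here; in Python, Friedman's test is available as `scipy.stats.friedmanchisquare`, which takes one sequence of measurements per observer.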
For categorical assessments, the proportion of agreement was used to avoid the pitfalls inherent in the intraclass correlation and the kappa coefficient.21,25,26 Clinically, it is often relevant to be aware of the agreement for both the acceptable and the nonacceptable classifications, especially if there is a low number of observations in one of the categories. Statistical computing was performed using the SAS System for Windows, release 8.1/2000.
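As an illustration of reporting agreement separately for the acceptable and the nonacceptable category, the following Python sketch computes an overall and a category-specific proportion of agreement from observer pairs. The text does not state the exact formula used, so the pairwise pooling and the specific-agreement formula (twice the number of concordant pairs in a category, divided by the total number of ratings in that category across all pairs) are assumptions, and the ratings are hypothetical.

```python
from itertools import combinations


def agreement_proportions(ratings_by_volunteer, category):
    """Pairwise agreement pooled over volunteers (assumed formulation).

    ratings_by_volunteer: list of lists, one rating per observer.
    Returns (overall proportion of agreeing pairs,
             specific agreement for `category`, or None if never used).
    """
    same = total = both = one = 0
    for ratings in ratings_by_volunteer:
        for a, b in combinations(ratings, 2):
            total += 1
            if a == b:
                same += 1
            if a == category and b == category:
                both += 1  # both observers chose the category
            elif a == category or b == category:
                one += 1   # exactly one observer chose the category
    overall = same / total
    specific = 2 * both / (2 * both + one) if (both or one) else None
    return overall, specific


# Hypothetical dichotomous ratings: three volunteers, three observers each.
ratings = [
    ["acceptable", "acceptable", "acceptable"],
    ["acceptable", "acceptable", "nonacceptable"],
    ["nonacceptable", "nonacceptable", "acceptable"],
]
overall, acc = agreement_proportions(ratings, "acceptable")
_, nonacc = agreement_proportions(ratings, "nonacceptable")
print(f"overall={overall:.2f}, acceptable={acc:.2f}, nonacceptable={nonacc:.2f}")
```

In this invented example the specific agreement is clearly lower for the rarer nonacceptable category than for the acceptable one, which shows why a single overall figure can hide poor agreement in a category with few observations.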
RESULTS
At the dichotomous level, the proportion of agreement for acceptance among all observers ranged from moderate to good, while that for the nonacceptable category varied between poor and perfect (Tables 2 through 5). In the acceptable category, the orthodontists achieved a good level of agreement for all numerical variables (Tables 2 and 4).

Although systematic differences were found in numerical measurements among both orthodontists and general practitioners, these differences were of minor clinical importance except in measurements of recurrent deviation on opening. For this characteristic, only 0–22% of volunteers were found to be within one measurement unit (1 mm) by all observers. Further, the mean of the average differences (calculated from the absolute values of differences) was more than twice that of the other criteria (Table 4). Even the measurements made by the calibrated orthodontists indicated a systematic difference at the level of calibration (P = .04). Their measurements fell within one measurement unit in 55% of all examined volunteers (n = 38).

DISCUSSION
In many Finnish health centers, general practitioners, under the supervision of an orthodontist, carry out screening of malocclusions and simple treatment procedures.27 In these cases, a satisfactory level of agreement between the orthodontists and general practitioners is of importance. All orthodontists participating in our study were familiar with the assessments, and their agreement level was considered to represent the level that could be achieved through training. The accuracy of measuring was set to 1 mm, which was considered adequate for measurements taken directly from the mouth. For a number of reasons, the study was conducted in several stages, with relatively few observers participating in each stage. As the assessment took about 6–7 minutes/observer, we suspected that a larger number of repeated examinations could have affected the volunteers' functional status and distorted the results. Furthermore, the time available for the assessment was limited because it took place during an orthodontic follow-up visit or an annual dental examination. This design made it possible to study samples of both orthodontically treated and untreated occlusions and enabled the inclusion of observers with varying orthodontic backgrounds.
Of all assessments, the widest variability was found in measurements of recurrent deviation on opening. This finding is in line with earlier studies, in which the reproducibility of categorically assessed jaw opening patterns has ranged from poor to good.16–18,20,28 It is possible, however, that the high variation in recurrent deviation on opening does not reflect differences in technical management but rather exemplifies the instability of the characteristic.8,17,18,28
As in earlier studies,2,3,11,13,16,17 the classification of canine relationship was found to be ambiguous. Given that the sagittal measurements were reproduced with high precision, it is unlikely that the observed discrepancies in canine classification could be attributed to variation in mandibular position. Instead, it is possible that not all observers were familiar with applying Angle's classification to canines. It is also possible that the observers did not use the same viewing angle when assessing the buccal segment occlusion,29,30 which might explain some of the observed variation. In borderline cases, the differences may have arisen from judgmental variation31 based on differing interpretations of Angle's classes. Practical training, together with clear instructions and well-defined demarcation lines, would probably increase the reproducibility of the classification of this characteristic.
When measured in millimeters, overbite has been shown to have good reproducibility.17 In line with the present results, the agreement in categorical assessments has varied between moderate and good.3,10,13 However, in our study, the percentages of exact agreement (within the same category) indicated a wider variability than was found by Keeling et al.3
CONCLUSIONS
The agreement among all observers concerning the acceptable category was good in the assessment of overjet, coincidence of midlines, crossbite, scissors bite, open bite, and discrepancy between the CR and the ICP. Moderate agreement was achieved in the assessment of overbite, canine relationship, and guided lateral excursions.
In the nonacceptable category, the variability in agreement may partly reflect the low number of observations in this group.
Exact agreement in categorical assessments was highly variable.
The reproducibility of measurements of recurrent deviation on opening was poor, as described by the relatively high mean of the average absolute differences and by the low percentage of pairs within one measurement unit.


Contributor Notes
Corresponding author: Anna-Liisa Svedström-Oristo, Institute of Dentistry, University of Turku, Lemminkäisenkatu 2, FIN 20520 Turku, Finland (anlisve@utu.fi)