Development of Cephalometric Norms Using a Unified Facial and Dental Approach
Objective: To develop a cephalometric determination of anteroposterior skeletal occlusion on the basis of a clinically rational “gold standard” and objectively determined cut points.
Materials and Methods: Pretreatment cephalograms from 10- to 18-year-old Caucasian patients with a normal vertical face dimension were digitized. Facial profile line drawings were judged by orthodontist raters as Class I, II, or III. Subjects who met all inclusion criteria were divided into Class I, Class II, and Class III on the basis of the matched skeletal (facial) and dental occlusion and comprised our gold standard for anteroposterior skeletal occlusions. Cephalometric variables included ANB angle, McNamara analysis, Harvold unit differential, anteroposterior dysplasia index (APDI), and Wits analysis. Half the sample was used to derive skeletal classification norms using receiver operating characteristic (ROC) curves, and half the sample was used to test for diagnostic ability and to compare the diagnoses based on traditional cephalometric norms with the new norms.
Results: Results of the study showed that ANB and McNamara analysis performed well with traditional and ROC-derived norms, whereas Wits, Harvold unit differential, and APDI showed fewer errors in diagnosis with ROC norms compared with traditional norms.
Conclusions: The use of a single set of diagnostic norms for each analysis to distinguish between the skeletal classifications for the 10- to 18-year age group proved to be highly successful for each of the analyses and performed as well as or better than traditional norms based on age and sex.
INTRODUCTION
Cephalometric evaluation of lateral headfilms developed empirically and grew mostly from a dental occlusion perspective,1 thereby resulting in the lack of an objective “gold standard” for anteroposterior skeletal dysplasia. When radiographic lateral skull cephalometry was applied to the facial skeleton, relationships of the craniofacial skeletal structures were quantified on the basis of “standard measurements” or “norms” for normal, acceptable, or ideal occlusions, facial types, or both.2
Contemporary orthodontic standards consider a cephalometric radiograph and its analysis as a necessary diagnostic record.3 However, most diagnoses are made on the basis of extraoral evaluation and occlusion, with little influence from the cephalometric evaluation.4 Few treatment plans are changed by cephalometric evaluation.5,6 In addition, cephalometric analyses frequently conflict in diagnosis, such as Class II vs Class III skeletal patterns,7 leading to confusing results when comparing different analyses on the same patient. Changes based on growth and treatment as absolute or relative indications of change also are not concordant, leading to the conclusion that detected changes are as much related to the method of measurement as to the method of treatment.7
Even with attempted improvements,8 these problems raise fundamental questions about the validity and usefulness of cephalometrics as a diagnostic test and express the need for a true gold standard to validate cephalometrics in diagnosis. Conventional cephalometric analyses may disagree diagnostically because they look at different aspects of the skeleton, are complicated by other dimensions (eg, vertical), or had different sample characteristics in deriving their norms.
Han and Kim9 evaluated several diagnostic cephalometric appraisals of anteroposterior problems for patients classified by molar relationship. These different occlusal groups were analyzed by receiver operating characteristic (ROC) methodology, a statistical test that showed the diagnostic ability of a test. This was a more objective way to determine what cephalometric measurements were the most accurate. The ROC method detected the trade-offs between sensitivity and specificity for the cephalometric variables and determined a “cut point” (the cephalometric value) that distinguished between Class I and Class II and Class I and Class III skeletal relationships. A cut point was the value for each cephalometric analysis where one skeletal classification changes to another (eg, Class I changes to Class II). This method could determine a cut point between adjacent skeletal patterns that was maximized statistically.10
The purpose of this study was to (1) create a gold standard for cephalometric diagnosis based on clinically relevant features of matching skeletal and dental occlusions with normal vertical relationships, in an age range common for orthodontic treatment that represents the full range of facial types and occlusions; (2) derive ROC curves to objectively develop ideal cut points for five cephalometric analyses; and (3) apply the cut points to an independent sample to verify their usefulness compared with traditional measures.
MATERIALS AND METHODS
Following Institutional Review Board approval, lateral cephalograms of Caucasian patients between the ages of 10 and 18 years were selected from several university archives and private practices. Approximately 3000 cephalograms were screened, and those with concordant skeletal and dental relationships were digitized using Dentofacial Planner©. Reliability of cephalometric landmark identification was determined by the location of 20 cephalometric points on 10 different lateral cephalograms. The distance between each point and a predetermined landmark was measured to the nearest 0.1 mm at a 2-week interval. Cephalograms were adjusted to the same magnification (8%) where necessary.
A total of 600 computer-generated line drawings of facial profiles were judged by six orthodontists as Class I, Class II, or Class III. The profile evaluations with ≥80% agreement by the orthodontist raters were included in the sample when the vertical facial proportions were normal on the basis of a cephalometric evaluation, using a validated algorithm.11,12 Subjects had molar and overjet relationships, as determined by linear cephalometric measurement (see Table 2 for detail), consistent with the consensus skeletal profile assessment. These inclusion criteria resulted in Class I, N = 49; Class II, N = 53; and Class III, N = 53 patients with matching profiles, molar relationships, overjet, and normal skeletal vertical relationships.
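The ≥80% agreement rule can be illustrated with a short sketch. With six raters, 80% agreement requires at least five matching judgments (5/6 ≈ 83%). The function name `consensus_class` and the example ratings are hypothetical, and the sketch does not attempt to reproduce how repeated ratings of the same profile were handled in the study.

```python
from collections import Counter

def consensus_class(ratings, min_agreement=0.80):
    """Return the majority label if enough raters agree, otherwise None."""
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes / len(ratings) >= min_agreement else None

# Hypothetical ratings from six raters for two profile drawings.
accepted = consensus_class(["II", "II", "II", "II", "II", "I"])  # 5 of 6 agree
rejected = consensus_class(["II", "II", "II", "II", "I", "I"])   # 4 of 6 agree
print(accepted, rejected)
```

A profile that returns `None` in this sketch would simply not enter the gold standard sample.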
The sample was randomly divided into two groups: the first group was used to derive cephalometric cut points (the “generation sample,” Table 1) and the second was used as an independent sample to validate the cut points (the “confirmation sample,” Table 2).


Linear and angular anteroposterior cephalometric analyses were performed:
- A-N-B—the angle between A point, nasion, and B point (ANB)13;
- McNamara analysis—the millimeter distance of A point to nasion perpendicular minus millimeter distance of pogonion to nasion perpendicular (MCTOT)14;
- Wits analysis—the linear distance between A and B point perpendicular to the functional occlusal plane (Wits)15;
- Anteroposterior dysplasia index (APDI)—the facial angle ± the A-B plane angle ± the palatal plane angle (APDI)16; and
- Harvold unit differential—the mandibular unit length (temporomandibular joint to gnathion) minus maxillary unit length (temporomandibular joint to anterior nasal spine; Unitdif17).
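As an illustration of how an angular measure such as ANB can be computed from digitized landmarks, the following is a minimal sketch. The coordinate convention (x increasing anteriorly, y increasing superiorly), the function name `anb_angle`, and the landmark coordinates are assumptions for illustration only; they are not taken from the digitizing software used in this study.

```python
import math

def anb_angle(a_point, nasion, b_point):
    """Signed angle in degrees at nasion between the N-A and N-B lines.

    Assumed convention: x increases anteriorly and y increases superiorly,
    so a positive result means A point lies anterior to the N-B line.
    """
    nax, nay = a_point[0] - nasion[0], a_point[1] - nasion[1]
    nbx, nby = b_point[0] - nasion[0], b_point[1] - nasion[1]
    cross = nbx * nay - nby * nax   # sign gives the direction of rotation
    dot = nbx * nax + nby * nay     # together with cross gives the magnitude
    return math.degrees(math.atan2(cross, dot))

# Hypothetical landmark coordinates in millimeters.
angle = anb_angle(a_point=(0.0, -50.0), nasion=(0.0, 0.0), b_point=(-5.0, -90.0))
print(round(angle, 2))
```

Swapping the A and B landmarks reverses the sign, which is how a negative ANB would arise under this convention.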
The cephalometric diagnoses based on traditional norms were adjusted for age and sex when appropriate.14–18 In instances where the standard deviation was not listed in the original publication, it was estimated.19
Statistical analysis
The intraclass correlation coefficient (ICC) was used to calculate reliability of landmark identification. Means, standard deviations, and 95% confidence intervals were calculated for all cephalometric variables for both the generation and the confirmation samples. ROC curves were generated for both samples by plotting the sensitivity vs 1 − specificity for each cephalometric variable, and cut points were determined statistically from the generation sample to maximize the number of correct classifications between Class I and Class II and Class I and Class III. The cut points were then applied to the confirmation sample, and kappa values, confidence intervals, area under ROC curve (AUC), and percentage agreement statistics were calculated to assess their diagnostic utility compared with the gold standard groups.
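The cut point selection described here can be sketched as follows. This is an illustration under stated assumptions, not the statistical software used in the study: candidate cut points are taken as midpoints between adjacent observed values, a subject is called positive when the measurement exceeds the cut point, and AUC is computed as the proportion of positive-negative pairs that are correctly ordered. The function names and the ten measurements are hypothetical.

```python
def cut_point_table(values, labels):
    """Sensitivity, specificity, and correct count for each candidate cut.

    labels: 1 for the dysplasia group (e.g. Class II), 0 for Class I.
    A subject is called positive when value > cut.
    """
    uniq = sorted(set(values))
    pos = sum(labels)
    neg = len(labels) - pos
    rows = []
    for lo, hi in zip(uniq, uniq[1:]):
        cut = (lo + hi) / 2
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v > cut)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v <= cut)
        rows.append({"cut": cut, "sensitivity": tp / pos,
                     "specificity": tn / neg, "correct": tp + tn})
    return rows

def best_cut_point(values, labels):
    """Cut point that maximizes the number of correct classifications."""
    return max(cut_point_table(values, labels), key=lambda r: r["correct"])["cut"]

def auc(values, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical measurements: five Class I (label 0) and five Class II (label 1).
values = [1.0, 2.0, 2.5, 3.0, 4.5, 3.8, 4.0, 5.0, 6.0, 7.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_cut_point(values, labels), auc(values, labels))
```

For a measure where lower values indicate the dysplasia group (as for the Class I vs Class III comparison on ANB), the same sketch applies after negating the values. When two candidate cut points tie on the number of correct classifications, `max` keeps the first; a real analysis would need an explicit tie-breaking rule.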
In addition, diagnoses based on traditional cephalometric norms were determined for the confirmation sample. The number of subjects in the confirmation sample, where the diagnosis using traditional norms or using ROC-determined cut points did not agree with the gold standard group, were calculated and considered “misclassifications.” The number of misclassifications for both the traditional and the ROC-generated cut points were compiled, and an exact binomial test was used to compare number of misclassifications between the two methods.
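One way such a comparison can be set up is sketched below. The text does not specify how the exact binomial test was constructed, so the paired form used here (an exact McNemar-type test on subjects misclassified by only one of the two methods) is an assumption, as are the function name and the counts.

```python
from math import comb

def exact_binomial_p(only_traditional_wrong, only_roc_wrong):
    """Two-sided exact binomial P value on discordant pairs.

    Under the null hypothesis, a subject misclassified by only one method
    is equally likely to have been misclassified by either method.
    """
    n = only_traditional_wrong + only_roc_wrong
    if n == 0:
        return 1.0
    k = min(only_traditional_wrong, only_roc_wrong)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 8 subjects wrong only with traditional norms,
# 1 subject wrong only with ROC-derived cut points.
print(exact_binomial_p(8, 1))
```

Subjects misclassified by both methods, or by neither, carry no information about which method is better and are therefore left out of this form of the test.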
RESULTS
Reliability of cephalometric landmark identification was found to be good (ICC = 0.85; 95% CI = 0.795–0.89). Orthodontist raters in this study evaluated the same profile twice, and reliability was good (range 0.70 to 0.92).20 Means, standard deviations, and 95% confidence intervals for all cephalometric variables are shown in Tables 1 and 2 for both the generation and the confirmation samples. Cut points derived from the generation sample distinguishing between Class I and Class II (1/2) and Class I and Class III (1/3) are shown in Table 3. The percentage agreement, kappa values, confidence interval, and AUC are shown for each of the cut points, as applied to the confirmation sample. The number of misclassifications for each of the cephalometric variables is shown in Table 4. The area under the curve for each analysis is demonstrated in the ROC curves shown in Figure 1 (Class 1/2 interface) and Figure 2 (Class 1/3 interface).
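For readers unfamiliar with the agreement statistics reported here, the following sketch shows how percentage agreement and Cohen's kappa are computed from a gold standard classification and a cephalometric classification. The function names and the ten example diagnoses are hypothetical and are not data from this study.

```python
from collections import Counter

def percent_agreement(truth, predicted):
    """Percentage of subjects whose two classifications match."""
    return 100.0 * sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def cohen_kappa(truth, predicted):
    """Agreement corrected for the agreement expected by chance."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical gold standard and cephalometric diagnoses for ten subjects.
gold = ["I"] * 5 + ["II"] * 5
ceph = ["I", "I", "I", "I", "II", "II", "II", "II", "II", "I"]
print(percent_agreement(gold, ceph), round(cohen_kappa(gold, ceph), 2))
```

Kappa is lower than raw agreement because half of the matches in this balanced example would be expected by chance alone. The sketch does not handle the degenerate case in which expected agreement equals 1.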





Citation: The Angle Orthodontist 76, 4; 10.1043/0003-3219(2006)076[0612:DOCNUA]2.0.CO;2



DISCUSSION
Methodological issues: criteria and context
The goal of this study was to establish a clinically sound gold standard for contemporary anteroposterior cephalometric measures and to determine their cut points for classification on the basis of an objective statistical method. The process faced several methodological challenges. First, there is no gold standard for anteroposterior skeletal evaluation because no accepted test exists that determines the presence or severity of anteroposterior skeletal dysplasia. Second, the evaluation of the subjects needed to demonstrate variations of normal in the anteroposterior and vertical dimensions. It was also important that cephalometric cut points were determined objectively and that the new standards generated were subject to verification.
Evaluation of malocclusion traditionally has dealt with a range of normality and biological variation and not with “disease” vs “absence of disease,” a method that lacks distinct cutoffs between categories of skeletal classifications. Most practitioners diagnose using facial evaluation, and therefore, a gold standard was established using a skeletal diagnosis based on clinical profile evaluation that was verifiable. Subjects were included who had a profile that an 80% majority of orthodontic raters regarded as normal (Class I) and variations of normal (Class II and III). This increased the probability that the sample represented diverse skeletal categories. We also used clinical dental relationships (molar relationship and overjet), generally used in diagnosis.
Subjects were not included who may have borderline cephalometric values because the gold standard could not be established with certainty. Selection of “clear-cut” skeletal diagnosis for the sample should have made determination of the cut points simpler because the meaningful variables would be clustered at distinct points over a greater range. These methods appear to represent a clinically suitable and definable gold standard.
This study is the first to use ROC methodology on distinct diagnostic groups on the basis of clinical impression of skeletal (facial) and dental relationships. ROC is popular in health science research as a rigorous determination of the diagnostic ability of a test. It shows the limits of a test's ability to accurately discriminate between two different states of health.21 The different “states of health” in this study are the distinctions between skeletal classifications (cut points). Using the ROC calculations to create the division between the gold standards is an unbiased and precise statistical method to distinguish between the states of malocclusion. In this study, areas under ROC curves ranged from 0.86 to 0.98 and showed that all cephalometric variables possessed reasonable ability to distinguish Class II and Class III patients from Class I, using the cut points derived from the generation sample.21 The large areas under the curves in Figures 1 and 2 provide visual confirmation of this, bearing in mind that the curve for a perfect test extends completely into the top left corner of a ROC graph and its area occupies the entire graph.
The ROC-derived cut points between the gold standard skeletal relationships for the five cephalometric variables were applied to a second independent sample of gold standard subjects from which the cut points were not derived. With respect to the number of misclassifications in the confirmation sample, the ANB and McNamara analyses had similar numbers of errors whether diagnosis was based on traditional cephalometric norms or on the new ROC-derived cut points. In addition, neither analysis had any “serious” errors, that is, misdiagnosing a Class II patient as Class III or vice versa. Thus, for these two analyses, using a single cut point to distinguish between the gold standards created in this study performed as well as using age- and sex-related norms.
The Wits analysis had significantly fewer misclassifications, when the ROC-derived cut points were used, compared with traditional norms (P = .04). The cut points were slightly different than the traditional cutoffs, slightly higher for the Class I/II cut point (2.1) and lower for the Class I/III cut point (−4.0). The range was greater for Class I and expanded the normal category farther than the sample reported by Jacobson.15 However, the Wits analysis I/II cut point and I/III cut point did have a high AUC (0.98 and 0.96, respectively).
The APDI showed more overall misdiagnoses using traditional norms than the first three analyses. The traditional APDI had cut points that were significantly different from the ROC analysis, especially between Class I and Class III skeletal occlusions. This might be attributed to the method of sample selection. In this study, the APDI was evaluated on skeletal Class III occlusions defined on the basis of facial profile, whereas the traditional standard was based on molar occlusion, which is shown to have a weak relation to skeletal discrepancy.16 The range of values that were observed clinically for the APDI was much higher in this study. With traditional norms, the APDI had a large number of serious errors: four Class II gold standard group patients were diagnosed as Class III, and one Class III gold standard group patient was diagnosed as Class II. When the ROC cut points were used instead of the traditional norms, all but one of these errors were eliminated.
The Harvold Unit differential cut points between skeletal classifications were also affected by the ROC analysis.17 Traditional cut points for the age-related unit differential varied from 20 to 27 mm for the average Class I patient. Using ROC methodology, the cut point between Class II and Class I was 26.9 mm, which does fall in the Class I range for the unit differential in older adolescents, but is higher than most sample means. The Class III cut point was determined to be 32.7 mm, close to traditional cut points for a Class III diagnosis. These new cut points significantly reduced the number of misclassifications, and serious errors were reduced.
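Applying a pair of cut points to a single measurement amounts to a three-way threshold rule, sketched below with the Wits and Harvold values reported in the text. The direction of each comparison follows the usual interpretation of these measures (a larger Wits value indicating Class II, a larger unit differential indicating Class III), the Wits cut points are assumed to be in millimeters, and the treatment of a value falling exactly on a cut point is an assumption. The function and constant names are hypothetical.

```python
# ROC-derived cut points as reported in the text.
WITS_CLASS_II_CUT = 2.1        # assumed mm; above this is called Class II
WITS_CLASS_III_CUT = -4.0      # assumed mm; below this is called Class III
HARVOLD_CLASS_II_CUT = 26.9    # mm; below this is called Class II
HARVOLD_CLASS_III_CUT = 32.7   # mm; above this is called Class III

def classify_wits(wits_mm):
    """Skeletal class from the Wits appraisal; boundary values fall in Class I."""
    if wits_mm > WITS_CLASS_II_CUT:
        return "Class II"
    if wits_mm < WITS_CLASS_III_CUT:
        return "Class III"
    return "Class I"

def classify_harvold(unit_diff_mm):
    """Skeletal class from the Harvold unit differential; boundaries are Class I."""
    if unit_diff_mm < HARVOLD_CLASS_II_CUT:
        return "Class II"
    if unit_diff_mm > HARVOLD_CLASS_III_CUT:
        return "Class III"
    return "Class I"

# Hypothetical measurements for one patient.
print(classify_wits(4.5), classify_harvold(24.0))
```

The same rule applies to every patient in the 10- to 18-year range, with no lookup by age or sex.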
In summary, the ROC analysis improved the Harvold analysis, the APDI, and the Wits analysis at discriminating between the groups of gold standard skeletal occlusions. This method was especially useful in eliminating serious cephalometric diagnostic errors. On the other hand, this is not to imply that the cut points are infallible. For example, the Class I to Class III misclassifications improved, using the ROC method, whereas the Class III to Class I misclassifications worsened. If the cut points were ideal, both would have improved. Further refinement of the technique may reduce this shortcoming.
Because contemporary orthodontics places significant emphasis on treatment planning based on a patient's face rather than a few measurements on a cephalogram, it makes sense that skeletal measurements and the division between the three classifications should be derived, at least in part, from the clinical facial profile. Even with the variety of protrusion and retrusion superimposed on the interarch relationships in the samples, the simple set of cutoffs was successful. The fact that protrusion magnifies any anteroposterior relationships and that variability was accommodated successfully by this method speaks about its value, applicability, and robust nature.
These results demonstrate that instead of using a table of standards, where the cephalometric value changes for age and sex, one can use a set of cut points for each method for Caucasian males and females 10–18 years of age. These cut points will diagnose as accurately as the traditional methods for two of the five measures tested in this project (ANB and McNamara analysis) and significantly more accurately for the three others (Wits, Harvold unit differential, and APDI). Figure 3 displays an example of the usefulness of this method for a Class II patient. These findings simplify the diagnostic method, lead to fewer contradictions, and reduce confusion in practice during the diagnosis of patients.



CONCLUSIONS
- Using facial profile, molar occlusion, and overjet as a gold standard for Class I, Class II, and Class III skeletal relationships is a verifiable and clinically relevant method.
- When new cephalometric cut points were derived on the basis of the objective ROC curve method, the Wits, Harvold, and APDI showed improved accuracy compared with diagnosis based on conventional cephalometric norms.
- The ANB angle and McNamara analysis performed well using both traditional and ROC-generated values in accurately diagnosing the gold standard relationships.
- The method did show the usefulness and simplicity of using a single set of cephalometric cut points for an adolescent (age 10–18 years) group to distinguish between skeletal classifications.

Figure 1. ROC curve for the Class I and Class II determination. Area under ROC curve (AUC) for ANB angle (—) = 0.954; AUC for McNamara analysis (——) = 0.921; AUC for Wits analysis (–·–·– ) = 0.983; AUC for anteroposterior dysplasia index (–––) = 0.887; and AUC for Harvold differential (-----) = 0.861.

Figure 2. ROC curve for the Class I and Class III determination. Area under ROC curve (AUC) for ANB angle (—) = 0.934; AUC for McNamara analysis (——) = 0.940; AUC for Wits analysis (–·–·– ) = 0.967; AUC for anteroposterior dysplasia index (–––) = 0.900; and AUC for Harvold differential (-----) = 0.861.

Figure 3. This 13-year-old male has a clearly Class II convex profile and was judged that way by the gold standard. The traditional cephalometric method results were mixed between Class I and Class II, whereas the ROC method consistently demonstrated Class II results.
Contributor Notes
Corresponding author: Dr. Henry Fields, Section of Orthodontics, The Ohio State University, 4088F Postle Hall, 305 W. 12th Ave, PO Box 182357, Columbus, OH 43210 (fields.31@osu.edu)