Prognosis Prediction for Class III Malocclusion Treatment by Feature Wrapping Method
Objective: To use the feature wrapping (FW) method to identify which cephalometric markers show the highest classification accuracy in prognosis prediction for Class III malocclusion and to compare the prediction accuracy between the FW method and conventional statistical methods such as discriminant analysis (DA).
Materials and Methods: The sample set consisted of 38 patients (15 boys and 23 girls, mean age 8.53 ± 1.36 years) who were diagnosed with Class III malocclusion and received both first-phase (orthopedic) and second-phase (fixed orthodontic) treatments. Lateral cephalograms were taken before (T0) and after first-phase treatment (T1) and after second-phase treatment and retention (T2). Based on the measurements taken at the T2 stage, the patients were allocated into good (n = 20) or poor (n = 18) prognosis groups. Forty-six cephalometric variables on T0 lateral cephalograms were analyzed by the FW method to identify key determinants for discriminating between the two groups. The sequential forward search (SFS) algorithm and support vector machine (SVM) were used in conjunction with the FW method to improve classification accuracy. To compare the prediction accuracy of the FW method with conventional statistical methods, DA was performed for the same data set.
Results: AB to mandibular plane angle (°) and A to N-perpendicular (mm) were selected as the most accurate cephalometric predictors by both the FW and DA methods. However, classification accuracy was higher with the FW method (97.2%) compared with DA (92.1%), because the FW method with SFS and SVM has a more precise classification algorithm.
Conclusions: The FW method, which uses a learning algorithm, might be an effective alternative to DA for prognosis prediction.
INTRODUCTION
Skeletal Class III malocclusion occurs because of undergrowth of the maxilla, overgrowth of the mandible, or both.1 Furthermore, if patients also have an anterior crossbite during the growth period, skeletal discrepancies can be worsened.2 Although it is difficult to completely change individual growth patterns,3 first-phase (orthopedic) and/or second-phase (fixed orthodontic) treatments can provide a better treatment outcome in cases in which there is a favorable growth potential (Figure 1A). In contrast, if patients have an unfavorable growth potential, the problem cannot be corrected with first- and/or second-phase treatments (Figure 1B). Therefore, predicting the prognosis for Class III malocclusion at the diagnostic stage is important for choosing an effective treatment plan.



Citation: The Angle Orthodontist 79, 4; 10.2319/071508-371.1
Numerous studies have been performed to identify better methods of prognosis prediction for Class III malocclusion.4–18 Although discriminant analysis (DA) and logistic regression have been used to investigate cephalometric predictors, it is difficult to find predictors that achieve both consensus and high accuracy. In addition, obtaining a sufficient volume of patient data and long-term follow-up results remains a challenge. Thus, a new methodology that can improve prediction accuracy based on a relatively small amount of patient data is needed.
The feature wrapping (FW) method employs a learning algorithm that can evaluate every set of features generated from original features in subjects and select the subset of features that show the highest classification accuracy. FW works by identifying a small subset of necessary and sufficient features that can serve as input for the underlying predictor method.19 In the present study, the sequential forward search (SFS) algorithm and support vector machine (SVM) were used to improve classification accuracy in conjunction with the FW method.20 Indeed, SVM is widely used in a variety of fields in the medical sciences. For example, Bullinger et al21 successfully used SVM to discriminate between healthy control subjects and patients suffering from breast cancer by extracting nucleosides in urine samples. Likewise, Kawai et al22 used SVM to predict the pleiotropic effects of drugs, while Judson et al23 used SVM to classify chemical toxicities.
The purposes of this study were, therefore, to identify which cephalometric markers show the highest classification accuracy in prognosis prediction for Class III malocclusion by the FW method and to compare the prediction accuracies between the FW method and conventional statistical methods such as DA.
MATERIALS AND METHODS
The sample set consisted of 38 patients (15 boys and 23 girls, mean age 8.53 ± 1.36 years) who were diagnosed with Class III malocclusion and received both first-phase (orthopedic) and second-phase (fixed orthodontic) treatment at the Department of Orthodontics, Seoul National University Dental Hospital (Seoul, Korea). The first-phase treatment included orthopedics such as a chin cup or face mask with rapid palatal expansion according to the skeletal pattern. A chin cup or face mask was used for 12 to 14 hours per day with a force of 300 to 500 g per side. After orthopedic treatment, all subjects were treated with fixed appliances and preadjusted brackets. Fixed lingual retainers for the upper and lower anterior teeth, as well as a removable retainer for upper dentition, were used for retention. Demographic data are described in Table 1.

Inclusion criteria for this study were as follows: (1) existence of an anterior crossbite at the initial state that was corrected by first-phase treatment, (2) use of fixed appliance therapy for second-phase treatment, (3) follow-up performed until little craniofacial growth remained, and (4) lack of congenital deformities such as a cleft lip and palate.
Subjects were allocated into two groups according to final occlusal status. The good prognosis group (group 1) consisted of subjects who maintained favorable occlusal status with a normal overbite (>1.4 mm) and overjet (>2 mm). In contrast, subjects who experienced relapse of the anterior crossbite (overjet <0 mm) were classified into the poor prognosis group (group 2).
Lateral cephalograms (magnification factor = 10%) were taken before (T0) and after first-phase treatment (T1) and after second-phase treatment and retention (T2). The mean treatment time between T0 and T1 was 3.19 ± 1.78 years and between T1 and T2 was 5.97 ± 1.59 years. The mean ages of the two groups at the three stages did not differ significantly. Fifteen landmarks and 46 skeletal and dental variables were used in this study (Figure 2; Table 2). Method errors were calculated by Dahlberg's formula,24 $ME = \sqrt{\sum d^2 / 2n}$, where $\sum d^2$ is the sum of the squared differences between the two measurements of each variable, and n is the number of double measurements. The method errors for linear and angular measurement were not statistically significant and did not exceed 0.6 mm and 0.8°, respectively, for any variables.
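As a worked illustration of the method-error calculation, the following minimal sketch evaluates Dahlberg's formula in its usual square-root form. The function name and the repeated measurements are hypothetical and are not data from this study.

```python
import math

def dahlberg_error(first, second):
    """Dahlberg method error: sqrt(sum(d^2) / (2n)) over n double measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty measurement lists")
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    return math.sqrt(sum(squared_diffs) / (2 * len(first)))

# Hypothetical repeated tracings of one angular variable (degrees).
first_tracing = [62.0, 58.5, 65.0, 60.0]
second_tracing = [62.5, 58.0, 64.0, 60.5]
me = dahlberg_error(first_tracing, second_tracing)
print(f"method error = {me:.3f} degrees")
```

For these made-up values, the squared differences sum to 1.75 over four double measurements, so the error is the square root of 1.75/8.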




For prognosis prediction, the FW method with the SFS algorithm25 and SVM,22,26,27 together with principal component analysis (PCA),28 was used.
PCA was used to examine the characteristics of the patients with good and poor prognosis. To simplify a data set, PCA transforms features in a multivariate data set into salient features that are not correlated with each other. Therefore, features representing patient samples can be reduced to a smaller number of features that are referred to as the principal components (PC). The largest variance for the data set is set as the first axis (the first PC) in the coordinate system. Likewise, the second greatest variance is set as the second axis (the second PC), and so on. Therefore, PCA has the unique capacity of being able to perform an optimal linear transformation that maintains the subspace with the largest variance.
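The idea of taking the direction of largest variance as the first axis can be sketched in plain Python. This is an illustrative sketch only: it finds the first PC by power iteration on the sample covariance matrix, which is one of several ways to compute it and is not stated to be the routine used in the study. The toy data and all function names are assumptions. Further components would need deflation or a library eigensolver.

```python
import math

def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(centered_rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def first_pc(cov, iterations=500):
    """Power iteration: unit direction of largest variance and that variance."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                   for i in range(p))
    return v, variance

# Hypothetical two-feature data set (not patient data).
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
centered = center(rows)
cov = covariance(centered)
pc1, var1 = first_pc(cov)
explained = var1 / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(r[j] * pc1[j] for j in range(len(pc1))) for r in centered]
print(pc1, round(explained, 4))
```

Here `scores` holds each sample's coordinate along the first PC, and `explained` is the share of total variance that axis retains.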
The SFS algorithm for feature selection works as follows25: first, it chooses the single best feature that has the highest classification accuracy. Next, it forms all possible two-dimensional feature vectors that contain the best feature from the first step and chooses the best two-variable feature set that has the highest classification accuracy. This process then continues until a prespecified criterion is met, such as the dimension of the feature vector. The selected cephalometric markers were used to build the optimal classifier for discriminating between the patient samples. After completion of the feature selection step, the selected key markers were applied to the prediction module to generate the optimal classifiers that were the best predictors of prognosis.
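The greedy search described above can be sketched as follows. This is a minimal sketch under stated assumptions: `score` stands for whatever routine returns the classification accuracy of a feature subset (in a wrapper method, training and cross-validating the classifier on those columns), and the `toy_score` table holds hypothetical numbers chosen only to make the example run. Neither is taken from the study.

```python
def sequential_forward_search(n_features, score, max_features):
    """Greedy SFS: grow the selected subset by one feature per step.

    score(subset) must return the classification accuracy obtained with the
    given tuple of feature indices.
    """
    selected, history = [], []
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Try every candidate together with the features already chosen.
        best = max(candidates, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        history.append((tuple(selected), score(tuple(selected))))
    return history

# Hypothetical accuracy gains per feature index (illustration only).
TOY_GAIN = {0: 0.26, 1: 0.05, 2: 0.00, 3: 0.20}

def toy_score(subset):
    return round(0.50 + sum(TOY_GAIN[f] for f in subset), 2)

history = sequential_forward_search(4, toy_score, max_features=2)
for subset, accuracy in history:
    print(subset, accuracy)
```

The stopping criterion here is the subset size (`max_features`), matching the "dimension of the feature vector" criterion mentioned in the text; other criteria, such as no further gain in accuracy, could be substituted.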
The SVM method was used as a classifying and learning algorithm for the development of an FW method to identify cephalometric variables and prediction modules (Appendix). The classification accuracy of the model was estimated by leave-one-out cross-validation (LOOCV) as the number of correctly classified test samples divided by the total number of patient samples (Table 3; equation 1). In LOOCV, one test sample is held out from the total of n samples, the classifier is trained on the remaining n − 1 samples, and the held-out sample is then classified; this process is repeated n times so that each sample serves once as the test sample. By viewing the input data as two sets of vectors in an n-dimensional space, an SVM is able to construct a separating hyperplane in that space, one that maximizes the margin between the two data sets. To calculate the margin, two parallel hyperplanes, one on each side of the separating hyperplane, are pushed up against the two data sets. Intuitively, good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes. A tutorial on SVMs has been produced by Burges,26 and a comparison of SVM to other classifiers has been made by Meyer et al.27
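The LOOCV accuracy estimate can be sketched as below. To keep the example dependency-free, a nearest-centroid rule is swapped in for the SVM; it is only a stand-in classifier and not the study's method. The two-feature samples and labels are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT an SVM): label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the other n - 1, count the hits."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical (angle, distance) pairs; the fourth "good" sample sits
# among the "poor" ones and is misclassified when it is held out.
xs = [[70.0, 1.0], [72.0, 0.5], [68.0, 0.0], [61.0, -3.0],
      [58.0, -3.0], [60.0, -2.0], [62.0, -4.0]]
ys = ["good", "good", "good", "good", "poor", "poor", "poor"]
accuracy = loocv_accuracy(xs, ys, nearest_centroid_predict)
print(f"LOOCV accuracy = {accuracy:.4f}")
```

Because `predict` is passed in as a parameter, any classifier with the same three-argument shape, including an SVM from a library, could be plugged into `loocv_accuracy` without changing the cross-validation loop.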

RESULTS
Principal Component Analysis
By applying PCA to the feature matrix of patient samples, the features were projected onto a three-dimensional coordinate system composed of PCs 1, 2, and 3. Figure 3 illustrates the distribution of patient samples on this coordinate system. From this plot, we expected to develop a classifier that could distinguish between good and poor prognosis samples. These two classes can be discriminated more effectively by the FW method.



FW Method
The SFS algorithm, with SVM as the classifier, combines the best single feature with each remaining feature to form all possible two-dimensional feature sets and continues to search for the subset that has the highest classification accuracy. For classifying the 38 patient samples, the best accuracy achieved with a single feature (AB-MP) was 76.32%, while prediction analysis with two features (AB-MP and A-N perp) produced an accuracy of 97.37% under LOOCV (Figure 4; Table 4). Therefore, the highest classification accuracy was obtained using only two features, AB-MP and A-N perp.




The results of two-dimensional plotting of AB-MP and A-N perp features (Figure 5) indicated that most patients with good or poor prognosis for Class III malocclusion treatment could be clearly separated based on the values of these features. This finding implies that high classification accuracy was achieved, although a small number of features were used for the FW approach.



To compare our method with a conventional statistical method, stepwise DA was performed for the same data (Table 5). Although the same two variables, AB-MP and A-N perp, were selected from stepwise selection, the classification accuracy was 92.1%, which was lower than for the FW method (97.2%).

DISCUSSION
Several studies on prognosis prediction of treatment for Class III malocclusion have identified various predictors selected from statistical methods.4,14–16,18 Among these predictive variables, Yang and Kim4 presented the Björk sum, gonial angle, and occlusal plane to AB plane angle; Ko et al14 presented the lower incisor to occlusal plane angle and AB to mandibular plane angle, among others; Baccetti et al15 presented the mandibular ramus, cranial base angle, and mandibular plane angle; Moon et al16 presented the AB to mandibular plane angle and A-N perp; and Ghiz et al18 presented the gonial angle and mandibular length. However, despite these studies, a consensus predictor for treatment of Class III malocclusion has been elusive because of the difficulty in the collection of long-term follow-up samples, different grouping criteria, relatively low classification accuracies, and diversified variables.
In this study, cephalometric variables were investigated to build an optimal classifier for discriminating between patient samples. AB-MP and A-N perp were selected as prognosis predictors, which is in accordance with the study by Moon et al.16 AB-MP is a variable used to describe the relationship between the anterior border of the maxillary and mandibular alveolar bone and the mandibular plane (Figure 6). A low AB-MP value indicates that the skeletal pattern is hyperdivergent and that the degree of mandibular prognathism is severe (Figure 7). Therefore, treatment results with a low AB-MP can be predictive of poor prognosis. It also implies that although face mask or chin cup therapy can correct an anterior crossbite, poor prognosis can still be determined according to continuous mandibular growth.






The second predictor, A-N perp, describes the anteroposterior position of the maxilla (Figure 6) and plays a decisive role in prognosis prediction because the anteroposterior position of the maxilla relative to the mandible is important. The percentage of Class III malocclusions due to retrusive point A in Koreans (18%) is lower than in whites (57%).29,30 Therefore, A-N perp was selected by a second SFS step following AB-MP (Table 4); however, the importance of A-N perp in samples of white patients will need to be investigated to verify this result.
Conventional statistical methods such as DA can also be used to analyze the data in this study; however, incorporation of new data into DA (Table 5) showed lower accuracy than the SVM method (Table 4). SVMs are a set of related supervised learning methods used for classification and regression. Because they simultaneously minimize the empirical classification error and maximize the geometric margin, they are also known as maximum margin classifiers. In the FW method, the SFS algorithm supplies the candidate feature subsets that the SVM evaluates. Therefore, unlike conventional methods, this approach can produce the same or better classification accuracy when new data are input, thereby creating a valuable diagnostic program for prognosis prediction.
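For reference, the maximum-margin idea is usually written as follows for the linearly separable case. This is the standard textbook hard-margin form, given only as an illustration; the symbols (weight vector w, offset b, class labels y_i = ±1, feature vectors x_i) are introduced here, and the kernel and any soft-margin settings used in the study are not stated in this section.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n,
\qquad \text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}
```

Minimizing the norm of w is therefore equivalent to maximizing the distance between the two parallel hyperplanes that bound the two classes.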
The FW method, as described here, may be an effective tool for prognosis prediction of treatment for Class III malocclusion. However, because of several limitations of this study, such as insufficient accumulation of long-term cases and lack of a multicenter analysis for evaluating ethnic differences, further studies are needed before the results of this study can be used in a clinical setting.
CONCLUSIONS
- AB-MP and A-N perp were selected as the most accurate cephalometric predictors for prognosis prediction of Class III malocclusion by both FW and DA methods.
- The classification accuracy was higher with the FW method (97.2%) than with DA (92.1%) because of the FW method's sophisticated classification and learning algorithm.
- FW might be an effective alternative to DA for prognosis prediction.


Figure 1. Lateral cephalograms before the first-phase (orthopedic) treatment (solid line) and after the second-phase (fixed orthodontic) treatment and retention (dotted line). (A) Good prognosis group. (B) Poor prognosis group.

Figure 2. Cephalometric landmarks used in this study: 1, sella; 2, nasion; 3, porion; 4, orbitale; 5, articulare; 6, anterior nasal spine; 7, posterior nasal spine; 8, point A; 9, point B; 10, pogonion; 11, menton; 12, gonion; 13, incisal tip of upper central incisor; 14, incisal tip of lower central incisor; 15, point between the tips of the mesiobuccal cusps of all fully erupted upper and lower permanent first molars or deciduous second molars in primary dentition.

Figure 3. Three-dimensional principal component analysis plot of the data from the 38 patients. Circles indicate good prognosis; stars, poor prognosis.

Figure 4. Classification accuracy for the 38 patients that was achieved with feature wrapping (SVM and SFS). SVM indicates support vector machine; SFS, sequential forward search algorithm.

Figure 5. Two-dimensional plot of the 38 patient samples with two selected features, AB to mandibular plane angle (AB-MP, °) and A-N perp (mm). Circle indicates good prognosis; star, poor prognosis; x, values of AB-MP; y, values of A-N perp.

Figure 6. The most significant variables. 1, AB-MP (°); 2, A-N perp (mm).

Figure 7. Relationship between AB-MP and skeletal pattern. If the mandibular plane is steep (A) or point B is anteriorly positioned (B), the patient has a low AB-MP value.
Contributor Notes
Corresponding author: Dr Seung-Hak Baek, Department of Orthodontics, School of Dentistry, Dental Research Institute, Seoul National University, Yeonkun-dong #28, Jongro-ku, Seoul, South Korea 110-768 (drwhite@unitel.co.kr)