Does artificial intelligence predict orthognathic surgical outcomes better than conventional linear regression methods?
ABSTRACT
Objectives
To evaluate the performance of an artificial intelligence (AI) model in predicting orthognathic surgical outcomes compared with conventional prediction methods.
Materials and Methods
Preoperative and posttreatment lateral cephalograms were collected from 705 patients who underwent combined surgical-orthodontic treatment. The predictors were 254 input variables, including preoperative skeletal and soft-tissue characteristics and the extent of orthognathic surgical repositioning. The outcomes were 64 Cartesian coordinate variables of 32 soft-tissue landmarks after surgery. Conventional prediction models were built with two linear regression methods: multivariate multiple linear regression (MLR) and the multivariate partial least squares algorithm (PLS). The AI-based prediction model was based on the TabNet deep neural network. Prediction accuracy was compared, and the influencing factors were analyzed.
Results
In general, MLR demonstrated the poorest predictive performance. Of the 32 soft-tissue landmarks, PLS gave more accurate predictions for 16 landmarks from glabella to the upper lip, whereas AI performed better for six landmarks along the lower border of the mandible and in the neck area. The remaining 10 landmarks showed no significant difference between the AI and PLS prediction models.
Conclusions
AI predictions did not always outperform conventional methods. A combination of both methods may be more effective in predicting orthognathic surgical outcomes.
INTRODUCTION
The number of patients willing to undergo combined surgical-orthodontic treatment has been increasing.1 Predicting surgical outcomes is crucial for treatment planning and for achieving satisfactory results, because it allows postoperative changes to be visualized. For more than half a century, there have been numerous attempts to predict changes after orthognathic surgery (Table 1). Early approaches applied correlation analysis between hard- and soft-tissue changes, expressing the surgical outcome as a simple one-to-one correspondence ratio.2–5 Even today, many commercial programs based on such simple correlations are available for clinical use. Later, prediction models based on more sophisticated methods were reported, including multiple linear regression (MLR),6–9 partial least squares (PLS),10–13 the probabilistic finite element method,14 and sparse PLS.15 Among these methods, PLS is known to be effective when there are many predictor variables that are highly correlated with each other. PLS requires only simple matrix algebra and can be computed quickly. Previous studies demonstrated that PLS outperformed MLR in predicting postoperative soft-tissue changes.10–13

Artificial intelligence (AI) has become popular in orthodontics for automating workflows such as identifying cephalometric landmarks,16–18 superimposing images,19,20 performing subsequent analyses,21 and predicting growth.22,23 A recent growth prediction study applied an AI algorithm based on the TabNet deep neural network (DNN),24 and its growth predictions were more accurate than those of PLS.23 This AI technology was designed for prediction scenarios involving multiple input and output variables. Soft-tissue changes after orthognathic surgery can be influenced by various factors, such as age, sex, type of surgery, individual response to surgery, individual skeletal configuration, and soft-tissue characteristics. In such a complex situation, AI may be a useful tool for predicting postoperative changes because it can handle numerous input and output variables.
This study aimed to evaluate the performance of an AI model in predicting orthognathic surgical outcomes compared to conventional prediction methods.
MATERIALS AND METHODS
Subjects
The institutional review board for the protection of human subjects of the Seoul National University School of Dentistry approved the research protocol (S-D20200036).
The subjects were 705 patients (392 females and 313 males; mean age, 23.4 years) who had undergone orthognathic surgery to correct skeletal malocclusions at Seoul National University Dental Hospital between January 2002 and December 2022. All patients were in good health and of Korean ethnicity. Patients with cleft lip and palate, injury, or craniofacial syndromes were excluded from this study. Further characteristics of the subjects are shown in Table 2.

The preoperative lateral cephalograms (T1) were taken close to the time of orthognathic surgery. The postoperative radiographs (T2) were taken immediately after debonding. On a total of 1410 T1 and T2 images from 705 subjects, 78 cephalometric landmarks were manually identified by a single examiner (SJL, with over 33 years of clinical experience). When the examiner and another examiner repeated the manual identification twice on 283 validation images, the intra- and inter-examiner reliability measures were 0.97 ± 1.03 mm and 1.50 ± 1.48 mm, respectively.17
The 78 landmarks consisted of 46 skeletal and 32 soft-tissue landmarks. The coordinate system had its origin at Sella, and the horizontal reference plane was defined as the Sella-Nasion line minus 7 degrees (SN −7°; Figure 1).
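The following Python sketch makes the coordinate convention concrete. It expresses digitized landmark coordinates in a Sella-centred frame whose horizontal axis is the S-N line rotated by 7 degrees. The function name, the sign of the rotation, and the orientation of the image axes are assumptions for illustration; the study does not report how the transformation was implemented.

```python
import numpy as np

def to_sella_frame(points, sella, nasion, angle_deg=-7.0):
    """Express (x, y) landmark coordinates in a Sella-centred reference frame.

    The horizontal axis is the S-N direction rotated by `angle_deg` degrees.
    The sign convention (and whether image y points up or down) is an
    assumption and must match the digitizing software.
    """
    sella = np.asarray(sella, dtype=float)
    pts = np.asarray(points, dtype=float) - sella         # move the origin to Sella
    sn = np.asarray(nasion, dtype=float) - sella
    theta = np.arctan2(sn[1], sn[0]) + np.deg2rad(angle_deg)
    x_axis = np.array([np.cos(theta), np.sin(theta)])     # horizontal reference line
    y_axis = np.array([-np.sin(theta), np.cos(theta)])    # sketch's vertical axis, perpendicular to it
    return np.column_stack([pts @ x_axis, pts @ y_axis])  # rotated coordinates
```

Because the transform is only a translation plus a rotation, distances between landmarks are preserved. The Euclidean prediction errors therefore do not depend on this choice; only the split of an error into horizontal and vertical components does.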



Citation: The Angle Orthodontist 94, 5; 10.2319/111423-756.1
Variables
The predictors were 254 input variables: age, sex, Angle classification, time after surgery, and the types of maxillary surgery, mandibular surgery, genioplasty, segmental osteotomy, zygomatic surgery, and paranasal augmentation; 154 variables describing preoperative skeletal and soft-tissue characteristics; and 90 variables describing the amount of skeletal repositioning during surgery. These 90 variables represented the changes in the x and y coordinates of 45 hard-tissue landmarks, as shown in Figure 1A.
The outcomes were 64 Cartesian coordinate variables of 32 soft-tissue landmarks after surgery from glabella to the terminal point of the neck (Figure 1B).
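One patient's record could be assembled as sketched below. The grouping into 10 clinical and surgery-type codes, 154 preoperative measurements, and 90 movement values follows the counts above (10 + 154 + 90 = 254). The one-column numeric encoding of each categorical variable, the interleaved x/y ordering, and all names are hypothetical.

```python
import numpy as np

N_CLINICAL = 10   # age, sex, Angle class, time after surgery, six surgery-type codes (assumed 1 column each)
N_PREOP = 154     # preoperative skeletal and soft-tissue characteristics
N_MOVED = 45      # hard-tissue landmarks whose x/y repositioning is recorded
N_SOFT = 32       # outcome landmarks, glabella to the terminal point of the neck

def build_predictors(clinical, preop, hard_before, hard_after):
    """Concatenate one patient's predictors into a 254-element vector."""
    clinical = np.asarray(clinical, dtype=float)
    preop = np.asarray(preop, dtype=float)
    # Surgical repositioning: change in x and y of each of the 45 hard-tissue landmarks.
    movement = (np.asarray(hard_after, float) - np.asarray(hard_before, float)).ravel()
    if clinical.shape != (N_CLINICAL,) or preop.shape != (N_PREOP,):
        raise ValueError("unexpected number of clinical or preoperative variables")
    if movement.shape != (2 * N_MOVED,):
        raise ValueError("expected 45 landmarks with (x, y) coordinates")
    return np.concatenate([clinical, preop, movement])

def flatten_outcomes(soft_after):
    """Flatten 32 postoperative (x, y) soft-tissue landmarks into 64 targets."""
    soft_after = np.asarray(soft_after, dtype=float)
    if soft_after.shape != (N_SOFT, 2):
        raise ValueError("expected 32 landmarks with (x, y) coordinates")
    return soft_after.ravel()  # order: x1, y1, x2, y2, ...
```

Stacking these vectors over all patients would give a 705 × 254 predictor matrix and a 705 × 64 outcome matrix.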
Prediction Model Construction
The conventional prediction models were statistical models based on mathematical manipulation. MLR was fitted by ordinary least squares, with stepwise variable selection based on the Akaike information criterion. The other conventional model, based on the PLS algorithm, combines the merits of principal component analysis and MLR.25 The PLS model in the present study included 50 PLS components.
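Because PLS reduces to simple matrix algebra, a multi-output PLS (PLS2) model can be sketched in a few lines of NumPy. This is an illustrative version of the standard NIPALS-type algorithm, in which each weight vector is taken from a singular value decomposition rather than the iterative inner loop. It is not the authors' software, and it does not show the stepwise AIC selection used for MLR. With 50 components, as in the study, it would be called as `pls2_fit(X, Y, 50)`. Off-the-shelf alternatives such as scikit-learn's `PLSRegression` also exist.

```python
import numpy as np

def pls2_fit(X, Y, n_components):
    """Fit a multi-output PLS regression (PLS2); return a prediction function."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xa = X - x_mean   # deflated X, starting from centred X
    Y0 = Y - y_mean   # centred Y; scores are orthogonal, so Y need not be deflated
    W, P, C = [], [], []
    for _ in range(n_components):
        # Weight: dominant left singular vector of Xa^T Y0, i.e. the fixed
        # point that the classical NIPALS inner loop converges to.
        u, _, _ = np.linalg.svd(Xa.T @ Y0, full_matrices=False)
        w = u[:, 0]
        t = Xa @ w                # X scores for this component
        tt = t @ t
        p = Xa.T @ t / tt         # X loadings
        c = Y0.T @ t / tt         # Y loadings
        Xa = Xa - np.outer(t, p)  # remove this component from X
        W.append(w); P.append(p); C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.solve(P.T @ W, C.T)  # regression coefficients (centred data)
    return lambda X_new: (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean
```

The number of components controls how far the model departs from MLR. With as many components as there are (linearly independent) predictors, PLS reproduces the ordinary least-squares fit. With fewer components, it regresses on a small set of latent directions, which is what makes it robust to highly correlated predictors.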
The AI algorithm applied in the present study was TabNet, a DNN architecture capable of handling large numbers of input and output variables.24 To construct the AI-based soft-tissue prediction model, the algorithm was adapted in Python (Python Software Foundation, Wilmington, Delaware). The TabNet DNN conditions were tuned with the synthetic minority oversampling technique set at 0.1. Early stopping was applied: training was halted once model performance no longer improved, with a maximum of 10,000 epochs.
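The early-stopping rule can be illustrated with a framework-agnostic Python loop. The patience value, the callback names, and the use of a separate validation loss are assumptions; the study reports only the 10,000-epoch ceiling. TabNet implementations commonly expose equivalent settings (for example, `max_epochs` and `patience` in the open-source pytorch-tabnet package), but the study does not state which implementation or values were used.

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=10_000, patience=50):
    """Train until validation loss has not improved for `patience` epochs.

    `train_one_epoch(epoch)` and `validation_loss()` are hypothetical callbacks.
    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss, best_epoch, epochs_run = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)    # one pass over the training data
        loss = validation_loss()  # performance on held-out data
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break                 # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run
```

In practice, the model weights from `best_epoch` would also be saved and restored, so that the final model is the best one seen rather than the last one trained.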
Statistical Analysis
To test a prediction model, it must be validated with new data that were not used to build it. To make full use of the available sample while obtaining a reliable estimate of prediction accuracy, the leave-one-out cross-validation technique (LOOCV) was employed. LOOCV has been shown to be more effective than other validation techniques, such as the classical simple split technique and five-fold or 10-fold cross-validation, particularly in clinical orthodontic research.13,26
In LOOCV, a prediction model was first built using all subjects except one. The model was then used to predict the outcome for the excluded subject, yielding a test error for that individual. This procedure was repeated N times, where N is the total number of subjects, so that each subject served once as the test case.12,26 Consequently, 705 prediction models were built for each of the AI, MLR, and PLS methods.
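The LOOCV loop can be written generically so that the same procedure serves any of the three methods. In this sketch, `fit` is a hypothetical callback that trains a model on N - 1 subjects and returns a prediction function. The returned residuals can then be converted to per-landmark errors.

```python
import numpy as np

def loocv_residuals(X, Y, fit):
    """Leave-one-out cross-validation.

    `fit(X_train, Y_train)` must return a function mapping rows of X to
    predictions. Returns an (N, q) array of (prediction - actual), one row
    per held-out subject. Y must be two-dimensional.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    residuals = np.empty_like(Y)
    for i in range(n):
        keep = np.arange(n) != i          # every subject except subject i
        predict = fit(X[keep], Y[keep])   # one new model per held-out subject
        residuals[i] = predict(X[i:i + 1])[0] - Y[i]
    return residuals
```

Because a new model is fitted N times, the cost of LOOCV scales with the training cost of a single model. This matters far more for a DNN than for PLS.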
To compare the prediction accuracy for the 32 soft-tissue landmarks, the Euclidean distance was calculated between the actual soft-tissue change after surgery and the prediction result for each landmark.
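Assuming the 64 outcome columns are stored as interleaved (x, y) pairs per landmark (an assumption about the data layout), the per-landmark Euclidean error can be computed as follows:

```python
import numpy as np

def landmark_errors(pred, actual):
    """Euclidean error per landmark from flattened (x1, y1, x2, y2, ...) arrays."""
    diff = np.asarray(pred, dtype=float) - np.asarray(actual, dtype=float)
    diff = diff.reshape(diff.shape[0], -1, 2)  # (subjects, landmarks, 2)
    return np.sqrt((diff ** 2).sum(axis=2))    # (subjects, landmarks), in mm
```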
Bonferroni-corrected t-tests were used to compare prediction accuracy between PLS and AI. To visualize the two-dimensional error patterns, scatterplots with 95% confidence ellipses were drawn.27 All statistical analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria).
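The 95% ellipses and the Bonferroni adjustment can be sketched as follows. The sketch assumes a normal-theory data ellipse built from the sample mean and covariance of each landmark's (x, y) errors. It uses the fact that the 95% quantile of a chi-square distribution with 2 degrees of freedom is −2 ln(0.05) ≈ 5.991. The exact construction in reference 27 may differ, and the t-test itself is not shown. With 32 landmarks, Bonferroni correction is equivalent to testing each landmark at 0.05/32 ≈ 0.0016.

```python
import numpy as np

def error_ellipse(errors_xy, level=0.95):
    """Centre, semi-axes, orientation (degrees) and area of a normal-theory data ellipse."""
    e = np.asarray(errors_xy, dtype=float)
    centre = e.mean(axis=0)                 # mean error, i.e. prediction bias
    cov = np.cov(e, rowvar=False)           # 2 x 2 sample covariance
    k = -2.0 * np.log(1.0 - level)          # chi-square (2 df) quantile at `level`
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_axes = np.sqrt(k * eigvals[::-1])  # major, then minor semi-axis
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))  # major-axis direction
    area = np.pi * k * np.sqrt(np.linalg.det(cov))
    return centre, semi_axes, angle, area

def bonferroni(p_values):
    """Bonferroni-adjusted p-values for m simultaneous tests."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)
```

The ellipse area gives a single number that matches the visual reading that a smaller ellipse means more accurate predictions. The offset of the centre from the origin shows systematic bias separately from scatter.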
RESULTS
Approximately 95% of the 705 patients had Class II or III malocclusion at their first visit. The average elapsed time after orthognathic surgery was 0.9 years. The most frequent types of orthognathic surgery were Le Fort I osteotomy in the maxilla and bilateral sagittal split ramus osteotomy in the mandible. At least one of these two surgeries was performed in over 80% of the subjects. Additionally, 59.6% of the patients received genioplasty (Table 2).
Figure 2 shows scatterplots of the prediction errors with 95% confidence ellipses. A smaller ellipse indicates more accurate results.27 Three scenarios are represented: 1) PLS prediction was more accurate than AI (Figure 2A), 2) there was no statistically significant difference between PLS and AI (Figure 2B), and 3) AI prediction was more accurate than PLS (Figure 2C). On visual inspection of the scatterplots for all soft-tissue landmarks, MLR demonstrated the poorest predictive performance, with larger or more distorted ellipses than those of PLS and AI.



Table 3 shows pairwise comparisons between the PLS and AI prediction results. Prediction accuracy varied with the location of the soft-tissue landmarks. Of the 32 landmarks, PLS gave more accurate predictions for 16 landmarks from glabella to the upper lip. In contrast, AI performed better for six landmarks along the lower border of the mandible and in the neck area. The remaining 10 landmarks showed no statistically significant differences between the AI and PLS prediction models.

Figure 2 and Table 3 reveal many outliers and deviations, respectively. However, these aberrations may not be clinically important as long as the predicted positions lie along the profile curve. Figure 3 compares the predicted soft-tissue profiles with the actual changes after surgery. The soft-tissue landmarks from glabella to the terminal point on the lower neck were connected with a natural cubic spline so that they formed a smooth curve. Although the predictions deviated from the actual soft-tissue changes in some areas, AI was particularly effective in predicting the soft-tissue curves of the lower mandible and neck region (Figure 3).
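The smoothing step can be sketched in NumPy. The landmarks are parameterized by cumulative chord length, and a natural cubic spline (zero second derivative at both ends) is fitted separately to the x and y coordinates. Chord-length parameterization is an assumption here; some parameterization of this kind is needed because the profile folds back under the chin, so y is not a function of x. The function names and sampling density are hypothetical, and the study does not describe its implementation beyond the use of a natural cubic spline.

```python
import numpy as np

def natural_cubic_spline(t, y, t_new):
    """Evaluate the natural cubic spline through (t, y) at t_new (t strictly increasing)."""
    t, y, t_new = (np.asarray(a, dtype=float) for a in (t, y, t_new))
    n = len(t) - 1
    h = np.diff(t)
    # Linear system for the second derivatives M; natural ends: M[0] = M[n] = 0.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Locate the interval [t_i, t_{i+1}] containing each evaluation point.
    idx = np.clip(np.searchsorted(t, t_new, side="right") - 1, 0, n - 1)
    t0, t1, hi = t[idx], t[idx + 1], h[idx]
    return (M[idx] * (t1 - t_new) ** 3 / (6 * hi)
            + M[idx + 1] * (t_new - t0) ** 3 / (6 * hi)
            + (y[idx] / hi - M[idx] * hi / 6) * (t1 - t_new)
            + (y[idx + 1] / hi - M[idx + 1] * hi / 6) * (t_new - t0))

def smooth_profile(points, samples_per_segment=10):
    """Smooth curve through ordered (x, y) landmarks, e.g. glabella to neck."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])  # chord-length parameter
    s_new = np.linspace(0.0, s[-1], samples_per_segment * (len(pts) - 1) + 1)
    return np.column_stack([natural_cubic_spline(s, pts[:, 0], s_new),
                            natural_cubic_spline(s, pts[:, 1], s_new)])
```

The curve passes exactly through every landmark. This is why a predicted landmark that slides along the profile can still yield a nearly correct outline despite a large point error.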



DISCUSSION
The purpose of this study was to evaluate the performance of an AI model in predicting orthognathic surgical outcomes compared with conventional prediction methods. The present study was inspired by recent research that developed individualized facial growth prediction models, in which AI was effective in predicting the facial changes of growing children.22,23 In this study, AI was expected to outperform conventional statistical methods such as MLR or PLS in predicting surgical outcomes. However, the results differed from what was envisaged: AI predicted better for only six of the 32 soft-tissue landmarks, whereas PLS performed better for half of them. In the previous growth prediction study, PLS was more accurate for only nine of the 78 landmarks, located primarily in the cranial base. According to Moon et al., statistical methods based on mathematical manipulation, such as PLS or MLR, may be more effective than AI in predicting craniofacial growth at landmarks with low variability.22 In this study, AI was more accurate in predicting soft-tissue changes in the lower mandible and neck region, areas that typically exhibit substantial variability after surgery. These areas may vary even with a slight postural change or in the absence of any surgical procedure.
Training the AI-based prediction model took more than 6 days, whereas the PLS-based model took less than 10 minutes. However, although training and model building are time-consuming for AI, the prediction itself involves only relatively simple calculations. Once the model was built, each prediction took only a few milliseconds, so the prediction time of an AI model is negligible despite its long development time. Because building the AI model took far longer than building the PLS model, one might expect the AI-based predictions to be more accurate. However, as described above, PLS was more successful than AI in predicting landmarks with less variability. Further studies are needed to determine which algorithm is more accurate. Each algorithm showed different strengths depending on landmark variability. A hybrid approach, which applies each of the two prediction models to the landmarks it predicts best, may therefore be more viable than choosing a single method. Applying both algorithms in this way might offer a solution to various prediction problems.
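A per-landmark hybrid of the kind proposed above could be sketched as follows. For each landmark, the model with the lower mean validation error is chosen. This selection rule, the function names, and the interleaved x/y data layout are assumptions, not a method evaluated in the study.

```python
import numpy as np

def choose_model_per_landmark(errors_a, errors_b):
    """Boolean mask per landmark: True where model A has the lower mean error.

    `errors_a` and `errors_b` are (subjects, landmarks) arrays of Euclidean
    validation errors for two hypothetical models, e.g. PLS and AI.
    """
    mean_a = np.asarray(errors_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(errors_b, dtype=float).mean(axis=0)
    return mean_a < mean_b

def hybrid_predict(pred_a, pred_b, use_a):
    """Combine flattened (x1, y1, x2, y2, ...) predictions landmark by landmark."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    mask = np.repeat(np.asarray(use_a, dtype=bool), 2)  # same choice for x and y of a landmark
    return np.where(mask, pred_a, pred_b)
```

In practice, the landmark-wise choice should be made on data separate from those used to report final accuracy. Selecting and reporting on the same errors would give an optimistic estimate.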
One strength of the present study was that, to our knowledge, as of April 2024 it is the first AI study to use the TabNet DNN algorithm to predict orthognathic surgical outcomes. In addition, this study included the largest number of subjects (705) among the studies listed in Table 1, exceeding the 620 subjects in the study by Veltkamp et al. (2002).8 This larger sample size might have contributed to improved prediction accuracy.
A limitation of the current study was that the AI model could not explain how its results were obtained. In comparison, conventional statistical models can describe the relationships through coefficient estimates and loading matrices. This may be why AI is sometimes referred to as a black box. Another limitation was that the AI prediction results might have differed if an algorithm other than the TabNet DNN had been used.24 Reliance on cephalograms was another weakness of this study. However, computed tomography is not commonly obtained at a patient's first visit, whereas lateral cephalograms are routinely used to diagnose the need for orthognathic surgery. Although three-dimensional images may offer a more realistic visualization, the lateral profile line from a cephalogram is often regarded as simpler to use in practice.
Although many orthodontists may regard AI as a much-needed new tool, AI by itself may not be the ultimate solution, at least for predicting orthognathic surgical outcomes. The initial expectation was that AI could be an adaptable solution to various challenges and complex issues in clinical orthodontics. However, this study found that AI predictions might not always be as reliable as expected in certain areas. If so, AI may not always outperform traditional statistical methods, especially when variability is low and/or there is a clear cause-and-effect relationship. One such scenario could be predicting changes in the soft-tissue profile after orthodontic treatment. Unlike in growth prediction, there is a clearer cause-and-effect relationship between dentoalveolar and soft-tissue changes in orthodontic treatment.28 Additionally, soft-tissue changes after orthodontic treatment are less variable than those following orthognathic surgical procedures. Consequently, it is cautiously anticipated that AI-based prediction models might not be as effective as MLR- or PLS-based methods in predicting changes in the soft-tissue profile after orthodontic treatment.28 This could be an interesting topic for future AI research in orthodontics.
CONCLUSIONS
AI effectively predicted soft-tissue curves in the lower mandible and neck region, which typically show wide variability after surgery. However, PLS yielded superior predictions for a larger number of landmarks. Consequently, a combination of AI and conventional methods may be a more effective way to predict orthognathic surgical outcomes.

Figure 1. Reference planes and cephalometric landmarks used in the present study. (A) Skeletal landmarks are shown in capital letters. (B) Soft-tissue landmarks are presented in lowercase letters.

Figure 2. Scatterplots and 95% confidence ellipses of prediction errors for soft-tissue landmarks: (A) superior labial sulcus; (B) lower lip; (C) cervical point. The larger point at the center of each ellipse represents the mean (bias) of the smaller error points enclosed by the ellipse.

Figure 3. Real-case examples illustrating actual soft-tissue changes after orthognathic surgery and the corresponding prediction results. The outline curves do not match the soft-tissue profile line exactly because the outlines are based on the lateral cephalometric image, whereas the lateral photographs were superimposed for illustrative purposes only. In general, AI predictions are more accurate than PLS predictions in depicting the lower border of the mandible and the neck curve.