Article Category: Research Article
Online Publication Date: 10 May 2024

Orthodontic treatment outcome predictive performance differences between artificial intelligence and conventional methods

Page Range: 557 – 565
DOI: 10.2319/111823-767.1

ABSTRACT

Objectives

To evaluate an artificial intelligence (AI) model in predicting soft tissue and alveolar bone changes following orthodontic treatment and compare the predictive performance of the AI model with conventional prediction models.

Materials and Methods

A total of 1774 lateral cephalograms of 887 adult patients who had undergone orthodontic treatment were collected. Patients who had orthognathic surgery were excluded. On each cephalogram, 78 landmarks were detected using PIPNet-based AI. Prediction models consisted of 132 predictor variables and 88 outcome variables. Predictor variables were demographics (age, sex), clinical (treatment time, premolar extraction), and Cartesian coordinates of the 64 anatomic landmarks. Outcome variables were Cartesian coordinates of the 22 soft tissue and 22 hard tissue landmarks after orthodontic treatment. The AI prediction model was based on the TabNet deep neural network. Two conventional statistical methods, multivariate multiple linear regression (MMLR) and partial least squares regression (PLSR), were each implemented for comparison. Prediction accuracy among the methods was compared.

Results

Overall, MMLR demonstrated the most accurate results, while AI was least accurate. AI showed superior predictions in only 5 of the 44 anatomic landmarks, all of which were soft tissue landmarks inferior to menton to the terminal point of the neck.

Conclusions

When predicting changes following orthodontic treatment, AI was not as effective as conventional statistical methods. However, AI had an outstanding advantage in predicting soft tissue landmarks with substantial variability. Overall, results may indicate the need for a hybrid prediction model that combines conventional and AI methods.

INTRODUCTION

With the advent of high-speed computer technology, the use of artificial intelligence (AI) in research has become popular, and orthodontics is no exception to the rapid influx of AI literature. Recently, several studies have used AI to predict facial soft tissue changes following orthodontic treatment. However, these studies seem simply to repeat, with AI, analyses that have already been performed using conventional statistical methods (Table 1).1–3 Since AI requires significant computing resources even for simple tasks, it must demonstrate superior effectiveness over traditional methods to justify its use. If an AI model does not perform better than conventional methods, it is impractical and unnecessarily costly. To determine its practicality, it is therefore necessary to compare the accuracy of AI predictions with that of traditional methods.

Table 1. Summary of Orthodontic Prediction Research Using Artificial Intelligence

To develop a clinically applicable method for predicting smooth soft tissue curves, it is necessary to analyze multiple predictor and outcome (response) variables of the soft tissue landmarks.4,5 Multivariate multiple linear regression (MMLR), which produces the ordinary least squares estimator, is one of the conventional methods available to do so.6 However, MMLR has limitations when there are numerous variables and those variables are significantly correlated. Partial least squares regression (PLSR) applies dimension reduction through latent variable modeling and has been preferred because it provided more accurate predictions after combined surgical-orthodontic treatment5,7–10 and in predicting facial growth changes.11

In developing AI models, deep-learning algorithms based on convolutional neural network (CNN) architecture have been the most popular. One of the more recent deep-learning architectures, the TabNet deep neural network (DNN), which was designed for tabular data, has been used to develop an individualized facial growth prediction model12 and to predict soft tissue changes following orthognathic surgery.13 TabNet DNN can model complex nonlinear relationships, incorporating multiple predictors and outcome variables.14 Previously, AI showed effectiveness in automatically identifying cephalometric landmarks and in subsequent analyses15–20 and in predicting facial growth in growing children.12 Contrary to the initial assumption that AI could provide a universal solution to diverse challenges, AI has not always been effective. For example, when predicting soft tissue changes following orthognathic surgery, the AI prediction was not as effective as the conventional PLSR method, particularly when predicting areas with small surgical changes.13

The purpose of this study was to develop and evaluate an AI model for predicting changes following orthodontic treatment. The specific aim was to compare the predictive performance of the AI prediction model with conventional prediction methods, MMLR and PLSR.

MATERIALS AND METHODS

Subjects

The institutional review board for the protection of human subjects of the Seoul National University School of Dentistry approved the research protocol (S-D20200036).

The subjects were 887 patients (604 females and 283 males; mean age = 24.2 ± 8.5 years) who had undergone orthodontic treatment at the Department of Orthodontics, Seoul National University Dental Hospital, Seoul, Korea, from January 2013 to December 2022. The inclusion criteria were (1) age over 15 years for females and over 17 years for males, to exclude subjects in major growth spurts, and (2) comprehensive orthodontic treatment using fixed appliances. The exclusion criteria were (1) a history of orthognathic surgery and (2) the presence of craniofacial syndromes.

Predictors and Outcome Variables

For all patients, lateral cephalograms taken before (T1) and after (T2) orthodontic treatment were collected. On 1774 images from 887 patients, 46 skeletal and 32 soft tissue landmarks were identified using automated landmark detection software (Ceppro, DDH Inc, Seoul, Korea) based on the PIPNet algorithm by Jin et al. (2021).21

The prediction models comprised 132 predictors and 88 outcome variables (Figure 1). The predictor variables were demographics (age, sex), clinical (treatment time, premolar extraction), and Cartesian coordinates of the 64 anatomic landmarks.

Figure 1. The experimental design.

Citation: The Angle Orthodontist 94, 5; 10.2319/111823-767.1

Landmarks were chosen to reflect orthodontic treatment changes of the incisors (16 variables), the molars (24 variables), the soft tissue from subnasale to the terminal point of the neck (44 variables), the alveolar bone (12 variables), and the hard tissue of the mandible (32 variables).

The outcome variables included Cartesian coordinates of the 22 soft tissue landmarks from subnasale to the terminal point of the neck, 6 alveolar bone landmarks, and 16 skeletal landmarks on the mandible that could undergo changes according to orthodontic tooth movement (Figure 2; Table 2).
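The variable counts quoted above follow from coding each landmark as an (x, y) pair on the lateral cephalogram. The short check below is only an illustrative sketch of that arithmetic; the constant and dictionary names are hypothetical and are not taken from the study's software.

```python
# Counts are taken from the text; each landmark contributes an x and a y coordinate.
N_PREDICTOR_LANDMARKS = 64
N_OUTCOME_LANDMARKS = 22 + 6 + 16   # soft tissue + alveolar bone + mandibular skeletal
COORDS_PER_LANDMARK = 2             # x and y on a lateral cephalogram
N_COVARIATES = 4                    # age, sex, treatment time, premolar extraction

n_predictors = N_PREDICTOR_LANDMARKS * COORDS_PER_LANDMARK + N_COVARIATES
n_outcomes = N_OUTCOME_LANDMARKS * COORDS_PER_LANDMARK

# Coordinate variables per anatomic region, as listed in the text.
region_variables = {
    "incisors": 16,
    "molars": 24,
    "soft tissue": 44,
    "alveolar bone": 12,
    "mandible": 32,
}

print(n_predictors, n_outcomes, sum(region_variables.values()))
```

The regional counts sum to the 128 coordinate variables of the 64 predictor landmarks, and the four demographic and clinical covariates bring the total to 132.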

Figure 2. The reference planes and 78 cephalometric landmarks used in this study: (A) pretreatment image, skeletal landmarks in capital letters; (B) posttreatment image, soft tissue landmarks in lowercase letters.


Table 2. Comparison of Orthodontic Treatment Prediction Errors (mm) from Multivariate Multiple Linear Regression (MMLR), Partial Least Squares Regression (PLSR) Method, and the TabNet Artificial Intelligence (AI) Algorithm

AI Prediction Model

The AI prediction model applied in the present study was based on the TabNet DNN algorithm by Arik and Pfister (2021).14 Training and testing were performed using Python (Python Software Foundation, Wilmington, Del) on a desktop computer running the Ubuntu 22.04 LTS Linux distribution.

To develop an optimal AI prediction model, various training settings (hyperparameters) were tested, and the optimal settings were selected by comparing the prediction errors of numerous hyperparameter combinations. For the number of training epochs at which training was stopped, 50, 100, 1000, and 10,000 were tested, and the AI model trained for 10,000 epochs was selected as the optimal AI model (Figure 3A).

Figure 3. Searching for optimal artificial intelligence (AI) model training conditions by comparing 95% confidence ellipses of the AI prediction errors at the upper lip and lower lip: (A) according to the number of training epochs; (B) according to the amount of oversampling.


An oversampling method based on the synthetic minority oversampling technique (SMOTE)22 was implemented. SMOTE values of 0.05, 0.1, 0.2, and 0.3 were tested. However, the results from varying values of SMOTE did not show significant differences (Figure 3B).
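The core idea of SMOTE is to create synthetic samples by interpolating between an existing sample and one of its nearest neighbours. The sketch below illustrates only that interpolation step. It assumes, without confirmation from the text, that the tested values (0.05 to 0.3) denote the fraction of synthetic samples added; the function name `smote_like_oversample` and the toy data are hypothetical.

```python
import math
import random


def smote_like_oversample(rows, fraction, k=5, seed=0):
    """Return synthetic rows interpolated between a row and one of its k nearest neighbours.

    `fraction` is assumed to be the share of extra samples relative to len(rows).
    """
    rng = random.Random(seed)
    n_new = round(fraction * len(rows))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # k nearest neighbours of `base` by Euclidean distance, excluding `base` itself.
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        mate = rng.choice(neighbours)
        lam = rng.random()  # position along the segment from base to mate
        synthetic.append([b + lam * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy data lying on the line y = 2x; interpolated points stay on the same line.
rows = [[float(i), float(2 * i)] for i in range(20)]
extra = smote_like_oversample(rows, fraction=0.2)
print(len(extra))
```

Because each synthetic row lies on a segment between two real rows, oversampling of this kind adds no information beyond the observed sample, which is consistent with the small effect reported for different SMOTE values.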

Two Conventional Statistical Prediction Models

MMLR is based on the ordinary least squares estimator. The stepwise variable selection method was used in constructing the MMLR prediction model.

PLSR combines the benefits of principal component analysis and MMLR through dimensional reduction latent modeling.23 The PLSR prediction model with 40 latent variables was selected.
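The ordinary least squares estimator underlying MMLR can be sketched in a few lines. The example below is a minimal illustration using NumPy on noise-free toy data, not the authors' implementation: it omits the stepwise variable selection described above, and the function names `fit_mmlr` and `predict_mmlr` are hypothetical.

```python
import numpy as np


def fit_mmlr(X, Y):
    """Ordinary least squares for several outcomes at once.

    Returns a (p + 1, q) coefficient matrix with the intercept in the first row.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return B


def predict_mmlr(B, X):
    """Apply the fitted coefficients to new predictor rows."""
    return np.column_stack([np.ones(len(X)), X]) @ B


# Toy data: 50 subjects, 3 predictors, 2 outcomes, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0], [2.0, 1.0]])
Y = np.column_stack([np.ones(50), X]) @ B_true

B_hat = fit_mmlr(X, Y)
print(np.round(B_hat, 3))
```

A PLSR counterpart could be obtained with, for example, scikit-learn's `PLSRegression(n_components=40)`, although the text does not state which software was used for the conventional models.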

Validation and Evaluation of Predictive Performance

To validate the prediction models and to avoid overfitting, leave-one-out cross-validation, which has been reported to be superior to other test/validation methods, was used.24 During the validation process, 2661 prediction models were built, with each algorithm (AI, MMLR, and PLSR) having 887 models, each of which excluded one subject during model building.
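The leave-one-out procedure can be sketched as follows. This is an illustrative outline only: a simple least-squares line stands in for any of the three algorithms, and the helper names (`leave_one_out_errors`, `fit_line`, `predict_line`) are hypothetical.

```python
from statistics import mean


def leave_one_out_errors(xs, ys, fit, predict):
    """Refit the model n times, each time predicting the single held-out subject."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append(abs(predict(model, xs[i]) - ys[i]))
    return errors


def fit_line(xs, ys):
    """Simple least-squares line, standing in for AI, MMLR, or PLSR."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Toy data following y = 2x + 1 exactly, so every held-out error is zero.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
errors = leave_one_out_errors(xs, ys, fit_line, predict_line)

# Model count reported in the text: one model per subject per algorithm.
n_subjects, n_algorithms = 887, 3
n_models = n_subjects * n_algorithms
print(len(errors), n_models)
```

Each subject is predicted by a model that never saw that subject, which is why the number of fitted models equals the number of subjects for every algorithm.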

To compare predictive performance, analysis of variance was conducted. Scatterplots with 95% confidence ellipses were used to evaluate the prediction errors in two dimensions.25 Changes in the 22 skeletal and 22 soft tissue landmarks after orthodontic treatment were connected using spline curves overlaid on real patient photos and cephalometric images (Figure 1).
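A 95% confidence ellipse for two-dimensional prediction errors can be derived from the sample covariance matrix of the x and y errors. The sketch below assumes bivariate normal errors and uses the closed-form chi-square quantile for 2 degrees of freedom; the text does not specify how the ellipses were computed, so the function `confidence_ellipse` and its toy input are illustrative assumptions.

```python
import math


def confidence_ellipse(dx, dy, level=0.95):
    """Return (centre, (semi-major, semi-minor), major-axis angle in radians).

    Assumes the (dx, dy) prediction errors are roughly bivariate normal.
    """
    n = len(dx)
    mx, my = sum(dx) / n, sum(dy) / n
    sxx = sum((x - mx) ** 2 for x in dx) / (n - 1)
    syy = sum((y - my) ** 2 for y in dy) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dx, dy)) / (n - 1)
    # Eigenvalues of the 2 x 2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    # Chi-square quantile with 2 degrees of freedom: -2 ln(1 - level).
    scale = -2.0 * math.log(1.0 - level)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axes = (math.sqrt(scale * lam_major), math.sqrt(scale * max(lam_minor, 0.0)))
    return (mx, my), axes, angle


# Toy errors (mm) with zero mean and zero covariance between x and y.
centre, axes, angle = confidence_ellipse(
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    [1.0, -1.0, 0.0, -1.0, 1.0],
)
print(centre, axes, angle)
```

A smaller ellipse centred near the origin indicates errors that are both less biased and less variable, which is how the methods are compared visually in the scatterplots.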

RESULTS

Among 887 patients, 31% had undergone premolar extraction treatment; 53.6%, 34.8%, and 11.6% had Class I, II, and III malocclusions, respectively. The mean treatment duration was 32 months.

The pooled average prediction errors of the 44 anatomical landmarks were 1.69 mm, 1.74 mm, and 2.12 mm from the MMLR, PLSR, and AI prediction methods, respectively.
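A pooled average error of this kind can be computed by averaging a per-landmark error over all patients and landmarks. The sketch below assumes that the error is the Euclidean distance in millimetres between predicted and actual landmark positions, which the text does not state explicitly; the function names and the two toy patients are hypothetical.

```python
import math
from statistics import mean


def landmark_errors(predicted, actual):
    """Euclidean distance (mm) between predicted and actual (x, y) positions, per landmark."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def pooled_mean_error(predicted_by_patient, actual_by_patient):
    """Average the per-landmark errors over every patient and landmark."""
    return mean(
        e
        for pred, act in zip(predicted_by_patient, actual_by_patient)
        for e in landmark_errors(pred, act)
    )


# Two hypothetical patients with two landmarks each (coordinates in mm).
actual = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (10.0, 0.0)]]
predicted = [[(3.0, 4.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 2.0)]]
pooled = pooled_mean_error(predicted, actual)
print(pooled)
```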

Overall, MMLR demonstrated the most accurate results in all of the alveolar bone and skeletal landmarks. However, AI demonstrated superiority over MMLR and PLSR in predicting 5 among 22 soft tissue landmarks, all of which were landmarks on the face below menton to the terminal point of the neck that had been poorly predicted by MMLR and PLSR (Table 2).

In terms of statistical significance, MMLR showed more accurate results than PLSR in 14 landmarks. However, when the prediction errors were evaluated in two dimensions, the differences between MMLR and PLSR were not clinically significant (Figure 4). In certain areas, the differences between the AI and conventional methods were noticeable. Figure 4 illustrates several representative scatterplots of prediction errors where AI demonstrated greater prediction errors, such as at the upper lip, lower lip, soft tissue point B, soft tissue pogonion, and soft tissue menton. However, when predicting the cervical point, AI showed smaller prediction errors than the conventional methods (Figure 4).

Figure 4. The prediction errors in several soft tissue landmarks obtained from the multivariate multiple linear regression (MMLR, green), partial least squares regression (PLSR, blue), and artificial intelligence (AI, red) prediction methods. In general, AI showed the least accurate prediction results except for the cervical point, where AI showed the smallest ellipses.


To provide real case examples, the predicted outcomes were overlaid with the actual changes following orthodontic treatment. Figure 5A displays an orthodontic patient treated with four premolar extractions, which demonstrated that MMLR and PLSR are more accurate than AI when predicting changes in the alveolar process and lip curves. Figure 5B is a treatment case with an open-bite resolved by counterclockwise autorotation of the mandible through the intrusion of the maxillary posterior teeth, which illustrated poor accuracy by the AI model compared with MMLR and PLSR for the prediction of rotational movement of the mandible. Similar variations in the predictive outcomes were observed in all patients, particularly in the lip region and chin tip (Figure 5C,D). In predicting soft tissue curves below soft tissue menton, AI outperformed MMLR and PLSR for all patients (Figure 5).

Figure 5. Comparison between actual changes after orthodontic treatment and prediction results according to multivariate multiple linear regression (MMLR), partial least squares regression (PLSR), and artificial intelligence (AI) prediction methods in patients with (A) Class III anterior crossbite, (B) Class II open-bite, (C) Class I open-bite, and (D) Class II open-bite.


DISCUSSION

Although recent AI studies have shown promising features of AI in predicting changes following orthodontic treatment,1–3 none compared the predictive performance of AI with conventional statistical methods to determine whether AI was superior enough to deserve the spotlight in this area. In addition, previous literature was based on relatively small sample sizes with a limited number of outcome variables. The present study appears to be the first to compare the predictive performance of an AI prediction model with conventional prediction methods for orthodontic treatment outcomes, and it used the largest sample size to date. This study showed that AI was not as effective as conventional methods in predicting changes after orthodontic treatment, except for the neck area.

Based on literature reviews, AI has not always been effective. According to Hwang et al. (2020),19 when detecting cephalometric landmarks, AI was poorer than human examiners in accurately identifying the nose tip, the nasal bone tip, incisal edges, and incisal root tips. These landmarks had a relatively clear form and shape that could be visually pinpointed with ease. Recently, according to Moon et al. (2024),12 when predicting facial growth, AI was the most accurate in 63 out of 78 landmarks (81%). However, when predicting cranial base landmarks, AI showed poorer results than PLSR. AI also did not outperform PLSR in the study by Park et al. (2024)13 on predicting changes after orthognathic surgery, in which AI provided the most accurate results for only 6 out of 32 landmarks (18.8%). In the present study, the proportion of landmarks for which AI showed the most accurate results decreased to 5 of 44 (11.4%), as summarized in Table 3. This suggests that, when changes were limited, variations were minimal, or a clear cause-and-effect relationship existed, conventional statistical prediction methods were more effective than AI. Although predicting changes after orthodontic treatment is complex, orthodontic treatment changes are not as large as those resulting from natural facial growth over time or orthognathic surgical procedures.

Table 3. Comparative Overview of Predictive Performance Results According to the Number of Subjects and Variables Included in the Experimental Design Among the Multivariate Multiple Linear Regression (MMLR), Partial Least Squares Regression (PLSR), and Artificial Intelligence (AI) Prediction Models

AI does have some disadvantages. First, AI cannot explain how it arrives at solutions, unlike conventional linear regression analysis, which can interpret and estimate the relationships between predictor and outcome variables. Since the internal operations of AI cannot be fully understood or interpreted, AI could in this sense be deemed a black box.13 Second, developing an AI model requires significant computational resources and time, ranging from several weeks to months, depending on the sample size and computer specifications.

The large sample size of this study may have influenced the finding that MMLR was more effective than PLSR. Without exception, all of the previous growth prediction studies11,12 and orthognathic surgery prediction studies5,7–10,13 showed MMLR to have a poorer predictive performance than PLSR. With some caution, it is surmised that this phenomenon might have been related to the ratio of the number of subjects (n) to the number of predictors (p), namely, the n/p ratio. PLSR is advantageous in the case of small n and large p situations,23 whereas MMLR models would be more robust when the n/p ratio was greater than 5.26 In those studies, the n/p ratios of growth prediction and orthognathic surgery prediction studies were 2.55 and 2.78, respectively. The consequence was that PLSR models were better methods than MMLR models. In contrast, the current study had a larger sample size and fewer variables than previous studies, resulting in an n/p ratio of 6.72. As shown in Table 3, this might have led to the finding that MMLR was the more accurate method. However, further clinical studies or simulations are required to validate this hypothesis.
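The n/p ratio for the present study can be reproduced from the counts given in the Methods. The snippet below is a small illustrative check; the ratios for the earlier studies are copied from the text rather than recomputed, and the threshold of 5 is the rule of thumb cited above. The function and variable names are hypothetical.

```python
def n_over_p(n_subjects, n_predictors):
    """Ratio of the number of subjects (n) to the number of predictors (p)."""
    return n_subjects / n_predictors


ratio_present = n_over_p(887, 132)

# Ratios for the earlier studies are quoted from the text, not recomputed here.
ratios = {
    "growth prediction": 2.55,
    "orthognathic surgery prediction": 2.78,
    "present study": round(ratio_present, 2),
}

# Rule of thumb from the text: MMLR is expected to be more robust when n/p > 5.
favours_mmlr = {study: value > 5 for study, value in ratios.items()}
print(ratios, favours_mmlr)
```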

On a similar note, AI seemed to have shown better predictive performance when the n/p ratio was low, ie, when there were limited numbers of subjects but many predictor variables. This may imply that, in the case of three-dimensional (3D) studies which involve 3 to 4 times more variables than 2D study formulations, AI is likely to play a more meaningful role than conventional statistical methods.4,15 Orthodontic treatment prediction results will become more sophisticated in the future as more 3D information is collected.

CONCLUSIONS

  • When predicting changes following orthodontic treatment, AI was not as effective as the conventional statistical methods, suggesting that AI might not always be the best option for predicting everything.

  • However, this does not necessarily mean that conventional methods should always be applied. The strength of the AI prediction method was apparent in predicting the soft tissue changes in the neck, where conventional methods had predicted changes poorly.

  • Applying multiple methods catered to the anatomic features and variability of response variables may be a viable option to improve predictive performance.

ACKNOWLEDGMENTS

A part of the data presented in the current study was included as a part of a doctoral dissertation (SJC). This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant No. HI22C1518), and partly supported by grants (03-2022-0046 and 05-2023-0027) from the SNUDH Research Fund.

DISCLOSURE

The authors report there are no conflicts of interest that should be disclosed.

REFERENCES

  • 1.

    Park YS, Choi JH, Kim Y, et al. Deep learning-based prediction of the 3D postorthodontic facial changes. J Dent Res. 2022;101:1372–1379.

  • 2.

    Tanikawa C, Yamashiro T. Development of novel artificial intelligence systems to predict facial morphology after orthognathic surgery and orthodontic treatment in Japanese patients. Sci Rep. 2021;11:15853.

  • 3.

    Park JH, Kim Y-J, Kim J, et al. Use of artificial intelligence to predict outcomes of nonextraction treatment of Class II malocclusions. Semin Orthod. 2021;27:87–95.

  • 4.

    Kang TJ, Eo SH, Cho H, Donatelli RE, Lee SJ. A sparse principal component analysis of Class III malocclusions. Angle Orthod. 2019;89:768–774.

  • 5.

    Suh HY, Lee HJ, Lee YS, Eo SH, Donatelli RE, Lee SJ. Predicting soft tissue changes after orthognathic surgery: the sparse partial least squares method. Angle Orthod. 2019;89:910–916.

  • 6.

    Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer; 2009.

  • 7.

    Yoon KS, Lee HJ, Lee SJ, Donatelli RE. Testing a better method of predicting postsurgery soft tissue response in Class II patients: a prospective study and validity assessment. Angle Orthod. 2015;85:597–603.

  • 8.

    Lee YS, Suh HY, Lee SJ, Donatelli RE. A more accurate soft-tissue prediction model for Class III 2-jaw surgeries. Am J Orthod Dentofacial Orthop. 2014;146:724–733.

  • 9.

    Lee HJ, Suh HY, Lee YS, et al. A better statistical method of predicting postsurgery soft tissue response in Class II patients. Angle Orthod. 2014;84:322–328.

  • 10.

    Suh HY, Lee SJ, Lee YS, et al. A more accurate method of predicting soft tissue changes after mandibular setback surgery. J Oral Maxillofac Surg. 2012;70:e553–562.

  • 11.

    Moon JH, Kim MG, Hwang HW, Cho SJ, Donatelli RE, Lee SJ. Evaluation of an individualized facial growth prediction model based on the multivariate partial least squares method. Angle Orthod. 2022;92:705–713.

  • 12.

    Moon JH, Shin HK, Lee JM, et al. Comparison of individualized facial growth prediction models based on the partial least squares and artificial intelligence. Angle Orthod. 2024;94:207–205.

  • 13.

    Park JA, Moon JH, Lee JM, et al. Does artificial intelligence predict orthognathic surgical outcomes better than conventional linear regression methods? Angle Orthod. 2024 in press DOI:10.2319/111423-756.1

  • 14.

    Arik SÖ, Pfister T. TabNet: attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2021;35:6679–6687.

  • 15.

    Ghowsi A, Hatcher D, Suh H, et al. Automated landmark identification on cone-beam computed tomography: accuracy and reliability. Angle Orthod. 2022;92:642–654.

  • 16.

    Hwang HW, Moon JH, Kim MG, Donatelli RE, Lee SJ. Evaluation of automated cephalometric analysis based on the latest deep learning method. Angle Orthod. 2021;91:329–335.

  • 17.

    Moon JH, Hwang HW, Yu Y, Kim MG, Donatelli RE, Lee SJ. How much deep learning is enough for automatic identification to be reliable? Angle Orthod. 2020;90:823–830.

  • 18.

    Moon JH, Hwang HW, Lee SJ. Evaluation of an automated superimposition method for computer-aided cephalometrics. Angle Orthod. 2020;90:390–396.

  • 19.

    Hwang HW, Park JH, Moon JH, et al. Automated identification of cephalometric landmarks: part 2 - might it be better than human? Angle Orthod. 2020;90:69–76.

  • 20.

    Park JH, Hwang HW, Moon JH, et al. Automated identification of cephalometric landmarks: part 1 - comparisons between the latest deep-learning methods YOLOV3 and SSD. Angle Orthod. 2019;89:903–909.

  • 21.

    Jin H, Liao S, Shao L. Pixel-in-pixel net: towards efficient facial landmark detection in the wild. Int J Comput Vis. 2021;129:3174–3194.

  • 22.

    Fernández A, Garcia S, Herrera F, Chawla NV. SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res. 2018;61:863–905.

  • 23.

    Kim K, Lee SJ, Eo SH, Cho SJ, Lee JW. Modified partial least squares method implementing mixed-effect model. Commun Stat Appl Methods. 2023;30:65–73.

  • 24.

    Donatelli RE, Lee SJ. How to test validity in orthodontic research: a mixed dentition analysis example. Am J Orthod Dentofacial Orthop. 2015;147:272–279.

  • 25.

    Moon JH, Lee JM, Park JA, Suh HY, Lee SJ. Reliability statistics every orthodontist should know. Semin Orthod. 2024;30:45–49.

  • 26.

    Norman GR, Streiner DL. Biostatistics: The Bare Essentials. St. Louis, MO: Mosby Year Book; 1994.

Copyright: © 2024 by The EH Angle Education and Research Foundation, Inc.

Contributor Notes

 Graduate Student (PhD), Department of Orthodontics, Seoul National University, Seoul, Korea.
 Private Practice, Cheonan, Korea.
 Research Scientist, AI Research Center, DDH Inc, Seoul, Korea.
 Clinical Lecturer, Department of Orthodontics, Seoul National University Dental Hospital, Seoul, Korea.
 Private Practice, West Palm Beach, Florida, USA.
 Professor, Department of Orthodontics and Dental Research Institute, Seoul National University School of Dentistry, Seoul, Korea.
Corresponding author: Dr Shin-Jae Lee, Professor, Department of Orthodontics and Dental Research Institute, Seoul National University School of Dentistry, Jongro-Gu, Seoul 03080, Korea (e-mail: nonext.shinjae@gmail.com)
Received: 01 Nov 2023
Accepted: 01 Mar 2024