Comparison of individualized facial growth prediction models using artificial intelligence and partial least squares based on the Mathews growth collection
ABSTRACT
Objectives
To develop facial growth prediction models using artificial intelligence (AI) under various conditions, and to compare the performance of these models with each other and with the partial least squares (PLS) growth prediction model.
Materials and Methods
Longitudinal lateral cephalograms from 33 subjects in the Mathews growth collection were utilized. A total of 1257 pairs of before and after growth lateral cephalograms were included. In each image, 46 hard and 32 soft tissue landmarks were manually identified. Growth prediction models were constructed using a deep learning method based on the TabNet deep neural network and using the PLS method. Prediction accuracies of the two methods were compared.
Results
On average, AI showed 0.61 mm less prediction error than PLS. Among the 77 predicted landmarks, AI was more accurate than PLS in 60 landmarks. When comparing AI models with varying numbers of training epochs, those trained for more epochs yielded more accurate predictions. Overall, PLS and AI exhibited greater prediction errors for soft tissue and mandibular landmarks compared to hard tissue and maxillary landmarks. However, AI showed a smaller increase in prediction error in areas with greater variability.
Conclusions
AI proved to be a valuable growth prediction method, with clinically acceptable prediction errors averaging 1.49 mm for 45 hard tissue landmarks and 1.71 mm for 32 soft tissue landmarks. PLS accurately predicted landmarks with low variability. However, AI generally outperformed PLS, particularly for landmarks in the lower part of the craniofacial structure and soft tissue, where uncertainty is considerable.
INTRODUCTION
Understanding and predicting timing, pattern, and amount of human facial growth greatly impacts the effectiveness and efficiency of orthodontic treatment.1 Although some patients benefit from early intervention, others may miss critical windows, necessitating surgery as the only viable option. Additionally, some patients receive multiple rounds of treatment as they outgrow initial results or experience relapse. Ideally, with unlimited resources, prolonged treatments could yield optimal outcomes. Nevertheless, the best available scientific evidence should be used to determine the most effective and efficient treatment options.
Historical growth prediction methods2–10 provided general guidelines but were not always precise for individual variations. Despite efforts to understand and predict growth and development, the subjectivity in predicting dentofacial growth remains a challenge, as highlighted in Dr. Bishara’s article in 2000.11 Although growth prediction has been an important subject in orthodontics, few publications have addressed craniofacial growth prediction in the past 20 years.12 The precise determination of future growth magnitude, direction, and resulting facial changes continues to be uncertain, with most treatment planning relying on the subjective assessment of orthodontists.
Accurate growth prediction is challenging due to its complexity and the influence of genetic and environmental factors, which cause individual variations.13–15 As an attempt to account for these individual variations, recent statistical methods, such as discriminant analysis,16,17 multiple linear regression analysis,18 Bayes’ theorem,19,20 and nonlinear growth models,21,22 have included age and gender in growth prediction. The multivariate partial least squares regression method (PLS) has been utilized in growth prediction for its ability to manage a large number of intercorrelated individual attributes and has shown improved prediction accuracy.12 Recently, there has been growing interest in applying artificial intelligence (AI) to solve complex problems in orthodontics. There have been attempts to use AI in cephalometric landmark detection, automatic image superimposition, orthodontic diagnosis, and growth prediction.23–30
Although advances in technology allow for extensive computational analysis of large datasets, developing a robust growth prediction model remains challenging, as collecting longitudinal growth records solely for research purposes is often not feasible, especially when patients are not undergoing treatment. Therefore, the American Association of Orthodontists Foundation (AAOF) Craniofacial Growth Legacy Collection serves as an invaluable resource, providing longitudinal records of growing adolescents.31
Given the critical role of growth in successful treatment, it is imperative to base growth predictions on the best available scientific knowledge. This study aimed to develop facial growth prediction models using the Mathews collection from the AAOF growth collections, based on AI. Another goal was to compare the performance of the AI model with a prediction model that utilized the PLS method, which is one of the most recently implemented statistical approaches.
MATERIALS AND METHODS
Subjects
The institutional review board for the protection of human subjects of the University of the Pacific reviewed and approved the research protocol (#2023-28). Subjects were drawn from the Mathews collection, one of the AAOF Growth Collections, which includes 36 subjects, primarily of European descent. Three subjects were excluded due to missing images. The final sample comprised 33 subjects who received annual cephalograms at five or more timepoints, yielding 1257 pairs of before and after growth data.
Cephalometrics
Cephalometric images of the subjects, taken annually, were processed digitally to enhance quality for reliable landmark identification. Fiducials, pixel information, and magnification factors were considered to resize the images. A total of 78 anatomical landmarks (Table 1) were identified manually by a single examiner (SJL) with 33 years of clinical orthodontic experience (Figure 1). A Cartesian coordinate system was constructed using Sella as the origin; because Sella was fixed at the origin, 77 landmarks remained for prediction. The horizontal reference plane was established by drawing a line 7° downward from the Sella-Nasion plane.
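The coordinate system described above can be illustrated with a minimal sketch. It assumes that the profile faces the positive x direction and that y increases upward; the function name, landmark names, and coordinates are hypothetical and are not taken from the study.

```python
import math

def to_sella_frame(landmarks, sella="Sella", nasion="Nasion", tilt_deg=7.0):
    """Translate so Sella is the origin and rotate so the x-axis lies
    tilt_deg below the Sella-Nasion line (assumes y up, face toward +x)."""
    sx, sy = landmarks[sella]
    nx, ny = landmarks[nasion]
    # Angle of the Sella-Nasion line in the input frame.
    sn_angle = math.atan2(ny - sy, nx - sx)
    # Reference axis: the Sella-Nasion line turned downward by tilt_deg.
    axis_angle = sn_angle - math.radians(tilt_deg)
    cos_a, sin_a = math.cos(-axis_angle), math.sin(-axis_angle)
    out = {}
    for name, (x, y) in landmarks.items():
        dx, dy = x - sx, y - sy
        out[name] = (dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a)
    return out

# Hypothetical raw coordinates in millimeters.
raw = {"Sella": (100.0, 200.0), "Nasion": (170.0, 215.0), "Pogonion": (165.0, 95.0)}
aligned = to_sella_frame(raw)
nasion_angle = math.degrees(math.atan2(aligned["Nasion"][1], aligned["Nasion"][0]))
```

In the transformed frame, Sella sits at the origin and Nasion lies 7° above the horizontal reference axis, so Sella carries no information and only the remaining landmarks need to be predicted.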


Citation: The Angle Orthodontist 95, 3; 10.2319/082124-687.1

Variables
The prediction model incorporated 159 predictor variables and 154 response variables. Predictor variables included individual characteristics such as age, gender, Angle classification, and growth observation interval, together with the x and y coordinates of the 77 anatomic landmarks at the starting timepoint. The x and y coordinates of the same 77 landmarks at a later timepoint were used as response variables.
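How serial records become before and after growth pairs can be sketched as follows. The sketch assumes that every earlier visit of a subject is paired with every later visit, which the text does not state explicitly, and it uses a hypothetical numeric coding for sex and Angle classification. It does not attempt to reproduce the exact encoding behind the 159 predictors reported above.

```python
from itertools import combinations

def build_growth_pairs(visits, sex, angle_class):
    """Turn one subject's visits into (predictors, responses) rows.

    visits: list of (age_in_years, coords), where coords is a flat
    [x1, y1, x2, y2, ...] list of landmark coordinates.
    """
    ordered = sorted(visits, key=lambda v: v[0])
    rows = []
    # Assumption: each earlier visit is paired with each later visit.
    for (age0, xy0), (age1, xy1) in combinations(ordered, 2):
        predictors = [age0, sex, angle_class, age1 - age0] + list(xy0)
        responses = list(xy1)
        rows.append((predictors, responses))
    return rows

# Hypothetical subject with four annual visits and two landmarks.
visits = [
    (7.0, [60.0, 10.0, 55.0, -90.0]),
    (8.0, [61.0, 10.5, 56.0, -93.0]),
    (9.0, [62.0, 11.0, 57.5, -96.0]),
    (10.0, [63.0, 11.5, 59.0, -99.5]),
]
rows = build_growth_pairs(visits, sex=1, angle_class=2)
```

Under this pairing assumption, a subject with n visits contributes n(n - 1)/2 pairs, which is how a few dozen subjects can produce more than a thousand training pairs.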
AI and PLS Prediction Models
Leave-one-out cross-validation (LOOCV) was employed to calculate test errors.32 The TabNet deep neural network (DNN) by Arik and Pfister33 was chosen as the base model. The original TabNet DNN architecture was modified using Python (Python Software Foundation, Wilmington, Delaware, USA). To save computational resources while exploring options, models trained for 100 and 1000 epochs were compared.34 The PLS prediction model12 was implemented using the open-source programming language R.
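The LOOCV procedure can be sketched in a few lines. The model below is a deliberately simple stand-in (the mean displacement of the training pairs), not TabNet or PLS, and the one-dimensional sample values are hypothetical. The text does not specify whether the held-out unit was a single pair or a whole subject; the sketch holds out one pair at a time.

```python
def fit_mean_shift(train):
    """Stand-in model: average displacement seen in the training pairs."""
    return sum(after - before for before, after in train) / len(train)

def predict_mean_shift(model, before):
    """Predict the later position by adding the learned mean displacement."""
    return before + model

def loocv_errors(samples, fit, predict):
    """Hold each sample out once, train on the rest, record its test error."""
    errors = []
    for i, (before, after) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        model = fit(train)
        errors.append(abs(predict(model, before) - after))
    return errors

# Hypothetical (before, after) positions of one coordinate, in millimeters.
samples = [(10.0, 12.0), (11.0, 14.0), (9.0, 10.0), (12.0, 14.0)]
errors = loocv_errors(samples, fit_mean_shift, predict_mean_shift)
mean_error = sum(errors) / len(errors)
```

Because each error is computed on a sample the model never saw during fitting, the mean of these errors estimates out-of-sample performance.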
Statistical Analysis
Prediction errors were calculated using Euclidean distances between actual growth and predicted outcomes for specific landmarks. To compare the prediction accuracy of PLS and AI, t-tests adjusted for multiple comparisons using the Bonferroni correction were used. Scatterplots with 95% confidence ellipses were created to represent the pattern of prediction errors visually.35
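As a minimal sketch of the error metric and the multiple-comparison adjustment, the following assumes a nominal significance level of 0.05 and one test per predicted landmark; neither value is stated in the text, and the coordinates are hypothetical.

```python
import math

def landmark_error(actual, predicted):
    """Euclidean distance (mm) between actual and predicted (x, y) positions."""
    return math.hypot(actual[0] - predicted[0], actual[1] - predicted[1])

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical actual and predicted positions of one landmark.
err = landmark_error((52.0, -80.0), (55.0, -84.0))
# Assumption: alpha = 0.05 and 77 landmark-wise comparisons.
threshold = bonferroni_alpha(0.05, 77)
```

With 77 comparisons, a landmark-wise difference would need a p value below roughly 0.00065 to be declared significant under these assumptions.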
RESULTS
Table 2 presents the characteristics of the subjects at the time of growth observation. The average observation period was 8.5 years, with a mean starting age of 7.4 years. Ninety-four percent of subjects had radiographs taken more than five times, and 27.3% had radiographs taken more than 10 times. Regarding malocclusion, 51.5% of the subjects had Class I and 48.5% had Class II malocclusions; no subject presented a Class III molar relationship at initial examination.

The performance of the developed models was evaluated based on prediction errors. Among AI models developed under various conditions, the model with an early stopping condition at 1000 training epochs was chosen to calculate the AI prediction errors. On average, AI predicted more accurately, with an error 0.61 mm smaller than that of the PLS model. The average error for 45 hard tissue landmarks with the PLS prediction model was 1.87 mm, whereas the AI prediction error averaged 1.49 mm. For 32 soft tissue landmarks, the errors averaged 2.63 mm for PLS and 1.71 mm for AI. Among the 77 predicted landmarks, the AI-based prediction model showed better prediction accuracy for 60 landmarks (Table 3). The PLS-based prediction model was more accurate for 13 landmarks (Nasion, Porion, Orbitale, Basion, Articulare, Condylion, Ramus tip, Pterygomaxillary fissure, Pterygoid point, PNS, glabella, glabella-nasion contour point, and cheek point). There was no statistically significant difference for four landmarks (Nasal bone tip, Key ridge contour smoothing point 1, soft-tissue nasion, inferior tip of nasal bone). Overall, both methods showed greater prediction errors for soft tissue and mandibular landmarks compared to hard tissue and maxillary landmarks. However, the AI method demonstrated a smaller increase in error for areas with more variability.

The patterns of growth prediction errors for representative landmarks are shown in Figure 2. Of the 13 landmarks better predicted by PLS, 10 were hard tissue landmarks, including Nasion, Porion, Orbitale, Basion, and Condylion (Figure 2A). For the remaining hard tissue landmarks, AI was generally significantly more accurate than PLS, and AI model accuracy improved with more training epochs (Figure 2B). Among the soft tissue landmarks, only glabella, glabella-nasion contour point, and cheek point were better predicted by PLS, whereas all other landmarks were more accurately predicted by AI (Figure 2C). Overall, PLS performed better than or comparably to the AI method in the upper part of the craniofacial structure, whereas AI outperformed PLS in the lower part of the craniofacial structure and in soft tissue.


Comparisons of actual growth and prediction outcomes based on the AI and PLS methods for real case examples are shown in Figure 3. The soft tissue landmarks from glabella to the terminal point of the lower neck were connected by applying the natural cubic spline function so that those landmarks could represent a smooth curve. Although both prediction results deviated from the actual profile after growth, AI-based predictions generally appeared to be closer to the actual profile.
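The smoothing step can be illustrated with a small natural cubic spline, sketched here in plain Python. Using cumulative chord length as the curve parameter is an assumption, as are the function names and the landmark coordinates; the study's actual implementation is not described.

```python
import bisect
import math

def second_derivatives(t, y):
    """Second derivatives of the natural cubic spline at the knots t."""
    n = len(t) - 1
    h = [t[i + 1] - t[i] for i in range(n)]
    m = [0.0] * (n + 1)          # natural ends: zero curvature
    if n < 2:
        return m
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Forward elimination of the tridiagonal system.
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    # Back substitution.
    m[n - 1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m

def evaluate(t, y, m, q):
    """Value of the spline at parameter q."""
    i = min(max(bisect.bisect_right(t, q) - 1, 0), len(t) - 2)
    h = t[i + 1] - t[i]
    a, b = t[i + 1] - q, q - t[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (y[i] / h - m[i] * h / 6.0) * a
            + (y[i + 1] / h - m[i + 1] * h / 6.0) * b)

def smooth_profile(points, n_samples=50):
    """Sample a smooth curve through ordered, distinct (x, y) landmarks."""
    t = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        t.append(t[-1] + math.hypot(x1 - x0, y1 - y0))  # chord length
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = second_derivatives(t, xs), second_derivatives(t, ys)
    step = t[-1] / (n_samples - 1)
    return [(evaluate(t, xs, mx, k * step), evaluate(t, ys, my, k * step))
            for k in range(n_samples)]

# Hypothetical soft tissue landmarks ordered from glabella downward (mm).
profile = [(75.0, 20.0), (80.0, 5.0), (95.0, -25.0), (85.0, -40.0),
           (88.0, -55.0), (80.0, -75.0), (78.0, -95.0)]
curve = smooth_profile(profile)
```

The resulting curve passes through every landmark and has continuous slope and curvature, which is why a handful of points can stand in for a traced soft tissue outline.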


DISCUSSION
Research on growth prediction has not been actively conducted for about two decades.12 The complexity of predicting craniofacial growth with significant individual variation might have contributed to the lack of active research in this area. However, recent advancements in high-performance computing capable of handling the large computational demands of sophisticated algorithms have enabled the inclusion of a large number of variables to develop more customized growth prediction models. This study used TabNet, one of the DNN algorithms,33 to address challenges in growth prediction. The results indicated that growth prediction remains challenging, as larger error values were observed for specific landmarks in some subjects. Nevertheless, this study utilized the best available growth data and current technological advancements to identify methods that were more effective in predicting individual growth patterns.
Overall, AI predicted growth more accurately than PLS. However, accuracy varied according to the landmarks being predicted. PLS was comparable to AI for landmarks with minor growth variations, whereas AI was more accurate in areas with significant variability, consistent with findings from a previous study.29 Among the 77 cephalometric landmarks, the PLS-based prediction demonstrated higher accuracy in 13 landmarks, primarily cranial base landmarks such as Nasion, Porion, and Basion. Additionally, PLS was more accurate in predicting Articulare and Condylion, aligning with expected outcomes since the positions of these mandibular landmarks are determined by the cranial base. Although statistically significant, the difference in errors between the two methods for these 13 landmarks averaged 0.45 mm, all being less than 1 mm. On the other hand, AI demonstrated greater prediction accuracy for 78% of the landmarks, including most of the landmarks in the lower part of the craniofacial structure and soft tissue. Soft tissue growth is more challenging to predict than skeletal growth, due to the influence of unpredictable factors such as posture and tonicity. This trend suggests that the development of AI growth prediction models can be beneficial for areas with greater variability in which predictions are more challenging.
In terms of model training, more training epochs led to more accurate AI predictions. This study used 100 and 1000 epochs, although previous publications reported up to 10,000 epochs for AI training.34 However, had 10,000 epochs been used, a single computation could have taken several months.34 Figure 4 shows that the prediction error decreased as the number of epochs increased from 100 to 1000 (Figure 4A), but the decrease was not significant between 800 and 1000 epochs (Figure 4B). As no substantial improvement in prediction performance was expected beyond 1000 epochs, this study chose 1000 epochs to achieve acceptable accuracy with a reasonable input of resources.
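The trade-off between training length and accuracy is often handled with an early stopping rule, sketched below in generic form. The patience value, the improvement threshold, and the synthetic error curve are all assumptions made for illustration; they are not the settings or results of this study.

```python
def stopping_epoch(val_errors, max_epochs=1000, patience=50, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the error has not improved by more than min_delta
    for `patience` consecutive epochs, or when max_epochs is reached.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors[:max_epochs], start=1):
        if err < best - min_delta:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(val_errors), max_epochs), best_epoch

# Synthetic curve (arbitrary units): steady improvement, then a plateau.
synthetic = [max(500 - e, 200) for e in range(1, 1201)]
stop, best = stopping_epoch(synthetic)
```

On this synthetic curve the error stops improving at epoch 300, so the rule halts at epoch 350 instead of running to the 1000-epoch cap.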


The errors from the models developed in this study were smaller than the errors of previously developed models.29 In a prior study using longitudinal data from 410 subjects, yielding 679 pairs of before and after growth data, PLS exhibited errors 2.11 mm greater than those of the AI method, which showed an average error of 2.78 mm.29 In this study, 33 subjects from longitudinal craniofacial growth records, resulting in 1257 pairs of before and after growth data, were included. PLS showed errors 0.61 mm greater than those of AI, which had an average error of 1.58 mm. Given that the inter-examiner error of cephalometric tracing was reported to be 1.5 ± 1.5 mm,24 the errors from this study were considered clinically acceptable. However, in some subjects or landmarks, the predicted landmarks still showed larger errors (Figure 2), partly due to the inherent nature of landmark location. These deviations may not be significant as long as the predicted landmark positions fall within the traced lines. In Figure 3A, despite a larger AI prediction error of 5.84 mm in soft tissue menton, the predicted lower mandibular soft tissue profile remained close to the actual profile.
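The overall AI error of 1.58 mm can be cross-checked against the group means reported in the Results. The sketch assumes that the overall figure is the mean of the hard and soft tissue averages weighted by landmark count, which the text does not state explicitly.

```python
def weighted_mean(values, weights):
    """Mean of values weighted by the number of landmarks in each group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

counts = [45, 32]                                   # hard, soft tissue landmarks
ai_overall = weighted_mean([1.49, 1.71], counts)    # reported AI group means (mm)
pls_overall = weighted_mean([1.87, 2.63], counts)   # reported PLS group means (mm)
gap = pls_overall - ai_overall
```

Under this assumption the AI average rounds to 1.58 mm, and the gap between the methods comes to about 0.60 mm, in line with the reported 0.61 mm given that the inputs are themselves rounded.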
Collecting new longitudinal growth data is currently challenging due to ethical concerns. The AAOF Craniofacial Growth Legacy Collection compiles nine of the 11 recognized longitudinal collections of craniofacial growth records in the United States and Canada.31 Presently, approximately 20,000 digital images from 842 subjects are available on the AAOF website, which could facilitate further development of growth prediction methods using AI. The use of growth collections in this paper involved a significantly larger number of pairs of growth data than previous studies, offering a better representation of the general population. However, this study only included Class I and Class II subjects, whereas growth of Class III subjects is expected to be more challenging to predict due to increased mandibular growth. This limitation can be addressed by incorporating additional growth collections. Additionally, the AAOF growth collections predominantly consist of subjects of European descent, and the effects of different ethnicities on prediction accuracy still need to be explored.
CONCLUSIONS
AI was shown to be an effective growth prediction method, with clinically acceptable prediction errors averaging 1.49 mm for 45 hard tissue landmarks and 1.71 mm for 32 soft tissue landmarks.
Among AI prediction models, those with increased training epochs showed improved prediction performance, but there was no significant improvement beyond 1000 epochs.
AI generally outperformed PLS, particularly for landmarks in the lower part of the craniofacial structure and soft tissue, where uncertainty is considerable.

Figure 1. Longitudinal serial growth data source: the University of the Pacific Mathews Growth Study on the AAOF Craniofacial Growth Legacy Collection website, https://www.aaoflegacycollection.org/aaof_collection.html?id=UOPMathews (A). Four fiducial points and 78 cephalometric landmarks that were manually identified using a computer vision annotation tool (B).

Figure 2. Scatter plots presenting errors and 95% confidence ellipses for the three prediction models. Green, PLS; Blue, AI trained for 100 epochs; Red, AI trained for 1000 epochs. (A) Hard tissue landmarks better predicted by the PLS model; (B) Hard tissue landmarks better predicted by the AI models; (C) Soft tissue landmarks.

Figure 3. Examples of profile predictions for subjects included in the study. White, initial profile; Yellow, actual profile after growth; Red, PLS prediction; Blue, AI prediction. (A) A male subject from 10 y 7 mo to 15 y 0 mo; (B) a female subject from 9 y 4 mo to 14 y 9 mo; (C) a female subject from 6 y 8 mo to 12 y 1 mo.

Figure 4. Scatter plots presenting errors and 95% confidence ellipses for prediction models trained for different numbers of epochs. (A) Prediction error decreases as the number of training epochs increases from 100 through 200 and 400 to 1000; (B) prediction error does not decrease significantly beyond 800 epochs.