Article Category: Research Article
Online Publication Date: 24 Apr 2012

Linear measurements using virtual study models
A systematic review

Page Range: 1098 – 1106
DOI: 10.2319/110311-681.1

Abstract

Objective:

To perform a systematic review of the literature to assess the reliability and validity of linear measurements using virtual vs plaster study models.

Materials and Methods:

A search strategy was developed for four online databases, and the reference lists of selected articles were hand searched for additional papers. Three researchers determined the eligibility of papers by applying specific selection criteria and ultimately selected 17 papers. The data were grouped by virtual model acquisition type and by the number of landmarks used in a given measurement, weighted by sample size, and analyzed in terms of the reliability and validity of linear measurements.

Results:

The intrarater reliability was high for two-landmark and >two-landmark linear measurements performed on laser-acquired models or cone-beam computed tomography (CBCT)–acquired models and was similar to that of measurements on plaster models. Validity was high for two-landmark and >two-landmark linear measurements comparing laser-acquired models or CBCT-acquired models to plaster study models, and the weighted mean differences were clinically insignificant. Agreement of measurements was excellent, with less variability than correlation. Acquisition type had no perceptible influence on reliability and validity. Measurements based on more than two landmarks tended to have higher mean differences than two-landmark measurements.

Conclusions:

Virtual study models are clinically acceptable compared with plaster study models with regard to intrarater reliability and validity of selected linear measurements.

INTRODUCTION

A key process in diagnosis and treatment planning in dentistry is study model analysis (SMA). In performing an SMA, common diagnostic parameters such as overjet, intermolar width, and arch perimeter are measured on dental models. The current gold standard for SMA involves plaster casts measured with calipers. In recent decades, three-dimensional (3D) virtual study models have made headway into dentistry.

Available literature on 3D virtual dental study models has largely focused on models acquired by laser scanning,1–15 while other studies have investigated holographic scanning,16 stereophotogrammetry capture,17 and, more recently, cone-beam computed tomography (CBCT).18–20

Numerous studies have investigated the validity and reliability of linear measurements made on plaster vs virtual study models, but no systematic review has collectively summarized their conclusions on both properties. To our knowledge, the only systematic review on virtual study models, by Fleming et al.,21 summarized assessments of validity but not reliability. Demonstrated reliability of repeated measurements within virtual models and within plaster models, separately, is necessary before validity between the two modalities can be interpreted.

The aims of this study were to perform a systematic review of the literature to assess the validity and reliability of linear measurements using virtual vs plaster dental study models, grouping our analysis by virtual model acquisition type and the number of landmarks used in a given measurement.

MATERIALS AND METHODS

The PICO (population, intervention, comparison, outcome) search strategy22 was adopted for this study, and the resulting search string was tailored for PubMed (from 1966 to May 16, 2010) and adapted with no limits for the following online databases: OVID Medline, OVID–All EBM Reviews, and Lilacs. The PubMed search was later updated to December 8, 2011.

Eligibility of selected articles was determined in four phases. Selection of articles at each stage was performed by three researchers. Discrepancies between researchers' assessment of eligibility were verbally discussed, and final selections were agreed on by majority vote. All non-English papers selected at each stage were appropriately translated.

In phase 1 of the selection process, from the electronic database results, the titles and abstracts were screened with the following selection criterion: main focus on the assessment of linear measurements in 3D virtual models of the human dentition.

In phase 2 of the selection process, the full texts of the studies selected in phase 1 were retrieved where possible, and the following selection criteria were applied: validity and reliability measures provided, gold standard measurements taken from plaster casts, and a minimum sample size of 10.

In phase 3 of the selection process, the reference lists from the selected articles in phase 2 were screened with the same selection criteria as phase 1.

In phase 4 of the selection process, the retrievable articles from phase 3 were assessed with the same selection criteria as phase 2.
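The full-text criteria applied in phases 2 and 4 amount to a simple conjunctive filter. The sketch below is a hypothetical illustration only: the `Candidate` record, its field names, and the example studies are invented for this example and are not part of the review's actual workflow, which was performed manually by three researchers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record of one full-text article screened in phase 2 or 4.
    title: str
    reports_validity: bool
    reports_reliability: bool
    plaster_gold_standard: bool
    sample_size: int


MIN_SAMPLE_SIZE = 10  # minimum sample size stated in the phase 2 criteria


def meets_full_text_criteria(c: Candidate) -> bool:
    """Return True only if every phase 2 (and phase 4) criterion is met."""
    return (
        c.reports_validity
        and c.reports_reliability
        and c.plaster_gold_standard
        and c.sample_size >= MIN_SAMPLE_SIZE
    )


# Invented candidates used only to exercise the filter.
candidates = [
    Candidate("Study A", True, True, True, 30),
    Candidate("Study B", True, False, True, 25),  # no reliability data
    Candidate("Study C", True, True, True, 8),    # sample below the minimum
]
selected = [c.title for c in candidates if meets_full_text_criteria(c)]
print(selected)
```

In the review itself, discrepancies between the three screeners were resolved by discussion and majority vote, a step that a boolean filter like this does not capture.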

In this systematic review, the outcome measures of interest were reliability and validity. Reliability refers to the consistency with which a measurement can be made; it was assessed by reports of mean difference, agreement (intraclass correlation coefficient [ICC]), and correlation (Pearson's correlation coefficient [PCC]) of repeated measures on virtual and plaster models. Validity refers to the ability to truly measure what is intended; it was also assessed using mean difference, agreement (ICC), and correlation (PCC), in this case between virtual and plaster models.
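The three statistics named above can be sketched for a single set of paired readings as follows. The review does not state which ICC form the included studies used; the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in the Shrout and Fleiss notation), and the overjet readings are invented. The example is chosen to show why agreement and correlation can differ: a constant 1-mm offset leaves the Pearson correlation perfect while lowering the absolute-agreement ICC.

```python
import math


def mean_difference(first, second):
    """Mean of paired differences (second - first), in the units of the data."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty series of equal length")
    return sum(b - a for a, b in zip(first, second)) / len(first)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two paired series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(*series):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Each argument holds one session's (or one modality's) readings of the
    same n subjects, in the same subject order.
    """
    k, n = len(series), len(series[0])
    grand = sum(sum(s) for s in series) / (n * k)
    row_means = [sum(s[i] for s in series) / k for i in range(n)]
    col_means = [sum(s) / n for s in series]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for s in series for v in s)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented overjet readings (mm) on four models; virtual is always 1 mm larger.
plaster = [1.0, 2.0, 3.0, 4.0]
virtual = [2.0, 3.0, 4.0, 5.0]
print(mean_difference(plaster, virtual))
print(pearson_r(plaster, virtual))
print(round(icc_2_1(plaster, virtual), 3))
```

Here the mean difference is 1 mm and the correlation is perfect, yet the ICC falls to about .77 because absolute agreement penalizes the systematic offset.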

Relevant data were tabulated in a spreadsheet using Excel 2007 (Microsoft, Redmond, Wash). For both validity and reliability, the data were weighted by sample size and analyzed by descriptive statistics.

Weighted means allowed us to pool results from studies with relatively small sample sizes while letting studies with larger samples contribute more to the findings of this systematic review. For example, to calculate a weighted mean difference, each study's reported mean difference was multiplied by its reported sample size, these products were summed, and the sum was divided by the total of the associated sample sizes. Weighted ICC and weighted PCC values were calculated in the same manner.
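The weighting described above is the sample-size-weighted arithmetic mean, sum(n_i x_i) / sum(n_i). A minimal sketch follows; the three studies' mean differences and sample sizes, and the two ICC values, are invented for illustration and are not data from the included articles.

```python
def weighted_mean(values, sample_sizes):
    """Sample-size-weighted mean: sum(n_i * x_i) / sum(n_i)."""
    if len(values) != len(sample_sizes):
        raise ValueError("each value needs a matching sample size")
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(x * n for x, n in zip(values, sample_sizes)) / total_n


# Invented mean differences (mm) reported by three hypothetical studies.
pooled_diff = weighted_mean([0.20, -0.10, 0.30], [20, 50, 30])
# The same function pools ICC (or PCC) values from two hypothetical studies.
pooled_icc = weighted_mean([0.95, 0.85], [40, 10])
print(round(pooled_diff, 3), round(pooled_icc, 3))
```

With these numbers the largest study (n = 50) pulls the pooled difference toward its own negative value, giving 0.08 mm rather than the unweighted 0.13 mm.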

Among the selected articles, interrater reliability1,8,15,19 was reported infrequently, so only intrarater reliability2,3,5,8,9,13,17–20 in terms of mean differences, ICC, and PCC was tabulated. Other reported measures of reliability,4,6,7,10–12,14,16 such as standard deviations, random error, or statements confirming tests of repeated measurements, were accepted but not summarized. Furthermore, because reliability is always assessed within a single modality (ie, within plaster models or virtual models alone), a difference between repeated measurements has no meaningful direction; reported differences were therefore converted into absolute values before weighted mean differences were calculated.
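Converting to absolute values matters because signed repeated-measurement differences from different studies can cancel out and make pooled reliability look better than it is. The sketch below, with invented differences (session 2 minus session 1) and sample sizes, compares the two ways of pooling.

```python
def weighted_mean(values, sample_sizes):
    """Sample-size-weighted mean: sum(n_i * x_i) / sum(n_i)."""
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(x * n for x, n in zip(values, sample_sizes)) / total_n


# Invented repeated-measurement differences (mm) from three hypothetical studies.
signed_diffs = [0.30, -0.30, 0.10]
sizes = [25, 25, 50]

signed = weighted_mean(signed_diffs, sizes)                     # opposite signs cancel
absolute = weighted_mean([abs(d) for d in signed_diffs], sizes)  # magnitudes only
print(round(signed, 3), round(absolute, 3))
```

The signed pool suggests a discrepancy of only 0.05 mm, whereas the absolute pool reports the typical magnitude of 0.2 mm.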

The parameters summarized in this systematic review were, by inspection, the most commonly reported of the selected articles. The parameters that could not be categorized under one of the commonly reported linear parameters, but were nonetheless reported in the literature, were noted but not summarized in this paper.

In this systematic review, we set clinically relevant thresholds for mean differences for two-landmark linear measurements at 0.5 mm and at 2.0 mm for linear measurements based on more than two landmarks.2,5,8

Data for the virtual study models were grouped by acquisition type to investigate any differences between acquisition types. The data were also grouped to investigate differences between two-landmark and >two-landmark linear measurements.
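The thresholds and groupings above can be sketched as a small classification step. The 0.5-mm and 2.0-mm thresholds come from the text; treating a difference exactly equal to the threshold as clinically significant is an assumption, as are the landmark count assigned to arch perimeter and the weighted mean differences in `rows`, which are invented and are not the values in Tables 1 to 4.

```python
from collections import defaultdict

# Clinically relevant thresholds stated in the text (mm).
TWO_LANDMARK_THRESHOLD_MM = 0.5
MULTI_LANDMARK_THRESHOLD_MM = 2.0


def threshold_for(n_landmarks):
    """Pick the threshold by the number of landmarks in the measurement."""
    return TWO_LANDMARK_THRESHOLD_MM if n_landmarks <= 2 else MULTI_LANDMARK_THRESHOLD_MM


def clinically_significant(mean_diff_mm, n_landmarks):
    # Assumption: a difference equal to the threshold counts as significant.
    return abs(mean_diff_mm) >= threshold_for(n_landmarks)


# Invented rows: (acquisition type, parameter, landmarks, weighted mean diff in mm).
rows = [
    ("laser", "overjet", 2, -0.12),
    ("laser", "arch perimeter", 4, 0.85),
    ("CBCT", "intercanine width", 2, -0.41),
]

groups = defaultdict(list)
for acquisition, parameter, landmarks, diff in rows:
    landmark_class = "two-landmark" if landmarks <= 2 else ">two-landmark"
    groups[(acquisition, landmark_class)].append(
        (parameter, clinically_significant(diff, landmarks))
    )

for key in sorted(groups):
    print(key, groups[key])
```

Keying the groups on (acquisition type, landmark class) lets the same pooled differences be read along either axis of comparison used in this review.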

RESULTS

A flow chart of the selection process is shown in Figure 1. The initial search strategy identified 278 potential articles from electronic databases; 59 articles were chosen based on their titles and abstracts, and 20 were subsequently selected after the full articles were read. From these 20 articles, 238 unique references were identified, from which 62 retrievable articles were screened, but ultimately no additional articles were selected from the hand searches. Three articles that had satisfied the selection criteria at each phase were ultimately excluded.9,16,17 One study assessed virtual models of neonatal cleft palate patients without any erupted teeth.9 Another study investigated virtual models acquired by holographic scanning,16 but the paper was published two decades ago. Similarly, the study on models acquired by stereophotogrammetry17 has not been revisited in almost a decade. The updated PubMed search of December 8, 2011, identified an additional three potential abstracts. Of these, one was rejected because it did not fulfill the final selection criteria,23 and two were in Chinese; although we attempted to contact the authors, we were not able to obtain copies of these articles.24,25 A final total of 17 articles were selected for this review.
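The counts reported above can be checked for internal consistency with a short tally. The numbers are taken directly from the text; the variable names are only labels chosen for this sketch.

```python
# Counts from the selection process described in the text.
database_hits = 278
after_titles_and_abstracts = 59
after_full_text = 20
added_from_hand_search = 0      # 62 retrievable references screened, none added
excluded_after_all_phases = 3   # references 9, 16, and 17
added_from_update = 0           # 3 abstracts found: 1 rejected, 2 unobtainable

final_total = (after_full_text + added_from_hand_search
               - excluded_after_all_phases + added_from_update)
assert final_total == 17, final_total

# Share of database hits retained at each step.
for label, count in [("titles/abstracts", after_titles_and_abstracts),
                     ("full text", after_full_text),
                     ("final", final_total)]:
    print(f"{label}: {count} of {database_hits} ({100 * count / database_hits:.1f}%)")
```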

Figure 1. Flow chart of the selection process.

Citation: The Angle Orthodontist 82, 6; 10.2319/110311-681.1

Intrarater reliability for both plaster (Table 1) and laser-acquired (Table 2) study models was reported for all of the common two-landmark and >two-landmark measurements. All weighted mean differences were less than 0.5 mm for the two-landmark parameters and less than 1.5 mm for the >two-landmark parameters. For repeated measurements in plaster, ICC values were about .85 for all two-landmark parameters and greater than .98 for crowding; similarly, PCC values were greater than .91 for two-landmark parameters and greater than .96 for arch perimeter. For repeated measurements in laser-acquired models, ICC values were near .99. Intrarater reliability data for CBCT-acquired models are not presented in a table because of insufficient comparative data; however, ICC values from two studies19,20 were greater than .80, and PCC values from the third study18 were well above .90, suggesting good agreement and excellent correlation of repeated measures.

Table 1. Intrarater, Plaster Study Models: Mean Difference, Agreement, and Correlation Values Weighted by Sample Size, With Standard Deviations (Where Possible to Calculate), for the Most Commonly Reported Parameters
Table 2. Intrarater, Laser-Acquired Virtual Models: Mean Difference, Agreement, and Correlation Values Weighted by Sample Size, With Standard Deviations (Where Possible to Calculate), for the Most Commonly Reported Parameters

The validity of the commonly reported linear parameters, subgrouped into two-landmark and >two-landmark measurements, is presented for laser-acquired vs plaster models in Table 3 and for CBCT-acquired vs plaster models in Table 4.

Table 3. Validity, Laser-Acquired vs Plaster: Mean Difference, Agreement, and Correlation Values Weighted by Sample Size, With Standard Deviations (Where Possible to Calculate), for the Most Commonly Reported Parameters
Table 4. Validity, CBCT-Acquired vs Plaster: Mean Difference, Agreement, and Correlation Values Weighted by Sample Size, With Standard Deviations (Where Possible to Calculate), for the Most Commonly Reported Parameters

For laser-acquired study models, the mean differences compared with plaster study models were well below 0.5 mm for two-landmark measures and less than 1 mm for >two-landmark measures. Most parameters were reported in terms of ICC with weighted values that tended to be greater than .90.

The virtual study models acquired by CBCT scanning had mean differences compared with plaster study models of less than 0.5 mm for two-landmark measures. None of the articles included in this systematic review reported mean differences for >two-landmark measures. Although none of the articles reported ICC values, weighted PCC values from one study18 ranged from .62 to .99.

DISCUSSION

Virtual study models acquired by laser scanning were assessed in 14 of the 17 selected articles, while those acquired by CBCT scanning were assessed in the remaining three. The number of good-quality studies on laser-acquired study models is notable, but emerging approaches using CBCT show promise. However, two19,20 of the selected studies using CBCT still required impressions, so errors may be propagated1 as the process goes from the mouth to alginate impressions and finally to virtual models. The reliability and validity of newer approaches that generate virtual study models from direct CBCT scans of the patient's mouth, compared with the gold standard plaster models, have yet to be reported.

This systematic review and the one by Fleming et al.21 each selected 17 articles. However, slight differences in selection criteria meant that the two reviews had only nine articles5–8,10,11,13,15,20 in common. We chose to focus on quantitative linear measurements only; therefore, we rejected some of the articles that Fleming et al. included because they focused on PAR,26 ABO,27–29 or ICON30 scores, which are qualitative ordinal measures. We also rejected an article31 that Fleming et al. accepted because we found no report on the reliability of repeated measurements. Of the articles that Fleming et al. excluded, we accepted two studies that used artificial occlusal setups,1,14 since they nonetheless assess linear measurements, and another study that placed marking points on the casts in black pen,2 since those points did not affect the parameters that we chose to summarize. Finally, our search strategy selected an additional five relevant articles3,4,12,18,19 as of May 2010 that were not mentioned in Fleming's systematic review, three4,12,19 of which had been published by the time their search was conducted in January 2010.
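The overlap accounting above can be checked with set arithmetic on the reference numbers cited in this article. The sketch assumes, as inferred from the citation ranges in the Introduction and Results, that the 20 full-text selections are references 1 to 20, that references 9, 16, and 17 were the three later exclusions, and that references 1 to 15 are the laser studies and 18 to 20 the CBCT studies.

```python
# Reference numbers as cited in this article (inferred from the citation ranges).
included = set(range(1, 21)) - {9, 16, 17}   # 20 full-text selections minus 3 exclusions

shared_with_fleming = {5, 6, 7, 8, 10, 11, 13, 15, 20}
artificial_setups = {1, 14}
pen_marked_casts = {2}
not_in_fleming = {3, 4, 12, 18, 19}

subsets = [shared_with_fleming, artificial_setups, pen_marked_casts, not_in_fleming]
union = set().union(*subsets)

# The four groups should cover the included articles exactly, with no overlap.
assert union == included
assert sum(len(s) for s in subsets) == len(included) == 17

# Acquisition types among the included articles.
laser = set(range(1, 16)) & included
cbct = {18, 19, 20} & included
print(len(laser), len(cbct))
```

The final line reproduces the 14 laser and 3 CBCT articles described in the preceding paragraph.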

By inspection, the most commonly reported two-landmark linear parameters were overjet; overbite; maxillary and mandibular mesiodistal tooth sizes from first molar to first molar, inclusive; and maxillary and mandibular intermolar and intercanine widths. The most commonly reported >two-landmark linear parameters were maxillary and mandibular arch perimeter and crowding, as well as Bolton anterior and Bolton overall discrepancies.

A full study model analysis should also involve categorical parameters, such as Angle's classification, but good-quality studies incorporating these were infrequently reported. Future studies should investigate the reliability and validity of categorical parameters.

Reliability

Intrarater reliability of repeated two-landmark measures on both plaster and virtual study models showed mean differences below the 0.5-mm threshold, which are clinically insignificant, while both agreement and correlation were good to excellent for the parameters that were reported. For >two-landmark measures, mean differences were below the 2-mm threshold, indicating clinically insignificant differences in repeated measures, and agreement and correlation were excellent. Intrarater reliability, then, was good to excellent for virtual study models, and the same can be said for plaster models, because the differences in repeated measurements of both two-landmark and >two-landmark linear parameters were judged to be clinically insignificant.

Validity

Comparing virtual with plaster study models, all two-landmark and >two-landmark linear parameters showed clinically insignificant mean differences. This agrees with the findings of Fleming et al.,21 who reported that virtual models offer a high degree of validity when compared with direct measurement on plaster models. Compared with plaster, for two-landmark parameters, agreement was excellent for laser-acquired models, while correlation for CBCT-acquired models ranged from poor to excellent. In contrast, Fleming et al. did not summarize agreement or correlation in terms of ICC or PCC values.

Differences in overjet, overbite, and all tooth widths from first molar to first molar between laser-acquired and plaster study models were clinically insignificant, but the negative weighted mean differences suggested a tendency toward larger measurements on plaster models. Intermolar and intercanine distances, however, tended to be smaller on plaster than on laser-acquired models, but again, the weighted mean differences were clinically insignificant. Similarly, differences in arch perimeter, crowding, and Bolton measurements were clinically insignificant. Agreement was excellent for all two-landmark measures and for arch crowding.
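Reading the sign of a weighted mean difference depends on the subtraction order. The text implies that a negative value means larger plaster readings, which corresponds to computing virtual minus plaster; the sketch below assumes that convention and uses the 0.5-mm two-landmark threshold as its default. The input values in the usage lines are invented.

```python
def describe_difference(mean_diff_mm, threshold_mm=0.5):
    """Interpret a weighted (virtual - plaster) mean difference in mm.

    Assumption: differences are computed as virtual minus plaster, so a
    negative value means the plaster reading tended to be larger.
    """
    if mean_diff_mm < 0:
        direction = "larger on plaster"
    elif mean_diff_mm > 0:
        direction = "larger on virtual model"
    else:
        direction = "no systematic difference"
    significant = abs(mean_diff_mm) >= threshold_mm
    return direction, significant


# Invented examples: a small negative and a small positive two-landmark difference.
print(describe_difference(-0.15))
print(describe_difference(0.20))
```

Separating direction from magnitude mirrors the discussion above: a consistent sign can reveal a systematic tendency even when every difference stays below the clinical threshold.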

Compared with the compiled data from articles on laser-acquired study models, which had combined sample sizes that ranged from 100 to 204 per parameter, the data on CBCT-acquired study models had relatively smaller sample sizes that ranged from 15 to 40. As observed with laser-acquired study models, the weighted mean differences were all negative, indicating a tendency toward larger measurements on plaster, but this finding had no clinical relevance. Correlation of CBCT-acquired study models compared with plaster was poor for mesiodistal measurements of teeth 1-5 and 4-1, moderate for teeth 1-4, 1-6, 2-5, 3-1, 3-3, 3-4, 4-3, and good or better for all remaining two-landmark and arch perimeter measures. There was no obvious explanation for this variation in correlation.

Influence of Acquisition Type on Reliability and Validity

There were no perceived differences in intrarater reliability and validity across the various acquisition types. The variation in correlation for two-landmark measures from CBCT-acquired models was the only inconsistent finding, but further independent studies are required to confirm this. Aside from this possibly anomalous finding, overall the mean differences were clinically insignificant, and the correlation and agreement were good to excellent. These findings were consistent across laser-acquired and CBCT-acquired virtual models compared with plaster.

Influence of the Number of Landmarks in a Measurement on Validity and Reliability

In magnitude, two-landmark measures tended to have smaller mean differences than >two-landmark measures in assessments of both reliability and validity, regardless of acquisition type. For example, among the two-landmark parameters, repeated tooth width measurements in plaster showed an absolute difference of less than 0.1 mm, while overjet, overbite, and intermolar and intercanine distances showed double that absolute difference but still less than 0.2 mm. For >two-landmark parameters, differences in arch perimeter, crowding, and Bolton discrepancies were higher than 0.2 mm, up to 0.7 mm. Although these findings were not clinically significant, the same pattern of increasing absolute difference with the number of landmarks could be detected by inspection for repeated measurements in laser-acquired models as well.
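One possible explanation for this pattern, which the review did not test, is error accumulation: a >two-landmark parameter such as crowding combines an arch perimeter with a sum of mesiodistal widths, so independent errors from each component add up. Assuming n independent component errors with equal standard deviation s, the total has standard deviation s·√n, and for zero-mean normal errors the expected absolute error is that standard deviation times √(2/π). The 0.1-mm per-measurement SD below is invented.

```python
import math


def sd_of_sum(per_measurement_sd_mm, n_measurements):
    """SD of a sum of n independent errors that share the same SD."""
    return per_measurement_sd_mm * math.sqrt(n_measurements)


def expected_abs_error(sd_mm):
    """Mean absolute value of a zero-mean normal error with this SD."""
    return sd_mm * math.sqrt(2 / math.pi)


# Hypothetical: each two-landmark reading has an error SD of 0.1 mm.
for n in (1, 4, 10):
    sd = sd_of_sum(0.1, n)
    print(n, round(sd, 3), round(expected_abs_error(sd), 3))
```

Under these assumptions, error grows with the square root of the number of combined measurements rather than linearly, which would be consistent with small but systematically larger differences for >two-landmark parameters.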

CONCLUSIONS

  • The intrarater reliability was high for two-landmark and >two-landmark linear measurements performed on laser-acquired models or CBCT-acquired models and similar to measurements on plaster models.

  • The validity was high for two-landmark and >two-landmark linear measurements comparing laser-acquired models or CBCT-acquired models to plaster study models, and the weighted mean differences were clinically insignificant.

  • Agreement of measurements was excellent with less variability than correlation.

  • Acquisition type had no perceptible influence on reliability and validity.

  • Measures based on more than two landmarks tended to have higher mean differences than two-landmark measures.

  • Virtual study models are clinically acceptable compared with plaster study models with regard to intrarater reliability and validity of selected linear measurements.

REFERENCES

  • 1

    Alcan, T., C. Ceylanoğlu, and B. Baysal. The relationship between digital model accuracy and time-dependent deformation of alginate impressions. Angle Orthod 2009;79:30–36.

  • 2

    Asquith, J., T. Gillgrass, and P. Mossey. Three-dimensional imaging of orthodontic models: a pilot study. Eur J Orthod 2007;29:517–522.

  • 3

    Bootvong, K., Z. Liu, C. McGrath, et al. Virtual model analysis as an alternative approach to plaster model analysis: reliability and validity. Eur J Orthod 2010;32:589–595.

  • 4

    Cha, B. K., J. I. Choi, P. G. Jost-Brinkmann, and Y. M. Jeong. Applications of three-dimensionally scanned models in orthodontics. Int J Comput Dent 2007;10:41–52.

  • 5

    Goonewardene, R. W., M. S. Goonewardene, J. M. Razza, and K. Murray. Accuracy and validity of space analysis and irregularity index measurements using digital models. Aust Orthod J 2008;24:83–90.

  • 6

    Horton, H. M., J. R. Miller, P. R. Gaillard, and B. E. Larson. Technique comparison for efficient orthodontic tooth measurements using digital models. Angle Orthod 2009;80:254–261.

  • 7

    Keating, A. P., J. Knox, R. Bibb, and A. I. Zhurov. A comparison of plaster, digital and reconstructed study model accuracy. J Orthod 2008;35:191–201.

  • 8

    Mullen, S. R., C. A. Martin, P. Ngan, and M. Gladwin. Accuracy of space analysis with emodels and plaster models. Am J Orthod Dentofacial Orthop 2007;132:346–352.

  • 9

    Oosterkamp, B. C., W. J. van der Meer, M. Rutenfrans, and P. U. Dijkstra. Reliability of linear measurements on a virtual bilateral cleft lip and palate model. Cleft Palate Craniofac J 2006;43:519–523.

  • 10

    Quimby, M. L., K. W. L. Vig, R. G. Rashid, and A. R. Firestone. The accuracy and reliability of measurements made on computer-based digital models. Angle Orthod 2004;74:298–303.

  • 11

    Santoro, M., S. Galkin, M. Teredesai, O. F. Nicolay, and T. J. Cangialosi. Comparison of measurements made on digital and plaster models. Am J Orthod Dentofacial Orthop 2003;124:101–105.

  • 12

    Sjogren, A. P., J. E. Lindgren, and J. A. Huggare. Orthodontic study cast analysis—reproducibility of recordings and agreement between conventional and 3D virtual measurements. J Digit Imaging 2010;23:482–492.

  • 13

    Stevens, D. R., C. Flores-Mir, B. Nebbe, D. W. Raboud, G. Heo, and P. W. Major. Validity, reliability, and reproducibility of plaster vs digital study models: comparison of peer assessment rating and Bolton analysis and their constituent measurements. Am J Orthod Dentofacial Orthop 2006;129:794–803.

  • 14

    Zilberman, O., J. A. Huggare, and K. A. Parikakis. Evaluation of the validity of tooth size and arch width measurements using conventional and three-dimensional virtual orthodontic models. Angle Orthod 2003;73:301–306.

  • 15

    Leifert, M. F., M. M. Leifert, S. S. Efstratiadis, and T. J. Cangialosi. Comparison of space analysis evaluations with digital models and plaster dental casts. Am J Orthod Dentofacial Orthop 2009;136:16.e1–16.e4.

  • 16

    Miras, D., and F. G. Sander. The accuracy of holograms compared to other model measurements [in German]. Fortschr Kieferorthop 1993;54:203–217.

  • 17

    Bell, A., A. F. Ayoub, and P. Siebert. Assessment of the accuracy of a three-dimensional imaging system for archiving dental study models. J Orthod 2003;30:219–223.

  • 18

    El-Zanaty, H. M., A. R. El-Beialy, A. M. Abou El-Ezz, K. H. Attia, A. R. El-Bialy, and Y. A. Mostafa. Three-dimensional dental measurements: an alternative to plaster models. Am J Orthod Dentofacial Orthop 2010;137:259–265.

  • 19

    Naidu, D., J. Scott, D. Ong, and C. T. Ho. Validity, reliability and reproducibility of three methods used to measure tooth widths for Bolton analyses. Aust Orthod J 2009;25:97–103.

  • 20

    Watanabe-Kanno, G. A., J. Abrão, H. Miasiro Junior, A. Sánchez-Ayala, and M. O. Lagravère. Reproducibility, reliability and validity of measurements obtained from Cecile3 digital models. Braz Oral Res 2009;23:288–295.

  • 21

    Fleming, P. S., V. Marinho, and A. Johal. Orthodontic measurements on digital study models compared with plaster models: a systematic review. Orthod Craniofac Res 2011;14:1–16.

  • 22

    Sutherland, S. E. Evidence-based dentistry: part I. Getting started. J Can Dent Assoc 2001;67:204–206.

  • 23

    Tarazona, B., J. M. Llamas, R. Cibrian, J. L. Gandia, and V. Paredes. A comparison between dental measurements taken from CBCT models and those taken from a digital method. Eur J Orthod. March 22, 2011, Epub ahead of print.

  • 24

    Hu, X. Y., X. G. Pan, W. L. Gao, and Y. M. Xiao. The reliability and accuracy of the digital models reconstructed by cone-beam computed tomography. Shanghai Kou Qiang Yi Xue 2011;20:512–516.

  • 25

    Wu, J. C., J. N. Huang, and X. J. Xu. A pilot study on the accuracy of digital model reconstructed by cone beam CT. Shanghai Kou Qiang Yi Xue 2010;19:456–469.

  • 26

    Mayers, M., A. R. Firestone, R. Rashid, and K. W. Vig. Comparison of peer assessment rating (PAR) index scores of plaster and computer-based digital models. Am J Orthod Dentofacial Orthop 2005;128:431–434.

  • 27

    Costalos, P. A., K. Sarraf, T. J. Cangialosi, and S. Efstratiadis. Evaluation of the accuracy of digital model analysis for the American Board of Orthodontics objective grading system for dental casts. Am J Orthod Dentofacial Orthop 2005;128:624–629.

  • 28

    Okunami, T. R., B. Kusnoto, E. BeGole, C. A. Evans, C. Sadowsky, and S. Fadavi. Assessing the American Board of Orthodontics objective grading system: digital vs plaster dental casts. Am J Orthod Dentofacial Orthop 2007;131:51–56.

  • 29

    Hildebrand, J. C., J. M. Palomo, L. Palomo, M. Sivik, and M. Hans. Evaluation of a software program for applying the American Board of Orthodontics objective grading system to digital casts. Am J Orthod Dentofacial Orthop 2008;133:283–289.

  • 30

    Veenema, A. C., C. Katsaros, S. C. Boxum, E. M. Bronkhorst, and A. M. Kuijpers-Jagtman. Index of complexity, outcome and need scored on plaster and digital models. Eur J Orthod 2009;31:281–286.

  • 31

    Redlich, M., T. Weinstock, Y. Abed, R. Schneor, Y. Holdstein, and A. Fischer. A new system for scanning, measuring and analyzing dental casts based on a 3D holographic sensor. Orthod Craniofac Res 2008;11:90–95.

Copyright: The EH Angle Education and Research Foundation, Inc.


Contributor Notes

Corresponding author: Dr Paul W Major, Faculty of Medicine and Dentistry, Room 5-478, Edmonton Clinic Health Academy, University of Alberta, 11405-87 Avenue, Edmonton, Alberta, Canada, T6G 1C9 (e-mail: major@ualberta.ca)
Received: 01 Nov 2011
Accepted: 01 Mar 2012