Cite this as: Wälter A, Möltner A, Böckers A, Rüttermann S, Gerhardt Szép S (2018) Video-based assessment of practical operative skills for Undergraduate dental students. Trends Comput Sci Inf Technol 3(1): 005-014. DOI: 10.17352/tcsit.000007
Introduction: The aim of this study is to evaluate, within an experimental design, to what extent assessments of prepared cavities made in two different settings, based on video sequences containing digital analysis tools of the prepCheck software, deviate from one another and how reliable they are.
Materials and Methods: For this prospective, single-centred, experimental study, 60 examination cavities related to a ceramic inlay preparation were assessed by four trainers in two different settings (A: video film versus B: video film plus an analogue model assessment) using a standard checklist. The examined parameters were: 1. preparation / outer edges, 2. surface & smoothness / inner edges, 3. width & depth, 4. slide-in direction, 5. outer contact positioning and 6. overall grade, each rated on a Likert scale of 1 = ‘excellent’, 2 = ‘very good’, 3 = ‘good’, 4 = ‘satisfactory’ to 5 = ‘unsatisfactory’. An additional evaluation questionnaire with 33 items addressed the application concept of the digital-analytic software. The statistical analysis, using SAS 9.2 (SAS Institute Inc., Cary, USA, PROC MIXED) and R (Version 2.15, Package lme4), covered the reliability, inter-rater correlation and significant factors at a significance level of p = 0.05.
Results: The assessments of the individual criteria and the overall grade in the control group (A) were, on average, lower (i.e. better) than in the study group (B), yet, with the exception of ‘outer contact positioning’, without statistical significance. The reliability averaged α = 0.83 (A) and α = 0.79 (B). The maximum reliability of the criteria ‘preparation edge’, ‘surface’, ‘width & depth’ as well as ‘overall grade’ was acceptable in assessment mode A, with α > 0.7. The inter-rater correlation was, on average, higher in assessment mode A (0.43 < r < 0.74) than in mode B (0.35 < r < 0.60).
Conclusion: The current examination shows an average reliability in the assessment mode A that exceeds the requirements for practical examination (α ≥ 0.6) and also fulfils the general requirements for ‘high-stake’ examinations of α ≥ 0.8.
Video-Based Assessment; Dental; Objective Evaluation; Practical Skills; Checklist; Performance; Operative Dentistry; OSPE; Undergraduate degree
The practical development of skills, i.e. the process of gaining expertise in the procedures and techniques required for operative dentistry, forms a fundamental part of any course of study in dentistry. Accordingly, the aim of the sixth semester of dental education, within the phantom course of operative dentistry, is to prepare students optimally for the treatment of patients. Students are initially unsure of themselves, above all with regard to cavity preparation, which is one of the basic competencies required of any dentist later in their career. This uncertainty chiefly concerns parameters such as cavity depth and width, surface smoothness and the form of the cavity edge. Deviating assessments provided by different trainers may lead to frustration and confusion among students. Assistance should be provided by modern media, such as computer-generated digital analysis tools [3-5], whose implementation in the dental curriculum has been supported in many different ways [5-8]. To this end, the performance of students is monitored and assessed by trainers at various times, both formatively and summatively. Ideally, therefore, all assessments should combine the characteristics of reliability, validity, responsibility, flexibility, comprehension, implementation ability and relevance [9, 10].
The assessment of practical skills in dental and medical schools requires considerable time and effort on the part of the supervising faculty [1, 11, 12]. The live assessment of these skills poses a significant resource problem for dental schools and complicates the execution and scheduling of their daily activities. Dental literature, on the other hand, acknowledges the need for the objective assessment of skills in operative training [1, 10]. Structured grading systems, such as Objective Structured Practical Examinations (OSPEs) or Objective Structured Clinical Examinations (OSCEs), were specifically designed to reduce subjectivity [1, 10, 13, 14]. In order to fulfil the general requirements of ‘high-stake’ examinations, a specific number of examiners, as well as checklists, should be used when assessing cavities in an OSPE design [1].
A major disadvantage of live assessment is the substantial demand on time and resources involved in getting several staff members to observe and assess students’ performance [1]. As an alternative, supervisors could use videos, which reduce some of the logistical overhead. Video-based assessment allows raters to be blind to certain aspects of the performance, such as the identity of trainees, that may otherwise engender bias in rating [11, 12, 15-23]. It further facilitates a more detailed review of a learner’s performance and gives the rater additional time to focus fully on the performance of a trainee. In addition, videos can be reviewed several times, by several different raters. Finally, trainees can review the video recordings themselves and thus be given the opportunity to enhance their learning through debriefing methods.
Such video-based assessments have already been implemented and evaluated in various surgical subject areas [14, 15, 17, 21, 22]. Most of these studies conclude that video-based assessment provides an efficient and reliable educational environment with satisfactory rater consistency and evidence for validity [12, 14, 17-20, 23].
Which type of video assessment of cavity preparations in dental medicine is most suitable for ‘high-stake’ examinations has not yet been clarified in the literature. It is also unclear whether assessing dental cavities (in a simulation model) by video alone, or by video together with an additional manual, analogue inspection of the model, leads to a difference in grading. The additional consideration of models also increases the demand on time and personnel, as each model must be inspected individually by the examiners themselves. This setting, in which each model is individually assessed by the examiners and a final grade is then allocated by consensus, describes the current grading practice. We assume that, optimally, three to four examiners are required here.
To address this gap, this study aims to compare two different settings for the video-based evaluation of practical operative skills, including an analysis tool. In an experimental design, it evaluates to what extent assessments of prepared cavities based on video sequences containing digital analysis tools deviate from one another, and how reliable each of them is. In the control group (Part A), examiners assessed examination cavities that they observed in a video illustrating various parameters of a digital analysis tool. In the study group (Part B), the examiners were then given the opportunity to additionally inspect the real examination model themselves and to modify their previous video-based assessment. Two main research questions were to be answered:
1. Do the various modes of assessment used in both examined settings (control and study group) affect their reliability?
2. What influence do the different modes of assessing the examined settings (control and study group) have on the overall assessment of study participants (trainers)?
In addition, we were interested in the study participants’ evaluation of the application concept of the digital analysis software and of the study procedure.
This is a prospective, single-centred experimental study conducted at the Goethe University in Frankfurt. The application for ethical approval was received on 9th August 2015 and approved by the Ethics Committee of the Department of Medicine at the Goethe Universität Frankfurt-on-Main on 9th September 2015, under the reference number 302/2015. The examination period ran from 7th October 2015 to 13th October 2016.
The criteria for selecting suitable study participants (examiners) were determined in advance. The inclusion criteria were membership of the department of operative dentistry and little or no experience of the prepCheck software (having worked with it no more than ten times). In preparation for the study, the trainers were first prepared for the assessment scenario in two train-the-teacher events, in which their evaluation skills were calibrated. The exact time frame is shown in Table 1. For the experimental study, 60 cavity preparations were assessed by 4 trainers in two different assessment modes (Part A and B). The exact time frame is shown in Table 2.
The cavity preparations had been submitted as examination material in an obligatory, summative OSPE by students in their 6th semester (winter semester 2013/2014 and summer semester 2014) and graded by the trainers. They were scanned using the CEREC Omnicam (Dentsply Sirona, York, USA).
These were distal occlusal preparations for ceramic inlays in premolar teeth. At the time of the study, the examinations lay three semesters in the past, so that the study participants (examiners) remembered neither the grades awarded nor the students who had produced the preparations.
The cavities were assessed by means of the checklists compiled by Schmitt et al. 2016 [1]. These comprised five items and an overall grade (1. preparation edge / outer edges, 2. surface & smoothness / inner edges, 3. width & depth, 4. slide-in direction, 5. outer contact positioning and 6. overall grade). The individual assessments (Table 1) were given on a Likert scale of 1 = excellent, 2 = very good, 3 = good, 4 = satisfactory to 5 = unsatisfactory (Table 3, Figures 1-7). After completing the assessments, the examiners filled in an evaluation questionnaire containing 33 items on general matters such as age, gender and teaching experience (n = 3), the application concept of the digital-analytical software (n = 17), individual assessment preferences (n = 3) and the study procedure (n = 10) (Tables 6 and 7). Free-text comments rounded off the evaluation questionnaire.
Based on the results of a preceding train-the-teacher event, a sample size of n = 60 was calculated for the Wilcoxon matched-pairs test with Bonferroni correction, at α = 0.0125 and a probability of P(X+X’>0) = 0.25, in order to guarantee a power of 80% for four trainers.
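The adjusted significance level quoted above is consistent with a Bonferroni correction of a family-wise level of 0.05 across four comparisons. The following minimal sketch shows that arithmetic; the family-wise level of 0.05 is taken from the abstract, while the assumption that the four comparisons correspond to the four trainers is ours and is not stated explicitly in the text.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Return the per-comparison significance level after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons


# Assumption: four comparisons, one per trainer (not confirmed by the source).
adjusted_alpha = bonferroni_alpha(0.05, 4)
print(f"{adjusted_alpha:.4f}")  # prints 0.0125
```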
The cavities were randomly allocated to both groups (Parts A and B) of the experiment. The randomisation took place by entering coded models into an online randomizer (https://www.random.org).
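For illustration only, a reproducible random ordering of coded models could be sketched as follows. The study itself used the online service named above; the model codes, the seed and the function name here are hypothetical and only stand in for that procedure.

```python
import random


def randomised_order(model_codes, seed=None):
    """Return a shuffled copy of the coded models; the input list is left unchanged."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    order = list(model_codes)
    rng.shuffle(order)
    return order


# Hypothetical codes M01..M60 for the 60 examination cavities.
codes = [f"M{i:02d}" for i in range(1, 61)]
order = randomised_order(codes, seed=2015)  # the seed is an arbitrary illustrative value
print(order[:5])
```

Fixing the seed makes the sequence repeatable, which can be useful when the same order has to be documented or reused for a second assessment round.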
The videos of the digitised teeth were created in the so-called analysis mode of the prepCheck software (Dentsply Sirona, York, USA) and recorded with the free programme ‘Screencast-O-Matic’ (Softonic International, Barcelona, Spain, Version 2.0). The individual videos lasted 122 seconds on average and showed six different settings of the prepCheck software that had been selected in advance (Table 3). A projector and a screen, as well as a connection to a laptop, were required for the videos. The room set-up for both scenarios (Part A and B) is shown in Figures 8 and 9. In Part B, the participants (examiners) agreed on uniform assessment conditions with regard to the magnification aids used (2.7 x with light).
For Part A (control group with a prepCheck video), the participants filled in their assessment questionnaires while the video was played (Image 1). They had a maximum of 120 seconds for this. The plastic tooth 15 to be assessed, which was mounted in a simulation model, had an occlusal width of approx. 0.7 x 0.9 cm. On the screen, the tooth measured approx. 50 x 70 cm on average, which corresponds to an approx. 75 x enlargement. For Part B (study group with prepCheck video + subsequent models), the preparations to be assessed were kept in models (tooth model, KaVo Dental GmbH, Biberach, Germany) on a table (Figure 2). Basic dental instruments (a mirror and a probe), a pencil and cotton buds were provided at every seat. The examiners took the model with the corresponding reference number and the completed checklist with their individual assessment from Setting A. They reviewed the grades already given and modified them where necessary. For the assessment, the teeth could be taken out of the models and the preparation edges marked with a pencil where necessary. This was meant to make undesired bevels in the preparation easier to recognise. Before the assessment time ended and the model was passed on to the next examiner, the cavities had to be cleaned with a moist cotton bud. A maximum of 120 seconds was allowed for the assessment of each model. A countdown timer, visible to all participants, ran in the background via the projector (Figure 2).
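The stated enlargement can be checked with simple arithmetic. The sketch below assumes that the 50 cm screen dimension corresponds to the 0.7 cm tooth dimension and the 70 cm dimension to the 0.9 cm one; this pairing is our assumption, as the text gives only approximate sizes.

```python
# Approximate dimensions quoted in the text, in centimetres.
tooth_cm = (0.7, 0.9)     # occlusal size of plastic tooth 15
screen_cm = (50.0, 70.0)  # average size of the projected tooth

# Linear magnification per dimension (assumed pairing: 50 with 0.7, 70 with 0.9).
factors = [screen / tooth for screen, tooth in zip(screen_cm, tooth_cm)]
mean_factor = sum(factors) / len(factors)

print([round(f, 1) for f in factors])  # prints [71.4, 77.8]
print(round(mean_factor, 1))           # prints 74.6
```

Under this pairing the mean linear magnification is roughly 75, in line with the figure given in the text.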
The sample size calculation took place in co-operation with the Institute of Biostatistics and Mathematical Modelling in Frankfurt-on-Main. The results were analysed with the statistics programmes SAS 9.2 (SAS Institute Inc., Cary, USA, PROC MIXED) and R (Version 2.15, Package lme4). Descriptive data were compiled and the equality of the mean values between the observers was analysed (ANOVA for dependent observations, as the same models were used).
Finally, the inter-correlations of the assessments of the four raters were calculated. To compare the assessments in Part A and B, the ratings of the four observers were determined for each part and both parts (A and B) were tested using a t-test for paired samples. In order to determine the overall reliability of both scenarios, the six single-assessment parameters were complemented by a further ‘mean’ variable. In addition, a test was carried out to determine the difference between the alpha values for Parts A and B, followed by the reliability test for the ‘mean’ of the grades of both scenarios.
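As a rough illustration of the reliability measure used here, the sketch below computes Cronbach’s α with the raters treated as ‘items’ and the cavities as ‘subjects’. This is a textbook formulation under our own assumptions, not the SAS or R procedure used in the study, and the grades in the example are invented purely for demonstration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows, one row per cavity, one column per rater.

    alpha = k / (k - 1) * (1 - sum of rater variances / variance of row totals)
    """
    if len(ratings) < 2:
        raise ValueError("at least two rated cavities are required")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every row needs the same number (>= 2) of rater grades")
    rater_columns = list(zip(*ratings))
    sum_rater_var = sum(variance(column) for column in rater_columns)
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("row totals do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum_rater_var / total_var)


# Hypothetical grades (1 = excellent ... 5 = unsatisfactory) from four raters.
example_grades = [
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(example_grades), 3))
```

When all raters give identical grades for every cavity, the function returns 1.0; the more the raters disagree relative to the spread between cavities, the lower the value becomes.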
The statistical assessment was carried out in co-operation with the Competence Centre for Examinations in Medicine, Baden-Württemberg of the Medical Faculty, Heidelberg.
The descriptive statistics of the individual assessments (mean value, standard deviation, median, minimum and maximum) and the reliability were calculated both across all criteria (‘mean’) and separately for ‘preparation edge/outer edges’, ‘surface & smoothness/inner edges’, ‘slide-in direction’, ‘outer contact positioning’, ‘width & depth’ and ‘overall grade’ (Table 4). The results can be summarised as follows: the assessments of the individual criteria and the overall grade were, on average, lower (i.e. better) in the control group than in the study group (prepCheck video + subsequent model), although, with one exception, without statistical significance. For the parameter ‘outer contact positioning’, alpha rose significantly from 0.56 (Part A) to 0.74 (Part B). The results of the inter-rater correlations are outlined in Table 4.
All distributed assessment and evaluation questionnaires were completed and returned. The exclusion rate was 0%. Details of the included study population are given in Table 5. The results of the evaluation are shown in Tables 6 and 7. An excerpt from the free-text comments is given in Table 8.
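Pairwise inter-rater correlations of the kind summarised in Table 4 could be computed along the following lines. The sketch assumes Pearson’s r between each pair of raters; the text does not specify the coefficient (it later also speaks of ICC values, which would be calculated differently), and the rater labels and grades below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long grade lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("both lists need the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant grades")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(grades_by_rater):
    """Return {(rater_a, rater_b): r} for every pair of raters."""
    return {
        (a, b): pearson(grades_by_rater[a], grades_by_rater[b])
        for a, b in combinations(sorted(grades_by_rater), 2)
    }


# Hypothetical overall grades of four raters for the same six cavities.
grades = {
    "rater_1": [2, 4, 1, 3, 5, 2],
    "rater_2": [2, 3, 2, 3, 4, 2],
    "rater_3": [3, 4, 1, 3, 5, 1],
    "rater_4": [2, 4, 1, 4, 5, 3],
}
for pair, r in pairwise_correlations(grades).items():
    print(pair, round(r, 2))
```

With four raters this yields six pairwise coefficients, whose minimum and maximum give a range of the form reported in the text.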
This study establishes evidence to support the reliability of video-based assessments of operative competency in performing cavity preparations in dentistry. To the best of our knowledge, this is the first study to prospectively compare two different settings of video-based assessments of cavity preparation performance using predefined checklists.
The reliability in this study averaged α = 0.79 (Part B: study group) and α = 0.83 (Part A: control group). Elsewhere in the literature, reliability values in the form of Cronbach’s α of around 0.5 are reported for examinations using CAD systems [24-26]. The reliability of OSPEs without CAD systems, on the other hand, is reported in the range between α = 0.68 and α = 0.87 [1, 10, 27]. The current experimental study is therefore more closely aligned with the latter values. Given the reliability values determined, the setting of Part A could be applied to ‘high-stake’ examinations. Part B lies only slightly below the value of α = 0.8 but requires an additional assessment step using the models. Thus, the purely video-based assessment appears far more suitable. For an OSPE, this would mean that further dental personnel could be saved during the examination and that four examiners could reach a consensus on the grades by means of videos, in considerably less time, after the OSPE. This would, however, require the cavities to be scanned prior to the assessment. The extent to which the findings for the inlay preparation can also be applied to other examination material, such as the placement of fillings, would have to be examined in further studies. Where an assessment is disputed, digital models (scans) could be useful, particularly where the assessed working steps become ‘hidden’ in the course of an examination. This occurs, for example, when the preceding preparation is covered by a base or by the finished filling during the placement of a restoration.
On average, the participants assessed Part A (‘control group’ with prepCheck video) with lower values (i.e. better grades) than Part B (‘study group’ with prepCheck video + subsequent model). The overall grade of the control group was 0.39 lower than that of the study group. Interestingly, the results of the study group deviated by merely 0.07 grade points from the overall grade in the real, live-performed OSPE. Thus, this setting appeared to reflect the examination situation most closely. This can be explained by the fact that the live OSPE is likewise assessed with the aid of models, so that the procedures are uniform.
In studies on video-based examinations, some reliability data are provided in the form of ICC (intraclass correlation coefficient) values. LAEEQ, CHEN and SCAFFIDI report an ICC of 0.62 [12, 14, 15]. KATEEB reports ICC values of 0.47 ≤ r ≤ 0.78. In publications on CAD systems, inter-rater correlations of 0.17 ≤ r ≤ 0.56 are mentioned by ESSER. URBANKOVA determined ICC values of 0.69 ≤ r ≤ 0.90. This experimental study is therefore most closely aligned with ESSER, KATEEB, LAEEQ and CHEN [14, 15]. The statement by SAMPAIO-FERNANDES that there is a lot of deviation between individual examiners is applicable in any case and also holds for this study, even though an attempt was made to counteract this problem through the train-the-teacher events. The effects of the training were less than optimal, however, so that more information and practice would have been required, above all concerning the parameters ‘slide-in direction’ and ‘outer contact positioning’. The fact that the outer contact positioning showed low ICC values within the control group, where it was assessed exclusively on the basis of the prepCheck videos, is not surprising. In inlay preparations, this is conceivable because the given extension surfaces make these areas more difficult to scan. These areas would certainly be easier to display in full-crown preparations. When models were additionally provided for the assessment, the ICC values doubled, as the scanning no longer played a role and the outer contact positioning could be assessed better or the assessment corrected. Here, the software would have to be improved by the manufacturer. In addition, Cronbach’s alpha increased significantly in Setting B when the ‘outer contact positioning’ was evaluated.
This is also not surprising, as these areas could be assessed more carefully on the model. Accordingly, the study participants rated the possibility of assessing the approximal outer contact positioning via prepCheck at a mean of 4.12 ± 0.54. The assessment of the form of the cavity edge and of the slide-in direction, on the other hand, appears to be a clear indication for the analysis software. The study participants identified a further advantage of the examined analysis tool in the calibration of their colleagues and rated this application at a mean of 1.87 ± 0.95. It is, however, generally regarded as fundamentally important to perform examination assessments primarily with the analysis tool (3.00 ± 1.41). Not surprisingly, there was general agreement that “dental assistants cannot be replaced by prepCheck when assessing cavities” (1.00 ± 0.00). Used alone, digital analysis tools in their current version may depict parameters that are critical for grading, such as the outer contact positioning, insufficiently. The overall assessment of the prepCheck analysis tool was rather modest at 2.87 ± 0.89 (on a Likert scale of 1 = excellent to 6 = unsatisfactory) and points to the above-mentioned problem areas, which can certainly be optimised in the software.
In order to reduce the limitations of the study, various points were considered. First of all, the order of the displayed videos and models was randomised by means of an online randomiser. As the variable of the experimental parts, namely the examination teeth, was independent of the participants, a ‘selection effect’ did not take place. Secondly, the study took place with the same four study participants in both parts, at the same time (13:20) and in the same time frame (approx. two hours and 27 minutes), using the same procedure in the same rooms. Thirdly, the lighting was the same in both settings, as were the duration of the videos (2 mins 0-10 secs) and the sequence of the settings portrayed in the individual films. Furthermore, the participants were selected from the trainers of the department of operative dentistry, who were already actively involved in the practical preparation exercises of the sixth semester (phantom course of conservative dentistry) and had assessment experience. In order to reduce the problem of a lack of realism, the study was designed to reflect the circumstances of the examination as closely as possible. For this purpose, the duration of the live assessment of a cavity preparation was determined in preliminary studies, and the assessment questionnaires were matched to the checklists familiar from the examinations [1, 27]. In order to limit the problem of generalisation arising from differing teaching experience, an attempt was made to calibrate the assessment of the cavities in the preceding train-the-teacher events. Despite this, the following limitations should be taken into consideration: it is conceivable that the evaluation of a model (Part B) was generally stricter, as the preliminary grades from the first part were already known.
It is also possible that a practice effect occurred over the course of the whole experiment and affected each individual assessor to a different degree. This could explain why, despite the preceding train-the-teacher events, the inter-rater reliability differed. The influence of gender, age and teaching experience of the subject group was not a main part of this examination, although it could well be addressed in future studies.
1. This examination shows an average reliability of α = 0.833 in the control group assessment mode (Part A), which exceeds the requirements for practical examinations (α ≥ 0.6) and also fulfils the general requirements of ‘high-stake’ examinations of α ≥ 0.8. In Part B, a reliability of α = 0.797 was determined, which did not differ significantly from that of the control group.
2. The overall assessment did not significantly differ between both examination groups (Parts A and B). In the ‘outer contact positioning’ parameter, however, significant differences could be determined between A and B.
3. The ICC values of the control group assessment mode (Part A), with a mean of 0.43 < r < 0.74, are higher than those of the study group assessment mode (Part B), with 0.35 < r < 0.60. The ICC values of the ‘slide-in direction’ and ‘outer contact positioning’ criteria in the assessment mode of the control group (Part A) are minimal. The maximum reliability of the criteria ‘preparation edge’, ‘surface’, ‘width & depth’ and ‘overall grade’ in the assessment mode of the control group (prepCheck video) is acceptable at α > 0.7.
4. The study participants’ assessment of the application concept of the digital-analytic software and of the study procedure showed a generally positive tendency.