Reliability and validity of physical examination tests for the assessment of ankle instability

Beynon, Amber; Le May, Sylvie; Theroux, Jean

doi:10.1186/s12998-022-00470-0

Systematic Review
Open access
Published: 19 December 2022

Chiropractic & Manual Therapies volume 30, Article number: 58 (2022)

Abstract

Introduction

Clinicians rely on certain physical examination tests to diagnose and potentially grade ankle sprains and ankle instability. Diagnostic error and inaccurate prognosis may have important repercussions for clinical decision-making and patient outcomes. Therefore, it is important to recognize the diagnostic value of orthopaedic tests through understanding the reliability and validity of these tests.

Objective

To systematically review and report evidence on the reliability and validity of orthopaedic tests for the diagnosis of ankle sprains and instability.

Methods

PubMed, CINAHL, Scopus, and Cochrane databases were searched from inception to December 2021. In addition, the reference list of included studies, located systematic reviews, and orthopaedic textbooks were searched. All articles reporting reliability or validity of physical examination or orthopaedic tests to diagnose ankle instability or sprains were included. Methodological quality of the reliability and the validity studies was assessed with The Quality Appraisal for Reliability studies checklist and the Quality Assessment of Diagnostic Accuracy Studies-2 respectively. We identified the number of times the orthopaedic test was investigated and the validity and/or reliability of each test.

Results

Overall, sixteen studies were included. Three studies assessed reliability, eight assessed validity, and five evaluated both. Overall, fifteen tests were evaluated; none demonstrated robust reliability and validity scores. The anterolateral talar palpation test reported the highest diagnostic accuracy. Further, the anterior drawer test, the anterolateral talar palpation, the reverse anterior lateral drawer test, and palpation of the anterior talofibular ligament reported the highest sensitivity. The highest specificity was attributed to the anterior drawer test, the anterolateral drawer test, the reverse anterior lateral drawer test, tenderness on palpation of the proximal fibular, and the squeeze test.

Conclusion

Overall, the diagnostic accuracy, reliability, and validity of physical examination tests for the assessment of ankle instability were limited. Physical examination tests should not be used in isolation, but rather in combination with the clinical history to diagnose an ankle sprain. Preliminary evidence suggests that the overall validity of physical examination for the ankle may be better if conducted five days after the injury rather than within 48 h of injury.

Introduction

Sprains have been found to be the most common type of ankle injury [1, 2]. Persistent symptoms after ankle sprains are common [3,4,5]. Approximately 55% of individuals do not seek treatment for an ankle sprain [6], and even when treatment is sought, treatment strategies are often insufficient in the rehabilitation and prevention of recurrences [7]. Consequently, ankle sprains may be underreported in certain populations, such as athletes [7]. The first step towards improving patient outcomes for ankle sprains is to diagnose them correctly. Clinicians rely on certain physical examination tests to diagnose and potentially grade ankle sprains and ankle instability. Diagnostic error and inaccurate prognosis may have important repercussions for clinical decision-making and patient outcomes [8]. Therefore, it is important to recognize the diagnostic value of orthopaedic tests through understanding the reliability and validity of these tests.

Reliability looks at the consistency demonstrated when a measure using a test is repeated [9]. Inter-rater reliability measures the reliability between two or more raters, and intra-rater reliability measures the reliability of the same rater on the same patient. Validity is the degree to which a test measures what it is intended to measure [9]. Determining the reliability and validity of a test or an examination technique is essential and provides credibility to the results obtained with the test or examination technique [10].

Several previous reviews have considered the diagnostic accuracy of tests for particular ankle injuries. Schwieterman et al. [11] focussed their review on the ankle and foot special tests, including ligament stability, neurological issues, and tendon dysfunction. Schneiders et al. [12] and Netterström-Wedin et al. [13] specifically reviewed the diagnostic accuracy of clinical tests for low ankle sprain and included the drawer and talar tilt tests, while Sman et al. [14] assessed the accuracy of tests for syndesmosis injuries, specifically the squeeze test and the dorsiflexion-external rotation stress test. Finally, Delahunt et al. [15] published a consensus statement and recommendations focussing on developing a structured clinical assessment of acute lateral ankle sprain. This Delphi study included experts from the “International Ankle Consortium” executive committee [15]. Key recommendations included establishing the mechanism of injury and assessing ankle joint bones and ligaments. This group also established an “International Ankle Consortium Rehabilitation-Orientation Assessment” (ROAST), hoping to help clinicians identify mechanical and sensorimotor impairments often found with chronic ankle instability [15]. They advocated that lateral ankle integrity, including syndesmosis, must be assessed, reporting that the most utilised clinical tests were the anterior drawer, talar tilt tests, syndesmosis direct palpation, and the squeeze test [15]. However, many primary studies do not clearly define or distinguish between the types of ankle sprains and often only consider overall ankle injuries or ankle instability [16,17,18,19]. Therefore, focusing on only one component or considering only one type of ankle sprain in isolation may mean studies are missed.

Our objective was to systematically review and report evidence on the reliability and validity of physical examination (orthopaedic) tests for the diagnosis of ankle sprains and/or ankle instability.

Methods

This review was prospectively registered within Prospero (CRD42019124090). This systematic review adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [20].

Eligibility criteria

Studies regarding either the reliability or validity of manual physical examination or orthopaedic tests for the diagnosis of ankle instability or ankle sprains, including but not limited to anterior drawer test, talar tilt test, and external rotation test were included. We included original peer-reviewed studies written in English or French that included human participants of any age, gender, or ethnicity. Studies assessing validity had to include relevant statistical values such as odds ratios, predictive value, likelihood ratios, receiver operator curves, sensitivity, or specificity. Studies assessing reliability had to include relevant statistical values such as Kappa, intra-class correlation coefficient, or percent agreement.
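As an illustration of two of the reliability statistics named in the eligibility criteria, the following is a minimal Python sketch of percent agreement and Cohen's Kappa for two examiners rating the same ankles. The function names and the ratings are hypothetical and are not drawn from any included study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every rating category that either rater used.
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings for ten ankles ("pos" = test judged positive).
examiner_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
examiner_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]

agreement = percent_agreement(examiner_1, examiner_2)
kappa = cohens_kappa(examiner_1, examiner_2)
print(f"percent agreement = {agreement:.2f}, Kappa = {kappa:.2f}")
```

In this toy example the examiners agree on eight of ten ankles, yet Kappa is lower than the raw agreement because half of the agreement would be expected by chance alone.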

Search strategies

Searches were conducted in PubMed, CINAHL, Scopus, and Cochrane Database from inception to December 2021. In addition, reference lists of included studies, located systematic reviews, and important textbooks on orthopaedic evaluation/musculoskeletal diagnosis were searched for other possible studies [21,22,23].

The keywords used in combination were: “reproducibility of results”, “sensitivity and specificity”, joint instability, ligament, ankle, ankle joint, physical examination, validity, predictive value, accuracy, instability, laxity, injury, alignment, clinical assessment, palpation, orthopaedic, anterior drawer test, talar tilt, and external rotation test. The full search strategy for each database is included in Additional file 1. Search results were imported into bibliographic management software (EndNote X9.2) and duplicates discarded. Results of the search were reported as per the PRISMA flow diagram (See Fig. 1).

Study selection and data extraction

Titles and abstracts were screened independently by two review authors (A.B and J.T) according to the eligibility criteria. The full texts of possibly relevant papers were obtained and again screened against the same criteria (A.B and J.T). Any disagreements were resolved through discussions and consensus between the reviewers.

Data from included studies were extracted independently by two review authors (A.B and J.T), using data collection forms based on the Quality Appraisal for Reliability studies (QAREL) checklist [24] for reliability studies and the Standards for Reporting Diagnostic Accuracy Studies (STARD) [25] for validity studies, and then collated. Any disagreements were resolved through discussions and consensus between the reviewers. We extracted study characteristics, including purpose of study, sample size, study population, examiners, orthopaedic tests used, reference standards, and study results.

Methodological quality assessment

The quality of included articles was assessed by two review authors. Methodological quality of the reliability studies was assessed with the QAREL checklist [24], which has 11 items covering seven domains: spectrum of subjects, spectrum of raters, rater blinding, order of examinations, suitable time intervals among repeated measures, test applied and interpreted correctly, and appropriate statistical analysis. Each item is rated as ‘Yes’, ‘No’, ‘Unclear’, or ‘Not applicable’. An item rated as ‘Yes’ indicates a good quality aspect of the study, while an item rated as ‘No’ indicates a poor quality aspect [24]. As recommended, each quality item on the QAREL is considered separately rather than given an overall numerical quality score [24, 26]. Studies that were rated as ‘Yes’ on all items have an overall judgement of ‘high quality’. However, if a study is rated as ‘No’ or ‘Unclear’ on one or more items then it has an overall rating of ‘At risk of bias’.

Methodological quality of the validity studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) [27]. The QUADAS-2 consists of four key domains covering patient selection, index test, reference standard, and flow and timing. Each domain is assessed for risk of bias, and three of the domains are also assessed for applicability. As recommended, each domain on the QUADAS-2 is considered separately rather than giving an overall numerical quality score [24, 26, 27]. Studies that were rated as low risk on all domains regarding risk of bias or applicability have an overall judgement of ‘low risk of bias’ or ‘low concern regarding applicability’. However, if a study is rated as ‘high’ or ‘unclear’ in one or more domains then it has an overall evaluation of ‘at risk of bias’ or ‘concerns regarding applicability’ [27].
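The overall judgement rules for both tools can be sketched in a few lines of Python. This is only an illustration of the decision rules as described in this section: the function names and the example ratings are hypothetical, and treating ‘Not applicable’ QAREL items as neutral is an assumption of the sketch rather than something stated in the text.

```python
def qarel_overall(item_ratings):
    """Overall QAREL judgement from the per-item ratings of one study."""
    normalised = [rating.strip().lower() for rating in item_ratings]
    # Assumption of this sketch: 'Not applicable' items are ignored.
    applicable = [r for r in normalised if r != "not applicable"]
    if not applicable:
        raise ValueError("no applicable items to judge")
    if any(r in ("no", "unclear") for r in applicable):
        return "at risk of bias"
    return "high quality"


def quadas2_overall(domain_ratings, low_label, concern_label):
    """Overall QUADAS-2 judgement for one dimension (bias or applicability)."""
    normalised = [rating.strip().lower() for rating in domain_ratings.values()]
    # Every domain must be 'low'; a single 'high' or 'unclear' changes the result.
    if all(r == "low" for r in normalised):
        return low_label
    return concern_label


# Hypothetical ratings for a single study.
qarel_items = ["Yes"] * 9 + ["Unclear", "Not applicable"]
risk_of_bias = {
    "patient selection": "low",
    "index test": "unclear",
    "reference standard": "low",
    "flow and timing": "low",
}
applicability = {
    "patient selection": "low",
    "index test": "low",
    "reference standard": "low",
}

print(qarel_overall(qarel_items))
print(quadas2_overall(risk_of_bias, "low risk of bias", "at risk of bias"))
print(quadas2_overall(applicability,
                      "low concern regarding applicability",
                      "concerns regarding applicability"))
```

Because neither rule produces a numerical score, a single unclear item is enough to move a study out of the top category, which is why few studies reach it.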

Summary of findings

The characteristics of the included studies were tabulated for comparison. We identified the number of times each orthopaedic test was investigated and the validity and/or reliability of each test. Where possible and appropriate (if studies included appropriate statistics), we have included a summary of the validity results by test. Where possible, further validity results were calculated from results provided within the included studies. Likelihood ratios were calculated if sensitivity and specificity were reported, using the equations: positive likelihood ratio = sensitivity/(1 − specificity) and negative likelihood ratio = (1 − sensitivity)/specificity [9]. Predictive values and diagnostic accuracy were calculated if the true positive, true negative, false positive, and false negative values were reported [9]. The interpretation of Kappa values was based on the Landis and Koch reliability classification scale: below chance agreement < 0.00, slight agreement 0.00–0.20, fair agreement 0.21–0.40, moderate agreement 0.41–0.60, substantial agreement 0.61–0.80, and almost perfect agreement 0.81–1.00 [28]. Intra-class correlation coefficients (ICCs) were interpreted as poor if < 0.40, good if 0.40–0.75, and excellent if > 0.75 [29].
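The derived validity statistics can be illustrated with a short Python sketch that starts from the four cells of a 2x2 table. It uses the standard definitions, with the positive likelihood ratio as sensitivity/(1 − specificity) and the negative likelihood ratio as (1 − sensitivity)/specificity. The class name, function name, and counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with a reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Validity statistics derivable from a 2x2 table.

    Assumes every row and column total of the table is non-zero.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "accuracy": (t.tp + t.tn) / total,
        # The ratios are unbounded when their denominator is zero.
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that the likelihood ratios need only sensitivity and specificity, whereas predictive values and accuracy need the full table because they depend on how many participants in the sample actually have the condition.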

We assessed whether results could be included in a meta-analysis. Studies were assessed for statistical heterogeneity using I² [30, 31]. Although there is no agreement on I² interpretation, we applied the following criteria: 0–40% represented low heterogeneity, 30–60% represented moderate heterogeneity, 50–90% represented substantial heterogeneity, and 75–100% represented considerable heterogeneity [30]. When considering whether a meta-analysis was potentially suitable, we considered both the I² value and the methodological/clinical heterogeneity, such as the population under study, interpretation of index tests, and reference standards used.
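For readers unfamiliar with I², the following Python sketch shows one common way to obtain it, from Cochran's Q and its degrees of freedom, and how the overlapping interpretation bands listed above can be applied. The function names and the example Q value are hypothetical; the text does not state how I² was computed in this review, so the formula is an assumption of the sketch.

```python
def i_squared(q: float, df: int) -> float:
    """I² (%) from Cochran's Q and its degrees of freedom (studies minus 1)."""
    if df < 1:
        raise ValueError("at least two studies are needed")
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, 100.0 * (q - df) / q)


# Bands as listed in the text; they overlap, so a value can match two labels.
BANDS = (
    ("low", 0, 40),
    ("moderate", 30, 60),
    ("substantial", 50, 90),
    ("considerable", 75, 100),
)


def heterogeneity_labels(i2: float) -> list:
    """Every band label whose range contains the given I² value."""
    return [label for label, low, high in BANDS if low <= i2 <= high]


# Hypothetical example: Q = 40 across nine studies (df = 8).
i2 = i_squared(q=40.0, df=8)
print(f"I² = {i2:.0f}% -> {heterogeneity_labels(i2)}")
```

Returning every matching label rather than a single one reflects the deliberate overlap of the bands: the interpretation depends on context, not only on the number.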

Results

Study selection

We identified 6798 articles through searching databases and 26 additional records through other sources. After duplicates were removed, 6007 articles remained. The title and abstract screen reduced the potential number to 27 for full-text review. Eleven articles were excluded at full-text review [32,33,34,35,36,37,38,39,40,41,42]. After the full-text review, 16 articles met the eligibility criteria (N = 935 participants) and are included in this review. Figure 1 outlines the screening and selection process.

Study characteristics

Of the 16 included studies, three studies assessed reliability [17, 19, 43], eight studies assessed validity [16, 18, 44,45,46,47,48,49], and five studies assessed both reliability and validity [50,51,52,53,54]. Two studies were cadaveric studies [46, 51]. The characteristics of all included studies are reported in Table 1.

Table 1 Characteristics of included studies

Methodological quality

Quality assessment of included reliability studies using QAREL is presented in Table 2. Only one study was rated ‘yes’ on all 11 items, yielding an overall judgement of ‘high quality’ [19]. The other seven studies that assessed reliability had at least one item rated as ‘no’ or ‘unclear’, giving an overall judgement of ‘at risk of bias’ [17, 43, 50,51,52,53,54]. Common sources of bias included insufficient information regarding blinding of the raters to the findings of other raters [17, 50,51,52,53], to their own prior findings [17, 43], to other clinical information [17, 43, 50, 53, 54], and to additional cues [17, 43, 50, 52,53,54]. All included studies used appropriate statistical tests.

Table 2 Quality assessment of included reliability studies using QAREL

Quality assessment of included validity studies using QUADAS-2 is presented in Table 3. Four studies assessing validity had an overall judgement of ‘low risk of bias’ [46,47,48, 51], and seven studies had an overall judgement of ‘low concern regarding applicability’ [16, 18, 44, 45, 47,48,49]. Only two studies were rated as both ‘low risk of bias’ and ‘low concern regarding applicability’ [47, 48]. The other eleven studies had at least one domain within risk of bias and/or applicability with a rating of ‘high’ or ‘unclear’ [16, 18, 44,45,46, 49,50,51,52,53,54]. Common sources of bias included insufficient information on how the sample was enrolled [16, 44, 45, 52], how the index test was interpreted, such as whether a pre-specified threshold was used [16, 50, 52,53,54], whether the reference standard was interpreted without knowledge of the index test [44], and whether the reference standard was likely to correctly classify the condition [18], as well as only the cases receiving the reference standard [16]. The two cadaveric studies posed concerns regarding the applicability of patient selection and the use of the reference standard [46, 51]; therefore, the results from these studies will be reported separately.

Table 3 Quality assessment of included validity studies using QUADAS-2

Summary of findings

Six studies assessed the reliability of the anterior drawer test [17, 19, 50,51,52,53]. Three studies assessed the reliability of the external rotation test [43, 50, 53], and the squeeze test [43, 50, 53]. Two studies assessed the reliability of the anterolateral drawer test [51, 52], and the inversion tilt test [19, 53]. Only one study assessed the reliability of syndesmosis ligament palpation [43], the dorsiflexion compression test [43], tenderness of anterior inferior tibiofibular ligament, proximal fibular, deltoid ligament, anterior talofibular ligament and calcaneo-fibular ligament [50], the cotton test [50], the crossed-leg test [50], distal fibular position [17], the reverse anterolateral drawer test [52], talar tilt [19], and the eversion tilt test [53]. Table 4 reports an overview of the results from studies assessing reliability. Additional file 2 presents a description of all included tests based upon the provided reviewed literature.

Table 4 Results from studies assessing reliability

Nine studies assessed the validity of the anterior drawer test [16, 44, 46, 48, 50,51,52,53,54]. Four studies assessed the validity of the external rotation test [45, 47, 50, 53], and the squeeze test [45, 47, 50, 53]. Three studies assessed the validity of the anterolateral drawer test [46, 51, 52], and the tenderness of the anterior talofibular ligament and calcaneofibular ligament [49, 50, 54]. Two studies assessed the validity of a talar tilt test [18, 48], and tenderness of the syndesmosis [47, 54]. Only one study assessed the validity of dorsiflexion lunge with compression [47], tenderness of anterior inferior tibiofibular ligament [50], proximal fibular [50], deltoid ligament [50], medial ankle [54], talocrural joint [54], peroneal tendon [54], lateral malleolus [54], diffusely lateral [54], supination line [54], the cotton test [50], the crossed-leg test [50], the reverse anterolateral drawer test [52], the inversion stress test [53], and the eversion stress test [53]. Table 5 reports an overview of the results from studies assessing validity.

Table 5 Results from studies assessing validity

Due to the methodological and statistical heterogeneity of the included studies, a meta-analysis was not possible. When combining results, the I² value was 75–100%, representing considerable heterogeneity for all considered meta-analyses. Additionally, there was major methodological and clinical heterogeneity among the included studies. For example, nine included studies assessed the validity of the anterior drawer test. However, two of these studies are cadaveric studies [46, 51]. A range of different reference standards was used within these studies, including ultrasound [44, 48, 52], MRI [16, 50, 53], arthrography [54], and cutting the ligaments followed by direct anatomical measurement [46, 51]. There were also differences in how the anterior drawer test was conducted and how scores were interpreted.

There were only three tests, the anterior drawer [17, 51], distal fibular position [17], and anterolateral drawer tests [51], that had results reported regarding intra-rater reliability. These tests were all reported to have excellent intra-rater reliability [17, 51]. However, these results are based on at most two studies [17, 51], one of which used cadavers [51]. The two tests with the highest reported inter-rater reliability were the external rotation and the anterior drawer tests, rated as substantial [43] and good [17] agreement respectively. However, other studies have rated the inter-rater reliability of the anterior drawer test as slight [52] and poor [19], and the external rotation test as fair [50, 53], demonstrating inconsistent results. The only test to show some consistent results based on more than one included study was the squeeze test, which was rated as having moderate inter-rater reliability based on results from two studies [43, 50].
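The agreement labels in this paragraph come from two different scales: Kappa values use the Landis and Koch categories, while ICC values use the poor/good/excellent bands. A minimal Python sketch of both lookups follows. The function names are hypothetical, and treating each published upper bound as inclusive (so that a Kappa of 0.205 counts as fair) is an assumption made to close the gaps between bands.

```python
def interpret_kappa(kappa: float) -> str:
    """Landis and Koch label for a Kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("Kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "below chance"
    # Upper bounds are treated as inclusive (assumption of this sketch).
    for upper, label in (
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ):
        if kappa <= upper:
            return label
    return "almost perfect"


def interpret_icc(icc: float) -> str:
    """Poor / good / excellent label for an intra-class correlation coefficient."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "good"
    return "excellent"


# Hypothetical values for illustration only.
print(interpret_kappa(0.65))
print(interpret_icc(0.65))
```

The same numerical value can therefore carry different labels depending on the statistic, which is worth keeping in mind when comparing results across studies that report Kappa and studies that report ICC.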

Overall, the test with the highest reported diagnostic accuracy (91.3%) was the anterolateral talar palpation test; however, this was based on the results of only one study [16]. The tests with the highest reported sensitivity were the anterior drawer test [44, 51, 53], the anterolateral talar palpation [16], the reverse anterior lateral drawer test [52], and palpation of the anterior talofibular ligament [49, 54]. However, results were inconsistent, with lower sensitivity reported for the anterior drawer test depending on the grade of ankle sprain used to indicate a positive test result [44]. The anterior drawer test also reported the lowest negative likelihood ratio (0.24) compared to other reported tests assessing validity for ankle sprains [53]. The tests with the highest reported specificity were the anterior drawer [16, 48, 52, 53], anterolateral drawer test [46, 52], the reverse anterior lateral drawer test [52], tenderness on palpation of the proximal fibular [50] and diffusely lateral [54], the squeeze test [45, 47, 53], the talar tilt test [48], and the eversion stress test [53]. Again, results were inconsistent, with lower specificity reported for the anterior drawer test in other studies [44, 46, 50, 51]. The squeeze test reported the highest positive likelihood ratio (35) compared to all other reported tests [53]. The reverse anterolateral drawer test reported both a very high sensitivity and specificity, but this was only reported within one study [52].
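To show what likelihood ratios of this size mean in practice, the following Python sketch converts a pre-test probability into a post-test probability through odds. The two likelihood ratios are the values reported above (35 and 0.24); the pre-test probability of 20% and the function name are hypothetical and chosen purely for illustration.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical pre-test probability; not reported in the review.
pre_test = 0.20

after_positive = post_test_probability(pre_test, 35.0)   # positive test, LR+ = 35
after_negative = post_test_probability(pre_test, 0.24)   # negative test, LR- = 0.24

print(f"after a positive result: {after_positive:.2f}")
print(f"after a negative result: {after_negative:.2f}")
```

Under this assumed starting point, a positive result with a likelihood ratio of 35 raises the probability to roughly 90%, while a negative result with a likelihood ratio of 0.24 lowers it to roughly 6%. Because each ratio comes from a single study, such figures are indicative only.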

Consideration of type of ankle sprain

In the diagnosis of an ankle injury, the mechanism of injury should be considered, such as by using the Lauge-Hansen classification [55]. While many included studies had a mixture of participants with different types of ankle sprains, some did specify which tests should be used for which type of ankle injury. Orthopaedic tests to assess for a potential syndesmosis injury include: tenderness on direct palpation of ligaments [43, 47, 50], squeeze test [43, 47, 50], external rotation stress test [43, 50, 53], dorsiflexion compression test [43, 47], cotton test [50], and crossed-leg test [50]. Orthopaedic tests to assess for a potential lateral ligament injury include: anterior drawer test [44, 46, 51,52,53], anterolateral drawer test [46, 51, 52], anterolateral talar palpation, reverse anterolateral drawer test [52], tenderness on direct palpation of ligaments [50], inversion stress test [53], and talar tilt test. Orthopaedic tests to assess for a potential medial ligament injury include: tenderness on direct palpation of ligaments [50], and eversion stress test [53]. Additional file 3 reports orthopaedic tests for different types of ankle sprains. Additional file 4 reports a summary of the sensitivity and specificity values by orthopaedic test.

Discussion

The tests reviewed included the anterior drawer, anterolateral drawer, reverse anterolateral drawer test, external rotation, dorsiflexion external rotation, squeeze, palpation and tenderness, cotton, crossed-leg, dorsiflexion compression, distal fibular position, talar tilt, inversion tilt, eversion stress, and dorsiflexion lunge with compression tests. Overall, none of these tests have shown robust reliability and validity scores. Even the studies that used a combination of tests did not show high diagnostic accuracy [47]. However, one study did find that the overall validity of physical examination for the ankle did drastically increase if conducted five days after the injury rather than within 48 h of injury [54]. The orthopaedic tests should be used in combination with the clinical history.

Many of the included studies had different or unclear definitions of ankle sprains. These could include a mixture of participants with a history of lateral, medial and/or syndesmotic ankle sprains [16,17,18,19, 49, 54]. Many studies had a mixture of acute and chronic ankle sprains [16, 17, 43, 44] or no information regarding how long the injury was ongoing [17, 19]. The clinical usefulness of certain tests could differ among acute or chronic conditions. Also, some studies did not consider the grade of the ankle sprain required to indicate a positive test [16, 17]. One study that did consider the grade of the ankle sprain showed that when a higher grade (grade 3 or above) was used to consider a positive result, they observed a higher specificity but a lower sensitivity compared to values when using a grade 2 or above [44].

There were other differences in how the studies were conducted, which hindered the interpretation of this systematic review’s results. A range of different reference tests was used, including ultrasound [44, 48, 52], MRI [16, 45, 47, 49, 50, 53], the Cumberland ankle instability tool [18], arthrography [54], and cutting the ligaments to directly measure anatomical movements [46, 51]. Additionally, there were differences in how tests were conducted and how scores were interpreted. For instance, some authors interpreted the drawer test subjectively, such as by feeling for any laxity [19, 44], whereas others interpreted it objectively using a goniometer [17]. Other studies did not provide enough detail about how the index test was interpreted, such as whether a pre-specified threshold was used [16, 50, 52, 53]. Furthermore, many studies had a mixture of examiners with varying degrees of experience, from students or clinicians with minimal clinical experience to highly experienced clinicians [19, 43, 46, 47, 50,51,52]. When studies compared the results of students or junior examiners with those of more senior or experienced examiners, there were mixed results: on some occasions the less experienced examiners yielded higher results, and on others the more experienced examiners did [19, 52]. Moreover, the two studies using cadaveric specimens [46, 51] posed concerns regarding applicability to a clinical population, as there would be differences between living participants and cadaveric specimens. The advantage of using cadaveric specimens over live patients is the ease of distinguishing between a true positive and a true negative, as the ligaments were cut; however, this approach lacks important feedback such as patient cues and tenderness.

This systematic review differs from previous reviews. Two previous reviews on ankle injuries were published six [12] and nine [11] years ago. While both reviews investigated the diagnostic accuracy of special ankle tests, Schwieterman et al. [11] included special tests of ankle and foot musculoskeletal pathologies, and Schneiders et al. [12] reviewed publications that included only the two most widely used clinical tests to assess lateral ankle sprains, namely the anterior drawer and the talar tilt tests. Neither of these review articles [11, 12] accounted for the reliability of the index tests. A more recent review [13] looked at the accuracy of clinical tests assessing ligamentous injury of the talocrural and subtalar joint. Netterström-Wedin et al. [13] focussed on lower lateral ankle stability assessment and did not review ankle stability integrity in its entirety, including the ankle medial side and higher aspect (syndesmosis), which we have considered in our systematic review. We also evaluated the reliability of those tests. Considering our review objectives, we included studies [17, 18, 43, 45,46,47, 50, 51, 53, 56] that were not included by Netterström-Wedin et al. [13].

For the studies also included in the most recent previous systematic review [13], our interpretation of the QUADAS-2 tool differed in some cases. For example, Netterström-Wedin et al. [13] reported that Li et al. [52] was at low risk of bias and low applicability concerns on all items. We considered this same article to have an ‘unclear’ risk of bias for patient selection and the index test, and ‘unclear’ concerns regarding the applicability of the index test, because the study did not include enough details. These bias assessment discrepancies probably relate to the subjective interpretation of the tool, which has been reported with other measurement tools [57, 58]; agreement appears to be lowest on highly subjective items. Reliability may vary according to reviewers' familiarity with the tool, their expertise, items’ interpretation, or whether reviewers have worked together before [57]. What is important is to apply the risk of bias tool consistently within the systematic review. Considering this subjectivity, comparing similar systematic reviews becomes challenging.

Despite the concerns raised by our systematic review on the diagnostic value of the included ankle physical tests, clinicians should not dismiss the significance of a thorough physical examination. The argument supporting technology as a substitute remains debatable, as technology is often associated with false-positive results [59], which impart a false sense of confidence that can sometimes delay care and increase its burden. Similar to rheumatology, which lacks a specific organ or system constraint [60], musculoskeletal complaints involve multiple tissues and remain a common reason for patients visiting their primary health practitioners [61]. Although the physical examination, including its orthopaedic component, remains a neglected field of research [62], this component should not be abandoned but instead better understood and refined [63].

Strengths and limitations of this review

This systematic review endeavoured to include all relevant articles that assessed the reliability and/or validity of any type of ankle sprain and/or ankle instability and included a wide initial search strategy. The methodological quality of all included studies was assessed by using the QAREL and/or the QUADAS-2. Due to the methodological heterogeneity of the included studies no meta-analysis could be conducted. The results from this review highlight the heterogeneity within the current literature. Additionally, results are only based on a few studies at most for each test, frequently with limited sample sizes. This systematic review was limited to studies written in English and French.

Recommendations for future research

Appropriate reference standards should be used when determining the diagnostic accuracy of physical examination tests. More high-quality research is needed to truly determine the reliability and validity of physical examination tests for the diagnosis of ankle sprains. Clear definitions of the type of ankle injury and the duration of time since the injury should be considered in future research. Furthermore, to truly consider the use of physical examination tests in a clinical and pragmatic way, future studies should use a combination of clinical tests along with the patient’s history.

Clinical implications

Although individual orthopaedic tests may not yield high reliability and validity, they should not be discarded entirely. When examining a patient with an ankle injury, fractures of the ankle and mid-foot should first be excluded, such as by using the Ottawa ankle rules [64], and a range of orthopaedic tests should then be considered to assess for an ankle sprain. Physical examination tests should not be used in isolation, but rather in combination with the clinical history to diagnose an ankle sprain. Careful consideration should be given to the most appropriate time to conduct the physical examination.

Conclusion

The diagnostic accuracy, reliability, and validity of physical examination tests for the assessment of ankle instability were limited. Physical examination tests should not be used in isolation to diagnose an ankle sprain. Rather, clinicians should use a combination of physical examination tests along with the clinical history. Future studies should ensure appropriate reference standards are used, such as MRI or arthroscopy, and use a combination of clinical tests with the patient’s history to determine the diagnostic accuracy in a clinical and pragmatic way.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files.

References

Fong DT-P, Hong Y, Chan L-K, Yung PS-H, Chan K-M. A systematic review on ankle injury and ankle sprain in sports. Sports Med. 2007;37(1):73–94.
Doherty C, Delahunt E, Caulfield B, Hertel J, Ryan J, Bleakley C. The incidence and prevalence of ankle sprain injury: a systematic review and meta-analysis of prospective epidemiological studies. Sports Med. 2014;44(1):123–40.
Anandacoomarasamy A, Barnsley L. Long term outcomes of inversion ankle injuries. Br J Sports Med. 2005;39(3):e14.
Braun BL. Effects of ankle sprain in a general clinic population 6 to 18 months after medical evaluation. Arch Fam Med. 1999;8(2):143.
Gerber JP, Williams GN, Scoville CR, Arciero RA, Taylor DC. Persistent disability associated with ankle sprains: a prospective examination of an athletic population. Foot Ankle Int. 1998;19(10):653–60.
McKay GD, Goldie P, Payne WR, Oakes B. Ankle injuries in basketball: injury rate and risk factors. Br J Sports Med. 2001;35(2):103–8.
Hertel J. Functional anatomy, pathomechanics, and pathophysiology of lateral ankle instability. J Athl Train. 2002;37(4):364.
Elder NC, Dovey SM. Classification of medical errors and preventable adverse events in primary care: a synthesis of the literature. J Fam Pract. 2002;51(11):927–32.
Porta M. A dictionary of epidemiology. Oxford University Press; 2014.
Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River: Pearson Prentice Hall; 2009.
Schwieterman B, Haas D, Columber K, Knupp D, Cook C. Diagnostic accuracy of physical examination tests of the ankle/foot complex: a systematic review. Int J Sports Phys Ther. 2013;8(4):416.
Schneiders A, Karas S. The accuracy of clinical tests in diagnosing ankle ligament injury. Eur J Physiother. 2016;18(4):245–53.
Netterström-Wedin F, Matthews M, Bleakley C. Diagnostic accuracy of clinical tests assessing ligamentous injury of the talocrural and subtalar joints: a systematic review with meta-analysis. Sports Health. 2022;14(3):336–47.
Sman AD, Hiller CE, Refshauge KM. Diagnostic accuracy of clinical tests for diagnosis of ankle syndesmosis injury: a systematic review. Br J Sports Med. 2013;47(10):620–8.
Delahunt E, Bleakley CM, Bossard DS, Caulfield BM, Docherty CL, Doherty C, et al. Clinical assessment of acute lateral ankle sprain injuries (ROAST): 2019 consensus statement and recommendations of the International Ankle Consortium. Br J Sports Med. 2018;52(20):1304–10.
Gomes JLE, Soares AF, Bastiani CE, de Castro JV. Anterolateral talar palpation: a complementary test for ankle instability. Foot Ankle Surg. 2018;24(6):486–9.
Parasher RK, Nagy DR, April LE, Phillips HJ, Mc Donough AL. Clinical measurement of mechanical ankle instability. Man Ther. 2012;17(5):470–3.
Rosen A, Ko J, Brown C. Diagnostic accuracy of instrumented and manual talar tilt tests in chronic ankle instability populations. Scand J Med Sci Sports. 2015;25(2):e214–21.
Wilkin EJ, Hunt A, Nightingale EJ, Munn J, Kilbreath SL, Refshauge KM. Manual testing for ankle instability. Man Ther. 2012;17(6):593–6.
McInnes MD, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96.
Cook C. Orthopedic manual therapy: an evidence-based approach. Upper Saddle River: Pearson Education; 2012.
Dutton M, Magee D, Hengeveld E, Banks K, Atkinson K, Coutts F, et al. Orthopaedic examination, evaluation, and intervention. McGraw-Hill Medical; 2004.
Magee DJ. Orthopedic physical assessment. Elsevier Health Sciences; 2014.
Lucas NP, Macaskill P, Irwig L, Bogduk N. The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). J Clin Epidemiol. 2010;63(8):854–61.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Clin Chem. 2015;61(12):1446–52.
Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol. 2005;5(1):1–9.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977;33(2):363–74.
Shoukri MM, Cihon C. Statistical methods for health sciences. CRC Press; 1998.
Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions. Wiley; 2019.
Borenstein M, Hedges LV, Higgins JP, Rothstein HR. Introduction to meta-analysis. Wiley; 2021.
Kataoka K, Hoshino Y, Nagamune K, Nukuto K, Yamamoto T, Yamashita T, et al. The quantitative evaluation of anterior drawer test using an electromagnetic measurement system. Sports Biomech. 2021;21:1–12.
Wenning M, Gehring D, Lange T, Fuerst-Meroth D, Streicher P, Schmal H, et al. Clinical evaluation of manual stress testing, stress ultrasound and 3D stress MRI in chronic mechanical ankle instability. BMC Musculoskelet Disord. 2021;22(1):1–13.
Bi C, Kong D, Lin J, Wang Q, Wu K, Huang J. Diagnostic value of intraoperative tap test for acute deltoid ligament injury. Eur J Trauma Emerg Surg. 2021;47(4):921–8.
Teramoto A, Iba K, Murahashi Y, Shoji H, Hirota K, Kawai M, et al. Quantitative evaluation of ankle instability using a capacitance-type strain sensor. Foot Ankle Int. 2021;42(8):1074–80.
Vivtcharenko VY, Giarola I, Salgado F, Li S, Wajnsztejn A, Giordano V, et al. Comparison between cotton test and tap test for the assessment of coronal syndesmotic instability: a cadaveric study. Injury. 2021;52:S84–8.
Beumer A, Swierstra BA, Mulder PG. Clinical diagnosis of syndesmotic ankle instability: evaluation of stress tests behind the curtains. Acta Orthop Scand. 2002;73(6):667–9.
De Vries J, Kerkhoffs G, Blankevoort L, van Dijk CN. Clinical evaluation of a dynamic test for lateral ankle ligament laxity. Knee Surg Sports Traumatol Arthrosc. 2010;18(5):628–33.
Docherty CL, Rybak-Webb K. Reliability of the anterior drawer and talar tilt tests using the LigMaster joint arthrometer. J Sport Rehabil. 2009;18(3):389–97.
Hertel J, Denegar CR, Monroe MM, Stokes WL. Talocrural and subtalar joint instability after lateral ankle sprain. Med Sci Sports Exerc. 1999;31(11):1501–8.
Funder V, Jørgensen J, Andersen A, Andersen SB, Lindholmer E, Niedermann B, et al. Ruptures of the lateral ligaments of the ankle: clinical diagnosis. Acta Orthop Scand. 1982;53(6):997–1000.
Lee KT, Park YU, Jegal H, Park JW, Choi JP, Kim JS. New method of diagnosis for chronic ankle instability: comparison of manual anterior drawer test, stress radiography and stress ultrasound. Knee Surg Sports Traumatol Arthrosc. 2014;22(7):1701–7.
Alonso A, Khoury L, Adams R. Clinical tests for ankle syndesmosis injury: reliability and prediction of return to function. J Orthop Sports Phys Ther. 1998;27(4):276–84.
Croy T, Koppenhaver S, Saliba S, Hertel J. Anterior talocrural joint laxity: diagnostic accuracy of the anterior drawer test of the ankle. J Orthop Sports Phys Ther. 2013;43(12):911–9.
de César PC, Avila EM, de Abreu MR. Comparison of magnetic resonance imaging to physical examination for syndesmotic injury after lateral ankle sprain. Foot Ankle Int. 2011;32(12):1110–4.
Phisitkul P, Chaichankul C, Sripongsai R, Prasitdamrong I, Tengtrakulcharoen P, Suarchawaratana S. Accuracy of anterolateral drawer test in lateral ankle instability: a cadaveric study. Foot Ankle Int. 2009;30(7):690–5.
Sman AD, Hiller CE, Rae K, Linklater J, Black DA, Nicholson LL, et al. Diagnostic accuracy of clinical tests for ankle syndesmosis injury. Br J Sports Med. 2015;49(5):323–9.
George J, Jaafar Z, Hairi IR, Hussein KH. The correlation between clinical and ultrasound evaluation of anterior talofibular ligament and calcaneofibular ligament tears in athletes. J Sports Med Phys Fitness. 2020;60(5):749–57.
De Simoni C, Wetz H, Zanetti M, Hodler J, Jacob H, Zollinger H. Clinical examination and magnetic resonance imaging in the assessment of ankle sprains treated with an orthosis. Foot Ankle Int. 1996;17(3):177–82.
Großterlinden LG, Hartel M, Yamamura J, Schoennagel B, Bürger N, Krause M, et al. Isolated syndesmotic injuries in acute ankle sprains: diagnostic significance of clinical examination and MRI. Knee Surg Sports Traumatol Arthrosc. 2016;24(4):1180–6.
Vaseenon T, Gao Y, Phisitkul P. Comparison of two manual tests for ankle laxity due to rupture of the lateral ankle ligaments. Iowa Orthop J. 2012;32:9.
Li Q, Tu Y, Chen J, Shan J, Yung PS-H, Ling SK-K, et al. Reverse anterolateral drawer test is more sensitive and accurate for diagnosing chronic anterior talofibular ligament injury. Knee Surg Sports Traumatol Arthrosc. 2020;28(1):55–62.
Hosseinian SHS, Aminzadeh B, Rezaeian A, Jarahi L, Naeini AK, Jangjui P. Diagnostic value of ultrasound in ankle sprain. Foot Ankle Surg. 2021;61:305–9.
Van Dijk C, Lim L, Bossuyt P, Marti R. Physical examination is sufficient for the diagnosis of sprained ankles. J Bone Joint Surg Br Vol. 1996;78(6):958–62.
Okanobo H, Khurana B, Sheehan S, Duran-Mendicuti A, Arianjam A, Ledbetter S. Simplified diagnostic algorithm for Lauge–Hansen classification of ankle injuries. Radiographics. 2012;32(2):E71–84.
Wilkerson L, Lee M. Assessing physical examination skills of senior medical students: knowing how versus knowing when. Acad Med. 2003;78(10):S30–2.
Gates M, Gates A, Duarte G, Cary M, Becker M, Prediger B, et al. Quality and risk of bias appraisals of systematic reviews are inconsistent across reviewers and centers. J Clin Epidemiol. 2020;125:9–15.
Kaizik MA, Garcia AN, Hancock MJ, Herbert RD. Measurement properties of quality assessment tools for studies of diagnostic accuracy. Braz J Phys Ther. 2020;24(2):177–84.
Bezuglov E, Khaitin V, Lazarev A, Brodskaia A, Lyubushkina A, Kubacheva K, et al. Asymptomatic foot and ankle abnormalities in elite professional soccer players. Orthop J Sports Med. 2021;9(1):2325967120979994.
Villaseñor-Ovies P, Navarro-Zarza JE, Canoso JJ. The rheumatology physical examination: making clinical anatomy relevant. Clin Rheumatol. 2020;39(3):651–7.
Cieza A, Causey K, Kamenov K, Hanson SW, Chatterji S, Vos T. Global estimates of the need for rehabilitation based on the Global Burden of Disease study 2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10267):2006–17.
Zaman J, Verghese A, Elder A. The value of physical examination: a new conceptual framework. South Med J. 2016;109(12):754–7.
Malanga GA, Mautner K. Musculoskeletal physical examination e-book: an evidence-based approach. London: Elsevier Health Sciences; 2016.
Bachmann LM, Kolb E, Koller MT, Steurer J, ter Riet G. Accuracy of Ottawa ankle rules to exclude fractures of the ankle and mid-foot: systematic review. BMJ. 2003;326(7386):417.

Acknowledgements

Not applicable.

Funding

This manuscript did not receive any funding.

Author information

Authors and Affiliations

Department of Chiropractic, Faculty of Medicine, Health and Human Sciences, Macquarie University, 75 Talavera Rd, Level 2, Sydney, NSW, 2109, Australia
Amber Beynon
Faculty of Nursing, University of Montreal, 2900 Boul. Édouard-Montpetit, Montreal, QC, H3T 1J4, Canada
Sylvie Le May
CHU Sainte-Justine Research Centre, 3175 Chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 1C5, Canada
Sylvie Le May
College of Science, Health, Engineering and Education, Murdoch University, 90 South Street, Murdoch, WA, 6150, Australia
Jean Theroux

Authors

Amber Beynon
Sylvie Le May
Jean Theroux

Contributions

AB and JT contributed to the study concept and design. AB and JT drafted the manuscript. All authors contributed to the acquisition or interpretation of data, provided critical revision of the manuscript for important intellectual content, and provided final approval of the version to be published. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Amber Beynon.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Search strategy.

Additional file 2. Tests description.

Additional file 3. Orthopaedic tests for different types of ankle sprains.

Additional file 4. Summary of the sensitivity and specificity values by orthopaedic test.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Beynon, A., Le May, S. & Theroux, J. Reliability and validity of physical examination tests for the assessment of ankle instability. Chiropr Man Therap 30, 58 (2022). https://doi.org/10.1186/s12998-022-00470-0

Received: 02 May 2022
Accepted: 06 December 2022
Published: 19 December 2022
DOI: https://doi.org/10.1186/s12998-022-00470-0
