- Systematic review
- Open Access
A literature review of clinical tests for lumbar instability in low back pain: validity and applicability in clinical practice
© Ferrari et al.; licensee BioMed Central. 2015
- Received: 26 June 2014
- Accepted: 22 February 2015
- Published: 8 April 2015
Several clinical tests have been proposed for low back pain (LBP), but their usefulness in detecting lumbar instability is not yet clear. The objective of this literature review was to investigate the clinical validity of the main clinical tests used for the diagnosis of lumbar instability in individuals with LBP and to verify their applicability in everyday clinical practice.
We searched for studies of the accuracy and/or reliability of the Prone Instability Test (PIT), Passive Lumbar Extension Test (PLE), Aberrant Movements Pattern (AMP), Posterior Shear Test (PST), Active Straight Leg Raise Test (ASLR) and Prone and Supine Bridge Tests (PB and SB) in the Medline, Embase, Cinahl, PubMed, and Scopus databases. Studies were considered eligible only if each test they examined had been investigated for both accuracy and reliability by at least one study. The quality of the studies was evaluated with the QUADAS and QAREL scales.
Six papers considering 333 LBP patients were included. The PLE was the most accurate and informative clinical test, with high sensitivity (0.84, 95% CI: 0.69 - 0.91) and high specificity (0.90, 95% CI: 0.85 - 0.97).
The diagnostic accuracy of AMP depends on each individual sign. The PIT and the PST demonstrated fair to moderate sensitivity and specificity [PIT sensitivity = 0.71 (95% CI: 0.51 - 0.83), PIT specificity = 0.57 (95% CI: 0.39 - 0.78); PST sensitivity = 0.50 (95% CI: 0.41 - 0.76), PST specificity = 0.48 (95% CI: 0.22 - 0.58)].
The PLE showed good reliability (k = 0.76), but this result comes from a single study. The inter-rater reliability of the PIT ranged from slight (k = 0.10 and 0.04) to good (k = 0.87).
The inter-rater reliability of the AMP ranged from slight (k = −0.07) to moderate (k = 0.64), whereas the inter-rater reliability of the PST was fair (k = 0.27).
The data from the studies provided information on the methods used and suggest that the PLE is the most appropriate test to detect lumbar instability in specific LBP. However, due to the lack of available papers on other lumbar conditions, these findings should be confirmed with studies on non-specific LBP patients.
- Joint instability
- Lumbar instability
- Low back pain
- Physical examination
- Reproducibility of results
- Prone instability test
- Passive lumbar extension test
- Aberrant movements pattern
- Posterior shear test
Low back pain (LBP) is a growing health problem in the industrialized world. Despite the high medical expenses required for its management, the prevalence of LBP is increasing. LBP is a heterogeneous condition, and the identification of different sub-groups could help management decisions [2,3]. One of these sub-groups is lumbar segmental instability [4,5].
Radiologically determined instability is characterized by a loss of passive integrity, causing excessive vertebral translation or rotation. Maximal lumbar flexion-extension radiographs in the standing position are considered a reference standard for assessing the function of the passive stabilization system [6,7]. This imaging method is commonly used to evaluate lumbar segmental mobility in isthmic and degenerative spondylolisthesis and degenerative disc dysfunctions. The radiographic diagnosis of spondylolisthesis is considered to be one of the most efficient methods of identifying lumbar instability.
Some authors extend the concept of instability to the so-called “clinical” or “functional” instability, in which neither a defect of the bony architecture of the lumbar spine nor excessive detectable translation or rotation is shown. In this case, poor trunk muscle function and/or insufficient motor control is believed to be a factor in abnormal inter-segmental movement and LBP [9-11]. Although this type of instability has not been sufficiently demonstrated as a clinical entity and cannot be measured by any gold standard, it is one of the most frequent fields of interest for chiropractors and manual therapists.
Clinicians have used several clinical tests to detect the spinal instability and/or the ability of the muscles to stabilize the lumbar spine . Recently, some of these tests have been suggested in the “Clinical Practice Guidelines linked to the International Classification of Functioning, Disability and Health from the Orthopaedic Section of the American Physical Therapy Association”, to assess the impairments of body functions in LBP . The most commonly used tests are the Prone Instability Test (PIT), the Passive Lumbar Extension (PLE) test, the Aberrant Movements Pattern (AMP), the Posterior Shear Test (PST), the Prone Bridge Test (PBT), the Supine Bridge Test (SBT), and the Active Straight Leg Raise Test.
Previous reviews separately investigated the diagnostic accuracy or the reliability of the instability tests, but a complete picture of their diagnostic validity in detecting lumbar instability is lacking. A single literature review on both the diagnostic accuracy (sensitivity, specificity and likelihood ratios) and the inter-rater reliability of these clinical tests does not exist. More specifically, a researcher could be interested in investigating the reliability of the tests that previously demonstrated sufficient face validity.
The objective of this literature review was to assess the diagnostic value of the clinical tests for lumbar instability in individuals with LBP (primarily their accuracy, with additional reporting of their reliability) and to investigate their applicability in daily practice.
This is a literature review of all the studies reporting the diagnostic value of the clinical tests for lumbar instability in individuals with LBP. PRISMA Guidelines were followed during the design, search and reporting stages of this review on diagnostic test studies.
A search of the relevant literature was performed from July 2012 to December 2013. A comprehensive search, limited to articles in English, Italian and Spanish, was conducted in the following databases: Medline, Embase, Cinahl, PubMed, Scopus. Diagnostic test studies regarding humans published between 1972 and December 2013 were included. Narrative or systematic reviews, guidelines and meta-analyses were excluded.
The results of these seven searches were unified into a single item set. From the results of the initial search, duplicate citations were removed, and then the titles, abstracts and full texts of retrieved articles were independently evaluated for definitive inclusion. When the two reviewers were unable to reach a consensus, a third reviewer (CV) was consulted. In addition to the Internet-assisted search, references were pulled from a textbook on the diagnostic accuracy of orthopedic clinical tests, and from the reference lists of included studies. Finally, an independent hand search including scanning of reference lists from other systematic reviews [13,14] was performed.
Diagnostic accuracy studies on adult population with sub-acute or chronic LBP were considered if clinical instability tests were employed as index tests. Dynamic radiographs were the reference test to diagnose lumbar instability. The subject articles had to report data which would allow computation of parametric statistical tests of diagnostic accuracy [sensitivity, specificity, or positive and negative likelihood ratios (+LR and -LR)].
Reliability studies on healthy or LBP adult population were considered if they concerned the use of clinical tests to diagnose lumbar instability by one or more clinicians. Articles had to report the parametric statistical tests of relationship or agreement.
Finally, studies were considered eligible only if each test they examined had been investigated for both accuracy and reliability by at least one study.
Data extraction and quality assessment
One author (TM) extracted data on the clinical tests (description and scoring), study population (e.g. age, gender, setting, clinical characteristics), inclusion and exclusion criteria, diagnostic reference standard, differences in operationalizing the index tests, and study raters. Study results on sensitivity, specificity, LR+, LR-, and reliability were collected (or calculated, if the included articles did not provide these data). Other authors (SF and FB) verified the data extraction once completed. The methodological quality of the included articles was independently assessed by 2 reviewers (TM and FB), using different tools for the 2 types of studies: the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool for diagnostic accuracy articles and the Quality Appraisal of Reliability Studies (QAREL) checklist for diagnostic reliability articles.
Data synthesis and analysis
QUADAS (Quality Assessment of Diagnostic Accuracy Study) tool results
Fritz et al. [ 24 ]
Kasai et al. [ 25 ]
Was the spectrum of patients representative of the patients who will receive the test in practice?
Were selection criteria clearly described?
Is the reference standard likely to correctly classify the target condition?
Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests?
Did the whole sample or a random selection of the sample, receive verification using a reference standard of diagnosis?
Did patients receive the same reference standard regardless of the index result?
Was the reference standard independent of the index (i.e. The index test did not form part of the reference standard)?
Was the execution of the index described in sufficient detail to permit replication of the test?
Was the execution of the reference standard described in sufficient detail to permit its replication?
Were the index test results interpreted without knowledge of the result of the reference standard?
Were the reference standard results interpreted without knowledge of the results of the index test?
Were the same clinical data available when test results were interpreted as would be available when the test is used in practice?
Were uninterpretable/intermediate test results reported?
Were withdrawals from the study explained?
QAREL application results
Hicks et al. [ 23 ]
Fritz et al. [ 24 ]
Schneider et al. [ 27 ]
Ravenna et al. [ 26 ]
Rabin et al. [ 12 ]
Was the test evaluated in a sample of subjects who were representative of those to whom the authors intended the results to be applied?
Was the test performed by raters who were representative of those to whom the authors intended the results to be applied?
Were raters blinded to the findings of the other raters during the study?
Were raters blinded to their own prior findings of the test under evaluation?
Were raters blinded to the results of the accepted reference standard or disease status for the target disorder (or variable) being evaluated?
Were raters blinded to clinical information that was not intended to be provided as part of the testing procedures or study design?
Were raters blinded to additional cues that were not part of the test?
Was the order of examination varied?
Was the stability (or theoretical stability) of the variable being measured taken into account when determining the suitability of the time-interval between repeated measures?
Was the test applied correctly and interpreted appropriately?
Were appropriate statistical measures of agreement used?
Concerning sensitivity and specificity, the acceptable levels were set between 50% (unacceptable test) and 100% (perfect test). The diagnostic accuracy was considered satisfactory, thus affecting the probability of lumbar instability, with +LR ≥ 2.0 or -LR ≤ 0.50.
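As an illustration of these thresholds, the following minimal Python sketch derives the likelihood ratios from sensitivity and specificity using the standard definitions (+LR = sensitivity / (1 − specificity), -LR = (1 − sensitivity) / specificity) and applies the cut-offs stated above. The function names are hypothetical and are not taken from the included studies; the input values are the point estimates reported in this review for the PLE test and the PST.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (+LR, -LR) from sensitivity and specificity given as proportions."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


def is_satisfactory(positive_lr: float, negative_lr: float) -> bool:
    """Apply the cut-offs used in this review: +LR >= 2.0 or -LR <= 0.50."""
    return positive_lr >= 2.0 or negative_lr <= 0.50


# Point estimates reported in this review (confidence intervals are ignored here).
for name, sens, spec in [("PLE", 0.84, 0.90), ("PST", 0.50, 0.48)]:
    plr, nlr = likelihood_ratios(sens, spec)
    print(f"{name}: +LR = {plr:.2f}, -LR = {nlr:.2f}, satisfactory = {is_satisfactory(plr, nlr)}")
```

The sketch works on point estimates only; a fuller analysis would also propagate the confidence intervals.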
Concerning reliability, the following criteria were used to determine the strength of the coefficients: ≤ 0.25 = little or no relationship; 0.26 – 0.50 = fair degree of relationship; 0.51 – 0.75 = moderate to good relationship; 0.76 – 1.00 = good to excellent relationship.
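The bands above can be expressed as a small lookup, sketched here in Python. The function name is hypothetical, and rounding the coefficient to two decimals before classification is an assumption made only because the bands are stated to two decimals (so that no value falls between 0.25 and 0.26).

```python
def kappa_strength(kappa: float) -> str:
    """Map a kappa coefficient to the strength bands used in this review."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    k = round(kappa, 2)  # assumption: classify on the two-decimal value
    if k <= 0.25:
        return "little or no relationship"
    if k <= 0.50:
        return "fair degree of relationship"
    if k <= 0.75:
        return "moderate to good relationship"
    return "good to excellent relationship"


# Coefficients reported by the included reliability studies.
for value in (-0.07, 0.27, 0.64, 0.76):
    print(value, "->", kappa_strength(value))
```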
Figure 1 shows the process of study selection. Initial searching identified 773 citations. Following the first screening, 299 articles were excluded and 474 citations were retained for the second screening. After reviewing the titles, 446 were excluded and 28 were considered of interest; after reviewing the abstracts, 16 were retained and 13 were retrieved in full text. Applying the inclusion and exclusion criteria, a further 7 articles were excluded. This study finally included 6 papers, considering 333 LBP patients, for the review [12,23-27].
Two articles of the 6 studies (33%) were identified as having high methodological rigor according to the QUADAS tool (Table 1). Table 2 shows the distribution of studies according to the scores obtained from the assessment of their methodological quality, following the QAREL tool.
Diagnostic accuracy of the tests
Summary of the studies on diagnostic accuracy
Fritz et al. [24]
Clinical tests, scores:
- Aberrant Movement Pattern (painful arc on flexion, painful arc on return, instability catch, Gower sign, reverse lumbopelvic rhythm). Positive test: at least 1 of the 5 signs is present.
- Prone Instability Test. Positive test: pain provoked during the first part of the test decreases when the test is repeated with the legs off the floor.
- Posterior Shear Test. Positive test: familiar symptoms are provoked.
Inclusion (I) and exclusion (E) criteria:
- I: LBP with or without referred pain in the lower extremities, < 60 yrs
- E: contraindications to radiographic assessment (e.g., current pregnancy), previous lumbar fusion surgery, inability (e.g., pain or muscle spasm) to actively flex and extend the spine adequately to permit an assessment of segmental motion
Study population:
- Age: 39.2 ± 11.3 yrs
- Duration of symptoms (median): 78 days
- Distribution of symptoms: back/buttock only 63.3%, symptoms distal to the knee 30.6%
- Previous history of LBP: 83.7%
- LBP episodes becoming more frequent: 30.6%
Reference standard and positive criteria:
- Dynamic X-ray: the patient stands at the edge of a tall stool with feet flat on the floor and arms folded across the chest. The patient is instructed to flex forward as far as possible for the flexion X-ray. For the extension X-ray, the patient stands with arms folded and is asked to extend as far as possible.
- Criteria for instability: sagittal plane translation greater than 4.5 mm or greater than 15% of the vertebral body width, or sagittal plane rotation greater than 15° at the L1/L2, L2/L3 and L3/L4 levels, greater than 20° at L4/L5, or greater than 25° at L5/S1.
- Instability diagnosis: 2 segments with either rotational or translational instability, or 1 segment with both translational and rotational instability.
Raters:
- 1 physical therapist
Kasai et al. [25]
Clinical tests, scores:
- Passive lumbar extension test: with the subject in the prone position, both lower extremities were elevated concurrently to a height of about 30 cm from the bed while maintaining the knees extended and gently pulling the legs. The test was positive when the subject complained of strong pain in the lumbar region (“low back pain”, “very heavy feeling on the low back”, “feeling as if the low back was coming off”) during elevation of both lower legs, and such pain disappeared when the legs returned to the initial position. In contrast, when the subject complained of an abnormal sensation (mild numbness or prickling sensation), the test was negative.
- Instability catch sign: the subject was asked to bend forward as much as possible and then return to the erect position; a subject who was not able to return to the erect position because of sudden low back pain was judged positive.
Inclusion (I) and exclusion (E) criteria:
- I: lumbar degenerative diseases
Study population:
- 122 subjects with lumbar degenerative diseases: 89 lumbar spinal canal stenosis; 21 lumbar spondylolisthesis; 12 lumbar degenerative scoliosis
- 39 ± 8.8 yrs
- Mean illness duration: 11.2 months
- Complaints of pain: 70.5% lumbago, 60.7% intermittent claudication, 42.6% neurological symptoms in the lower legs
Reference standard and positive criteria:
- Dynamic X-ray: flexion-extension films of the lumbar spine, lateral view.
- 3 criteria to assess radiological instability: angular motion > 20°; translational motion > 5 mm; cutoff value of -5° for the intervertebral endplate angle on the flexion film.
- Radiographic instability: positive for 1 or more of the 3 criteria.
Raters:
- 2 examiners for the PLE test (12 and 15 yrs of clinical experience)
- 1 examiner for the instability catch sign (20 yrs of clinical experience)
- For radiographic evaluation: 2 orthopedists (8 and 14 yrs of clinical experience)
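The radiographic decision rule reported for Fritz et al. in the summary above can be sketched in Python as follows. This is an illustrative reading of the published thresholds, not the authors' own implementation: the class and function names are hypothetical, "2 segments" is assumed to mean "at least 2 segments", and the measurements in the usage example are invented for demonstration.

```python
from dataclasses import dataclass

# Rotation thresholds (degrees) per lumbar level, as listed in the summary above.
ROTATION_LIMIT_DEG = {
    "L1/L2": 15.0, "L2/L3": 15.0, "L3/L4": 15.0, "L4/L5": 20.0, "L5/S1": 25.0,
}


@dataclass
class Segment:
    level: str                    # e.g. "L4/L5"
    translation_mm: float         # sagittal plane translation
    translation_pct_width: float  # translation as % of vertebral body width
    rotation_deg: float           # sagittal plane rotation


def has_translational_instability(seg: Segment) -> bool:
    return seg.translation_mm > 4.5 or seg.translation_pct_width > 15.0


def has_rotational_instability(seg: Segment) -> bool:
    return seg.rotation_deg > ROTATION_LIMIT_DEG[seg.level]


def is_radiographically_unstable(segments: list[Segment]) -> bool:
    """Assumed reading: >= 2 segments with either type, or >= 1 segment with both."""
    either = sum(
        has_translational_instability(s) or has_rotational_instability(s)
        for s in segments
    )
    both = sum(
        has_translational_instability(s) and has_rotational_instability(s)
        for s in segments
    )
    return either >= 2 or both >= 1


# Hypothetical measurements, for illustration only.
patient = [
    Segment("L3/L4", translation_mm=2.0, translation_pct_width=6.0, rotation_deg=16.0),
    Segment("L5/S1", translation_mm=3.0, translation_pct_width=16.0, rotation_deg=10.0),
]
print(is_radiographically_unstable(patient))
```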
Results of diagnostic accuracy studies
+LR (95% CI)
-LR (95% CI)
Kasai et al. [25] found that the PLE test was the most accurate clinical test, with high sensitivity (0.84, 95% CI: 0.7 - 0.93) and specificity (0.90, 95% CI: 0.82 - 0.95), in a sample of subjects diagnosed with spinal stenosis, lumbar spondylolisthesis or lumbar degenerative scoliosis. The positive and negative LRs were informative.
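To show what "informative" likelihood ratios mean in practice, the sketch below converts a pre-test probability into a post-test probability through odds, which is the standard use of a likelihood ratio. The function name is hypothetical, and the 30% pre-test probability is an assumption chosen only for illustration; it is not a prevalence reported by the included studies. The likelihood ratios are derived from the PLE point estimates given above.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# PLE point estimates from this review.
sensitivity, specificity = 0.84, 0.90
positive_lr = sensitivity / (1.0 - specificity)
negative_lr = (1.0 - sensitivity) / specificity

pre_test = 0.30  # hypothetical pre-test probability, for illustration only
print(f"after a positive PLE: {post_test_probability(pre_test, positive_lr):.2f}")
print(f"after a negative PLE: {post_test_probability(pre_test, negative_lr):.2f}")
```

Under this assumed pre-test probability, a positive result raises the probability of instability substantially and a negative result lowers it, which is the sense in which both ratios shift clinical decisions.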
The diagnostic accuracy of AMP depends on each individual sign. Low sensitivity (0.26, 95% CI: 0.15 - 0.42) and good specificity (0.86, 95% CI: 0.77 - 0.92) were found by Kasai et al. [25] for the Instability Catch Sign. The Painful Catch Sign and the Apprehension Sign showed the same trend, low sensitivity (0.37, 95% CI: 0.24 - 0.54 and 0.18, 95% CI: 0.22 - 0.64 respectively) and good specificity (0.73, 95% CI: 0.61 - 0.8 and 0.88, 95% CI: 0.61 - 0.78 respectively). These tests are included in the AMP, also studied by Fritz et al. [24], who reported low sensitivity (0.18, 95% CI: 0.08 - 0.36) and high specificity (0.95, 95% CI: 0.77 - 0.99) for the AMP test in a cohort of patients with chronic LBP.
The article by Fritz et al. [24] is the only one that studied the diagnostic accuracy of the PIT and the PST. Both tests demonstrated fair to moderate diagnostic accuracy. PIT sensitivity = 0.71 (95% CI: 0.53 - 0.85); specificity = 0.57 (95% CI: 0.37 - 0.76); PST sensitivity = 0.50 (95% CI: 0.34 - 0.66); specificity = 0.48 (95% CI: 0.28 - 0.68).
Reliability of the tests
Summary of the articles on reliability
Hicks et al. [23]
Clinical test and scores:
- Painful arc in flexion
- Painful arc on return
- Instability catch
- Gower sign (“thigh climbing”)
- Reversal of lumbopelvic rhythm
- Aberrant Movement Pattern: positive if at least one of the five previously cited signs is present.
- Prone Instability Test. Positive test: pain provoked during the first part of the test disappears when the test is repeated with the legs off the floor.
Inclusion (I) and exclusion (E) criteria:
- I: current complaints of LBP.
- E: symptoms referred below the knee, LBP which may be attributed to current pregnancy, fractures in acute phase, tumor, infection, previous lumbar surgical fusion.
Study population:
- Age: 36.0 ± 10.3
- Gender: 38♀, 25♂
- Previous LBP episodes: 51/63
Procedure:
- For each pair of raters, the first rater performs all clinical examination measures on each subject; the second rater, who is blinded to the results of the first evaluation, then performs the same examination procedures after a minimum of 15 minutes.
Raters:
- PT1: PT and chiropractor with 3 yrs of experience as a chiropractor and 2 yrs as an OMT
- PT2: PT with 6 yrs of experience in an orthopedic setting
- PT3: OMT with 8 yrs of experience
- PT4: PT with 4 yrs of experience in an orthopedic setting
- 3 pairs of raters: PT1 + PT2, PT2 + PT3, PT1 + PT4
Fritz et al. [24]
Clinical test and scores:
- Aberrant Movement Pattern: painful arc on flexion; painful arc on return; instability catch; Gower sign (“thigh climbing”); reverse lumbopelvic rhythm. Positive test when at least 1 of the previous 5 signs was present.
- Prone Instability Test. Positive test when pain provoked during the first part of the test decreases when the test is repeated with the legs off the floor.
- Posterior Shear Test. Positive test if familiar symptoms are provoked.
Inclusion (I) and exclusion (E) criteria:
- I: complaint of LBP with or without radiation into the lower extremities, < 60 yrs
- E: contraindications to radiographic assessment (e.g., current pregnancy), previous lumbar fusion surgery, inability (e.g. pain or muscle spasm) to actively flex and extend the spine adequately to permit an assessment of segmental motion.
Study population:
- 38 patients drawn from a sample of 49 patients with these characteristics:
- Age: 39.2 ± 11.3 yrs
- Duration of symptoms (median): 78 days
- Distribution of symptoms: back/buttock only 63.3%, symptoms distal to the knee 30.6%
- Previous history of LBP: 83.7%
- LBP episodes becoming more frequent: 30.6%
Procedure:
- The second rater repeats the assessment 5 minutes after the first rater’s assessment.
Raters:
- 2 physical therapists
Schneider et al. [27]
Clinical test and scores:
- Prone Instability Test. Positive test when pain provoked during the first part of the test disappears when the test is repeated with the legs off the floor.
Inclusion (I) and exclusion (E) criteria:
- I: history of LBP, age between 18 and 65 years, ability to tolerate lying prone
- E: history of prior lumbar surgery, stenosis, scoliosis greater than 20°, unstable spondylolisthesis, positive nerve root tension or radiculopathy, any red flags suggestive of spinal pathology.
Study population:
- 39 volunteer patients with a history of LBP and undergoing chiropractic treatment at the time of their enrollment in the study
Raters:
- 2 experienced doctors of chiropractic (25 and 10 years of clinical experience, respectively)
Ravenna et al. [26]
Clinical test and scores:
- Prone Instability Test with additional guidelines:
→ A trunk stabilizing belt is placed around the subject and the table at shoulder level.
→ A stool may be placed under the subject’s feet if the feet do not comfortably reach the floor.
- Positive and negative criteria:
● Positive level if the subject reports a decrease of pain with the second P/A, lifting the legs in the second part of the test;
● Negative test if the subject reports superficial bone-on-bone pressure;
● Negative test if the subject reports an increase in symptoms lifting the legs during the second part of the test;
● Negative level if the subject reports an increase or the same pain with the second P/A, compared with the first.
Inclusion (I) and exclusion (E) criteria:
- I: chronic or recurrent LBP; age 18 to 60 years; current symptoms of LBP, but not in the acute phase.
- E: BMI > 30 kg/m2, disk herniation, symptoms referred below the knee, lower extremity weakness or loss of reflexes, history of spinal surgery or fracture, spinal deformity, systemic inflammatory condition, neurologic disease or other serious medical conditions, LBP attributable to pregnancy or a primary hip problem.
Study population:
- N. 30
- Age: 36.1 ± 11.8 yrs
- Men: 56.7%
- Diagnosis: degenerative disk disease 16.6%, disk problem 10%, LBP 73.4%
- Previous LBP episodes: 83.0%
- Current VAS (0–10): 2.8 ± 1.6
Procedure:
- Inter-rater reliability for PIT examined under 2 conditions:
● PIT with additional guidelines
● PIT without additional guidelines
Raters:
- 2 examiners: a second-year physical therapy student and a licensed physical therapist with 2 years of clinical experience in outpatient orthopedic physical therapy
Rabin et al. [12]
Clinical test and scores:
- Aberrant Movement Pattern: painful arc on flexion; painful arc on return; instability catch; Gower sign (“thigh climbing”); reverse lumbopelvic rhythm. Positive test when at least one of the five signs is present.
- Prone Instability Test. Positive when pain elicited during the first part of the test is relieved or abolished during the second part.
- Passive Lumbar Extension Test. Positive if LBP is elicited.
Inclusion (I) and exclusion (E) criteria:
- I: age between 18 and 60 years, main complaint of LBP and/or related leg symptoms (i.e., pain, paresthesia)
- E: pregnancy; history suggesting a non-mechanical origin of symptoms (e.g., malignancy, inflammatory conditions), LBP due to a fracture, osteoporosis, regular use of corticosteroids, rheumatoid arthritis, presence of 2 or more signs suggesting lumbar nerve root compression.
Study population:
- 30 consecutive patients with LBP of any duration, with or without associated leg symptoms
- Age: 33.5 ± 8.0 yrs
- Gender: 15♀, 15♂
- Duration of symptoms: 164.4 ± 321.8 days
- Previous LBP episodes: 20 subjects
Procedure:
- AMP was assessed by the two raters simultaneously; PIT and PLE were assessed by the two raters separately (second assessment 5 minutes after the first one).
Raters:
- 4 physical therapists, with experience ranging from 13 to 25 yrs.
- One rater with a postprofessional master’s degree (contributed to rating all subjects).
- The other raters, with a bachelor’s degree in physical therapy, contributed to rating 23, 4, and 3 subjects, respectively.
Summary of results on reliability
Hicks et al. [23]
- Aberrant Movement Pattern: k = 0.60 (95% CI: 0.44; 0.73)
- Prone Instability Test: k = 0.87 (95% CI: 0.80; 0.94)
Fritz et al. [24]
- Aberrant Movement Pattern: k = −0.07 (95% CI: −0.45; 0.31)
- Prone Instability Test: k = 0.69 (95% CI: 0.59; 0.79)
- Posterior Shear Test: k = 0.27 (95% CI: 0.14; 0.41)
Schneider et al. [27]
- Prone Instability Test: k = 0.46 (95% CI: 0.15; 0.77); weighted k = 0.58
Ravenna et al. [26]
- Prone Instability Test with additional guidelines: k = 0.10 (95% CI: −0.27; 0.47); weighted k = 0.27 (95% CI: −0.08; 0.61)
- Prone Instability Test without additional guidelines: k = 0.04 (95% CI: −0.34; 0.42); weighted k = 0.47 (95% CI: 0.15; 0.78)
Rabin et al. [12]
- Aberrant Movement Pattern: k = 0.64 (95% CI: 0.32; 0.90)
- Prone Instability Test: k = 0.67 (95% CI: 0.29; 1.00)
- Passive Lumbar Extension test: k = 0.76 (95% CI: 0.46; 1.00)
- Active Straight Leg Raising: k = 0.53 (95% CI: 0.2; 0.84)
The PLE test showed the best reliability, although this result comes from a single study [12]. Its inter-rater reliability was good (k = 0.76).
Five studies investigated the inter-rater reliability of the PIT. This reliability was fair in Schneider et al. [27] (k = 0.46), slight in Ravenna et al. [26] (k = 0.10 and 0.04), moderate in Fritz et al. [24] and Rabin et al. [12] (k = 0.69 and k = 0.67, respectively), and good in Hicks et al. [23] (k = 0.87).
The inter-rater reliability of the AMP was studied by Hicks et al. [23], Fritz et al. [24] and Rabin et al. [12]. Whereas Fritz et al. [24] found poor reproducibility (k = −0.07), Hicks et al. [23] (k = 0.60) and Rabin et al. [12] (k = 0.64) calculated moderate reliability. The inter-rater reliability of the Posterior Shear Test was studied only by Fritz et al. [24], who found fair reliability (k = 0.27).
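For readers less familiar with the k statistic reported above, the following Python sketch computes unweighted Cohen's kappa for two raters scoring a test as positive or negative, as k = (observed agreement − chance agreement) / (1 − chance agreement). The function name is hypothetical and the counts in the example are invented for illustration; they do not come from any of the included studies.

```python
def cohen_kappa(both_positive: int, a_pos_b_neg: int,
                a_neg_b_pos: int, both_negative: int) -> float:
    """Unweighted Cohen's kappa from a 2x2 agreement table of two raters."""
    n = both_positive + a_pos_b_neg + a_neg_b_pos + both_negative
    if n == 0:
        raise ValueError("the agreement table is empty")
    observed = (both_positive + both_negative) / n
    rater_a_positive = (both_positive + a_pos_b_neg) / n
    rater_b_positive = (both_positive + a_neg_b_pos) / n
    # Agreement expected by chance if the two raters scored independently.
    expected = (rater_a_positive * rater_b_positive
                + (1.0 - rater_a_positive) * (1.0 - rater_b_positive))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for 30 subjects, for illustration only.
kappa = cohen_kappa(both_positive=12, a_pos_b_neg=3, a_neg_b_pos=2, both_negative=13)
print(f"kappa = {kappa:.2f}")
```

With samples of this size, moving a few subjects between cells changes the coefficient noticeably, which is consistent with the wide confidence intervals reported by the smaller studies.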
Implications for clinical practice
The data from the studies provided information on the tests and methods used, the error of measurement and also the validity of the tests. However, only 5 studies (83.3%) provided information concerning the setting and the years of the raters’ clinical experience, whereas all studies identified the person performing the assessment and his/her professional competence.
This literature review aimed to identify the most reliable findings on the diagnostic value of the clinical tests for lumbar instability in LBP subjects.
Lumbar instability has long been a subject of debate. Lumbar segmental instability in the absence of defects of the bony architecture of the lumbar spine has also been cited as a significant cause of chronic low back pain [5,28]. The differences between surgical instability criteria and “functional instability” criteria were defined by Panjabi decades ago. Chiropractors and manual therapists are more interested in the loss of motor control than in the hypermobility detectable with flexion/extension radiological imaging, which is more useful to spine surgeons. However, the difficulty of clinically detecting abnormal or excessive inter-segmental motion often makes these tests insensitive and unreliable, and this limits the clinical diagnosis of lumbar segmental instability [30,31]. The lack of studies in this field also emerges from our research, which found many studies on the reliability of the tests used by clinicians but few on their accuracy. Although we are aware that this criterion may be too strict for manual therapists, we chose to be rigorous and therefore took as our reference the best available reference standard (gold standard) for instability, namely dynamic X-rays. The result is that many other tests used in manual clinical practice to detect lumbar clinical instability (e.g. the active hip abduction test or the hip extension test) have not been considered, because no study had investigated their accuracy. These tests are not present in this review, so that, in the final analysis, our study could be considered a literature review of the accuracy of lumbar clinical tests with additional reporting of reliability information.
Six high-quality studies were selected and four lumbar clinical instability tests (PLE test, PIT, AMP and PST) satisfied the inclusion criteria.
The characteristics of the samples of the 2 subject studies [24,25] cannot be considered accurate. Fritz et al. [24] studied a population whose majority had a prior history of LBP, and in which only 30.6% (n = 15) of people complained of symptoms distal to the knee. Kasai et al. [25], however, investigated a population with specific lumbar conditions (lumbar spinal canal stenosis, lumbar spondylolisthesis or lumbar scoliosis), most of whom had intermittent claudication, and 42.6% (n = 52) of whom had neurological leg symptoms.
The PLE test was the most accurate and informative test, even though it was measured by only one study, in patients affected by lumbar degenerative diseases. Although the PLE test appears to be a potentially effective clinical test to detect lumbar instability, the characteristics of the investigated sample and the presence of only one study on its diagnostic accuracy suggest the need for studies on non-specific LBP patients.
The PIT demonstrated low to moderate sensitivity and specificity [24], indicating that this test has limited accuracy in diagnosing lumbar instability in patients with LBP.
The PST showed relatively poor sensitivity and specificity [24], indicating that this test is less accurate than the PLE test and the PIT in detecting lumbar instability.
The Instability Catch Sign, the Painful Catch Sign and the Apprehension Sign are three of the five signs included in the AMP investigated by Fritz et al. . The relatively low sensitivity and high specificity resulting from the study of Kasai et al.  suggest caution in the use of these tests to diagnose lumbar instability. According to Hicks et al. , these 5 tests should be used together, as a complete observation of the trunk movement and the 5 signs could be considered as only one comprehensive test. However, positive results on AMP and PIT, which demonstrated moderate sensitivity and specificity, were considered predictive for a favorable response to stabilization exercises .
The characteristics of the samples were not always well explained or were not reliable. The PLE test and the PIT [12,23,24] demonstrated good inter-rater reliability. The reliability of the PLE test is evident in younger subjects referred to outpatient physical therapy. Five studies on the PIT demonstrated very different inter-rater reliability scores. Nevertheless, the 2 studies showing fair reliability [26,27] are affected by possible bias: in the first case due to a very limited sample size, and in the second case due to procedural and methodological weaknesses such as the involvement of novice raters and the use of a modified test. The main statistical problem was the small sample sizes, which could invalidate the k score. Although the other 4 studies adopting the PIT closely followed its original description, some differences in the positivity criteria were found. Hicks et al. [23] and Schneider et al. [27] judged the test positive when the pain disappeared in the second part of the test, Fritz et al. [24] when the pain decreased, whilst for Rabin et al. [12] the pain had to be either relieved or abolished.
After excluding the two studies with the main methodological weaknesses, the reliability of the PIT appeared moderate to good.
The AMP reliability was investigated in three studies [12,23,24], but their results were not similar and ranged from insufficient reliability [24] to moderate reliability [12,23]. The PST was investigated by only one study and scored the lowest reliability [24], which is insufficient to recommend its use.
Implications for clinical practice
An initial inspection of the articles suggests that the information derived from the studies provides a useful picture of the items that contribute to the definition of “applicability in rehabilitation practice”. Sufficient information was provided on the execution of the tests, whereas little information was given on their duration and on the time needed to process the data. Considering that a standard manual therapy session normally lasts 30 minutes in clinical practice, clinicians may lack the time to repeat a whole series of tests proposed in the literature. The attempt to identify methods for evaluating lumbar instability in patients with LBP allowed us to select some tests that are suitable for everyday clinical practice: the time needed to administer them and to process the data is compatible with both clinical practice and research purposes. Starting from the same keywords used for the literature search, 4 clinical tests (PIT, PLE, AMP and PST) investigated by 2 studies [24,25] met the criteria of applicability in clinical practice.
The main limitation of this review is the small number of articles found on any single test. Only 2 studies concerned diagnostic accuracy, while the results of the studies investigating reliability are limited by statistical or methodological weaknesses. For example, the conclusions of Ravenna et al. should be interpreted cautiously, also because of some significant modifications made to standardize the PIT, such as the different hip and knee positions and the use of a scapular stabilization belt and a stool for foot placement.
The average age and the characteristics of the spinal dysfunctions of the samples were not homogeneous in the different studies, thus reducing the external validity of the results. Another limitation of this review concerns the insufficient homogeneity regarding the execution and interpretation of the tests. As already mentioned, a lack of standardization of a test affects comparative analyses among different studies and the implementation of that test in clinical practice.
The current state of the art of clinical tests for lumbar instability includes 6 studies of 333 patients and 4 clinical tests. Our data suggest that the PLE test is the most suitable test for detecting lumbar instability, thanks to its excellent diagnostic accuracy and good reliability. Further studies on the diagnostic properties of the PLE test in detecting lumbar instability among different populations with LBP are suggested.
More than 20 years after the importance of diagnostic clinical tests for lumbar instability in individuals with LBP was first defined, clinicians can use some tests showing encouraging results in terms of accuracy and reliability. Nevertheless, their application in daily practice might be limited by insufficient research and evidence on their performance. Future research should compare different assessment methods within the same study and on the same sample, in order to evaluate their reliability and validity.
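As an illustration of what the reported accuracy of the PLE test may mean at the bedside, the sketch below converts the point estimates given in the abstract (sensitivity 0.84, specificity 0.90) into likelihood ratios and post-test probabilities. The function names and the 30% pre-test probability are assumptions chosen only for the example. They are not values from the included studies, and the confidence intervals around the estimates are ignored.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Point estimates for the PLE test as reported in the abstract
ple_lr_pos, ple_lr_neg = likelihood_ratios(0.84, 0.90)

# Hypothetical pre-test probability of lumbar instability (assumption)
assumed_pre_test = 0.30
after_positive = post_test_probability(assumed_pre_test, ple_lr_pos)
after_negative = post_test_probability(assumed_pre_test, ple_lr_neg)
```

With these inputs the positive likelihood ratio is about 8.4 and the negative likelihood ratio about 0.18. Under the assumed 30% pre-test probability, a positive PLE raises the probability of instability to roughly 78%, and a negative PLE lowers it to roughly 7%.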
We would like to thank Anna Trevisan for assisting with the literature search, Giovanni Gobitti for helping us with the statistical analysis, and Fabio Cassola and Paola D’Ovidio for their help with the language review.
- Martin BI, Deyo RA, Mirza SK, Turner JA, Comstock BA, Hollingworth W, et al. Expenditures and health status among adults with back and neck problems. JAMA. 2008;299(6):656–64. doi:10.1001/jama.299.6.656.
- Childs JD, Fritz JM, Piva SR, Erhard RE. Clinical decision making in the identification of patients likely to benefit from spinal manipulation: a traditional versus an evidence-based approach. J Orthop Sports Phys Ther. 2003;33(5):259–72.
- Hall H, McIntosh G, Boyle C. Effectiveness of a low back pain classification system. Spine J. 2009;9(8):648–57. doi:10.1016/j.spinee.2009.04.017.
- Abbott JH, McCane B, Herbison P, Moginie G, Chapple C, Hogarty T. Lumbar segmental instability: a criterion-related validity study of manual therapy assessment. BMC Musculoskelet Disord. 2005;6:56.
- Delitto A, George SZ, Van Dillen LR, Whitman JM, Sowa G, Shekelle P, et al. Low back pain. J Orthop Sports Phys Ther. 2012;42(4):A1–57. doi:10.2519/jospt.2012.0301.
- Dupuis PR, Yong-Hing K, Cassidy JD, Kirkaldy-Willis WH. Radiologic diagnosis of degenerative lumbar spinal instability. Spine. 1985;10(3):262–76.
- Nizard RS, Wybier M, Laredo JD. Radiologic assessment of lumbar intervertebral instability and degenerative spondylolisthesis. Radiol Clin North Am. 2001;39(1):55–71. 1.
- O’Sullivan PB, Phyty GD, Twomey LT, Allison GT. Evaluation of specific stabilizing exercise in the treatment of chronic low back pain with radiologic diagnosis of spondylolysis or spondylolisthesis. Spine. 1997;22(24):2959–67.
- Hodges PW, Moseley GL. Pain and motor control of the lumbopelvic region: effect and possible mechanisms. J Electromyogr Kinesiol. 2003;13(4):361–70.
- Lee P, Helewa A, Goldsmith CH, Smythe HA, Stitt LW. Low back pain: prevalence and risk factors in an industrial setting. J Rheumatol. 2001;28(2):346–51.
- Macedo LG, Latimer J, Maher CG, Hodges PW, Nicholas M, Tonkin L, et al. Motor control or graded activity exercises for chronic low back pain? A randomised controlled trial. BMC Musculoskelet Disord. 2008;9:65. doi:10.1186/1471-2474-9-65.
- Rabin A, Shashua A, Pizem K, Dar G. The interrater reliability of physical examination tests that may predict the outcome or suggest the need for lumbar stabilization exercises. J Orthop Sports Phys Ther. 2013;43(2):83–90. doi:10.2519/jospt.2013.4310.
- Alqarni AM, Schneiders AG, Hendrick PA. Clinical tests to diagnose lumbar segmental instability: a systematic review. J Orthop Sports Phys Ther. 2011;41(3):130–40. doi:10.2519/jospt.2011.3457.
- May S, Littlewood C, Bishop A. Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother. 2006;52(2):91–102.
- Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int J Surg. 2010;8(5):336–41. doi:10.1016/j.ijsu.2010.02.007.
- Cleland JA. Orthopaedic Clinical Examination: An Evidence-Based Approach for Physical Therapists. Carlstadt, NJ: Icon Learning Systems; 2005. p. 516. ISBN 1-929007-87-6.
- Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25.
- Lucas NP, Macaskill P, Irwig L, Bogduk N. The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). J Clin Epidemiol. 2010;63(8):854–61. doi:10.1016/j.jclinepi.2009.10.002.
- Villafañe JH, Zanetti L, Isgrò M, Cleland JA, Bertozzi L, Gobbo M, et al. Methods for the assessment of neuromotor capacity in non-specific low back pain: Validity and applicability in everyday clinical practice. J Back Musculoskelet Rehabil. 2014. In Press.
- van der Wurff P, Meyne W, Hagmeijer RH. Clinical tests of the sacroiliac joint. Man Ther. 2000;5(2):89–96.
- Vanti C, Bonfiglioli R, Calabrese M, Marinelli F, Guccione A, Violante FS, et al. Upper Limb Neurodynamic Test 1 and symptoms reproduction in carpal tunnel syndrome. A validity study. Man Ther. 2011;16(3):258–63. doi:10.1016/j.math.2010.11.003.
- Jewell D. Guide to Evidence-Based Physical Therapist Practice. 2nd ed. Sudbury MA: Jones & Bartlett Learning; 2011. p. 230.
- Hicks GE, Fritz JM, Delitto A, Mishock J. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003;84(12):1858–64.
- Fritz JM, Whitman JM, Childs JD. Lumbar spine segmental mobility assessment: an examination of validity for determining intervention strategies in patients with low back pain. Arch Phys Med Rehabil. 2005;86(9):1745–52.
- Kasai Y, Morishita K, Kawakita E, Kondo T, Uchida A. A new evaluation method for lumbar spinal instability: passive lumbar extension test. Phys Ther. 2006;86(12):1661–7.
- Ravenna MM, Hoffman SL, Van Dillen LR. Low interrater reliability of examiners performing the prone instability test: a clinical test for lumbar shear instability. Arch Phys Med Rehabil. 2011;92(6):913–9.
- Schneider M, Erhard R, Brach J, Tellin W, Imbarlina F, Delitto A. Spinal palpation for lumbar segmental mobility and pain provocation: an interexaminer reliability study. J Manipulative Physiol Ther. 2008;31(6):465–73. doi:10.1016/j.jmpt.2008.06.004.
- Long DM, BenDebba M, Torgerson WS, Boyd RJ, Dawson EG, Hardy RW, et al. Persistent back pain and sciatica in the United States: patient characteristics. J Spinal Disord. 1996;9(1):40–58.
- Panjabi MM. The stabilizing system of the spine. Part II. Neutral zone and instability hypothesis. J Spinal Disord. 1992;5(4):390–6. discussion 7.
- Dvorak J, Panjabi MM, Novotny JE, Chang DG, Grob D. Clinical validation of functional flexion-extension roentgenograms of the lumbar spine. Spine. 1991;16(8):943–50.
- Pope MH, Frymoyer JW, Krag MH. Diagnosing instability. Clin Orthop Relat Res. 1992;279:60–7.
- Hicks GE, Fritz JM, Delitto A, McGill SM. Preliminary development of a clinical prediction rule for determining which patients with low back pain will respond to a stabilization exercise program. Arch Phys Med Rehabil. 2005;86(9):1753–62.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.