Diagnostic accuracy of diagnostic imaging for lumbar disc herniation in adults with low back pain or sciatica is unknown; a systematic review

Main text We aimed to summarize the available evidence on the diagnostic accuracy of imaging (index test) compared to surgery (reference test) for identifying lumbar disc herniation (LDH) in adult patients. For this systematic review we searched MEDLINE, EMBASE and CINAHL (June 2017) for studies that assessed the diagnostic accuracy of imaging for LDH in adult patients with low back pain, with surgery as the reference standard. Two review authors independently selected studies, extracted data and assessed risk of bias. We calculated summary estimates of sensitivity and specificity using bivariate analysis, generated linked ROC plots where diagnostic imaging tests had been compared directly, and assessed the quality of evidence using the GRADE approach. We found 14 studies, all but one done before 1995, including 940 patients. Nine studies investigated Computed Tomography (CT), eight myelography and six Magnetic Resonance Imaging (MRI). The prior probability of LDH varied from 48.6 to 98.7%. The summary estimates for MRI and myelography were comparable with those for CT (sensitivity: 81.3% (95%CI 72.3–87.7%) and specificity: 77.1% (95%CI 61.9–87.5%)). The quality of evidence was moderate to very low. Conclusions The diagnostic accuracy of today's CT, myelography and MRI is unknown, as we found no studies evaluating today’s more advanced imaging techniques. For the older techniques we found moderate diagnostic accuracy for CT, myelography and MRI, indicating a large proportion of false positives and false negatives.


Introduction
Approximately 5-15% of patients with low back pain suffer from lumbar disc herniation (LDH) [1,2]. LDH is the most common spine disorder requiring surgical intervention [3,4]. Clinical guidelines recommend history taking and physical examination to rule out LDH diagnosis [4]. However, the diagnostic accuracy of both history taking and physical examination is still insufficient [5,6]. Diagnostic imaging in patients with back pain and/or leg pain is often used to assess nerve root compression due to disc herniation or spinal stenosis and cauda equina syndrome [7][8][9][10]. Furthermore, diagnostic imaging can also be used to identify the affected disc level before surgery [11].
Diagnostic imaging can be done by Magnetic Resonance Imaging (MRI), Computed Tomography (CT), X-ray and myelography. Currently MRI is the imaging modality of choice, as it does not use ionising radiation and visualizes soft tissue particularly well [9,12]. CT is widely used and available for detection of morphologic changes and has a well-recognized role in the diagnosis of herniated discs [13,14]. Compared to MRI, CT is cheaper, the total testing time is shorter, and CT scanners are more widely available in hospital settings, but CT has the drawback of exposure to ionising radiation. Myelography involves injection of contrast medium in the lumbar spine, followed by X-ray, CT or MRI projections [15]. For certain conditions (e.g. metal implants or malalignment of the spine) myelography might replace MRI as the imaging modality of choice [16]. Plain radiography (X-ray) is the most commonly used technique due to its relatively low cost and ready availability [9,17,18,19].
However, the evidence for the diagnostic accuracy of diagnostic imaging for LDH is still unclear [20,21]. In addition, discordance between patients' clinical findings and MRI findings has been reported [22,23]. We have performed a large study evaluating the evidence on the diagnostic accuracy of MRI and CT for all kinds of lumbar pathologies compared to various reference standards [12,24]. The aim of the current review is to summarize and compare, more specifically, the evidence on the diagnostic accuracy of diagnostic imaging (CT, X-ray, myelography and MRI) for identifying LDH in patients with low back pain and/or leg pain, with surgery as the reference standard.

Design
We conducted a systematic review and meta-analysis according to the guidelines of the Cochrane handbook of systematic reviews of diagnostic test accuracy studies [25]. The protocol was registered in PROSPERO (2015:CRD42015027687).

Search strategy
We conducted the search in MEDLINE, EMBASE, and CINAHL (until 1 June 2017) without language restriction (see Appendix 1). The search strategy was designed in collaboration with a medical information specialist. In addition, reference lists of relevant review articles and of all retrieved relevant publications on diagnostic test accuracy were checked to identify any potentially missed articles.

Study selection
We applied the following selection criteria: a) both prospective and retrospective cohort and case-control studies; b) adults with low back and/or leg pain with lumbar disc herniation as the suspected underlying pathology; c) index tests were MRI, X-ray, myelography or CT; d) reference standard was surgery; e) data to generate a 2 × 2 table were available; f) published full reports, preferably in English, Dutch or German.
We defined LDH as herniated nucleus pulposus, including protruded, extruded or sequestrated disc, causing nerve root compression. Two of the review authors (RvR/RO/BK/JHK/MB) independently screened titles and abstracts first and then assessed relevant full papers. We used consensus to resolve disagreements; in cases of persisting disagreement a third review author (AV) was consulted.

Risk of bias assessment
Pairs of review authors (MvT/BK/RvR/JHK) independently performed risk of bias assessment using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 tool [26]. In the flow and timing domain, we considered a time period between index test and reference standard of 1 week or less appropriate. Risk of bias and concerns about applicability of each domain were classified as low, high or unclear risk. Consensus was reached by discussion of discrepancies between the two reviewers. If discrepancies persisted, we consulted a third reviewer (AV).

Data extraction
Pairs of review authors (MvT/BK/JHK/RvR) independently performed data extraction using a standardised form. We extracted data on author, year of publication and journal; study design and setting; study population; pathology considered, age, gender, numbers of subjects for inclusion in study and analysis, patient selection, level of measurement (patient or disc). Also, we obtained data on index and reference test characteristics; including type of test, year; methods of execution, cut-off values, positivity thresholds and outcome scales; diagnostic parameters; diagnostic two-by-two table or parameters to reconstruct this table.

Statistical analysis
For each included study we calculated sensitivity and specificity (and 95% confidence intervals (CI)) preferably on patient level data using the data from two-by-two tables. We conducted a meta-analysis separately for each of the index tests using a bivariate analysis. We chose the bivariate random-effects approach, because it incorporates both within and between study variation of sensitivity and specificity together with any correlation that might exist between sensitivity and specificity [27]. We present summary point estimates of sensitivity and specificity (and 95% confidence region) and the results were plotted in receiver operating characteristic (ROC) space [28]. When possible we generated linked ROC plots in case of pairs of diagnostic imaging tests, when both tests had been evaluated in the same study. Meta-regression was used to evaluate whether there is a difference in test accuracy between different imaging techniques or between patient level data and disc level data [29]. Analysis was carried out using STATA 13.1 software.
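The per-study step described above can be illustrated with a minimal sketch that derives sensitivity, specificity and prior probability from a two-by-two table. The text does not state which confidence interval method was used; the Wilson score interval below is an assumption chosen for illustration, and the function names `wilson_ci` and `accuracy_from_2x2` are hypothetical. The sketch does not reproduce the bivariate random-effects model, which was fitted in STATA.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96).

    Assumption: the review does not name its interval method; Wilson is
    used here only as a common, well-behaved choice.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity and prior probability from a 2 x 2 table.

    tp, fn: imaging positive / negative among surgically confirmed LDH.
    fp, tn: imaging positive / negative among cases without LDH at surgery.
    """
    diseased = tp + fn
    non_diseased = fp + tn
    return {
        "sensitivity": tp / diseased,
        "sensitivity_ci": wilson_ci(tp, diseased),
        "specificity": tn / non_diseased,
        "specificity_ci": wilson_ci(tn, non_diseased),
        "prior_probability": diseased / (diseased + non_diseased),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study.
    print(accuracy_from_2x2(tp=40, fp=10, fn=10, tn=40))
```

Small studies yield wide intervals under any method, which is what the imprecision criterion in the quality assessment captures.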
Two reviewers (JHK, AV) assessed the quality of the evidence for each index test using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group criteria [28,30]. Disagreements were resolved by a third review author (MB/DvdW). The quality of evidence was categorized as high, moderate, low, or very low [31]. The quality of the evidence started at high and was reduced by one level for each of the following domains not met: limitations of the study design (> 25% of participants in studies with two or more domains with high risk of bias); indirectness (> 25% of participants in studies with serious applicability concerns); inconsistency (unexplained variation in sensitivities and specificities across the studies [32]); imprecision (wide confidence interval of the sensitivity and specificity in > 25% of the studies); and publication bias [33].
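The downgrading rule described above (start at high, drop one level per domain not met) can be written as a short sketch. The function name `grade_quality` and the domain labels are hypothetical shorthand for the five domains listed in the text; judging whether a domain is met remains a reviewer decision and is not modelled here.

```python
# Quality levels in descending order, as listed in the text.
LEVELS = ["high", "moderate", "low", "very low"]

# Shorthand labels for the five downgrading domains (names are illustrative).
DOMAINS = {
    "study design",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication bias",
}


def grade_quality(domains_not_met):
    """Return the evidence quality after one downgrade per unmet domain."""
    unmet = set(domains_not_met)
    unknown = unmet - DOMAINS
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    # The scale bottoms out at "very low" regardless of further downgrades.
    return LEVELS[min(len(unmet), len(LEVELS) - 1)]


if __name__ == "__main__":
    print(grade_quality(["study design"]))
    print(grade_quality(["study design", "inconsistency", "imprecision"]))
```

With one unmet domain the rule returns moderate, and with three unmet domains it returns very low.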

Population
A total of 940 patients receiving surgery were included.
Overall 1289 patients were involved in these studies, but the reference standard was not performed in 349 patients. The patients (14 to 82 years) all had clinical findings consistent with LDH. Seven studies (n = 288) [34,37,41,42,44,46,47] were analyzed on patient level; the others analyzed disc levels (Table 1).

Risk of bias
Although we only selected studies using surgery as a reference standard, none of the studies were assessed as having low risk of bias (RoB) related to the reference standard, mainly because it was unclear whether results of the reference standard had been interpreted without knowledge of imaging results (Fig. 2). Seven studies were considered to have high RoB related to patient selection, as patients had not clearly been selected using consecutive or random sampling. Only two studies reported a time-interval between index test and reference standard, which were 3 months and 9 months, respectively [44,47].
We found moderate quality evidence (downgraded because of limitations in study design) for the accuracy of CT (Table 2).

Magnetic resonance imaging
Six studies were included: two with measurements on patient level (66 patients) [44,46] and four with a total of 299 disc explorations [36,39,43,45]. In these studies the mean prior probability of LDH was 68.9% (range: 48.6-98.7%). The sensitivity and specificity ranged from 64 to 93% and from 55 to 100%, respectively, with wide confidence intervals (imprecision) (Fig. 6). The summary estimate was 80.9% (95%CI: 68.8-89.1%) for sensitivity and 81% (95%CI: 59.2-92.6%) for specificity (Fig. 4). Because of a positive correlation between logit-transformed sensitivity and logit-transformed specificity (estimate = 0.5516) we decided that there was inconsistency. It was not possible to examine a difference in sensitivity and specificity between patient level data and disc level data.
We conclude that there is very low quality evidence for the accuracy of MRI (downgraded for study design, inconsistency and imprecision) (Table 2).

Myelography versus MRI
Two studies evaluated myelography and MRI (Fig. 9) [39,43]. The summary estimate of sensitivity was 55.3% (95%CI: 45.2-65.0%) for myelography and 67.4% (95%CI: 56.6-76.7%) for MRI. The summary estimate of specificity was 87.8% (95%CI: 79.7-92.9%) for myelography and 81.3% (95%CI: 69.4-89.3%) for MRI.
Table 2 notes: Studies were done in a hospital setting; this was not considered a serious applicability concern because surgery was the only reference standard. c Inconsistency was evaluated by the correlation between logit-transformed sensitivity and logit-transformed specificity. d Wide confidence interval of the sensitivity and specificity in more than 25% of the studies. e The possibility of publication bias is not excluded, but it was not considered sufficient to downgrade the quality of evidence.

Discussion
We found 14 diagnostic accuracy studies including 940 patients, all evaluating rather old imaging techniques. Summary estimates of sensitivity and specificity of the different imaging techniques varied between 76 and 81%, with moderate to very low quality evidence. Furthermore, CT, myelography and MRI showed comparable accuracy. We found very low quality evidence for the diagnostic accuracy of MRI. Even though MRI is more expensive, clinicians generally prefer MRI to CT, as it does not carry the risks associated with ionising radiation, and, unlike myelography, MRI is non-invasive [48]. MRI may also be more useful when surgical treatment is considered, as it can identify tissue properties as well as anatomical structures [48]. These are most likely the reasons why a recent guideline suggests MRI as the most appropriate test to confirm the presence of LDH, regardless of its disappointing diagnostic accuracy.

Strengths and weaknesses
Heterogeneity arose for several reasons. First, the imaging techniques used in the studies included old ones such as 0.5 Tesla [44] or 0.35 Tesla MRI [45]. In clinical practice the results of diagnostic imaging are interpreted with knowledge of history items and physical examination. Furthermore, clinicians frequently state that imaging does not play a crucial role in predicting prognosis or deciding on a management strategy among patients with  [4]. This might be one of the reasons why there are no recent studies on the diagnostic accuracy of imaging techniques for detecting LDH. However, older techniques will probably identify fewer underlying causes of back pain than newer imaging techniques. Evaluation of the diagnostic accuracy of advanced diagnostic equipment is therefore needed. Second, the included studies focussed on LDH, but the classification of this pathology differed between studies [49]. For example, some studies defined LDH as protruded, extruded, and sequestrated disc [38,39], whereas other studies defined LDH as the presence of neuronal compression [35,36,42,46]. Some studies gave no definition of LDH [37,40]. Third, we combined disc level data with patient level data. Results at disc level that include more than one disc level in the same patient may lead to smaller confidence intervals and possibly to an overestimation of diagnostic accuracy. Unexpectedly, confidence intervals were often wider in disc level data than in patient level data. Fourth, the diagnostic accuracy in this study was possibly overestimated because of a high prior probability (48.6 to 98.5%) of LDH. It was reported that about 4% of patients who present with low back pain in a primary care setting have a disc herniation [8]. The high prior probability results in selection bias. Furthermore, patient selection was unclear in many studies.
This is important since the interpretation of the test result (posterior probability) depends on its sensitivity and specificity as well as on the probability of the disease [50]. Lastly, the use of surgery as a reference standard can easily bias the results due to partial verification [51]. Surgery is often regarded as the best available reference standard. However, not everyone is subjected to surgery, only those patients with a very strong suspicion of LDH based on clinical symptoms combined with the results of diagnostic imaging, which leads to (partial) verification bias. In this review, among 669 patients with suspected LDH in seven studies, 349 (52.2%) patients did not undergo surgical treatment [34,36,37,43,45,46,47]. Verification bias can lead to an inflated diagnostic accuracy of the index test; i.e. it will show an increased sensitivity.
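The dependence of the posterior probability on the prior probability can be made concrete with Bayes' theorem. The sketch below is illustrative only: the function name `post_test_probabilities` is hypothetical, the sensitivity and specificity are the CT summary estimates reported in this review (81.3% and 77.1%), and the two priors are the roughly 4% reported for primary care and the lowest prior probability among the included studies (48.6%). It assumes, for illustration, that the summary estimates would carry over unchanged to a low-prevalence setting, which the review does not establish.

```python
def post_test_probabilities(prior, sensitivity, specificity):
    """Probability of LDH after a positive and after a negative test.

    Applies Bayes' theorem to a prior probability of disease and the
    sensitivity and specificity of the test (all given as fractions).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    false_neg = (1 - sensitivity) * prior
    true_neg = specificity * (1 - prior)
    after_positive = true_pos / (true_pos + false_pos)
    after_negative = false_neg / (false_neg + true_neg)
    return after_positive, after_negative


if __name__ == "__main__":
    # CT summary estimates from this review, expressed as fractions.
    sens, spec = 0.813, 0.771
    for prior in (0.04, 0.486):
        pos, neg = post_test_probabilities(prior, sens, spec)
        print(f"prior {prior:.3f}: P(LDH | +) = {pos:.3f}, P(LDH | -) = {neg:.3f}")
```

Under these assumptions a positive result at a 4% prior leaves the probability of LDH at roughly 13%, whereas at a 48.6% prior it rises to roughly 77%, so the same test result supports very different conclusions in primary care and in a surgical population.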
As far as we know, this is the first meta-analysis comparing diagnostic accuracy between different techniques in low back and/or leg pain with LDH as the suspected underlying pathology.

Implications
Concerning practice, we conclude that the diagnostic accuracy of today's imaging techniques is unknown. This severely hampers the choice of techniques as well as the interpretation of the outcomes, as no information is available concerning false positives or false negatives. Future

Conclusion
In conclusion, we found no studies evaluating modern diagnostic imaging techniques. For the older techniques we found moderate quality evidence for moderate diagnostic accuracy of CT and myelography, and very low quality evidence for moderate diagnostic accuracy of MRI in patients with suspected lumbar disc herniation. The accuracy of CT, MRI and myelography is comparable.