Is manipulative therapy more effective than sham manipulation in adults?: a systematic review and meta-analysis

Background Manipulative therapy is widely used in the treatment of spinal disorders. Manipulative techniques are under debate because of the possibility of adverse events. To date, the efficacy of manipulations compared to sham manipulations is unclear. The purpose of this study is to assess the efficacy of manipulative therapy compared to sham in adults with a variety of complaints. Study design Systematic review and meta-analysis. Methods Bibliographic databases (PubMed, EMBASE, CINAHL, PEDro, Central) along with a hand search of selected bibliographies were searched from inception up to April 2012. Two reviewers independently selected randomized clinical trials (RCTs) that evaluated manipulative therapy compared to sham manipulative therapy in adults, assessed risk of bias and extracted data concerning participants, intervention, kind of sham, outcome measures, duration of follow-up, profession, data on efficacy and adverse events. Pooled (standardized) mean differences or risk differences were calculated where possible using a random effects model. The primary outcomes were pain, disability, and perceived recovery. The overall quality of the body of evidence was evaluated using GRADE. Results In total 965 references were screened for eligibility and 19 RCTs (n = 1080) met the selection criteria. Eight studies were considered of low risk of bias. There is moderate level of evidence that manipulative therapy has a significant effect in adults on pain relief immediately after treatment (standardized mean difference [SMD] -0.68, 95% confidence interval -1.06 to -0.31). There is low level of evidence that manipulative therapy has a significant effect in adults on pain relief (SMD -0.37, -0.69 to -0.04) at short-term follow-up.
In patients with musculoskeletal disorders, we found moderate level of evidence for pain relief (SMD -0.73, -1.21 to -0.25) immediately after treatment and low level of evidence for pain relief (SMD -0.52, -0.87 to -0.17) at short-term follow-up. We found very low level of evidence that manipulative therapy has no statistically significant effect on disability and perceived (asthma) recovery. Sensitivity analyses did not change the main findings. No serious adverse events were reported in the manipulative therapy or sham group. Conclusions Manipulative therapy has a clinically relevant effect on pain, but not on disability or perceived (asthma) recovery. Clinicians can refer patients for manipulative therapy to reduce pain.


Background
Manipulative therapy (MT) is widely used in the treatment of musculoskeletal and other kinds of complaints. Its use has increased worldwide in the past few decades [1]. Manipulative therapy consists of manipulations, which are passive, high velocity, low amplitude thrusts applied to a joint complex within its anatomical limit (active and passive motion occurs within the range of motion of the joint complex and not beyond the joint's anatomic limit). The intent of a manipulation is to create motion (including articular surface separation), to improve function, and/or to reduce pain. It is often accompanied by a brief or repetitive popping noise within the affected joint [2]. The cracking sound is caused by cavitation of the joint, a term used to describe the formation and activity of bubbles within the fluid [3,4]. The mechanisms through which manipulations may alter musculoskeletal pain are unknown. Current evidence suggests an interaction between mechanical factors such as movement and forces and associated neurophysiological responses to these mechanical factors [5,6]. Various practitioners, including manipulative physical therapists, physicians, chiropractors and osteopaths, use these interventions. However, the theoretical hypotheses, diagnostic tools and treatment methods differ considerably between the professions [7].
Some reports in the literature describe an apparent association between cervical manipulation and serious complications such as arterial dissection and subsequent stroke, while others found no relation [8][9][10][11][12][13]. Minor adverse events such as aggravation of neck pain or headache, muscle soreness or stiffness are reported more often following manipulation [14]. Ideally, the risk-benefit ratio of (cervical) manipulations should be known before their use can be justified. Manipulative therapy could be used if there is a substantial benefit that exceeds the risks (and costs). To provide insight into the active agent of manipulative therapy, research about its efficacy is needed. Such trials represent an attempt to differentiate between specific and nonspecific therapeutic effects of manipulative therapy.
As far as we know, no systematic reviews have been published about the efficacy of manipulative therapy versus sham manipulative therapy in adults with a variety of complaints. Earlier systematic reviews evaluated manipulative therapy versus other conservative treatments, waiting list controls or sham in specific patient groups, such as patients with low back pain, asthma or dysmenorrhea [15][16][17]. Therefore, the aim of this systematic review was to evaluate the efficacy of manipulative therapy compared with 'sham manipulative therapy' in adults with a variety of complaints on pain, disability or perceived recovery immediately after treatment and at short-term and long-term follow-up.

Selection criteria
We considered published randomised clinical trials (RCTs) eligible if they evaluated manipulative therapy, including manipulations (as defined by the original authors), compared to sham manipulative therapy in adult participants (18 years of age or older) with a diversity of complaints. Studies were selected if they used at least one of our primary outcome measures, namely pain intensity, disability or perceived recovery. Functions (e.g. range of motion, endfeel, proprioception, pulmonary functions), adverse events, quality of life and return to work were considered as secondary outcomes.

Search strategy
We identified RCTs by electronically searching the following databases from inception until April 2012: MEDLINE, EMBASE, CENTRAL (The Cochrane Library April 2012), CINAHL and PEDro. The sensitive search strategy developed by the Cochrane Handbook for Systematic Review of Interventions was followed, using free text words and MeSH Headings (Medline), Thesaurus (EMBASE, CINAHL) [18]. Combinations were made based on a) intervention (manipulation, spinal manipulation, manipulative therapy, high velocity thrust, chiropractic manipulation, osteopathic manipulation, musculoskeletal manipulation), b) comparison (placebo, sham treatment, sham manipulation) and c) design (randomised clinical trial or randomised controlled trial). The complete search strategy is available on request from the primary review author. References from the included studies as well as relevant systematic reviews were screened, and experts were approached in order to identify additional studies. One research librarian together with a review author (WS) performed the electronic searches. Two review authors (WSP, ET) independently selected the studies, first by screening title and abstract, and secondly by screening the full text papers. No restrictions were applied to year of publication or language. Disagreements on inclusion were resolved by discussion or through arbitration by a third review author (AV).
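The combination of the three concept groups can be sketched as follows. This is only an illustration of the usual approach (synonyms joined with OR, concept groups joined with AND); the term lists come from the text, but the function names and the quoting syntax are assumptions, and the authors' actual strategy is only available on request.

```python
# Illustrative sketch: term lists are taken from the text above; the
# exact database syntax used by the review authors is not reported.
INTERVENTION = [
    "manipulation", "spinal manipulation", "manipulative therapy",
    "high velocity thrust", "chiropractic manipulation",
    "osteopathic manipulation", "musculoskeletal manipulation",
]
COMPARISON = ["placebo", "sham treatment", "sham manipulation"]
DESIGN = ["randomised clinical trial", "randomised controlled trial"]


def or_block(terms):
    """Join the synonyms of one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*blocks):
    """Join the concept blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(INTERVENTION, COMPARISON, DESIGN)
print(query)
```

A real search would additionally map these free-text terms to each database's controlled vocabulary (MeSH, thesaurus terms), which this sketch does not attempt.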

Risk of bias assessment
Two review authors (WSP, ET) independently assessed the risk of bias (RoB) of the included RCTs using the 12 criteria recommended by the Cochrane Back Review Group [18]. The criteria were scored as "yes," "no," or "unclear" and reported in the Risk of Bias table. Disagreements were resolved in a consensus meeting. When disagreement persisted, a third review author (AV or KV) was consulted. A study with a low RoB was defined as fulfilling six or more of the criteria items, which is supported by empirical evidence [19].
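The classification rule described above can be written down in a few lines. This is a minimal sketch: the function name is hypothetical, and it assumes that only a "yes" score counts as fulfilling a criterion.

```python
N_CRITERIA = 12          # criteria of the Cochrane Back Review Group
LOW_ROB_THRESHOLD = 6    # "six or more of the criteria items"
VALID_SCORES = {"yes", "no", "unclear"}


def classify_rob(scores):
    """Return 'low' or 'high' risk of bias for one study.

    scores: list of 12 strings, each 'yes', 'no' or 'unclear'.
    Assumption: only 'yes' counts as a fulfilled criterion.
    """
    if len(scores) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} scores, got {len(scores)}")
    invalid = set(scores) - VALID_SCORES
    if invalid:
        raise ValueError(f"invalid scores: {sorted(invalid)}")
    fulfilled = sum(score == "yes" for score in scores)
    return "low" if fulfilled >= LOW_ROB_THRESHOLD else "high"


# Hypothetical study fulfilling exactly six criteria.
print(classify_rob(["yes"] * 6 + ["unclear"] * 3 + ["no"] * 3))
```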

Data extraction
Two review authors (WSP, ET, SK and MB) independently extracted the data using a standardized form (including profession, participants, intervention, kind of sham, outcome measures, duration of follow-up, drop-outs, data on efficacy and adverse events). Follow-up time intervals were defined as immediate (within one day), short-term (≤ 3 months) and long-term (≥ 6 months). In cases of uncertainty about the data extracted, a third review author (AV) was consulted.
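The follow-up categories can be expressed as a small helper. This is a sketch under stated assumptions: the conversion of 3 and 6 months to 91 and 182 days is ours, and because the text does not define the interval between 3 and 6 months, the hypothetical function returns None for it rather than guessing.

```python
SHORT_TERM_MAX_DAYS = 91    # assumption: 3 months taken as about 91 days
LONG_TERM_MIN_DAYS = 182    # assumption: 6 months taken as about 182 days


def classify_follow_up(days):
    """Map a follow-up duration in days to the review's categories."""
    if days < 0:
        raise ValueError("follow-up duration cannot be negative")
    if days <= 1:
        return "immediate"      # within one day
    if days <= SHORT_TERM_MAX_DAYS:
        return "short-term"     # up to 3 months
    if days >= LONG_TERM_MIN_DAYS:
        return "long-term"      # 6 months or more
    return None                 # between 3 and 6 months: not defined in the text


for d in (0, 30, 120, 365):
    print(d, classify_follow_up(d))
```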

Data analysis
The inter-observer reliability of the risk of bias assessments was calculated using Kappa statistics and percentage agreement. We assessed the possibility of publication bias by creating funnel plots. For continuous data, we calculated weighted mean differences (WMD) with 95% confidence intervals (95% CI). Visual Analogue Scales (VAS) or Numerical Pain Rating Scales (NPRS) were converted to a 100-point scale when necessary. When different instruments were used to measure the same clinical outcome, we calculated standardized mean differences (SMD). For dichotomous outcomes, we calculated risk differences (RD) with 95% CI. All analyses were conducted in Review Manager 5.1, using a random-effects model. Prior to pooling, sources of clinical heterogeneity, such as participants, time-frame and outcomes, were assessed. Statistical heterogeneity (I²) was assessed using a cut-off point of 50%; above this value, the results were considered too heterogeneous to pool. Stratified analyses were considered: 1) by time (immediate, short-term, long-term); 2) by type of participants (musculoskeletal complaints versus non-musculoskeletal complaints); 3) by profession (chiropractor, physical therapist, osteopath, physician). We planned sensitivity analyses a priori to explore RoB as a possible source of heterogeneity. Results were considered clinically relevant when the pooled SMD was at least 0.5 [20].
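To make the pooling steps concrete, the sketch below rescales a pain score, computes an SMD per study and pools the SMDs with a random-effects model. The analyses in this review were run in Review Manager 5.1; the code is not that software's implementation. It assumes Hedges' adjusted g with a common large-sample variance approximation and the DerSimonian-Laird estimator of between-study variance, and all function names and input numbers are hypothetical.

```python
import math


def to_100_point(score, scale_max):
    """Rescale a pain score (e.g. a 0-10 NPRS) to a 0-100 scale."""
    return score * 100 / scale_max


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return the SMD (Hedges' adjusted g) and its approximate variance."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate, 95% CI bounds and I² (percent)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2


# Made-up summary data (mean, SD, n for MT, then for sham) on a 0-100
# pain scale; these numbers are NOT taken from the included trials.
studies = [(30.0, 20.0, 25, 42.0, 20.0, 25), (35.0, 18.0, 30, 45.0, 18.0, 30)]
pairs = [hedges_g(*study) for study in studies]
pooled, ci_low, ci_high, i2 = pool_random_effects(
    [g for g, _ in pairs], [v for _, v in pairs]
)
poolable = i2 <= 50                       # the review's heterogeneity cut-off
clinically_relevant = abs(pooled) >= 0.5  # the review's relevance threshold
print(round(pooled, 2), round(ci_low, 2), round(ci_high, 2), i2)
```

With pain as the outcome, a negative SMD means lower pain in the MT group than in the sham group, which is why the relevance threshold is applied to the absolute value.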

Strength of the evidence
The overall quality of the evidence and strength of recommendations were evaluated using GRADE (Grading of Recommendations Assessment, Development and Evaluation) [21]. The quality of the evidence was based on performance against five principal domains: (1) limitations in design (downgraded when more than 25% of the participants were from studies with a high RoB), (2) inconsistency of results (downgraded in the presence of significant statistical heterogeneity [I² > 50%] or inconsistent findings, defined as ≤75% of the participants reporting findings in the same direction), (3) indirectness (e.g. generalizability of the findings; downgraded in those studies that used a specific subset of the population under investigation), (4) imprecision (downgraded when the total number of participants was less than 400 for continuous outcomes and 300 for dichotomous outcomes), and (5) other considerations, such as publication bias [21].
High quality evidence was defined as RCTs with low risk of bias that provided consistent, direct and precise results for the outcome. The quality of the evidence was downgraded when one of the factors described above was met [21]. Two independent review authors (WSP, ET) graded the quality of evidence. Single studies (N < 400 for continuous outcomes, N < 300 for dichotomous outcomes) were considered inconsistent and imprecise (i.e. sparse data) and provide "low quality evidence", which could be further downgraded to "very low quality evidence" if there were also limitations in design or indirectness [21]. The following grading of quality of the evidence was applied: High quality: further research is very unlikely to change our confidence in the estimate of efficacy; Moderate quality: further research is likely to have an important impact on our confidence in the estimate of efficacy and may change the estimate; one of the domains is not met; Low quality: further research is very likely to have an important impact on our confidence in the estimate of efficacy and is likely to change the estimate; two of the domains are not met; Very low quality: we are very uncertain about the estimate; three of the domains are not met.
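The downgrading logic can be summarized in a short sketch. It follows the rule stated above (one level lost per domain that is not met) and assumes that more than three unmet domains still yields "very low"; the function names are hypothetical, and real GRADE judgments involve more nuance than a count of flags.

```python
LEVELS = ["high", "moderate", "low", "very low"]


def is_imprecise(n_participants, dichotomous=False):
    """Imprecision rule from the text: N < 400 (continuous) or N < 300 (dichotomous)."""
    return n_participants < (300 if dichotomous else 400)


def grade_quality(limitations, inconsistency, indirectness, imprecision, other=False):
    """Return the GRADE level given one boolean per domain (True = not met)."""
    n_downgrades = sum([limitations, inconsistency, indirectness, imprecision, other])
    # Assumption: the scale bottoms out at 'very low'.
    return LEVELS[min(n_downgrades, len(LEVELS) - 1)]


# Hypothetical body of evidence: low RoB, consistent and direct results,
# but only 181 participants for a continuous outcome.
print(grade_quality(False, False, False, is_imprecise(181)))
```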

Risk of bias
Overall, high levels of agreement between review authors were achieved for risk of bias assessments, with a Kappa of 0.84 (95% CI: 0.77 to 0.90) and a percentage of agreement of 89% (95% CI: 84% to 93%). Kappa values ranged from 0.53 (for items 3 and 5) to 1.0 (for items 6, 7, and 12). The results of the RoB for the individual studies are summarized in Figure 2.
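For readers unfamiliar with the two agreement measures, a minimal sketch of percentage agreement and Cohen's kappa for two raters follows. It assumes unweighted kappa, which the text does not specify, and the ratings are made up for illustration rather than taken from this review.

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same score."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same non-empty item list")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    observed = percentage_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up scores for four items; NOT data from the review.
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(percentage_agreement(a, b), cohens_kappa(a, b))
```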

Effect of manipulative therapy
The overall quality of the body of evidence is summarized in Table 2. We found moderate level of evidence for immediate effects of MT compared to sham for adults on pain. The subgroup analysis also showed moderate level of evidence for patients with musculoskeletal complaints on pain. All other levels of evidence were considered low to very low (Table 2).
For non-musculoskeletal disorders, two low RoB studies (181 participants) with primary dysmenorrhea demonstrated a non-significant effect in favor of MT on pain relief (WMD -5.31, 95% CI -13.62 to 2.99) [27,28]. There is low level of evidence that MT is no better than sham on pain relief in patients with dysmenorrhea.
Stratification by profession yielded no significant differences between the professions. MT performed by physicians provided somewhat larger effect sizes than MT performed by the other professions (Figure 5); however, these results were based on one low RoB study [36].

Sensitivity analyses
Sensitivity analyses did not change our main findings. Only at short-term follow-up did the level of evidence change, from low level of evidence for pain relief to moderate level of evidence for no significant differences between the groups. The pooled effect size (SMD) decreased from -0.37 (-0.69 to -0.04) to -0.30 (-0.72 to 0.11) [27,36].
For the subgroup musculoskeletal disorders, the level of evidence changed from low level of evidence for pain relief to moderate level of evidence for pain relief on all points. The SMD changed from -0.71 (95% CI -1.02 to -0.39) to -0.81 (95% CI -1.17 to -0.45) [23,31,36].

Disability
Pooling was not possible because of statistical heterogeneity. There is very low level of evidence (high RoB, inconsistency, imprecision) that MT has no statistically significant effect on disability [22,24-26,30,36].

Perceived recovery
One study with high risk of bias (31 patients with chronic asthma) evaluated perceived (asthma) recovery [32]. There is very low level of evidence (high RoB, inconsistency, imprecision) that MT has no statistically significant effect on perceived (asthma) recovery [32].

Quality of life
Two studies (164 participants, all with low back pain), one with low RoB, were included in the meta-analyses [35,36]. Data from two other studies could not be used [25,26]. There is very low level of evidence (high RoB, inconsistency, imprecision) that there is no statistically significant effect on quality of life (MD 1.22, 95% CI -7.24 to 9.67).

Range of motion
Four studies (179 participants with musculoskeletal complaints), three with high RoB [22,38,40], evaluated range of motion (ROM) after MT [22,31,38,40]. Statistical pooling was not possible because of lack of data or heterogeneity on outcome. There is very low level of evidence (high RoB, inconsistency, imprecision) that MT is not more effective on ROM.

Pulmonary functions
Pulmonary functions were evaluated in two studies (66 participants) [32,33]. Statistical pooling was not possible because of a lack of data [32]. There is low level of evidence (high RoB, imprecision) that MT does not provide better pulmonary functions.

Discussion
There is low to moderate level of evidence that MT has a significant effect on pain relief in adults with a variety of complaints and in the subgroup of patients with musculoskeletal disorders. Sensitivity analyses including only studies with low RoB did not change our main findings. Ideally, we need interventions with immediate effects that preferably lead to long-term clinically relevant benefits. In this study we found benefit for MT, especially in patients with musculoskeletal disorders. The pooled effect estimates were considered clinically relevant. A recent systematic review showed that musculoskeletal conditions were the most frequent indications for receiving spinal manipulation, with low back and neck pain being the most common ones [1]. Non-musculoskeletal conditions comprised a very small percentage of indications [1].
It appears reasonable that when MT is used there should be evidence for its efficacy with minimal or no harm. Only a few minor adverse events were reported in the included studies. There were no serious complications such as strokes. Sensitivity/subgroup analyses on the risk of specific manipulation techniques related to adverse events were not possible. Our findings are in agreement with earlier studies, which cast doubt about a causal relation between manipulation and stroke [11,12]. However, it must be acknowledged that the included trials were much too small to pick up more rare serious adverse events (if present).
Interestingly, this review also found some adverse events in the sham MT group [27,33]. Sham manipulation consisted of light touch at the same anatomic thoracic and occipital regions in the same position as the real manipulations [33], and a low force maneuver at the left L2-L3 vertebral level in side lying position with bilateral flexion of the hips and knees [27]. Light touch is not expected to create physiological or biomechanical changes; therefore, we cannot explain these events. It seems that low force chiropractic techniques of at least 200 Newton may also produce some treatment effects and that these are indistinguishable from the real MT. To improve reporting of (minor) adverse events, we propose the use of (validated) questionnaires at all follow-up visits. An anonymous registration for practitioners in a database should be considered.
To our knowledge, there are no comparable systematic reviews that evaluated MT versus sham MT in adults with a variety of complaints. Therefore, we compared our results with systematic reviews that evaluated MT in specific patient groups. An earlier systematic review on the effectiveness of MT for chronic low back pain patients found very low quality evidence that MT is as effective as sham MT for short-term pain relief [15]. Their results were based on three RCTs [24,30,38], all included in this review. We added two more RCTs, one with low RoB [29,36], resulting in a different conclusion: low level of evidence that MT showed statistically significantly better pain relief than sham MT. Our findings are in agreement with Gross et al. 2010, who found low quality evidence for the use of thoracic manipulation for immediate pain relief in patients with neck pain [42]. A systematic review of spinal manipulations for patients with dysmenorrhea indicated that there was no evidence to suggest that spinal manipulation was effective in treating dysmenorrhea compared to sham, which is in line with our results [16]. Another Cochrane review for asthma reported, from data of two trials [32,43] examining chiropractic MT compared to sham MT, that there were no significant differences between groups for lung function and quality of life measures [17]. One of the included trials concerned young children (6 to 8 years) and was therefore excluded from our systematic review [43].
Limitations of our review include the diversity of professions (chiropractor, physical therapist, osteopath or physician) who delivered the manipulations. Our subgroup analyses showed no clear differences in effect between professions, but the power is low and the conclusion is based on 2 or 3 small studies. Another limitation is the diversity of sham manipulations. These varied from manipulations with a deactivated Activator instrument (a spring loaded, piston activated instrument) to low force mimic maneuvers or manual contact. A sham manipulation should produce the smallest possible treatment effect, although any manual intervention may inevitably produce some type of physiologic or biomechanical effect [44]. It is important that sham treatments are credible for the patient, so that the expectation of improvement is equal between groups, and valid, so that the patient can be adequately blinded. In this systematic review, adequate blinding of participants was performed in only seven studies [22,23,25,28,36,38,39]. Unclear and inadequate blinding may have affected and enlarged our pooled effect sizes. Moreover, blinding may be affected in patients previously exposed to manipulation.
Four studies used a cross-over design [29,32,39,40]. In crossover studies, participants will be aware eventually of the type of manipulations they received leading to probable bias and this may affect the outcome. Moreover, the effects of spinal manipulation cannot be reversed and are therefore likely to be carried over into the next cycle. However, these studies were not included in the meta-analyses and therefore, could not have affected our pooled results.
Most of the included studies had fewer than 25 participants in their smallest study group. These studies could be considered underpowered. Also, the overall power of the statistical pooling was limited. The total number of participants was less than 400 for continuous outcomes and 300 for dichotomous outcomes in all of our meta-analyses. Consequently, the level of evidence was downgraded. Our sensitivity analyses were comparable with the original analyses and did not indicate that other factors influenced the overall pooled effects.
Based on personal communication during the review process, two studies did not meet the inclusion criteria of manipulative therapy [25,26]. When asked, the original authors stated that no thrust was given. However, as we were unable to contact all corresponding authors consistently, we chose to base the study selection on the published reports and refrained from removing these studies from the manuscript. Nevertheless, excluding these studies [25,26] would not have affected our results, as they were not included in the meta-analysis.
As in each systematic review, the possibility of publication bias cannot be ruled out; small studies with non-significant results are the most likely to remain unpublished. Although our funnel plots did not suggest that this was an issue in this review, relevant studies hidden in unknown databases are difficult to locate and may not have been included. To reduce these biases, we performed a thorough search in multiple electronic databases and performed reference and hand-searching without language restrictions.

Implications for practice
MT produces pain relief immediately after treatment and at short- and long-term follow-up, but no effects are found on disability and perceived (asthma) recovery. Clinicians could refer to MT for pain relief as a treatment goal. For patients with pulmonary diseases, no significant or clinically relevant effects were found.

Implications for research
The quality of evidence varied from very low to moderate, indicating that further research is likely to have an impact on the confidence in the estimate of effect and is likely to change this estimate. There is a need for future low risk of bias RCTs with large sample sizes that evaluate the effect immediately after treatment and at short- and long-term follow-up, not only on pain but also on disability and perceived recovery. Moreover, there is a need for evaluating the effect of these procedures on specific subgroups of patients with musculoskeletal disorders. Adverse events should be reported more consistently.