Systematic review of clinical trials of cervical manipulation: control group procedures and pain outcomes

Objective: To characterize the types of control procedures used in controlled clinical trials of cervical spine manipulation and to evaluate the outcomes obtained by subjects in control groups, so as to improve the quality of future clinical trials. Methods: A search for relevant clinical trials was performed in PubMed (1966 to May 2010) using the following search string: ("Chiropractic"[Mesh] OR "Manipulation, Spinal"[Mesh]) AND "Clinical Trial"[Publication Type]. Reference lists from these trials were searched for any additional trials. The reference lists of two prior studies, one review and one original study, were also searched. Accepted reports were then rated for quality by 2 reviewers using the PEDro scale. Studies achieving a score of >50% were included for data extraction and analysis. Intra-group change scores on pain outcomes were obtained. For determining clinically important outcomes, a threshold of 20% improvement was used where continuous data were available; otherwise, an effect size of 0.30 was employed. Results: The PubMed search yielded 753 citations, of which 13 were selected. Eight (8) other studies were identified by reviewing two systematic reviews and through reference searches. All studies scored >50% on the PEDro scale. There were 9 multi-session studies and 12 single-session studies. The most commonly used control procedure was "manual contact/no thrust". Four (4) studies used a placebo control (patient blinded). For the two of these studies with VAS data, the average change reported was 4.5 mm. For the other control procedures, results were variable. No clinically important changes were reported in 57% of the paired comparisons, while in 43% of them, changes that would be considered clinically important were obtained in the control groups. Only 15% of trials reported on post-intervention group registration.
Conclusions: Most control procedures in cervical manipulation trials result in small clinical changes, although larger changes are observed in 43% of paired comparisons. The vast majority of studies do not achieve subject blinding, and the unmasking of control subjects in these studies makes interpretation of the existing clinical trials challenging. The great majority of trials do not report on post-intervention blinding. A small number of candidate procedures for effective control interventions exist. Much more research is required to improve this important aspect of clinical trial methodology in cervical manipulation studies.


Introduction
Clinical trials of spinal manipulation for neck pain have been published since the early 1980s. Numerous reviews of these trials have appeared in the ensuing years [1-3]. The lack of a valid control group has been a consistent criticism of this body of studies [1-5]. In 2005, Vernon et al. [6] reported on a candidate manoeuvre for a cervical sham manipulation (a sham cervical thrust using a "drop" headpiece). In a small group of neck pain patients, 60% mis-registered the sham manoeuvre as a "real treatment". In these subjects, no clinically important changes were obtained post-intervention in paraspinal pressure pain thresholds (R-PPT decreased by an average of 1.2%; L-PPT decreased by an average of 6%) or in cervical ranges of motion.
In that report, the literature on studies of manipulation with sham/placebo manoeuvres was briefly reviewed. Of note were the studies of Hawk and her colleagues [7-9], who identified numerous issues attendant on the development and use of sham manipulations. Their work focused on the lumbar spine and low back pain patients. The review by Ernst and Harkness [4] was also cited as one of the works critical of the extant clinical trials of manipulation for neck pain.
Vernon et al. [10] conducted a systematic review of the outcomes of control groups used in clinical trials of conservative treatments for chronic neck pain. These trials were primarily laser and acupuncture studies; no study of manual therapy was included. In this review, the mean [95% CI] effect size of change in pain ratings in the no-treatment control groups was 0.18 [-0.05, 0.41] at outcome points up to 10 weeks and 0.40 [0.12, 0.68] at 12-52 weeks. In the placebo control groups it was 0.50 [0.10, 0.90] at up to 10 weeks and 0.33 [-1.97, 2.66] at 12-24 weeks. None of the comparisons between the no-treatment and placebo groups was statistically significant. It was concluded that changes in pain scores in subjects with chronic neck pain not due to whiplash who were enrolled in no-treatment and placebo control groups were similarly small and not significantly different, and that they did not appear to increase over longer-term follow-up. The placebo and no-treatment control procedures in these trials thus appeared successful in inducing relatively little therapeutic benefit.
There has been no similar review of the control procedures and control group outcomes of trials of cervical spine manipulation for neck pain and headaches, although a review of control group outcomes in lumbar spine trials has recently been published [11]. Such a review would help clinicians and researchers judge the validity of the existing evidence base as well as the applicability and generalizability of the control procedures employed to date. It would also identify issues for consideration by future clinical trial groups.

Methods

Search strategy
A search for relevant clinical trials was performed in PubMed (1966 to May 2010) using the following search string: ("Chiropractic"[Mesh] OR "Manipulation, Spinal"[Mesh]) AND "Clinical Trial"[Publication Type]. Reference lists from the selected trials were searched for any additional trials. The reference lists of two prior studies, one review [4] and one original study [6], were also reviewed. Finally, after reviewing the retrieval lists from these searches, the authors identified some additional trials from the general literature.

Inclusion Criteria
Studies were included in the quality review round if they fulfilled the following criteria:
a) randomized clinical trial;
b) cervical spinal manipulation was the index treatment (studies of thoracic manipulation were excluded);
c) a control group was used in any of the following forms: (i) placebo treatment, (ii) non-blinded control treatment, (iii) no-treatment or waiting-list control;
d) the clinical complaint was neck pain, neck and arm pain, or headaches;
e) data from a pain-related outcome were provided for each group at relevant times;
f) English language.

Study Selection
The inclusion criteria were applied by the senior author to the titles and abstracts of the studies identified in the searches.

Quality reviewing
Studies included in the review were then rated for quality by two independent raters, neither of whom was the senior author. Ratings were derived using the PEDro scale [12] for a score out of 11, which was converted into a percentage. Each rater first rated every study separately, after which the ratings were compared. Where exact agreement was not achieved, disagreements were resolved by consensus: the two raters first worked together, and any disagreement they could not resolve was discussed with the senior author, who then decided the consensus rating. Studies scoring higher than 50% were included in the review.
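The score conversion and inclusion rule can be sketched as below. This is a minimal illustration, not the authors' own procedure: the function names are hypothetical, the maximum of 11 is taken from the text above, and `max_score` is left as a parameter because PEDro totals are also commonly reported out of 10 (item 1, eligibility criteria, does not count toward the total).

```python
MAX_PEDRO_SCORE = 11      # maximum stated in this review
INCLUSION_CUTOFF = 50.0   # percent; studies had to score *higher* than this


def pedro_percent(score: int, max_score: int = MAX_PEDRO_SCORE) -> float:
    """Convert a raw PEDro score to a percentage of the maximum."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} outside 0..{max_score}")
    return 100.0 * score / max_score


def include_study(score: int, max_score: int = MAX_PEDRO_SCORE) -> bool:
    """Apply the strict 'higher than 50%' inclusion rule to a consensus score."""
    return pedro_percent(score, max_score) > INCLUSION_CUTOFF


def needs_consensus(rater_a: int, rater_b: int) -> bool:
    """Hypothetical helper: flag a study when the two raters' totals differ.

    Simplification: only totals are compared, so item-level disagreements
    that happen to give equal totals are not flagged.
    """
    return rater_a != rater_b
```

For example, 6/11 (about 54.5%) passes and 5/11 (about 45.5%) does not; on a 10-point maximum, 5/10 sits exactly at 50% and would be excluded by a strict "higher than" rule.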

Categorization of the included studies
Studies were separated into two categories: 1) single and 2) multiple intervention session trials.

Data extraction and analyses
Data were extracted by a single author. The following data were extracted: complaint type, number of subjects in the control group, control intervention type, type of primary outcome measure, whether blinding was checked post-intervention, and primary pain-related outcome data for the control group(s) (typically a VAS: means, variance measures, effect sizes). To define a clinically important change, a threshold of 20% improvement was used where continuous data were available; otherwise, an effect size of 0.30 was employed (similar to Vernon et al. [10,13,14]). Data were not formally pooled, although, when possible, means (SD) of the outcomes of selected groups of trials were computed. Data from the index treatment groups were not analyzed.
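The decision rule for a clinically important change can be written as a small function. This is a sketch under stated assumptions: the names are hypothetical; improvement is computed as a percentage reduction from baseline on a scale where lower scores mean less pain (such as a VAS); effect sizes are taken as reported by each trial rather than recomputed; and the two criteria are combined with OR, following the wording used when comparisons are tallied in the Results ("20% reduction of pain/tenderness or effect size greater than 0.30").

```python
from typing import Optional

PCT_THRESHOLD = 20.0  # minimum % improvement from baseline
ES_THRESHOLD = 0.30   # effect size must be greater than this


def percent_improvement(baseline: float, post: float) -> float:
    """Percentage reduction from baseline on a lower-is-better scale (e.g. VAS).

    For higher-is-better measures such as pressure pain thresholds,
    swap the order of the subtraction.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - post) / baseline


def clinically_important(pct: Optional[float] = None,
                         effect_size: Optional[float] = None) -> bool:
    """True if any available metric reaches its threshold.

    Assumption: the criteria are combined with OR; a metric that a trial
    did not report (None) is simply ignored.
    """
    if pct is not None and pct >= PCT_THRESHOLD:
        return True
    if effect_size is not None and effect_size > ES_THRESHOLD:
        return True
    return False


# Figures reported in this review for two control groups:
print(clinically_important(effect_size=0.18))            # Pikula, de-tuned ultrasound -> False
print(clinically_important(pct=14.8, effect_size=0.90))  # Haas et al., 12 weeks -> True
```

The second example shows why the combination rule matters: a 14.8% reduction falls short of the 20% criterion, but the reported effect size of 0.90 exceeds 0.30, so under an OR reading the comparison counts as clinically important.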

Results
The PubMed search yielded 753 citations, of which 13 were selected; eight (8) other studies were identified through the reference searches, for a total of 21 studies. The types of control interventions are described in Table 1, and the quality scores and extracted data for these 21 studies are shown in Table 2. All studies scored above 50% and were included in the review. The mean (SD) quality score for all studies was 77.8% (11.7). There were 9 multiple-session studies [15,16,27-29,31,33-35], whose mean quality score was 82.5% (9.2). Of these, 6 were for headaches [15,28,29,31,34,35], 2 were for neck pain [16,27] and 1 was for another complaint (hypertension [33]). There were 12 single-session studies, whose mean quality score was 74.2% (11.9). Of these, 10 were for neck pain and 2 were for other complaints in the upper limb. The quality scores differed significantly in favour of the multi-session studies (t = 2.49, p = 0.01). Both groups of studies had an average of 24 subjects per group (range 9-40 for multi-session and 8-54 for single-session studies).

Pain outcomes by control group type
Four trials employed some form of placebo control. Sloop et al.'s trial for neck pain [27] administered anamnestic valium to both groups, with the control group receiving no actual manipulation. Outcomes were obtained an average of two weeks post-treatment. Most patients received only one treatment, while some received two. The mean change in VAS scores in the control group was -5 mm; however, the standard deviation for this value was large (32 mm). Vernon et al.'s trial for tension-type headache [31] employed a factorial design in which three of four groups received at least one placebo/sham version of the therapies (amitriptyline and spinal manipulation), with one of these groups receiving both placebo treatments. This was the only trial to employ a sham cervical manipulation; however, outcomes were not reported for each group separately, so no data were available for this review on the outcome of the double-placebo group.
Two trials used de-tuned therapy devices as the control treatment. Tuchin et al.'s [15] multi-session trial for migraine headaches employed de-tuned interferential therapy and reported an average reduction in headache intensity of 17 mm in the control group (effect size = 1.17). Pikula's single-session study [32] employed two control groups, one of which received de-tuned ultrasound; the immediate pain reduction in this group averaged 4 mm (effect size = 0.18).
Using the Sloop et al. and Pikula trials to estimate pain reduction on a VAS in the placebo control groups gives an average reduction of 4.5 mm, which is well below the level most often adopted for the minimal clinically important difference and is in accord with the values of placebo control group outcomes reported by Vernon et al. [10] in non-manual therapy trials.
Three trials employed cervical manipulation at an "alternate" site as the control treatment. Bakris et al. [33] employed recoil manipulation at what they defined as an ineffective site. As their study investigated the effect of manipulation on blood pressure, no pain-related outcomes were available; they did report virtually no pre-post difference in systolic and diastolic blood pressure in this control group. Pikula [32] employed manipulation to the contralateral side in his single-session control group and reported an average reduction of 3 mm on a pain VAS (effect size = 0.10). Haas et al. [30] employed cervical manipulation at a site other than the target pain site. This site was determined randomly and compared with sites determined by manual palpation. They reported an average VAS reduction of 16 mm (effect size = 0.78), which is considered a clinically important change.
Data on pain outcomes were available from fourteen (14) trials with non-placebo control groups (5 multi-session; 9 single-session). In the five multi-session trials, control groups received either low-level manual contact with no thrust [28,29,34,35] or a waiting list [16]. In three of these studies [16,28,29], the average pain reduction on a VAS was 3 mm. Bove and Nilsson [29] also reported an average reduction of 11 mm in headache intensity at 7-week follow-up in their tension-type headache patients. Haas et al. [35] reported relatively small percentage reductions in headache pain (14.8% for the 8-treatment group and 9.3% for the 16-treatment group at 12 weeks; 15.3% and 9.9%, respectively, at 24 weeks); however, the effect sizes for these changes were above our threshold (0.90 and 0.52 at 12 weeks; 0.52 and 0.52 at 24 weeks). The fifth trial [34] reported virtually no change in its pediatric headache control group.
Of the nine (9)  Three single-session studies which used manual contact/no thrust controls did not report pain-related outcomes. Their results were as follows: in Buchman et al. [17], the control procedure appeared to have no effect on the presence of palpable cervical segmental dysfunctions; in Tuttle et al. [23], there was no significant change in active ranges of motion; in Dunning and Rushton's trial [24], resting EMG of the biceps muscle increased by an average of 21% on the ipsilateral side and 17% on the opposite side. The first two of these studies provided some support for the proposition that non-thrust procedures do not result in changes in the mobility of the cervical spine.
With regard to clinically important changes, the 21 reviewed studies provided 35 comparisons of baseline to post-treatment pain scores (some trials had no pain-related outcomes, while others had 2 or 3 comparison times for a pain-related outcome; Haas et al. [35] had 8 such comparisons). In 15 (43%) of these comparisons (8 trials), the control group outcomes exceeded the minimum threshold of a 20% reduction in pain/tenderness or an effect size greater than 0.30 (marked with * in Table 2). Of these 8 trials, 5 employed manual contact/no thrust controls [20,21,26,30,35], 1 employed a waiting list [16], 1 employed low-level laser and deep massage [29] and 1 employed de-tuned interferential therapy [15]. This last trial is notable as the only one of these 8 to use a single-blind placebo treatment; its control group effect size for the reduction in average headache intensity was 1.17.
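The proportions quoted in this section follow directly from the counts given in this paragraph:

```latex
\frac{15}{35} \approx 0.429 \;(43\%\ \text{of comparisons exceed the threshold}),\qquad
\frac{20}{35} \approx 0.571 \;(57\%\ \text{do not}),\qquad
\frac{8}{21} \approx 0.381 \;(\text{trials with at least one such comparison}).
```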
The nine (9) multi-session studies merit additional analysis. Seven of these trials [15,16,27-29,34,35] provided pain-related outcome comparisons; however, five of them were for headache and only two [16,27] were for neck pain. Within these trials, the control procedures that did not produce a mean reduction in pain reaching the clinically important threshold included anamnestic valium, low-power laser plus deep frictions, and light touch/no thrust.
With respect to confirming the success of blinding, that is, the subjects' identification of their group assignment, only 3 trials (15%) reported performing this check [27,31,34], all of which used placebo controls. These studies reported that blinding was generally successful. Two studies that did use a placebo control [32,33] did not report post-intervention registration. None of the studies that used non-placebo control groups reported the degree to which subjects in each study group could identify their group assignment.

Discussion
The primary objective of this review was to characterize the types of control procedures used in controlled clinical trials of cervical spinal manipulation and the outcomes obtained by subjects in these groups, with the goal of identifying areas for improvement in future controlled clinical trials. Twenty-one (21) trials were identified: 9 multi-session trials and 12 single-session trials. The most commonly employed control group procedure was "manual contact/no thrust" (12 groups). The clinical outcomes obtained in these control groups were varied, as discussed below. Clinical trial theory posits that the ideal control treatment should account for all of the non-specific effects of the index treatment but carry none of its direct therapeutic benefits [5-11]. Machado et al. [11] used the following terms to describe these attributes: a placebo treatment that has no known or substantiated therapeutic mechanism is termed "inert"; an inert placebo that mimics the index treatment in all respects, including the replication of any common side effects, is termed "indistinguishable"; and when placebos cannot be made indistinguishable, researchers should strive for "structural equivalence", that is, control procedures that are as similar as possible in nature and delivery to the index treatment.
Any difference between the index and control treatments observed in the trial (the "trial effect size") should, in theory, result from the therapeutic mechanism purported to exist in the index therapy. Any deviation from this ideal has important consequences for the potential success of a clinical trial: for example, increasing the therapeutic effect of the control treatment reduces the trial effect size and raises the risk of a Type II error, while inflating the apparent effect of the index treatment raises the risk of a Type I error.
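To make this reasoning explicit, consider a simple additive model (an illustrative assumption, not a model used by the trials reviewed). Write Δ_I and Δ_C for the mean improvements in the index and control groups, S for the specific (direct therapeutic) effect and N for the non-specific effect in each group:

```latex
\begin{aligned}
\Delta_I &= S_I + N_I, \qquad \Delta_C = S_C + N_C,\\
\Delta_I - \Delta_C &= (S_I - S_C) + (N_I - N_C),\\
\text{ideal control } (S_C = 0,\; N_C = N_I) &\;\Rightarrow\; \Delta_I - \Delta_C = S_I.
\end{aligned}
```

Under this model, a control procedure with some therapeutic effect (S_C > 0) pulls the trial effect size toward zero (Type II risk), whereas anything that raises N_I above N_C, such as index-group subjects recognizing a real thrust, inflates it (Type I risk).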
Hawk and colleagues [7-9] and Hancock et al. [5] have noted that developing placebo manipulation procedures has been challenging for researchers in manipulative therapy. They identified two important objectives of placebo manipulation procedures: 1) equalizing the non-specific effect of physical touch between groups of subjects, and 2) blinding the subject to the nature of the treatment. Hawk et al. identified the essence of such a placebo manoeuvre in that it "increase(s) the believability of the intervention, thus equalizing the effect of expectation of improvement between groups" [7].
Strong placebogenic effects of manipulation have been hypothesized [4-9,36]. Key factors include the encouragement of patient relaxation to facilitate the procedure; the generic or non-specific effects of manual contact, including the fulfillment of patients' expectations regarding manual contact on subjectively felt problem sites; and the effect of the thrust and cavitation in fulfilling patients' expectations that "something important has just happened". This last effect is often reinforced or amplified by positive feedback statements or behavior from the clinician [36].
Sham cervical manipulation procedures should account for the following issues: tactile contact with the skin, head and neck motions involved in the procedure, mechanical loads applied to the tissues and the sounds associated with them. Differences between a sham and an actual procedure for any one or more of these characteristics might be responsible for cuing the patient as to the nature of the procedure applied. These criteria can be applied to an evaluation of the control procedures described above.
With the exception of the control procedure used by Vernon et al. [6,31], which did not involve an actual thrusting manipulation, none of the control procedures identified in this review provided the following important elements: a) a simulated manual thrust, b) distracting noise to create ambiguity about whether cavitation occurred, and c) a priori evidence of a lack of therapeutic effectiveness. Together, these elements create the maximum level of "indistinguishability" possible in manual therapy research. However, in Vernon et al.'s clinical trial [31], the results of the sham manipulation as a control procedure cannot be separated from those of the placebo medication, as both were used in the true control group (see below).
The few studies that did employ a single thrusting manipulation as the control procedure applied it to sites designated by the investigators as alternatives to the "clinically important manipulation site", either at an alternate segment of the cervical spine [30,32] or at a supposedly non-effective site at the same segment (Bakris et al. [33], who also used a supposedly ineffective thrust direction). Because these procedures delivered a "real" manipulation (albeit at a supposedly inert site), they did not simulate the thrust and cavitation sounds; they actually produced them. Furthermore, these procedures had not previously been tested for inertness; in other words, they may have achieved some degree of "indistinguishability", but their inertness was not established a priori.
In the case of Haas et al. [30], the alternate-site procedure produced clinically important changes roughly equivalent to those of the index procedure, which invalidates it as a useful control manipulation (although these authors did not intend to establish the alternate-site approach as a "control" procedure). In the cases of both Pikula [32] and Bakris et al. [33], the results in their control groups were only confirmed a posteriori. Pikula's small study provides very limited support for the idea that a real manipulation at a site designated as clinically less important may work as a control procedure in a single session. In the case of Bakris et al., the control procedure appears to be highly dependent on the model of manipulation and the skill of the chiropractor involved, and may not be generalizable to other circumstances.
Distinguishing control procedures that do and do not employ thrusts is important for two reasons, discussed here first for manual-type procedures and then for non-manual procedures. With respect to manual-type procedures, the first issue pertains to control group subjects who receive non-thrust procedures, particularly manual contact without thrust (the predominant category in this review). While the strategy of "manual contact without thrust" reproduces some features of real manipulation in patient positioning and initial manual contact, such subjects do come to know after the fact that they have not received a thrusting procedure, because there is no simulation of thrust or cavitation noise. This may conflict with their expectations, especially in multi-session trials, and create a psychological factor superimposed on the more direct treatment-related outcome (which should, theoretically, be minimal). This could even rise to the level of a "nocebo" effect if the subject's after-the-fact knowledge and resulting unmet expectations (especially over several sessions) combine to produce a negative attitude to the circumstances and a poorer response on clinical outcome measures.
On the other hand, Table 3 shows that, in single-session studies using manual contact/no thrust procedures, the control groups most often reported no significant or clinically important changes in pain, tenderness or other single physical findings. Despite the issues raised above concerning lack of equivalence and the effects of unmasking, this type of procedure may be satisfactory for single-session studies of the immediate effects of cervical manipulation. Because a small number of studies did report clinically important changes in pain or tenderness, researchers are advised to pre-test this procedure in their own hands to ensure that it is generally inert before using it in a larger randomized trial.
The second issue applies to subjects who receive the "real manipulation" in studies where other groups of subjects do not. By experiencing thrusting manipulations with resultant audible cavitations (i.e., clicking sounds), subjects in these groups automatically receive cues that they would interpret to mean that they did receive the "index" treatment. This may produce the opposite of the effect described above: their treatment expectations are strongly confirmed. The resulting non-specific effect may add to the direct effects of the therapy and augment the clinical outcomes, especially those that require subjective responses and, in particular, satisfaction ratings. In passing, it should be noted that this is the situation that prevails in the manipulation groups of virtually all prior clinical trials of manipulation for neck pain and headaches.
With regard to non-manual control procedures, these more readily lend themselves to placebo versions, for example by de-tuning the equipment or applying very low doses of therapy. However, these procedures account for none of the manipulation-specific issues discussed above, which makes comparisons between these groups problematic, especially when the investigation concerns the mechanism of action. On strictly pragmatic grounds, non-manual placebo control procedures (such as those reviewed in Vernon et al. [10]) may be satisfactory for manipulation trials, as they clearly result in clinical outcomes below the threshold of minimal clinically important difference.
With the exception of Vernon et al. [31], all studies employed a single control procedure. Even if a single procedure is somewhat successful at masking subjects, it must achieve this entirely on its own. The "double placebo technique" [37-41] uses two procedures (either in a factorial design or in a simpler 2- or 3-group design) to increase the effectiveness of masking. While there is some evidence from the trial by Vernon et al. [31] that this strategy was successful, more studies of this approach are needed.
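As a sketch of the 2 × 2 factorial layout described for the Vernon et al. trial [31] (active or placebo amitriptyline crossed with active or sham manipulation), the arms can be enumerated as below. The labels are illustrative, not the trial's own coding; the point is that every subject receives both a pill and a manipulation-type procedure, which is what the double placebo technique relies on to improve masking.

```python
from itertools import product

# Illustrative 2 x 2 factorial: each factor has an active and an inert level.
FACTORS = {
    "amitriptyline": ("active", "placebo"),
    "manipulation": ("active", "sham"),
}
INERT_LEVELS = {"placebo", "sham"}


def factorial_arms(factors):
    """Return every combination of factor levels as one dict per arm."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def placebo_count(arm):
    """Number of inert (placebo/sham) components an arm receives."""
    return sum(level in INERT_LEVELS for level in arm.values())


arms = factorial_arms(FACTORS)
for arm in arms:
    print(arm, "inert components:", placebo_count(arm))

# Arms with at least one placebo/sham component, and the double-placebo arm.
any_placebo = [a for a in arms if placebo_count(a) >= 1]
double_placebo = [a for a in arms if placebo_count(a) == len(FACTORS)]
```

Three of the four arms contain at least one inert component and one arm receives both, matching the design described for that trial earlier in this review.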
It is difficult to summarize the clinical outcomes of the control groups analysed in this review because of the highly variable methods and results. In 43% of the pain-related comparisons, drawn from approximately one-third of the trials (8/21), the control procedure resulted in mean changes (reductions) that would be deemed clinically important. On the other hand, within each category of control procedure, some groups showed mean changes below the minimal clinically important threshold. With regard to the multi-session studies, in which three trials reported mean changes in the control groups that did not exceed the minimal clinically important threshold, several caveats are offered. In the case of Sloop et al. [27], while the control group receiving anamnestic valium and no manipulation reported very little improvement, most subjects in that study received only one treatment session and the outcome was obtained two weeks after treatment. In the case of Borusiak et al. [34], which employed a light touch/no thrust control procedure, only children were included as subjects; this makes it difficult to justify using this control procedure in multi-session clinical trials with adults.
Beyond the issues of indistinguishability, equivalence and clinical effect, a very serious issue is the lack of determination of the level of blinding in the vast majority of these studies. In many cases, a result in favor of the manipulation group (especially in single-session studies) has been superficially interpreted to mean that a beneficial effect exists for manipulation, when, in fact, the investigators have no idea as to the degree to which subjects in the control group(s) may have become unblinded and how that may have affected their responses. There is a strong possibility of a Type I error which is then ignored. This problem is not confined to spinal manipulation as a therapy nor is it confined to treatments for the neck alone. Machado et al. [11] and Puhl et al. [42] have shown that the same problem exists for other pharmacologic and non-pharmacologic treatments for low back pain.
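Checking post-intervention registration need not be elaborate. A minimal sketch, using hypothetical response categories and made-up example data, is to ask each subject after treatment which intervention they believe they received and to tabulate the answers by study arm:

```python
from collections import Counter, defaultdict


def guess_proportions(responses):
    """Tabulate post-intervention guesses by study arm.

    responses: iterable of (arm, guess) pairs; the guess categories
    (e.g. "real", "sham", "don't know") are hypothetical labels.
    Returns {arm: {guess: proportion of that arm}}.
    """
    counts = defaultdict(Counter)
    for arm, guess in responses:
        counts[arm][guess] += 1
    return {
        arm: {guess: n / sum(c.values()) for guess, n in c.items()}
        for arm, c in counts.items()
    }


# Hypothetical data: 10 control and 10 manipulation subjects.
example = (
    [("control", "real")] * 6 + [("control", "sham")] * 4
    + [("manipulation", "real")] * 9 + [("manipulation", "don't know")]
)
props = guess_proportions(example)
print(props["control"]["real"])  # share of controls who believe they were treated
```

A control arm in which a majority believe they received the real treatment (as with the 60% mis-registration reported for the drop-headpiece sham [6]) suggests the sham was credible, whereas a control arm that mostly answers "sham" points to unmasking.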
Finally, for single-session studies of clinical efficacy, every attempt should be made to report on randomization concealment and intention-to-treat, as this would make for more complete reporting and easier quality scoring.
This review has limitations. The initial search strategy was deliberately broad in order to identify the widest range of potential studies. While a more narrowly specified search algorithm might have been employed, we are confident that our strategy is replicable by other investigators.
We did not specify the results of each item of the PEDro scale for each study, as all studies achieved a score above our threshold for acceptability. In this review, we felt that such an analysis, typically found in other systematic reviews, was not necessary, as the primary objective was not a methodological review of the studies themselves, but of the control group procedures used.
As noted in the Methods, outcome data were not pooled, for several reasons. First, unlike many other systematic reviews, it was not our intention to derive a pooled estimate of the effect of a specific treatment on specific clinical entities; rather, we sought to characterize the various control group procedures used in a broad range of studies whose primary objective was to determine the effect of the index treatment, i.e., spinal manipulation. This led us to include a wide variety of studies with considerable heterogeneity in clinical condition, subjects selected, outcome measures, end points used, etc.
This heterogeneity made the reporting of outcomes more difficult, and less conventional, leading to our narrative account of the results.

Conclusion
The most commonly used control group procedure in clinical trials of cervical manipulation is manual contact/no thrust. Most control procedures in cervical manipulation trials do not achieve subject blinding. Clinical outcomes in these groups were varied, with about one-third of groups demonstrating a clinically important change. The great majority of trials do not report on post-intervention blinding. The effect of unmasking of control subjects makes the interpretation of the existing clinical trials challenging. At the very least, future clinical trial reports should indicate subjects' post-intervention registration of their group allocation. A small number of candidate procedures exist for effective control interventions that create optimum combinations of inertness and practical equivalence with real manipulative therapies. Much more research is required to improve this important aspect of clinical trial methodology in cervical manipulation studies.