November 25, 2012
Editor’s note: Below are two responses to Robert Schneider’s defense of his Transcendental Meditation paper, which Schneider wrote in response to my earlier article about the publication of his paper. In the first part I respond to some of the general issues raised by Schneider. The second part, from Sanjay Kaul, addresses the statistical issues discussed by Schneider.
I’m grateful for Kaul’s highly technical analysis of the statistical issues raised by Schneider, but I don’t think this case really requires a terribly high level of technical expertise. Common sense actually works pretty well in this case. A trial with barely 200 patients cannot be expected to provide broad answers about the health benefits of a novel intervention. As Kaul and others have stated on many other occasions, “extraordinary claims require extraordinary evidence,” and it is quite clear that the evidence in this trial is not extraordinary, at least in any positive sense.
Questions About Trial Reliability And Data
In his response Schneider tries to skate away from the inevitable questions raised when Archives of Internal Medicine chose to withdraw the paper only 12 minutes before its scheduled publication time. Schneider can pretend that this incident never occurred, but outside readers cannot help but wonder what sparked it, and they will not be satisfied until the details are fully explained.
There are additional red flags about the trial. Schneider told WebMD that since the Archives incident “the data was re-analyzed. Also, new data was added and the study underwent an independent review.” Said Schneider: “This is the new and improved version.”
This is an extraordinary claim, because a clinical trial cannot be “new and improved” unless there were serious flaws in the earlier version. What exactly does it mean to say that a paper published in 2012 about a trial completed in 2007 is “new and improved”? (According to ClinicalTrials.gov, the study was completed in July 2007, and June 2007 was the “final data collection date” for the primary endpoint.)
The 5-year delay between the 2007 completion date and the publication of the data is highly suspicious.
What exactly caused this delay? The paper hints at one possible source: as Kaul notes below, the investigators refer to the primary endpoint as a “DSMB-approved endpoint.” This suggests that the primary endpoint was changed at some point during the trial. As Kaul points out, it is not the job of the DSMB to choose or approve primary endpoints. Since the trial was not registered with ClinicalTrials.gov until 2011, it is impossible to sort out this issue unless the investigators release the initial trial protocol and statistical analysis plan.
Schneider’s response also fails to explain why the number of primary endpoint events differs between the Archives paper and the Circulation: Cardiovascular Quality & Outcomes paper, given that ClinicalTrials.gov lists June 2007 as the collection date for the primary outcome measure. I see no reason why this discrepancy shouldn’t be explained. Although the difference is only 1 event, it inevitably raises questions about the reliability of the data.
Trial Interpretation
Finally, I am deeply concerned about the way this trial will be used, or misused, to “sell” the Transcendental Meditation brand to the broadest possible population, i.e., everyone. Although the study was limited to African Americans with heart disease, here’s what Schneider told the Daily Mail:
‘Transcendental meditation may reduce heart disease risks for both healthy people and those with diagnosed heart conditions. The research on transcendental meditation and cardiovascular disease is established well enough that doctors may safely and routinely prescribe stress reduction for their patients with this easy to implement, standardised and practical programme.’
Meditation may of course be beneficial, but it will never be a cure for heart disease, and it won’t replace other treatments. But here’s what Schneider told WebMD:
“What this is saying is that mind-body interventions can have an effect as big as conventional medications, such as statins,” says Schneider.
It shouldn’t be necessary to say, but the evidence base for statins is several orders of magnitude greater than the evidence base for meditation. Further, there have been no studies comparing meditation to statins. Any claim that meditation is equivalent to statins is preposterous.
To be clear, I have nothing against meditation. Generic meditation is cheap, safe, and possibly even effective. Branded Transcendental Meditation, on the other hand, is a cult, and it is out to get your money. An initial TM program costs $1500, and the cost rises the deeper you get pulled into the cult. Here’s what Schneider told HealthDay:
“One of the reasons we did the study is because insurance and Medicare calls for citing evidence for what’s to be reimbursed,” Schneider said. “This study will lead toward reimbursement. That’s the whole idea.”
Here’s the real source of my discomfort with this trial. For true believers like Schneider, fighting heart disease is important only insofar as it can be employed to further the interests of TM. Scientific standards and medical progress are unimportant in the larger scheme of promoting TM.
Read the comments left by Michael Jackson and Chrissy on my earlier post to learn more about the dangers of TM. Or do your own research on the internet.
Here’s Sanjay Kaul’s response:
By convention, the difference that a study is powered to detect (delta) varies inversely with the seriousness of the outcome, i.e., a larger delta for ‘softer’ outcomes and a smaller delta for ‘harder’ outcomes. This does not appear to be the case in the current study. For the first phase of the trial, the power calculation was based on a 36% risk reduction in death, nonfatal MI, nonfatal stroke, rehospitalization or revascularization (the original primary endpoint). For the second phase, the power calculation was based on a 50% reduction in a narrower but harder outcome of death, nonfatal MI, or nonfatal stroke (the revised primary endpoint). I find it curious that the authors justify their choice of the revised primary endpoint by calling it a ‘DSMB-approved endpoint’! Since when is the DSMB charged with choosing or approving trial endpoints?
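Why the assumed delta matters can be seen from a standard event-count formula: the larger the effect a trial assumes, the fewer events it needs, so assuming a bigger effect on a harder (and rarer) endpoint makes a small trial look adequately powered. The sketch below is an illustration only, not the investigators' actual calculation. It assumes the Schoenfeld approximation for a two-sided log-rank test with 1:1 allocation, 80% power, α = 0.05, and treats the 36% and 50% relative risk reductions as hazard ratios of 0.64 and 0.50.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Schoenfeld approximation (an assumption for illustration): total events
    needed for a two-sided log-rank test with 1:1 allocation to detect hazard_ratio."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


# Relative risk reductions from the text, treated as hazard ratios (assumption).
for label, hr in [("36% reduction (original endpoint)", 0.64),
                  ("50% reduction (revised endpoint)", 0.50)]:
    print(f"{label}: about {math.ceil(required_events(hr))} events")
```

Under these assumptions the 50% target needs well under half the events of the 36% target (roughly 66 versus 158), which is why a larger delta on a harder endpoint runs against convention.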
Incidentally, the Proschan-Hunsberger method refers to conditional, not unconditional, power. To compute conditional power, the investigators must have looked at the data by treatment arm. Some penalty should therefore be paid for this ‘interim look’, in the form of requiring a larger z-score (lower p value) to claim statistical significance. The investigators do not appear to have done this.
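To see why conditional power requires unblinded, by-arm data, consider the common "current trend" formula for conditional power, written with the B-value B(t) = Z(t)·√t. This is a generic sketch, not necessarily the exact Proschan-Hunsberger procedure the investigators used; the interim z statistic, information fraction, and one-sided α = 0.025 below are hypothetical inputs.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim: float, info_fraction: float, alpha: float = 0.025) -> float:
    """Conditional power under the 'current trend' assumption (B-value formulation).

    z_interim: interim z statistic comparing the arms (requires unblinded data by arm)
    info_fraction: fraction of total statistical information observed so far, 0 < t < 1
    alpha: one-sided significance level of the final test
    """
    std_normal = NormalDist()
    t = info_fraction
    z_crit = std_normal.inv_cdf(1 - alpha)
    b_t = z_interim * math.sqrt(t)  # B-value at the interim look
    drift = b_t / t                 # drift estimated from the data so far
    # Remaining increment B(1) - B(t) ~ N(drift * (1 - t), 1 - t); we want P(B(1) > z_crit).
    return 1 - std_normal.cdf((z_crit - b_t - drift * (1 - t)) / math.sqrt(1 - t))


# Hypothetical interim look halfway through the trial.
print(conditional_power(z_interim=1.5, info_fraction=0.5))
```

The only data-dependent input is the interim z statistic for treatment versus control, which is exactly the unblinded comparison that, in the argument above, should carry an alpha penalty.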
Strength of evidence
The conventional frequentist approach relies heavily on the p value, which tends to overstate the strength of association. Complementary approaches such as Bayesian inference use the Bayes factor, a more desirable metric than the p value for quantifying the strength of evidence. For instance, the Bayes factor against the null associated with a p value of 0.03 (observed in the trial) is about 10. This means that, starting from a prior null probability of 50%, there is still roughly a 10% probability that the null hypothesis is true given the trial results, more than 3-fold higher than that implied by a p value of 0.03. So the evidence against the null is of at most ‘moderate’ strength.
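The arithmetic behind these figures can be sketched as follows. This assumes the "minimum Bayes factor" calibration, exp(−z²/2), which gives the strongest evidence against the null that any alternative hypothesis can provide for a given two-sided p value; the text does not say which calibration was used, so treat this as one plausible reconstruction.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_two_sided: float) -> float:
    """Minimum Bayes factor (null vs. alternative) for a two-sided p value: exp(-z^2 / 2)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return math.exp(-z * z / 2)


def posterior_null_probability(p_two_sided: float, prior_null: float = 0.5) -> float:
    """Posterior probability of the null after updating prior odds by the minimum Bayes factor."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor(p_two_sided)
    return posterior_odds / (1 + posterior_odds)


bf = min_bayes_factor(0.03)
print(f"Bayes factor against the null: about {1 / bf:.1f}")
print(f"Posterior probability of the null: about {posterior_null_probability(0.03):.1%}")
```

With p = 0.03 and a 50% prior this gives a Bayes factor of roughly 10.5 against the null and a posterior null probability just under 9%, in line with the rounded figures above.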
Another way of assessing the strength of evidence is to quantify the probability of repeating a statistically significant result, the so-called ‘replication probability’. The replication probability associated with a p value of 0.03 is about 58%, which is unlikely to pass muster with any regulatory agency. The FDA regulatory standard for drug approval is ‘substantial evidence’ of effectiveness based on ‘adequate and well-controlled investigations’, which translates into 2 trials, each with a p value < 0.05. At the heart of this standard (or any scientific endeavor) is replication. The replication probability for 1 trial with a p value < 0.05 is only about 50%; the replication probability of 2 trials with p value < 0.05 is about 90%. In 1997 the rules were changed to allow approval on the basis of a statistically persuasive result obtained in 1 trial, i.e., a p value < 0.001 for a mortality or a serious irreversible morbidity endpoint. A p value of 0.001 is roughly equivalent to 2 trials, each with a 1-sided p value of 0.025 (0.025 × 0.025 = 0.000625, i.e., about 0.001). Thus, the current trial results do not amount to ‘substantial’ or ‘robust’ evidence.
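The single-trial replication figures can be reproduced with a simple calculation, assuming the true effect equals the effect observed in the first trial. Under that assumption a same-sized repeat trial's z statistic is normally distributed around the observed z with unit variance. This is one common definition of replication probability, assumed here rather than taken from the text.

```python
from statistics import NormalDist


def replication_probability(p_two_sided: float, alpha: float = 0.05) -> float:
    """Probability that an identical repeat trial reaches two-sided p < alpha,
    assuming the true effect equals the effect observed in the first trial."""
    std_normal = NormalDist()
    z_obs = std_normal.inv_cdf(1 - p_two_sided / 2)  # observed z statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # significance threshold
    # Repeat trial's z ~ N(z_obs, 1); the tiny chance of significance in the
    # opposite direction is ignored.
    return std_normal.cdf(z_obs - z_crit)


for p in (0.05, 0.03, 0.001):
    print(f"p = {p}: replication probability about {replication_probability(p):.0%}")
```

A result sitting exactly at p = 0.05 has only a coin-flip chance of replicating, and p = 0.03 improves that only to about 58%.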
Distribution of endpoints
It seems highly unusual that 80% of the primary events were fatal. If true, it means that the subjects were dying either from non-MI, non-stroke events such as sudden cardiac death or heart-failure death (as in patients with advanced heart failure), or from non-cardiovascular events not accounted for by the adjudication process.
Although many have discussed how adjusting for baseline covariates in the analysis of RCTs can improve the power of analyses of treatment effect and account for any imbalances in baseline covariates, the debate on whether this practice should be carried out remains unresolved. Many recommend that such an analysis be undertaken only if the methods of analysis and choice of covariates are pre-specified in the protocol or statistical analysis plan. Whether that was the case here is not easily discernible without registration of the trial.