Looking for the Pony in the HERS Data
One of the great public health advances of the last century was the development of the randomized, controlled clinical trial, which is designed to control for known and unknown differences in persons who take or do not take medication (by the use of randomization), and for the effects of patient and provider expectation (by the use of

The late Reuel A. Stallones was fond of teaching that clinical trials are done during a narrow window of opportunity when there is enough evidence for benefit to justify the time and expense of the trial, but not so much evidence that it would be unethical to deprive participants of the “active” treatment by assigning them to placebo. This window is particularly difficult to achieve when the medication has been in use for many years, its benefit has been demonstrated repeatedly in epidemiological studies and in clinical practice, and when the thought leaders and major medical organizations have already recommended its widespread use.

Such was the case with hormone replacement therapy (HRT) and the prevention of heart disease. At the beginning of the Heart and Estrogen/progestin Replacement Study (HERS), a secondary prevention trial of estrogen in postmenopausal women with heart disease, several experts opined that this trial was unnecessary at best and unethical at worst, given the consistency of the observational data, which certainly looks very impressive in a meta-analysis, and the plethora of potential cardioprotective mechanisms for estrogen that have been demonstrated in vivo and in vitro.

It was therefore more than a bit of a shock when the HERS trial results showed no overall benefit to HRT and a completely unexpected early excess of cardiovascular events.1 The reaction of the research and clinical community to these results has been one of disbelief and denial accompanied by a frantic search for possible explanations for the "trial failure."

Subgroup analyses often are used to show that benefit in subgroups parallels benefit in the study overall. For example, it is useful and reassuring to know that results are equivalent in different age, sex, and ethnic groups, as illustrated in a recent subgroup analysis of the Dietary Approaches to Stop Hypertension (DASH)-sodium trial for high blood pressure.2 The rationale for the post hoc subgroup analyses in HERS is different. Can we find a subgroup of women who might be benefited or harmed by HRT when the overall results show

It can be argued that subgroup analysis of null results is akin to “looking for the pony.” In this modern fable, a man has two sons, one a hopeless pessimist and the other an unrealistic optimist. Determined to change their thinking to a less extreme position, the man buys a room full of toys for the pessimist and a room full of horse manure for the optimist. When he returns later, the pessimist is crying because his toys are already broken or soon will be. In contrast, the optimist is happily shoveling through his gift, explaining, “With all that manure there must be a pony in there somewhere.”

When the original HERS results were published in 1998,1 the authors already had looked very hard for the pony, which in this case consisted of characteristics that might explain the unexpected results of the HERS trial. In the present issue of Circulation, Furberg and colleagues3 publish the results of multiple subgroup analyses from HERS. They report 9 statistically significant interactions, that is, subgroups of women who did better or worse in the HRT group than in the placebo group, either overall or in the first year when excess cardiovascular events were observed. These 9 statistically significant comparisons out of 172 comparisons (86 tests for first-year outcomes and 86 for cumulative 4-year outcomes) approximate the 5 out of 100 differences expected by chance

There is no scientific way to reduce the number of associations sought in post hoc analyses or to determine which of the observed associations were not due to chance.
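
The arithmetic behind the comparison of 9 significant results among 172 tests with the 5 per 100 expected by chance can be checked in a few lines. The sketch below is illustrative only and is not part of the HERS analysis: it assumes each test uses a conventional 0.05 significance threshold and, as a simplification, that the 172 tests are independent, which overlapping subgroups in a single trial are not. The function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n_tests, alpha); assumes independent tests."""
    below = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below


N_TESTS = 172   # 86 first-year tests + 86 cumulative 4-year tests (from the text)
ALPHA = 0.05    # assumed per-test significance threshold
OBSERVED = 9    # significant interactions reported (from the text)

expected = expected_false_positives(N_TESTS, ALPHA)
p_tail = prob_at_least(OBSERVED, N_TESTS, ALPHA)

print(f"Expected significant by chance: {expected:.1f}")
print(f"P(at least {OBSERVED} significant by chance): {p_tail:.2f}")
```

Under these assumptions the expected count is 172 × 0.05 = 8.6, so 9 significant findings is about what chance alone would produce. The tail probability printed in the last line is a rough guide only, because the independence assumption does not hold for correlated subgroups.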

For example, Furberg et al3 report more heart disease for women assigned to HRT who were also taking digitalis, and less heart disease for women smokers who were assigned to HRT. Both of these associations are biologically plausible, in a stretch. Like estrogen, digitalis is a steroid and could have estrogen-like effects; perhaps too much estrogen is a bad thing. Women smokers treated with HRT have lower levels of estradiol than do nonsmokers; in this case, smoking prevents "too much" estrogen. In a previously published subgroup analysis,4 there was an apparent cardiovascular benefit for women with high lipoprotein(a) levels at baseline and harm for those with low lipoprotein(a). The benefit is plausible,
HERS subgroup analyses do suggest that HRT works in qualitatively different ways among a few subgroups. As concisely stated by Sackett and colleagues,5 however, the statistics of determining subgroup prognoses are about prediction, not etiology. “They are indifferent to whether the prognostic factor is physiologically logical . . . or a biologically nonsensical and random, noncausal quirk. . . .” Therefore, even when the difference in response makes biological sense, if it was not hypothesized before the trial and is not supported by similar results from another trial, the observation should not override conclusions based on the overall

Furberg et al3 have done a wonderful job indicating the caveats and limitations of multiple post hoc subset analyses. So then why perform these post hoc analyses? Any results will remain suspect unless confirmed in another trial. Nevertheless, despite the dangers of wrong conclusions, it would make no more sense to answer only the “main” question in a large clinical trial than it would to have a large observational study and not explore the data for disease associations

adverse events, showed no benefit.10,11 Letters to trial participants after the second and third year of the Women’s Health Initiative, in which one third of women are taking unopposed estrogen, report an excess of heart disease and stroke.12

Thus, all clinical trial data with CHD outcomes published to date support the revised American Heart Association position that estrogen should not be prescribed to prevent or treat CHD.13 The present publication does not suggest any subgroup likely to obtain benefit. The clinical trial results do not exclude the possibility that physiological levels of endogenous estrogen are cardioprotective.

HERS is not the first trial showing that best clinical practice can be misinformed when not evidence-based. In the recent past, clinical trials have shown that although medicine corrected dangerous cardiac arrhythmias, the patient was harmed rather than helped,14 and that, despite the strong and biologically plausible reason to hope that the antioxidant vitamin E would prevent cardiovascular disease, it does not.15

References
