When (Not) to Stop a Clinical Trial for Benefit JAMA. 2005;294(17):2228-2230 (doi:10.1001/jama.294.17.2228)
Medical Practice; Medical Ethics; Randomized Controlled Trial
Randomized Trials Stopped Early for Benefit: A Systematic Review
Stuart J. Pocock, PhD
Author Affiliation: Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, England.
Corresponding Author: Stuart J. Pocock, PhD, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, England (stuart.pocock@lshtm.ac.uk).
See also p 2203.
JAMA, November 2, 2005—Vol 294, No. 17 (Reprinted). 2005 American Medical Association. All rights reserved.

In this issue of JAMA, Montori and colleagues1 provide a valuable extensive and critical systematic review of clinical trials that were stopped early for benefit. Readers of the reports of such trials often feel a sense of excitement, especially when phrases such as “a major treatment advance,” “ethical need to stop the inferior treatment,” and “vital to tell the world immediately” are used. However, experience suggests that early results and enthusiasm, especially for modestly sized trials terminated early for apparent major benefit, are often moderated as subsequent [...]

The skeptic should ask first whether correct and appropriate structures were in place for analyzing and reviewing, and making decisions based on, the trial’s accumulating interim data. Having the members of an effective independent data monitoring committee (DMC) or data and safety monitoring board as the only individuals accessing and interpreting interim data split by treatment group is now considered an essential part of good practice for major randomized trials.3-5 Still, a substantial minority of reported major trials appear not to have a DMC in place.6

Second, with or without a formal DMC recommendation, another question is whether the decision to stop a trial early and report the results was an appropriate judgment. This decision should be aided by a predefined statistical stopping boundary for a primary outcome,7-9 but some trials have no such guideline. It is important that such a boundary is sufficiently stringent (eg, very strong evidence of a treatment difference with a very small P value) to match the ethical and public health implications of a decision to stop the trial. In a spirit of requiring proof beyond reasonable doubt that a treatment difference is sufficient to affect future clinical practice, some lenient statistical boundaries are not a sensible choice in the direction of benefit. For instance, the so-called Pocock boundary9 and the O’Brien-Fleming boundary’s last interim look9 both typically require values around P=.02 for stopping, which is usually insufficient strength of evidence to stop a trial for benefit. Both boundaries can be made more appropriate if the overall type I error is set at 1% rather than 5%.

Many complex methods exist for statistical stopping boundaries, whereas in practice there is considerable merit in the simple Haybittle-Peto boundary,9 which requires P<.001 as evidence required to consider stopping a trial early for benefit. Even so, such a boundary should not be applied too soon, when few outcome events have been observed.

Decisions on early stopping (or not) need to be based on wise judgments interpreting the totality of available evidence, both in the current trial (considering primary and other efficacy outcomes and safety issues) and in other external evidence (especially from related trials).10 Accordingly, a statistical stopping boundary is only one useful objective component in an inevitably more challenging decision-making process. The ethical dilemma is to safeguard the interests of patients randomized in the current trial while also protecting society from overzealous premature claims of treatment benefit.11 For instance, if a trial is evaluating a treatment meant to be given long-term for conditions such as hypertension or chronic arthritis, short-term benefits, no matter how statistically significant, may not merit early stopping. If a trial is for regulatory approval, the sponsor and trialists should be encouraged not to stop early unless there is overwhelming evidence of treatment superiority, since the regulators require substantial evidence of both efficacy and
safety, often in at least 2 trials reaching their intended full [...]

Montori et al1 rightly draw attention to some reports of trials that were stopped early but that did not document the planned size and circumstances of the relevant interim analysis and stopping boundary. Such deficiencies need correcting by authors, peer reviewers, and editors in line with CONSORT recommendations.12 Indeed, journals should consider rejecting the report of any trial potentially stopped prematurely and lacking adequate documentation, and access to trial protocols by journals would help in making this decision. There is probably less need to present adjusted analyses that attempt to correct for the interim monitoring and early stopping, since stopping depends on more than a statistical boundary, and complexities of adjustment can clutter the presentation of results and make interpretation of the findings more difficult. Real insight rests more on a full understanding of the circumstances at the time of stopping. Also, between the moment of making the decision to stop and locking the final database used for analysis and publication, substantial additional and corrected data may become available for analysis. Indeed, such data cleaning may justify a pause before any definite decision to stop the trial.

From a reader’s perspective, the key problem is whether to believe the treatment benefit is truly as great as the data imply. Montori et al1 appropriately emphasize that trials stopping early will tend to be on a “random high” of observed benefit, and if further data had been collected in either this or another trial, some “regression to the truth” to a more modest effect estimate would occur.2,13 These issues are more [...]

Montori et al reported a median of 66 events observed at the time trials were stopped. To achieve a difference between treatment that is significant at P<.001 requires a split by treatment group of at least 46 vs 20 events, which means that risk happens to be reduced by 57% or more. In most therapeutic areas, this is highly implausible and is often associated with relatively short patient follow-up time. Thus in many settings, trials should not stop so soon, because it is highly likely that the therapeutic claim is [...]

The data monitoring experience in the CHARM program in 7599 patients with heart failure provides a thought-provoking example.14 At the fourth interim analysis with a median 1-year follow-up, there were 260 vs 339 deaths in the candesartan and placebo groups, respectively, a 24% risk reduction that crossed the P<.001 stopping boundary. For several documented reasons,14 the DMC voted to continue until the next interim analysis. The treatment mortality difference was then attenuated in subsequent interim analyses so that at the trial’s intended completion with a median of 3.1 years of follow-up, there were 886 deaths in the candesartan group vs 945 deaths in the placebo group, a 9% risk reduction (P = .055). Early stopping was resisted, and hence an exaggerated claim of survival benefit was avoided and important long-term benefits in other outcomes, such as cardiovascular death and heart failure hospitalization, were realized in each of the 3 component trials of the CHARM program.

So when is it appropriate to stop a trial early? The ASCOT factorial trial’s data monitoring experience provides useful insights.15,16 First, in 10305 patients with hypertension, the comparison of atorvastatin with placebo was halted when the difference in the primary end point, major coronary events, at interim analysis reached P<.001, the stopping boundary. With 100 vs 154 primary events in the atorvastatin and placebo groups, respectively, and a risk ratio of 0.64 (P = .0005), the published result was clear-cut.15 The appropriateness of stopping early was supported by other trials of statins in other populations and by important benefits in other outcomes, such as stroke.

A more difficult stopping decision arose in the ASCOT trial for the 19342 patients randomized to receive amlodipine-based and atenolol-based regimens. The predefined primary end point was major coronary events, whereas it is well known that the key effect of antihypertensive treatment is in reducing risk of stroke. Thus, when there emerged a highly significant reduction in stroke for amlodipine-based compared with atenolol-based treatment (P<.001), much debate ensued on whether to stop the trial, resulting in a decision to continue to the next interim analysis. Some months later, the trial was stopped early when there was also a significantly higher rate of mortality in the atenolol-based group, although still no significant difference existed for the primary end point. This example illustrates the complexities and tough decisions that can [...]

Can a trial be stopped on the basis of secondary end points? Perhaps not, but on occasion, such as with the ASCOT-BPLA study, results of secondary end points (327 strokes with amlodipine vs 422 with atenolol, a 23% risk reduction [P = .0003]) provide convincing evidence of great public health importance.16 In lay terms, “when early results proved so promising it was no longer fair to keep patients on the older drugs for comparison, without giving them the opportunity to change.”18 However, the data in these 2 examples are more substantial compared with those in the majority of trials reviewed by Montori et al. The message is clear: most trials stopped early for benefit should not have been stopped at that point. Stopping for harm or futility is another matter19 that equally importantly requires future systematic review and comment. Inappropriate stopping of trials for commercial reasons raises additional serious concerns.

In summary, all major randomized trials should have an independent DMC that functions effectively and makes wise judgments aided by stringent statistical stopping boundaries for benefit. It is critical that the DMC, principal investigators, executive committees, and sponsors all recognize the full public health implications of their recommendations [...]
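
The event-count arithmetic quoted in the text (a median of 66 events, a split of 46 vs 20, and a risk reduction of about 57%) can be explored with a short sketch. The editorial does not state which test underlies its figures, so everything here is an assumption made for illustration: two equally sized groups, a simple normal approximation to the split of events, and the hypothetical helper names `normal_upper_tail` and `event_split_summary`. It is a rough check on the numbers, not the author's method and not a time-to-event analysis.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def event_split_summary(events_treated: int, events_control: int) -> dict:
    """Summarize an interim split of events between two groups.

    Assumption (not from the editorial): the groups are the same size, so
    under no treatment effect each event is equally likely to occur in
    either group and (control - treated) / sqrt(total) is approximately
    standard normal.
    """
    total = events_treated + events_control
    z = (events_control - events_treated) / math.sqrt(total)
    return {
        "z": z,
        "p_one_sided": normal_upper_tail(z),
        "p_two_sided": 2.0 * normal_upper_tail(abs(z)),
        # Crude relative reduction based on event counts alone.
        "relative_reduction": 1.0 - events_treated / events_control,
    }


# The split quoted in the text for a median of 66 events.
summary = event_split_summary(events_treated=20, events_control=46)
print(f"z statistic:        {summary['z']:.2f}")
print(f"one-sided P:        {summary['p_one_sided']:.5f}")
print(f"two-sided P:        {summary['p_two_sided']:.5f}")
print(f"relative reduction: {summary['relative_reduction']:.1%}")
```

Under this particular approximation the 46 vs 20 split gives a one-sided P value below .001 and a two-sided value slightly above it, so the exact threshold depends on the testing convention chosen. The crude reduction of 1 - 20/46 agrees with the "57% or more" quoted in the text. The same function can be applied to the other event counts quoted in the editorial, bearing in mind that published risk ratios from time-to-event analyses need not match ratios of raw counts.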
Financial Disclosures: None reported.

REFERENCES
1. Montori VM, Devereaux PJ, Adhikari NKJ, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203-2209.
2. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218-228.
3. Ellenberg S, Fleming T, DeMets D. Data Monitoring Committees in Clinical Trials: A Practical Perspective. Chichester, England: John Wiley & Sons; 2002.
4. Draft guidance for clinical trial sponsors on the establishment and operation of clinical trial data monitoring committees, 66 Federal Register 58151-58153
5. DAMOCLES Study Group. A proposed charter for clinical trial data monitoring committees: helping them to do their job well. Lancet. 2005;365:711-722.
6. Sydes M, Altman DG, Babiker AB, Parmar M, Spiegelhalter DJ; DAMOCLES Study Group. Reported use of data monitoring committees in the main published reports of randomised controlled trials: a cross-sectional study. Clin Trials J. 2004;
7. O’Brien P. Data and safety monitoring. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics. Chichester, England: John Wiley & Sons; 1998:1058-
8. Fleming TR, Harrington DP, O’Brien PC. Designs for group sequential tests. Control Clin Trials. 1984;5:348-361.
9. Schulz KF, Grimes DA. Multiplicity in randomised trials, II: subgroup and interim analyses. Lancet. 2005;365:1657-1661.
10. Brocklehurst P, Elbourne D, Alfirevic A. The role of external evidence in monitoring clinical trials: reflections from a perinatal trial. BMJ. 2000;320:995-998.
11. Pocock SJ. When to stop a clinical trial. BMJ. 1992;305:235-240.
12. Moher D, Schulz KF, Altman DG; CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987-1991.
13. Pocock S, White I. Trials stopped early: too good to be true? Lancet. 1999;353:
14. Pocock S, Wang D, Wilhelmsen L, Hennekens CH. The data monitoring experience in the Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity (CHARM) program. Am Heart J. 2005;149:939-943.
15. Sever P, Dahlöf B, Poulter NR, et al; ASCOT Investigators. Prevention of coronary and stroke events with atorvastatin in hypertensive patients who have average or lower-than-average cholesterol concentrations, in the Anglo-Scandinavian Cardiac Outcomes Trial—Lipid Lowering Arm (ASCOT-LLA): a multicentre randomised controlled trial. Lancet. 2003;361:1149-1158.
16. Dahlöf B, Sever PS, Poulter NR, et al; ASCOT Investigators. Prevention of cardiovascular events with an antihypertensive regimen of amlodipine adding perindopril as required versus atenolol adding bendroflumethiazide as required, in the Anglo-Scandinavian Cardiac Outcomes Trial—Blood Pressure Lowering Arm (ASCOT-BPLA): a multicentre randomised controlled trial. Lancet. 2005;366:
17. DeMets DL, Furberg CD, Friedman L. Data Monitoring in Clinical Trials: A Case Studies Approach. Heidelberg, Germany: Springer; 2005.
18. Hall C. Heart attacks may be cut by half. Daily Telegraph. September 5, 2005:1.
19. DeMets DL, Pocock SJ, Julian DG. The agonising negative trend in monitoring of clinical trials. Lancet. 1999;354:1983-1988.
20. Psaty BM, Rennie D. Stopping medical research to save money: a broken pact with researchers and patients. JAMA. 2003;289:2128-2130.