Brazilian Journal of Probability and Statistics (2001), 15, pp. 201–220.
SURVIVAL ANALYSIS: PARAMETRICS TO SEMIPARAMETRICS TO PHARMACOGENOMICS
Pranab K. Sen
Departments of Biostatistics and Statistics, University of North Carolina at Chapel Hill, USA. Email: pksen@bios.unc.edu
Summary

Survival analysis, with its genesis in biometry and reliability analysis, evolved with statistical modeling and analysis of lifetime data and survival distributions. During the past three decades, the advent of clinical trials has expanded its domain to clinical epidemiology, biomedical sciences, and public health disciplines, encompassing a variety of regression and noncompliance models. In this evolutionary process, parametrics gave way to more robust nonparametrics and semiparametrics, where counting processes have stolen the methodologic limelight. The past few years have witnessed a phenomenal growth of research in genomic sciences. From drug research and developmental perspectives, in pharmacogenomics there is a genuine need to incorporate survival analysis (albeit in a somewhat different form) in genomic clinical trials (involving biomarkers). The present study aims to provide a general appraisal of survival analysis with due emphasis on clinical trials as well as pharmacogenomics.
Key words: Bioinformatics; clinical trials; counting processes; genetic markers; genome; martingales; MRL; noncompliance; pharmacogenetics; pharmacokinetics; regression models; reliability; surrogate endpoint.
Lifetime data models typically involve nonnegative stochastic response variables (Y) along with some explanatory or auxiliary variables Z, which may include nonstochastic design and stochastic concomitant variables (all of which need not be continuous or even count ones). In a simple univariate setup, for Y having a distribution function (d.f.) F(y), y ∈ R+, the survival function (s.f.) is defined as
S(y) = 1 − F (y) = P {Y > y}, y ∈ R+.
Note that S(0) = 1 (if there is no probability mass at y = 0), S(y) is nonincreasing in y (≥ 0), and S(∞) = 0. If F admits a density f = F′ almost everywhere (a.e.), then the hazard (failure) rate is defined as
h(y) = f (y)/S(y) = −(d/dy) log S(y), y ∈ R+,
so that h(y) is nonnegative a.e. Also, let

\[ H(y) = \int_0^y h(u)\,du, \quad y \in R^+, \]

be the cumulative hazard (integrated failure) rate. Then H(0) = 0, H(∞) = − log S(∞) = ∞, and H(y) is monotone nondecreasing in y. Further,

\[ S(y) = \exp\{-H(y)\}, \quad f(y) = h(y)\exp\{-H(y)\}, \quad y \in R^+. \]

Therefore, S(.), f(.) or F(.), more commonly used in biometric survival analysis, can equivalently be characterized in terms of H(.) or h(.), more popularly in use in reliability analysis. It did not, however, take too long to reconcile the two approaches in a common vein: reliability and survival analysis (RSA), as would be pursued here.
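The equivalence between the hazard-based and the s.f.-based descriptions can be checked numerically. The following is a minimal sketch, assuming a hypothetical linear hazard h(y) = a + by (the form and the values of a and b are illustrative choices, not taken from the text); it integrates h(.) to obtain H(.), and then recovers S(.) and f(.).

```python
import math

def hazard(y, a=0.5, b=0.2):
    """Hypothetical linear hazard h(y) = a + b*y (illustration only)."""
    return a + b * y

def cumulative_hazard(y, steps=10_000):
    """H(y) = integral of h over [0, y], by the trapezoidal rule."""
    if y == 0:
        return 0.0
    width = y / steps
    total = 0.5 * (hazard(0.0) + hazard(y))
    total += sum(hazard(i * width) for i in range(1, steps))
    return total * width

def survival(y):
    """S(y) = exp{-H(y)}."""
    return math.exp(-cumulative_hazard(y))

def density(y):
    """f(y) = h(y) * S(y)."""
    return hazard(y) * survival(y)

y = 2.0
# Closed form for this hazard: H(y) = a*y + b*y^2/2.
closed_form = math.exp(-(0.5 * y + 0.2 * y * y / 2.0))
print(survival(y), closed_form)
```

Any nonnegative hazard with a divergent integral could be substituted for `hazard`; the other three functions depend on it only through H(.).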
In the presence of explanatory variables (Z), along with Y, we need to work with the conditional s.f. S(y|z) = P{Y > y|Z = z}, or equivalently, the conditional h(y|z) or H(y|z), given Z = z. We relate such a conditional function to the explanatory variable Z by an appropriate regression model. In view of the properties of the s.f., or h(.), a conventional linear regression model may not be appropriate. In this context, a hazard regression model may have some practical convenience over S(y|z), particularly in a semiparametric setup, as would be considered later.
In a parametric formulation, inference on S(.) or F(.) can be drawn by using the dual combination of maximum likelihood estimators (MLE) and likelihood ratio tests (LRT). For conditional models (e.g., S(y|Z)) such parametric models may lose their simplicity and computational ease, and may not be robust to plausible model departures. On top of that, censoring (due to withdrawal, dropout, or noncompliance) is common in clinical trials and follow-up studies where survival analysis is especially useful. Often, the precise nature of censoring cannot be parameterized, and even if it could be, the compound event may lead to a rather complicated likelihood function. As such, the MLE and LRT may not only lose their natural appeal (and optimality) to a great extent, but there could also be some computational roadblocks. Nonparametrics has greater appeal in this respect, and much of this evolved around the Kaplan-Meier (1958) product-limit estimator (PLE).
Led by the ingenious partial likelihood formulation of Cox (1972, 1975), semiparametrics captured the prime attention of mathematicians as well as researchers in survival analysis, and martingale-attracted counting processes flooded statistical journals in the 1980’s and 1990’s; the rampage is still on. Yet there is some concern about unrestricted adoption of semiparametrics in survival analysis, especially when there are multiple endpoints or surrogate variables, time-varying covariates, and more notably, informative censoring.
In the evolving field of bioinformatics (designated as applications of mathematics, statistics, and information technology (including computer science) to the study and analysis of very large biological, and in particular, genetic data sets), genomics is the most significant component. To emphasize the basic role of stochastics in this setup, biostochastics has been annexed to the same complex (Sen 2001). With the nearly completed human genome project, genomic information is to be properly incorporated in drug discovery, drug development, and pharmaceutical research, as a whole. With assistance from knowledge discovery and data mining (KDDM) protocols, at present, pharmacokinetics, pharmacogenetics and pharmacogenomics are going for extensive data mining with the objectives of identifying (a) disease promoting genes, (b) disease preventing genes, (c) personal risks, and (d) the familial factors. The task is by no means simple. At present, the pharmaceutical industry plans to collect and utilize genetic marker data from patients in contemplated clinical trials with the hope that combining clinical data and genetic marker data would yield deeper insights into both efficacy and safety. Little is known on how to plan a pharmacogenomic clinical trial to incorporate genetic effects; presently, KDDM tools are used for genomic simulations. In this challenging statistical task, survival analysis, because of its affinity to clinical trials, has the right potential for providing resolutions.
The present study aims to appraise nongenomic survival analysis with an eye on the genomic counterpart, along with the hope that it would cater for the needed methodologic developments in the near future.
In this phase, the evolution of parametrics and its transition to nonparametrics are highlighted by the reconciliation with reliability analysis and formulation of the PLE. Parametrics evolved around the exponential law:
\[ S_1(t) = e^{-\lambda t} \quad \text{or} \quad h_1(t) = \lambda, \quad \forall t \ge 0, \]
where λ is a positive constant; this constancy of h(t) is a characteristic property of the exponential s.f. which may not generally match applications. To accommodate nonconstant hazard functions, other parametric s.f.’s have been considered; they include the Weibull and gamma families, having monotone hazards. The Weibull s.f. has the form
\[ S_2(t) = e^{-\lambda t^{\gamma}} \quad \text{or} \quad h_2(t) = \lambda\gamma t^{\gamma-1}, \quad t \ge 0, \]
where λ and γ are both positive quantities. Thus, h2(t) is increasing or decreasing in t according as γ is > or < 1; for γ = 1, we have the exponential case. A gamma density is given by
\[ f_3(t) = \frac{\lambda^{\gamma}}{\Gamma(\gamma)}\, e^{-\lambda t}\, t^{\gamma-1}, \quad t \ge 0,\ \lambda > 0,\ \gamma > 0, \]
so that the hazard rate is increasing or decreasing in t ≥ 0 according as γ is > or < 1.

The first spark of nonparametrics exploited this monotonicity property under increasing generality.
(1) The IFR/DFR family: S(.) belongs to the increasing/decreasing failure rate family according as h(t) is increasing/decreasing in t ≥ 0.
(2) The NBU/NWU family: S(.) belongs to the new better/worse than used family according as

\[ S(u) \ge (\le)\; S(t+u)/S(t), \quad \forall u, t \ge 0. \]
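We also define the mean remaining life (MRL) at time t by

\[ \mu(t) = E(Y - t \mid Y > t) = \{S(t)\}^{-1} \int_t^{\infty} S(u)\,du, \quad t \ge 0. \tag{2.5} \]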
(3) The DMRL/IMRL family, characterized by µ(t) being decreasing or increasing in t ≥ 0. Similarly, the NBUE/NWUE family is characterized by µ(t) ≤ (≥) µ(0), t ≥ 0.

There are other families studied extensively in reliability analysis; in survival analysis, the MRL function M = {µ(t), t ≥ 0} has a special role, and we shall refer to that later on. In the parametric case, the log-normal and log-logistic s.f.’s are also sometimes used.
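The link between monotone hazards and the MRL function can be illustrated with the Weibull family. The sketch below evaluates the hazard in closed form and approximates µ(t) by numerical integration of the s.f.; the truncation point `upper`, the number of `steps`, and the parameter values are assumptions made for this illustration only.

```python
import math

def weibull_sf(t, lam, gam):
    """Weibull survival function S(t) = exp(-lam * t**gam)."""
    return math.exp(-lam * t ** gam)

def weibull_hazard(t, lam, gam):
    """Weibull hazard h(t) = lam * gam * t**(gam - 1), for t > 0."""
    return lam * gam * t ** (gam - 1.0)

def mean_remaining_life(t, lam, gam, upper=200.0, steps=50_000):
    """mu(t) = (1/S(t)) * integral_t^upper S(u) du (trapezoidal rule).

    `upper` truncates the infinite integral; the neglected tail is assumed
    to be negligible for the parameter values used below.
    """
    width = (upper - t) / steps
    total = 0.5 * (weibull_sf(t, lam, gam) + weibull_sf(upper, lam, gam))
    for i in range(1, steps):
        total += weibull_sf(t + i * width, lam, gam)
    return total * width / weibull_sf(t, lam, gam)

for gam in (0.5, 1.0, 2.0):
    hazards = [round(weibull_hazard(t, 1.0, gam), 4) for t in (0.5, 1.0, 2.0)]
    mrls = [round(mean_remaining_life(t, 1.0, gam), 4) for t in (0.5, 1.0, 2.0)]
    print("gamma =", gam, "hazard:", hazards, "MRL:", mrls)
```

With γ > 1 the hazard increases and the MRL decreases, with γ < 1 the directions are reversed, and γ = 1 gives the exponential case in which both are constant.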
In many (biometric) survival analysis setups, without prior information on monotone hazards, or preference for log-normal or log-logistic s.f.’s, we may prefer a nonparametric approach. Based on n independent and identically distributed (i.i.d.) nonnegative random variables (r.v.) Y1, . . . , Yn, the empirical (sample) s.f. is defined as

\[ S_n(y) = n^{-1} \sum_{i=1}^{n} I(Y_i > y), \quad y \in R^+. \]
Although Sn(.) is an optimal nonparametric estimator of S(.), it is a step function with decrement n^{−1} at the n ordered sample points. Thus, even though S(.) may be absolutely continuous, Sn(.) may not be smooth enough to match such properties. As such, if our goal is to estimate f(.), h(.) or some other functional of f(.), the empirical s.f. Sn(.) may not suit; we may need suitable nonparametric estimators of f(.), h(.) etc., derived essentially from Sn(.) through smoothing to a desired extent. Among the popular smoothing approaches we may mention the following.
(i) Kernel Smoothing (including the nearest neighbor (NN) method). For a suitable kernel density k(.) = {k(u), u ∈ R}, normalized to have zero mean, unit variance and some other restraints, and for a suitable sequence {bn} of positive numbers, converging to zero as n → ∞, let us consider an estimator of the density f(.) of the form

\[ \hat f_n(t) = (n b_n)^{-1} \sum_{i=1}^{n} k\big((t - Y_i)/b_n\big), \quad t \in R^+. \]

It follows then by some routine steps that

\[ E\{\hat f_n(t)\} - f(t) = O(b_n^2), \qquad \mathrm{Var}\{\hat f_n(t)\} = O\big((n b_n)^{-1}\big), \]

so that the mean squared error (MSE) of \hat f_n(t) is of the order

\[ O(b_n^4) + O\big((n b_n)^{-1}\big). \]

In that sense, a minimizer of the MSE leads to a natural choice of bn as O(n^{−1/5}) with an order n^{−4/5} for the MSE (instead of the usual order n^{−1} prevailing in a parametric model). The actual choice of bn depends on t, f(t) and the chosen k(.). Integrated MSE is usually used to have an adaptive choice. In particular, if the kernel density is chosen to be uniform on (−bn/2, bn/2), we have the usual version of the NN method; other versions can be treated in a similar way. Having obtained the density estimate, the hazard rate can be estimated by plugging in the density estimate. Instead of the kernel method, bounded spline methodology has also been developed for log-density and hazard functions; we may refer to Green and Silverman (1994).
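The kernel estimator and the plug-in hazard estimate can be sketched as follows. This is a minimal illustration, assuming a standard normal kernel (zero mean, unit variance), the bandwidth bn = n^{−1/5} with proportionality constant one, and the empirical s.f. in the denominator of the hazard estimate; none of these specific choices is prescribed by the text, and no boundary correction at t = 0 is attempted.

```python
import math

def empirical_sf(sample, t):
    """S_n(t) = proportion of observations exceeding t."""
    return sum(1 for y in sample if y > t) / len(sample)

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel k(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(sample, t, bandwidth=None):
    """f_n(t) = (n*b_n)^{-1} * sum_i k((t - Y_i)/b_n)."""
    n = len(sample)
    if bandwidth is None:
        bandwidth = n ** (-1.0 / 5.0)  # assumed constant of proportionality: 1
    total = sum(gaussian_kernel((t - y) / bandwidth) for y in sample)
    return total / (n * bandwidth)

def kernel_hazard(sample, t, bandwidth=None):
    """Plug-in hazard estimate: density estimate divided by S_n(t)."""
    surv = empirical_sf(sample, t)
    if surv == 0.0:
        raise ValueError("no observations beyond t; hazard estimate undefined")
    return kernel_density(sample, t, bandwidth) / surv

# Hypothetical data: a grid of quantiles of the unit exponential law.
n = 200
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
print(kernel_density(sample, 1.0), kernel_hazard(sample, 1.0))
```

For the unit exponential law the true hazard is identically one, so the printed hazard estimate at t = 1 gives a rough indication of the smoothing bias.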
(ii) Differentiable Smoother. Following Chaubey and Sen (1996), let us define a triangular array of nonnegative weight functions:

\[ w_{n,k}(ty) = \frac{(ty)^k/k!}{\sum_{r=0}^{n} (ty)^r/r!}, \quad k = 0, \ldots, n;\ t \ge 0,\ y \ge 0. \]
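Let {λn, n ≥ 1} be a sequence of (possibly stochastic) positive numbers, such that λn increases (a.s.) but n^{−1}λn decreases (a.s.) with n. For example, one may choose λn = max{Yi : i = 1, . . . , n}. Consider then the estimator

\[ \tilde S_n(t) = \sum_{k=0}^{n} w_{n,k}(t\lambda_n)\, S_n(k/\lambda_n), \quad t \ge 0. \tag{2.10} \]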
Note that the wn,k(ty) are continuously differentiable (with respect to t), so that S̃n(t) is also so, and hence, it yields a smooth estimator of f(.) as well as h(.). We refer to Chaubey and Sen (1996) for details.

In survival analysis, noncompliance often crops up for various reasons. Among the different modes of noncompliance, censoring of various types is most commonly encountered. For example, a clinical trial may be planned for a prefixed (nonstochastic) duration of time, so that responses observed in that time interval are all recorded, while those beyond the termination point are all censored. This is termed (right) Type I censoring or right truncation; here the number of censored observations is random. Left censoring may be defined in a similar way. In some other cases, a study may be planned for a (stochastic) duration of time, just enough to produce a prefixed number of responses; that is termed Type II censoring. In either case, the s.f. S(t) can be estimated only for the period under study, and this can be done with simple adjustments. A more complex type of censoring, known as random censoring, may arise due to random entry of the subjects into the study plan, and/or their withdrawal or dropping out, also conceived in a random pattern. In this setup, the censoring may or may not be (statistically) dependent on the design or the primary response categories, and in that sense, it is said to be informative or noninformative (random) censoring. Informative censoring may generally require more complicated statistical modeling (depending on the informative nature), and in statistical modeling in survival analysis, noninformative censoring schemes have received greater attention (possibly due to greater mathematical convenience). This body of statistical research evolves around the celebrated Kaplan and Meier (1958) PLE of S(.). Let the failure times Y1, . . . , Yn be i.i.d. nonnegative r.v.’s with the s.f. S(.), and let the censoring times C1, . . . , Cn be i.i.d. r.v.’s with s.f. SC(.); it is assumed that Ci and Yj are independent for all i, j = 1, . . . , n. The observable random elements are

\[ X_i = Y_i \wedge C_i, \quad I_i = I(Y_i = X_i), \quad i = 1, \ldots, n. \tag{2.12} \]
Let

\[ \alpha_i(t) = I(X_i \le t,\ I_i = 1), \quad i = 1, \ldots, n, \qquad N_n(t) = \sum_{i=1}^{n} I(X_i \ge t), \quad t \ge 0. \tag{2.13} \]

Thus, Nn(t) is the number at risk (among the n subjects) who have not failed or been censored prior to time t, while αi(t) is the indicator function for the ith subject’s failure by time t. Under the assumed independence of Y and C, the s.f. for X is So(t) = S(t)SC(t), t ≥ 0. Based on this basic setup, the PL estimator Pn(t) of S(t) is defined as follows.

\[ P_n(t) = \prod_{i=1}^{n} \left\{ \frac{N_n(X_i) - 1}{N_n(X_i)} \right\}^{\alpha_i(t)}, \quad 0 \le t \le \tau_n, \]

where τn = max{Xi : i = 1, . . . , n}; Pn(t) = 0, ∀t > τn, and the nonincreasing, nonnegative stochastic process {Pn(t), t ≥ 0} can be regarded as a functional of the counting process {Nn(t), t ≥ 0}. If there is no censoring, all the Ii are equal to 1, so that Pn(.) reduces to Sn(.), as it should. When censoring is not stochastically independent of Y, the product formula So(t) = S(t)SC(t) may not hold, and hence the PLE may not be appropriate.
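The product-limit construction can be sketched in a few lines. The function names and the toy data below are hypothetical; tied failure times are handled by the usual factor 1 − d/N (d failures among N at risk), which goes slightly beyond the no-ties setting described above, and the convention Pn(t) = 0 beyond τn is not imposed by `evaluate`.

```python
def product_limit(times, events):
    """Kaplan-Meier product-limit estimator.

    times  : observed X_i = min(Y_i, C_i)
    events : indicators I_i (1 = failure, 0 = censored)
    Returns a list of (failure time, estimate just after that time).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # N_n(t) at the current time point
    surv = 1.0
    steps = []
    index = 0
    while index < len(data):
        t = data[index][0]
        tied = [e for x, e in data if x == t]
        failures = sum(tied)
        if failures > 0:
            surv *= 1.0 - failures / at_risk
            steps.append((t, surv))
        at_risk -= len(tied)     # failures and censorings both leave the risk set
        index += len(tied)
    return steps

def evaluate(steps, t):
    """Value of the step function at time t (equal to 1 before the first failure)."""
    value = 1.0
    for time, surv in steps:
        if time <= t:
            value = surv
    return value

# Hypothetical toy data: the second and fifth observations are censored.
steps = product_limit([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(steps)
```

When all indicators equal 1, the returned values coincide with the empirical s.f., in line with the reduction of Pn(.) to Sn(.) noted above.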
Like Sn(.), the PLE Pn(.) is a step function, but its decrements may not all be equal (to n^{−1}), and hence, it may not be convenient to use it directly for estimating the density or the hazard function in the presence of random censoring. For this reason, suitable smoothing tools are desired to have estimates of functionals of the density under random censoring. Following Bhattacharjee and Sen (1995), we rewrite the PL estimator as follows. Let mn be the total number of failure points, and let these ordered failure points be denoted by X∗n:i, i = 1, . . . , mn. Then,

\[ S_n^*(t) = \prod_{i=1}^{j} \frac{n - k_i}{n - k_i + 1}, \quad X^*_{n:j} \le t < X^*_{n:j+1},\ j = 0, 1, \ldots, m_n, \]

where X∗n:0 = 0, X∗n:mn+1 = ∞, an empty product is taken to be 1, and X∗n:j has the rank kj among all the Xi, i = 1, . . . , n, for j = 1, . . . , mn. It is naturally appealing to use the weights {wn,k(tλn)} in (2.10) to obtain a smooth version which would lead us to suitable estimates of f(.) or h(.) in the presence of random censoring. However, there are some technical difficulties, particularly for large values of t, and hence, Chaubey and Sen (1998) suggested the use of Poisson weights which lead to better estimates for the tail. Thus, we consider the smooth estimator

\[ \tilde S_n^*(t) = \sum_{k \ge 0} \{e^{-t\lambda_n} (t\lambda_n)^k / k!\}\, S_n^*(k/\lambda_n), \quad t \ge 0. \tag{2.16} \]

The estimator of f(.) (and h(.)) can readily be obtained from S̃∗n(.) by direct differentiation with respect to t.
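A Poisson-weight smoother of a step s.f., together with the density estimate obtained by differentiation, can be sketched as below. The step function `example_step_sf`, the value of `lam` (playing the role of λn), and the truncation rule for the infinite sum are all assumptions made for illustration. The density line uses the identity d/dt {e^{−m} m^k/k!} = λ{p_{k−1}(m) − p_k(m)} with m = tλ, which after summation by parts gives λ Σ p_k(m){S(k/λ) − S((k+1)/λ)}.

```python
import math

def poisson_smooth(step_sf, t, lam):
    """Return (smoothed s.f., smoothed density) at time t.

    step_sf : callable giving a nonincreasing step survival function
    lam     : positive smoothing constant (role of lambda_n)
    """
    mean = t * lam
    # Truncate the Poisson sum well beyond its mean (assumed adequate here).
    k_max = int(mean + 10.0 * math.sqrt(mean) + 20.0)
    weight = math.exp(-mean)            # Poisson probability of k = 0
    surv = 0.0
    dens = 0.0
    for k in range(k_max + 1):
        here = step_sf(k / lam)
        nxt = step_sf((k + 1) / lam)
        surv += weight * here
        dens += weight * lam * (here - nxt)
        weight *= mean / (k + 1)        # move to the probability of k + 1
    return surv, dens

def example_step_sf(x):
    """Hypothetical step s.f. with jumps of 1/4 at the points 1, 2, 3, 4."""
    return sum(1 for y in (1.0, 2.0, 3.0, 4.0) if y > x) / 4.0

for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
    surv, dens = poisson_smooth(example_step_sf, t, lam=4.0)
    print(t, round(surv, 4), round(dens, 4))
```

A smooth hazard estimate follows as the ratio of the two returned values wherever the smoothed s.f. is positive. For very large tλ the starting weight e^{−tλ} underflows, so a production version would need a more careful evaluation of the Poisson probabilities.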
Let us briefly comment on the MRL function, defined in (2.5). One obvious way to estimate µ(t) is to replace S(t) by Sn(t) or Pn(t); but then for t > τn, the estimator will not be defined properly. Moreover, the estimated MRL will be a nonsmooth function. Hence, Chaubey and Sen (1999) recommended the use of the Poisson weights based smooth estimator, as introduced above in (2.16). That results in a smooth estimator which extends beyond τn.
Two-sample life testing models are the precursor to comparative clinical trials, where placebo vs. treatment comparisons are quite analogous in this respect. In this comparison, there is a subtle emphasis on the hypothesis testing aspect. Although parametric testing procedures are quite appealing, except for the exponential s.f., an exact test may often be difficult to construct, especially under Type I or Type II censoring. Nonparametrics fare better in this respect. A major part of the developments in nonparametrics took place in the 1960’s and 1970’s, and progress continues. Basically, for the log-survival times for the placebo and treatment group observations, a classical two-sample linear rank test can be used. Such a two-sample linear rank statistic can be expressed as a statistical functional of the two s.f.’s, and more simply, in terms of the ranks of the individual sample observations (in the combined sample) incorporating suitable scores. A two-sample linear rank statistic is a special case of (regression) linear rank statistics when the regression constants are binary. Within this class of tests, an optimal one can be chosen under Pitman (contiguous) alternatives. Such a test statistic is conditionally distribution-free under Type I and distribution-free under Type II censoring. These tests are globally robust, and do not require the proportional hazard model (PHM) assumption underlying the semiparametric counterparts. Under a PHM, the log-rank score test can be identified as an optimal one, though plausible departures from the PHM may drastically take away its (asymptotic) optimality properties. We may refer to Sen (1981, Ch. 11) for a general treatise of related nonparametrics.
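A two-sample linear rank test for uncensored data can be sketched as follows. The statistic sums the scores attached to the pooled-sample ranks of the treatment observations; it is standardized by its permutation mean and variance and referred to the normal law. The log-rank (Savage) and Wilcoxon scores shown, the function names, the normal approximation, and the toy data are illustrative assumptions; ties are assumed absent. Since ranks are invariant under the logarithmic transformation, the raw survival times are used directly.

```python
import math

def savage_scores(n):
    """Log-rank (Savage) scores a_n(i) = sum_{j<=i} 1/(n-j+1) - 1."""
    scores, running = [], 0.0
    for i in range(1, n + 1):
        running += 1.0 / (n - i + 1)
        scores.append(running - 1.0)
    return scores

def wilcoxon_scores(n):
    """Wilcoxon scores a_n(i) = i/(n+1)."""
    return [i / (n + 1.0) for i in range(1, n + 1)]

def two_sample_rank_test(control, treated, score_fn=savage_scores):
    """Return (statistic, z, upper-tail p-value) for a linear rank test."""
    pooled = [(y, 0) for y in control] + [(y, 1) for y in treated]
    pooled.sort(key=lambda pair: pair[0])
    n, n0, n1 = len(pooled), len(control), len(treated)
    scores = score_fn(n)
    mean_score = sum(scores) / n
    statistic = sum(scores[r] for r, (_, g) in enumerate(pooled) if g == 1)
    # Permutation variance of a linear rank statistic under the null hypothesis.
    variance = n0 * n1 / (n * (n - 1.0)) * sum((a - mean_score) ** 2 for a in scores)
    z = (statistic - n1 * mean_score) / math.sqrt(variance)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    return statistic, z, p_upper

# Hypothetical survival times (no ties, no censoring).
placebo = [1.1, 2.3, 0.7, 3.0, 1.9]
treatment = [2.8, 4.1, 3.6, 5.2, 2.5]
print(two_sample_rank_test(placebo, treatment))
print(two_sample_rank_test(placebo, treatment, score_fn=wilcoxon_scores))
```

A large positive z supports the one-sided alternative that survival is longer under treatment; for samples this small an exact permutation distribution would be preferable to the normal approximation.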
In clinical trials, as contrasted to reliability analysis, there are a few features that deserve special mention. First, consider interim analysis and time-sequential tests. In a comparative trial, often, a new drug is considered marketable only if it performs at least as well as an existing one, and without any significant side-effect. Thus, in many cases, one has a one-sided alternative that the treatment is better than the placebo. If that alternative is statistically accepted, then medical ethics may prompt switching the surviving subjects to the treatment group as soon as possible. With that motivation, instead of waiting till the trial is over and then drawing statistical conclusions, it might be more appealing to look into the accumulating data sets at regular time intervals, and see whether or not an early termination of the trial can be made in favor of the treatment group. In interim analysis, this is done on a discrete basis, while in a time-sequential setup, this can be done even on a continual basis. Either way, in dealing with multiple statistical looks in a clinical trial setup, a repeated significance testing scheme underlies the statistical modeling and analysis part. In principle this setup differs from the conventional group sequential setup where the sequential process involves independent and homogeneous increments, while here, in a typical follow-up scheme, neither independence nor homogeneity of increments can be taken for granted. For that reason, progressive censoring schemes (PCS) based on suitable martingale characterizations of rank statistics were advocated by Chatterjee and Sen (1973) to formulate the methodology for time-sequential rank based procedures which may as well be used for interim analysis. We refer to Sen (1999) for an overview of this specific aspect.

One nice feature of rank tests under random censoring is that under the homogeneity of the censoring s.f.’s for the two groups, rank tests based on the Xi values alone (i.e., disregarding the indicator variables Ii) have properties similar to those in the uncensored case, though there is some loss of power due to the sacrifice of this information. To clarify this point, we refer back to (2.12) and (2.13). Note that under random censoring the s.f. So(t) is the product of S(t) and SC(t), so that under the null hypothesis, the Xi are i.i.d. r.v.’s with the s.f. So(.). Hence the rank permutation principle holds. As the null hypothesis distribution does not depend on So(.), the significance level and other features remain unaltered. On the other hand, because of the discount factor SC(.) under the alternative, the noncentrality is smaller than in the uncensored case, resulting in power loss. If, however, our interest lies in the modeling of the regression relationship (with the explanatory variables), then the situation is different, and will be presented in the next section.
Depicting the dependence of the conditional s.f. or hazard rate on the explanatory variables is a delicate task. If the explanatory variables are binary or assume only a finite number of realizations, then the problem could be reduced to a multisample model, and simple nonparametric procedures can be posed. If, in addition to such nonstochastic auxiliary variables, there are stochastic concomitant variates, then in a nonparametric approach, multivariate rank procedures can be used to generate conditionally distribution-free tests for the null hypothesis of no difference in the treatment vs. placebo survival distributions, given the concomitant variables. Such procedures have been elaborately studied in a more general setup of time-sequential tests in Sen (1981, Ch. 11). If we assume that the regression of the (log) survival time on the stochastic concomitant is linear, but the conditional s.f. is arbitrary, then rank order estimates of the
regression parameters may as well be obtained from such rank tests. It is also possible to use nonparametric regression tools in this context, but that may generally require a larger sample size.
In comparative clinical trials, often the main interest centers around the regression of the hazard function on the explanatory variables. In the classical two-sample setup, Lehmann (1953) considered a class of alternatives that initiated a line of research leading to the celebrated Cox (1972) proportional hazards model (PHM). During the past 30 years the PHM has flourished under the limelight of semiparametric models. Although semiparametric models have touched almost every corner of statistical science, they have their genesis in survival analysis, and they have had an astounding impact in this field. It is quite appropriate to introduce the PHM in a simple setup and to relate it to semiparametrics in a more general formulation, where martingales and counting processes have cropped up in a very orchestrated way.
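With the same setup as in (2.12) - (2.13), we introduce covariates Zi, i = 1, . . . , n (which for the time being we assume to be not time-dependent). At time t−, there are Nn(t−) units at risk, so that the conditional likelihood for a failure of the ith unit at time t, given a failure (not censoring) at that time, can be computed. In that specific setup, Cox (1972) assumed that

\[ h(t \mid z) = h_o(t)\, \exp\{\beta' z\}, \quad t \ge 0, \tag{3.1} \]

where the baseline hazard function ho(t) is arbitrary (i.e., nonparametric) while the regression function (on z) is parametric. Under (3.1), the partial likelihood can be written as

\[ L_n(\beta) = \prod_{i=1}^{n} \left\{ \frac{\exp(\beta' Z_i)}{\sum_{r \in R_i} \exp(\beta' Z_r)} \right\}^{I_i}, \]

where Ri = {r (≤ n) : Xr ≥ Xi} is the risk set at Xi, and the other notations are introduced in (2.12). Therefore, the partial likelihood score function (vector) is

\[ U_n(\beta) = \frac{\partial}{\partial \beta} \log L_n(\beta) = \sum_{i=1}^{n} I_i \left\{ Z_i - \frac{\sum_{r \in R_i} Z_r \exp(\beta' Z_r)}{\sum_{r \in R_i} \exp(\beta' Z_r)} \right\}. \]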
Suppose now that we intend to test the null hypothesis H0 : β = 0 against alternatives that β ≠ 0. Sen (1981) exploited a discrete time-parameter martingale property of the score statistics (at the null point), and formulated a class of time-sequential tests. We may look into this picture from a more general perspective as follows.
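As a brief numerical aside before that more general view, the partial likelihood score and its use in estimating β can be sketched for a single, time-independent covariate. The data, the function names, and the plain Newton-Raphson iteration are hypothetical illustrations; ties among failure times are assumed absent, and no safeguard (such as step halving) is included.

```python
import math

def cox_score_and_information(beta, times, events, z):
    """Partial likelihood score U_n(beta) and observed information (scalar z)."""
    n = len(times)
    score, info = 0.0, 0.0
    for i in range(n):
        if not events[i]:
            continue                                   # censored: no contribution
        risk = [j for j in range(n) if times[j] >= times[i]]   # risk set R_i
        weights = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * z[j] for w, j in zip(weights, risk))
        s2 = sum(w * z[j] ** 2 for w, j in zip(weights, risk))
        z_bar = s1 / s0                                # weighted mean of z in R_i
        score += z[i] - z_bar
        info += s2 / s0 - z_bar ** 2
    return score, info

def cox_fit(times, events, z, max_iter=25, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_information(beta, times, events, z)
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical two-sample data: z = 1 for treatment, 0 for placebo.
times = [3, 5, 7, 8, 9, 11, 12, 15]
events = [1, 1, 1, 1, 1, 1, 0, 0]
z = [0, 0, 1, 0, 1, 1, 0, 1]
score_at_null, info_at_null = cox_score_and_information(0.0, times, events, z)
beta_hat = cox_fit(times, events, z)
print(score_at_null, info_at_null, beta_hat)
```

With a binary covariate, the score at β = 0 is the observed number of treatment failures minus the number expected from the risk sets, which is the log-rank form referred to in the text.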
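Adopting the notations in (2.13), we define Wi(t) = I(Xi ≥ t) as the at-risk indicator process for i ≥ 1, so that Nn(t) = W1(t) + · · · + Wn(t). Then, if Λi(t) denotes the (conditional) cumulative hazard function of Yi, the processes

\[ M_i(t) = \alpha_i(t) - \int_0^t W_i(s)\, d\Lambda_i(s), \quad t \ge 0,\ i = 1, \ldots, n, \]

are (independent) square integrable martingales over [0, ∞) with predictable covariation

\[ \langle M_i, M_j \rangle (t) = \delta_{ij} \int_0^t W_i(s)\, \{1 - \Delta\Lambda_i(s)\}\, d\Lambda_i(s), \]

where ∆Λi(t) = Λi(t) − Λi(t−) and δij is the Kronecker delta. Considering the flow of events up to the time-point t, and a more general case where the covariates are possibly time-dependent (denoted by Zi(t)), we may consider the score statistic (vector) up to time-point t expressed as

\[ U_n(t; \beta) = \sum_{i=1}^{n} \int_0^t \left\{ Z_i(s) - \frac{\sum_{j=1}^{n} W_j(s) Z_j(s) \exp(\beta' Z_j(s))}{\sum_{j=1}^{n} W_j(s) \exp(\beta' Z_j(s))} \right\} d\alpha_i(s), \]

and a continuous time-parameter martingale property can be ascribed to this vector process {Un(t; β), t ≥ 0} (viz., Andersen et al. (1993) for a very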
elaborate and unified treatise). In the two-sample model (when the Zi are binary), the above statistic reduces to a version of the so-called log-rank statistic, and is known to have some desirable properties. Some other two-sample tests considered earlier can also be viewed as special cases of a more general weighted log-rank statistic:

\[ \sum_{i=1}^{n} \int_0^t w(P_n(s-)) \left\{ Z_i - \frac{\sum_{j=1}^{n} W_j(s) Z_j}{\sum_{j=1}^{n} W_j(s)} \right\} d\alpha_i(s), \]

where w(Pn) is a bounded nonnegative weight function that is made to depend on the pooled sample Kaplan and Meier (1958) product limit estimator. Fleming and Harrington (1991) suggested the family of weights:

\[ w_{a,b}(u) = u^{a} (1 - u)^{b}, \quad a \ge 0,\ b \ge 0,\ u \in [0, 1]. \]

Lin and Kosorok (1999) considered function-indexed stochastic processes wherein they allowed (a, b) ∈ [0, K] × [0, K] for a suitable value of K. Within this family, they used the Monte Carlo method to prescribe a specific
weight function. Actually they also considered the case of vector valued covariates under the PHM as sketched before. Their study does not reveal any analytical properties of their simulation-based test construction. There is a need to supplement their Monte Carlo findings with analytical results, though the latter may not emerge in closed forms.
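A two-sample weighted log-rank statistic with weights of the form u^a(1 − u)^b, evaluated at the left-continuous pooled product-limit estimate, can be sketched as follows. The function name, the hypergeometric variance formula used for standardization, and the toy data are assumptions for illustration; this is not the Monte Carlo procedure of Lin and Kosorok (1999).

```python
def weighted_logrank(times, events, group, a=0.0, b=0.0):
    """Return (U, V): weighted log-rank statistic and its variance estimate.

    group : 1 for treatment, 0 for placebo
    a, b  : exponents of the weight u**a * (1 - u)**b, with u the pooled
            product-limit estimate just before each failure time
    """
    failure_times = sorted({t for t, e in zip(times, events) if e})
    km_left = 1.0                     # pooled PLE just before the current time
    u_stat, variance = 0.0, 0.0
    for t in failure_times:
        at_risk = sum(1 for x in times if x >= t)
        at_risk_1 = sum(1 for x, g in zip(times, group) if x >= t and g == 1)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        deaths_1 = sum(1 for x, e, g in zip(times, events, group)
                       if x == t and e and g == 1)
        weight = km_left ** a * (1.0 - km_left) ** b
        prop_1 = at_risk_1 / at_risk
        u_stat += weight * (deaths_1 - deaths * prop_1)
        if at_risk > 1:
            variance += (weight ** 2 * deaths * prop_1 * (1.0 - prop_1)
                         * (at_risk - deaths) / (at_risk - 1.0))
        km_left *= 1.0 - deaths / at_risk
    return u_stat, variance

# Hypothetical data (the last two observations are censored).
times = [3, 5, 7, 8, 9, 11, 12, 15]
events = [1, 1, 1, 1, 1, 1, 0, 0]
group = [0, 0, 1, 0, 1, 1, 0, 1]
for a, b in ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)):
    u_stat, variance = weighted_logrank(times, events, group, a, b)
    print((a, b), u_stat, variance, u_stat / variance ** 0.5)
```

The choice a = b = 0 gives the ordinary log-rank statistic; a > 0 emphasizes early differences between the two s.f.'s and b > 0 emphasizes late ones.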
The past two decades have witnessed a phenomenal growth of research literature on semiparametrics covering all sorts of variations of the original Cox (1972) models, and these are discussed in detail in Andersen et al. (1993). Some of the developments taking place after 1992 deserve some appraisal. Two significant things in this respect are (i) multiple endpoints and surrogate endpoints models which are commonly encountered in clinical trials and survival analysis, and (ii) interim analysis which is generally recommended for trial monitoring and medical ethics considerations. Let me summarize the following remarks from Sen (1996). Due to excessive cost of measurement or other limitations, sometimes, in clinical trials, the primary endpoint may not be recordable, and it is not uncommon to make use of a very closely related but presumably more accessible (and less expensive) variate, termed the surrogate endpoint. This substitution generally results in some loss of statistical information, and also requires some regularity conditions that may not hold universally. This is particularly significant when the statistical interface of the surrogate and primary endpoints is not that precisely known, as may be the case when simultaneous recording of both these endpoints may not be possible. Prentice (1989) formulated some operational criteria for surrogate endpoints, and Pepe (1992) formulated some alternative approaches involving a validation sample where both the endpoints are observable. In the context of incomplete multiresponse designs in clinical trials, Sen (1996) has reviewed the available tools with special emphasis on rank based procedures.
The situation with multiple endpoint clinical trials is admittedly much more complex. Even for the classical multivariate multisample problems, sans the normality assumption, exact statistical analysis stumbles into difficulties. In the setup of clinical trials, the design aspects invoke complications due to the follow-up nature, possibly surrogate endpoints, censoring and noncompliance of various kinds. Nonparametrics work out better in this respect, though the simplicity of statistical analysis may have to be compromised to a certain extent (Sen 1996). With respect to semiparametrics, we may note that the popular Cox PHM may not be directly applicable to multivariate failure time data, as the very basic assumption of conditional independence of the endpoint variables, given the covariates, may not hold in general. To eliminate this roadblock, some alternative semiparametric approaches have been explored in the recent past. They include the following:
(1) The copula or frailty model,
(2) The marginal regression model, and
(3) The matrix-valued counting processes model.
The frailty model is essentially a semiparametric model with conditional independence of the multiple endpoints, given the unobservable stochastic frailty variable. In copula models, some specific (parametric) copula functions are assumed to tie up the marginal hazards or survival functions in a semiparametric setup. A recent article by Oakes and Ritz (2000) explores the bivariate case relating to both parametric and semiparametric setups. Their simple treatment may not be very appropriate in multiple endpoint clinical trials where multiple explanatory variables and noncompliance factors may cause considerable complications (Clegg et al. 2000). In marginal regression models, only the marginal hazards are formulated in a semiparametric setup, while nothing is specified on the dependence structure of the endpoints. As such, first, for each marginal failure time, a conventional univariate semiparametric approach is advocated, and then, to account for their interdependence, the naive covariance matrix of the estimators is replaced by some robust covariance matrix estimator (Prentice and Cai 1992). For an additional review of such related material, we refer to Clegg et al. (2000). In passing, we may remark that in the above the main issue is not related to the effect that the covariates may have on the survival experience, but rather to the estimation of the survival function.

Pedroso de Lima and Sen (1997) proposed a model, termed the matrix-valued counting process model, that allows for modeling of multivariate failure time data in a Cox regression type setup, incorporating first-order interactive intensities. In this way, their model differs from the marginal models or the frailty models. For each marginal failure time, we can define a counting process as in (2.13). In that way, considering the set of all failure times, we will have a multivariate case where for each subject, we will have a p-variate counting process, so that the resulting case is a collection of n multivariate counting processes, i.e., a matrix-valued counting process.
In a separate communication, Pedroso de Lima and Sen (2001) have dealt with a treatise of such counting processes in time-sequential analysis arising in clinical trials and survival analysis. As such, we shall not deal with this material in detail here.
In the preceding section, we have discussed the statistical aspects of interim analysis in the univariate setup. The picture becomes immensely complicated in the case of multiple and/or surrogate endpoint trials. Such complications may arise not only due to the complexities of the multiple endpoints, their interdependent structure, and noncompliance factors, but also due to the fact that the underlying statistical models could be much less precisely defined and may have too many parameters involved. Even for the simplest parametric models involving multivariate normal distributions, analytical determination of the cut-off points at the successive stages becomes an unmanageable task, and in semiparametrics, the complications are manyfold larger. Simulation studies have therefore been advocated on a large scale basis for drawing statistical conclusions.
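The role of simulation in fixing cut-off points for repeated looks can be illustrated in the simplest univariate case. The sketch below assumes standardized partial sums of independent, homogeneous standard normal increments, which is the conventional group sequential setup and therefore a deliberate simplification of the follow-up schemes discussed earlier; the number of looks, the number of replications, and the seed are arbitrary choices.

```python
import math
import random

def simulate_max_abs_z(looks, n_sim, seed):
    """Simulate max over looks of |Z_k|, where Z_k = (sum of k N(0,1) terms)/sqrt(k)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        partial, largest = 0.0, 0.0
        for k in range(1, looks + 1):
            partial += rng.gauss(0.0, 1.0)
            largest = max(largest, abs(partial) / math.sqrt(k))
        maxima.append(largest)
    return maxima

def crossing_rate(maxima, cutoff):
    """Proportion of simulated trials in which some look exceeds the cutoff."""
    return sum(1 for m in maxima if m > cutoff) / len(maxima)

def calibrated_cutoff(maxima, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated maxima."""
    ordered = sorted(maxima)
    index = math.ceil((1.0 - alpha) * len(ordered)) - 1
    return ordered[index]

maxima = simulate_max_abs_z(looks=5, n_sim=5000, seed=2001)
print("type I error with the fixed-sample cutoff 1.96:", crossing_rate(maxima, 1.96))
print("simulated common cutoff for 5 looks:", calibrated_cutoff(maxima))
```

Using the fixed-sample cutoff at every look inflates the overall significance level well above 0.05, while the simulated cutoff restores it; with multiple or surrogate endpoints the same idea would require simulating the joint behavior of dependent test statistics.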
Let us conclude this section with some pertinent observations on semiparametrics in this general setup of survival analysis. The basic idea of partial likelihood (Cox 1975) has been one of the most significant developments that paved the way for statistical modeling and analysis in far more complex setups than conceived before. However, in this development, the PHM, or in general, the multiplicative intensity processes that permit the formulation of the partial likelihood function cannot be taken for granted in all applications. In a simple two-sample setup, this amounts to saying that one s.f. lies above the other; but in a more general regression setup, this entails more severe restrictions depending on the nature of the regressors or explanatory variables. As has been pointed out earlier, in multiple endpoints such a condition is even less likely to hold, while the copula/frailty models entail some structuring of the copula or frailty functions, and in that way lack of robustness to such model departures remains a vital concern. In the context of survival analysis, often, we have more emphasis on the regression function on the covariates, and in that case, a nonparametric regression approach merits greater attention. On the other hand, a complete nonparametric formulation may require a very large sample size which may not always suit the practical application. As a compromise, it might be better to use a semiparametric model wherein the parametric regression function is replaced by a generalized additive model. Nevertheless, for multiple or surrogate endpoints this is going to create additional complications. With a complete nonparametric model, one ends up with a functional parameter space, and hence, interim analysis methodology requires considerable modifications.
With the advent of genomic science and information technology, genomic information is emerging at a furious rate and in astounding detail; this needs to be properly incorporated in drug discovery, drug development, and pharmaceutical research in general. In a typical pregenomic era setup, the basic objectives of a clinical trial were to compare a placebo and a treatment group (or more generally multiple treatment groups) with respect to their survival patterns, possibly in the presence of numerous explanatory or auxiliary variables. This makes it naturally appealing to import the usual survival analysis methodology and to extend it to suit the needed statistical modeling and analysis task. In the present complex of pharmacokinetics, pharmacogenetics and pharmacogenomics, the situation is far more complex, though there is a great need for clinical trials to meet the general objectives. At the current state of development, it is largely contemplated, and assessed to a certain extent, that there are certain genes which are carriers of certain diseases, so that a mapping of the disease genes can be of immense help in better understanding the disease etiology as well as its treatment by medical device. Likewise, there are genes which are associated with the role of preventing certain diseases. In view of this complex, when one looks into personal risks for acquiring certain disease(s), such genetic information can be very helpful. It has also been recognized for a long period that familial factors are very important explanatory variables in disease diagnosis as well as prognosis. The use of biological markers in clinical trials is a first significant step in this utilization of genetic information in clinical studies. In pharmaceutical research, genetic marker data from patients in contemplated clinical trials are planned for inclusion in the statistical modeling and analysis with the hope that it would enhance the scope of assessment of both efficacy and safety of drugs. This way, genomic survival analysis methodology is emerging as an essential tool in pharmaceutical research.
Stochastic evolutionary forces act on genomes, and on top of that, sampling schemes, as may be needed for collection and monitoring of relevant data sets, also involve stochastics. Because of considerable variation in human metabolism and absorption of pharmacologic products, a precise mathematical law or deterministic equation may not usually be appropriate in pharmacogenomic research; statistical genetics plays a basic role in this context. However, many of the biological developments are taking place at a fast pace, and allowing for the usual time needed to absorb the fundamental outcomes in an interpretable and measurable manner, statistical reasoning is somewhat lagging behind the technological advances. This might have prompted the exploratory data analysis tools under the fancy name of KDDM to take control of statistical reasoning in this highly computational biological field.
Mapping a disease means localizing the gene(s) associated with the disease on a chromosome. Many of the common diseases are thought to be under the influence of multiple environmental and genetic factors, so that gene-environment interaction has emerged as one of the most active research areas. Since there are more than 30,000 genes and the pool of diseases and disorders is not small either, drug discovery and target evaluation is a delicate task, lacking complete scientific precision. Microarrays have been incorporated as a tool for high throughput screening of genes and potential targets. The microarray analysis, at the present time, is largely in the court of KDDM wherein detection of outliers and quantification of variability are done routinely. There is a genuine need for statistical methods to address some of the basic issues: multiple testing dilemmas, multiple genes and diseases interaction, and their environmental impacts. As such, the potential problems are formidable. No wonder that the current analysis is more of an exploratory nature. However, statisticians could come up with appropriate methodology to provide better support for data mining and carry out statistical analysis in a more reasonable manner.
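A rough arithmetic illustration of the multiple testing dilemma may help. The numbers are assumptions for illustration only: one test per gene with m = 30,000, a nominal level α = 0.05 for each test, all null hypotheses true, and (for the second expression) independent tests.

```latex
% Illustrative arithmetic; m, \alpha and independence are assumptions
m = 30{,}000, \quad \alpha = 0.05:
\qquad E(\text{number of false rejections}) = m\alpha = 1500,
\qquad P(\text{at least one false rejection}) = 1 - (1 - \alpha)^{m} \approx 1 .

% A Bonferroni-type correction would test each gene at level
\alpha / m = 0.05 / 30{,}000 \approx 1.67 \times 10^{-6} .
```

Such a per-gene level is so stringent that power for any single gene may be very low, which is one way of seeing why the problem is described as formidable.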
What is the relevance of survival analysis in this complex? In what way is the classical survival analysis amenable in this respect? Clinical trials in the conventional sense and in pharmacogenomics have somewhat different objectives, and yet they share a common goal. Because of the contemplated emphasis on clinical trials in pharmacogenomics and the affinity of survival analysis to clinical trials, there is no doubt of the relevance of survival analysis in this complex, though there are other pertinent queries at the present. Because of the leading role of statistical genetics in this complex, and the fact that many common stochastic processes have their genesis in genetics models, it is not difficult to foresee the need for more complex types of stochastic processes in genomic sciences. Such complexities are likely to arise due to excessively high dimensional spatial domains, and also due to foreseen as well as unforeseen structural restraints underlying the complex; in that way, functional models (involving latent effects) are more likely to be appropriate. This could put semiparametrics on a more favorable standing. As such, some of the developments sketched in earlier sections are more likely to provide the desired lead of statistical methodology. In physical sciences, Brownian motions (bridges, excursions) and related Gaussian processes crop up in many ways, and so do the diffusion processes. In survival analysis, it is well known that such Gaussian processes provide the basis for refined statistical analysis. As such, here in genomic survival analysis, such Gaussian processes are expected to play a similar role, and in that way, survival analysis tools could provide the desired lead. In the same way as semiparametrics grew out of survival analysis, it is not unlikely that genomic survival analysis will also be centered around the classical ones with more genetic annexations. At the same time, because of the computational biological undercurrents, computational aspects in statistical reasoning should not be overlooked. At present, genomic simulations and clinical trials attempt to take into account genetic marker allele frequencies, linkage disequilibrium and their impact on clinical outcomes, in a rather empirical way based solely on simulated clinical populations. Even in that respect, survival analysis models may provide methodologic incentives for simulating clinical populations preserving the genetic undercurrents to a desired extent. It is a challenge for statistical researchers which we are obliged to meet.
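As a very rough indication of how a survival model could drive the simulation of a clinical population with a genetic marker, the sketch below makes several assumptions that go beyond the text: a single biallelic marker in Hardy-Weinberg equilibrium, an additive effect of the marker allele count on the log hazard (a PHM), a constant baseline hazard, and no censoring or linkage disequilibrium. The function name `simulate_population` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_population(n, allele_freq, beta, base_rate=0.1, seed=0):
    """Simulate marker genotypes and survival times for n subjects."""
    rng = np.random.default_rng(seed)
    # Genotype = number of copies of the marker allele (0, 1 or 2);
    # binomial(2, p) is the Hardy-Weinberg assumption.
    genotype = rng.binomial(2, allele_freq, size=n)
    # Assumed PHM with constant baseline hazard:
    # hazard = base_rate * exp(beta * genotype).
    rate = base_rate * np.exp(beta * genotype)
    # Exponential survival times with subject-specific mean 1 / rate.
    time = rng.exponential(1.0 / rate)
    return genotype, time


if __name__ == "__main__":
    g, t = simulate_population(n=50000, allele_freq=0.3, beta=0.5)
    for k in (0, 1, 2):
        print(k, round(float((g == k).mean()), 3), round(float(t[g == k].mean()), 2))
```

The printed rows give, for each genotype, its simulated frequency and mean survival time; a more realistic simulation would replace each assumption (baseline hazard, marker model, censoring) by one suited to the trial in question.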
Basically, clinical trials are designed for studying treatments for serious/chronic/life-threatening diseases or disorders. They need actual medical trials on the concerned population, and often are exploratory in nature (particularly, to start with); see Phase I and II trials. Drug developers and pharmaceutical groups, as well as health regulatory agencies (like the FDA in the USA), generally focus on treatments to relieve symptoms which may not match the real treatment objectives, raising questions and concern about the cost-benefit aspects of the treatment. Bioethicists and public advocates have voiced concern about clinical trial exploitation in Third World countries, the cost-benefit factor being a primary issue in this context. WHO and public health authorities all over the world are therefore trying to identify effective as well as affordable regimens to suit the needs of developing countries. We refer to Rothman and Michels (1994), Agostini (1995), Aspinall and Goodman (1995), Angell (1997) and the 1997 Helsinki Declaration of the World Medical Association (WMA). These developments raise some basic questions: How far can medical ethics be implemented in clinical trials with such diverse perspectives in mind? To what extent can cost-benefit aspects overturn the basic medical prerequisites of a clinical trial (particularly, in developing countries)? How much statistical reasoning can be imparted in this broader context?
There is another important issue [viz., Temple and Ellenberg (2000)] that deserves our attention. Placebo-controlled trials (PCT), as have been discussed in earlier sections, are extensively used in developing new pharmaceuticals. There are allegations that PCT are invariably unethical when known effective therapy is available for the condition being treated or studied, regardless of the condition or the consequences of deferring treatments. The WMA 1997 Helsinki declaration documents ethical principles for clinical investigations. In any medical study, every patient, including those of a control group, if any, should be assured of the best proven diagnostic and therapeutic method. Based on this declaration, patients asked to participate in a PCT must be informed of the existence of any effective therapy, must be able to explore the consequences of deferring such therapy with the investigator, and must provide fully informed consent. This would provide justification for the PCT even when effective therapy exists. This declaration has also led to the formulation of active-controlled equivalence trials (ACET) which may show that a new therapy is superior (or not inferior) to an existing one, but may not have all the other characteristics of a PCT. In any case, there is a genuine need for development of innovative statistical methodology to address such ACET, and also the more complex scenario of pharmacodynamics and pharmacokinetics.
Agostini, A. (1995). Placebo and EU guidelines. Lancet, 310, 1710.
Andersen, P. K., Borgan, O., Gill, R. D., and Keiding, N. (1993). Statistical Models Based on Counting Processes. New York: Springer.
Angell, M. (1997). The ethics of clinical research in the third world. New
Aspinall, R. L. and Goodman, N. W. (1995). Denial of effective treatment and poor quality of clinical information in placebo-controlled trials of ondansetron for postoperative nausea and vomiting: A review of published trials. Brit. Med. J., 311, 844–846.
Bhattacharjee, M. C. and Sen, P. K. (1995). Kolmogorov-Smirnov type tests for NB(W)UE alternatives under censoring schemes. Inst. Math. Statist. Lect. - Monog. Ser., 27, 25–38.
Chatterjee, S. K. and Sen, P. K. (1973). Nonparametric testing under progressive censoring. Calcutta Statist. Assoc. Bull., 22, 13–50.
Chaubey, Y. P. and Sen, P. K. (1996). On smooth estimation of survival and density functions. Statist. Decis., 14, 1–19.
Chaubey, Y. P. and Sen, P. K. (1997). On smooth estimation of hazard and cumulative hazard functions. Frontiers in Probability and Statistics (eds. S. P. Mukherjee et al.), Narosa, New Delhi, pp. 92–100.
Chaubey, Y. P. and Sen, P. K. (1998). On smooth functional estimation under random censoring. Frontiers in Reliability (eds. A. P. Basu et al.), World Scientific, Singapore, pp. 83–97.
Chaubey, Y. P. and Sen, P. K. (1999). On smooth estimation of mean residual life. J. Stat. Plan. Infer., 75, 223–236.
Clayton, D. G. and Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). J. Roy. Statist. Soc. A, 148, 82–117.
Clegg, L. X., Cai, J. and Sen, P. K. (1999). A marginal mixed baseline hazards model for multivariate failure time data. Biometrics, 55, 805–812.
Clegg, L. X., Cai, J. and Sen, P. K. (2000). Modeling multivariate failure time data. Handbook of Statistics, Vol. 18: Bioenvironmental and Public Health Statistics (eds. P. K. Sen and C. R. Rao), Elsevier, Amsterdam, pp. 803–838.
Cox, D. R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. B, 34, 187–220.
Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. London: Chapman-Hall.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models.
Kaplan, E. L. and Meier, P. (1958). Non-parametric estimation from incomplete observations. J. Amer. Statist. Assoc., 53, 457–481, 562–563.
Lehmann, E. L. (1953). The power of rank tests. Ann. Math. Statist.,
Lin, C.-Y. and Kosorok, M. R. (1999). A general class of function-indexed nonparametric tests for survival analysis. Ann. Statist., 27, 1722–1744.
Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann.
Oakes, D. and Ritz, J. (2000). Regression in a bivariate copula model.
Pedroso de Lima, A. C. and Sen, P. K. (1997). A matrix-valued counting process with first-order interactive intensities. Ann. Appl. Probab., 7, 494–507.
Pedroso de Lima, A. C. and Sen, P. K. (2001). Time-sequential inference and counting processes. (in preparation).
Pepe, M. S. (1992). Inference using surrogate outcome data and a validation sample. Biometrika, 79, 495–512.
Prentice, R. L. (1989). Surrogate end-points in clinical trials: Definition and operational criteria. Statist. Med., 8, 431–440.
Prentice, R. L. and Cai, J. (1992). Covariance and survivor function estimation using censored multivariate failure time data. Biometrika, 79, 495–512.
Rothman, K. J. and Michels, K. B. (1994). The continued unethical use of placebo-controls. New Eng. J. Med., 331, 394–398.
Sen, P. K. (1981a). The Cox regression model, invariance principles for some induced quantile processes and some repeated significance tests. Ann. Statist., 9, 109–121.
Sen, P. K. (1981b). Sequential Nonparametrics: Invariance Principles and Statistical Inference. New York: Wiley.
Sen, P. K. (1996). Design and analysis of experiments: Nonparametric methods with application to clinical trials. Handbook of Statistics, Vol. 13: Design and Analysis of Experiments (eds. S. Ghosh and C. R. Rao), Elsevier, Amsterdam, pp. 91–150.
Sen, P. K. (1999). Multiple comparisons in interim analysis. J. Statist.
Sen, P. K. (2001). Excursions in Biostochastics: Biometry to Biostatistics to Bioinformatics. Lect. Notes Inst. Statist. Acad. Sinica, Taipei, Taiwan.
Temple, R. and Ellenberg, S. S. (2000). Placebo-controlled trials and active-controlled trials in the evaluation of new treatments, I: Ethical and scientific issues. Ann. Inter. Med., 133, 455–463.
World Medical Association Declaration of Helsinki. (1997). Recommendation guiding physicians in biomedical research involving human subjects. J. Amer. Med. Assoc., 277, 925–926.

