asympTest: A Simple R Package for Classical Parametric Statistical Tests and Confidence Intervals in Large Samples
by J.-F. Coeurjolly, R. Drouilhet, P. Lafaye de Micheaux
Abstract: asympTest is an R package implementing large sample tests and confidence intervals. One and two sample mean and variance tests (differences and ratios) are considered. The test statistics are all expressed in the same form as the Student t-test, which facilitates their presentation in the classroom. This contribution also fills the gap of a robust (to non-normality) alternative to the chi-square single variance test for large samples, since no such procedure is implemented in standard statistical software.

Introduction

It is sometimes desirable to compare two variances rather than two averages. To cite a few examples: one would like two college professors grading exams to have the same variation in their grading; in order for a lid to fit a container, the variation in the lid and the container should be the same; a supermarket might be interested in the variability of check-out times for two checkers.

Now usually, a first course on statistical inference presents mean tests in both Gaussian and asymptotic frameworks (Table 1), but variance tests are often presented only in the Gaussian case (Table 2).

An important point to be noticed is that students are usually told that mean tests are robust to non-normality for large samples, as indicated by the asymptotic N(0, 1) distribution in the last two cells of Table 1. Thus, one could think that this also holds for variance tests. Indeed, many practitioners use the classical chi-square single variance test or Fisher’s two variances test even when the Gaussian assumption fails. This can lead to heavy errors, even for large samples (see Figure 1); Miller describes this situation as "catastrophic".

To get a better idea of the type I error of the classical single variance test, let us test, for example, H0: σ = 1 versus H1: σ < 1 by simulating 10000 samples of size 1000 from an E(1) distribution (i.e. under H0) and using α = 5%. We obtained a rejection rate of the null of 21.53%, showing a type I error far greater than α. The rate for the asymptotic test (described later) is 9.05%, which is not too far from α. For a U([0, 5]) distribution, the classical single variance test leads to a type I error far smaller than α (0.44%), whereas our test still behaves correctly, with a type I error near α (5.39%). This is mainly due to the departure of the kurtosis of the distribution from 3: for large n, the variance of the sample variance is approximately (κ − 1)σ⁴/n, where κ is the kurtosis, whereas the chi-square test implicitly assumes κ = 3. The E(1) distribution has κ = 9 and the U([0, 5]) distribution has κ = 1.8 (for more theoretical details see e.g. Section 2.2 of Coeurjolly et al.).

Note that the problem of the robustness (to departures from normality) of tests for comparing two (or more) variances has been widely treated in the literature, see e.g. Box (1953), Conover et al., Pan, Tiku and Akkaya, and references therein. These authors built specific test statistics. Note also that in the one sample (non-Gaussian) case, to the best of our knowledge, no statistical tool is available to compare a population variance to a reference value.
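The single variance simulation described above can be sketched in base R. This is only an illustration under stated assumptions: it uses 2000 rather than 10000 replications so that it runs quickly, and the standard error of s² is an assumed CLT-based estimate (the empirical variance of the squared centred observations divided by n), which may differ from the one implemented in asympTest. Its rejection rates will therefore not reproduce 21.53% and 9.05% exactly, but they should show the same pattern.

```r
# Monte Carlo sketch of the type I error of two tests of
# H0: sigma = 1 versus H1: sigma < 1, with E(1) data (so H0 is true).
# The standard error of s^2 below is an assumed CLT-based estimate,
# not necessarily the one used inside asympTest.
set.seed(1)
n <- 1000       # sample size, as in the text
m <- 2000       # replications (the text uses 10000; fewer here for speed)
alpha <- 0.05
sigma0sq <- 1   # reference variance under H0

reject <- replicate(m, {
  x <- rexp(n, rate = 1)          # E(1): variance 1, kurtosis 9
  s2 <- var(x)
  # Classical chi-square test (assumes Gaussian data); lower tail for H1
  chisq.stat <- (n - 1) * s2 / sigma0sq
  p.chisq <- pchisq(chisq.stat, df = n - 1)
  # Asymptotic test: standard error of s2 from squared centred observations
  se <- sqrt(var((x - mean(x))^2) / n)
  p.asymp <- pnorm((s2 - sigma0sq) / se)
  c(chisq = p.chisq < alpha, asymp = p.asymp < alpha)
})

rates <- rowMeans(reject)  # empirical rejection rates under H0
print(rates)
```

With E(1) data the chi-square rate should land well above α while the asymptotic rate stays much closer to it. Replacing rexp(n, rate = 1) by runif(n, 0, 5) and sigma0sq by 25/12 (the variance of U([0, 5])) shows the opposite failure of the chi-square test.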
Recall that a common method for constructing a large sample test statistic may be based on an estimator that has an asymptotic normal distribution. Suppose we wish to test a hypothesis about a parameter θ, and θ̂n is some estimator of θ based on a sample of size n. If we can prove some form of the central limit theorem to show that, as n → +∞,

$$\frac{\hat\theta_n - \theta}{\hat\sigma_{\hat\theta}} \xrightarrow{\;d\;} \mathcal{N}(0, 1), \qquad (1)$$

where $\hat\sigma_{\hat\theta}$ is the usual standard error, which is a convergent (in probability) estimate of $\sigma_{\hat\theta} = \sqrt{\operatorname{Var}(\hat\theta_n)}$, then one has the basis for an approximate test.

Figure 1 (panels: "Chi-square variance test", "Fisher’s ratio of variances test"): P-value plots (see Davidson and MacKinnon, 1998) under H0 of m = 10000 replications of the test statistics of the chi-square variance test (top) and Fisher’s ratio of variances test (bottom) for large non-Gaussian samples. The parameters of the simulation are: n = n1 = n2 = 500, Y = Y1 = Y2 ∼ χ²(5) (resp. E(1), resp. U[0, 5]). The dotted lines are 45° lines.
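The recipe above reduces every large sample test to a single z-statistic: an estimate θ̂, a reference value θ0 and a standard error σ̂_θ̂. A minimal sketch in R, where asymp_z_test is a hypothetical helper name (it is not part of asympTest and is not asymp.test):

```r
# Generic large-sample test of H0: theta = theta0, based on the asymptotic
# normality of (theta_hat - theta) / se. asymp_z_test is a hypothetical helper.
asymp_z_test <- function(estimate, theta0, se,
                         alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (estimate - theta0) / se     # approximately N(0, 1) under H0
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  list(statistic = z, p.value = p)
}

# Example: estimate 1.2, reference value 1, standard error 0.1, so z = 2
res <- asymp_z_test(1.2, 1, 0.1)
res$statistic
res$p.value
```

Only the estimator and its standard error change from one parameter to another, which is what makes the framework homogeneous.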
This approach can be used to complete Table 2 for the large sample case, as shown in Table 3 for the single variance test.

The point to be noted here is that this general approach is very similar to the classical t-test from a mathematical point of view. Proofs, which are not very complicated, are provided in the technical report by Coeurjolly, Drouilhet, Lafaye de Micheaux and Robineau. The details are not fully expounded here, but they lead us to propose a more complete, homogeneous teaching framework, with no additional difficulty, to test various parameters such as the mean, the variance, and the difference or ratio of means or variances (for large samples). This approach also allows the direct derivation of asymptotic confidence intervals. Note that Bonett used a similar asymptotic approach, with a refinement based on a variance stabilizing transformation, to obtain asymptotic confidence intervals, solely for the single variance and ratio of variances cases.

Table 4 gives a summary of the various parameters we can test and the R functions we have implemented to compute the standard error $\hat\sigma_{\hat\theta}$ of $\hat\theta$.

Table 4: Various parameters we can test and available R functions to compute the standard error $\hat\sigma_{\hat\theta}$.

The case of a (large sample) test for a difference in scale parameters (possibly weighted by a constant factor) is also of interest, as suggested by the availability of related procedures in R (to compute Ansari-Bradley’s and Mood’s tests, for example).

The functions of Table 4 can be used in conjunction with the asymptotic normal approximation above to obtain p-values for various tests. For a simple example, if you want to use a sample contained in ...
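As an illustration of how a standard error and the normal approximation combine into a test and a confidence interval for a single variance, here is a sketch. It assumes the usual CLT-based result Var(s²) ≈ (µ4 − σ⁴)/n and estimates µ4 − σ⁴ by the empirical variance of the squared centred observations; se_var_sketch is a hypothetical name, and this is not claimed to be the package’s own implementation.

```r
# CLT-based standard error of the sample variance (assumed form:
# Var(s^2) ~ (mu4 - sigma^4) / n). se_var_sketch is a hypothetical helper.
se_var_sketch <- function(x) {
  x <- x[!is.na(x)]
  sqrt(var((x - mean(x))^2) / length(x))
}

set.seed(123)
x <- rnorm(1e5, sd = 2)   # Gaussian sample with sigma^2 = 4
s2 <- var(x)
se <- se_var_sketch(x)

# Two-sided test of H0: sigma^2 = 4, written like a t-statistic
z <- (s2 - 4) / se
p.value <- 2 * pnorm(-abs(z))

# Asymptotic 95% confidence interval for sigma^2
ci <- s2 + c(-1, 1) * qnorm(0.975) * se
print(c(statistic = z, p.value = p.value))
print(ci)
```

For Gaussian data µ4 = 3σ⁴, so the standard error should be close to σ²√(2/n); for non-Gaussian data it adapts to the actual kurtosis, which is exactly what the chi-square test fails to do.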
This contribution also solves the problem of providing an implemented “robust” (to departures of the i.i.d. large sample distribution from normality) alternative to the chi-square single variance test for large samples. Indeed, we did not find any such procedure in standard statistical software, so it is highly likely that practitioners incorrectly use a chi-square test on a single variance. It also provides a very simple alternative to the (ratio of variances) Fisher test in large samples. Some other “robust” alternative procedures to the Fisher test in the case of non-Gaussian (not necessarily large) samples are implemented in R: the Bartlett test (bartlett.test), the Fligner test (fligner.test) and the Levene test (levene.test, available in the lawstat package). R also provides, through the functions ansari.test and mood.test, Ansari-Bradley’s and Mood’s two-sample rank-based tests for a difference in scale parameters.

The purpose of this paper is not to compare our tests to their competitors in terms of power. We nevertheless conduct two short simulation studies (limited to the probability of Type I error): first for the problem of testing a variance (Table 5), comparing the classical chi-square single variance test to our procedure, and second for the problem of comparing (the differences of) two variances (Tables 6, 7 and 8), comparing the classical Fisher test to our procedure, as well as to Ansari-Bradley’s test and Mood’s test. These simulations were based on the three distributions used earlier in Figure 1. The simulations show that the level α is quite correct (as n increases) for our procedure in the case of testing a single variance, and for all three alternative tests (ours, Ansari-Bradley’s and Mood’s tests) for testing two variances.

Table 5: Type I error in terms of n for the single variance test.

Tables 6, 7 and 8: Type I error in terms of n for m = 10000 replications of each distribution.

Using asympTest

The package asympTest consists of a main function asymp.test and six auxiliary ones designed to compute standard errors of estimates of different parameters (see Table 4). The auxiliary functions will not be the most useful ones for the user, except if he/she wants to compute a confidence interval himself/herself. The function asymp.test has been written in the same spirit as the standard R functions t.test and var.test. Its arguments and the resulting outputs are also inspired from these functions. In particular, asymp.test returns an object of class "htest" (the standard class for hypothesis test results in R).
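To illustrate what returning an "htest" object buys, here is a sketch of a two-sample asymptotic test of H0: σ1² − ρσ2² = 0 built on the same CLT-based standard errors. dvar_test_sketch and its weighting argument rho are hypothetical (this is not asymp.test itself); the sketch assumes independent samples and implements only the two-sided alternative. Because the result has class "htest", R’s standard print method displays it like the output of t.test or var.test.

```r
# Sketch of a large-sample test of H0: sigma1^2 - rho * sigma2^2 = 0 that
# returns an "htest" object. dvar_test_sketch is hypothetical; it assumes
# independent samples and CLT-based standard errors for each s^2.
dvar_test_sketch <- function(x, y, rho = 1, conf.level = 0.95) {
  se1 <- sqrt(var((x - mean(x))^2) / length(x))
  se2 <- sqrt(var((y - mean(y))^2) / length(y))
  est <- var(x) - rho * var(y)
  se <- sqrt(se1^2 + rho^2 * se2^2)   # variances add for independent samples
  z <- est / se
  q <- qnorm(1 - (1 - conf.level) / 2)
  ci <- est + c(-1, 1) * q * se
  attr(ci, "conf.level") <- conf.level
  structure(list(
    statistic   = c(z = z),
    p.value     = 2 * pnorm(-abs(z)),  # two-sided alternative only
    conf.int    = ci,
    estimate    = c("difference of variances" = est),
    null.value  = c("difference of variances" = 0),
    alternative = "two.sided",
    method      = "Two-sample asymptotic diff. of variances test (sketch)",
    data.name   = paste(deparse(substitute(x)), "and", deparse(substitute(y)))
  ), class = "htest")
}

set.seed(42)
a <- rexp(2000)             # variance 1
b <- runif(2000, 0, 5)      # variance 25/12
res <- dvar_test_sketch(a, b)
print(res)                  # formatted by print.htest
```

Returning the standard class means students get the familiar layout (statistic, p-value, confidence interval, estimate) without learning a new output format.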
This asymp.test function has several arguments, similar to those of the t.test function, whose description can be obtained using the command ?asymp.test.
In order to illustrate this function, let us consider the Digitalis Investigation Group NHLBI Teaching Dataset (DIG), provided by the NHLBI. Note that statistical processes such as permutations within treatment groups were used to completely anonymize the data; therefore, inferences derived from the teaching dataset may not be valid.

The DIG Trial was a randomized, double-blind, multicenter trial with more than 300 centers in the United States and Canada participating. The purpose of the trial was to examine the safety and efficacy of Digoxin in treating patients with congestive heart failure.

Diastolic BP (DIABP, mmHg) is a known risk factor of cardiovascular diseases. In this case, it is desirable to compare the variability of this quantity for the placebo (TRTMT=0) and treatment (TRTMT=1) groups.

Reading of the data

Comparing the two variances

The Shapiro-Wilk normality test performed by the function shapiro.test() indicates that the two samples seem to be far from the Gaussian distribution. This should prevent us from using the following Fisher test:

> var.test(DIABP ~ TRTMT, data = DIGdata,
    na.action = na.omit)
F = 0.9295, num df = 3399, denom df = 3394
alternative hypothesis: true ratio of variances is not equal to 1

Our asymptotic test is run as follows:

> asymp.test(DIABP ~ TRTMT, data = DIGdata,
    na.action = na.omit, parameter = "dVar")

        Two-sample asymptotic diff. of variances test

alternative hypothesis: true diff. of variances is not equal to 0

We can see that var.test, which should not be used here given the unlikely normality of the data, shows a significant difference in variances (at the 5% level). We do not reach the same conclusion with our test.

We can also place ourselves in a fictitious case by generating a sample x and applying both our test and the classical chi-square test to it:

> asymp.test(x, par = "var", alt = "gr",
> pchisq(chisq.stat, n-1, lower.tail = FALSE)

For the above generated sample x, we found the p-values 0.0398 and 0.120, respectively. In this case, our proposition correctly rejects H0 in favour of H1 (at the 5% level), whereas the chi-square single variance test does not.

Conclusion

This paper has introduced a new package called asympTest, which adds to the many R procedures available. It is interesting, firstly, in that it provides a unified teaching framework to present classical parametric tests (based on the Central Limit Theorem). These tests are made readily available in R through an easy to use function called asymp.test. This function resembles t.test or var.test, so students will not be confused. Secondly, it also makes available in R a robust (to non-normality) alternative to the classical chi-square single variance test. In the future, we also plan to provide tools similar to the power.t.test function in the large sample context.
Bibliography

D. G. Bonett. Approximate confidence interval for standard deviation of nonnormal distributions. Computational Statistics & Data Analysis, 50:775–

D. G. Bonett. Robust confidence interval for a ratio of standard deviations. Applied Psychological Measurement, 30(5):432–439, 2006.

G. E. P. Box. Non-normality and tests on variances. Biometrika, 40(3/4):318–335, 1953.

J.-F. Coeurjolly, R. Drouilhet, P. Lafaye de Micheaux and J.-F. Robineau. asympTest: an R package for performing parametric statistical tests and confidence intervals based on the central limit theorem. Laboratoire Jean Kuntzmann, Grenoble University, France.

W. J. Conover, M. E. Johnson and M. M. Johnson. A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23(4):351–

R. Davidson and J. G. MacKinnon. Graphical methods for investigating the size and power of hypothesis tests. The Manchester School, 66(1):1–26, 1998.

S. Dean and B. Illowsky. F Distribution and ANOVA: ...

T. Ferguson. A course in large sample theory. Chapman & Hall.

R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(Part II).

R. G. Miller. Beyond ANOVA, Basics of Applied Statistics. Texts in Statistical Science Series. Chapman & Hall.

C. Ozgur and S. E. Strasser. A study of the statistical inference criteria: Can we agree on when to use z versus t? Decision Sciences Journal of Innovative Education.

G. Pan. On a Levene type test for equality of two variances. Journal of Statistical Computation and Simulation.

M. L. Tiku and A. Akkaya. Robust Estimation and Hypothesis Testing. New Age International (P) Ltd.

... Duxbury Press, Belmont, California, 2nd edition.

Pierre Lafaye de Micheaux
Département de Mathématiques et Statistique