Statistics
in the
21st Century
Edited by
Adrian E. Raftery
Martin A. Tanner
Martin T. Wells
American Statistical Association
(ASA)
Alexandria, Virginia
CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.
Library of Congress Cataloging-in-Publication Data
Statistics in the 21st century / edited by Adrian E. Raftery, Martin A.
Tanner, Martin T. Wells.
p. cm. -- (Monographs on statistics and applied probability ;
93)
Includes bibliographical references and index.
ISBN 1-58488-272-7 (alk. paper)
1. Mathematical statistics. I. Title: Statistics in the
twenty-first century. II. Raftery, Adrian E. III. Tanner, Martin, Abba,
1957- IV. Wells, Martin. (Martin T.) V. Series
QA276.16. S84444 2001
519.5--dc21 2001028887
Co-published by
CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, FL 33431
and
American Statistical Association, 1429 Duke Street, Alexandria, VA 22314-3415
This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.
All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or
internal use of specific clients, may be granted by CRC Press LLC, provided that $1.50 per page
photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923
USA. The fee code for users of the Transactional Reporting Service is ISBN 1-58488-272-
7/02/$0.00+$1.50. The fee is subject to change without notice. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
© 2002 by American Statistical Association
No claim to original U.S. Government works
International Standard Book Number 1-58488-272-7
Library of Congress Card Number 2001028887
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Contents
Introduction
By Adrian E. Raftery, Martin A. Tanner, and Martin T. Wells
Chapter 1
Statistics in the Life and Medical Sciences
Guest Edited By Norman E. Breslow
Survival Analysis
By David Oakes
Causal Analysis in the Health Sciences
By Sander Greenland
Environmental Statistics
By Peter Guttorp
Capture–Recapture Models
By Kenneth H. Pollock
Statistics in Animal Breeding
By Daniel Gianola
Some Issues in Assessing Human Fertility
By Clarice R. Weinberg and David B. Dunson
Statistical Issues in Toxicology
By Louise M. Ryan
Receiver Operating Characteristic Methodology
By Margaret Sullivan Pepe
The Randomized Clinical Trial
By David P. Harrington
Some Contributions of Statistics to Environmental Epidemiology
By Duncan C. Thomas
Challenges Facing Statistical Genetics
By B. S. Weir
Computational Molecular Biology
By Wing Hung Wong
Chapter 2
Statistics in Business and Social Science
Guest Edited By Mark P. Becker
Finance: A Selective Survey
By Andrew W. Lo
Statistics and Marketing
By Peter E. Rossi and Greg M. Allenby
Time Series and Forecasting: Brief History and Future Research
By Ruey S. Tsay
Contingency Tables and Log-Linear Models: Basic
Results and New Developments
By Stephen E. Fienberg
Causal Inference in the Social Sciences
By Michael E. Sobel
Political Methodology: A Welcoming Discipline
By Nathaniel L. Beck
Statistics in Sociology, 1950–2000
By Adrian E. Raftery
Psychometrics
By Michael W. Browne
Empirical Methods and the Law
By Theodore Eisenberg
Demography: Past, Present, and Future
By Yu Xie
Chapter 3
Statistics in the Physical Sciences and Engineering
Guest Edited By Diane Lambert
Challenges in Understanding the Atmosphere
By Doug Nychka
Seismology—A Statistical Vignette
By David Vere-Jones
Internet Traffic Data
By William S. Cleveland and Don X. Sun
Coding and Compression: A Happy Union of Theory and Practice
By Jorma Rissanen and Bin Yu
Statistics in Reliability
By Jerry Lawless
The State of Statistical Process Control as
We Proceed into the 21st Century
By Zachary G. Stoumbos, Marion R. Reynolds, Jr., Thomas P. Ryan,
and William H. Woodall
Statistics in Preclinical Pharmaceutical Research and Development
By Bert Gunter and Dan Holder
Statistics in Advanced Manufacturing
By Vijay Nair, Mark Hansen, and Jan Shi
Chapter 4
Theory and Methods of Statistics
Guest Edited By George Casella
Bayesian Analysis: A Look at Today and Thoughts of Tomorrow
By James O. Berger
An Essay on Statistical Decision Theory
By Lawrence D. Brown
Markov Chain Monte Carlo: 10 Years and Still Running!
By Olivier Cappé and Christian P. Robert
Empirical Bayes: Past, Present, and Future
By Bradley P. Carlin and Thomas A. Louis
Linear and Log-Linear Models
By Ronald Christensen
The Bootstrap and Modern Statistics
By Bradley Efron
Prospects of Nonparametric Modeling
By Jianqing Fan
Gibbs Sampling
By Alan E. Gelfand
The Variable Selection Problem
By Edward I. George
Robust Nonparametric Methods
By Thomas P. Hettmansperger, Joseph W. McKean,
and Simon J. Sheather
Hierarchical Models: A Current Computational Perspective
By James P. Hobert
Hypothesis Testing: From p Values to Bayes Factors
By John I. Marden
Generalized Linear Models
By Charles E. McCulloch
Missing Data: Dial M for ???
By Xiao-Li Meng
A Robust Journey in the New Millennium
By Stephen Portnoy and Xuming He
Likelihood
By N. Reid
Conditioning, Likelihood, and Coherence: A Review of Some
Foundational Concepts
By James Robins and Larry Wasserman
The End of Time Series
By V. Solo
Principal Information Theoretic Approaches
By Ehsan S. Soofi
Measurement Error Models
By L. A. Stefanski
Higher-Order Asymptotic Approximation: Laplace,
Saddlepoint, and Related Methods
By Robert L. Strawderman
Minimaxity
By William E. Strawderman
Afterword
By George Casella
Introduction
Where is statistics headed in the 21st century? What are the main themes in
statistics as it emerges from the 20th century?
In this volume, over 70 leading statisticians and quantitative methodologists from
other disciplines reflect on those questions in vignettes, or short review articles, each
of which discusses an important area of statistics. The vignettes highlight some of the
most important statistical advances and outline potentially fruitful areas of research.
They are not exhaustive reviews, but rather selected “snapshots” of the world of
statistics at the dawn of the 21st century. The purpose of this volume is to examine
our statistical past, comment on our present, and speculate on our future.
A first major theme that emerges is that the development of statistics has been
driven by the broader environment within which it operates: by applications in the
sciences, the social sciences, medicine, engineering, and business, by the appearance
of new types of data demanding interpretation, and by the rapid advance in computer
technology. This is not a new theme, but what does seem without precedent is the
range of applications and new data types that are pushing the discipline forward.
In the 19th and early 20th centuries, statistical development was largely driven by
applications in a small number of areas (astronomy, official statistics, agriculture). In
the second half of the 20th century, statistics has come to be a central part of many
disciplines that involve numerical data, and even nonnumerical data, and much of the
research has been driven by the demand for new methods from the disciplines for
which statistics has become an essential tool.
A second theme is that the computer revolution has transformed statistics. Statis-
tics is largely built on a foundation of mathematics, but over the past 30 years, fast
computing has become a cornerstone. This has made possible new kinds of analysis
and modeling that previously were not only impossible but unthinkable. These range
from the early interactive software such as GLIM in the 1970s, through the bootstrap
and software such as S that allowed easy visual exploration of data in the 1980s, to
the Bayesian revolution of the 1990s made possible by Markov chain Monte Carlo
methods.
Because of this, we have organized this volume around major areas of application
of statistics, leading up to a concluding group of vignettes that discuss the theory and
methods of the discipline in their own right. The volume is divided into four main
sections, each edited by a Guest Editor: Statistics in the Life and Medical Sciences,
Statistics in Business and Social Science, Statistics in the Physical Sciences and
Engineering, and Theory and Methods of Statistics.
Although the coverage of this volume is broad and the topics diverse, the same
themes recur in different contexts, pointing to the underlying unity of the field of
statistics. As one example, consider the analysis of point processes consisting of the
times at which one or several events occur, such as death, divorce, or machine failure.
In the health sciences this is called survival analysis and is discussed by Oakes, in
social science it is called event history analysis and is reviewed by Raftery and by
Xie, and in engineering it is called reliability theory and is reviewed by Lawless. The
underlying analysis strategy is the same in all these areas: the basic primitive is the
hazard rate, and one develops models for this; the Cox proportional hazards model
is influential everywhere. Applications of point processes in seismology are written
about by Vere-Jones, and in the analysis of Internet data by Cleveland and Sun.
The analysis of multivariate discrete data is reviewed in general terms by Fien-
berg, Christensen, and McCulloch, and in the context of wildlife applications by Pol-
lock and of sociology by Raftery. Causal analysis using counterfactuals is discussed
for the health sciences by Greenland and for the social sciences by Sobel. Hierarchi-
cal models and related methods are discussed in general by Carlin and Louis and by
Hobert, and in the context of epidemiology by Thomas, of receiver operating char-
acteristic data by Pepe, of toxicology by Ryan, and of animal breeding by Gianola.
Time series analysis is discussed from different perspectives by Tsay and by Solo.
Coding and information theory are discussed by Rissanen and Yu, and by Soofi.
The development and application of a coherent and comprehensive set of meth-
ods for analyzing medical and public health data is perhaps the greatest collective
achievement of the discipline of statistics in the second half of the 20th century. This
has led to the development of biostatistics, which is a thriving subdiscipline in its
own right, while remaining an integral part of the broader statistics profession. This
seems like a model to follow for other areas where the penetration of statistics has
not yet been as extensive. The first set of vignettes, Statistics in the Life and Medical
Sciences, guest edited by Norman E. Breslow, bears witness to the extraordinary de-
velopment of statistical methods in these areas, as well as the extent of collaborative
work between statisticians and other scientists.
Several cross-cutting themes are apparent in this set of vignettes. They highlight
three main methodologies: causal analysis, survival analysis, and hierarchical mod-
eling, as well as a rich array of applications. Causal analysis using counterfactuals
was pioneered by Neyman and Fisher and applied in medicine in the form of the ran-
domized clinical trial (see Harrington’s vignette); for more recent developments see
Greenland’s vignette. The basic tools of survival analysis have been the Kaplan-Meier
estimator, the logrank test, and Cox’s proportional hazards model; see the vignettes by
Oakes, Thomas, Ryan, Pollock, and Gianola for review and recent developments. Hi-
erarchical modeling and the related generalized estimating equations (GEE) approach
are very important and are reviewed by Gianola, Thomas, and Pepe.
Perhaps the most active area of science at the moment is the study of the genome,
and statistical aspects of this are reviewed by Weir and by Wong. Four areas are
highlighted: the two more established areas of gene location and sequence analysis,
and the two newer and rapidly expanding areas of protein structure prediction and
gene expression data analysis. Other application areas in the life and medical sciences
reviewed include the environment (Guttorp), wildlife population estimation (Pollock),
animal breeding (Gianola), human fertility (Weinberg and Dunson), and toxicology
(Ryan).
The Business and Social Science set of vignettes, guest edited by Mark P. Becker,
reviews the state of statistics in a range of disciplines: finance (Lo), marketing (Rossi
and Allenby), political science (Beck), sociology (Raftery), psychology (Browne),
the law (Eisenberg), and demography (Xie).
The Physical Sciences and Engineering vignettes, guest edited by Diane Lam-
bert, do likewise for disciplines within their scope: atmospheric science (Nychka),
seismology (Vere-Jones), reliability (Lawless), process control (Stoumbos et al.), the
pharmaceutical industry (Gunter and Holder), and manufacturing (Nair, Hansen, and
Shi). The emerging field of the analysis of Internet data is discussed by Cleveland
and Sun.
Bayesian statistics and Markov chain Monte Carlo have had a major impact on
cutting edge statistical practice in the past ten years, and they are mentioned in many
of the vignettes. The Theory and Methods vignettes, guest edited by George Casella,
include several where they are the main focus (Berger, Cappé and Robert, Carlin
and Louis, Gelfand, George). The bootstrap, which ignited the current explosion of
computationally intensive methods in statistics, is reviewed by its inventor, Efron.
Nonparametric and robust methods are reviewed by Fan; by Hettmansperger, McKean,
and Sheather; and by Portnoy and He. Missing data methods are reviewed by Meng, and
measurement error models by Stefanski. Likelihood and related foundational concepts are
reviewed by Reid and by Robins and Wasserman, decision theory by Brown and by
W. E. Strawderman, and asymptotics by R. L. Strawderman.
In spite of the broad coverage of this volume, many statistical topics, some of them
very important, have been omitted. We can only plead the impossibility of covering
such a broad and dynamic discipline completely in one volume and mention some of
the omissions that appear to us most glaring. Data collection methods generally get
short shrift in this volume. The design of experiments, which played a major role in
launching modern statistics early in the 20th century, is not represented here in spite
of a recent spurt of interest and new applications. Similarly, survey sampling is not
covered, although the proliferation of new ways of collecting social and behavioral
data through the Internet seems likely to spark a revival of this field.
Exploratory data analysis and visualization do not have a separate vignette, al-
though their influence is apparent in many of the vignettes. Multivariate analysis is not
covered explicitly, although aspects are discussed in Browne’s vignette on psycho-
metrics and in some others. Graphical models, neural networks, and cluster analysis,
in particular, are areas of multivariate analysis that are progressing rapidly at the mo-
ment. There is not a separate vignette about econometrics, but its influence is pervasive
in several of the disciplines reviewed in the Business and Social Science chapter. And
applications of statistics to the arts and the humanities are not mentioned, although
there have been some, for example in history and music. Even topics that are treated
may have been viewed from a limited perspective.
So where is the field of statistics going in the new millennium? While prediction
is hard, especially about the future, it does seem safe to say that new developments
will be driven by new kinds of data requiring analysis and by the development of
computing to make them possible. Gene expression data is one current example of
this, and this is a field where statisticians have rapidly become deeply involved.
Datamining is another; this started life as the analysis of retail barcode data, and
statisticians have become involved more slowly there. One area where statistics has
been largely absent, but where new theory and computing power may allow it to make
a contribution, is the analysis of simulation or mechanistic models, which are mostly
deterministic and dominate scientific endeavor in many disciplines, often largely to
the exclusion of more conventional statistical models. We encourage our successors
to produce a sequel to this work to begin the 22nd century!
This volume shows statistics to be broad and diverse, but we feel that it also
shows the essential intellectual unity of the field. Three basic ideas underlie a great
deal of what statisticians do and are influential in almost every vignette: the represen-
tation of the phenomenon being studied by a probability model, the summarization of
information in the data using the resulting likelihood function, and the basic principle
put forward by Tukey in 1962 that one should look at the data as part of the model-
building process. The methodology for implementing these principles can involve
either mathematics, such as asymptotic approximations, or, increasingly, intensive
Monte Carlo computing, such as the use of simulation to evaluate methods, the boot-
strap, importance sampling, or Markov chain Monte Carlo.
We are very grateful to the many people who have contributed to making this
millennial project a reality, primarily, of course, to the vignette authors themselves
and to the Guest Editors. The vignettes in this volume were first published in the year
2000 in the Journal of the American Statistical Association, of which we were the
editors in that year. We are very grateful to our editorial coordinators, Janet Wilt, Lisa
Johnson, Mary Rogers, and Katherine Roberts, to Mary Fleming and Carol Edwards
and their staff at the American Statistical Association, to Eric Sampson, and to Cathy
Frey and her staff at Cadmus Press. We are also grateful to Jonas Ellenberg, Jim
Landwehr, and Al Madansky for helping to make the publication of the vignettes a
reality. And, finally, we thank Kirsty Stroud, Tom Louis, and Chapman & Hall for
helping us to bring this book to publication.
Adrian E. Raftery, Seattle, WA
Martin A. Tanner, Evanston, IL
Martin T. Wells, Ithaca, NY
February 2001
Chapter 1
Statistics in the Life and Medical Sciences
Norman E. Breslow
One of the pleasures of working as an applied statistician is the awareness it brings
of the wide diversity of scientific fields to which our profession contributes critical
concepts and methods. My own awareness was enhanced by accepting the invitation
from the editors of JASA to serve as guest editor for this section of vignettes celebrating
the significant contributions made by statisticians to the life and medical sciences in
the 20th century. The goal of the project was not an encyclopedic catalog of all the
major developments, but rather a sampling of some of the most interesting work. Of
the 12 vignettes, 10 focus on particular areas of application: environmetrics, wildlife
populations, animal breeding, human fertility, toxicology, medical diagnosis, clinical
trials, environmental epidemiology, statistical genetics, and molecular biology. The
two vignettes that begin the series focus more on methods that have had, or promise
to have, impact across a range of subject matter areas: survival analysis and causal
analysis.
The concept of a counterfactual true treatment effect was introduced by Neyman
for agricultural field experiments in the 1920s, and Fisher’s method of randomization
provided a physical basis for making causal inferences. Bradford Hill’s advocacy
of these principles for use in medicine led to the randomized, double-blind, placebo-
controlled clinical trial. As Harrington points out, this was arguably the most important
scientific advance in medicine during the 20th century. Greenland’s vignette describes
recent theory and methods developed from these same foundations for causal analysis
of observational data that may help sort out some vexing public health issues.
The impact of survival analysis has been immense. Weinberg and Dunson discuss
survival methods for population monitoring of fertility. Ryan describes how transition
rate models for carcinogenicity underlie the analysis and interpretation of data from
the lifetime rodent bioassay, which still strongly influence regulatory policy. Oakes
mentions the importance of multivariate survival methods for genetic epidemiology,
Pollock cites applications to wildlife studies, and Gianola notes increased use of sur-
vival models even in animal breeding. But these many applications still represent only
a small sampling of the whole. Kaplan and Meier’s product limit estimate, Mantel
and Peto’s log-rank test, and Cox’s proportional hazards regression model are the
indispensable tools of a large cadre of statisticians working on clinical trials in industry,
government, and academia. The fact that Cox received the 1990 General Motors prize
for clinical cancer research underscores the enormously beneficial impact of this work
on clinical medicine.
Norman E. Breslow is Professor of Biostatistics, University of Washington, Seattle, WA 98195 (E-mail:
[email protected]).
Preventive medicine has been no less affected by the concepts and methods of
survival analysis. The key epidemiologic measure of incidence rate is rooted firmly
in the centuries-old tradition of the life table, whereas the more recent concept of
relative risk is best understood as a ratio of such rates. The proportional hazards
model provided the mathematical foundation for classical epidemiologic methods of
relative risk estimation. It paved the way for modern developments by connecting the
field to Fisher’s likelihood inference and its semiparametric extensions. Particularly
important are the new epidemiologic designs that have been stimulated by ideas
from survival analysis: the nested case-control design, the case-cohort design, the
case-crossover design, and two-phase stratified versions of all of these. The vignettes
by Oakes and Thomas reference some of this work and cite recent, comprehensive
reviews.
Hierarchical modeling is a cross-cutting development whose great importance is
chronicled in several vignettes. Statisticians who have discovered its value in their
own areas of application owe a great debt to the pioneering efforts of those working
in the field of animal breeding, notably Henderson and Patterson and Thompson. Gi-
anola argues that the mixed model equations and their best linear unbiased predictors
(BLUPs) of genetic value are probably “the most important technological contribution
of statistics to animal breeding.” Analogous predictors of random effects in both linear
and nonlinear mixed-effects models play no less a role in spatial statistics. Thomas,
for example, notes their value for smoothing of small area disease rates prior to map
construction.
Although hierarchical modeling can proceed using only the mixed model equa-
tions and restricted maximum likelihood (REML) estimation of variance components,
the advantages of a full Bayes approach are increasingly apparent. Gianola argues that
this provides the only satisfactory solution to assessing uncertainty in variance com-
ponents and BLUPs. Markov chain Monte Carlo (MCMC) calculations, furthermore,
are essential for fitting models with large (he cites a case with 700,000) numbers of
random effects. Thomas calls attention to the importance of Bayes model averaging
techniques in epidemiology. Guttorp cites several applications of MCMC for spatial
prediction in environmental problems, and Wong notes the use of MCMC for mul-
tiple alignment of DNA sequence data. But Bayesians are not alone in their use of
MCMC and other computationally intensive procedures. Efron’s bootstrap has also
dramatically impacted both the theory and practice of statistics. Pollock in particular
notes its application to capture-recapture data.
Public health statisticians tend to favor marginal mean regression models over
their hierarchical counterparts, because the parameters then have a desired interpre-
tation in terms of population averages. The generalized estimating equation (GEE)
approach with a specified “working” correlation matrix, as developed by Liang and
Zeger, has revolutionized the analysis of longitudinal and other forms of clustered
data. Ryan notes the impact of these methods on the analysis of data from reproduc-
tive toxicology studies, where the correlation of outcomes among littermates is of
little intrinsic interest, and Thomas mentions their importance in epidemiology. Pepe
cites both marginal and hierarchical approaches to the analysis of receiver operating
characteristic data.
This series of short vignettes provides a sampling of the fascinating statistical
problems that arise from the life and medical sciences, of the crucial contributions
made by statisticians to those sciences, and of the statistical concepts and techniques
that have led to this success. They confirm that the statistics of the 21st century will
be heavily influenced by the revolutionary developments in technology, particularly
in the information and biomedical sciences, and by the availability of vast new repos-
itories of geographic and molecular data. The authors, referees, and editors who have
contributed their hard work to this project will be amply rewarded if the series helps
to attract students of statistical science into the fields that have so stimulated their
own interest and productivity.
Survival Analysis
David Oakes
1. INTRODUCTION
Survival analysis concerns data on times T to some event; for example, death,
relapse into active disease after a period of remission, failure of a machine component,
or time to secure a job after a period of unemployment. Such data are often right-
censored; that is, the actual survival time Ti = ti for the ith subject is observed only if
ti < ci for some potential censoring time ci . Otherwise, the fact that {Ti ≥ ci } is ob-
served, but the actual value of Ti is not. For example, in a study of mortality following
a heart attack, we will typically know the exact date of death for patients who died, but
for those patients who survived, we will know only that they were alive on the date of
their last follow-up. As an important but sometimes overlooked practical point, these
event-free follow-up times must be recorded to allow any meaningful analysis of the
data. Usually the ci will vary from patient to patient, typically depending on when
they entered the study. The paper of Kaplan and Meier (1958) in this journal brought
the analysis of right-censored data to the attention of mathematical statisticians by
formulating and solving this estimation problem via nonparametric maximum likeli-
hood. Over the next few years, attention focused largely on extending nonparametric
tests, such as logrank, Wilcoxon, and Kruskal–Wallis, to allow for possible right cen-
soring. In this context, Efron (1967) introduced the notion of self-consistency (“to
thine own self be true”), a key to the modern approach to missing-data problems via
the EM algorithm (Dempster, Laird, and Rubin 1977). Breslow and Crowley (1974)
proved the weak convergence of the normalized Kaplan–Meier estimator to Brownian
motion.
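As an illustration of how right-censored follow-up times enter the estimate, the following is a minimal sketch of the Kaplan–Meier product-limit calculation. The function name, argument names, and toy data are hypothetical and chosen only for this example; the sketch assumes the usual convention that a subject censored at time t is still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time per subject (event time or censoring time)
    events -- 1 if the event was observed at that time, 0 if right-censored
    Names and layout are illustrative, not taken from the source text.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # every subject leaves the risk set at their time
    at_risk = len(times)        # number still under observation
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Both events and censorings shrink the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data: five subjects, two of them censored (event flag 0).
example_times = [1, 2, 2, 3, 4]
example_events = [1, 0, 1, 1, 0]
example_curve = kaplan_meier(example_times, example_events)
print(example_curve)
```

For the toy data the estimate drops at times 1, 2, and 3, to roughly 0.8, 0.6, and 0.3. The censored subjects contribute only through the size of the risk set, which is why the event-free follow-up times must be recorded.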
2. COX’S PROPORTIONAL HAZARDS MODEL
Emphasis shifted from hypothesis testing to modeling effects of explanatory
variables (“covariates”) on survival following the introduction by Cox (1972) of the
proportional hazards model. Cox’s model includes the unknown baseline hazard as a
nuisance function, but the effects of the covariates on the hazard are modeled via a
simple multiplicative factor.
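The multiplicative structure can be sketched in a few lines. This is an illustrative sketch only: it assumes the usual exponential form of the factor, hazard = baseline(t) × exp(β·z), and the baseline function, coefficient values, and function names below are hypothetical.

```python
import math


def cox_hazard(baseline_hazard, beta, z, t):
    """Hazard at time t for covariate vector z: h0(t) * exp(beta . z)."""
    linear = sum(b * x for b, x in zip(beta, z))
    return baseline_hazard(t) * math.exp(linear)


def hazard_ratio(beta, z1, z0):
    """Ratio of hazards for covariates z1 versus z0; no baseline is needed."""
    return math.exp(sum(b * (x1 - x0) for b, x1, x0 in zip(beta, z1, z0)))


def toy_baseline(t):
    """Hypothetical baseline hazard, used only to make the sketch runnable."""
    return 0.1 * t


toy_beta = [0.5]  # hypothetical coefficient for a single binary covariate

for t in (1.0, 5.0):
    ratio = cox_hazard(toy_baseline, toy_beta, [1], t) / cox_hazard(toy_baseline, toy_beta, [0], t)
    print(t, ratio, hazard_ratio(toy_beta, [1], [0]))
```

The ratio printed at each time is the same, because the baseline hazard cancels. This is the sense in which the hazards are proportional and the baseline acts as a nuisance function.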
David Oakes is Professor and Chair, Department of Biostatistics, University of Rochester, Rochester, NY
14642. This work was supported in part by National Cancer Institute grant R01 CA52572.