
A First Course in

Design and Analysis

of Experiments


Gary W. Oehlert

University of Minnesota

Cover design by Victoria Tomaselli

Cover illustration by Peter Hamlin

Minitab is a registered trademark of Minitab, Inc.

SAS is a registered trademark of SAS Institute, Inc.

S-Plus is a registered trademark of Mathsoft, Inc.

Design-Expert is a registered trademark of Stat-Ease, Inc.

Library of Congress Cataloging-in-Publication Data.

Oehlert, Gary W.

A first course in design and analysis of experiments / Gary W. Oehlert.

p. cm.

Includes bibliographical references and index.

ISBN 0-7167-3510-5

1. Experimental Design

I. Title

QA279.O34 2000

519.5—dc21

99-059934

Copyright © 2010 Gary W. Oehlert. All rights reserved.

This work is licensed under a “Creative Commons” license. Briefly, you are free to copy, distribute, and transmit this work provided the following conditions are met:

1. You must properly attribute the work.

2. You may not use this work for commercial purposes.

3. You may not alter, transform, or build upon this work.

A complete description of the license may be found at http://creativecommons.org/licenses/by-nc-nd/3.0/.

For Becky

who helped me all the way through

and for Christie and Erica

who put up with a lot while it was getting done

Contents

Preface

1 Introduction
1.1 Why Experiment?
1.2 Components of an Experiment
1.3 Terms and Concepts
1.4 Outline
1.5 More About Experimental Units
1.6 More About Responses

2 Randomization and Design
2.1 Randomization Against Confounding
2.2 Randomizing Other Things
2.3 Performing a Randomization
2.4 Randomization for Inference
2.4.1 The paired t-test
2.4.2 Two-sample t-test
2.4.3 Randomization inference and standard inference
2.5 Further Reading and Extensions
2.6 Problems

3 Completely Randomized Designs
3.1 Structure of a CRD
3.2 Preliminary Exploratory Analysis
3.3 Models and Parameters
3.4 Estimating Parameters
3.5 Comparing Models: The Analysis of Variance
3.6 Mechanics of ANOVA
3.7 Why ANOVA Works
3.8 Back to Model Comparison
3.9 Side-by-Side Plots
3.10 Dose-Response Modeling
3.11 Further Reading and Extensions
3.12 Problems

4 Looking for Specific Differences—Contrasts
4.1 Contrast Basics
4.2 Inference for Contrasts
4.3 Orthogonal Contrasts
4.4 Polynomial Contrasts
4.5 Further Reading and Extensions
4.6 Problems

5 Multiple Comparisons
5.1 Error Rates
5.2 Bonferroni-Based Methods
5.3 The Scheffé Method for All Contrasts
5.4 Pairwise Comparisons
5.4.1 Displaying the results
5.4.2 The Studentized range
5.4.3 Simultaneous confidence intervals
5.4.4 Strong familywise error rate
5.4.5 False discovery rate
5.4.6 Experimentwise error rate
5.4.7 Comparisonwise error rate
5.4.8 Pairwise testing reprise
5.4.9 Pairwise comparisons methods that do not control combined Type I error rates
5.4.10 Confident directions

5.5 Comparison with Control or the Best
5.5.1 Comparison with a control
5.5.2 Comparison with the best
5.6 Reality Check on Coverage Rates
5.7 A Warning About Conditioning
5.8 Some Controversy
5.9 Further Reading and Extensions
5.10 Problems

6 Checking Assumptions
6.1 Assumptions
6.2 Transformations
6.3 Assessing Violations of Assumptions
6.3.1 Assessing nonnormality
6.3.2 Assessing nonconstant variance
6.3.3 Assessing dependence
6.4 Fixing Problems
6.4.1 Accommodating nonnormality
6.4.2 Accommodating nonconstant variance
6.4.3 Accommodating dependence
6.5 Effects of Incorrect Assumptions
6.5.1 Effects of nonnormality
6.5.2 Effects of nonconstant variance
6.5.3 Effects of dependence
6.6 Implications for Design
6.7 Further Reading and Extensions
6.8 Problems

7 Power and Sample Size
7.1 Approaches to Sample Size Selection
7.2 Sample Size for Confidence Intervals
7.3 Power and Sample Size for ANOVA
7.4 Power and Sample Size for a Contrast
7.5 More about Units and Measurement Units

7.6 Allocation of Units for Two Special Cases
7.7 Further Reading and Extensions
7.8 Problems

8 Factorial Treatment Structure
8.1 Factorial Structure
8.2 Factorial Analysis: Main Effect and Interaction
8.3 Advantages of Factorials
8.4 Visualizing Interaction
8.5 Models with Parameters
8.6 The Analysis of Variance for Balanced Factorials
8.7 General Factorial Models
8.8 Assumptions and Transformations
8.9 Single Replicates
8.10 Pooling Terms into Error
8.11 Hierarchy
8.12 Problems

9 A Closer Look at Factorial Data
9.1 Contrasts for Factorial Data
9.2 Modeling Interaction
9.2.1 Interaction plots
9.2.2 One-cell interaction
9.2.3 Quantitative factors
9.2.4 Tukey one-degree-of-freedom for nonadditivity
9.3 Further Reading and Extensions
9.4 Problems

10 Further Topics in Factorials
10.1 Unbalanced Data
10.1.1 Sums of squares in unbalanced data
10.1.2 Building models
10.1.3 Testing hypotheses
10.1.4 Empty cells
10.2 Multiple Comparisons

10.3 Power and Sample Size
10.4 Two-Series Factorials
10.4.1 Contrasts
10.4.2 Single replicates
10.5 Further Reading and Extensions
10.6 Problems

11 Random Effects
11.1 Models for Random Effects
11.2 Why Use Random Effects?
11.3 ANOVA for Random Effects
11.4 Approximate Tests
11.5 Point Estimates of Variance Components
11.6 Confidence Intervals for Variance Components
11.7 Assumptions
11.8 Power
11.9 Further Reading and Extensions
11.10 Problems

12 Nesting, Mixed Effects, and Expected Mean Squares
12.1 Nesting Versus Crossing
12.2 Why Nesting?
12.3 Crossed and Nested Factors
12.4 Mixed Effects
12.5 Choosing a Model
12.6 Hasse Diagrams and Expected Mean Squares
12.6.1 Test denominators
12.6.2 Expected mean squares
12.6.3 Constructing a Hasse diagram
12.7 Variances of Means and Contrasts
12.8 Unbalanced Data and Random Effects
12.9 Staggered Nested Designs
12.10 Problems

13 Complete Block Designs
13.1 Blocking
13.2 The Randomized Complete Block Design
13.2.1 Why and when to use the RCB
13.2.2 Analysis for the RCB
13.2.3 How well did the blocking work?
13.2.4 Balance and missing data
13.3 Latin Squares and Related Row/Column Designs
13.3.1 The crossover design
13.3.2 Randomizing the LS design
13.3.3 Analysis for the LS design
13.3.4 Replicating Latin Squares
13.3.5 Efficiency of Latin Squares
13.3.6 Designs balanced for residual effects
13.4 Graeco-Latin Squares
13.5 Further Reading and Extensions
13.6 Problems

14 Incomplete Block Designs
14.1 Balanced Incomplete Block Designs
14.1.1 Intrablock analysis of the BIBD
14.1.2 Interblock information
14.2 Row and Column Incomplete Blocks
14.3 Partially Balanced Incomplete Blocks
14.4 Cyclic Designs
14.5 Square, Cubic, and Rectangular Lattices
14.6 Alpha Designs
14.7 Further Reading and Extensions
14.8 Problems

15 Factorials in Incomplete Blocks—Confounding
15.1 Confounding the Two-Series Factorial
15.1.1 Two blocks
15.1.2 Four or more blocks
15.1.3 Analysis of an unreplicated confounded two-series
15.1.4 Replicating a confounded two-series
15.1.5 Double confounding
15.2 Confounding the Three-Series Factorial
15.2.1 Building the design
15.2.2 Confounded effects
15.2.3 Analysis of confounded three-series
15.3 Further Reading and Extensions
15.4 Problems

16 Split-Plot Designs
16.1 What Is a Split Plot?
16.2 Fancier Split Plots
16.3 Analysis of a Split Plot
16.4 Split-Split Plots
16.5 Other Generalizations of Split Plots
16.6 Repeated Measures
16.7 Crossover Designs
16.8 Further Reading and Extensions
16.9 Problems

17 Designs with Covariates
17.1 The Basic Covariate Model
17.2 When Treatments Change Covariates
17.3 Other Covariate Models
17.4 Further Reading and Extensions
17.5 Problems

18 Fractional Factorials
18.1 Why Fraction?
18.2 Fractioning the Two-Series
18.3 Analyzing a $2^{k-q}$
18.4 Resolution and Projection
18.5 Confounding a Fractional Factorial
18.6 De-aliasing
18.7 Fold-Over
18.8 Sequences of Fractions
18.9 Fractioning the Three-Series
18.10 Problems with Fractional Factorials
18.11 Using Fractional Factorials in Off-Line Quality Control
18.11.1 Designing an off-line quality experiment
18.11.2 Analysis of off-line quality experiments
18.12 Further Reading and Extensions
18.13 Problems

19 Response Surface Designs
19.1 Visualizing the Response
19.2 First-Order Models
19.3 First-Order Designs
19.4 Analyzing First-Order Data
19.5 Second-Order Models
19.6 Second-Order Designs
19.7 Second-Order Analysis
19.8 Mixture Experiments
19.8.1 Designs for mixtures
19.8.2 Models for mixture designs
19.9 Further Reading and Extensions
19.10 Problems

20 On Your Own
20.1 Experimental Context
20.2 Experiments by the Numbers
20.3 Final Project

Bibliography

A Linear Models for Fixed Effects
A.1 Models
A.2 Least Squares
A.3 Comparison of Models
A.4 Projections
A.5 Random Variation
A.6 Estimable Functions
A.7 Contrasts
A.8 The Scheffé Method
A.9 Problems

B Notation

C Experimental Design Plans
C.1 Latin Squares
C.1.1 Standard Latin Squares
C.1.2 Orthogonal Latin Squares
C.2 Balanced Incomplete Block Designs
C.3 Efficient Cyclic Designs
C.4 Alpha Designs
C.5 Two-Series Confounding and Fractioning Plans

D Tables

Index

Preface


This text covers the basic topics in experimental design and analysis and is intended for graduate students and advanced undergraduates. Students should have had an introductory statistical methods course at about the level of Moore and McCabe’s Introduction to the Practice of Statistics (Moore and McCabe 1999) and be familiar with t-tests, p-values, confidence intervals, and the basics of regression and ANOVA. Most of the text soft-pedals theory and mathematics, but Chapter 19 on response surfaces is a little tougher sledding (eigenvectors and eigenvalues creep in through canonical analysis), and Appendix A is an introduction to the theory of linear models. I use the text in a service course for non-statisticians and in a course for first-year master’s students in statistics. The non-statisticians come from departments scattered all around the university, including agronomy, ecology, educational psychology, engineering, food science, pharmacy, sociology, and wildlife.

I wrote this book for the same reason that many textbooks get written: there was no existing book that did things the way I thought was best. I start with single-factor, fixed-effects, completely randomized designs and cover them thoroughly, including analysis, checking assumptions, and power. I then add factorial treatment structure and random effects to the mix. At this stage, we have a single randomization scheme, a lot of different models for data, and essentially all the analysis techniques we need. I next add blocking designs for reducing variability, covering complete blocks, incomplete blocks, and confounding in factorials. After this I introduce split plots, which can be considered incomplete block designs but really introduce the broader subject of unit structures. Covariate models round out the discussion of variance reduction. I finish with special treatment structures, including fractional factorials and response surface/mixture designs.

This outline is similar in content to a dozen other design texts; how is this book different?

• I include many exercises where the student is required to choose an appropriate experimental design for a given situation, or recognize the design that was used. Many of the designs in question are from earlier chapters, not the chapter where the question is given. These are important skills that often receive short shrift. See examples on pages 500 and 502.

• I use Hasse diagrams to illustrate models, find test denominators, and compute expected mean squares. I feel that the diagrams provide a much easier and more understandable approach to these problems than the classic approach with tables of subscripts and live and dead indices. I believe that Hasse diagrams should see wider application.

• I spend time trying to sort out the issues with multiple comparisons procedures. These confuse many students, and most texts seem to just present a laundry list of methods with no guidance.

• I try to get students to look beyond saying main effects and/or interactions are significant and to understand the relationships in the data. I want them to learn that understanding what the data have to say is the goal. ANOVA is a tool we use at the beginning of an analysis; it is not the end.

• I describe the difference in philosophy between hierarchical model building and parameter testing in factorials, and discuss how this becomes crucial for unbalanced data. This is important because the different philosophies can lead to different conclusions, and many texts avoid the issue entirely.

• There are three kinds of “problems” in this text, which I have denoted exercises, problems, and questions. Exercises are intended to be simpler than problems, with exercises being more drill on mechanics and problems being more integrative. Not everyone will agree with my classification. Questions are not necessarily more difficult than problems, but they cover more theoretical or mathematical material.

Data files for the examples and problems can be downloaded from the Freeman web site at http://www.whfreeman.com/. A second resource is Appendix B, which documents the notation used in the text.

This text contains many formulae, but I try to use formulae only when I think that they will increase a reader’s understanding of the ideas. In several settings where closed-form expressions for sums of squares or estimates exist, I do not present them because I do not believe that they help (for example, the Analysis of Covariance). Similarly, presentations of normal equations do not appear. Instead, I approach ANOVA as a comparison of models fit by least squares, and let the computing software take care of the details of fitting. Future statisticians will need to learn the process in more detail, and Appendix A gets them started with the theory behind fixed effects.
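For the simplest case, a one-way layout, the idea of comparing models fit by least squares can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book or from any of the packages it uses: the group labels and responses are invented, and the names `residual_ss` and `compare_models` are hypothetical. The reduced model fits one common mean to every response, the full model fits a separate mean to each group, and the F statistic compares the drop in residual sum of squares with the full model's error mean square.

```python
from statistics import mean

# Hypothetical responses for three treatment groups (invented for illustration).
groups = {
    "A": [12.0, 30.0, 8.0, 22.0],
    "B": [25.0, 40.0, 18.0, 33.0],
    "C": [35.0, 28.0, 50.0, 41.0],
}


def residual_ss(groups, fitted):
    """Sum of squared residuals when each response in group k is predicted by fitted(k)."""
    return sum((y - fitted(k)) ** 2 for k, ys in groups.items() for y in ys)


def compare_models(groups):
    """Compare a single-mean model with a separate-means model via an F statistic."""
    all_y = [y for ys in groups.values() for y in ys]
    n, g = len(all_y), len(groups)
    grand_mean = mean(all_y)                                 # least squares fit: one common mean
    group_means = {k: mean(ys) for k, ys in groups.items()}  # least squares fit: one mean per group

    sse_reduced = residual_ss(groups, lambda k: grand_mean)
    sse_full = residual_ss(groups, lambda k: group_means[k])
    df_reduced, df_full = n - 1, n - g

    # Reduction in residual SS per extra parameter, divided by the full model's error mean square.
    f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
    return sse_reduced, sse_full, f_stat


sse_reduced, sse_full, f_stat = compare_models(groups)
print(f"SSE, single mean: {sse_reduced:.1f}")
print(f"SSE, group means: {sse_full:.1f}")
print(f"F = {f_stat:.3f}")
```

Turning the F statistic into a p-value would need the F distribution on g - 1 and n - g degrees of freedom (2 and 9 here), which the Python standard library does not provide; a statistics package would handle that step.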

Speaking of computing, examples in this text use one of four packages: MacAnova, Minitab, SAS, and S-Plus. MacAnova is a homegrown package that we use here at Minnesota because we can distribute it freely; it runs on Macintosh, Windows, and Unix; and it does everything we need. You can download MacAnova (any version and documentation, even the source) from http://www.stat.umn.edu/~gary/macanova. Minitab and SAS are widely used commercial packages. I hadn’t used Minitab in twelve years when I started using it for examples; I found it incredibly easy to use. The menu/dialog/spreadsheet interface was very intuitive. In fact, I only opened the manual once, and that was when I was trying to figure out how to do general contrasts (which I was never able to figure out). SAS is far and away the market leader in statistical software. You can do practically every kind of analysis in SAS, but as a novice I spent many hours with the manuals trying to get SAS to do any kind of analysis. In summary, many people swear by SAS, but I found I mostly swore at SAS. I use S-Plus extensively in research; here I’ve just used it for a couple of graphics.

I need to acknowledge many people who helped me get this job done. First are the students and TAs in the courses where I used preliminary versions. Many of you made suggestions and pointed out mistakes; in particular I thank John Corbett, Alexandre Varbanov, and Jorge de la Vega Gongora. Many others of you contributed data; your footprints are scattered throughout the examples and exercises. Next I have benefited from helpful discussions with my colleagues here in Minnesota, particularly Kit Bingham, Kathryn Chaloner, Sandy Weisberg, and Frank Martin. I thank Sharon Lohr for introducing me to Hasse diagrams, and I received much helpful criticism from reviewers, including Larry Ringer (Texas A&M), Morris Southward (New Mexico State), Robert Price (East Tennessee State), Andrew Schaffner (Cal Poly—San Luis Obispo), Hiroshi Yamauchi (Hawaii—Manoa), and William Notz (Ohio State). My editor Patrick Farace and others at Freeman were a great help. Finally, I thank my family and parents, who supported me in this for years (even if my father did say it looked like a foreign language!).

They say you should never let the camel’s nose into the tent, because once the nose is in, there’s no stopping the rest of the camel. In a similar vein, student requests for copies of lecture notes lead to student requests for typed lecture notes, which lead to student requests for more complete typed lecture notes, which lead . . . well, in my case it leads to a textbook on design and analysis of experiments, which you are reading now. Over the years my students have preferred various more primitive incarnations of this text to other texts; I hope you find this text worthwhile too.

Gary W. Oehlert

Chapter 1

Introduction

Experiments answer questions

Researchers use experiments to answer questions. Typical questions might be:

• Is a drug a safe, effective cure for a disease? This could be a test of how AZT affects the progress of AIDS.

• Which combination of protein and carbohydrate sources provides the best nutrition for growing lambs?

• How will long-distance telephone usage change if our company offers a different rate structure to our customers?

• Will an ice cream manufactured with a new kind of stabilizer be as palatable as our current ice cream?

• Does short-term incarceration of spouse abusers deter future assaults?

• Under what conditions should I operate my chemical refinery, given this month’s grade of raw material?

This book is meant to help decision makers and researchers design good experiments, analyze them properly, and answer their questions.

1.1 Why Experiment?

Consider the spousal assault example mentioned above. Justice officials need to know how they can reduce or delay the recurrence of spousal assault. They are investigating three different actions in response to spousal assaults. The assailant could be warned, sent to counseling but not booked on charges, or arrested for assault. Which of these actions works best? How can they compare the effects of the three actions?

Treatments, experimental units, and responses

This book deals with comparative experiments. We wish to compare some treatments. For the spousal assault example, the treatments are the three actions by the police. We compare treatments by using them and comparing the outcomes. Specifically, we apply the treatments to experimental units and then measure one or more responses. In our example, individuals who assault their spouses could be the experimental units, and the response could be the length of time until recurrence of assault. We compare treatments by comparing the responses obtained from the experimental units in the different treatment groups. This could tell us if there are any differences in responses between the treatments, what the estimated sizes of those differences are, which treatment has the greatest estimated delay until recurrence, and so on. An experiment is characterized by the treatments and experimental units to be used, the way treatments are assigned to units, and the responses that are measured.
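The comparison described above can be sketched in Python under simple assumptions: one numeric response per experimental unit, and treatments compared through their mean responses. The labels A, B, and C and all of the numbers are invented for illustration (think of them as stand-ins for three treatments, with time until recurrence as the response); they are not data from any study, and `compare_treatments` is a hypothetical helper. The sketch estimates each treatment's mean response, every pairwise difference of means, and which treatment has the largest estimated mean.

```python
from statistics import mean

# Hypothetical responses (e.g., weeks until recurrence) for units in three
# treatment groups. The labels and numbers are invented for illustration.
responses = {
    "A": [12.0, 30.0, 8.0, 22.0],
    "B": [25.0, 40.0, 18.0, 33.0],
    "C": [35.0, 28.0, 50.0, 41.0],
}


def compare_treatments(groups):
    """Estimate each treatment's mean response and all pairwise differences of means."""
    means = {name: mean(values) for name, values in groups.items()}
    names = sorted(means)
    diffs = {
        (a, b): means[a] - means[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return means, diffs


means, diffs = compare_treatments(responses)
best = max(means, key=means.get)  # treatment with the largest estimated mean response

for (a, b), d in diffs.items():
    print(f"mean({a}) - mean({b}) = {d:+.1f}")
print("largest estimated mean response:", best)
```

Point estimates like these are only a starting point; deciding whether the differences are larger than chance variation requires the analysis methods developed in later chapters.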

Advantages of experiments

Experiments help us answer questions, but there are also nonexperimental techniques. What is so special about experiments? Consider that:

1. Experiments allow us to set up a direct comparison between the treatments of interest.

2. We can design experiments to minimize any bias in the comparison.

3. We can design experiments so that the error in the comparison is small.

4. Most important, we are in control of experiments, and having that control allows us to make stronger inferences about the nature of differences that we see in the experiment. Specifically, we may make inferences about causation.

Control versus observation

This last point distinguishes an experiment from an observational study. An observational study also has treatments, units, and responses. However, in the observational study we merely observe which units are in which treatment groups; we don’t get to control that assignment.

Example 1.1 Does spanking hurt?

Let’s contrast an experiment with an observational study described in Straus, Sugarman, and Giles-Sims (1997). A large survey of women aged 14 to 21 years was begun in 1979; by 1988 these same women had 1239 children between the ages of 6 and 9 years. The women and children were interviewed and tested in 1988 and again in 1990. Two of the items measured were the level of antisocial behavior in the children and the frequency of spanking. Results showed that children who were spanked more frequently in 1988 showed larger increases in antisocial behavior in 1990 than those who were spanked less frequently. Does spanking cause antisocial behavior? Perhaps it does, but there are other possible explanations. Perhaps children who were becoming more troublesome in 1988 were spanked more frequently, while children who were becoming less troublesome were spanked less frequently in 1988.

The drawback of observational studies is that the grouping into “treatments” is not under the control of the experimenter and its mechanism is usually unknown. Thus observed differences in responses between treatment groups could very well be due to these other hidden mechanisms, rather than the treatments themselves.
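A small simulation can make the hidden-mechanism argument concrete. This is a hypothetical sketch, not the Straus, Sugarman, and Giles-Sims data: it assumes each child has an unobserved behavioral trend that raises both how often the child is spanked and how much antisocial behavior increases, while spanking itself has no effect at all. The functions `simulate` and `pearson_r`, the sample size, and the normal distributions are illustrative choices.

```python
import random

def simulate(n=5000, seed=0):
    """Hypothetical data in which spanking has NO causal effect.

    A hidden per-child trend drives both the spanking frequency in year 1
    and the increase in antisocial behavior by year 2.
    """
    rng = random.Random(seed)
    spanking, increase = [], []
    for _ in range(n):
        trend = rng.gauss(0, 1)                   # hidden mechanism
        spanking.append(trend + rng.gauss(0, 1))  # troublesome children spanked more
        increase.append(trend + rng.gauss(0, 1))  # spanking plays no role here
    return spanking, increase

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

spanking, increase = simulate()
print(round(pearson_r(spanking, increase), 2))  # close to 0.5
```

Because the code never lets spanking influence the increase, the positive association (its population value under these assumptions is 0.5) comes entirely from the hidden trend, yet an observational analysis of these data would still find that children spanked more often showed larger increases.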

Observational studies are useful too

It is important to say that while experiments have some advantages, observational studies are also useful and can produce important results. For example, studies of smoking and human health are observational, but the link that they have established is one of the most important public health issues today. Similarly, observational studies established an association between heart valve disease and the diet drug fen-phen that led to the withdrawal of the drugs fenfluramine and dexfenfluramine from the market (Connolly et al. 1997 and US FDA 1997).

Causal relationships

Mosteller and Tukey (1977) list three concepts associated with causation and state that two or three are needed to support a causal relationship:

• Consistency

• Responsiveness

• Mechanism.

Consistency means that, all other things being equal, the relationship between two variables is consistent across populations in direction and maybe in amount. Responsiveness means that we can go into a system, change the causal variable, and watch the response variable change accordingly. Mechanism means that we have a step-by-step mechanism leading from cause to effect.

Experiments can demonstrate consistency and responsiveness

In an experiment, we are in control, so we can achieve responsiveness. Thus, if we see a consistent difference in observed response between the various treatments, we can infer that the treatments caused the differences in response. We don’t need to know the mechanism: we can demonstrate causation by experiment. (This is not to say that we shouldn’t try to learn mechanisms; we should. It’s just that we don’t need mechanism to infer causation.)

Ethics constrain experimentation

We should note that there are times when experiments are not feasible, even when the knowledge gained would be extremely valuable. For example, we can’t perform an experiment proving once and for all that smoking causes cancer in humans. We can observe that smoking is associated with cancer in humans; we have mechanisms for this and can thus infer causation. But we cannot demonstrate responsiveness, since that would involve making some people smoke and making others not smoke. It is simply unethical.

1.2 Components of an Experiment

An experiment has treatments, experimental units, responses, and a method to assign treatments to units.

Treatments, units, and assignment method specify the experimental design.

Some authors make a distinction between the selection of treatments to be used, called “treatment design,” and the selection of units and assignment of treatments, called “experiment design.”

Analysis not part of design, but consider it during planning

Note that there is no mention of a method for analyzing the results. Strictly speaking, the analysis is not part of the design, though a wise experimenter will consider the analysis when planning an experiment. Although the design determines the proper analysis to a great extent, we will see that two experiments with similar designs may be analyzed differently, and two experiments with different designs may be analyzed similarly. Proper analysis depends on the design and on the kinds of statistical model assumptions we believe are correct and are willing to make.

Not all experimental designs are created equal. A good experimental design must

• Avoid systematic error

• Be precise

• Allow estimation of error

• Have broad validity.

We consider these in turn.


Comparative experiments estimate differences in response between treatments. If our experiment has systematic error, then our comparisons will be biased, no matter how precise our measurements are or how many experimental units we use. For example, if responses for units receiving treatment one are measured with instrument A, and responses for treatment two are measured with instrument B, then we don’t know if any observed differences are due to treatment effects or instrument miscalibrations. Randomization, as will be discussed in Chapter 2, is our main tool to combat systematic error.
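As a minimal sketch of what randomization means here (Chapter 2 gives the full treatment), the hypothetical function `randomize` below performs a completely randomized assignment: it builds a list containing each treatment label the same number of times, shuffles it with Python’s standard `random` module, and pairs the shuffled labels with the units. Because chance decides the assignment, a systematic feature of the units, such as which instrument happens to measure them, is not lined up with treatment on purpose or by habit. The unit names, treatment labels, and seed are arbitrary.

```python
import random

def randomize(units, treatments, reps, seed=None):
    """Completely randomized assignment of treatments to units.

    Each treatment label appears `reps` times; the labels are shuffled
    and then paired with the units in their listed order.
    """
    labels = [t for t in treatments for _ in range(reps)]
    if len(labels) != len(units):
        raise ValueError("need exactly len(treatments) * reps units")
    rng = random.Random(seed)
    rng.shuffle(labels)  # in-place random permutation of the labels
    return dict(zip(units, labels))

units = [f"unit{i}" for i in range(1, 9)]
plan = randomize(units, ["one", "two"], reps=4, seed=1)
for unit, treatment in plan.items():
    print(unit, treatment)
```

Passing a `seed` makes the plan reproducible, which is convenient for recording exactly how the assignment was made.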

Even without systematic error, there will be random error in the responses, and this will lead to random error in the treatment comparisons. Experiments are precise when this random error in treatment comparisons is small. Precision depends on the size of the random errors in the responses, the number of units used, and the experimental design used. Several chapters of this book deal with designs to improve precision.
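One way to see how precision depends on the size of the random errors and on the number of units is the standard error of a difference between two treatment means. The sketch below assumes independent responses with a common error standard deviation sigma, with n1 units in one treatment and n2 in the other, so that the standard error is sigma * sqrt(1/n1 + 1/n2). The function name `se_difference` is hypothetical, and the sketch does not capture the gains from better designs, which later chapters address.

```python
import math

def se_difference(sigma, n1, n2):
    """Standard error of the difference of two treatment means, assuming
    independent responses with common error standard deviation sigma."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)

print(se_difference(sigma=10, n1=5, n2=5))    # about 6.32
print(se_difference(sigma=10, n1=20, n2=20))  # about 3.16: four times the units
print(se_difference(sigma=5, n1=5, n2=5))     # about 3.16: half the error sd
```

Quadrupling the number of units per treatment halves the standard error, and so does halving sigma.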

Experiments must be designed so that we have an estimate of the size of random error. This permits statistical inference: for example, confidence intervals or tests of significance. We cannot do inference without an estimate of error. Sadly, experiments that cannot estimate error continue to be run.
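Estimating error requires replication. A minimal sketch, assuming each treatment’s responses are independent with a common error variance: the hypothetical function `pooled_error_variance` pools the squared deviations of each response from its own treatment mean and divides by the degrees of freedom, the total number of units minus the number of treatments. With only one unit per treatment there are zero degrees of freedom and no estimate, which is the situation the paragraph above warns against. The response values are made up for illustration.

```python
from statistics import mean

def pooled_error_variance(groups):
    """Pooled within-treatment estimate of the error variance.

    groups: one non-empty list of responses per treatment.
    Returns (estimate, degrees_of_freedom).
    """
    df = sum(len(g) - 1 for g in groups)
    if df == 0:
        raise ValueError("no replication: error variance cannot be estimated")
    ss = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss / df, df

print(pooled_error_variance([[10, 12, 14], [15, 17, 19]]))  # (4.0, 4)
try:
    pooled_error_variance([[12], [17]])  # one unit per treatment
except ValueError as err:
    print(err)
```

An estimate of this kind, together with its degrees of freedom, is what confidence intervals and significance tests for treatment comparisons are built on.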

The conclusions we draw from an experiment are applicable to the experimental units we used in the experiment. If the units are actually a statistical sample from some population of units, then the conclusions are also valid for the population. Beyond this, we are extrapolating, and the extrapolation might or might not be successful. For example, suppose we compare two different drugs for treating attention deficit disorder. Our subjects are preadolescent boys from our clinic. We might have a fair case that our results would hold for preadolescent boys elsewhere, but even that might not be true if our clinic’s population of subjects is unusual in some way. The results are even less compelling for older boys or for girls. Thus if we wish to have wide validity (for example, a broad age range and both genders), then our experimental units should reflect the population about which we wish to draw inference.

We need to realize that some compromise will probably be needed between these goals. For example, broadening the scope of validity by using a variety of experimental units may decrease the precision of the responses.

1.3 Terms and Concepts

Let’s define some of the important terms and concepts in design of experiments. We have already seen the terms treatment, experimental unit, and response, but we define them again here for completeness.
