
Probability and Statistics

for Engineers
First Edition

Benjamin Odoi, Abdulzeid Yen Anafo and

Seth Antanah
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1 An Introduction to R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Introduction 1

1.2 Downloading and Installing R 1

1.2.1 Installing R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2.2 Installing and Loading Add-on Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Basic R Operations and Concepts 3

1.3.1 Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3.2 Assignment, Object names, and Data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3.3 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3.4 Functions and Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Getting Help 8

1.5 External Resources 9

2 Introduction to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Learning Objectives 11

2.2 Introduction 11

2.2.1 Determination of Probability of an Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.2 Probability of Compound Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.3 Multiplication Rule for P (A ∩ B) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2.4 Axioms of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.2.5 Some Rules of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.2.6 Application of Counting Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.2.7 Permutation of Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.8 Definition: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.3 Questions 32

2.4 Discussion Topics 39

3 Introduction to Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.1 Learning objectives 41

3.2 Introduction to Statistics 41

3.3 Why Statistics? 42

3.4 Branches of Statistics 42

3.4.1 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.2 Inferential statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43


3.5 Variables 43

3.5.1 Qualitative vs. Quantitative Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.5.2 Discrete vs. Continuous Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.6 Univariate vs. Bivariate Data 44

3.6.1 Univariate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.6.2 Bivariate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.6.3 Populations and Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.6.4 Population vs. Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.7 Summarizing data graphically 45

3.8 Summary Statistics 46

3.8.1 Measure of Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.8.2 Measure of Spread/Variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.8.3 How to Describe Data Patterns in Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.9 Unusual Features 46

3.9.1 Gaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.9.2 Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.9.3 How to Compare Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.9.4 Four Ways to Describe Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.10 Sampling Procedures 48

3.10.1 Simple random sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.10.2 Systematic random sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.10.3 Stratified Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.10.4 Cluster sampling (also called block sampling) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.11 Levels of Measurement (Types of Data) 54


3.12 Frequency Distribution 56

3.13 Measure of Location and Dispersion 61

3.13.1 Measures of Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.13.2 Relation between Measure of Location and Types of Frequency Curves. . . . . . . . . 66

3.13.3 Measures of Dispersion, Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3.13.4 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.13.5 Coefficient of Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.13.6 Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.13.7 Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.13.8 Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4 Random Variables and Distribution . . . . . . . . . . . . . . . . . . . . . . . . 77

4.1 Introduction 77

4.2 Random Variable 77

4.2.1 Types of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.2.2 Discrete Probability Distribution Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.2.3 Continuous Probability Distribution Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.2.4 Probability Density Function (PDF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.2.5 Simulated Sampling Distributions in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.2.6 Post Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4.3 Discussion Topic 100

5 Special Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.1 Introduction 101

5.1.1 Discrete Probability Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101


5.1.2 Post-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

5.1.3 Continuous Probability Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.1.4 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

5.2 The Gamma Distribution 136

5.2.1 General Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5.3 Exponential Distribution 140

5.3.1 Mathematical Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

5.3.2 Post-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

6 Estimations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.0.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.0.2 Properties of a Point Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

6.0.3 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.0.4 Confidence Interval For A Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . 153

6.0.5 Confidence Interval For The Difference Between Two Population Proportions . . . 157

6.0.6 Confidence Intervals for Unknown Means in R . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

6.0.7 Implementation in R for Confidence Intervals for Proportions . . . . . . . . . . . . . . . 160

7 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.1 Tests of Hypotheses and Significance 163

7.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.1.2 A Single Population Mean µ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

7.1.3 Tests on the Mean of a Normal Distribution: Variance Unknown . . . . . . . . . . . . 169

7.1.4 Tests on a Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

7.1.5 The Difference Between Two Population Means . . . . . . . . . . . . . . . . . . . . . . . . . . 173



7.2 Questions 179

8 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

8.1 Regression and Correlation Analysis 182

8.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

8.1.2 The Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

8.2 Method of Least Squares 186

8.3 Correlation Analysis 191

8.4 Questions 194


Preface

Even when flawless, the process of synthesis that all ‘data’ goes through before the
communication step entails, by its very nature, reshaping and loss of information. This
book is designed to cater to the needs of those who want to explore the practical
aspects of statistics without delving deeply into the theoretical underpinnings of
the subject. It serves as a handy reference guide for common statistical techniques
frequently employed in fields like business, demography, and health.

The book is divided into eight main sections:

1. An introduction to R
2. Introduction to probability
3. Introduction to statistics
4. Random variables and distributions
5. Special distributions
6. Estimations
7. Hypothesis testing
8. Regression

The initial section comprises a single chapter that introduces the rationale behind
studying statistics and establishes fundamental definitions essential for the course.
Sections two through four are subdivided into chapters, each dedicated to elucidating
a particular concept or technique that complements the overarching theme of the
respective section. To illustrate, the section on descriptive statistics is further divided
into two parts: one that delves into graphical methods for summarizing data and
another that explores numerical data summaries. These sections employ real-world
examples to elucidate the techniques, and readers can reinforce their understanding
through practice problems conveniently embedded within the chapters.
1. An Introduction to R

1.1 Introduction

Having worked through this chapter the student will be able to:

• Download and install R
• Communicate with R
• Perform basic R operations and understand key concepts
• Use assignment, object names, and data types
• Get help

1.2 Downloading and Installing R

The instructions for obtaining R largely depend on the user’s hardware and operating
system. The R Project has written an R Installation and Administration manual with
complete, precise instructions about what to do, together with all sorts of additional
information. The following is just a primer to get a person started.

1.2.1 Installing R

Visit one of the links below to download the latest version of R for your operating
system:

1. Microsoft Windows: http://cran.r-project.org/bin/windows/base/
2. MacOS: http://cran.r-project.org/bin/macosx/
3. Linux: http://cran.r-project.org/bin/linux/

1.2.2 Installing and Loading Add-on Packages

There are base packages (which come with R automatically), and contributed packages
(which must be downloaded for installation). For example, on the version of R being
used for this document the default base packages loaded at startup are

> getOption("defaultPackages")
[1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"

The base packages are maintained by a select group of volunteers, called “R Core”. In
addition to the base packages, there are literally thousands of additional contributed
packages written by individuals all over the world. These are stored worldwide on
mirrors of the Comprehensive R Archive Network, or CRAN for short. Given an
active Internet connection, anybody is free to download and install these packages
and even inspect the source code. To install a package named foo, open up R and
type install.packages("foo"). To install foo and additionally install all of the other
packages on which foo depends, instead type install.packages("foo", dependencies =
TRUE). The general command install.packages() will (on most operating systems)
open a window containing a huge list of available packages; simply choose one or
more to install. No matter how many packages are installed onto the system, each
one must first be loaded for use with the library function. For instance, the foreign
package [18] contains all sorts of functions needed to import data sets into R from other
software such as SPSS, SAS, etc. But none of those functions will be available until the
command library(foreign) is issued. Type library() at the command prompt (described
below) to see a list of all available packages in your library. For complete, precise
information regarding installation of R and add-on packages, see the R Installation
and Administration manual, http://cran.r-project.org/manuals.html.
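The install-then-load workflow just described can be sketched as follows; note that "foo" is the placeholder package name used above, not a real package, while stats is a base package that is always available:

```r
# Contributed packages must be installed once, e.g.:
#   install.packages("foo", dependencies = TRUE)   # "foo" is a placeholder name
# and then loaded in each session before their functions can be used:
library(stats)

# A loaded package appears among the loaded namespaces
print("stats" %in% loadedNamespaces())   # TRUE
```

The same pattern applies to any contributed package: install once, then load with library in every session that uses it.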

1.3 Basic R Operations and Concepts

The R developers have written an introductory document entitled “An Introduction


to R”. There is a sample session included which shows what basic interaction with
R looks like. I recommend that all new users of R read that document, but bear in
mind that there are concepts mentioned which will be unfamiliar to the beginner.
Below are some of the most basic operations that can be done with R. Almost every
book about R begins with a section like the one below; look around to see all sorts of
things that can be done at this most basic level.

1.3.1 Arithmetic

# Addition
a <- 5
b <- 3
result <- a + b
print(result)

# Subtraction
a <- 5
b <- 3
result <- a - b
print(result)

# Multiplication
a <- 5
b <- 3
result <- a * b
print(result)

# Division
a <- 6
b <- 2
result <- a / b
print(result)

Notice the comment character #. Anything typed after a # symbol is ignored by R.

> options(digits = 16)
> 10/3                  # see more digits
[1] 3.333333333333333
> sqrt(2)               # square root
[1] 1.414213562373095
> exp(1)                # Euler's constant, e
[1] 2.718281828459045
> pi
[1] 3.141592653589793
> options(digits = 7)   # back to default

Note that it is possible to set digits up to 22, but setting them over 16 is not
recommended (the extra significant digits are not necessarily reliable). Above notice
the sqrt function for square roots and the exp function for powers of e, Euler’s number.

1.3.2 Assignment, Object names, and Data types

It is often convenient to assign numbers and values to variables (objects) to be used


later. The proper way to assign values to a variable is with the <- operator (with
a space on either side). The = symbol works too, but it is recommended by the
R masters to reserve = for specifying arguments to functions (discussed later). In
this book we will follow their advice and use <- for assignment. Once a variable is
assigned, its value can be printed by simply entering the variable name by itself.

> x <- 7*41/pi   # don't see the calculated value
> x              # take a look
[1] 91.35494

When choosing a variable name you can use letters, numbers, dots (.), or underscore
(_) characters. You cannot use mathematical operators, and a leading dot may not be
followed by a number. Examples of valid names are: x, x1, y.value, and y_hat. (More
precisely, the set of allowable characters in object names depends on one's particular
system and locale; see An Introduction to R for more discussion on this.) Objects can
be of many types, modes, and classes. At this level, it is not necessary to investigate
all of the intricacies of the respective types, but there are some with which you need
to become familiar:

> sqrt(-1)               # isn't defined
[1] NaN
> sqrt(-1+0i)            # is defined
[1] 0+1i
> sqrt(as.complex(-1))   # same thing
[1] 0+1i
> (0+1i)^2               # should be -1
[1] -1+0i
> typeof((0+1i)^2)
[1] "complex"

1.3.3 Vectors

All of this time we have been manipulating vectors of length 1. Now let us move to
vectors with multiple entries.

R Entering data vectors: if you would like to enter the data 74, 31, 95, 61, 76, 34, 23, 54, 96
into R, you may create a data vector with the c function (which is short for
concatenate).

> x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)
> x
[1] 74 31 95 61 76 34 23 54 96
> seq(from = 1, to = 5)
[1] 1 2 3 4 5
> seq(from = 2, by = -0.1, length.out = 4)
[1] 2.0 1.9 1.8 1.7

R Indexing data vectors: sometimes we do not want the whole
vector, but just a piece of it. We can access the intermediate parts
with the [] operator. Observe (with x defined above)

> x[1]
[1] 74
> x[2:4]
[1] 31 95 61
> x[c(1, 3, 4, 8)]
[1] 74 95 61 54
> x[-c(1, 3, 4, 8)]
[1] 31 76 34 23 96
# Notice that we used the minus sign to specify those elements that we do not want
> LETTERS[1:5]
[1] "A" "B" "C" "D" "E"
> letters[-(6:24)]
[1] "a" "b" "c" "d" "e" "y" "z"

1.3.4 Functions and Expressions

A function takes arguments as input and returns an object as output. There


are functions to do all sorts of things. We show some examples below.

> x <- 1:5
> sum(x)
[1] 15
> length(x)
[1] 5
> min(x)
[1] 1
> mean(x)   # sample mean
[1] 3
> sd(x)     # sample standard deviation
[1] 1.581139
> intersect
function (x, y)
{
    y <- as.vector(y)
    unique(y[match(as.vector(x), y, 0L)])
}
<environment: namespace:base>
> methods(rev)
[1] rev.default     rev.dendrogram*

   Non-visible functions are asterisked
> wilcox.test
function (x, ...)
UseMethod("wilcox.test")
<environment: namespace:stats>
> methods(wilcox.test)

1.4 Getting Help

When you are using R, it will not take long before you find yourself needing
help. Fortunately, R has extensive help resources and you should immediately
become familiar with them. Begin by clicking Help on Rgui. The following
options are available.

1. Console: gives useful shortcuts, for instance, Ctrl+L, to clear the R console
screen.
2. FAQ on R: frequently asked questions concerning general R operation.
3. FAQ on R for Windows: frequently asked questions about R, tailored to
the Microsoft Windows operating system.
4. Manuals: technical manuals about all features of the R system including
installation, the complete language definition, and add-on packages.
5. R functions (text). . . : use this if you know the exact name of the
function you want to know more about, for example, mean or plot. Typing
mean in the window is equivalent to typing help("mean") at the command
line, or more simply, ?mean. Note that this method only works if the
function of interest is contained in a package that is already loaded into
the search path with library. The HTML help system can be started from
the command line with the command help.start().

1.5 External Resources


• The R Project for Statistical Computing (http://www.r-project.org/):
go here first.
• The Comprehensive R Archive Network (http://cran.r-project.org/):
this is where R is stored along with thousands of contributed packages.
There are also loads of contributed information (books, tutorials, etc.),
and there are mirrors all over the world with duplicate information.
2. Introduction to Probability

2.1 Learning Objectives

Having worked through this chapter the student will be able to:

• Interpret probabilities and use probabilities of outcomes to calculate
probabilities of events in discrete sample spaces.
• Interpret and calculate conditional probabilities of events.
• Use Bayes’ theorem to calculate conditional probabilities.
• Discuss random variables.
• Use counting techniques in calculating probabilities of events.

2.2 Introduction

R Probabilistic Experiment: A probabilistic experiment is some


occurrence such as the tossing of coins, rolling of dice, or observation
of rainfall on a particular day, where a complex natural background
leads to a chance outcome.

R Trial: Each repetition of an experiment is called a trial. That is,


a trial is a single performance of an experiment.

R Outcome: The possible result of each trial of an experiment is


called an outcome. When an outcome of an experiment has an equal
chance of occurring as the others, the outcomes are said to be
equally likely. For example, the toss of a coin and a die yield the
possible outcomes in the sets {H, T} and {1, 2, 3, 4, 5, 6}, and a play of
a football match yields win (W), loss (L), or draw (D).

R Random variable: A random variable is a function that maps


events defined on a sample space into a set of values. Several
different random variables may be defined in relation to a given
experiment. Thus, in the case of tossing two coins the number
of heads observed is one random variable, the number of tails is
another, and the number of double heads is another. The random
variable “number of heads” associates the number 0 with the event
T T , the number 1 with the events T H and HT , and the number
2 with the event HH. The Figure below illustrates this mapping.
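The head-count mapping just described can be written out directly in R; this is a minimal illustration, not part of the original text:

```r
# Outcomes of tossing two coins, and the random variable X = number of heads
outcomes <- c("TT", "TH", "HT", "HH")

# Count the letter H in each outcome to apply the mapping
X <- sapply(strsplit(outcomes, ""), function(s) sum(s == "H"))
names(X) <- outcomes
print(X)
# TT TH HT HH
#  0  1  1  2
```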

R Sample space: The sample space is the collection of all possible out-

comes of a probability experiment. We use the notation S for the
sample space. Each element or outcome of the experiment is called
a sample point. For example,

Computing in R;
Let’s examine the random experiment involving dropping a Sty-
rofoam cup from a height of four feet to the floor. After the cup
hits the ground, it eventually comes to rest, and there are three
possible outcomes: it could land upside down, right side up, or
on its side. These potential results of the random experiment are
represented as follows

> S <- data.frame(lands = c("down", "up", "side"))
> S
  lands
1  down
2    up
3  side

Here the sample space contains the column lands which stores the
outcomes "down", "up", and "side".
We can also use the package "prob" in computing the sample space
of an experiment in R. Consider the random experiment of tossing
a coin. The outcomes are H and T. We can set up the sample
space quickly with the tosscoin function:

> library(prob)
> tosscoin(1)
  toss1
1     H
2     T

R Tree Diagram: The tree diagram pictorially represents the out-

comes of a random experiment. The probability of an outcome that
is a sequence of trials is represented by a path of the tree. For
example,

1. Consider a couple planning to have three children, assuming


each child born is equally likely to be a boy (B) or girl (G).
2. A soccer team on winning (WT) or losing (LT) a toss can
defend either post A or B. It plays the match and either win
(W), draw (D) or lose (L). We illustrate the experiment on a
diagram as follows

2.2.1 Determination of Probability of an Event

The probability of an event A, denoted P (A), gives the numerical measure of

the likelihood of the occurrence of event A, which is such that 0 ≤ P (A) ≤ 1. If
P (A) = 0, the event A is said to be impossible and if P (A) = 1, A is said
to be certain. If A′ is the complement of the event A, then P (A′) = 1 − P (A),
called the probability that event A will not occur. There are three main schools
of thought in defining and interpreting the probability of an event. These are
the Classical Definition, the Empirical Concept and the Subjective Approach. The
first two are referred to as the Objective Approach.

The Classical Definition

This is based on the assumption that the outcomes of an experiment are equally
likely. For example, if an experiment can lead to n mutually exclusive and
equally likely outcomes, then the probability of the event A is defined by

P (A) = n(A)/n(S) = (Number of successful outcomes)/(Number of possible outcomes)

The classical definition of probability of event A is referred to as an a priori
probability because it is determined before any experiment is performed to
observe the outcomes of event A.

The Empirical Concept

This concept uses the relative frequencies of past occurrences to develop prob-
abilities for the future. The probability of an event A happening in the future is
determined by observing what fraction of the time similar events happened in
the past. That is,

P (A) = (Number of times A occurred in the past)/(Total number of observations)

The relative frequency of the occurrence of the event A used to estimate P (A)
becomes more accurate as the number of trials grows large. The relative frequency
approach of defining P (A) is sometimes called posterior probability because
P (A) is determined only after event A is observed.
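The convergence of the relative frequency can be illustrated with a short simulation; the seed and trial counts here are arbitrary choices for the sketch:

```r
# Estimate P(heads) for a fair coin by relative frequency
set.seed(42)                                    # reproducible illustration
tosses <- sample(c("H", "T"), size = 10000, replace = TRUE)

# Running estimate after each trial: (# heads so far) / (# trials so far)
p_hat <- cumsum(tosses == "H") / seq_along(tosses)
print(p_hat[c(10, 100, 10000)])                 # estimates drift toward 0.5
```

Early estimates fluctuate widely; by ten thousand trials the relative frequency sits close to the true value 0.5, which is the sense in which the empirical estimate "becomes more accurate" with many repetitions.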

The subjective Concept

The subjective concept of probability is based on the degree of belief given

the evidence available. The probability of an event A may therefore be assessed
through experience, intuition, judgment or expertise. Examples include deter-
mining the probability of finding a cure for a disease, or of rain falling today. This
approach to probability has been developed relatively recently and is related to
Bayesian Decision Analysis. Although the subjective view of probability has
enjoyed increased attention over the years, it has not been fully accepted by
statisticians who have traditional orientations.

Examples

R Consider the problem of a couple planning to have three children,


assuming each child born is equally likely to be a boy (B) or a girl
(G).

1. List the possible outcomes of this experiment.
2. What is the probability of the couple having exactly two girls?

Solution:

1. The sample space for this experiment is S = {BBB, BBG, BGB,
BGG, GBB, GBG, GGB, GGG}.
2. Let A be the event of the couple having exactly two girls.
Then,

A = {BGG, GBG, GGB}

P (A) = n(A)/n(S) = 3/8

R Suppose a card is randomly selected from a packet of 52 playing


cards.

1. What is the probability that it is a “Heart”?
2. What is the probability that the card bears the number 5 or
a picture of a queen?

Solution:

Let the sample space be the set S of the 52 playing cards, A =
the Heart cards, B = the cards numbered 5 and Q = the cards with a
picture of a queen. Then, n(S) = 52, n(A) = 13, n(B) = 4 and n(Q) = 4.

1. P (A) = n(A)/n(S) = 13/52 = 1/4

2. P (B or Q) = P (B) + P (Q) = n(B)/n(S) + n(Q)/n(S) = 4/52 + 4/52 = 2/13

R A bag contains 4 red (R), 2 black (B) and 3 white (W) balls, so that
the sample space is S = {4R, 2B, 3W}. If a ball is drawn at random,
the probability that it is red is

P (R) = n(R)/n(S) = 4/9

R A die is tossed twice. List all the outcomes in each of the following
events and compute the probability of each event.

1. The sum of the scores is less than 4
2. Each toss results in the same score
3. The sum of scores on both tosses is a prime number
4. The product of the scores is at least 20

Solution:
The sample space for the experiment is the set of ordered pairs
(m, n), where each of m and n takes the values 1, 2, 3, 4, 5 and 6. Thus,

(a) A = {(1, 1), (1, 2), (2, 1)}

P (A) = 3/36 = 1/12

(b) B = each toss results in the same score
= {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}

P (B) = 6/36 = 1/6

(c) D = sum of scores on both tosses is prime
= {(1, 1), (1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2),
(3, 4), (4, 1), (4, 3), (5, 2), (5, 6), (6, 1), (6, 5)}

P (D) = 15/36 = 5/12

(d) E = product of the scores is at least 20
= {(4, 5), (4, 6), (5, 4), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)}

P (E) = 8/36 = 2/9

R Let's compute an event using R code.

Given a data frame sample/probability space S, we may extract
rows using the [ ] operator:

> S <- tosscoin(2, makespace = TRUE)
> S
  toss1 toss2 probs
1     H     H  0.25
2     T     H  0.25
3     H     T  0.25
4     T     T  0.25
> S[1:3, ]
  toss1 toss2 probs
1     H     H  0.25
2     T     H  0.25
3     H     T  0.25
> S[c(2, 4), ]
  toss1 toss2 probs
2     T     H  0.25
4     T     T  0.25

2.2.2 Probability of Compound Events

Two or more events are combined to form a single event using the set operations,
∩ and ∪. The event

1. (A ∪ B) occurs if either A or B (or both) occur(s).
2. (A ∩ B) occurs if both A and B occur.

Definitions:

1. Mutually Exclusive Events: Two or more events which have no common
outcome(s) (i.e. never occur at the same time) are said to be mutually
exclusive. If A and B are mutually exclusive events of an experiment,
then A ∩ B = ∅ and P (A ∪ B) = P (A) + P (B), since P (A ∩ B) = 0.
2. Independent Events: Two or more events are said to be independent if
the probability of occurrence of one is not influenced by the occurrence
or non-occurrence of the other(s). Mathematically, two events A
and B are independent if and only if P (A ∩ B) = P (A) · P (B).
If P (A ∩ B) ≠ P (A) · P (B), the events are dependent; in general,
P (A ∩ B) = P (A) · P (B|A) by the multiplication rule.
3. Conditional Probability: Let A and B be two events in the sample
space S with P (B) > 0. The probability that an event A occurs given that
event B has already occurred, denoted P (A|B), is called the conditional
probability of A given B, and is defined as

P (A|B) = P (A ∩ B)/P (B),  P (B) > 0

In particular, if S is a finite equiprobable space, then


(a) P (A ∩ B) = n(A ∩ B)/n(S)
(b) P (B) = n(B)/n(S)
(c) P (A|B) = n(A ∩ B)/n(B)

4. Exhaustive Events: Two or more events defined on the same sample
space are said to be exhaustive if their union is equal to the sample space.
Example: If A1 , A2 , A3 ∈ S and A1 ∪ A2 ∪ A3 = S.
5. Partition of the Sample Space: The events A1 , A2 , . . . , An form a
partition of the sample space S if the following hold:

• Ai ̸= ∅ for all i = 1, 2, 3, . . . , n
• Ai ∩ Aj = ∅ for all i ̸= j, with i, j = 1, 2, 3, . . . , n
• A1 ∪ A2 ∪ · · · ∪ An = S

Examples

R In a certain population of women, 40% have had breast cancer,


20% are smokers and 13% are smokers and have had breast cancer.
If a woman is selected at random from the population, what is the
probability that she had breast cancer, smokes or both?
Let A and B be events such that P (A) = 0.6, P (B) = 0.5 and
P (A ∪ B) = 0.8. Find

1. P (A/B)
2. Are A and B independent ?

Solution:

1. Let B be the event of women with breast cancer and W the


event of women who smoke. Then, P (B) = 0.4, P (W ) = 0.2,
P (B ∩ W ) = 0.13

P (B ∪ W ) = P (B) + P (W ) − P (B ∩ W )
           = 0.4 + 0.2 − 0.13 = 0.47

2. Given that, P (A) = 0.6, P (B) = 0.5 and P (A ∪ B) = 0.8


Applying,

P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 0.6 + 0.5 − 0.8 = 0.3

P (A|B) = P (A ∩ B)/P (B), P (B) > 0 (2.2.1)

Hence, P (A|B) = 0.3/0.5 = 0.6


3. A and B are independent if P (A) · P (B) = P (A ∩ B)
Hence, P (A) · P (B) = 0.6 × 0.5 = 0.3 = P (A ∩ B) Thus, A and
B are independent.

R Example on Conditional Probability: Complex components


are assembled in a plant that uses two different assembly lines, A
and A′. Line A uses older equipment than A′, so it is somewhat
slower and less reliable. Suppose on a given day line A has assem-
bled 8 components, of which 2 have been identified as defective (B)
and 6 as non-defective (B′), whereas A′ has produced 1 defective
and 9 non-defective components. This information is summarized
in the accompanying table.

            Condition
            B    B′   Total
Line A      2     6       8
Line A′     1     9      10
Total       3    15      18

Unaware of this information, the sales manager randomly selects 1


of these 18 components for a demonstration. Prior to the
demonstration, P(line A component selected) is

P (A) = N (A)/N = 8/18 ≈ 0.44 (2.2.2)

However, if the chosen component turns out to be defective, then


the event B has occurred, so the component must have been 1 of
the 3 in the B column of the table. Since these 3 components are
equally likely among themselves after B has occurred,

P (A|B) = P (A ∩ B)/P (B) (2.2.3)
        = (2/18)/(3/18) (2.2.4)
        = 2/3 (2.2.5)

R Exercise: Suppose that of all individuals buying a certain digital


camera, 60% includes an optional memory card in their purchase,
40% includes an extra battery, and 30% includes both a card and
battery. Given that the selected individual purchased an extra
battery, what is the probability that an optional card was also
purchased?

2.2.3 Multiplication Rule for P (A ∩ B)

The definition of conditional probability yields the following result, obtained


by multiplying both sides of the conditional probability equation by P (B).

• P (A|B) = P (A ∩ B)/P (B)
• P (A|B) × P (B) = [P (A ∩ B)/P (B)] × P (B)
• P (A|B) · P (B) = P (A ∩ B)

This rule is important because it is often the case that P (A ∩ B) is desired,


whereas both P (B) and P (A/B) can be specified from the problem description.
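Using the assembly-line table from the previous example, the rule can be verified numerically (a small sketch in R; the variable names are ours):

```r
# From the table: P(B) = 3/18 and P(A|B) = 2/3
p_B <- 3 / 18
p_A_given_B <- 2 / 3
p_AB <- p_A_given_B * p_B   # multiplication rule: P(A|B) * P(B)
p_AB                        # equals 2/18, the joint proportion n(A and B)/n(S) in the table
```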

The Law of Total Probability

Let A1 , A2 , A3 , . . . , Ak be mutually exclusive and exhaustive events. Then for


any other event B,
P (B) = P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + · · · + P (B|Ak )P (Ak )

Bayes’ Rule

The power of Bayes’ rule is that in many situations where we want to compute
P (A|B) it turns out that it is difficult to do so directly, yet we might have
direct information about P (B|A). Bayes’ rule enables us to compute P (A|B)
in terms of P (B|A).

P (A|B) = P (A ∩ B)/P (B) = P (B|A)P (A)/P (B)

Bayes Theorem

Let A and Ac constitute a partition of the sample space S with
P (A) > 0 and P (Ac ) > 0. Then for any event B in S such that P (B) > 0,

P (A|B) = P (A ∩ B)/P (B) = P (B|A)P (A) / [P (B|A)P (A) + P (B|Ac )P (Ac )]

The denominator P (B) in the equation is computed using the law of total probability.

R Example: A paint-store chain produces and sells latex and


semigloss paint. Based on long-range sales, the probability that a
customer will purchase latex paint is 0.75. Of those that purchase
latex paint, 60% also purchase rollers, but only 30% of semigloss
paint buyers purchase rollers. A randomly selected buyer purchases
a roller and a can of paint. What is the probability that the paint
is latex?

Solution:

L = the customer purchases latex paint, P (L) = 0.75

S = the customer purchases semigloss paint, P (S) = 0.25

R = the customer purchases a roller

P (R|L) = 0.6

P (R|S) = 0.3

By the law of total probability,

P (R) = P (R|L)P (L) + P (R|S)P (S) = 0.6 · 0.75 + 0.3 · 0.25 = 0.525

P (L|R) = P (L ∩ R)/P (R)
        = P (R|L)P (L)/P (R)
        = (0.6 × 0.75)/(0.6 × 0.75 + 0.3 × 0.25)
        = 0.45/0.525 ≈ 0.857
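The paint-store calculation can be reproduced step by step in R (the variable names below are ours):

```r
p_L <- 0.75; p_S <- 0.25   # prior probabilities: latex vs semigloss
p_R_given_L <- 0.60        # P(roller | latex)
p_R_given_S <- 0.30        # P(roller | semigloss)

# Law of total probability: P(R) = 0.525
p_R <- p_R_given_L * p_L + p_R_given_S * p_S

# Bayes' rule: P(latex | roller), about 0.857
p_L_given_R <- p_R_given_L * p_L / p_R
```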

2.2.4 Axioms of Probability

Given an experiment and a sample space, S , the objective of


probability is to assign to each event A a number P(A), called
the probability of the event A, which will give a precise measure
of the chance that A will occur. To ensure that the probability
assignments will be consistent with our intuitive notions of proba-
bility, all assignments should satisfy the following axioms (basic
properties) of probability.

1. For every event A, 0 ≤ P (A) ≤ 1
2. P (S) = 1
3. If A and B are mutually exclusive events, i.e. A ∩ B = ∅, then
P (A ∪ B) = P (A) + P (B)
4. If A1 , A2 , A3 , A4 , . . . , An is a sequence of n mutually exclusive
events, then P (A1 ∪ A2 ∪ A3 ∪ · · · ∪ An ) = P (A1 ) + P (A2 ) +
P (A3 ) + · · · + P (An )

From the above axioms the following propositions are derived:

• If ∅ is the empty set, then P (∅) = 0
• If Ac is the complement of an event A, then P (Ac ) = 1 − P (A)

2.2.5 Some Rules of Probability

Additive Rule

1. Let A1 , A2 , A3 , . . . , An be events of the sample space, S. Then,


• P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 )
• P (A1 ∪A2 ∪A3 ) = P (A1 )+P (A2 )+P (A3 )−P (A1 ∩A2 )−
P (A1 ∩ A3 ) − P (A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A3 )
If events A1 , A2 , A3 , . . . , An are mutually exclusive, then,
(a) P (A1 ∪ A2 ) = P (A1 ) + P (A2 )
(b) P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 )
(c) P (A1 ∪ A2 ∪ A3 ... ∪ An ) = P (A1 ) + P (A2 ) + P (A3 ) + · · · +
P (An )

Multiplicative Rule

If events A1 , A2 , A3 , . . . , An are events of the same sample space


S, then
(a) P (A1 ∩ A2 ) = P (A1 ) · P (A2 |A1 )
(b) P (A1 ∩ A2 ∩ A3 ) = P (A1 ) · P (A2 |A1 ) · P (A3 |A1 ∩ A2 )

2.2.6 Application of Counting Techniques

The classical definition of probability of an event A, P (A), requires knowledge
of the number of outcomes of A and the total possible outcomes of the
experiment, S. To find these outcomes we could list them explicitly, which
may be impractical if they are too many. Counting techniques may instead be used
to determine the number of outcomes and compute P (A). We shall examine
three basic counting techniques, namely the Multiplication Principle,
Permutation and Combination.

The Multiplication Principle

The Multiplication Principle, also known as the Basic Counting Principle states
that:

1. If an operation can be performed in n1 ways, a second operation can be
performed in n2 ways, and so on up to the kth operation which can be performed
in nk ways, then the combined experiment or operations can be performed
in n1 · n2 · n3 · · · nk ways. For example: A homeowner doing some
remodeling requires the services of both a plumbing contractor and an
electrical contractor. If there are 12 plumbing contractors and 9 electrical
contractors available in the area, in how many ways can the contractors
be chosen? If we denote the plumbers by P1 , . . . , P12 and the electricians
by Q1 , ..., Q9 , then we wish the number of pairs of the form (Pi , Qj ). With
n1 = 12 and n2 = 9, the product rule yields N = (12)(9) = 108 possible
ways of choosing the two types of contractors.
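The contractor count can be reproduced by building all pairs explicitly (an illustrative sketch in R):

```r
# All (plumber, electrician) pairs from 12 plumbers and 9 electricians
pairs <- expand.grid(plumber = paste0("P", 1:12),
                     electrician = paste0("Q", 1:9))
nrow(pairs)   # product rule: 12 * 9 = 108 ways
```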

Examples

R Tossing a coin has two possible outcomes and tossing a die has
six possible outcomes. Then the combined experiment, tossing
the coin and die together will result in (2 ∗ 6 = 12) twelve possible
outcomes provided below:

H1, H2, H3, H4, H5, H6, T 1, T 2, T 3, T 4, T 5, T 6

R Another example is the number of different ways for a man to get


dressed if he has 8 different shirts and 6 different pairs of trousers.

The combination of the 8 different shirts and the 6 different pairs
of trousers results in (8 ∗ 6 = 48) possible ways.

R In a certain examination paper, students are required to answer 5


out of 10 questions from Section A another 3 out of 5 questions
from Section B and 2 out of 5 questions from Section C. In
how many ways can the students answer the examination paper?
Solution:

1. The number of ways of answering the questions in Section A:


10 ∗ 9 ∗ 8 ∗ 7 ∗ 6 = 30 240.
2. The number of ways of answering the questions in Section B:
5 ∗ 4 ∗ 3 = 60.
3. The number of ways of answering the question in Section C:
5 ∗ 4 = 20.
4. Hence the students can answer the questions in the three
sections in: 30 240 ∗ 60 ∗ 20 = 36 288 000 ways

Application of the multiplication principle results in the other two counting


techniques: Permutation and Combination, used to find the number of pos-
sible ways when a fixed number of items are to be picked from a lot without
replacement.

2.2.7 Permutation of Objects

An ordered arrangement of objects is called a permutation. For example, the
possible permutations of the letters A, B and C are as follows:
ABC, ACB, BAC, BCA, CAB, CBA

Definitions:

1. The number of permutations of n distinct objects, taken all together, is:

n! = n(n − 1)(n − 2)(n − 3) · · · (2)(1)   (also written nPn)

2. The number of permutations of n distinct objects taken k at a time is:

nPk = n!/(n − k)!

where k < n
3. The number of permutations of n objects consisting of groups, of which n1
of the first group are alike, n2 of the second group are alike and so on up to
the kth group with nk objects which are alike, is:

n!/(n1 ! n2 ! n3 ! · · · nk !),  where n1 + n2 + n3 + · · · + nk = n

4. Circular Permutations: Permutations that occur when objects are arranged
in a circle are called circular permutations. The number of ways of
arranging n different objects in a circle is given by

n!/n = (n − 1)!

Examples

R The number of permutations of 10 distinct digits taken two at a
time is:

10P2 = 10!/(10 − 2)! = 10 ∗ 9 = 90

R A company codes its customers by giving each customer an eight


character code. The first 3 characters are the letters A, B and C in

any order and the remaining 5 are the digits 1, 2, 3, 4 and 5 also in
any order. If each letter and digit can appear only once then the
number of customers the company can code is obtained as follows:

1. The first 3 letters can be filled in 3!


2. The next 5 digits can be filled in 5!
3. Then the required number 3! ∗ 5! = 720

R What is the number of permutations of the letters of the word
POSSIBILITY, which contains 3 I's and 2 S's?

R The number of arrangements of the letters of the word,


ADDING, if the two letters D and D are together
(treated as a single unit)?

R In how many ways can 4 boys and 2 girls seat them-


selves in a row if
• the 2 girls are to sit next to each other?
• the 2 girls are not to sit next to each other?

Solution:

i If we regard the 2 girls as separate persons ( B1 B2 B3 B4 G1


G2 ), then the number of arrangements of 5 different persons,
taken all at a time = 5!
The 2 girls can exchange places and so the required number
of ways they can seat themselves = 5! × 2! = 240
ii The number of ways the boys can arrange themselves = 4!
The 2 girls can then occupy any 2 of the 5 gaps around and
between the boys, in 5 × 4 ways.
The required number of permutations (with the 2 girls not
sitting next to each other) = 4! × 5 × 4 = 480

Let's compute permutations using the "permute" package in R:

install.packages("permute")
library(permute)

result <- allPerms(3)
print(result)

data <- 1:3
combinations <- combn(data, length(data))
apply(combinations, 2, function(x) permute::allPerms(x))

library(gtools)
result <- permutations(n = 3, r = 3, v = 1:3)
print(result)

Combination of Objects

A Combination is a selection of objects in which the order of selection does not


matter.

2.2.8 Definition:

The number of ways in which k objects can be selected from n distinct objects,
irrespective of their order, is defined by:

nCk = n!/(k!(n − k)!),  where k ≤ n
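R computes this count directly with choose(); for instance, selecting k = 2 objects from n = 5:

```r
choose(5, 2)                                   # 10 selections of 2 from 5 distinct objects
factorial(5) / (factorial(2) * factorial(3))   # same result from n!/(k!(n - k)!)
```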

2.3 Questions

R Find the number of ways in which a committee of 4 can be chosen


from 6 boys and 5 girls if it must

1. consists of 2 boys and 2 girls


2. consists of at least 1 boy and 1 girl.

Solution:

1. The number of ways of choosing 2 boys from 6 and 2 girls


from 5 is as follows:

C(6,2) × C(5,2) = [6!/(4!2!)] × [5!/(3!2!)] = 15 ∗ 10 = 150

2. For the committee to contain at least 1 boy and 1 girl will
involve the following:
1B3G or 2B2G or 3B1G
The required number of ways
= C(6,1)C(5,3) + C(6,2)C(5,2) + C(6,3)C(5,1)
= [6!/(5!1!) × 5!/(2!3!)] + [6!/(4!2!) × 5!/(2!3!)] + [6!/(3!3!) × 5!/(4!1!)]
= (6 ∗ 10) + (15 ∗ 10) + (20 ∗ 5) = 310
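Both committee counts can be checked with choose() in R:

```r
choose(6, 2) * choose(5, 2)      # 2 boys and 2 girls: 150

choose(6, 1) * choose(5, 3) +    # 1 boy, 3 girls:  60
  choose(6, 2) * choose(5, 2) +  # 2 boys, 2 girls: 150
  choose(6, 3) * choose(5, 1)    # 3 boys, 1 girl:  100; total 310
```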

R A box contains 6 red, 3 white and 5 blue balls. If three balls are
drawn at random, one after the other without replacement, find
the probability that

1. all are red


2. 2 are red and 1 is white
3. at least 1 is red
4. 1 of each colour

Solution:

1. Pr(all 3 are red) = (no. of selections of 3 from 6)/(no. of selections of 3 from 14)

   = C(6,3)/C(14,3) = (6 ∗ 5 ∗ 4)/(14 ∗ 13 ∗ 12) = 5/91

2. Pr(2 red and 1 white) = C(6,2) · C(3,1)/C(14,3)

   = (15 ∗ 3)/364 = 45/364

3. Pr(at least 1 red) = 1 − Pr(none is red)

   = 1 − C(8,3)/C(14,3) = 1 − 56/364 = 1 − 2/13 = 11/13

4. Pr(1 of each colour) = C(6,1) · C(3,1) · C(5,1)/C(14,3)

   = (6 ∗ 3 ∗ 5)/364 = 45/182
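All four probabilities follow directly from choose() (a quick check in R):

```r
n_total <- choose(14, 3)                               # ways to draw 3 balls from 14
choose(6, 3) / n_total                                 # 1. all red: 5/91
choose(6, 2) * choose(3, 1) / n_total                  # 2. two red, one white: 45/364
1 - choose(8, 3) / n_total                             # 3. at least one red: 11/13
choose(6, 1) * choose(3, 1) * choose(5, 1) / n_total   # 4. one of each colour: 45/182
```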

R A board consists of 12 men and 8 women. If a committee of 3


members is to be formed, what is the probability that

1. It includes at least one woman?


2. It includes more women than men?

Solution:
The number of ways of forming the committee of 3 from 12 men
and 8 women (12M + 8W = 20) is C(20,3) = 1140.

1. The probability that it includes at least 1 woman

   = Pr(1W 2M) + Pr(2W 1M) + Pr(3W)
   = [C(8,1) · C(12,2) + C(8,2) · C(12,1) + C(8,3)]/C(20,3)
   = [(8 ∗ 66) + (28 ∗ 12) + 56]/1140
   = (528 + 336 + 56)/1140 = 920/1140 = 46/57

2. The probability that it includes more women than men

   = Pr(2W 1M) + Pr(3W)
   = [C(8,2) · C(12,1) + C(8,3)]/C(20,3)
   = [(28 ∗ 12) + 56]/1140 = 392/1140 = 98/285

R If the probability of achieving monthly production targets at


Goldfields Ghana Limited, (A), and Ashanti (Obuasi), (B), are
0.8 and 0.9 respectively, what is P (A ∩ B)?
Solution:
P(A ∩ B) = P(A)P(B|A)
But production at GGL and production at Ashanti are independent,
hence P(B|A) = P(B). Thus, P(A ∩ B) = P(A) × P(B) = 0.8 × 0.9 = 0.72.
Since the mines are independent of each other, their productions are
assumed to be independent of each other. For dependent
events, the first event is considered in determining the
probability of the second; the principle of conditional probability
is required, and the probability of the joint events A and B is
P(A ∩ B) = P(A) × P(B|A).

R The Credit Manager at SSB collects data on 100 of her customers.


Of the 60 men, 40 have credit cards (C). Of the 40 women, 30 have
credit cards (C). Ten of the men with credit cards have balances
(B), whilst 15 of the women have balances (B). The Credit Manager
wants to determine the probability that a customer selected at
random is:

1. A woman with credit card


2. A man with a balance.

Solution:

1. P(W ∩ C) = P(W) × P(C|W)

   P(W) = 40/100;  P(C|W) = 30/40

   ∴ P(W ∩ C) = (40/100) × (30/40) = 30/100 = 0.3

2. P(M ∩ B) = P(M) × P(B|M)

   P(M) = 60/100;  P(B|M) = 10/60

   ∴ P(M ∩ B) = P(M)P(B|M) = (60/100) × (10/60) = 10/100 = 0.10

   OR

   P(M ∩ B) = P(B)P(M|B) = (25/100) × (10/25) = 10/100

R The probability that a mining company will make profit at an


annual production rate of 5000t/yr is 0.7 if the gold price is $660/oz.
If the gold price goes below $660/oz the probability will fall to 0.40.
The current world politics indicates that there is a 50% probability
that the dollar will be strong and gold price will fall below $660/oz.
If:
A: Gold price falls below $660/oz
B: The mine is profitable.

1. What is the probability that both A and B occur?


2. What is the probability that either A or B will occur?

Solution:

P (A) = 0.5, P (B) = 0.7, P (B|A) = 0.4

(a) P (A ∩ B) = P (A)P (B|A) = 0.5 × 0.4 = 0.2

(b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

= 0.5 + 0.7 − 0.2

=1

R A coin is tossed twice. What is the probability that at least one


head occurs?
Solution:
The sample space for this experiment is:

S = {HH, HT, TH, TT}

A = {HH, HT, TH}

∴ P (A) = 1/4 + 1/4 + 1/4 = 3/4

In general, if an experiment can result in any one of N different
equally likely outcomes, and if exactly n of these outcomes
correspond to event A, then the probability of event A is:
P(A) = n/N

R Consider being dealt a 5-card hand holding 2 aces and 3 jacks.
The number of ways of being dealt 2 aces from 4 is: C(4,2) = 4!/(2!2!) = 6
The number of ways of being dealt 3 jacks from 4 is: C(4,3) = 4!/(3!1!) = 4
For each combination of 2 aces there are 4 combinations of
3 jacks. Thus there are n = (6)(4) = 24 hands with 2 aces
and 3 jacks. The total number of 5-card hands, all of which
are equally likely, is:

N = C(52,5) = 52!/(5!47!) = 2 598 960

Hence,

P (C) = n/N = 24/2 598 960 ≈ 9.2 × 10⁻⁶
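The hand probability is a one-liner in R:

```r
n_hands <- choose(52, 5)                      # 2 598 960 possible 5-card hands
p_C <- choose(4, 2) * choose(4, 3) / n_hands  # 2 aces and 3 jacks
p_C                                           # about 9.2e-06
```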

R The probability of a certain lecturer arriving to lectures on time is


P(A) = 0.82. The probability of closing on time is P(D) = 0.83.
The probability that he arrives and departs on time is
P(A ∩ D) = 0.78. Find the probability that he will depart on time
given that he arrives on time, P(D|A).
Solution:
P (D|A) = P (D ∩ A)/P (A) = 0.78/0.82 ≈ 0.95

The notion of conditional probability provides the capacity to
re-evaluate the probability of an event when it is known that
another event has occurred. The probability P(A|B) is an
"updating" of P(A) based on the knowledge that event B has
occurred.

R The Credit Manager at SSB collects data on 100 of her customers.


Of the 60 men, 40 have credit cards (C). Of the 40 women, 30 have
credit cards (C). Ten of the men with credit cards have balances
(B), whilst 15 of the women have balances (B). The Credit Manager
wants to determine the probability that a customer selected at
random is:

1. A woman with credit card


2. A man with a balance.

R The probability that a mining company will make profit at an


annual production rate of 5000t/yr is 0.7 if the gold price is
$660/oz. If the gold price goes below $660/oz the probability will
fall to 0.40. The current world politics indicates that there is a
50% probability that the dollar will be strong and gold price will
fall below $660/oz. If:
A: Gold price falls below $660/oz
B: The mine is profitable.

1. What is the probability that both A and B occur?


2. What is the probability that either A or B will occur?

R A coin is tossed twice. What is the probability that at least one


head occurs?

R If a player picks 5 cards, find the probability of holding 2 aces and


3 jacks.

2.4 Discussion Topics

R Suppose two people each flip a fair coin simultaneously. Will the
results of the two flips usually be independent? Under what sorts
of circumstances might they not be independent? (List as many
such circumstances as you can.)

R Suppose you are able to repeat an experiment many times, and


you wish to check whether or not two events are independent. How
might you go about this?
3. Introduction to Statistics

3.1 Learning objectives

Having worked through this chapter the student will be able to:

• Discuss the reasons for studying statistics as an engineer.


• Identify basic statistical concepts.
• Identify the levels of measurements
• discuss the sampling procedures
• understand data visualization using several graphical devices (Using R
programming)

3.2 Introduction to Statistics

Statistics is a way to get information from data. Statistics is a discipline which


is concerned with:

• summarizing information to aid understanding,


• drawing conclusions from data,
• estimating the present or predicting the future, and

• designing experiments and other data collection.

In making predictions, Statistics uses the concept of probability, which models


chance mathematically and enables calculations of chance in complicated cases.

3.3 Why Statistics?

The field of statistics deals with the collection, presentation, analysis, and use
of data to make decisions, solve problems, and design products and processes.
In simple terms, statistics is the science of data.
Because many aspects of engineering practice involve working with data, ob-
viously knowledge of statistics is just as important to an engineer as are the
other engineering sciences. Specifically, statistical techniques can be powerful
aids in designing new products and systems, improving existing designs, and
designing, developing, and improving production processes.
Statistical analysis provides objective ways of evaluating patterns of events or
patterns in our data by computing the probability of observing such patterns
by chance alone.
Insisting on the use of statistical analyses on which to draw conclusions is an
extension of the argument that objectivity is critical in science. Without the
use of statistics, little can be learnt from most research studies.
Because of the increasing use of statistics in so many areas of our lives, it has
become very desirable to understand and practice statistical thinking. This is
important even if you do not use statistical methods directly.

3.4 Branches of Statistics

3.4.1 Descriptive statistics

This is the branch of statistics that involves the organization, summarization,


and display of data. Two general techniques are used to accomplish this goal.

• Organize the entire set of scores into a table or a graph that allows
researchers (and others) to see the whole set of scores. (summarizing data
graphically)
• Compute one or two summary values (such as the average) that describe
the entire group. (summarizing data numerically).

3.4.2 Inferential statistics

This is the branch of statistics that involves using a sample to draw conclu-
sions about a population. A basic tool in the study of inferential statistics is
probability.

3.5 Variables

In statistics, a variable has two defining characteristics:

• A variable is an attribute that describes a person, place, thing, or idea.


• The value of the variable can "vary" from one entity to another.

For example, a person’s hair color is a potential variable, which could have the
value of "blond" for one person and "brunette" for another.

3.5.1 Qualitative vs. Quantitative Variables

Variables can be classified as qualitative (aka, categorical) or quantitative (aka,


numeric).

1. Qualitative. Qualitative variables take on values that are names or labels.


The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie,
shepherd, and terrier) would be examples of qualitative or categorical
variables.
2. Quantitative. Quantitative variables are numeric. They represent a
measurable quantity. For example, when we speak of the population of a

city, we are talking about the number of people in the city - a measurable
attribute of the city. Therefore, population would be a quantitative
variable. In algebraic equations, quantitative variables are represented by
symbols (e.g., x, y, or z).

3.5.2 Discrete vs. Continuous Variables

Quantitative variables can be further classified as discrete or continuous. If a


variable can take on any value between its minimum value and its maximum
value, it is called a continuous variable; otherwise, it is called a discrete variable.

3.6 Univariate vs. Bivariate Data

Statistical data are often classified according to the number of variables being
studied.

3.6.1 Univariate data

When we conduct a study that looks at only one variable, we say that we
are working with univariate data. Suppose, for example, that we conducted
a survey to estimate the average weight of high school students. Since we are
only working with one variable (weight), we would be working with univariate
data.

3.6.2 Bivariate data

When we conduct a study that examines the relationship between two variables,
we are working with bivariate data. Suppose we conducted a study to see if
there was a relationship between the height and weight of high school students.
Since we are working with two variables (height and weight), we would be
working with bivariate data.

3.6.3 Populations and Samples

The study of statistics revolves around the study of data sets. This lesson
describes two important types of data sets - populations and samples. Along
the way, we introduce simple random sampling, the main method used in this
tutorial to select samples.

3.6.4 Population vs. Sample

The main difference between a population and sample has to do with how
observations are assigned to the data set. A population includes all of the
elements from a set of data. A sample consists of one or more observations from
the population. Depending on the sampling method, a sample can have fewer
observations than the population, the same number of observations, or more
observations. More than one sample can be derived from the same population.

A measurable characteristic of a population, such as a mean or standard


deviation, is called a parameter; but a measurable characteristic of a sample is
called a statistic. We will see in future lessons that the mean of a population is
denoted by the symbol µ; but the mean of a sample is denoted by the symbol
X̄.

3.7 Summarizing data graphically

Selected graphs for qualitative data



• Pie chart
• Bar Chart (Also frequency distribution)

Selected graphs for Numerical data

• Box plot
• Dot plot
• Stem-and-leaf
• Histogram
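In R, the displays for numerical data listed above are one call each; the data vector below is hypothetical:

```r
x <- c(2, 3, 3, 4, 5, 5, 5, 6, 7, 9)  # hypothetical scores
hist(x)        # histogram
boxplot(x)     # box plot
stem(x)        # stem-and-leaf display
stripchart(x)  # one-dimensional dot plot
```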

3.8 Summary Statistics

3.8.1 Measure of Location

These provide an indication of the center of the distribution where most of the
scores tend to cluster. There are three principal measures of central tendency:
Mode, Median, and Mean.

3.8.2 Measure of Spread/ Variability

Variability is the measure of the spread in the data. The three common
variability concepts are: Range, Variance and Standard deviation.
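These summaries are built into R; with a hypothetical data vector:

```r
x <- c(12, 15, 11, 14, 18, 15, 13)  # hypothetical measurements
mean(x); median(x)                  # measures of location (the mode has no base R function)
range(x)                            # smallest and largest values
var(x); sd(x)                       # variance and standard deviation
```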

3.8.3 How to Describe Data Patterns in Statistics

Graphic displays are useful for seeing patterns in data. Patterns in data are
commonly described in terms of: Center, Spread, Shape, Symmetry, Skewness
and Kurtosis

3.9 Unusual Features

Sometimes, statisticians refer to unusual features in a set of data. The two


most common unusual features are gaps and outliers.

3.9.1 Gaps

Gaps refer to areas of a distribution where there are no observations. The


first figure below has a gap; there are no observations in the middle of the
distribution.

3.9.2 Outliers

Sometimes, distributions are characterized by extreme values that differ greatly


from the other observations. These extreme values are called outliers. The
second figure below illustrates a distribution with an outlier. Except for one
lonely observation (the outlier on the extreme right), all of the observations fall
between 0 and 4. As a "rule of thumb", an extreme value is often considered
to be an outlier if it is at least 1.5 interquartile ranges below the first quartile
(Q1), or at least 1.5 interquartile ranges above the third quartile (Q3).

3.9.3 How to Compare Data Sets

Common graphical displays (e.g., dot plots, box plots, stem plots, bar charts)
can be effective tools for comparing data from two or more data sets.

3.9.4 Four Ways to Describe Data Sets

When you compare two or more data sets, focus on four features:

• Center: Graphically, the center of a distribution is the point where about


half of the observations are on either side.
• Spread: The spread of a distribution refers to the variability of the
data. If the observations cover a wide range, the spread is larger. If the
observations are clustered around a single value, the spread is smaller.
• Shape: The shape of a distribution is described by symmetry, skewness,
number of peaks, etc.
• Unusual: features Unusual features refer to gaps (areas of the distribution
where there are no observations) and outliers.

3.10 Sampling Procedures

Statisticians employ different procedures in choosing the observations that will


constitute their random samples of the population. The objective of these
procedures is to select samples that will be representative of the population
from where they originate. These samples, also known as random samples, will
have the property that each sample has the same probability of being drawn
from the population as another sample. There are two types of sampling

3.10.1 Simple random sampling

Simple random sampling is used to make statistical inferences about a popula-


tion. It helps ensure high internal validity: randomization is the best method to
reduce the impact of potential confounding variables. However, simple random
sampling can be challenging to implement in practice. To use this method,
there are some prerequisites:

• You have a complete list of every member of the population.


• You can contact or access each member of the population if they are
selected.
• You have the time and resources to collect data from the necessary sample
size.

How is simple random sampling performed?

• Define the population


• Decide on the sample size
• Randomly select your sample
• Collect data from your sample

Example:
There are C(26,5) = 65 780 different samples of 5 letters that can be obtained
from the 26 letters of the alphabet. If a procedure for selecting a
sample of 5 letters was devised such that each of these 65 780 samples
had an equal probability (equal to 1/65 780) of being selected, then
the sample selected would be a random sample.
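The count, and the sampling itself, can be done in R:

```r
choose(26, 5)       # 65 780 possible samples of 5 letters
set.seed(2024)      # make the draw reproducible (the seed value is arbitrary)
sample(LETTERS, 5)  # one simple random sample of 5 letters, without replacement
```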

3.10.2 Systematic random sampling

Systematic sampling is a method that imitates many of the randomization


benefits of simple random sampling but is slightly easier to conduct.
You can use systematic sampling with a list of the entire population, like you
would in simple random sampling. However, unlike with simple random sampling,
you can also use this method when you cannot access a list of your population
in advance.

Examples

R The (testable) population list alternates between men (on the even
numbers) and women (on the odd numbers). You choose to sample
every tenth individual, which will therefore result in only men
being included in your sample. This would be unrepresentative of
the population.

R You run a department store and are interested in how you can
improve the store experience for your customers. To investigate
this question, you ask an employee to stand by the store entrance
and survey every 20th visitor who leaves, every day for a week.
Although you do not necessarily have a list of all your customers
ahead of time, this method should still provide you with a representative
sample of your customers, since their order of exit is essentially random.

R In inspecting a batch of 1000 pipes for defects, we can choose to
inspect every 10th item in the batch: the 10th, 20th, 30th, and so
on up to the 1000th item. In doing so, we must ensure that every
10th item is not produced by a special process or machine;
otherwise, those items will be unusually alike, and the sample will
not be representative of the entire batch of 1000 pipes.

How is systematic random sampling performed?

• Define and list your population, ensuring that it is not ordered in a cyclical
or periodic way.
• Decide on your sample size and calculate your interval, k, by dividing the
population size by your target sample size.
• Choose every kth member of the population as your sample.
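The three steps above can be sketched in base R; the population here is simply the integers 1 to 1000 (standing in for the batch of pipes), with the starting point drawn at random from the first interval:

```r
population <- 1:1000               # e.g. the batch of 1000 pipes
n <- 100                           # target sample size
k <- length(population) / n        # sampling interval, k = 10

set.seed(2)
start <- sample(1:k, 1)            # random start within the first interval
sample_ids <- seq(start, length(population), by = k)
length(sample_ids)                 # 100 items selected, one per interval
```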

3.10.3 Stratified Sampling

In a stratified sample, researchers divide a population into homogeneous
sub-populations called strata (the plural of stratum) based on specific
characteristics (e.g., race, gender identity, location, etc.). Every member
of the population studied should be in exactly one stratum.
Each stratum is then sampled using another probability sampling method, such
as cluster sampling or simple random sampling, allowing researchers to estimate
statistical measures for each sub-population.
Researchers rely on stratified sampling when a population’s characteristics are
diverse and they want to ensure that every characteristic is properly represented
in the sample. This helps with the generalizability and validity of the study, as
well as avoiding research biases like undercoverage bias.

How do we perform stratified sampling?

• Define your population and subgroup


• Separate the population into strata
• Decide on the sample size for each stratum
• Randomly sample from each stratum

Example:
In determining the distribution of incomes among engineers in the
Bay Area, we can divide the population of engineers into
sub-populations corresponding to each major engineering speciality
(electrical, chemical, mechanical, civil, industrial, etc.). Random
samples can then be selected from each of these sub-populations of
engineers. The logic behind this sampling structure is the
reasonable assumption that the income of an engineer depends, to a
large extent, on his particular speciality.
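A minimal sketch of these steps in base R, using a hypothetical data frame of engineers (the column name `speciality` and the counts are illustrative, not from the text); `split()` forms the strata and a fixed number is drawn from each:

```r
# Hypothetical population: 300 engineers, each with a speciality (the stratum)
set.seed(3)
engineers <- data.frame(
  id = 1:300,
  speciality = sample(c("electrical", "chemical", "mechanical"),
                      300, replace = TRUE)
)

# Separate the population into strata, then sample 10 engineers from each
strata <- split(engineers, engineers$speciality)
stratified_sample <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), 10), ]
}))
nrow(stratified_sample)   # 30: ten engineers from each of the three strata
```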

3.10.4 Cluster sampling (also called block sampling)

This is a sampling procedure that randomly selects clusters of observations
from the population under study, and then chooses all, or a random selection,
of the elements of these clusters as the observations of the sample.
How is cluster sampling performed?

• Define your population and divide it into clusters
• Randomly select a number of clusters
• Include all, or a random sub-sample, of the elements of the selected
clusters in your sample

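A sketch of cluster sampling in base R, under the assumption of a hypothetical population of 400 elements grouped into 20 clusters (e.g. city blocks); whole clusters are drawn at random, and every element of the chosen clusters enters the sample:

```r
set.seed(4)
population <- data.frame(
  id = 1:400,
  cluster = rep(1:20, each = 20)   # 20 clusters of 20 elements each
)

# Randomly select 5 of the 20 clusters
chosen <- sample(unique(population$cluster), 5)

# Keep every element of the selected clusters
cluster_sample <- population[population$cluster %in% chosen, ]
nrow(cluster_sample)               # 5 clusters x 20 elements = 100
```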

3.11 Levels of Measurement (Types of Data)

Variables can be classified on the basis of their level of measurement. The


way we classify variables greatly affects how we can use them in our analysis.
Variables can be

• Ordinal: Ordinal data have a natural ordering, in which each value occupies
a position on a scale. Such data are used for observations like customer
satisfaction, happiness, etc., but we cannot perform arithmetic on them.
Ordinal data are qualitative data whose values have a relative position, and
can be considered "in between" qualitative and quantitative data. Ordinal
data only show a sequence and cannot be used for arithmetic-based statistical
analysis. Compared to nominal data, ordinal data have an ordering that is not
present in nominal data.

• Nominal: Nominal data are used to label variables without any order or
quantitative value. Hair colour can be considered nominal data, as one colour
cannot be ranked against another.
The name "nominal" comes from the Latin "nomen", which means "name". With
nominal data we cannot do any numerical tasks or impose any order for sorting
the data. These data have no meaningful order; their values are simply
distributed into distinct categories.

• Interval: Measurements on a numerical scale in which the value of zero is
arbitrary but the difference between values is important. Temperature in
degrees Celsius is a typical example: 0 °C does not mean "no temperature",
but a difference of 10 °C is meaningful.

• Ratio: Numerical measurements in which zero is a meaningful value and the
difference between values is important. Of all four levels of measurement,
only the ratio scale is based on a numbering system in which zero is
meaningful. Therefore, the arithmetic operations of multiplication and
division also take on a rational interpretation. A ratio scale is used to
measure many types of data found in business and geoscientific analyses.
Variables such as costs, profits, inventory levels and grades are expressed
as ratio measures. The value of zero dollars to measure revenues, for
example, can be logically interpreted to mean that no sales have occurred.
Furthermore, a firm with a 40 percent market share has twice as much of the
market as a firm with a 20 percent market share. Measurements such as weight,
time, and distance are also measured on a ratio scale, since zero is
meaningful and an item that weighs 100 pounds is one-half as heavy as an item
weighing 200 pounds.

You may notice that the four levels of measurement increase in sophistication,
progressing from the crude nominal scale to the more refined ratio scale. Each
measurement offers more information about the variable than the previous one.
This distinction among the various degrees of refinement is important, since
different statistical techniques require different levels of measurement.
While most statistical tests require interval or ratio measurements, other
tests, called nonparametric tests (which will be examined later in this
text), are designed to use nominal or ordinal data.

3.12 Frequency Distribution

Graphical representation makes unwieldy data readily intelligible and brings to


light the salient features of the data at a glance. It makes visual comparison of
data easier. It facilitates the comparison of two frequency distributions.
Several graphical devices are often used to portray shapes of distributions.
The following types of graphs are commonly used in representing frequency
distributions.

• Stem-and-leaf display;


Stem plots have two basic parts: stems and leaves. The final digit of the
data values is taken to be a leaf, and the leading digit(s) is (are) taken to
be stems. We draw a vertical line, and to the left of the line we list the
stems. To the right of the line, we list the leaves beside their corresponding
stem. There will typically be several leaves for each stem, in which case
the leaves accumulate to the right. It is sometimes necessary to round the
data values, especially for larger data sets. Here’s an example of creating
a simple stem-and-leaf plot in R:

# Sample data
data <- c(23, 35, 36, 42, 45, 47, 48, 49, 52, 53, 56, 58)

# Create a stem-and-leaf plot
stem(data)

> library(aplpack)
> stem.leaf(UKDriverDeaths, depth = FALSE)
1 | 2: represents 120
 leaf unit: 10
            n: 192
  10 | 57
  11 | 136678
  12 | 123889
  13 | 0255666888899
  14 | 00001222344444555556667788889
  15 | 0000111112222223444455555566677779
  16 | 01222333444445555555678888889
  17 | 11233344566667799
  18 | 00011235568
  19 | 01234455667799
  20 | 0000113557788899
  21 | 145599
  22 | 013467
  23 | 9
  24 | 7

• Dot plot:
Dot plots are a valuable tool for exploratory data analysis. They offer a
concise and informative representation of data distribution, aiding in the
identification of patterns and outliers. Whether used independently or in
conjunction with other visualization techniques, dot plots contribute to a
richer understanding of datasets. Here's an example of creating a simple
dot plot in R:

# Sample data
categories <- c("Cat A", "Cat B", "Cat C", "Cat D")
values <- c(15, 10, 25, 18)

# Create a dot plot
dotchart(values, labels = categories, main = "Dot Plot Example")



• Box-and-whiskers display (box plot);


The box-and-whisker plot is a powerful graphical tool used in statistics
and data analysis to depict the distribution of a dataset. It provides a
concise summary of the central tendency, spread, and presence of outliers
within the data. This visualization technique is especially useful when
dealing with large datasets or comparing multiple distributions. Here’s
an example of creating a simple box-and-whisker plot in R:

# Sample data
data <- c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75)

# Create a box-and-whisker plot
boxplot(data, main = "Box-and-Whisker Plot Example", ylab = "Values")

• Histogram;
A histogram is a graphical representation of the distribution of a dataset.
Its main purpose is to show the underlying frequency distribution of a set
of continuous or discrete data. Here's an example of creating a simple
histogram in R:

# Sample data
data <- c(22, 28, 30, 35, 40, 42, 45, 50, 55, 60, 65)

# Create a histogram
hist(data, main = "Histogram Example", xlab = "Values")

• Pareto chart;
A Pareto chart is a type of chart that combines both bar and line charts
to highlight the most significant factors in a dataset. It is named after the
Italian economist Vilfredo Pareto, who observed that roughly 80% of the
effects come from 20% of the causes. Pareto charts are particularly useful
for identifying the most important factors or issues within a dataset and
prioritizing them based on their impact. Here's an example of creating a
simple Pareto chart in R:

# Sample data
categories <- c("Category A", "Category B", "Category C", "Category D",
                "Category E")
frequencies <- c(30, 20, 15, 10, 25)

# Calculate cumulative percentage
cumulative_percentage <- cumsum(frequencies) / sum(frequencies) * 100

# Create Pareto chart
par(mar = c(5, 5, 2, 5))  # set margins for better plotting
barplot(frequencies, names.arg = categories, main = "Pareto Chart",
        ylab = "Frequency")
par(new = TRUE)
plot(cumulative_percentage, type = "b", col = "red", pch = 19,
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4, at = seq(0, 100, by = 10), labels = seq(0, 100, by = 10),
     col.axis = "red", las = 1)
mtext("Cumulative Percentage", side = 4, line = 3)

• Pie chart
A pie chart is a circular statistical graphic that is divided into slices to
illustrate numerical proportions. Each slice represents a proportionate part
of the whole, and the total of all slices is equal to 100%. Pie charts are
commonly used to display the distribution of a categorical variable as a
percentage of the whole.
Here's an example of creating a simple pie chart in R using the pie function:

# Sample data
categories <- c("Category A", "Category B", "Category C", "Category D",
                "Category E")
percentages <- c(25, 20, 15, 10, 30)

# Create a pie chart
pie(percentages, labels = categories, main = "Pie Chart Example",
    col = rainbow(length(categories)))

• Bar chart:
A bar chart is a graphical representation of data in which rectangular bars
of equal width are drawn with lengths proportional to the values they
represent. The bars can be oriented horizontally or vertically. Bar charts
are used to visually represent and compare the magnitudes of different
categories or groups. Each bar typically corresponds to a specific category,
and the length of the bar represents the value or frequency of that category.
Here's an example of creating a simple bar chart in R:

# Sample data
categories <- c("Category A", "Category B", "Category C", "Category D",
                "Category E")
values <- c(25, 40, 15, 30, 20)

# Create a bar chart
barplot(values, names.arg = categories, col = "skyblue",
        main = "Bar Chart Example", ylab = "Values")

3.13 Measure of Location and Dispersion

In addition to the histogram, the information in the frequency distribution can


be further summarised by means of just two numbers. The first is the location
of the data, and the various numbers that provide information about this are
known as ‘measures of location’ or ‘measures of central tendency’. ‘Location of
the data’ refers to a value that is typical of all the sample observations.
The second important aspect of the data is the dispersion of the observations.
This implies how the data are scattered (dispersed). This is also called ‘measure
of variation’.

3.13.1 Measures of Location

The Mode:

The mode is defined as the observation in the sample which occurs most
frequently. If there is only one mode the distribution is unimodal;
otherwise it is multimodal.

Simple R code for computing the mode:

# Example dataset
data <- c(1, 2, 2, 3, 4, 4, 4, 5)

# Create a frequency table
freq_table <- table(data)

# Find the mode using the index of the maximum frequency
mode_result <- as.numeric(names(freq_table)[which.max(freq_table)])

# Print the mode result
print(mode_result)

The Arithmetic Mean

It is the most commonly used measure of location. Let the variable x take the
values x1 , x2 , . . . , xn . The arithmetic mean is defined as:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

For large data sets it may be advantageous to classify the data. If these n
values have corresponding frequencies f1 , f2 , . . . , fn , the arithmetic
mean is computed using the formula

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{n} \qquad (3.13.1)$$

which can be rewritten as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i f_i \qquad (3.13.2)$$

where $n = \sum_i f_i$ is the total number of observations.

Simple R code for computing the arithmetic mean:

# Sample data
data <- c(10, 15, 20, 25, 30)

# Calculate the mean
mean_result <- mean(data)

# Print the result
print(mean_result)
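For classified data, equation (3.13.2) can be evaluated directly, using the frequencies as weights (the values and frequencies below are illustrative):

```r
# Values and their frequencies
x <- c(1, 2, 3, 4)
f <- c(2, 5, 2, 1)

# Frequency-weighted arithmetic mean: sum(x * f) / sum(f)
sum(x * f) / sum(f)        # 2.2
weighted.mean(x, w = f)    # same result, 2.2
```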

The Geometric Mean

The geometric mean is the average of a set of products, the calculation of
which is commonly used to determine the performance results of an investment
or portfolio. It is technically defined as "the nth root of the product of n
numbers." For grouped data it may be computed as

$$G = \operatorname{antilog}\left(\frac{1}{N} \sum f_i \log x_i\right) \qquad (3.13.3)$$

where $x_i = x_1, x_2, \ldots, x_n$, $f_i = f_1, f_2, \ldots, f_n$ and
$N = \sum_i f_i$. The geometric mean may be used to show percentage changes
in a series of positive numbers. As such, it has wide application in business
and economics, since we are often interested in determining the percentage
change in sales, gross national product, or any other economic series. The
geometric mean (GM) is found by taking the nth root of the product of n
numbers. Thus:

$$GM = (X_1 X_2 \cdots X_n)^{\frac{1}{n}}$$

GM is most often used to calculate the average growth rate over time of some
given series. R code for computing the geometric mean:

# Sample data
data <- c(2, 4, 8, 16, 32)

# Calculate the geometric mean
geometric_mean_result <- exp(mean(log(data)))

# Print the result
print(geometric_mean_result)

R A farm labourer wishes to determine the average growth rate of
his monthly income based on the figures in the table below. If the
average annual growth rate of his monthly income is less than 10%,
he will resign. Using the GM, should he resign?

Year   Monthly Income   Ratio to Previous Year
1992   50 000           -
1993   55 000           55/50 = 1.10
1994   66 000           66/55 = 1.20
1995   70 000           70/66 = 1.06
1996   78 000           78/70 = 1.11

Solution:

$$GM = (1.10 \times 1.20 \times 1.06 \times 1.11)^{\frac{1}{4}} \approx 1.12 \qquad (3.13.4)$$

The average annual growth rate is about 12%, which exceeds 10%, so he
should not resign.
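The labourer's computation can be checked in R by taking the fourth root of the product of the year-on-year ratios; using the exact ratios (rather than the rounded ones), the product telescopes to 78/50:

```r
# Year-on-year growth ratios of the monthly income
ratios <- c(55/50, 66/55, 70/66, 78/70)

# Geometric mean of the ratios: fourth root of their product
gm <- prod(ratios)^(1 / length(ratios))
gm   # about 1.12, i.e. roughly 12% average annual growth
```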

Harmonic Mean

In statistics, the harmonic mean is used to find the average rate. The
harmonic mean is the reciprocal of the arithmetic mean of the reciprocals:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Note:

• The harmonic mean is important in problems in which variable values are
compared with a constant quantity of another variable, e.g. time, distance
covered within a certain time, etc.

• "Mean" is another word for average and almost always refers to the
arithmetic mean. In certain contexts, however, it could refer to the
geometric mean, harmonic mean, or root mean square.

R code for computing the harmonic mean:

# Sample data
data <- c(2, 4, 8, 16, 32)

# Calculate the harmonic mean
harmonic_mean_result <- 1 / mean(1 / data)

# Print the result
print(harmonic_mean_result)

Example: Compute the Geometric and Harmonic means of the numbers
4 and 9.
Solution:

$$\text{Geometric Mean} = \sqrt{4 \times 9} = 6$$

$$\text{Harmonic Mean} = \frac{2}{\frac{1}{4} + \frac{1}{9}} = 5.54 \qquad (3.13.5)$$

Median

If the sample observations are arranged in order from smallest to largest, the
median is defined as the middle observation if the number of observations is
odd, and as the number halfway between the two middle observations if the
number of observations is even. For grouped data, the general formula for the
median is

$$MD = b_L + \frac{\frac{n}{2} - f_{m-1}}{f_m} \times c \qquad (3.13.6)$$

where
b_L = lower boundary of the median class
n = number of observations
f_m = the number of observations in the median class
f_{m-1} = the cumulative frequency of the class preceding the median class
c = class interval of the median class

Note: Because of the distorting effect of extreme observations on the mean, the
median is often the preferred measure in such situations as salary negotiations.
R code for computing the median:

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the median
median_result <- median(data)

# Print the result
print(median_result)

R Example 6:
If $b_L = 199.5$, $n = 300$, $f_{m-1} = 116$, $f_m = 73$, $c = 50$, then

$$\text{Median} = 199.5 + \frac{150 - 116}{73} \times 50 = 222.79$$
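Equation (3.13.6) and Example 6 can be verified with a short helper function (`grouped_median` is a hypothetical name, not part of base R):

```r
# Median of grouped data: bL + ((n/2 - Fm1) / fm) * c, as in (3.13.6)
grouped_median <- function(bL, n, Fm1, fm, c) {
  bL + ((n / 2 - Fm1) / fm) * c
}

# Values from Example 6
grouped_median(bL = 199.5, n = 300, Fm1 = 116, fm = 73, c = 50)  # about 222.79
```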

3.13.2 Relation between Measure of Location and Types of Frequency Curves.



3.13.3 Measures of Dispersion, Skewness

It should be clear that a measure of central tendency by itself can exhibit


only one of the important characteristics of a distribution and therefore while
studying a distribution it is equally important to know how the variates are
clustered around or away from the point of central tendency. The variation of
the points about the mean is called dispersion. Spread or dispersion can be
classified into three groups.

• Measures of the difference between representative variate values, such as
the range, the interquartile range, or the interdecile range.
• Measures obtained from the deviations of every variate value from some
central value such as the mean deviation from the mean or the mean
deviation from the median or the standard deviation.
• Measures obtained from the variations of all the variates among themselves,
such as mean difference.

Range

It is the difference between the extreme values of the variate i.e. (xn − x1 ) when
the values are arranged in ascending order.

The Interquartile range

It is the difference between the 75th and 25th percentiles, i.e.
$(X_{75\%} - X_{25\%})$. The interdecile range is the difference between the
ninth and first deciles, i.e. $X_{0.9} - X_{0.1}$. The interdecile range
spans eighty percent of the total frequency, while the interquartile range
contains fifty percent. They are mainly used in descriptive statistics
because of the mathematical difficulty of handling them in advanced statistics.
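Both measures are available directly in R: `range()` returns the extreme values, `IQR()` the interquartile range, and `quantile()` the underlying percentiles (the data below are illustrative):

```r
data <- c(14, 18, 21, 25, 30, 35, 41, 50)

# Range: largest value minus smallest value
diff(range(data))               # 50 - 14 = 36

# Interquartile range: 75th percentile minus 25th percentile
IQR(data)
quantile(data, c(0.25, 0.75))
```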

Average Deviation or Mean Deviation

The mean deviation is defined as:

$$MD = \frac{\sum (x_i - \bar{x})}{n}$$

If the deviations are taken with their signs, the negative and positive
deviations cancel, so MD conveys little about the dispersion even when the
individual deviations are numerically large. The mean absolute deviation
(M.A.D.),

$$MAD = \frac{\sum |x_i - \bar{x}|}{n},$$

provides a better and more useful measure of dispersion. R code:

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the mean
mean_value <- mean(data)

# Calculate the mean absolute deviation
mean_deviation <- mean(abs(data - mean_value))

# Print the result
print(mean_deviation)

3.13.4 Variance and Standard Deviation

To overcome the shortcoming of the mean deviation outlined earlier, a better
option is the sum of the squared deviations, $\sum (x_i - \bar{x})^2$, known
simply as the "sum of squares". The mean of this sum of squares is the sample
variance, denoted symbolically as:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

For theoretical reasons, the sum of squares is usually divided by (n - 1)
rather than n, because this gives a better estimate of the population
variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the variance
variance_result <- var(data)

# Print the result
print(variance_result)

# Calculate the standard deviation
std_deviation_result <- sd(data)

# Print the result
print(std_deviation_result)

For n > 35 there is practically no significant difference between the two
definitions. The sample standard deviation S is defined as $S = \sqrt{S^2}$,
and $S^2$ may be computed more conveniently as:

$$s^2 = \frac{1}{n-1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \qquad (3.13.7)$$

For classified data, if the data have k classes,

$$s^2 = \frac{1}{n-1}\left(\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{n}\right)$$

where
x_i = midpoint (class mark) of the ith class
f_i = the number of observations in the ith class
n = the total number of observations, $n = \sum_{i=1}^{k} f_i$
and the sample mean is $\bar{x} = \frac{1}{n} \sum x_i f_i$.

Example: A test in probability and statistics was taken by 51 students at
UMaT. The scores ranged from 50% to 95% and were classified into 8 classes
of width 6 units. Find the variance and standard deviation.

$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{3825}{51} = 75$$

$$S^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{n-1} = \frac{5328}{50} = 106.56$$

$$S = \sqrt{106.56} = 10.32$$

It could also be worked as in Table 3.13.4.

Class Limits   Class Mark (xi)   Frequency (fi)   xi fi   xi - x̄   (xi - x̄)² fi
48-54          51                2                102     -24       1152
54-60          57                3                171     -18        972
60-66          63                5                315     -12        720
66-72          69                8                552      -6        288
72-78          75                10               750       0          0
78-84          81                12               972       6        432
84-90          87                10               870      12       1440
90-96          93                1                 93      18        324
Totals                           51               3825               5328

Table 3.13.4

xi      fi    xi²    xi fi   xi² fi
51      2     2601   102     5202
57      3     3249   171     9747
63      5     3969   315     19845
69      8     4761   552     38088
75      10    5625   750     56250
81      12    6561   972     78732
87      10    7569   870     75690
93      1     8649   93      8649
Totals  51           3825    292203

$$S^2 = \frac{1}{n-1}\left[\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{n}\right]
      = \frac{1}{50}\left[292203 - \frac{3825^2}{51}\right] = 106.56$$

$$S = \sqrt{106.56} = 10.32$$
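The worked example can be reproduced in R from the class marks and frequencies:

```r
# Class marks and frequencies from the test-score example
x <- c(51, 57, 63, 69, 75, 81, 87, 93)
f <- c( 2,  3,  5,  8, 10, 12, 10,  1)
n <- sum(f)                                # 51 students

# Grouped mean and variance via the computational formula
xbar <- sum(x * f) / n                     # 75
S2 <- (sum(x^2 * f) - sum(x * f)^2 / n) / (n - 1)
S2                                         # 106.56
sqrt(S2)                                   # about 10.32
```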

3.13.5 Coefficient of Variation

Whilst the variance is very important in measuring dispersion, it has certain
limitations when comparing distributions that:

• have significantly different means

• are measured in different units

In such situations, it is better to use the coefficient of variation, which
assesses the degree of dispersion of a data set relative to its mean:

$$CV = \frac{s}{\bar{x}} \times 100\%$$
R code:

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the coefficient of variation
coefficient_of_variation_result <- sd(data) / mean(data) * 100

# Print the result
print(coefficient_of_variation_result)

3.13.6 Skewness

Measures describing the symmetry of distributions are called "coefficients of
skewness". One such measure is given by:

$$\alpha_3 = \frac{\sum (x_i - \bar{x})^3}{n S^3}$$

# Install and load the e1071 package
# install.packages("e1071")
library(e1071)

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the skewness
skewness_result <- skewness(data)

# Print the result
print(skewness_result)

3.13.7 Kurtosis

Measures of the degree of peakedness of a distribution are called
"coefficients of kurtosis" or briefly "kurtosis". It is often measured as:

$$\alpha_4 = \frac{\sum (x_i - \bar{x})^4}{n S^4}$$

R code:

# Install and load the e1071 package
# install.packages("e1071")
library(e1071)

# Sample data
data <- c(15, 20, 25, 30, 35)

# Calculate the kurtosis
kurtosis_result <- kurtosis(data)

# Print the result
print(kurtosis_result)

3.13.8 Questions

R A sample of size 40 produces the following arranged data. Note
that the data set has a missing value of x at position x(39) (the
second largest number). This will NOT prevent you from answering
the questions below.

1. Calculate the range, IQR, and median of these data.

2. Given that the mean of these data is 63.50 (exactly) and the
standard deviation is 12.33, what proportion of the data lie
within one standard deviation of the mean?
3. Z decides to delete the smallest observation, 14.1, from these
data. Thus, Z has a data set with n = 39. Calculate the range,
IQR, and median of Z's new data set.
4. Refer to (3). Calculate the mean of Z's new data set.

R Annual precipitation in US cities. The vector precip contains the
average amount of rainfall (in inches) for each of 70 cities in the
United States and Puerto Rico. Let us take a look at the data:

> str(precip)
 Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
 - attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
> precip[1:4]
     Mobile      Juneau     Phoenix Little Rock
       67.0        54.7         7.0        48.5


R Lengths of Major North American Rivers. The U.S. Geological
Survey recorded the lengths (in miles) of several rivers in North
America. They are stored in the vector rivers in the datasets
package (which ships with base R). See ?rivers. Let us take a look
at the data with the str function:

> str(rivers)
 num [1:141] 735 320 325 392 524 ...

R Perform a summary of all variables in the RcmdrTestDrive data set.
You can do this with the command:

Answer:

> summary(RcmdrTestDrive)


4. Random Variables and Distribution

4.1 Introduction

Having worked through this chapter the student will be able to:

• Discuss random variables


• Determine probabilities from probability density functions
• Determine probabilities from cumulative distribution functions, and obtain
cumulative distribution functions from probability density functions and
the reverse

4.2 Random Variable

R Probabilistic Experiment: A probabilistic experiment is some
occurrence, such as the tossing of coins, rolling dice, or the
observation of rainfall on a particular day, where a complex
natural background leads to a chance outcome.

R Random variable: A random variable is a function that maps


events defined on a sample space into a set of values. Several
different random variables may be defined in relation to a given
experiment. Thus, in the case of tossing two coins the number
of heads observed is one random variable, the number of tails is
another, and the number of double heads is another. The random
variable “number of heads” associates the number 0 with the event
T T , the number 1 with the events T H and HT , and the number
2 with the event HH. The Figure below illustrates this mapping.

R Variate: In the discussion of statistical distributions it is
convenient to work in terms of variates. A variate is a
generalization of the idea of a random variable and has similar
probabilistic properties but is defined without reference to a
particular type of probabilistic experiment. A variate is the set
of all random variables that obey a given probabilistic law. The
number of heads and the number of tails observed in independent
coin-tossing experiments are elements of the same variate, since
the probabilistic factors governing the numerical part of their
outcomes are identical. A multivariate is a vector or a set of
elements, each of which is a variate. A matrix variate is a matrix
or two-dimensional array of elements, each of which is a variate.
In general, dependencies may exist between these elements.

R Random number: A random number associated with a given variate is
a number generated at a realization of any random variable that is
an element of that variate.

4.2.1 Types of Random Variables

There are two types of random variables. The random variables in the examples
above are representatives of the two types that we will consider. The
definitions below are not quite precise, but the examples should make the
idea clearer.

R Discrete Random Variable: A random variable X is discrete
if the values it can take are separated by gaps. For example, X
is discrete if it can take only finitely many, or countably many,
values (for example, the number of nuclear decays which take place
in a second in a sample of radioactive material: the number is an
integer, but we cannot easily put an upper limit on it).

R Continuous Random Variable: A random variable is continuous


if there are no gaps between its possible values. In the first example,
the height of a student could in principle be any real number
between certain extreme limits. A random variable whose values
range over an interval of real numbers, or even over all real numbers,
is continuous. In general, quantities such as pressure, height, mass,

weight, density, volume, temperature, and distance are examples
of continuous random variables.

We begin by considering discrete random variables.

4.2.2 Discrete Probability Distributions

Cumulative Distribution Function (CDF)

Given a discrete random variable X, and its probability distribution function
P(X = x) = f(x), we define its cumulative distribution function, CDF, as:

$$F(x) = P(X \le x)$$

where

$$F(x) = P(X \le x) = \sum_{t = x_{min}}^{x} P(X = t).$$
4.2 Random Variable 81

Properties of the CDF

The CDF has the following properties

• F (x) is non-decreasing
• lim_{x→−∞} F (x) = 0; lim_{x→∞} F (x) = 1
• F (x) is continuous from the right, i.e. lim_{h→0⁺} F (x + h) = F (x) for all x

R Question: A discrete random variable X whose probability distri-


bution function is:

P (X = x) = x/15,  x ∈ {1, 2, 3, 4, 5}   (4.2.1)

Find F (3), in other words: find P (X ≤ 3).

Solution
We use the cumulative distribution function and state:

P (X ≤ 3) = Σ_{t=1}^{3} P (X = t)

That is:

P (X ≤ 3) = P (X = 1) + P (X = 2) + P (X = 3)

Using the fact that P (X = x) = x/15, we find:

P (X ≤ 3) = 1/15 + 2/15 + 3/15 = 6/15

Finally we can state P (X ≤ 3) = 6/15 = 0.4.
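The hand calculation above is easy to cross-check numerically. The sketch below, in base R, builds the mass function P (X = x) = x/15 and sums it up to 3:

```r
# P(X = x) = x/15 for x = 1, ..., 5
x <- 1:5
p <- x / 15

# F(3) = P(X <= 3): sum the masses at the points not exceeding 3
F3 <- sum(p[x <= 3])
F3  # 6/15 = 0.4
```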

R Examples:

1. A discrete random variable X has probability distribution
function defined by:

P (X = x) = x²/30

where x = {1, 2, 3, 4}.

Calculate the probability that X ≤ 2.

2. A discrete random variable X has probability distribution
function defined by:

f (x) = x/15

where x = {1, 2, 3, 4, 5}.

Calculate the probability that X < 4.

Probability Mass Function (PMF), Discrete Density Function (DDF), Probability Function

Let X be a discrete random variable, and suppose that the possible values that
it can assume are given by x1, x2, . . . , xn, arranged in some order. Suppose also
that these values are assumed with probabilities given by

P (X = xk) = f (xk),  k = 1, 2, . . .   (4.2.2)

Properties of the PMF

• f (x) ≥ 0
• Σ_x f (x) = 1

Examples

R Questions 1: Show that the following can be probability mass


function and explain your answers.

1. f (x) = 1/5 where x = 0, 1, 2, 3, 4, 5
2. f (x) = x²/30 where x = 0, 1, 2, 3, 4
3. f (x) = (x − 2)/5 where x = 1, 2, 3, 4, 5

R Question 2: Suppose that a pair of fair coins is tossed and let the
random variable X denote the number of heads minus the number
of tails.

1. Obtain the probability distribution for X


2. Construct a graph for this distribution
3. Find P (X = 1), f (−2), P (X ≤ 2), P (−2 ≤ X < 2), P (X < 0)

R Question 3: A shipment of 8 similar microcomputers to a retail


outlet contains 3 that are defective. If a school makes random
purchase of 2 of these computers, find the probability distribution
for the number of defectives

R NOTE: A probability distribution is a display of all possible


outcomes of an experiment along with the probabilities of each
outcome. In fact, it is a list of all possible outcomes of some
experiment and the probability associated with each outcome
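As a sketch of how such checks can be automated, the small base-R helper below (a name chosen for illustration) tests the two PMF conditions for the candidates in Question 1:

```r
# A PMF must be nonnegative everywhere and its masses must sum to 1
check_pmf <- function(p) all(p >= 0) && isTRUE(all.equal(sum(p), 1))

check_pmf(rep(1/5, 6))       # FALSE: six values of 1/5 sum to 6/5
check_pmf((0:4)^2 / 30)      # TRUE: (0 + 1 + 4 + 9 + 16)/30 = 1
check_pmf(((1:5) - 2) / 5)   # FALSE: f(1) = -1/5 is negative
```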

4.2.3 Continuous Probability Distribution Variable

If X is a continuous random variable, the probability that X takes on any one
particular value is generally zero. Therefore, we cannot define a continuous
random variable in the same way as for a discrete random variable. In order to
arrive at a probability distribution for a continuous random variable we note
that the probability that X lies between two different values is meaningful. Thus,
a continuous random variable is the type whose space is not composed of a
countable number of points but takes on values in some interval or a union of
intervals of the real line.

4.2.4 Probability Density Function (PDF)

If the set of all possible values of a random variable X , takes on an uncountable


infinite number of values or values in some interval or a union of intervals of
the real line, it is called a continuous random variable if there exists a function
f , called the probability density function of X, satisfying the following properties:

1. f (x) ≥ 0 (non-negative)
2. ∫_{−∞}^{∞} f (x) dx = 1
3. P (a ≤ X ≤ b) = ∫_a^b f (x) dx, where −∞ ≤ a ≤ b ≤ ∞

Examples

R A fair coin is tossed three times. Let the random variable represent
the number of heads which come up.

(a) find the probability distribution corresponding to the random


variable
(b) Construct a probability graph.

Solution:
(a). The sample space is

S = {HHH, T HH, HT H, HHT, HT T, T HT, T T H, T T T }

The probability of each outcome is 1/8, since all the outcomes are
equally likely sample events. With each sample point we can
associate a number for the random variable X, as shown in the
table below:

Sample point     HHH  THH  HTH  HHT  HTT  THT  TTH  TTT
Number of heads   3    2    2    2    1    1    1    0
p(X = x)         1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8

The table above shows that the random variable can take the values
0, 1, 2, 3. The next task is to compute the probability distribution
p(Xi) of X. Thus,

p(3) = P (X = 3) = P (HHH) = 1/8

p(2) = P (X = 2) = P ({HHT } ∪ {HT H} ∪ {T HH})
     = P ({HHT }) + P ({HT H}) + P ({T HH})
     = 1/8 + 1/8 + 1/8 = 3/8

p(1) = P (X = 1) = P ({T T H} ∪ {T HT } ∪ {HT T })
     = P ({T T H}) + P ({T HT }) + P ({HT T })
     = 1/8 + 1/8 + 1/8 = 3/8

p(0) = P (X = 0) = P (T T T ) = 1/8

Thus, the probability distribution of X is tabulated as:

x_i      0    1    2    3
p(x_i)  1/8  3/8  3/8  1/8

(b). The diagram which follows graphically describes the above


distribution.

R A shipment of 8 similar microcomputers to a retail outlet contains


3 that are defective. If a school makes random purchase of 2 of
these computers, find the probability distribution for the number
of defectives
Solution:
Let X be a random variable whose values x are the possible numbers
of defective computers purchased by the school. Then x can be
any of the numbers 0, 1 and 2. Now

f (0) = P (X = 0) = C(3, 0) C(5, 2) / C(8, 2) = 10/28

f (1) = P (X = 1) = C(3, 1) C(5, 1) / C(8, 2) = 15/28

f (2) = P (X = 2) = C(3, 2) C(5, 0) / C(8, 2) = 3/28

Thus, the probability distribution of X is:

x_i      0      1      2
f (x_i)  10/28  15/28  3/28

As a check, we can

see whether the three probabilities we found will sum up to one


because as usual, the total probability associated with a number
of mutually exclusive, exhaustive events must be one.
Note: A probability distribution is a display of all possible outcomes
of an experiment along with the probabilities of each outcome. In
fact, it is a list of all possible outcomes of some experiment and
the probability associated with each outcome.
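This is a hypergeometric probability, so the same distribution can be obtained directly with base R's dhyper, where m counts the defective machines, n the good ones, and k the sample size:

```r
# 3 defective (m) and 5 good (n) machines in the shipment; sample k = 2
probs <- dhyper(0:2, m = 3, n = 5, k = 2)
probs       # 10/28, 15/28, 3/28
sum(probs)  # the masses sum to 1, as the check above requires
```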

R Verify that the following probability distribution functions are


probability mass functions.

(i). p(x) = (1/21)(2x + 3) for x = 1, 2, 3;  p(x) = 0 otherwise.

(ii). p(x) = k(x − 1) for x = 3, 4, 5;  p(x) = 0 otherwise.

Solution:
(i). For the probability distribution function to be a probability
mass function the following must be satisfied:

p(x) > 0 for all x, and

Σ_{x=1}^{3} p(x) = (1/21) Σ_{x=1}^{3} (2x + 3)
               = (1/21)[{2(1) + 3} + {2(2) + 3} + {2(3) + 3}]
               = (1/21)[5 + 7 + 9]
               = 1

(ii). The value of k is determined by assuming that p(x) is a probability
function. Thus,

Σ_{x=3}^{5} p(x) = Σ_{x=3}^{5} k(x − 1) = 1
⟹ k[(3 − 1) + (4 − 1) + (5 − 1)] = 1
⟹ k[2 + 3 + 4] = 1
⟹ 9k = 1
⟹ k = 1/9

R Suppose that the error in the reaction temperature, in °C, for a
controlled laboratory experiment is a continuous random variable
X having the probability density function:

f (x) = x²/3 if −1 < x < 2;  f (x) = 0 elsewhere.   (4.2.3)

Verify,

• If f (x) is a PDF
• P (0 < X ≤ 1)

Solution:

(a).

∫_{−∞}^{∞} f (x) dx = ∫_{−1}^{2} (x²/3) dx = [x³/9]_{−1}^{2} = 8/9 + 1/9 = 1

(b).

P (0 < X ≤ 1) = ∫_0^1 (x²/3) dx = [x³/9]_0^1 = 1/9 − 0 = 1/9

R Find the constant C such that the function below is a probability


density function:

f (x) = cx² if 0 < x < 3;  f (x) = 0 elsewhere.   (4.2.4)

Compute P (1 < x < 2)

Solution:

(a).

∫_{−∞}^{∞} f (x) dx = ∫_0^3 cx² dx = [cx³/3]_0^3 = 9c

But since ∫_{−∞}^{∞} f (x) dx = 1, we have 9c = 1, ∴ c = 1/9

(b).

P (1 < X ≤ 2) = ∫_1^2 (x²/9) dx = [x³/27]_1^2 = 8/27 − 1/27 = 7/27

R A machine produced copper wire, and occasionally there is a flaw


at some point along the wire. The length of wire (in meters)
produced between successive flaws is a continuous random variable
X with p.d.f of the form

f (x) = c(1 + x)⁻³ if x > 0;  f (x) = 0 if x ≤ 0,

where c is a constant.

∫_{−∞}^{∞} f (x) dx = 1
∫_0^∞ c(1 + x)⁻³ dx = 1

Let u = (1 + x), so du = dx; as x runs from 0 to ∞, u runs from 1 to ∞.
Applying the power rule for integrals,

c ∫_1^∞ u⁻³ du = 1
c [u⁻²/(−2)]_1^∞ = 1
c (1/2) = 1
c = 2

For each of the following functions, find the constant c so that f(x)
is a p.d.f of a random variable X.

For each of the following functions, find the constant c so that f(x)
is a p.d.f of a random variable X.

(i) f (x) = 4x^c, 0 ≤ x ≤ 1

(ii) f (x) = c√x, 0 ≤ x ≤ 4

(iii) f (x) = c/x^(3/4), 0 < x < 1

Solution:
(i)

f (x) = 4x^c
∫_0^1 4x^c dx = 1
4 [x^(c+1)/(c + 1)]_0^1 = 1
4/(c + 1) = 1
c + 1 = 4
c = 3

(ii)

f (x) = c√x
∫_0^4 c x^(1/2) dx = 1
c [x^(3/2)/(3/2)]_0^4 = 1
c (4^(3/2)/(3/2)) = 1
c (16/3) = 1
c = 3/16

(iii)

f (x) = c/x^(3/4)
∫_0^1 c x^(−3/4) dx = 1
c [x^(1/4)/(1/4)]_0^1 = 1
4c = 1
c = 1/4
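Each constant can be verified numerically: plugging it back in, the density should integrate to 1 over its support. A sketch with base R's integrate (which handles the integrable singularity at 0 in case (iii)):

```r
# (i) c = 3: f(x) = 4x^3 on [0, 1]
integrate(function(x) 4 * x^3, 0, 1)$value           # 1

# (ii) c = 3/16: f(x) = (3/16) * sqrt(x) on [0, 4]
integrate(function(x) (3/16) * sqrt(x), 0, 4)$value  # 1

# (iii) c = 1/4: f(x) = (1/4) * x^(-3/4) on (0, 1)
integrate(function(x) (1/4) * x^(-3/4), 0, 1)$value  # 1
```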

R For each of the following functions, find the constant c so that f (x) is a PDF
of a random variable X.

1. f (x) = 4x^c, 0 ≤ x ≤ 1

2. f (x) = c√x, 0 ≤ x ≤ 4

R The probability density function of a continuous random variable X is given by

Find the cumulative distribution function and sketch its graph.

R Consider flipping two fair coins. Let X = 1 if the first coin is heads, and X = 0 if the
first coin is tails. Let Y = 1 if the two coins show the same thing (i.e., both heads
or both tails), with Y = 0 otherwise. Let Z = XY and W = X + Y.

1. What is the probability function of Z?


2. What is the probability function of W?

4.2.5 Simulated Sampling Distributions in R

Certain comparisons hold significance, yet the description of their sampling distribution
isn’t as straightforward or neat in an analytical sense. So, what’s the next step
in such scenarios? Interestingly, having precise analytical details of the sampling
distribution isn’t always necessary. In many cases, using a simulated distribution
as an approximation suffices. This part will guide you through that process. It’s
worth highlighting that R programming excels in computing these simulated sampling
distributions, showing a distinct advantage over other statistical software like SPSS
or SAS.

The Interquartile Range

> iqrs <- replicate(100, IQR(rnorm(100)))

# We can look at the mean of the simulated values
> mean(iqrs)  # close to 1.349, the IQR of a standard normal
[1] 1.322562
# and we can see the standard deviation
> sd(iqrs)
[1] 0.1694132
# Now let's take a look at a plot of the simulated values

The Median Absolute Deviation

> mads <- replicate(100, mad(rnorm(100)))

# We can look at the mean of the simulated values
> mean(mads)  # close to 1, since R's mad is scaled to estimate the standard deviation
[1] 0.9833985
# and we can see the standard deviation
> sd(mads)
[1] 0.1139002

4.2.6 Post Test

1. Define the following terms;

i. discrete random variable
ii. continuous random variable

2. Let f (x) = x/15, x = 1, 2, 3, 4, 5, zero elsewhere, be the p.d.f. of X. Find:

i. P r[X = 1 or X = 2]
ii. P r[1 ≤ X ≤ 3]
iii. P r[1/2 ≤ X ≤ 5/2]

3. Find the constant c so that each of the following is a p.d.f. of a random variable X:

i. f (x) = cx³, 0 < x < 1
ii. f (x) = c/x⁴, 0 < x < ∞

R A fair coin is tossed three times. Let X represent the number of heads
which come up

(a). find the cumulative distribution function and


(b). sketch the graph

Solution:

(a). To obtain the cumulative distribution function, we need the following


steps:
Step 1:
Find the probability distribution of the random variable X. The probability
distribution for this example was found in an earlier example; we reproduce
the results here for convenience.

x_i      0    1    2    3
p(x_i)  1/8  3/8  3/8  1/8

Step 2:
Find the cumulative distribution function:

F (0) = p(X ≤ 0) = p(X < 0) + p(X = 0)
      = 0 + p(0)
      = 1/8

F (1) = p(X ≤ 1) = p(X < 0) + p(X = 0) + p(X = 1)
      = 0 + p(0) + p(1)
      = 0 + 1/8 + 3/8
      = 4/8

F (2) = p(X ≤ 2) = p(X < 0) + p(X = 0) + p(X = 1) + p(X = 2)
      = 0 + p(0) + p(1) + p(2)
      = 0 + 1/8 + 3/8 + 3/8
      = 7/8

F (3) = p(X ≤ 3) = p(X < 0) + p(X = 0) + p(X = 1) + p(X = 2) + p(X = 3)
      = 0 + p(0) + p(1) + p(2) + p(3)
      = 0 + 1/8 + 3/8 + 3/8 + 1/8
      = 1

Hence the cumulative distribution function is

F (x) = 0,    x < 0
      = 1/8,  0 ≤ x < 1
      = 4/8,  1 ≤ x < 2
      = 7/8,  2 ≤ x < 3
      = 1,    x ≥ 3

(b). The graph of F (x) is shown in the figure below:

Finding the probability distribution from the cumulative distribution
function is a straightforward matter. If F (x) is the cumulative distri-
bution function of a discrete random variable X, we find the points
at which the cumulative distribution function jumps, and the jump sizes.
The probability function has masses exactly at those jump points, with
the probability masses being equal in magnitude to the respective jump
sizes. It is for this reason that it is called the probability mass function.
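The jump construction translates directly into base R: cumsum turns the mass function into the CDF, stepfun builds the right-continuous step function, and diff recovers the masses from the jump sizes. A sketch for the three-coin example:

```r
p <- c(1, 3, 3, 1) / 8    # P(X = 0), ..., P(X = 3)
Fx <- cumsum(p)           # 1/8, 4/8, 7/8, 1

# Right-continuous step function with jumps at 0, 1, 2, 3
F <- stepfun(0:3, c(0, Fx))
F(2.5)                    # 7/8

# Jump sizes give back the probability masses
diff(c(0, Fx))            # 1/8, 3/8, 3/8, 1/8
```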

R Suppose you are given the cumulative distribution function below:


F (x) = 0,    x < 0
      = 1/8,  0 ≤ x < 1
      = 4/8,  1 ≤ x < 2
      = 7/8,  2 ≤ x < 3
      = 1,    x ≥ 3

Find its probability distribution.


Solution:
It would be noted from the graph of the cumulative distribution function that
the magnitudes or the heights (that is, p(x_i)) of the jumps (steps) at 0, 1, 2, 3 are
1/8, 3/8, 3/8, 1/8 respectively, hence

f (x) = 1/8,  x = 0
      = 3/8,  x = 1
      = 3/8,  x = 2
      = 1/8,  x = 3

Note:
We can obtain this result without the graph by finding the difference in the
adjacent values of F (x).
The probability density function of a continuous random variable X is given by


f (x) = 0,    x < 0
      = x/2,  0 ≤ x ≤ 2
      = 0,    x > 2

Find the cumulative distribution function and sketch its graph.


Solution:
If x < 0, then

F (x) = ∫_{−∞}^{x} f (t) dt = 0

If 0 ≤ x ≤ 2, then

F (x) = ∫_{−∞}^{0} f (t) dt + ∫_0^x (t/2) dt = 0 + [t²/4]_0^x = x²/4

If x > 2, then

F (x) = ∫_{−∞}^{0} f (t) dt + ∫_0^2 f (t) dt + ∫_2^x f (t) dt
      = 0 + ∫_0^2 (t/2) dt + 0
      = [t²/4]_0^2 = 1

Hence

F (x) = 0,     x < 0
      = x²/4,  0 ≤ x ≤ 2
      = 1,     x > 2

Figure 2 Cumulative Distribution Function (Continuous Case)
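As a numerical cross-check, F (x) = x²/4 on [0, 2] can be reproduced with base R's integrate:

```r
f <- function(t) t / 2

# CDF: integral of f from 0 to x, clamped outside the support
F <- function(x) {
  if (x < 0) return(0)
  if (x > 2) return(1)
  integrate(f, 0, x)$value
}

F(1)   # 1/4, matching x^2/4 at x = 1
F(2)   # 1
```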

4.3 Discussion Topic

R According to Professor Doob, the two of them had an argument about whether
random variables should be called “random variables” or “chance variables.”
They decided by flipping a coin — and “random variables” won. (Source:
Statistical Science 12 (1997), No. 4, page 307.) Which name do you think
would have been a better choice?
5. Special Distribution

5.1 Introduction

Having worked through this chapter the student will be able to:

• Understand the assumptions for each of the discrete and continuous probability
distributions presented.
• Select an appropriate discrete and continuous probability distribution to calculate
probabilities in specific applications.
• Calculate probabilities, determine means and variances for each of the discrete
and continuous probability distributions presented.

5.1.1 Discrete Probability Distribution

Bernoulli Distribution:

A single trial of an experiment may result in one of the two mutually exclusive
outcomes such as defective and non-defective, dead or alive, yes or no, male or female,
etc. Such a trial is called a Bernoulli trial, and a sequence of these trials forms a
Bernoulli process, satisfying the following conditions:

• Each trial results in one of the two mutually exclusive outcomes, success and
failure.
• The probability of a success, p remains constant, from trial to trial. The
probability of failure is denoted by q = 1 − p
• The trials are independent. That is, the outcome of any particular trial is not
affected by the outcome of any other trial.

Definition

A random variable is said to have a Bernoulli distribution if it assumes the values 0
and 1 for the two outcomes. The probability distribution with success probability p
in the trial is defined by

P (x) = p^x (1 − p)^(1−x),  x = 0 or 1   (5.1.1)

and 0 < p < 1, where the mean and variance of the distribution are as follows:
• µ = E(X) = p;
• σ² = V ar(X) = p(1 − p)
An important distribution arising from counting the number of successes in a fixed
number of independent Bernoulli trials is the Binomial distribution.

Implementation in R

In R, you can work with the Bernoulli distribution mainly through the rbinom function,
which is part of the base R distribution and is typically used for generating random
numbers from a binomial distribution. However, since the Bernoulli distribution is a
special case of the binomial distribution (where the number of trials is 1), you can
use rbinom for this purpose.

# Number of random values you want to generate
n <- 10

# Probability of success
p <- 0.5

# Generate random values from the Bernoulli distribution
bernoulli_samples <- rbinom(n, size = 1, prob = p)

# Print the generated values
print(bernoulli_samples)

R Example 35: An urn contains 5 red and 15 green balls. Draw one ball at
random from the urn. Let X=1 if the ball drawn is red, and X=0 if a green
ball is drawn. Obtain;

• the p.d.f. of X,
• mean of X and
• variance of X.

Solution: The p.d.f of a Bernoulli distribution is

f (x) = p^x q^(1−x),  x = 0, 1

where p = 5/20 and q = 15/20, so that

f (x) = (5/20)^x (15/20)^(1−x),  x = 0, 1

Mean of X:

E(X) = Σ_{x=0}^{1} x (5/20)^x (15/20)^(1−x)
     = (0)(5/20)^0 (15/20)^1 + (1)(5/20)^1 (15/20)^0
     = 5/20

Variance of X:

V (X) = Σ_x x² f (x) − [E(X)]²
      = (1)²(5/20)^1 (15/20)^0 − (5/20)²
      = 5/20 − (5/20)² = 3/16

The Binomial Distribution

The binomial distribution is a discrete probability distribution, where the experiment


is repeated n times under identical conditions and each of the n trials is independent of
each other which results in one of the two outcomes. Thus, in the event of independent
trials (often called Bernoulli trials) let p be the probability that an event will happen
(success) and q = 1 − p the probability that the event will fail in any single trial. Such
experiments are called binomial experiments and the probability that the event will
happen exactly x times in n trials is given by the probability function:

Implementation in R

In R, you can work with the binomial distribution using the functions dbinom, pbinom,
qbinom and rbinom. For example, dbinom gives the probability of observing exactly k
successes in n trials.

# Define the parameters for the binomial distribution
n <- 10    # Number of trials
p <- 0.5   # Probability of success in each trial

# dbinom: probability of getting exactly k successes in n trials
# For example, probability of getting exactly 5 successes out of 10 trials
k <- 5
prob_exactly_5 <- dbinom(k, size = n, prob = p)
print(paste("Probability of exactly", k, "successes:", prob_exactly_5))

# pbinom: cumulative probability of getting k or fewer successes
# For example, cumulative probability of getting 5 or fewer successes
cum_prob_up_to_5 <- pbinom(k, size = n, prob = p)
print(paste("Cumulative probability of up to", k, "successes:", cum_prob_up_to_5))

# qbinom: quantile function for a given cumulative probability
# For example, the number of successes associated with a 50% cumulative probability
quantile_50_percent <- qbinom(0.5, size = n, prob = p)
print(paste("Quantile at 50% cumulative probability:", quantile_50_percent))

# rbinom: generate random samples from a binomial distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rbinom(num_samples, size = n, prob = p)
print(paste("15 random samples from binomial distribution:", toString(random_samples)))

 
f (x) = Pr(X = x) = C(n, x) p^x q^(n−x) = [n!/(x!(n − x)!)] p^x q^(n−x)

where the random variable X denotes the number of successes in n trials and
x = 0, 1, 2, . . . , n.
The shape of the distribution depends on the two parameters n and p.

1. when p < 0.5 and n is small, the distribution will be skewed to the right.
2. when p > 0.5 and n is small, the distribution will be skewed to the left
3. when p = 0.5 the distribution will be symmetric.
4. In all cases, as n gets larger the distribution gets closer to being a symmetric,
bell-shaped distribution.

Properties

1. Mean = np
2. Variance = npq
3. Standard Deviation = √(npq)

R If 20% of the bolts produced by a machine are bad, determine the probability
that out of 4 bolts chosen at random,

• one is defective
• none is defective
• at most 2 bolts will be defective.

Solution:

n = 4, p = 0.2, q = 0.8

P [X = 1] = f (1) = C(4, 1)(0.2)¹(0.8)³ = 0.4096

P [X = 0] = f (0) = C(4, 0)(0.2)⁰(0.8)⁴ = 0.4096

P [X ≤ 2] = P [X = 0] + P [X = 1] + P [X = 2]
          = 0.4096 + 0.4096 + 0.1536 = 0.9728

or, equivalently,

P [X ≤ 2] = 1 − P [X ≥ 3] = 1 − P (X = 3) − P (X = 4)
          = 1 − C(4, 3)(0.2)³(0.8)¹ − C(4, 4)(0.2)⁴(0.8)⁰
          = 1 − 0.0256 − 0.0016
          = 0.9728
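The three answers can be reproduced in one line each with dbinom and pbinom:

```r
# X ~ Binomial(n = 4, p = 0.2)
dbinom(1, size = 4, prob = 0.2)  # one defective: 0.4096
dbinom(0, size = 4, prob = 0.2)  # none defective: 0.4096
pbinom(2, size = 4, prob = 0.2)  # at most two defective: 0.9728
```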

R Suppose that it is known that 30% of a certain population is immune to some


disease. If a random sample of 10 is selected from this population. What is the
probability that it will contain exactly 4 immune persons?
Solution:

n = 10, p = 0.3, x = 4

f (4) = C(10, 4)(0.3)⁴(0.7)⁶ = 0.2001

In a certain population 10% of the population is color-blind. If a random sample
of 25 people is drawn from this population (use table), find:

(i) P (X ≥ 5) = 1 − P (X < 5) = 0.0980

(ii) P (X ≤ 4) = 0.902, or 1 − P (X ≥ 5) = 1 − 0.0980 = 0.902

(iii) P (6 ≤ X ≤ 10) = p(6) + p(7) + p(8) + · · · + p(10) = 0.0334

R From the experiment “toss four coins and count the number of tails” what is
the variance of X?

R Roll a fair 6-sided die 20 times and count the number of times that 6 shows
up. What is the standard deviation of your random variable?

n = 20, p = 1/6, q = 5/6

V (X) = npq = 20 × (1/6) × (5/6) = 100/36

σ = √V (X) = √(100/36) = 10/6

R The following data are the number of seeds germinating out of 10 on damp
filter paper for 80 sets of seeds. Fit a binomial distribution to these data.

x 0 1 2 3 4 5 6 7 8 9 10 Total
f 6 20 28 12 8 6 0 0 0 0 0 80

Solution:
Here n = 10, N = 80 and Σ fᵢ = 80.

Arithmetic mean = Σ fᵢxᵢ / Σ fᵢ = 174/80

np = 174/80, so p = 174/(80n) = 174/800 = 0.2175, and q = 1 − p = 0.7825

Hence the binomial distribution to be fitted is b(x; 10, 0.2175). The expected
frequencies are, approximately,

x  0     1      2      3      4     5     6     7    8     9     10    Total
f  6.89  19.14  23.94  17.74  8.63  2.88  0.67  0.1  0.01  0.00  0.00  80
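The fit can be reproduced in base R; the expected frequencies are N · b(x; 10, p̂), with p̂ estimated from the sample mean:

```r
x <- 0:10
f <- c(6, 20, 28, 12, 8, 6, 0, 0, 0, 0, 0)   # observed frequencies, N = 80

p_hat <- sum(f * x) / (sum(f) * 10)          # 174/800 = 0.2175
expected <- sum(f) * dbinom(x, size = 10, prob = p_hat)
round(expected, 2)                           # compare with the fitted row above
```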

Negative Binomial Distribution (Pascal’s Distribution)

Let us consider an experiment in which the properties are the same as those listed
for a binomial experiment with the exception that the trials will be repeated until a
fixed number of successes occur. Therefore, instead of finding the probability of x
successes in n trials, where n is fixed, we are now interested in the probability that
the. . . kth success occurs on the xth trial. Experiments of this kind are called ‘negative
binomial experiments’. (Walpole and Myres, 1993). The number X of trials to produce
k successes in a negative binomial experiment is called a “negative binomial random
variable” and its probability distribution is called the “negative binomial distribution”.
Since its probabilities depend on the number of successes desired and the probability
of success on a given trial, we shall denote them by the symbol b∗ (x; k, p). For the
general formula b∗(x; k, p), consider the probability of a success on the xth trial preceded
by k − 1 successes and x − k failures in some specified order. The probability for the
specified order ending in success is p^(k−1) q^(x−k) p = p^k q^(x−k). The total number of sample
points in the experiment ending in success, after the occurrence of k − 1 successes
and x − k failures in any order, is equal to the number of partitions of x − 1 trials
into two groups with k − 1 successes corresponding to one group and x − k failures
corresponding to the other group. This number is given by the term C(x − 1, k − 1), each
mutually exclusive and occurring with equal probability p^k q^(x−k). We obtain the general
formula by multiplying p^k q^(x−k) by C(x − 1, k − 1). In other words:

b∗(x; k, p) = C(x − 1, k − 1) p^(k−1) q^(x−k) p = C(x − 1, k − 1) p^k q^(x−k),  x = k, k + 1, . . .

p = probability of success
q = (1 − p) = probability of failure
x = total number of trials on which the kth success occurs.

Areas of application of negative binomial distribution include many biological


situations such as death of insects, number of insect bites per fruit (e.g. mango).

Implementation in R

Here’s an R code snippet demonstrating various functions related to the negative
binomial distribution. Note that dnbinom, pnbinom, qnbinom and rnbinom are part
of base R’s stats package, so no additional package needs to be loaded:

# Define the parameters for the negative binomial distribution
size <- 5    # Number of successes
prob <- 0.5  # Probability of success in each trial

# dnbinom: probability of getting a specific number of failures
# For example, probability of exactly 3 failures before 5 successes
failures <- 3
prob_3_failures <- dnbinom(failures, size = size, prob = prob)
print(paste("Probability of exactly", failures, "failures:", prob_3_failures))

# pnbinom: cumulative probability of a certain number of failures or fewer
# For example, cumulative probability of getting 3 or fewer failures
cum_prob_up_to_3_failures <- pnbinom(failures, size = size, prob = prob)
print(paste("Cumulative probability of up to", failures, "failures:", cum_prob_up_to_3_failures))

# qnbinom: quantile function for a given cumulative probability
# For example, the number of failures associated with a 50% cumulative probability
quantile_50_percent <- qnbinom(0.5, size = size, prob = prob)
print(paste("Quantile at 50% cumulative probability:", quantile_50_percent))

# rnbinom: generate random samples from a negative binomial distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rnbinom(num_samples, size = size, prob = prob)
print(paste("15 random samples from negative binomial distribution:", toString(random_samples)))

Examples

R Consider an exploration company that is determined to discover two new fields


in a virgin basin it is prospecting, and will drill as many holes as required to
achieve its goal. We can investigate the probability that it will require 2, 3,
4,. . . , n exploratory holes before two discoveries are made. The same conditions
that govern the binomial distribution may be assumed, except that the number
of trials is not fixed.

Solution:
x dry holes will be drilled before r discoveries are made.

P = C(r + x − 1, x) (1 − p)^x p^r = [(r + x − 1)!/((r − 1)! x!)] (1 − p)^x p^r

If the regional success ratio is assumed to be 10%, then the probability that a
two-hole program will meet the company’s goal of two discoveries will be:

P = [(2 + 0 − 1)!/((2 − 1)! 0!)] (1 − 0.1)⁰ (0.1)²
  = (1!/(1! 0!)) × 0.9⁰ × 0.1²
  = 1 × 1 × 0.01 = 0.01

The probability that five holes will be required to achieve two successes is:

P = [(2 + 3 − 1)!/((2 − 1)! 3!)] (1 − 0.1)³ (0.1)²
  = (24/(1 × 6)) × 0.729 × 0.01 = 0.029

or, using the b∗ form,

C(x − 1, k − 1) p^k q^(x−k) = C(4, 1) (0.1)² (0.9)³ = 4 × 0.01 × 0.729 = 0.029

R Find the probability that a person tossing three coins will get either all heads
or all tails for the second time in the fifth toss?

Solution:

x = 5, k = 2, p = 1/8 + 1/8 = 1/4

b∗(5; 2, 1/4) = C(4, 1) (1/4)² (3/4)³ = 27/256

The negative binomial distribution derives its name from the fact that each
term in the expansion of p^k (1 − q)^(−k) corresponds to a value of b∗(x; k, p) for
x = k, k + 1, k + 2, . . .

Geometric Distribution

The geometric distribution is a special case of the negative binomial distribution for
which k = 1. This is the probability distribution for the number of trials required for
a single success. Thus:
g(x; p) = pq x−1

Implementation in R

In R, you can work with the geometric distribution using functions that are similar to
those for the binomial and negative binomial distributions. Here’s an R code snippet
demonstrating various functions related to the geometric distribution, along with
comments explaining each part:

# Define the parameter for the geometric distribution
prob <- 0.5  # Probability of success in each trial

# dgeom: probability of observing a specific number of failures before the first success
# For example, probability of exactly 3 failures before the first success
failures <- 3
prob_3_failures <- dgeom(failures, prob = prob)
print(paste("Probability of exactly", failures, "failures before first success:", prob_3_failures))

# pgeom: cumulative probability of a certain number of failures or fewer
# before the first success
cum_prob_up_to_3_failures <- pgeom(failures, prob = prob)
print(paste("Cumulative probability of up to", failures, "failures before first success:", cum_prob_up_to_3_failures))

# qgeom: quantile function for a given cumulative probability
# For example, the number of failures associated with a 50% cumulative probability
quantile_50_percent <- qgeom(0.5, prob = prob)
print(paste("Quantile at 50% cumulative probability of failures before first success:", quantile_50_percent))

# rgeom: generate random samples from a geometric distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rgeom(num_samples, prob = prob)
print(paste("15 random samples from geometric distribution:", toString(random_samples)))

Examples

R In a certain theodolite manufacturing process, it is known that, on the average,
1 in every 100 is defective. What is the probability that the fifth item inspected
is the first defective theodolite found?


Solution:
Using the geometric distribution with x = 5 and p = 0.01, we have:

g(5, 0.01) = p(1 − p)^(5−1) = (0.01)(0.99)^4 = 0.0096

For the geometric distribution:

µ = 1/p = 1/0.01 = 100,    σ² = (1 − p)/p² = 0.99/(0.01)² = 9900
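As a quick sanity check of this arithmetic, the same numbers can be reproduced in a few lines (a sketch in Python, used here purely as a standalone calculator; the variable names are mine). Note that R's dgeom counts failures before the first success, so this same probability in R is dgeom(4, 0.01).

```python
import math

# Sanity check of the theodolite example: geometric distribution with p = 0.01.
p = 0.01
g5 = p * (1 - p) ** (5 - 1)   # P(first defective is the 5th item inspected)
mean = 1 / p                  # mu = 1/p
var = (1 - p) / p ** 2        # sigma^2 = (1 - p)/p^2
print(g5, mean, var)
```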

Poisson Distribution

Experiments yielding numerical values of a random variable X, the number of
successes occurring during a given time interval or in a specified region, are often called
Poisson experiments. The given time interval may be of any length, such as a minute,
a day, a week, a month or even a year. Hence, a Poisson experiment might generate
observations for the random variable representing the number of telephone calls per
hour received by an office, the number of days school is closed due to snow during
the winter, or the number of postponed games due to rain during a baseball season.
The specified region could be a line segment, an area, a volume or perhaps a piece of
material. In this case, X might represent the number of field mice per acre, the number
of bacteria in a given culture, or the number of typing errors per page.

The Poisson process:


A Poisson experiment is derived from the Poisson process and possesses the following
properties:

1. The number of successes occurring in one time interval or specified region is
independent of the number occurring in any other disjoint time interval or region of
space.
2. The probability of a single success occurring during a very short time interval
or in a small region is proportional to the length of the time interval or the size
of the region, and does not depend on the number of successes occurring outside
this time interval or region.
3. The probability of more than one success occurring in such a short time interval
or falling in such a small region is negligible.

The probability distribution of the Poisson random variable X is called the Poisson
distribution and is denoted by P(x; µ), since its values depend only on µ, the average
number of successes occurring in the given time interval or specified region. The
formula is given by the definition below:

Definition: The probability distribution of the Poisson random variable X, representing
the number of successes occurring in a given time interval or specified region, is given
by:

P(x; µ) = e^(−µ) µ^x / x!,    x = 0, 1, 2, 3, . . .

where µ is the average number of successes occurring in the given time interval or
specified region and e = 2.7183.
Theorem: The mean and variance of the Poisson distribution both have the value µ.
Theorem: The mean and variance of the Poisson distribution both have the value µ.

Implementation in R

In R, you can work with the Poisson distribution using the following functions:

1. dpois - Gives the probability of observing exactly x events (probability mass
function).
2. ppois - Calculates the cumulative probability of observing x or fewer events
(cumulative distribution function).
3. qpois - Determines the quantile function for a given cumulative probability.
4. rpois - Generates random samples from the Poisson distribution.

# Define the parameter for the Poisson distribution
lambda <- 4 # The average number of events in the interval (e.g., 4 per time unit)

# dpois: Calculate the probability of observing exactly x events
# For example, probability of observing exactly 3 events
x <- 3
prob_x_events <- dpois(x, lambda)
print(paste("Probability of exactly", x, "events:", prob_x_events))

# ppois: Calculate the cumulative probability of observing x or fewer events
# For example, cumulative probability of observing 3 or fewer events
cum_prob_up_to_x_events <- ppois(x, lambda)
print(paste("Cumulative probability of up to", x, "events:",
            cum_prob_up_to_x_events))

# qpois: Determine the quantile for a given cumulative probability
# For example, the number of events associated with a 50% cumulative probability
quantile_50_percent <- qpois(0.5, lambda)
print(paste("Quantile at 50% cumulative probability:", quantile_50_percent))

# rpois: Generate random samples from the Poisson distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rpois(num_samples, lambda)
print(paste("15 random samples from Poisson distribution:",
            toString(random_samples)))

Examples

R Suppose that an urn contains 100,000 marbles and 120 are red. If a random
sample of 1000 is drawn, what are the probabilities that 0, 1, 2, 3, and 4,
respectively, will be red?

n = 1000,    p = 120/100000 = 0.0012,    q = 0.9988

Solution:

Binomial: b(x; 1000, 0.0012) = C(1000, x)(0.0012)^x (0.9988)^(1000−x)

For x = 3:

b(3; 1000, 0.0012) = 166167000 × 1.728 × 10^(−9) × 0.30206 ≈ 0.0867

Using the Poisson method,

λ = np = 1000 × 0.0012 = 1.2

e^(−1.2) = 0.3012

f(3) = e^(−1.2)(1.2)³/3! = 0.0867

P(X > 5) = 1 − P(X ≤ 5) = 1 − 0.9985 = 0.0015
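The hand computation above mixes several very large and very small factors, so a short cross-check of the exact binomial probabilities against their Poisson approximations may be useful (sketched in Python purely as a calculator; dbinom and dpois in R give the same values):

```python
import math

# Exact binomial vs. Poisson approximation for the urn example.
n, p, lam = 1000, 0.0012, 1.2

def binom_pmf(x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

for x in range(5):
    print(x, round(binom_pmf(x), 4), round(pois_pmf(x), 4))
```

The two columns agree to four decimal places for each x, which is why the Poisson approximation is appropriate here (n large, p small).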

R Let X have a Poisson distribution with a mean of λ = 5. Find

i. P (X ≤ 6)
ii. P (X > 5)
iii. P (X = 6)
iv. P (X ≥ 4)

Solution:

i. P (X ≤ 6) = Σ_{x=0}^{6} 5^x e^(−5)/x! = 0.762

ii. P (X > 5) = 1 − P (X ≤ 5) = 1 − 0.616 = 0.384

iii. P (X = 6) = P (X ≤ 6) − P (X ≤ 5) = 0.762 − 0.616 = 0.146

iv. P (X ≥ 4) = 1 − P (X < 4) = 1 − P (X ≤ 3) = 1 − 0.265 = 0.735
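These four table look-ups are easy to verify directly from the Poisson pmf; a minimal check (in Python, as a standalone calculator) is:

```python
import math

# Check the four Poisson(lambda = 5) probabilities above.
lam = 5.0

def pmf(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

def cdf(k):
    return sum(pmf(x) for x in range(k + 1))

print(round(cdf(6), 3))            # i.   P(X <= 6)
print(round(1 - cdf(5), 3))        # ii.  P(X > 5)
print(round(cdf(6) - cdf(5), 3))   # iii. P(X = 6)
print(round(1 - cdf(3), 3))        # iv.  P(X >= 4)
```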

R A hospital administrator, who has been studying daily emergency admissions
over a period of several years, has come to the conclusion that they are dis-
tributed according to the Poisson law. Hospital records reveal that emergency
admissions have averaged three per day during this period. If the administrator
is correct in assuming a Poisson distribution, find the probability that

i. Exactly two emergency admissions will occur on a given day.
ii. No emergency admissions will occur on a particular day.
iii. Either 3 or 4 emergency cases will be admitted on a particular day.

Solution:

i. p(X = 2) = e^(−λ)λ^x/x! with λ = 3:

p(X = 2) = e^(−3)3²/2! = 0.05(9)/2 = 0.225

ii. p(X = 0) = e^(−3)3⁰/0! = 0.05

iii. p(X = 3) + p(X = 4) = e^(−3)3³/3! + e^(−3)3⁴/4!
   = e^(−3)(27/6 + 81/24)
   = 0.05(9/2 + 27/8)
   = 0.05(7.875)
   = 0.394

R Fit a Poisson distribution to the following data, which gives the number of yeast
cells per square for 400 squares:

No. of cells per square (x):  0    1    2   3   4  5  6  7  8  9  10  Total
No. of squares (f):           103  143  98  42  8  4  2  0  0  0  0   400

Solution:
The expected theoretical frequency for r successes is N e^(−m) m^r/r!, but m is not
given in this example. The mean of the Poisson distribution is m. Hence

m = Σf x / Σf = 529/400 = 1.32

P(X = x) = λ^x e^(−λ)/x! = (1.32)^x e^(−1.32)/x!

thus,

f = 400 e^(−1.32) (1.32)^x / x!,    therefore,

No. of cells per square (x):  0    1    2   3   4   5  6  7  8  9  10  Total
No. of squares (f):           107  141  93  41  14  4  0  0  0  0  0   400
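The fitted frequencies can be recomputed directly; note the book rounds them so that the total remains 400 (the x = 6 cell, which comes to about 0.8, is shown as 0). A sketch of the computation:

```python
import math

# Recompute the Poisson fit for the yeast-cell data.
f_obs = [103, 143, 98, 42, 8, 4, 2, 0, 0, 0, 0]
N = sum(f_obs)                                    # 400 squares
m = sum(x * f for x, f in enumerate(f_obs)) / N   # sample mean = 529/400

fitted = [N * math.exp(-m) * m**x / math.factorial(x) for x in range(11)]
print(m)                          # 1.3225
print([round(v) for v in fitted])
```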

R In a manufacturing process in which glass is being produced, defects or bubbles
occur, occasionally rendering the pieces undesirable for marketing. It is known
that, on the average, 1 in every 1000 of the items produced has one or more
bubbles. What is the probability that a random sample of 8000 will yield fewer
than 7 items possessing bubbles?
Solution:

P(X < 7) = Σ_{x=0}^{6} b(x; 8000, 0.001) ≈ Σ_{x=0}^{6} p(x; 8) = 0.3134

R Suppose it is known that the probability of recovery from a certain disease
is 0.4. If 15 people are stricken with the disease, what is the probability that

1. or more will recover?
2. 4 or more will recover?
3. at least 5 will recover?
4. fewer than three recover?

5.1.2 Post-Test

1. Suppose that 24% of a certain population have blood group B, for a sample of
size 20 drawn from this population, find the probability that

a) Exactly 3 persons with blood group B will be found.
b) Three or more persons with the characteristic of interest will be found.
c) Fewer than three will be found.
d) Exactly five will be found.

2. In a large population, 16% of the members are left-handed. In a random sample


of size 10, find

a) The probability that exactly 2 will be left-handed p(X = 2)


b) P (X ≥ 2)
c) P(X < 2)
d) P (1 ≤ X ≤ 4)

3. Suppose mortality rate of a certain disease is 0.1, suppose 10 people in a


community contract the disease, what is the probability that

a) None will survive


b) 50% will survive
c) At least 3 will die
d) Exactly 3 will die

4. Suppose it is known that the probability of recovery from a certain disease is


0.4. If 15 people are stricken with the disease what is the probability that

a) or more will recover?


b) 4 or more will recover?
c) at least 5 will recover?
d) fewer than three recover?

5. In the study of a certain aquatic organism, a large number of samples were


taken from a pond, and the number of organisms in each sample was counted.
The average number of organisms per sample was found to be two. Assuming
the number of organisms to be Poisson distributed. Find the probability that:

a) The next sample taken will contain one or more organisms.


b) The next sample taken will contain exactly three organisms.
c) The next sample taken will contain fewer than five organisms.

6. It has been observed that the number of particles emitted by a radioactive


substance, which reach a given portion of space during time t, follows closely
the Poisson distribution with parameter λ = 100. Calculate the probability that:

a) No particles reach the portion of space under consideration during time t;


b) Exactly 120 particles do so;
c) At least 50 particles do so.

7. The phone calls arriving at a given telephone exchange within one minute
follow the Poisson distribution with parameter value equal to ten. What is the
probability that in a given minute:

a) No calls arrive?
b) Exactly 10 calls arrive?
c) At least 10 calls arrive?

5.1.3 Continuous Probability Distribution

Normal Distribution

The graph of the normal distribution which is a bell-shaped smooth curve approxi-
mately describes many phenomena that occur in nature, industry and research. In
addition, errors in scientific measurements are extremely well approximated by a
normal distribution. Thus, the normal distribution is one of the most widely used
probability distributions for modelling random experiments. It provides a good model
for continuous random variables involving measurements such as time, heights/weights
of persons, marks scored in an examination, amount of rainfall, growth rate and many
other scientific measurements.
Definition:
The probability density function of the normal random variable X, which is simply
called the normal distribution, is defined by:

f(x) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²),    −∞ < x < ∞

where σ > 0 and −∞ < µ < ∞.

The mean and variance of the measurements X are E(X) = µ and Var(X) = σ².
If the random variable X is modelled by the normal distribution with mean µ and
variance σ², this is simply denoted X ∼ N(µ, σ²).
This fact considerably simplifies the calculation of probabilities concerning normally
distributed variables, as the following illustration shows. Suppose X ∼ N(µ, σ²) and
let c₁ < c₂. Since (X − µ)/σ ∼ N(0, 1), then

P(c₁ < X < c₂) = P((X − µ)/σ < (c₂ − µ)/σ) − P((X − µ)/σ < (c₁ − µ)/σ)
             = Φ((c₂ − µ)/σ) − Φ((c₁ − µ)/σ)

Note that
Φ(−x) = 1 − Φ(x)

Properties

• Mean = E(x) = µ
• Variance = σ 2
• Standard Deviation = σ

1. If Z is N (0, 1), find;


i. P (0.53 < Z < 2.06)
ii. P (Z > 2.89)
2. If X is N (75, 100), find
i. P (X < 60) ii. P(6
3. If X is normally distributed with a mean of 6 and a variance 25, find
P (6 ≤ X ≤ 12)

Solution:
5.1 Introduction 127

1.

(i) P (0.53 < Z < 2.06) = Φ(2.06) − Φ(0.53) = 0.9803 − 0.7019 = 0.2784

(ii) P (Z > 2.89) = 1 − Φ(2.89) = 1 − 0.9981 = 0.0019

2.

P (X < 60) = P ((X − 75)/10 < (60 − 75)/10) = P (Z < −1.5) = 0.0668

3.

P (6 ≤ X ≤ 12) = P ((6 − 6)/5 ≤ Z ≤ (12 − 6)/5) = P (0 ≤ Z ≤ 1.2)
             = Φ(1.2) − Φ(0) = 0.8849 − 0.5000 = 0.3849
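These table look-ups can be checked without tables using the closed form Φ(z) = (1 + erf(z/√2))/2; a quick sketch:

```python
import math

# Standard normal CDF via the error function.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(Phi(2.06) - Phi(0.53), 4))  # P(0.53 < Z < 2.06)
print(round(1 - Phi(2.89), 4))          # P(Z > 2.89)
print(round(Phi(-1.5), 4))              # P(X < 60), X ~ N(75, 10^2)
print(round(Phi(1.2) - Phi(0.0), 4))    # P(6 <= X <= 12), X ~ N(6, 5^2)
```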

Reasons for importance

1. Many data sets are well modelled by the normal distribution; for example,
heights and weights.
2. Many data sets can be transformed to near normality; for example,
log(income) is approximately normal.
3. Many distributions approach normality in some limit.

Standard normal distribution

A Z score is a measure of the number of standard deviations a data value falls
above or below the mean:

Z = (observation − mean)/SD    (5.1.2)

We can calculate Z scores for distributions of any shape, but with normal distributions
we use Z scores to calculate probabilities. Observations that are more than 2 SD
away from the mean are typically considered unusual. Another reason we use Z
scores is that if the distribution of X is nearly normal, then the Z scores of X will have
a Z distribution (unit normal). Note that the Z distribution is a special case of
the normal distribution with mean µ = 0 and standard deviation σ = 1. Linear
transformations of normally distributed random variables are also normally distributed.

Hence, if

Z = (X − µ)/σ

where X ∼ N(µ, σ²), then Z ∼ N(0, 1).

Calculating Probabilities - Z Table

The area under the unit normal curve from −∞ to a is given by

P(Z < a) = Φ(a)

The area under the unit normal curve from a to b, where a ≤ b, is given by

P(a ≤ Z ≤ b) = Φ(b) − Φ(a)

and the area under the unit normal curve outside of a to b, where a ≤ b, is given by

1 − [Φ(b) − Φ(a)]

Examples

R Find the area under the standard normal curve between z = −1.5 and z = 1.25.
Solution:
From the Standard Normal Table, the area to the left of z = 1.25 is 0.8944 and
the area to the left of z = −1.5 is 0.0668. So, the area between z = −1.5 and
z = 1.25 is 0.8944 − 0.0668 = 0.8276.
Interpretation: 82.76% of the area under the curve falls between z = −1.5
and z = 1.25.

R A survey indicates that people use their cellular phones an average of 1.5 years
before buying a new one. The standard deviation is 0.25 year. A cellular phone
user is selected at random. Find the probability that the user will use their
current phone for less than 1 year before buying a new one. Assume that the
variable x is normally distributed.

Solution: The distribution is normal with µ = 1.5 and σ = 0.25, and we want the
area for x less than 1. The z-score that corresponds to 1 year is

z = (1 − 1.5)/0.25 = −2

The Standard Normal Table shows that P(z < −2) = 0.0228.

Interpretation: The probability that the user will use their cellular phone for
less than 1 year before buying a new one is 0.0228.

R The results of an examination were Normally distributed. 10% of the candidates


had more than 70 marks and 20% had fewer than 35 marks. Find the mean
and standard deviation of the marks.
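The book leaves this example without a worked solution. One standard approach, assuming the usual quantile values z₀.₉₀ ≈ 1.2816 and z₀.₈₀ ≈ 0.8416, is:

```latex
% Given: P(X > 70) = 0.10 and P(X < 35) = 0.20 for X ~ N(mu, sigma^2).
\begin{align*}
\frac{70 - \mu}{\sigma} &= 1.2816, &
\frac{35 - \mu}{\sigma} &= -0.8416 \\[4pt]
\mu + 1.2816\,\sigma &= 70, &
\mu - 0.8416\,\sigma &= 35
\end{align*}
```

Subtracting the second equation from the first gives 2.1232σ = 35, so σ ≈ 16.5 and µ ≈ 48.9 marks.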

R The weights of chocolate bars are normally distributed with mean 205 g and
standard deviation 2.6 g. The stated weight of each bar is 200 g.

1. Find the probability that a single bar is underweight


2. Four bars are chosen at random. Find the probability that fewer than
two bars are underweight.

Solution
(a) Let W be the weight of a chocolate bar, W ∼ N(205, 2.6²). Then

Z = (W − µ)/σ = (200 − 205)/2.6 = −1.923077

P(W < 200) = P(Z < −1.92) = 1 − Φ(1.92) = 1 − 0.9726 = 0.0274

Interpretation: the probability of an underweight bar is 0.0274.
(b) We want the probability that 0 or 1 bars chosen from 4 are underweight. Let U
denote an underweight bar and C a bar of correct weight. Then

P(1 underweight) = P(CCCU) + P(CCUC) + P(CUCC) + P(UCCC)

= 4 × 0.0274 × 0.9726³ = 0.1008

P(0 underweight) = 0.9726⁴ = 0.8948

so the probability that fewer than two bars are underweight is
0.1008 + 0.8948 = 0.9956.
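As a numerical check of part (b): with p = P(underweight) = Φ((200 − 205)/2.6) ≈ 0.027, the binomial probability of fewer than two underweight bars out of four comes to about 0.996 (note that P(0 underweight) = 0.9726⁴ ≈ 0.8948, so a much smaller final answer would be inconsistent with the other figures). A sketch:

```python
import math

# P(fewer than 2 of 4 bars underweight), W ~ N(205, 2.6^2), threshold 200 g.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = Phi((200 - 205) / 2.6)   # P(one bar underweight), about 0.027
p0 = (1 - p) ** 4            # no underweight bars
p1 = 4 * p * (1 - p) ** 3    # exactly one underweight bar
print(round(p, 4), round(p0 + p1, 4))
```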

Implementation in R

In R, you can work with the normal distribution using several functions:
1. dnorm: Probability Density Function (PDF) - gives the height of the probability
distribution at each point for a given mean and standard deviation.
2. pnorm: Cumulative Distribution Function (CDF) - calculates the probability that a
normally distributed random variable will be less than or equal to a given value.
3. qnorm: Quantile Function - finds the quantile (the inverse of the CDF) for a
given probability.
4. rnorm: Generates random numbers from the normal distribution.

# Define the parameters for the normal distribution
mean <- 0 # Mean (mu)
sd <- 1   # Standard deviation (sigma)

# dnorm: Calculate the density (height of the probability distribution)
# at a specific value
# For example, density at value 1
value <- 1
density_at_value <- dnorm(value, mean = mean, sd = sd)
print(paste("Density at value", value, ":", density_at_value))

# pnorm: Calculate the cumulative probability up to a specific value
# For example, probability of being less than or equal to 1
cum_prob_up_to_value <- pnorm(value, mean = mean, sd = sd)
print(paste("Cumulative probability up to value", value, ":",
            cum_prob_up_to_value))

# qnorm: Determine the quantile for a given cumulative probability
# For example, finding the value associated with a 50% cumulative probability
quantile_50_percent <- qnorm(0.5, mean = mean, sd = sd)
print(paste("Value at 50% cumulative probability:", quantile_50_percent))

# rnorm: Generate random samples from the normal distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rnorm(num_samples, mean = mean, sd = sd)
print(paste("15 random samples from normal distribution:",
            toString(random_samples)))

5.1.4 Uniform Distribution

Suppose that a continuous random variable X can assume values in a bounded interval
only, say the open interval (a, b), and suppose the p.d.f. of X is given as

f(x; a, b) = f(x) = 1/(b − a),    a < x < b
           = 0,    elsewhere.

This distribution is referred to as the Uniform or Rectangular Distribution on the
interval (a, b) and is simply written as X ∼ U(a, b), where ‘a‘ and ‘b‘ are the parameters
of the distribution. It provides a probability model for selecting a point at random
from the interval (a, b).
Properties
Mean µ = (a + b)/2
Variance σ² = (b − a)²/12
Standard Deviation σ = √((b − a)²/12)

R The hardness of a certain alloy (measured on the Rockwell scale) is a random
variable X. Assume that X ∼ U[50, 75].

a) Find P[60 < X < 70]
b) Find E(X)
c) Find Var(X)

Solution:

a) P[60 < X < 70] = ∫₆₀⁷⁰ 1/(b − a) dx = (1/(75 − 50)) [x]₆₀⁷⁰ = 10/25 = 2/5

b) E(X) = ∫₅₀⁷⁵ x/(b − a) dx = 125/2

Or

E(X) = (b + a)/2 = (75 + 50)/2 = 125/2

c) V ar(X) = E(X²) − E²(X) = ∫₅₀⁷⁵ x²/25 dx − (125/2)² = 625/12

Or

V ar(X) = (b − a)²/12 = 625/12
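The three answers can be confirmed with exact rational arithmetic; a short sketch:

```python
from fractions import Fraction

# Exact-arithmetic check of the U(50, 75) example.
a, b = 50, 75
prob = Fraction(70 - 60, b - a)    # P(60 < X < 70) = 10/25
mean = Fraction(a + b, 2)          # E(X) = (a + b)/2
var = Fraction((b - a) ** 2, 12)   # Var(X) = (b - a)^2 / 12
print(prob, mean, var)             # 2/5 125/2 625/12
```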

5.2 The Gamma Distribution

The Gamma distribution arises in the study of waiting times, for example, in the
lifetimes of devices. It is also useful in modeling many nonnegative continuous variables.
The gamma distribution requires knowledge of the gamma function.
Definition:
The gamma function is defined by

Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx    (5.2.3)

for α > 0.


Γ(α) is the "gamma function of α". Integrating by parts with u = x^(α−1) and
dv = e^(−x) dx, we obtain

Γ(α) = [−e^(−x) x^(α−1)]₀^∞ + (α − 1) ∫₀^∞ x^(α−2) e^(−x) dx

     = (α − 1) ∫₀^∞ x^(α−2) e^(−x) dx

which yields the recursion formula

Γ(α) = (α − 1)Γ(α − 1)

Repeated application of the recursion formula gives

Γ(α) = (α − 1)(α − 2)Γ(α − 2)

     = (α − 1)(α − 2)(α − 3)Γ(α − 3)

and so forth. Note that when α = n, where n is a positive integer,

Γ(n) = (n − 1)(n − 2)(n − 3) · · · Γ(1) = (n − 1)!
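Both the recursion and the factorial identity are easy to verify numerically (math.gamma implements Γ; a = 4.7 is just an arbitrary non-integer test point):

```python
import math

# Check Gamma(a) = (a - 1) * Gamma(a - 1) and Gamma(n) = (n - 1)!
a = 4.7
print(math.isclose(math.gamma(a), (a - 1) * math.gamma(a - 1)))  # True
print(math.gamma(6), math.factorial(5))
```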



5.2.1 General Gamma Distribution

The continuous random variable X has a gamma distribution, with parameters α and
β, if its density function is given by:

f(x) = (1/(β^α Γ(α))) x^(α−1) e^(−x/β),    x > 0
     = 0,    elsewhere

where α > 0 and β > 0.


Property of Gamma Distribution

a) E(X) = αβ
b) Var(X) = αβ 2

Implementation in R

In R, the Gamma distribution can be computed as follows:

# Define the parameters for the gamma distribution
shape <- 2 # Shape parameter (alpha)
scale <- 3 # Scale parameter (beta)

# dgamma: Calculate the density (height of the probability distribution)
# at a specific value
# For example, density at value 5
value <- 5
density_at_value <- dgamma(value, shape = shape, scale = scale)
print(paste("Density at value", value, ":", density_at_value))
# pgamma: Calculate the cumulative probability up to a specific value
# For example, probability of being less than or equal to 5
cum_prob_up_to_value <- pgamma(value, shape = shape, scale = scale)
print(paste("Cumulative probability up to value", value, ":",
            cum_prob_up_to_value))

# qgamma: Determine the quantile for a given cumulative probability
# For example, finding the value associated with a 50% cumulative probability
quantile_50_percent <- qgamma(0.5, shape = shape, scale = scale)
print(paste("Value at 50% cumulative probability:", quantile_50_percent))

# rgamma: Generate random samples from the gamma distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rgamma(num_samples, shape = shape, scale = scale)
print(paste("15 random samples from gamma distribution:",
            toString(random_samples)))

5.3 Exponential Distribution

An exponential distribution is a special gamma distribution for which α = 1. It has
many applications in the field of statistics, particularly in the areas of reliability theory
and waiting times or queueing problems. It is a continuous distribution that can be
related to the Poisson distribution in the discrete sense.
Definition:
A continuous random variable X has an exponential distribution with parameter β if
its density function is given by:

f(x) = (1/β) e^(−x/β),    x > 0,    where β > 0    (5.3.4)
     = 0,    elsewhere

Properties (writing θ for the parameter β)
Mean E(X) = θ
Variance V ar(X) = θ²
Standard Deviation σ = θ
So if λ is the mean number of changes in the unit interval, then θ = 1/λ is the mean
waiting time for the first change.

R Let the p.d.f. of X be f(x) = (1/2) e^(−x/2), 0 ≤ x < ∞.

i What is the mean and variance of X?
ii Calculate P(X > 3)
iii Calculate P(X > 5 | X > 2)
iv Calculate P(X < 2)

Solution:

i. E(X) = θ = 2 and V ar(X) = θ² = 4

ii. P(X > 3) = ∫₃^∞ (1/2) e^(−x/2) dx = e^(−3/2) = 0.2231

iii. P(X > 5 | X > 2) = (∫₅^∞ (1/2) e^(−x/2) dx) / (∫₂^∞ (1/2) e^(−x/2) dx)
   = e^(−5/2)/e^(−1) = e^(−3/2) = 0.2231

iv. P(X < 2) = ∫₀² (1/2) e^(−x/2) dx = 1 − e^(−1) = 0.6321
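The three probabilities can be checked from the survival function P(X > x) = e^(−x/θ); the check also illustrates the memoryless property, which is why parts ii and iii agree:

```python
import math

# Check the exponential example with theta = 2.
theta = 2.0

def S(x):                      # survival function P(X > x) = e^(-x/theta)
    return math.exp(-x / theta)

print(round(S(3), 4))          # P(X > 3)
print(round(S(5) / S(2), 4))   # P(X > 5 | X > 2), equals P(X > 3)
print(round(1 - S(2), 4))      # P(X < 2)
```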

Implementation in R

In R, you can work with the exponential distribution using several functions, as shown
in the code snippet:

# Define the parameter for the exponential distribution
rate <- 0.5 # Rate parameter (lambda)

# dexp: Calculate the density (height of the probability distribution)
# at a specific value
# For example, density at value 2
value <- 2
density_at_value <- dexp(value, rate = rate)
print(paste("Density at value", value, ":", density_at_value))

# pexp: Calculate the cumulative probability up to a specific value
# For example, probability of being less than or equal to 2
cum_prob_up_to_value <- pexp(value, rate = rate)
print(paste("Cumulative probability up to value", value, ":",
            cum_prob_up_to_value))

# qexp: Determine the quantile for a given cumulative probability
# For example, finding the value associated with a 50% cumulative probability
quantile_50_percent <- qexp(0.5, rate = rate)
print(paste("Value at 50% cumulative probability:", quantile_50_percent))

# rexp: Generate random samples from the exponential distribution
# For example, generating 15 random samples
num_samples <- 15
random_samples <- rexp(num_samples, rate = rate)
print(paste("15 random samples from exponential distribution:",
            toString(random_samples)))

5.3.1 Mathematical Expectations

A very important concept in probability and statistics is that of mathematical expectation,
expected value or, briefly, expectation of a random variable. The expectation
of X is very often called the mean of X and is denoted by µₓ, or simply µ when a
particular random variable is understood. This expected value of X gives a single
value, which acts as a representative or average of the values of X, and for this reason
it is often called a measure of central tendency. Consider that the random variable X
has the values x₁, x₂, x₃, . . . . The mean or expected value of X is:

µ = E(X) = Σ x f(x)    for the discrete case

µ = E(X) = ∫_{−∞}^{∞} x f(x) dx    for the continuous case

5.3.2 Post-Test

1. Let X have an exponential distribution with a mean of θ = 20. Compute

i P (10 < X < 30)
ii P (0 < X < 30)
iii P (X > 30)
iv P (X > 40 | X > 10)

2. Telephone calls enter a college switchboard according to a Poisson process on


the average of two every 3 minutes. Let X denote the waiting time until the
first call that arrives after 10 A.M.

i What is the p.d.f. of X?


ii Find P (X > 2)

3. Customers arrive randomly at a bank teller’s window. Given that one customer
arrived during a particular 10-minute period, let X equal the time within the 10
minutes that the customer arrived. If X is U(0, 10), find

i The p.d.f of X
ii P (X ≥ 8)
iii P (2 ≤ X < 8)
iv E(X)
v Var(X)

4. Explain the relationship that exists between the Poisson and the Exponential
distributions.

5. If X is N (75, 100), find P (X < 35) and P (70 < X < 100)


6. If Z is N (0, 1), find values of c such that
i P (Z ≥ c) = 0.025
ii P (|Z| ≤ c) = 0.95
iii P(Z > c) = 0.05
7. Let X be N (µ, σ 2 ), so that P (X < 89) = 0.90 and P (X < 94) = 0.95. Find µ
and σ 2
8. Show that the random variable Z = (X − µ)/σ is distributed N(0, 1).
9. Suppose that Z ∼ N (0, 1). Find the following probabilities:
i P (Z ≤ 1.53)
ii P (Z > −0.48)
iii P (0.35 < Z < 2.01)
iv P(|Z| > 1.28)
10. Find the value of ‘a‘ and ‘b‘ such that
• P (Z ≤ a) = 0.648
• P (|Z| ≤ b) = 0.95
6. Estimations

6.0.1 Introduction

The basic reasons for the need to estimate population parameters from sample
information is that it is ordinarily too expensive or simply infeasible to enumerate
complete populations to obtain the required information. The cost of complete
censuses may be prohibitive in finite populations while complete enumerations are
impossible in the case of infinite populations. Hence, estimation procedures are useful
in providing the means of obtaining estimates of population parameters with desired
degree of precision. We now consider estimation, the first of the two general areas
of statistical inference. The second general area is hypothesis testing which will be
examined later. The subject of estimation is concerned with the methods by which
population characteristics are measured from sample information. The objectives are
to present:

1. properties for judging how well a given sample statistic estimates the parent
population parameter.
2. several methods for estimating these parameters.

There are basically two types of estimation: point estimation and interval estimation.
In point estimation, a single sample statistic, such as X̄, s , or p is calculated from the
sample to provide a best estimate of the true value of the corresponding population
parameter such as µ, σ or p . Such a statistic is termed a point estimator. The
function or rule that is used to estimate the value of a parameter is called an estimator.
An estimate is a particular value calculated from a particular sample of observations.
On the other hand, an interval estimate consists of two numerical values defining an
interval which, with varying degrees of confidence, we feel includes the parameter
being estimated.

6.0.2 Properties of a Point Estimator

Unbiasedness:
If the expected value or mean of all possible values of a statistic over all possible
samples is equal to the population parameter being estimated, the sample statistic
is said to be unbiased. That is, if the expected value of an estimator is equal to the
corresponding population parameter, the estimator is unbiased

R The sample mean is an unbiased estimator of the population mean:

E(X̄) = E((1/n) Σ_{i=1}^{n} xᵢ) = µ    (6.0.1)

Efficiency:
The most efficient estimator among a group of unbiased estimators is the one with
the smallest variance. This concept refers to the sampling variability of an estimator.


Consistency:
An estimator is consistent if as the sample size increases, the probability increases that
the estimator will approach the true value of the population parameter. Alternatively,
an estimator is consistent if it satisfies the following conditions:

1. V ar(θ̂) −→ 0 as n → ∞
2. θ̂ becomes unbiased as n → ∞

6.0.3 Interval Estimation

For most practical purposes, it would not suffice to have merely a single-value estimate
of a population parameter. Any single point estimate will be either right or wrong.
Therefore, instead of obtaining only a single estimate of a population parameter,
it would certainly seem to be extremely useful, and perhaps necessary, to obtain two
estimators, say X̄₁ and X̄₂, and say with some confidence that the interval between
X̄₁ and X̄₂ includes the true mean µ. Thus, an interval estimate of a population
parameter θ is a statement of two values between which it is estimated that the
parameter lies. We shall be discussing the construction of confidence intervals as a
means of interval estimation. The confidence we have that a population parameter θ
will fall within some confidence interval equals (1 − α), where α is the probability
that the interval does not contain θ (i.e., the probability α is an allowance for error).
To construct a 95% confidence interval, set α = 0.05. That is, the probability is 0.05
that the value θ will not lie within the interval.
Note that,

α + (confidence level) = 1

The higher the confidence level, the smaller the probability of error α for the interval
estimator.

Confidence Interval for µ, σ Known

A confidence interval is constructed on the basis of sample information. It also depends
on the size of the sample, n. Assume the population variance σ² is known and the
population is normal; then the 100(1 − α) percent C.I. for µ is given by

X̄ − Zα/2(σ/√n) ≤ µ ≤ X̄ + Zα/2(σ/√n)    (6.0.2)

simply written as

X̄ ± Zα/2(σ/√n)    (6.0.3)

where Zα/2 is the Z value cutting off an area of α/2 in the right and left tails of the
standard normal probability distribution.

R The yield of a chemical process is being studied. From previous experience,
yield is known to be normally distributed with σ = 3. The past five days of plant
operation have resulted in the following percent yields: 91.6, 88.75, 90.8, 89.95,
and 91.3. Find a 95% two-sided confidence interval on the true mean yield.
Solution
n = 5, σ = 3, x̄ = 90.48, Zα/2 = Z0.025 = 1.96

3 3
   
90.48 − Z0.025 √ ≤ µ ≤ 90.48 + Z0.025 √
5 5
90.48 − 1.96(1.3416) ≤ µ ≤ 90.48 + 1.96(1.3416)

87.8505 ≤ µ ≤ 93.1095
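As a quick check, the same interval can be computed in R. This is a sketch that reproduces the hand calculation above, assuming σ = 3 is known as stated in the example:

```r
# 95% z-based CI for the mean yield, with sigma assumed known
x <- c(91.6, 88.75, 90.8, 89.95, 91.3)
sigma <- 3                      # known population standard deviation
n <- length(x)
xbar <- mean(x)                 # 90.48
z <- qnorm(0.975)               # about 1.96
half <- z * sigma / sqrt(n)
c(lower = xbar - half, upper = xbar + half)  # about (87.85, 93.11)
```

The small difference from the hand result comes from using qnorm(0.975) = 1.959964 rather than the rounded 1.96.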

R A manufacturer produces piston rings for an automobile engine. It is known
that ring diameter is normally distributed with σ = 0.001 millimeters. A random
sample of 15 rings has a mean diameter of x̄ = 74.036 millimeters.

1. Construct a 99% two-sided confidence interval on the mean piston ring


diameter.
2. Construct a 95% confidence interval on the mean piston ring diameter.

Solution

n = 15, σ = 0.001, x̄ = 74.036, Zα/2 = Z0.005 = 2.575 (for 99%), Zα/2 = Z0.025 = 1.96 (for 95%)

(a) 74.036 − Z0.005 (0.001/√15) ≤ µ ≤ 74.036 + Z0.005 (0.001/√15)
74.036 − 2.575(0.000258) ≤ µ ≤ 74.036 + 2.575(0.000258)

74.036 − 0.000665 ≤ µ ≤ 74.036 + 0.000665

74.0353 ≤ µ ≤ 74.0367

(b) 74.036 − Z0.025 (0.001/√15) ≤ µ ≤ 74.036 + Z0.025 (0.001/√15)
74.036 − 0.00051 ≤ µ ≤ 74.036 + 0.00051

74.0355 ≤ µ ≤ 74.0365

R ASTM Standard E23 defines standard test methods for notched bar impact
testing of metallic materials. The Charpy V-notch (CVN) technique mea-
sures impact energy and is often used to determine whether or not a material
experiences a ductile-to-brittle transition with decreasing temperature. Ten
measurements of impact energy (J) on specimens of A238 steel cut at 60ºC are
as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume
that impact energy is normally distributed with σ = 1 J. We want to find a 95% CI
for µ, the mean impact energy.

Solution:

x̄ − Zα/2 (σ/√n) ≤ µ ≤ x̄ + Zα/2 (σ/√n)

64.46 − Z0.025 (1/√10) ≤ µ ≤ 64.46 + Z0.025 (1/√10)

64.46 − 1.96 (1/√10) ≤ µ ≤ 64.46 + 1.96 (1/√10)

64.46 − 0.6198 ≤ µ ≤ 64.46 + 0.6198

63.84 ≤ µ ≤ 65.08

That is, based on the sample data, a range of highly plausible values for mean
impact energy for A238 steel at 60°C is 63.84J ≤ µ ≤ 65.08J.

Exercise:

1. A confidence interval estimate is desired for the gain in a circuit on


a semiconductor device. Assume that gain is normally distributed
with standard deviation σ.

(a) Find a 95% CI for µ when n = 10 and x̄ = 1000.


(b) Find a 95% CI for µ when n = 25 and x̄ = 1000.
(c) Find a 99% CI for µ when n = 10 and x̄ = 1000.
(d) Find a 99% CI for µ when n = 25 and x̄ = 1000.

2. A civil engineer is analyzing the compressive strength of concrete.


Compressive strength is normally distributed with σ 2 = 100 (psi)2 . A
random sample of 12 specimens has a mean compressive strength of
x̄ = 3250psi.

(a) Construct a 95% two-sided confidence interval on mean compres-


sive strength.

(b) Construct a 99% two-sided confidence interval on mean compres-
sive strength. Compare the width of this confidence interval with
the width of the one found in part (a).

Confidence Interval For µ(σ unknown/n ≥ 30)

In practice, the standard deviation σ of a population is not likely to be known.
When σ is unknown and n is 30 or more, we proceed as before and estimate σ with
the sample standard deviation s. The resulting 1 − α large-sample confidence interval
for µ becomes

X̄ − Zα/2 (s/√n) ≤ µ ≤ X̄ + Zα/2 (s/√n)    (6.0.4)

R A sample of 40 ten-year-old girls gave a mean weight of 71.5 pounds and a
standard deviation of 12 pounds. Assuming normality, find the

1. 90% confidence interval for µ.


2. 95% confidence interval for µ.
3. 99% confidence interval for µ.

R A hospital administrator took a sample of 45 overdue accounts from which he


computed a mean of $250 and a standard deviation of $75. Assuming that the
amounts of all overdue accounts are normally distributed, find the

1. 90% confidence interval for µ.


2. 95% confidence interval for µ.
3. The 99% confidence interval for µ.

Confidence Interval For µ(σ unknown /n < 30)

When σ is not known and the sample size is small, the procedure for interval
estimation of the population mean is based on a probability distribution known as the

Student t-distribution. When the population variance is unknown and the sample
size is small, the correct distribution for constructing a confidence interval for µ is the
t-distribution. Here, an estimate s must be calculated from the sample to substitute
for the unknown population standard deviation. The t-distribution is used such that

t = (X̄ − µ)/(s/√n)

where

s = √( Σ(Xi − X̄)² / (n − 1) )    (6.0.5)

The t-distribution is based on the assumption that the population is normal. A
100(1 − α)% CI for the population mean, with the population normal and σ unknown,
is given by

x̄ − tα/2,v (s/√n) ≤ µ ≤ x̄ + tα/2,v (s/√n)    (6.0.6)

where v = n − 1. Notice that a requirement for the valid use of the t-distribution is
that the sample must be drawn from a normal distribution.

R A sample of 25 ten-year-old boys yielded a mean weight and standard deviation


of 73 and 10 pounds respectively. Assuming a normally distributed population,
find 90, 95 and 99 percent confidence intervals for the mean of the population
from which the sample came.
Solution: n = 25, x̄ = 73 and s = 10. For the 90% interval, v = 24 and t0.05,24 = 1.711:

x̄ ± t0.05,24 (10/√25)

73 ± 1.711 (2)

(69.578, 76.422)

The 95% and 99% intervals follow in the same way with t0.025,24 = 2.064 and t0.005,24 = 2.797.
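The t-based intervals can be verified in R with qt(). This sketch reproduces the 90% interval above from the summary statistics and also computes the 95% and 99% intervals:

```r
# t-based CIs for the mean from summary statistics
n <- 25; xbar <- 73; s <- 10
se <- s / sqrt(n)                              # = 2
ci <- function(conf) {
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)  # t_{alpha/2, 24}
  c(xbar - tcrit * se, xbar + tcrit * se)
}
ci(0.90)   # about (69.578, 76.422), as above
ci(0.95)
ci(0.99)
```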

6.0.4 Confidence Interval For A Population Proportion

It is often necessary to construct confidence intervals on a population proportion.


For example, suppose that a random sample of size n has been taken from a large
(possibly infinite) population and that X(≤ n) observations in this sample belong
to a class of interest. Then P̄ = X/n is a point estimator of the proportion of the
population p that belongs to this class. Note that n and p are the parameters of a
binomial distribution. Furthermore, we know that the sampling distribution of P̄ is
approximately normal with mean p and variance p(1 − p)/n if p is not too close to
either 0 or 1 and if n is relatively large. Typically, to apply this approximation we
require that np and n(1 − p) be greater than or equal to 5. We will make use of the
normal approximation in this regard.
Definition:
If n is large, the distribution of

Z = (X − np)/√(np(1 − p)) = (P̄ − p)/√(p(1 − p)/n)    (6.0.7)

is approximately standard normal. The 100(1 − α)% CI for p is then given by

P̄ − Zα/2 √(P̄(1 − P̄)/n) ≤ p ≤ P̄ + Zα/2 √(P̄(1 − P̄)/n)    (6.0.8)

This procedure depends on the adequacy of the normal approximation to the


binomial. To be reasonably conservative, this requires that np and n(1 − p) be
greater than or equal to 5. In situations where this approximation is inappropriate,
particularly in cases where n is small, other methods must be used.
Examples

R A manufacturer of electronic calculators is interested in estimating the fraction


of defective units produced. A random sample of 800 calculators contains 10
defectives. Compute a 99% confidence interval on the fraction defective.
Solution:
n = 800, x = 10, p̄ = x/n = 0.0125, np̄ = 10 and n(1 − p̄) = 790

0.0125 − Z0.005 √(0.0125(1 − 0.0125)/800) ≤ p ≤ 0.0125 + Z0.005 √(0.0125(1 − 0.0125)/800)

0.0125 ± 2.575(0.003928)

0.0125 ± 0.0101 = (0.0024, 0.0226)
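The interval can be reproduced in R directly from the counts, as a sketch of the normal-approximation formula (6.0.8):

```r
# 99% normal-approximation CI for a proportion
n <- 800; x <- 10
phat <- x / n                      # 0.0125
z <- qnorm(0.995)                  # about 2.576 for 99% confidence
se <- sqrt(phat * (1 - phat) / n)  # about 0.003928
c(lower = phat - z * se, upper = phat + z * se)  # about (0.0024, 0.0226)
```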

R Of 1000 randomly selected cases of lung cancer, 823 resulted in death within
10 years. Construct a 95% confidence interval on the death rate from lung cancer.

Solution:

n = 1000, x = 823, p̄ = x/n = 0.823, np̄ = 823, n(1 − p̄) = 177

p̄ − Zα/2 √(p̄(1 − p̄)/n) ≤ p ≤ p̄ + Zα/2 √(p̄(1 − p̄)/n)

0.823 − Z0.025 √(0.823(1 − 0.823)/1000) ≤ p ≤ 0.823 + Z0.025 √(0.823(1 − 0.823)/1000)

0.823 ± 1.96(0.0121)

0.823 ± 0.0237

(0.7993, 0.8467)

Exercise:

i. A random sample of 50 suspension helmets used by motorcycle riders


and automobile race-car drivers was subjected to an impact test,
and on 18 of these helmets some damage was observed. Find a 95%
confidence interval on the true proportion of helmets of this type
that would show damage from this test.
ii. In a random sample of 85 automobile engine crankshaft bearings, 10
have a surface finish that is rougher than the specifications allow.
Construct the 95% confidence interval for the proportion of bearings
in the population that exceeds the roughness specification.
iii. A survey was conducted to study the dental health practices and
attitudes of a certain urban adult population. Of 300 adults inter-
viewed, 123 said that they regularly had a dental checkup twice a
year. Obtain a 95% CI for p, based on these data.

Confidence Interval for the Difference Between Two Population Means


(Variances Known)

There are instances where we are interested in estimating the difference between

two population means. Here, a sample is drawn from each of the populations and,
from the data of each, the sample means x̄1 and x̄2 respectively are computed. The
estimator X̄1 − X̄2 yields an unbiased estimate of µ1 − µ2 , the difference between the
population means. The quantity

(X̄1 − X̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)    (6.0.9)

has an N (0, 1) distribution.


The 100(1 − α) % confidence interval for µ1 − µ2 is given by,

X̄1 − X̄2 ± Zα/2 √(σ1²/n1 + σ2²/n2)    (6.0.10)

for large sample sizes n1 and n2 respectively.
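As a sketch, formula (6.0.10) can be applied in R to the spar data of Exercise (i) below (n1 = 10, x̄1 = 87.6, σ1 = 1; n2 = 12, x̄2 = 84.5, σ2 = 1.5, at 90% confidence):

```r
# 90% CI for mu1 - mu2 with known variances
n1 <- 10; xbar1 <- 87.6; sigma1 <- 1
n2 <- 12; xbar2 <- 84.5; sigma2 <- 1.5
z <- qnorm(0.95)                    # 0.05 in each tail for 90% confidence
se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
d <- xbar1 - xbar2                  # 3.1
c(lower = d - z * se, upper = d + z * se)  # about (2.22, 3.98)
```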

Exercise:

i Tensile strength tests were performed on two different grades of


aluminum spars used in manufacturing the wing of a commercial
transport aircraft. From past experience with the spar manufacturing
process and the testing procedure, the standard deviations of tensile
strengths are assumed to be known. The data obtained are as follows:
n1 = 10, x̄1 = 87.6, σ1 = 1; n2 = 12, x̄2 = 84.5 and σ2 = 1.5. If µ1 and
µ2 denote the true mean tensile strengths for the two grades of spars,
find a 90% confidence interval on the difference in mean strength
µ1 − µ2 .
ii Two machines are used for filling plastic bottles with a net volume of
16.0 ounces. The fill volume can be assumed normal, with standard

deviation σ1 = 0.020 and σ2 = 0.025 ounces. A member of the quality


engineering staff suspects that both machines fill to the same mean
net volume, whether or not this volume is 16.0 ounces. A random
sample of 10 bottles is taken from the output of each machine.

Machine 1 Machine 2
16.03 16.01 16.02 16.03
16.04 15.96 15.97 16.04
16.05 15.98 15.96 16.02
16.05 16.02 16.01 16.01
16.02 15.99 15.99 16.00

Find a 95% confidence interval for the difference in means.


iii Two different formulations of an oxygenated motor fuel are being
tested to study their road octane numbers. The variance of road
octane number for formulation 1 is σ12 = 1.5 and for formulation 2
it is σ22 = 1.2. Two random samples of size n1 = 15 and n2 = 20 are
tested, and the mean road octane numbers observed are x¯1 = 89.6 and
x¯2 = 92.5. Assume normality. Construct a 95% confidence interval
on the difference in mean road octane number.

6.0.5 Confidence Interval For The Difference Between Two Population Proportions

The magnitude of the difference between two population proportions is often of interest.
An unbiased point estimator of the difference in two population proportions is provided by the
difference in the sample proportions, p¯1 − p¯2 . When n1 and n2 are large and the
population proportions are not too close to 0 or 1, the central limit theorem applies

Z = (P̄1 − P̄2 − (p1 − p2)) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2)

where Z is a standard normal random variable, and thus normal distribution theory
may be employed to obtain confidence intervals. A 100(1 − α)% confidence interval

for p1 − p2 is given by

p̄1 − p̄2 − Zα/2 √(p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2) ≤ p1 − p2 ≤ p̄1 − p̄2 + Zα/2 √(p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2)

Examples

R Two different types of injection-molding machines are used to form plastic parts.
A part is considered defective if it has excessive shrinkage or is discolored. Two
random samples, each of size 300, are selected, and 15 defective parts are found
in the sample from machine 1 while 8 defective parts are found in the sample
from machine 2. Construct a 95% confidence interval on the difference in the
two fractions defective.
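A sketch in R of the two-proportion interval for this example (machine 1: 15 of 300 defective; machine 2: 8 of 300):

```r
# 95% CI for p1 - p2 (normal approximation)
n1 <- 300; x1 <- 15; p1 <- x1 / n1   # 0.0500
n2 <- 300; x2 <- 8;  p2 <- x2 / n2   # about 0.0267
z <- qnorm(0.975)
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
d <- p1 - p2
c(lower = d - z * se, upper = d + z * se)  # about (-0.0073, 0.0540)
```

Because the interval contains zero, there is no clear evidence at the 95% level that the two fractions defective differ.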

R Two hundred patients suffering from a certain disease were randomly divided
into two equal groups. Of the first group, who received the standard treatment,
78 recovered within three days. Out of the other 100, who were treated by a
new method, 90 recovered within three days. The physician wished to estimate
the true difference in the proportions who would recover within three days.
Find 95% CI for p1 − p2 .

R In a study designed to assess the side effects of two drugs, 50 animals were
given Drug A and 50 animals were given Drug B. Of the 50 receiving Drug A,
11 showed undesirable side effects, while 8 of those receiving Drug B reacted
similarly. Find the 90 and 95 percent confidence intervals for PA − PB .

6.0.6 Confidence Intervals for Unknown Means in R

Results from an Experiment on Plant Growth. The PlantGrowth data frame gives
the results of an experiment to measure plant yield (as measured by the weight of the
plant). We would like to construct a 95% confidence interval for the mean weight of the plants.
Suppose that we know from prior research that the true population standard deviation
of the plant weights is 0.7 g. The parameter of interest is µ, which represents the true
mean weight of the population of all plants of the particular species in the study. We
will first take a look at a stemplot in R of the data:

> library(aplpack)
> with(PlantGrowth, stem.leaf(weight))
1 | 2: represents 1.2
 leaf unit: 0.1
           n: 30
   1    f | 5
        s |
   2   3. | 8
   4   4* | 11
   5    t | 3
   8    f | 455
  10    s | 66
  13   4. | 889
 (4)   5* | 1111
  13    t | 2233
   9    f | 555
        s |
   6   5. | 88
   4   6* | 011
   1    t | 3

We use R to compute for the confidence interval.

> library(TeachingDemos)
> temp <- with(PlantGrowth, z.test(weight, stdev = 0.7))
> temp

        One Sample z-test

data:  weight
z = 39.6942, n = 30.000, Std. Dev. = 0.700,
Std. Dev. of the sample mean = 0.128, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 4.822513 5.323487
sample estimates:
mean of weight
         5.073

The confidence interval bounds are shown in the sixth line down of the output. We
can make a plot with

> library(IPSUR)
> plot(temp, "Conf")

6.0.7 Implementation in R for Confidence Intervals for Proportions

> tab <- xtabs(~ gender, data = RcmdrTestDrive)
> prop.test(rbind(tab), conf.level = 0.95, correct = FALSE)

        1-sample proportions test without continuity correction

data:  rbind(tab), null probability 0.5
X-squared = 2.881, df = 1, p-value = 0.08963
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.4898844 0.6381406
sample estimates:
        p
0.5654762

> A <- as.data.frame(Titanic)
> library(reshape)
> B <- with(A, untable(A, Freq))
7. Hypothesis Testing

Learning Objectives
Having worked through this chapter the student will be able to:

• Structure engineering decision-making problems as hypothesis tests.


• Test hypotheses on the mean of a normal distribution using either a Z-test or a
t-test procedure.
• Test hypotheses on the variance or standard deviation of a normal distribution.
• Test hypotheses on a population proportion.

7.1 Tests of Hypotheses and Significance

7.1.1 Introduction

We now discuss the subject of hypothesis testing, which as earlier noted is one of
the two basic classes of statistical inference. Testing of hypotheses involves using
statistical inference to test the validity of postulated values for population parameters.
If the hypothesis specifies the distribution completely it is called simple, otherwise
it is called composite. For example, a demographer interested in the mean age of

residents in a certain local government area might pose a simple hypothesis such as
µ = 24 or he might specify a composite hypothesis as µ < 24 or µ > 24.
A statistical test is usually structured in terms of two mutually exclusive hypotheses
referred to as the null hypothesis and the alternative hypothesis denoted by H0 and
H1 respectively.
Two types of error occur in hypothesis testing: type I error and type II error.
A type I error occurs if H0 is rejected when it is true. The probability of a type I
error is the conditional probability P(reject H0 | H0 is true), denoted by α.
Hence,

α = P (reject H0 |H0 is true) and

1 − α = P (accept H0 |H0 is true)

A type II error occurs if H0 is accepted when it is false. Its probability is denoted
by the symbol β, where

β = P (accept H0 |H0 is f alse) and

1 − β = P (reject H0 |H0 is f alse), called the power of the test

Type I and type II errors can be summarized as follows:

                           H0 is true                   H0 is false
Accept H0                  1 − α (correct decision)     β (Type II error)
Reject H0                  α (Type I error)             1 − β (correct decision)

Standard format of hypothesis testing: this format involves five steps.


Step 1: State the null and alternative hypotheses.
Step 2: Determine the suitable test statistics.
This involves choosing the appropriate random variable to use in deciding to accept
or reject the null hypothesis.

Unknown Parameter                              Appropriate Test Statistic
µ (σ known, population normal)                 Z = (X̄ − µ0)/(σ/√n)
µ (σ unknown, n large, usually n ≥ 30)         Z = (X̄ − µ0)/(s/√n)
µ (σ unknown, n small, population normal)      t = (X̄ − µ0)/(s/√n), with (n − 1) df
p (n large)                                    Z = (x/n − p0)/√(p0(1 − p0)/n)

Step 3: Determine the critical region using the cumulative distribution table for the
test statistic. The set of values that lead to the rejection of the null hypothesis is
called the critical region. A statistical test may be a one-tail or two-tail test. Whether
one uses a one- or two- tail test of significance depends upon how the alternative
hypothesis is formulated.

Type of Hypothesis    H0        H1        Decision Rule: H0 Rejected if
Two-tail              µ = µ0    µ ≠ µ0    Z > Zα/2 or Z < −Zα/2
Right-tail            µ ≤ µ0    µ > µ0    Z > Zα
Left-tail             µ ≥ µ0    µ < µ0    Z < −Zα

Step 4: Compute the values of the test statistic based on the sample information,

e.g. Zc , tc , χ²c
Step 5: Make a statistical decision and interpretation. H0 is rejected if the computed
value of the test statistic falls in the critical region otherwise it is accepted.
Possible situations in testing a statistical hypothesis:

                          H0 is correct          H0 is incorrect
Hypothesis is Accepted    Correct decision       Type II error (β)
Hypothesis is Rejected    Type I error (α)       Correct decision

Type I error: We reject a hypothesis when it should be accepted; P(reject H0 | H0
true).
Type II error: We accept a hypothesis when it should be rejected; P(accept
H0 | H0 false).

7.1.2 A Single Population Mean µ

We shall consider testing of hypothesis about a population mean under three different
conditions:

1 When sampling is from a normally distributed population with known variance.


2 When sampling is from a normally distributed population with unknown variance.
3 When sampling is from a population that is not normally distributed.

Sampling From Normally Distributed Populations: Population Variance Known

Examples

R A researcher is interested in the mean level of some enzyme in a certain popula-


tion. The data available to the researcher are the enzyme determinations made
on a sample of 10 individuals from the population of interest, and the sample
mean is 22. If the sample came from a population that is normally distributed
with a known variance, σ 2 = 45. Can the researcher conclude that the mean

enzyme level in this population is different from 25? Take α = 0.05.

Solutions:
Step 1:

H0 : µ = 25

H1 : µ ̸= 25

Step 2:
X̄ − µ0
Z= √
σ/ n

since the population is normal and σ is known.
Step 3: At α = 0.05, the two-tailed critical region is |Z| > Z0.025 = 1.96. The data are

σ² = 45, n = 10, X̄ = 22

Step 4:

Zc = (X̄ − µ0)/(σ/√n)
   = (22 − 25)/(√45/√10)
   = −3/2.1213
   = −1.41

Step 5:
We are unable to reject the null hypothesis, since −1.41 > −1.96.
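The calculation can be reproduced in R, as a sketch from the summary statistics (qnorm(0.975) ≈ 1.96 is the two-sided 5% critical value):

```r
# Two-sided z test from summary statistics
xbar <- 22; mu0 <- 25; sigma2 <- 45; n <- 10
z <- (xbar - mu0) / sqrt(sigma2 / n)   # about -1.41
p_value <- 2 * pnorm(-abs(z))          # about 0.157
abs(z) > qnorm(0.975)                  # FALSE: do not reject H0
```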

R Aircrew escape systems are powered by a solid propellant. The burning rate of
this propellant is an important product characteristic. Specifications require
that the mean burning rate must be 50 centimeters per second. We know that
the standard deviation of burning rate is σ = 2 centimeters per second. The

experimenter decides to specify a type I error probability or significance level of


α = 0.05 and selects a random sample of n = 25 and obtains a sample average burning
rate of X̄ = 51.3 centimeters per second. What conclusions should be drawn?
Solution:
Step 1:

H0 : µ = 50

H1 : µ ̸= 50

Step 2:

Zc = (X̄ − µ0)/(σ/√n)
   = (51.3 − 50)/(2/√25)
   = 1.3/0.4
   = 3.25

Conclusion: Since Zc = 3.25 > 1.96, we reject H0 : µ = 50 at the 0.05 level of


significance. We conclude that the mean burning rate differs from 50 centimeters
per second, based on a sample of 25 measurements. The following does the test
in R.

install.packages("BSDA")

# Load the BSDA package
library(BSDA)

# Given values
mu0   <- 50     # Hypothesized population mean
sigma <- 2      # Known population standard deviation
x_bar <- 51.3   # Sample mean
n     <- 25     # Sample size
alpha <- 0.05   # Significance level

# BSDA's z.test() expects the raw data vector; with only summary
# statistics available, we compute the z statistic and p-value directly.
z <- (x_bar - mu0) / (sigma / sqrt(n))
p_value <- 2 * pnorm(-abs(z))

z        # 3.25
p_value  # about 0.0012, so reject H0 at alpha = 0.05

7.1.3 Tests on the Mean of a Normal Distribution: Variance Unknown

The test statistic

t = (X̄ − µ0)/(s/√n)

has a t distribution with n − 1 degrees of freedom.

R A study revealed that the upper limit of the Normal Body Temperature of
males is 98.6. The body temperatures for 25 male subjects were taken and
recorded as follows: 97.8, 97.2, 97.4, 97.6, 97.8, 97.9, 98.0, 98.0, 98.0, 98.1, 98.2,
98.3, 98.3, 98.4, 98.4, 98.4, 98.5, 98.6, 98.6, 98.7, 98.8, 98.8, 98.9, 98.9 and 99.0.
Test the hypothesis H0 : µ = 98.6 versus H1 : µ ̸= 98.6, using α = 0.05

R Nine patients suffering from the same physical handicap, but otherwise com-
parable were asked to perform a certain task as part of an experiment. The

average time required to perform the task was seven minutes with a standard
deviation of two minutes. Assuming normality, can we conclude that the true
mean time required to perform the task by this type of patient is at least ten
minutes?

R The increased availability of light materials with high strength has revolutionized
the design and manufacture of golf clubs, particularly drivers. Clubs with hollow
heads and very thin faces can result in much longer tee shots, especially for
players of modest skills. This is due partly to the “spring-like effect” that
the thin face imparts to the ball. Firing a golf ball at the head of the club
and measuring the ratio of the outgoing velocity of the ball to the incoming
velocity can quantify this spring-like effect. The ratio of velocities is called the
coefficient of restitution of the club. An experiment was performed in which
15 drivers produced by a particular club maker were selected at random and
their coefficients of restitution measured. In the experiment, the golf balls were
fired from an air cannon so that the incoming velocity and spin rate of the ball
could be precisely controlled. Determine if there is evidence (with α = 0.05)
to support a claim that the mean coefficient of restitution exceeds 0.82. The
observations are:

0.8411 0.8191 0.8182 0.8125 0.8750


0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660

The sample mean and sample standard deviation are X̄ = 0.83725 and s =
0.02456.

The following does the test in R.

# Coefficient of restitution data
coefficients <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
                  0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
                  0.8042, 0.8730, 0.8282, 0.8359, 0.8660)
mu0 <- 0.82
alpha <- 0.05

# Perform the one-sample t-test
test_result <- t.test(coefficients, mu = mu0, alternative = "greater")

# Output the test result
print(test_result)

# Draw conclusions based on the p-value
if (test_result$p.value < alpha) {
  print("Reject the null hypothesis: there is evidence that the mean coefficient of restitution exceeds 0.82.")
} else {
  print("Do not reject the null hypothesis: there is not enough evidence to conclude that the mean coefficient of restitution exceeds 0.82.")
}

7.1.4 Tests on a Population Proportion

It is often necessary to test hypotheses on a population proportion. For example,


suppose that a random sample of size n has been taken from a large population
and that X (≤ n) observations in this sample belong to a class having a particular
characteristic of interest. Then P̂ = X/n is a point estimator of the proportion of the
population p that belongs to this class. Note that n and p are the parameters of a
binomial distribution. Recall that the sampling distribution of p̂ is approximately
normal with mean p and variance p(1 − p)/n, if p is not too close to either 0 or 1 and
if n is relatively large.
In many engineering problems, we are concerned with a random variable that follows
the binomial distribution. For example, consider a production process that manufac-
tures items that are classified as either acceptable or defective. It is usually reasonable
to model the occurrence of defectives with the binomial distribution, where the bino-
mial parameter p represents the proportion of defective items produced. Consequently,
many engineering decision problems include hypothesis testing about p.

Considering testing

H0 : p = p0

H1 : p ̸= p0

For large samples, the normal approximation to the binomial with the test statistic

Z = (X − np0)/√(np0(1 − p0)) = (X/n − p0)/√(p0(1 − p0)/n) = (P̄ − p0)/√(p0(1 − p0)/n)

may be used.
This presents the test statistic in terms of the sample proportion P̄ instead of the
number of items X in the sample that belong to the class of interest.

R In a study designed to assess the relationship between a certain drug and a


certain anomaly in chick embryos, 50 fertilized eggs were injected with the
drug on the fourth day of incubation. On the twentieth day of incubation the
embryos were examined and in 12 the presence of the abnormality was observed.
Test the null hypothesis that the drug causes abnormalities in not more than
20 percent of eggs into which it is introduced. Let α = 0.05.

The following does the test in R.


x <- 12        # embryos showing the abnormality
n <- 50        # fertilized eggs injected
p0 <- 0.20
alpha <- 0.05

# Conduct the binomial test
test_result <- binom.test(x, n, p = p0, alternative = "greater")

# Output the test result
print(test_result)

# Draw conclusions based on the p-value
if (test_result$p.value < alpha) {
  print("Reject the null hypothesis: there is evidence that the drug causes abnormalities in more than 20 percent of eggs.")
} else {
  print("Do not reject the null hypothesis: there is not enough evidence to conclude that the drug causes abnormalities in more than 20 percent of eggs.")
}

R A manufacturer of intraocular lenses is qualifying a new grinding machine


and will qualify the machine if the percentage of polished lenses that contain
surface defects does not exceed 2%. A random sample of 250 lenses contains
six defective lenses. Formulate and test an appropriate set of hypotheses to
determine if the machine can be qualified. Use α = 0.05.

R A semiconductor manufacturer produces controllers used in automobile engine


applications. The customer requires that the process fallout or fraction defective
at a critical manufacturing step not exceed 0.05 and that the manufacturer
demonstrate process capability at this level of quality using α = 0.05. The
semiconductor manufacturer takes a random sample of 200 devices and finds
that four of them are defective. Test the null hypothesis that the process fallout
does not exceed 0.05.

7.1.5 The Difference Between Two Population Means

Hypothesis testing involving the difference between two population means is most
frequently employed to determine whether or not it is reasonable to conclude that the
two are unequal. In such cases, one or other of the following hypotheses is tested:

H0 : µ1 − µ2 = 0    H1 : µ1 − µ2 ̸= 0
H0 : µ1 − µ2 ≥ 0    H1 : µ1 − µ2 < 0
H0 : µ1 − µ2 ≤ 0    H1 : µ1 − µ2 > 0

Hypothesis Tests for a Difference in Means, Variances Known


The test statistic is

Z = (X̄1 − X̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)

R In a large hospital for the treatment of the mentally retarded, a sample
of 12 individuals with mongolism yielded a mean serum uric acid value of
X̄1 = 4.5 mg/100ml. In a general hospital, a sample of 15 normal individuals of
the same age and sex were found to have a mean value of X̄2 = 3.4 mg/100ml.
If it is reasonable to assume that the two populations of values are normally
distributed with variances equal to 1, do these data provide sufficient evidence
to indicate a difference in mean serum uric acid levels between normal individuals
and individuals with mongolism? Let α = 0.05.

Solution

n1 = 12, n2 = 15

σ1² = 1, σ2² = 1

x̄1 = 4.5, x̄2 = 3.4

H0 : µ1 − µ2 = 0

H1 : µ1 − µ2 ̸= 0

Zc = (X̄1 − X̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)
   = ((4.5 − 3.4) − 0) / √(1/12 + 1/15)
   = 1.1/√0.15 = 1.1/0.3873 = 2.84

Reject H0 since 2.84 > 1.96. On the basis of these data, there is an indication
that the means are not equal.
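The same calculation in R, as a sketch from the summary statistics:

```r
# Two-sample z test, variances known
xbar1 <- 4.5; n1 <- 12; var1 <- 1
xbar2 <- 3.4; n2 <- 15; var2 <- 1
z <- (xbar1 - xbar2) / sqrt(var1 / n1 + var2 / n2)   # about 2.84
abs(z) > qnorm(0.975)                                # TRUE: reject H0
```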

R A product developer is interested in reducing the drying time of a primer


paint. Two formulations of the paint are tested; formulation 1 is the standard
chemistry, and formulation 2 has a new drying ingredient that should reduce
the drying time. From experience, it is known that the standard deviation of
drying time is 8 minutes, and this inherent variability should be unaffected by
the addition of the new ingredient. Ten specimens are painted with formulation
1, and another 10 specimens are painted with formulation 2; the 20 specimens
are painted in random order. The two sample average drying times are x̄1 = 121
minutes and x̄2 = 112 minutes, respectively. What conclusions can the product
developer draw about the effectiveness of the new ingredient, using α = 0.05?

Solution:

n1 = 10, n2 = 10

σ1² = 64, σ2² = 64

x̄1 = 121, x̄2 = 112

H0 : µ1 − µ2 ≤ 0

H1 : µ1 − µ2 > 0

Zc = (X̄1 − X̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)
   = ((121 − 112) − 0) / √(64/10 + 64/10)
   = 9/√12.8 = 9/3.5777 = 2.52

Conclusion: Reject H0 , since 2.52 > Z0.05 = 1.645.

R Exercises

i Two machines are used for filling plastic bottles with a net volume of
16.0 ounces. The fill volume can be assumed normal, with standard
deviation σ1 = 0.020 and σ2 = 0.025 ounces. A member of the quality
engineering staff suspects that both machines fill to the same mean net
volume, whether or not this volume is 16.0 ounces. A random sample of
10 bottles is taken from the output of each machine.

Machine 1 Machine 2
16.03 16.01 16.02 16.03
16.04 15.96 15.97 16.04
16.05 15.98 15.96 16.03
16.05 16.02 16.01 16.01
16.02 15.99 15.99 16.00

Do you think the engineer is correct? Use α = 0.05.


ii Two different formulations of an oxygenated motor fuel are being tested
to study their road octane numbers. The variance of road octane number
for formulation 1 is σ1² = 1.5, and for formulation 2 it is σ2² = 1.2. Two random
samples of size n1 = 15 and n2 = 20 are tested, and the mean road octane
numbers observed are x̄1 = 89.6 and x̄2 = 92.5. Assume normality, and if
formulation 2 produces a higher road octane number than formulation
1, the manufacturer would like to detect it. Formulate and test an
appropriate hypothesis, using α = 0.05.

Hypothesis Tests for a Difference in Means, Variances Unknown but Assumed Equal
We now extend the results of the previous lecture to the difference in means of the
two distributions when the variances of both distributions, σ1² and σ2², are unknown. If
the sample sizes n1 and n2 exceed 30, the normal distribution procedures could be
used. However, when small samples are taken, we will assume that the populations
are normally distributed and base our hypothesis tests on the t distribution. This
nicely parallels the case of inference on the mean of a single sample with unknown
variance.

The normality assumption is required to develop the test procedure, but moderate
departures from normality do not adversely affect the procedure. Two different
situations must be treated. In the first case, we assume that the variances of the two
normal distributions are unknown but equal; that is, σ1² = σ2² = σ². In the second, we
assume that σ1² and σ2² are unknown and not necessarily equal. The test statistic is

tc = (X̄1 − X̄2 − (µ1 − µ2)) / (Sp √(1/n1 + 1/n2))    with n1 + n2 − 2 df

The two sample variances are combined to form an estimator of σ 2 . The pooled
estimator of σ 2 is defined as follows.

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)
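These formulas translate directly into R. The summary statistics below are made-up values used purely for illustration, not taken from an example in the text.

```r
# Pooled two-sample t statistic from summary statistics (illustrative values)
n1 <- 12; n2 <- 14
xbar1 <- 50.2; xbar2 <- 48.9
s1_sq <- 2.1; s2_sq <- 2.4

# Pooled estimator of the common variance
sp_sq <- ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t statistic with n1 + n2 - 2 degrees of freedom
tc <- (xbar1 - xbar2) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Two-sided p-value
p_value <- 2 * (1 - pt(abs(tc), df))
```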

Examples

R The diameter of steel rods manufactured on two different extrusion machines is


being investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected,
and the sample means and sample variances are x̄1 = 8.73, s1² = 0.35, x̄2 = 8.68,
and s2² = 0.40, respectively. Assume that σ1² = σ2² and that the data are drawn
from a normal distribution. Is there evidence to support the claim that the two
machines produce rods with different mean diameters? Use α = 0.05 in arriving
at this conclusion.

R Two catalysts are being analyzed to determine how they affect the mean yield
of a chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2
is acceptable. Since catalyst 2 is cheaper, it should be adopted, provided it
does not change the process yield. A test is run in the pilot plant and results
in the data shown in the following table. Is there any difference between the
mean yields? Use α = 0.05, and assume equal variances.

Observation Number Catalyst 1 Catalyst 2


1 91.50 89.19
2 94.18 90.95
3 92.18 90.46
4 95.39 93.21
5 91.79 97.19
6 89.07 97.04
7 94.72 91.07
8 89.21 92.75
x̄1 = 92.255   x̄2 = 92.733
s1 = 2.39     s2 = 2.98
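When the raw observations are available, as here, base R's t.test() with var.equal = TRUE performs this pooled test in one call; a sketch for the catalyst data:

```r
# Pooled t test for the catalyst yield data
catalyst1 <- c(91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21)
catalyst2 <- c(89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75)

# var.equal = TRUE requests the equal-variance (pooled) form of the test
result <- t.test(catalyst1, catalyst2, var.equal = TRUE)

result$statistic  # pooled t statistic, n1 + n2 - 2 = 14 df
result$p.value    # compare with alpha = 0.05
```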

R Serum amylase determinations were made on a sample of 15 apparently nor-


mal subjects. The sample yielded a mean of 96 units/100ml and a standard
deviation 35 units/100ml. Serum amylase determinations were also made on 22
hospitalized subjects. The mean and standard deviation from this second group
are 120 and 40 units/100ml, respectively. Would we be justified in concluding
that the two population means are different? Let α = 0.05.

Hypothesis Tests for a Difference Between Two Population Proportions


Suppose that two independent random samples of sizes n1 and n2 are taken from two

populations, and let X1 and X2 represent the number of observations that belong to
the class of interest in samples 1 and 2, respectively. Furthermore, suppose that the
normal approximation to the binomial is applied to each population, so the estimators
of the population proportions P̂1 = X1 /n1 and P̂2 = X2 /n2 have approximate normal
distributions.

The test statistic

Z = (P̂1 − P̂2 − (p1 − p2)) / √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )

is distributed approximately as standard normal and is the basis of a test for H0 :


p1 = p2 . If the null hypothesis is true, using the fact that p1 = p2 = p, the random
variable
Z = (P̂1 − P̂2) / √( p(1 − p)(1/n1 + 1/n2) )

is distributed approximately N (0, 1). An estimator of the common parameter p is

P̂ = (X1 + X2) / (n1 + n2)

The test statistic for H0 : p1 = p2 is then

Z = (P̂1 − P̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )
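As a sketch, this statistic can be computed directly from raw counts; the values of x1, n1, x2, and n2 below are hypothetical.

```r
# Two-proportion z test from counts (hypothetical values)
x1 <- 120; n1 <- 200
x2 <- 95;  n2 <- 180

p1_hat <- x1 / n1
p2_hat <- x2 / n2
p_hat  <- (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2

z <- (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value <- 2 * (1 - pnorm(abs(z)))   # two-sided p-value
```

Base R's prop.test(c(x1, x2), c(n1, n2), correct = FALSE) reports the same test as a chi-square statistic equal to z².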

7.2 Questions

R A random sample of 500 adult residents of Maricopa County found that 385
were in favor of increasing the highway speed limit to 75 mph, while another
sample of 400 adult residents of Pima County found that 267 were in favor
of the increased speed limit. Do these data indicate that there is a difference
in the support for increasing the speed limit between the residents of the two

counties? Use α = 0.05.

R Out of a sample of 150, selected from patients admitted over a two-year period
to a large hospital, 129 had some type of hospitalization insurance. In a sample
of 160 similarly selected patients from a second hospital, 144 had some type of
hospitalization insurance. Test the null hypothesis that p1 = p2 . Let α = 0.05.
8. Regression

Learning Objectives
Having worked through this chapter the student will be able to:

• Use simple linear regression for building empirical models of engineering and
scientific data.
• Understand how the method of least squares is used to estimate the parameters
in a linear regression model.
• Test statistical hypotheses and construct confidence intervals on regression model
parameters.
• Use the regression model to make a prediction of a future observation and
construct an appropriate prediction interval on the future observation.
• Apply the correlation model.

8.1 Regression and Correlation Analysis

8.1.1 Introduction

Many problems in engineering and science involve exploring the relationships between
two or more variables. Regression analysis is a statistical technique that is very useful
for these types of problems. For example, in a chemical process, suppose that the yield
of the product is related to the process-operating temperature. Regression analysis
can be used to build a model to predict yield at a given temperature level. This model
can also be used for process optimization, such as finding the level of temperature
that maximizes yield, or for process control purposes. Other examples are, studying
the relationship between blood pressure and age, the concentration of an injected
drug and heart rate etc.
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the independent or explanatory
variables with a view to estimating and predicting the (population) mean or average of
the former (dependent) in terms of the known or fixed (in repeated sampling) values
of the latter (independent).
Very often in practice, a relationship is found to exist between two (or more) variables
and one wishes to express this relationship in mathematical form by determining
an equation connecting the variables. Correlation analysis, on the other hand, is
concerned with measuring the strength of the relationship between variables. When
we compute measures of correlation from a set of data, we are interested in the degree
of the correlation between variables.

8.1.2 The Regression Model

In the typical regression problem, the researcher has available for analysis a sample of
observations from some real or hypothetical population. Based on the result of his

analysis of the sample data, he is interested in reaching decisions about the population
from which the sample is presumed to have been drawn. It is important that the
researcher understand the nature of the population in which he is interested.
In the simple linear regression model two variables X and Y, are of interest. The
variable X is usually referred to as the independent variable, while the other variable,
Y is called the dependent variable; and we speak of the regression of Y on X. The
following are the assumptions underlying the simple linear regression model.

i Values of the independent variable X are fixed.


ii The variable X is measured without error.
iii Values Y are normally distributed.
iv The Y values are statistically independent.
v The variances of the sub-populations of Y are all equal.
V(Y|x) = V(α + βx + e) = V(α + βx) + V(e) = 0 + σ² = σ²
vi The means of the sub-populations of Y all lie on the same straight line, this is
the assumption of linearity.

 
E(Y|x) = µY|x = α + βx

These assumptions may be summarized by means of the following equation which is


called the simple linear regression model because it has only one independent variable
or regressor:
y = α + βx + e

where α and β (slope and intercept) are called population regression coefficients,
e is called the error term with mean zero and variance σ 2 . The random errors
corresponding to different observations are also assumed to be uncorrelated random
variables.

The results of n observations of the set of random variables X and Y can be summarized
by drawing a scatter diagram. A straight line passing closely to the points may be
drawn. The main problem arises when the points do not all lie exactly on the straight
line, but simply form a cloud of points around it. Thus, it may be possible by guess
work to draw quite a number of lines each of which will appear to be able to explain
the relationship between X and Y. We shall consider finding a best fit line. Such a
line will then be used as a model relating the random variable Y with the random
variable X.

     
Suppose that we have n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn). The
following figure shows a typical scatter plot of observed data and a candidate for the
estimated regression line.
The estimates of α and β should result in a line that is (in some sense) a "best fit" to
the data. The German scientist Karl Gauss proposed estimating the parameters α and
β so as to minimize the sum of the squares of the vertical deviations in the diagram.
We call this criterion for estimating the regression coefficients the method of least
squares. Using the model, we may express the n observations in the sample as

yi = α + βxi + ei ,    i = 1, 2, ..., n

and the sum of the squares of the deviations of the observations from the true
regression line is

L = Σ ei² = Σ (yi − α − βxi)²    (sums over i = 1, 2, ..., n)

The least squares estimators of α and β, say α̂ and β̂, must satisfy

∂L/∂α = −2 Σ (yi − α̂ − β̂xi) = 0

∂L/∂β = −2 Σ (yi − α̂ − β̂xi) xi = 0

Simplifying these two equations yields

nα̂ + β̂ Σ xi = Σ yi

α̂ Σ xi + β̂ Σ xi² = Σ xi yi

These are called the least squares normal equations. Their solution gives the least
squares estimators α̂ and β̂. The least squares estimates of the intercept and slope
in the simple linear regression model are

α̂ = ȳ − β̂ x̄

β̂ = [ Σ xi yi − (Σ xi)(Σ yi)/n ] / [ Σ xi² − (Σ xi)²/n ]

  = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ n Σ xi² − (Σ xi)² ]

The slope estimate can also be written as

β̂ = ( Σ xi yi − n x̄ȳ ) / ( Σ xi² − n x̄² )

The estimated regression line is therefore ŷ = α̂ + β̂x.



8.2 Method of Least Squares

We shall now find a and b, the estimates of α and β, so that the sum of the squares of
the residuals is a minimum. The residual sum of squares is often called the Sum of
Squares of Errors (SSE) about the regression line. This minimization procedure for
estimating the parameters is called the "method of least squares". Hence we shall find
a and b so as to minimize

SSE = Σ ei² = Σ (yi − ŷi)² = Σ (yi − a − bxi)²

Differentiating SSE with respect to a and b, we have

∂(SSE)/∂a = −2 Σ (yi − a − bxi)        ∂(SSE)/∂b = −2 Σ (yi − a − bxi) xi

Setting the partial derivatives equal to zero and rearranging the terms, we obtain
the equations (called the normal equations)

na + b Σ xi = Σ yi .......(1)

a Σ xi + b Σ xi² = Σ xi yi ......(2)

Solving for a and b from (1) and (2):

b = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ n Σ xi² − (Σ xi)² ] = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = SSxy / SSxx

a = ( Σ yi − b Σ xi ) / n = ȳ − bx̄
n

Equations (1) and (2) can also be solved using matrices as:

[ n    Σx  ] [ a ]   [ Σy  ]
[ Σx   Σx² ] [ b ] = [ Σxy ]

so that

[ a ]   [ n    Σx  ]⁻¹ [ Σy  ]
[ b ] = [ Σx   Σx² ]   [ Σxy ]

Examples

R Assuming we have the following quantities: n = 8, Σx = 140, Σy = 382,
Σxy = 3870, and Σx² = 3500, find the least squares estimates a and b.

Solution:
Using the above equation:

[ 8     140  ] [ a ]   [ 382  ]
[ 140   3500 ] [ b ] = [ 3870 ]

Solving gives a = 94.67 and b = −2.68.

Thus, the estimated regression line is given by:

ŷ = 94.67 − 2.68x

# Given quantities
n <- 8
sum_x <- 140
sum_y <- 382
sum_xy <- 3870
sum_x2 <- 3500

# Calculate means of x and y
mean_x <- sum_x / n
mean_y <- sum_y / n

# Calculate slope (b)
b <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)

# Calculate intercept (a)
a <- mean_y - b * mean_x

# Print the regression line equation
print(paste("The regression line is y =", a, "+", b, "* x"))
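Because the normal equations form a 2 × 2 linear system, they can also be solved directly with R's solve(), mirroring the matrix form above:

```r
# Solve the normal equations as a matrix system
A <- matrix(c(8,   140,
              140, 3500), nrow = 2, byrow = TRUE)
rhs <- c(382, 3870)

coefs <- solve(A, rhs)   # first element is a, second is b
coefs                    # approximately 94.67 and -2.68
```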

It may be noted that the least-squares line passes through the point (x̄, ȳ), called
the 'centroid' or centre of gravity of the data. The slope b of the regression line is
independent of the origin of coordinates; it is therefore said that b is invariant under
translation of axes. Besides assuming that the regression of y on x is a linear
function having the form E(Y|X) = α + βx, we have made three further assumptions,
which may be summarised as follows:

1 Normality: We have assumed that each variable yi has a normal distribution.

2 Independence: We have assumed that the variables y1, ..., yn are independent.
3 Homoscedasticity: We have assumed that the variables y1, ..., yn have the same
variance σ². This assumption is called the assumption of homoscedasticity.
In general, it is said that random variables having the same variance are ho-
moscedastic, and random variables having different variances are heteroscedastic.

How it is computed in R

# Given quantities
n <- 8
sum_x <- 140
sum_y <- 382
sum_xy <- 3870
sum_x2 <- 3500

# Calculate slope (b)
b <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)

# Calculate intercept (a)
a <- (sum_y - b * sum_x) / n

# Print the regression line equation
print(paste("The regression line is y =", a, "+", b, "* x"))

R A team of professional mental health workers in a long-stay psychiatric hospital


wished to measure the level of response of withdrawn patients to a program of
remotivation therapy. A standard test was available for this purpose, but it
was expensive and time-consuming to administer. To overcome this obstacle,
the team developed a test that was much easier to administer. To test the
usefulness of the new instrument for measuring the level of patient response,
the team decided to examine the relationship between scores made on the new
test and scores made on the standardized test. The objective was to use the
new test if it could be shown that it was a good predictor of a patient’s score
on the standardized test. The results are shown in the table below: Obtain the
estimates of the regression coefficients.
Patients’ Scores on Standardized Test and New Test
Patient Number Score on New Test (X) Score on Standardized Test (Y)
1 50 61
2 55 61
3 60 59
4 65 71
5 70 80
6 75 76
7 80 90
8 85 106
9 90 98
10 95 100
11 100 114

R A research study on "Near Surface Characteristics of Concrete: Intrinsic
Permeability" presented data on compressive strength x and intrinsic permeability y
of various concrete mixes and cures. Summary quantities are n = 14, Σxi = 43,
Σyi = 572, Σxi² = 157.42, Σyi² = 23,530, and Σxi yi = 1697.80. Assume that
the two variables are related according to the simple linear regression model.

i. Calculate the least squares estimates of the slope and intercept.


ii. Use the equation of the fitted line to predict what permeability would be
observed when the compressive strength is x = 4.3.
iii. Give a point estimate of the mean permeability when compressive strength
is x = 3.7

The following data were obtained from a study investigating the relationship
between noise exposure and hypertension.

Y: 1  0  1  2  5  1  4  6  2  3  5  4  6  8  4  5  7  9  7  6
X: 60 63 65 70 70 70 80 90 80 80 85 89 90 90 90 90 94 100 100 100

i Fit the simple linear regression model using least squares.


ii Find the predicted mean rise in blood pressure level associated with a
sound pressure level of 85 decibels.

8.3 Correlation Analysis

Closely related but conceptually very much different from regression analysis is
correlation analysis, where the primary objective is to measure the strength or degree
of linear association between two variables. The correlation coefficient measures this
strength of (linear) association. For example, we may be interested in finding the
correlation between smoking and lung cancer; between scores on mathematics and
fluid mechanics examinations, between high school grades and college grades etc.
In regression analysis, as already noted, we are not primarily interested in such a
measure. Instead, we try to estimate the average value of one variable on the basis of
the fixed values of another variable.
The population correlation coefficient between two random variables, X and Y is
defined as

ρ = E{[X − E(X)][Y − E(Y)]} / √(var(X) var(Y)) = cov(X, Y) / √(var(X) var(Y)) = σXY / (σX σY)

where σXY is the covariance between the variables X and Y, and σX and σY are the
standard deviations of X and Y respectively. It is possible to draw inferences about
the correlation coefficient ρ using its estimator, the sample correlation coefficient r.
For n pairs of observations (xi, yi), r is given by

r = ( Σ xi yi − n x̄ȳ ) / √[ ( Σ xi² − n x̄² )( Σ yi² − n ȳ² ) ]

  = [ n Σ xi yi − (Σ xi)(Σ yi) ] / √[ ( n Σ xi² − (Σ xi)² )( n Σ yi² − (Σ yi)² ) ]

The sample correlation coefficient r has the following properties:

1 It is symmetrical in nature (the two variables are treated symmetrically); that
is, there is no distinction between the dependent and independent variables.
2 Both variables are assumed to be random.

3 It can be positive or negative, the sign depending on the sign of the term in the
numerator, which measures the sample covariation of the two variables.
4 It lies between the limits of -1 and +1; that is, −1 ≤ r ≤ +1.
5 If X and Y are independent, the correlation coefficient between them is zero but
if r=0 it does not mean that the two variables are independent.

Figure 8.1: Scatter plots with various r values

How to compute correlation in R

# Example data vectors
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Compute Pearson correlation coefficient
correlation_pearson <- cor(x, y, method = "pearson")

# Print the result
print(correlation_pearson)

Testing hypotheses about the correlation coefficient

A test of the special hypothesis ρ = 0 versus an appropriate alternative is equivalent to
testing β = 0 for the simple linear regression model. In doing this the t distribution
with n − 2 degrees of freedom is needed, with test statistic

t = r√(n − 2) / √(1 − r²)

to test:
H0 : ρ = 0
H1 : ρ ≠ 0
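In R this t test is performed by cor.test(), which reports the same statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The data vectors below are illustrative.

```r
# t test of H0: rho = 0 via cor.test (illustrative data)
x <- c(2.1, 3.4, 4.0, 5.2, 6.1, 7.3, 8.0)
y <- c(1.8, 2.9, 4.4, 4.9, 6.4, 6.9, 8.2)

result <- cor.test(x, y)   # Pearson correlation by default

result$estimate    # sample correlation coefficient r
result$statistic   # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 df
result$p.value
```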

8.4 Questions

R Using the following data, test the hypothesis that there is no linear correlation
among the variables that generated them, at the 5% level of significance: SSxx =
0.11273, SSyy = 11,807,324,786, SSxy = 34,422.75972.
Solution:

r = SSxy / √(SSxx · SSyy) = 34422.75972 / √((0.11273)(11807324786)) = 0.9435

H0 : ρ = 0

H1 : ρ ̸= 0

α = 0.05,  df = n − 2 = 29 − 2 = 27

Critical region: t < −2.052 and t > 2.052

t = 0.9435 √(29 − 2) / √(1 − 0.9435²) = 14.79

P < 0.0001

Decision: Since t > t0.025(27), reject the hypothesis of no linear correlation.
More generally, if X and Y follow the bivariate normal distribution, it can be
shown that the quantity (1/2) ln[(1 + r)/(1 − r)] is a random variable that follows
approximately the normal distribution with mean (1/2) ln[(1 + ρ)/(1 − ρ)] and
variance 1/(n − 3). The procedure is therefore to compute

z = (√(n − 3)/2) [ ln((1 + r)/(1 − r)) − ln((1 + ρ0)/(1 − ρ0)) ] = (√(n − 3)/2) ln[ (1 + r)(1 − ρ0) / ((1 − r)(1 + ρ0)) ]

R Consider the immediately preceding example data; test the null hypothesis that
ρ = 0.9 against the alternative that ρ > 0.9 at the 5% level of significance.
Solution:
H0 : ρ = 0.9
H1 : ρ > 0.9
Critical region: Z > 1.645

z = (√(29 − 3)/2) ln[ (1 + 0.9435)(1 − 0.9) / ((1 − 0.9435)(1 + 0.9)) ] ≈ 1.51

Decision: Since Z < Z0.05, we fail to reject H0; there is no evidence that the
correlation coefficient exceeds 0.9.

In ordinary usage of this method it is not necessary to evaluate the formula for z
directly: tables of Fisher-Z values for r values between 0.0 and 0.99 are available.
In this case, to test H0 : ρ = ρ0 versus H1 : ρ ≠ ρ0, we compute

Z = √(n − 3) (Zf − ζf)

with critical region Z ≤ −Zα/2 and Z ≥ Zα/2, where Zf and ζf are the Fisher-Z
values for r and ρ0 respectively.
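In R the Fisher transformation (1/2) ln[(1 + r)/(1 − r)] is simply atanh(r), so the test of H0 : ρ = ρ0 can be sketched as follows, using the values from the preceding example (r = 0.9435, ρ0 = 0.9, n = 29):

```r
# Fisher z test of H0: rho = rho0 against H1: rho > rho0
r    <- 0.9435
rho0 <- 0.9
n    <- 29

# atanh(r) = (1/2) * log((1 + r) / (1 - r))
z <- sqrt(n - 3) * (atanh(r) - atanh(rho0))

# Upper-tailed p-value
p_value <- 1 - pnorm(z)

z        # about 1.51, below the critical value 1.645
p_value  # above 0.05, so fail to reject H0
```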

R The following data give X = the water content of snow on April 1
and Y = the yield from April to July (in inches) on the Snake
River watershed in Wyoming for 17 years.

x     y      x     y
23.1  10.5   37.9  22.8
32.8  16.7   30.5  14.1
31.8  18.2   25.1  12.9
32.0  17.0   12.4   8.8
30.4  16.3   35.1  17.4
24.0  10.5   31.5  14.9
39.5  23.1   21.1  10.5
24.2  12.4   27.6  16.1
52.5  24.9

Estimate the correlation between X and Y.

R Two methods of measuring cardiac output were compared in 10


experimental animals with the following results

Cardiac Output (l/min)

Method I (X)   Method II (Y)
0.8 0.5
1.0 1.2
1.3 1.1
1.4 1.3
1.5 1.1
1.4 1.8
2.0 1.6
2.4 2.0
2.7 2.4
3.0 2.8

Compute the sample correlation coefficient.



R A group of eight athletes ran a 400 metres race twice. The times
in seconds were recorded as follows for each athlete.

Runner
1st Trial x 2nd Trial Y
48.4 48.0
51.2 54.3
48.6 49.4
49.5 48.4
51.6 54.0
49.3 47.2
50.8 51.8
49.7 50.3

Calculate the correlation coefficient between these two trials.


Solution in R

# Load necessary library
library(ggplot2)

# Creating a data frame
data <- data.frame(
  first_trial = c(48.4, 51.2, 48.6, 49.5, 51.6, 49.3, 50.8, 49.7),
  second_trial = c(48.0, 54.3, 49.4, 48.4, 54.0, 47.2, 51.8, 50.3)
)

# Calculating the correlation coefficient
correlation <- cor(data$first_trial, data$second_trial)

# Printing the correlation coefficient
print(paste("Correlation coefficient:", correlation))

# Plotting
ggplot(data, aes(x = first_trial, y = second_trial)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  ggtitle("Scatter plot of 1st Trial vs 2nd Trial Times") +
  xlab("1st Trial Time (seconds)") +
  ylab("2nd Trial Time (seconds)")

Figure 8.2: Scatter plot
