Quantitative Skills For Animal Sciences-Day 2

Quantitative Skills for Animal Sciences

YAS33803

Henk Bovenhuis, ABG


Exploratory analyses
a.1) Initial examination of the data - Why??

1. Was the data imported in the correct way?

2. Assess the structure of the data


What is the sample size?
How many variables are there?
What type of variables (continuous, categorical) are there?

3. Data quality
Are there missing observations?
How were missing values treated?
Are there any “outliers” – strange data points?
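The three checks above can be sketched outside Excel as well. The following is a minimal Python sketch, not the course's method: the rows in `RAW` are made up for illustration, and only the column names (ID, GROUP, LONG, THOR, SLEEP) come from the variable list used in these slides.

```python
import csv
import io

# Hypothetical rows for illustration only; the column names follow the
# variable list used in these slides (ID, GROUP, LONG, THOR, SLEEP).
RAW = """ID,GROUP,LONG,THOR,SLEEP
1,0-0,40,0.64,22
2,0-0,37,0.70,
3,1-0,46,0.64,15
"""

def is_number(text):
    """Return True if text can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

rows = list(csv.DictReader(io.StringIO(RAW)))
n_rows = len(rows)
columns = list(rows[0].keys())

# Count empty cells per column (missing observations).
missing = {c: sum(1 for r in rows if r[c].strip() == "") for c in columns}

# A column is labelled continuous if all non-missing cells are numeric.
kinds = {}
for c in columns:
    values = [r[c] for r in rows if r[c].strip() != ""]
    kinds[c] = "continuous" if all(is_number(v) for v in values) else "categorical"

print("sample size:", n_rows)
print("variables:", len(columns))
print("missing per column:", missing)
print("variable types:", kinds)
```

The numeric test is only a rough heuristic: a serial number such as ID is numeric but is not a meaningful continuous variable, so the final classification remains a judgment by the analyst.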

81
Exercise

▪ Use the Excel “Analysis ToolPak”

▪ Create a “Histogram” to explore the variable “longevity”

▪ What do you conclude?
• Any “strange” data points?
• Distribution of the data?
82
Exploratory analyses - Exercise
Input range: select cells that contain the variable (e.g. longevity)

Bin range: option to specify the upper boundaries of the “bins”, e.g. 15, 20, 25, 30, etc.
Bin = “bucket” = interval

Output

83
Exploratory analyses - longevity

Counts the number of observations in each interval:
• 15: ≤15
• 20: >15 up to 20
• 25: >20 up to 25
• …..
84
Exploratory analyses - longevity

Note the effect of “Bin” size: Bin size 5 versus 10


85
Exploratory analyses - longevity

To check for normality it is better to look at a Q-Q plot (option in R)
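For readers who want to see what a Q-Q plot actually plots, here is a minimal Python sketch (an alternative to the R option mentioned on the slide, not a replacement for it). It pairs each sorted observation with the quantile expected under a normal distribution with the same mean and SD. The plotting positions (i - 0.5)/n are one common convention among several, and the data values are for illustration only.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n are one common choice among several.
    return [(fitted.inv_cdf((i - 0.5) / n), obs)
            for i, obs in enumerate(ordered, start=1)]

longevity = [40, 37, 44, 47, 47, 47, 68, 47, 54, 61]  # illustrative values
for theo, obs in qq_points(longevity):
    print(f"{theo:6.1f}  {obs:3d}")
```

If the data are approximately normal, the pairs lie close to the line observed = theoretical; systematic curvature points to skewness or heavy tails.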

86
Exploratory analyses - histogram
(Strong) deviations from normality

Note: Y-variable does not need to be normally distributed!!


87
Exploratory analyses - histogram

Use histogram (or Boxplot) to identify “strange data points” (potential outliers).

Note: outliers should be identified based on the residuals from your model (see later)

88
Exploratory analyses - outliers

Outlier:
a value that is far from the others: it is an unusually
large or an unusually small value compared to the others
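As a first screening, the boxplot whisker rule can be written out in Python. This is a minimal sketch under assumptions: the 1.5 × IQR cut-off is a widely used convention rather than something prescribed in these slides, and the body weights are hypothetical. It only flags candidates in the raw data; it does not replace a check of the model residuals.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot whisker rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical body weights (g); the 0 mimics a data-entry error.
weights = [812, 790, 845, 0, 801, 830, 795, 820]
print(flag_outliers(weights))
```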

89
Exploratory analyses - outliers

What to do – delete or keep in the analysis?

1. Is there an explanation, and can we fix it? Was the value entered into the
computer correctly? If there was an error in data entry, fix it.
2. Is there a justification to exclude the value from the analysis? Were there
any experimental problems with that value?
3. Is the outlier caused by “normal” variation? The
observation/individual may be different from the others. This
may be the most exciting finding in your data!

90
Exploratory analyses - outliers

91
Exploratory analyses - histogram

2′-fucosyllactose in goat milk, d 31 in lactation.

Is this an outlier or a goat with an interesting genotype??

92
Digression - histogram

Hatching weight of chickens in different generations

MSc thesis Lotte van Kempen
93


Digression - histogram
Generation

Rounding BW to nearest 10 g

Rounding BW to nearest 5 g

Rounding BW to zero decimal places

Rounding BW to one decimal place

Missing BW

Large differences in accuracy of Body weight (BW) measurements

MSc thesis Lotte van Kempen
94


Exploratory analyses

So far, we only “explored” the response variable (longevity).

What about the explanatory variables (X), the co-variables in the model??
Which could be co-variables in our model??
• ID: Serial No. (1-25) within each group of 25 (the order in which data points were abstracted)
• GROUP: code 0.0, 1.0, 1.1, 8.0 or 8.1
• LONG: Longevity or life span, in days
• THOR: Length of thorax, in mm
• SLEEP: Percentage of each day spent sleeping

Model: $y_{ij} = \mu + \text{Class}_i + \dots + \beta X_{ij} + e_{ij}$


95
Exploratory analyses

So far, we only “explored” the response variable (longevity).

What about the explanatory variables (X), the co-variables in the model??

Model
$y_{ij} = \mu + \text{Class}_i + \dots + \beta X_{ij} + e_{ij}$

$\text{Long}_{..} = \dots + \text{Group}_{..} + \text{Thor}_{..} + \text{Sleep}_{..} + \dots$

96
Exercise

▪ Create a “Histogram” to explore possible co-variables “Thor” and “Sleep”

▪ What do you conclude?
• Any “strange” data points?
• Distribution of the data?

97
Exploratory analyses

Quantitative explanatory variables – regressors / co-variable


How are they distributed?

98
Exploratory analyses

Quantitative explanatory variables – regressors / co-variable


Do not need to be normally distributed but a clear “bimodal
pattern” or “strange values” might affect results.

99
Example - Egg shell strength in Poultry

The data set contains information on:

animal: identification of the bird
sire: identification of the father
dam: identification of the mother
10-wk-weight: weight of the bird at 10 weeks of age
cage: number of the cage in which hens were housed
hatch_date: date at which the individual came out of the egg
batch: group of birds that arrived at the farm in 2-week intervals
age: age at which the observation was taken
number_observations: number of eggs that were tested in order to calculate an average egg shell strength for this individual
value: average egg shell strength measurement
100
Example - Egg shell strength in Poultry

Co-variable (regressor) Body Weight

Birds with a weight of 0!!

101
Digression - Egg shell strength in Poultry
Results including 2 individuals with WEIGHT=0, n=539

Results excluding 2 individuals with WEIGHT=0, n=537

Excluding 2 data errors has a big impact!!

102
Exploratory analyses
Quantitative explanatory variables – regressors / co-variable
How are they distributed? Do not need to be normally
distributed but a clear bimodal pattern or a strange value
(outlier) might affect results.

“Give me a lever long enough and a place to stand, and I can move the earth.”
Archimedes

103
Exploratory analyses

A data point has high leverage if it has "extreme" predictor x values.

Outliers and high leverage data points have the potential to be influential.
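Leverage can be quantified. For a simple linear regression with an intercept, the leverage of observation i is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The sketch below applies this standard formula in Python to hypothetical body weights, where a single recorded weight of 0 stands in for the kind of data error shown in the egg shell example.

```python
def leverages(x):
    """Leverage h_i for simple linear regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical body weights (g); the 0 stands in for a recording error.
weights = [812, 790, 845, 0, 801, 830, 795, 820]
h = leverages(weights)
for w, hi in zip(weights, h):
    print(f"{w:4d}  {hi:.3f}")
```

The leverages always sum to the number of regression parameters (here 2), so one value close to 1 means that a single observation largely determines the fitted slope.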

104
Exploratory analyses

Explanatory variables: Qualitative explanatory variables – class variables

• ID: Serial No. (1-25) within each group of 25 (the order in which data points were abstracted)
• GROUP: code 0.0, 1.0, 1.1, 8.0 or 8.1
• LONG: Longevity or life span, in days
• THOR: Length of thorax, in mm
• SLEEP: Percentage of each day spent sleeping

Model: $y_{ij} = \mu + \text{Class}_i + \dots + \beta X_{ij} + e_{ij}$


105
Exploratory analyses

Explanatory variables: Qualitative explanatory variables – class variables

Model
$y_{ij} = \mu + \text{Class}_i + \dots + \beta X_{ij} + e_{ij}$

$\text{Long}_{..} = \dots + \text{Group}_{..} + \text{Thor}_{..} + \text{Sleep}_{..} + \dots$

Exploratory phase: How many classes and how many observations per class?
106
Exploratory analyses

Create Frequency Table in Excel – different possibilities….

1. Copy worksheet: right-click the worksheet
2. “Move or Copy”: tick box “Create a copy”
3. Copy column “Group” and paste content in new column
4. Go to “Data”
5. Select copy of column “Group” & select “Remove Duplicates”

107
Exploratory analyses

Use function “COUNTIF”

Range: column “Group”

Criteria: Equal to “0-0”

=COUNTIF(B$2:B$126,I2)
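The same frequency table can be produced in Python with a counter. This is a minimal sketch: the list of labels is constructed here to have 25 observations in each of the five groups, it is not read from the course file, and the threshold of 4 for flagging small classes is just an example value.

```python
from collections import Counter

# Group labels constructed for illustration: 25 observations per group.
groups = ["0-0"] * 25 + ["1-0"] * 25 + ["1-1"] * 25 + ["8-0"] * 25 + ["8-1"] * 25

freq = Counter(groups)
for label, n in sorted(freq.items()):
    print(label, n)

# Classes with very few observations deserve a second look.
small = [label for label, n in freq.items() if n < 4]
print("classes with fewer than 4 observations:", small)
```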

108
Exploratory analyses

Qualitative explanatory variables – class variables

Exploratory phase: How many classes and how many observations per class?

Group  n
0-0    25
1-0    25
1-1    25
8-0    25
8-1    25

• We do not want classes with 1 observation!!
• We do not like classes with a small number of observations (e.g. <4)
• If n>1, but small….maybe we can
  • Combine classes
  • Model effect as random (mixed models)
109
Exploratory analyses

Model
$y_{ij} = \mu + \text{Class}_i + \dots + \beta X_{ij} + e_{ij}$

$\text{Long}_{..} = \text{ID}_{..} + \text{Group}_{..} + \text{Thor}_{..} + \text{Sleep}_{..} + \dots$

• We do not want classes with 1 observation!!

What about “ID”??

110
Exploratory analyses

What did we learn???


▪ 125 observations
▪ Variables
  • ID: Serial No – does not make sense to include in the analysis!!
  • GROUP: class variable, 5 classes, each with 25 observations
  • LONG: continuous variable
  • THOR: continuous variable
  • SLEEP: continuous variable
▪ No missing observations
▪ No indications for the presence of outliers

111
Data analysis

Steps when analysing data:

a) Exploratory analyses
a.1) Initial examination of the data.
a.2) Relations between explanatory variables and the response variable.
a.3) Relations among explanatory variables.
a.4) Conclusions based on the exploratory analysis.

112
Exploratory analyses: explanatory-response

$\text{Long}_{..} = \text{Group}_{..} + \beta_1\,\text{Thor}_{..} + \beta_2\,\text{Sleep}_{..} + \dots$

113
Exploratory analyses: explanatory-response

▪ Get a first, preliminary idea about the effect of the class variable on the response variable (Y).

▪ Get a first clue about the “type of relationship” between the regressors (covariables) and the response variable (Y).

“I have known only a few cases where significant effects were found, for which the corresponding display, clearly demonstrating this could not be produced.”
Andrews (1991)

114
Exercise

Calculate Average and Standard deviation of longevity for each class of “GROUP”:
0-0
1-0
1-1
8-0
8-1
Suggestion: use function “AVERAGEIF”

115
Exploratory analyses: explanatory-response

Use function “AVERAGEIF”

Range: column “Group”

Criteria: Equal to “0-0”


Average for: column “Long”

=AVERAGEIF(B$2:B$126,I2,C$2:C$126)
116
Exploratory analyses: explanatory-response

Classes  n   Average  SD
0-0      25  63.56
1-0      25  64.8
1-1      25  56.76
8-0      25  63.36
8-1      25  38.72

Standard Deviation?? Use “IF” function:

=STDEV.S(IF($B$2:$B$126=I2,$C$2:$C$126,FALSE))

Logical test: e.g. Group = “0-0” (range: column “Group”)
Value if true: include in calculating SD longevity (range: column “Long”)
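The AVERAGEIF and STDEV.S(IF(...)) steps amount to grouping the data and computing a mean and a sample SD per group. A minimal Python sketch of that idea follows; the six (group, longevity) records are a tiny hypothetical subset for illustration, so the printed numbers will not match the table on the slide.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (group, longevity) pairs; not the full course data.
records = [("0-0", 40), ("0-0", 37), ("0-0", 44),
           ("8-1", 21), ("8-1", 40), ("8-1", 44)]

by_group = defaultdict(list)
for group, longevity in records:
    by_group[group].append(longevity)

# statistics.stdev is the sample SD (divides by n - 1), like STDEV.S.
summary = {g: (len(v), mean(v), stdev(v)) for g, v in by_group.items()}
for g, (n, avg, sd) in summary.items():
    print(f"{g}  n={n}  average={avg:.2f}  SD={sd:.2f}")
```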
117
Exploratory analyses: explanatory-response

Longevity (days)
Group  n    Average  SD
0-0    25   63.56    16.45
1-0    25   64.80    15.65
1-1    25   56.76    14.93
8-0    25   63.36    14.54
8-1    25   38.72    12.10
All    125  57.44    17.56

Are differences between Groups (treatments) large??
Do you expect “Group” to have a significant effect on Longevity??

118
Exploratory analyses: explanatory-response

Longevity (days)
Group  n   Average  SD
0-0    25  63.56    16.45
1-0    25  64.80    15.65
1-1    25  56.76    14.93
8-0    25  63.36    14.54
8-1    25  38.72    12.10

Difference means:

P=0.13 (e.g. T.DIST.2T in EXCEL)
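A difference between two group means can be tested from the summary statistics alone. The sketch below is an assumption-laden illustration: the slide's own formula did not survive extraction, so the choice of groups 0-0 and 1-1 is a guess (it gives a P-value close to the 0.13 shown), and the test used here is the standard two-sample t-test with pooled degrees of freedom. The P-value is obtained by numerically integrating the t density, which plays the role of T.DIST.2T.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson integration of the density on [0, |t|]."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

# Summary statistics from the table: groups 0-0 and 1-1 (assumed pair).
n1, mean1, sd1 = 25, 63.56, 16.45
n2, mean2, sd2 = 25, 56.76, 14.93

# With equal group sizes this equals the pooled-variance standard error.
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
t_stat = (mean1 - mean2) / se
df = n1 + n2 - 2
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, P = {p_value:.2f}")
```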


119
Exploratory analyses: explanatory-response

Alternative: Data Analysis – Anova: Single Factor

Reorganize data

120
Exercise

Perform “Anova: Single Factor” to test the effect of “Group” on “Long”

121
Exploratory analyses: explanatory-response
Output
Anova: Single Factor
SUMMARY
Groups  Count  Sum   Average  Variance
0-0     25     1589  63.56    270.67
1-0     25     1620  64.8     245
1-1     25     1419  56.76    222.85
8-0     25     1584  63.36    211.40
8-1     25     968   38.72    146.46

ANOVA
Source of Variation  SS       df   MS        F         P-value   F crit
Between Groups       11939.2  4    2984.82   13.61195  3.52E-09  2.447237
Within Groups        26313.5  120  219.2793

Total                38252.8  124
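The ANOVA table can be rebuilt from the SUMMARY block, which shows where each number comes from. A minimal Python sketch using only the counts, averages and variances reported above; small differences from the Excel output are due to rounding of the printed variances.

```python
# Group summaries from the Excel output: (count, average, variance).
groups = {
    "0-0": (25, 63.56, 270.67),
    "1-0": (25, 64.80, 245.00),
    "1-1": (25, 56.76, 222.85),
    "8-0": (25, 63.36, 211.40),
    "8-1": (25, 38.72, 146.46),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between groups: spread of the group means around the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within groups: pooled variation inside the groups.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, F = {f_value:.2f}")
```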

122
Exploratory analyses: explanatory-response

$\text{Long}_{..} = \text{Group}_{..} + \beta_1\,\text{Thor}_{..} + \beta_2\,\text{Sleep}_{..} + \dots$

Relation between longevity and Co-variable THOR??

Create “scatterplot” – relation between 2 continuous variables

123
Exploratory analyses: explanatory-response

Create scatterplot: 1. Select X & Y

Add trendline to plot: right mouse click “series”
124
Exercise

Create a scatterplot. Variables: “Long” and “Thor”

▪ Which variable should be on the x-axis and which on the y-axis?

▪ Guess: will the effect of “Thor” on “Long” be significant?

▪ Is the relation linear or non-linear??

125
Exploratory analyses: explanatory-response

[Figure: scatterplot of Longevity (days; y-axis, 0 to 120) against Thorax length (mm; x-axis, 0.50 to 1.00), shown beside the spreadsheet columns “Long” and “Thor”.]

126
Exploratory analyses: explanatory-response

[Figure: scatterplot of Longevity (days; y-axis, 0 to 120) against Thorax length (mm; x-axis, 0.50 to 1.00) with trend line y = 144.33x - 61.052, R² = 0.4051, shown beside the spreadsheet columns “Long” and “Thor”.]

• There seems to be a clear (linear) relationship between THOR and LONG.
• Note: linear model….but we can test if non-linear relationships give a better fit: $X^2$, $e^X$, ………..

127
Exploratory analyses: explanatory-response

Linear relationship:
$\text{Long} = \beta_0 + \beta_1\,\text{Thor}$
$Y = -61.05 + 144.33\,\text{Thor}$
$R^2 = 0.4051$

Quadratic relationship:
$\text{Long} = \beta_0 + \beta_1\,\text{Thor} + \beta_2\,\text{Thor}^2$
$Y = -106.45 + 259.5\,\text{Thor} - 72.28\,\text{Thor}^2$
$R^2 = 0.4058$
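One way to see why the quadratic term adds almost nothing to R² is to evaluate both fitted curves over the observed range of thorax lengths. The Python sketch below uses the trend line from the scatterplot (y = 144.33x - 61.052) and the quadratic coefficients reported above; the grid of thorax lengths is an arbitrary choice within the plotted x-axis range.

```python
def linear(thor):
    """Fitted linear trend line from the scatterplot."""
    return 144.33 * thor - 61.052

def quadratic(thor):
    """Fitted quadratic curve reported on the slide."""
    return -106.45 + 259.5 * thor - 72.28 * thor ** 2

# Thorax lengths (mm) within the range shown on the x-axis of the plot.
grid = [0.64, 0.70, 0.76, 0.82, 0.88, 0.94]
for thor in grid:
    lin, quad = linear(thor), quadratic(thor)
    print(f"{thor:.2f}  linear={lin:6.2f}  quadratic={quad:6.2f}  diff={lin - quad:5.2f}")
```

Across this range the two predictions differ by less than two days, which is small compared with the spread of the data around either curve.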
128
Exercise

Explore relation between longevity and percentage sleeping (SLEEP).

▪ Which variable should be on the x-axis and which on the y-axis?

▪ Guess: will the effect of “Sleep” on “Long” be significant?

▪ Is the relation linear or non-linear??

129
Exploratory analyses: explanatory-response

[Figure: scatterplot of Longevity (days; y-axis, 0 to 120) against %Sleep (x-axis, 0 to 100).]

130
Exploratory analyses: explanatory-response

[Figure: scatterplot of Longevity (days; y-axis, 0 to 120) against %Sleep (x-axis, 0 to 100) with trend line y = 0.0046x + 57.332, R² = 2E-05.]

131
Exploratory analyses: explanatory-response

What did we learn about relations between explanatory variables and the response variable?

▪ GROUP: There seem to be differences in longevity between groups.
▪ THOR: There seems to be a (linear) relationship between longevity and thorax length.
▪ SLEEP: There seems to be no relationship between longevity and percentage sleeping.

132
Data analysis

Steps when analysing data:

a) Exploratory analyses
a.1) Initial examination of the data.
a.2) Relations between explanatory variables and the response variable.
a.3) Relations among explanatory variables.
a.4) Conclusions based on the exploratory analysis.

133
Exploratory analyses: among explanatory

$\text{Long}_{..} = \text{Group}_{..} + \beta_1\,\text{Thor}_{..} + \beta_2\,\text{Sleep}_{..} + \dots$

134
Exploratory analyses: among explanatory

a.3) Relations among explanatory variables.

Explanatory variables might (partly) explain the same variation in the response variable (Y).

Confounding
Two variables are confounded if they vary together in
such a way that it is impossible to determine which
variable is responsible for an observed effect.

135
Exploratory analyses: among explanatory
Experiment comparing two treatments for depression

        Treatment
Age     1    2
Young   #    -
Old     -    #

In case of a significant difference between treatment groups, it is impossible to say if the effect is due to treatment or due to an age difference → randomize
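Randomization itself is simple to carry out. The following minimal Python sketch assigns hypothetical patients to the two treatments at random; the patient labels, the group sizes and the fixed seed are all assumptions made for the example.

```python
import random

# Hypothetical patients: 4 young and 4 old.
patients = [(f"P{i}", "young" if i <= 4 else "old") for i in range(1, 9)]

rng = random.Random(2024)  # fixed seed only to make the sketch reproducible
shuffled = patients[:]
rng.shuffle(shuffled)

# First half of the shuffled list gets treatment 1, second half treatment 2.
half = len(shuffled) // 2
assignment = {name: (1 if idx < half else 2)
              for idx, (name, _) in enumerate(shuffled)}

for name, age in patients:
    print(name, age, "treatment", assignment[name])
```

Simple randomization breaks the systematic link between age and treatment but does not guarantee equal numbers of young and old patients per treatment; shuffling within each age class separately would achieve that.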

136
Exploratory analyses: among explanatory

The structure of the data might affect what the statistical model looks like.

Herd (i)  Cow (j)  Observation (k)
                   1     2     .  .  k
1         1        y111  y112  .  .  y11k
.         .        .     .     .  .  .
1         10       y1j1  y1j2  .  .  y1jk
2         11       y2j1  y2j2  .  .  y2jk
.         .        .     .     .  .  .
2         20       .     .     .  .  .
3         21       .     .     .  .  .
.         .        .     .     .  .  .
137
Exploratory analyses: among explanatory

Cows only occur in one herd: cows are “nested” within herds.

        Cow
Herd    1    2    .    .    j
1       ***  ***
2                 ***  ***
.
.
i                           ***

The same cow cannot be present in multiple herds:

        Cow
Herd    1    2    .    .    j
1       ***  ***  ***  ***  ***
2       ***  ***  ***  ***  ***
.       ***  ***  ***  ***  ***
.       ***  ***  ***  ***  ***
i       ***  ***  ***  ***  ***
138
Exercise
Explore relations between:

▪ “Group” and “Thor”
▪ “Group” and “Sleep”
▪ “Thor” and “Sleep”

$\text{Long}_{..} = \text{Group}_{..} + \beta_1\,\text{Thor}_{..} + \beta_2\,\text{Sleep}_{..} + \dots$

139
Exploratory analyses: among explanatory

1. “Group” and “Thor”

Thor
Group  n   Average  SD
0-0    25  0.84     0.08
1-0    25  0.83     0.07
1-1    25  0.84     0.07
8-0    25  0.81     0.08
8-1    25  0.80     0.08

2. “Group” and “Sleep”

Sleep
Group  n   Average  SD
0-0    25  21.6     12.5
1-0    25  24.1     16.7
1-1    25  25.8     18.4
8-0    25  25.2     19.8
8-1    25  20.8     10.7

3. “Thor” and “Sleep”

[Figure: scatterplot of %Sleep (y-axis, 0 to 90) against Thor (mm) (x-axis, 0.50 to 1.00) with trend line y = 13.503x + 12.379, R² = 0.0043.]

140
Exploratory analyses: among explanatory
“Anova: Single Factor” (Thorax) in EXCEL

Groups  Count  Sum    Average  Variance  SD
0-0     25     20.90  0.8360   0.0071    0.084
1-0     25     20.64  0.8256   0.0049    0.070
1-1     25     20.94  0.8376   0.0050    0.071
8-0     25     20.14  0.8056   0.0067    0.082
8-1     25     20.00  0.8000   0.0061    0.078

p = 0.29

“Anova: Single Factor” (Sleep) in EXCEL

Groups  Count  Sum  Average  Variance
0-0     25     539  21.56    155.1733
1-0     25     602  24.08    278.4933
1-1     25     644  25.76    340.2733
8-0     25     629  25.16    393.0567
8-1     25     519  20.76    115.44

p = 0.75

141
Exploratory analyses: among explanatory

What did we learn about relations among explanatory variables?

▪ No relation between SLEEP and THOR
▪ No relation between GROUP and THOR
▪ No relation between GROUP and SLEEP

142
Exploratory analyses

▪ First analyses using one (explanatory) variable at a time provide a good starting point for the first model.

▪ In case (explanatory) variables are nicely (randomly) distributed across all other (explanatory) variables (independent), simultaneously analyzing all (explanatory) variables is expected to have a limited effect on the estimates (balanced data).

▪ …. if this is not the case………

143
Exploratory analyses: among explanatory

▪ Relations among explanatory variables are relevant for performing the appropriate statistical analysis.

▪ Be aware of a “nested” data structure

▪ Data might not be “balanced”.

144
Data analysis

Steps when analysing data:

a) Exploratory analyses
a.1) Initial examination of the data.
a.2) Relations between explanatory variables and the response variable.
a.3) Relations among explanatory variables.
a.4) Conclusions based on the exploratory analysis.

145
Exploratory analyses – wrapping up

What did we learn???


▪ 125 observations
▪ Variables
  • ID: Serial No – does not make sense to include in the analysis!!
  • GROUP: class variable, 5 classes, each with 25 observations
  • LONG: continuous variable
  • THOR: continuous variable
  • SLEEP: continuous variable
▪ No missing observations
▪ No indications for the presence of outliers

146
Exploratory analyses – wrapping up

What did we learn about relations between explanatory variables and the response variable?

▪ There seem to be differences in longevity between treatment groups.
▪ There seems to be a (linear) relationship between longevity and thorax length.
▪ There seems to be no relationship between longevity and percentage sleeping.
147
Exploratory analyses – wrapping up

What did we learn about relations among explanatory variables?

▪ No relation between SLEEP and THOR
▪ No relation between GROUP and THOR
▪ No relation between GROUP and SLEEP

148
