Descriptive Statistics
Dr.S.Manikandan
Manikandan 2
What is intended?
Appropriate use of summary statistics
When to use which statistic?
What is not intended ?
To scare you with the mathematical
notation and statistical jargon
How to get thro’ statistics with minimal difficulty....
• Keep up with the class – the classes will be
extremely cumulative.
• Develop an understanding of the concept –
work out the problems yourself
• Spend quality time in studying – spending
five hours a day on the two days before the exam is not as
effective as spending one hour a day for ten days
• Look at the assigned material before lecture,
review the covered material after lecture
Do not hesitate to ask questions
If you think you can do a thing or think you
can’t do a thing, you’re right.
- Henry Ford
Basic definitions
Statistics :
A set of methods for organising,
summarising and interpreting information.
Population :
The set of all individuals of interest in
a particular study
Sample :
Set of individuals selected from a
population, usually intended to represent the
population
Types of statistics
DESCRIPTIVE STATISTICS: Describing a phenomenon
  Frequencies: How many…
  Basic measurements: Meters, seconds, cm³, IQ
INFERENTIAL STATISTICS: Inferences about phenomena
  Hypothesis testing: Supporting or refuting theories
  Confidence intervals: How well the sample relates to the larger population
  Correlation: Associations between phenomena
  Significance testing: e.g. diet and health
Parameter :
A value, usually a numerical value, that
describes a population.
Variable :
Characteristic of an element whose value
may differ from element to element
Data :
Measurements that are collected, recorded &
summarised for presentation, analysis &
interpretation
Types of Data
Data
  Qualitative: Nominal, Ordinal
  Quantitative: Discrete, Continuous
Scales of data measurement
Nominal scale :
• Uses names or tags to distinguish one
measurement from another
• Each measurement is assigned to one of a
limited number of categories
• Does not imply magnitude of individual
measurement
• Cannot be ordered one above the other
Examples of nominal scale :
• Gender (male/female)
• Religion (Hindu, Muslim, Christian, Sikh)
Types of Data
Nominal
Expressed as: Proportion
10% Christians
18% Muslims
72% Hindus
[Pie chart: 72% Hindus, 18% Muslims, 10% Christians]
Ordinal Scale
• Data can be placed in a meaningful order
• No information about the size of the interval
• No conclusion can be drawn about whether the
difference between the first & second category is
the same as between the second & third
Examples :
• Stages of cancer – I, II, IIIa, IIIb and IV
Types of Data
Ordinal
How do you rate the pain
after the analgesic?
Unbearable
Very painful
Painful
Mild pain
No pain
Data can be arranged in an ORDER and RANKED
Expressed as: Scores, ranks
Interval Scale
• A numerical unit of measurement is used
• Difference between any two measurements can
be clearly identified
• There is no true zero point
• No meaningful ratio can be got as there is no
absolute zero
Example of interval scale
• Measurement of body temperature in Celsius
The difference between 100 & 90 °C is the same as
between 50 & 40 °C. True zero point is absent as
temperature can be below zero (-10 °C). No meaningful
ratio can be got – 100 °C is not twice as hot as
50 °C.
Ratio Scale
• Similar to interval scale
• Also has an absolute zero, and so meaningful
ratios do exist.
Examples :
Most biomedical variables - weight, blood
pressure, pulse rate.
Gitanjali 19
How do you measure?
Nominal: Male / Female
Ordinal: Short / Medium / Tall
Ratio: Measure exact height
What is frequency distribution?
• The organization of raw data into several
classes using frequency tally.
e.g.
Height (cm)    No. of students
121-125        5
126-130        17
131-135        25
136-140        30
141-145        20
146-150        3
Total          100
[Histogram: Frequency against Height (cm)]
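A minimal sketch of how raw values could be tallied into classes like those in the height table. The raw heights below are hypothetical (the slides give only the finished table), and the helper name `frequency_distribution` is an assumption made for illustration.

```python
from collections import Counter

def frequency_distribution(values, start, width, n_classes):
    """Tally integer values into n_classes consecutive classes of equal width."""
    counts = Counter()
    for v in values:
        idx = (v - start) // width          # index of the class the value falls in
        if 0 <= idx < n_classes:            # ignore values outside the table
            counts[idx] += 1
    table = []
    for i in range(n_classes):
        lo = start + i * width
        hi = lo + width - 1
        table.append((f"{lo}-{hi}", counts[i]))
    return table

# Hypothetical raw heights in cm (not from the slides)
heights = [122, 127, 128, 133, 131, 135, 138, 140, 136, 142, 147, 129]
table = frequency_distribution(heights, start=121, width=5, n_classes=6)
for label, count in table:
    print(f"{label:>8}  {count}")
```

The class limits (121-125, 126-130, ...) match the table above; only the counts differ because the raw data are made up.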
Features of distributions
When you assess the overall pattern of
any distribution (which is the pattern
formed by all values of a particular
variable) look for:
number of peaks
general shape (skewed or symmetric)
centre
spread
[Figures: symmetrical and skewed distributions; unimodal and bimodal distributions]
Central tendency
• The point in the distribution around
which the data are centered e.g. mean,
mode, median

Measures of Central Tendency
• Arithmetic mean: Sum of all
values divided by number of
observations
1 2 5 3 4 5 8
Mean = 28/7 = 4
Geometric Mean
nth root of the product of the n observations
When to use geometric mean?
When data are measured on a logarithmic scale
e.g. Dilution of smallpox vaccine
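A short sketch of the geometric mean computed two ways: with Python's `statistics.geometric_mean` and directly as the nth root of the product. The dilution values are hypothetical and chosen only to show why the geometric mean suits data spread over powers of ten.

```python
import math
import statistics

# Hypothetical reciprocal dilutions (1:10, 1:100, 1:1000); not from the slides
dilutions = [10, 100, 1000]

gm = statistics.geometric_mean(dilutions)      # library routine (Python 3.8+)
n = len(dilutions)
gm_root = math.prod(dilutions) ** (1 / n)      # nth root of the product
am = statistics.mean(dilutions)                # arithmetic mean, for comparison

print(f"geometric mean  = {gm:.1f}")
print(f"nth-root form   = {gm_root:.1f}")
print(f"arithmetic mean = {am}")
```

Both forms give about 100, the middle dilution, whereas the arithmetic mean (370) is pulled towards the largest value.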
Measures of Central Tendency
• Mode: Most common value observed
1 2 5 3 4 5 8
1 2 3 4 5 5 8
Mode = 5
For nominal data MODE is the most
appropriate measure
Measures of Central Tendency
• Median: Value that comes half-way when
the data are ranked in order
1 2 5 3 4 5 8
1 2 3 4 5 5 8
Median = 4
For ordinal data only MODE and MEDIAN
can be used
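The three worked examples (mean, mode, median) can be checked with Python's standard `statistics` module. This is only a sketch using the same seven observations as the slides.

```python
import statistics

observations = [1, 2, 5, 3, 4, 5, 8]

mean = statistics.mean(observations)      # sum of values / number of observations
mode = statistics.mode(observations)      # most common value
median = statistics.median(observations)  # middle value after ranking

print("ranked:", sorted(observations))
print("mean =", mean, " mode =", mode, " median =", median)
```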
Which measure of central tendency is best
with a particular set of observations?
This depends on
• Scale of measurement
• Distribution of observations
Guidelines for using measures of central tendency
• The mean is used for numerical data and for
symmetric distributions.
• The median is used for ordinal data or for
numerical data if the distribution is skewed.
• The mode is used for bimodal distributions.
• The geometric mean is used for observations
measured on a logarithmic scale.
Mean is not enough
Take 2 populations A & B
A consists of scores 10,10,10,10,10
B consists of 5,15,5,15,10
Mean in both cases is 10, but the
distributions are different
Hence a measure of variation is required
in addition to measure of central tendency
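A quick check of the A and B example, assuming the sample standard deviation (n - 1 denominator) as the measure of variation:

```python
import statistics

A = [10, 10, 10, 10, 10]
B = [5, 15, 5, 15, 10]

mean_a, mean_b = statistics.mean(A), statistics.mean(B)
sd_a, sd_b = statistics.stdev(A), statistics.stdev(B)   # sample SD (n - 1)

print(f"A: mean = {mean_a}, SD = {sd_a}")
print(f"B: mean = {mean_b}, SD = {sd_b}")
```

The means are identical, but A has no spread at all (SD 0) while B has an SD of 5.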
Dispersion (Spread)
The extent to which data are tightly
clustered around the central tendency
e.g. range, variance, standard deviation
Measure of uncertainty
[Figure: distributions with low, medium and high variability]
Measures of Dispersion
• Range: Difference between lowest and highest scores
in a set of data
• SD: describes the variability of observations
about the mean
• SEM: describes the variability of means
Data                      Range        Mean ± SD      Mean ± SEM
80, 70, 80, 5, 2, 3, 1    80-1 = 79    34.4 ± 39.7    34.4 ± 15.0
80, 6, 7, 30, 12, 2, 1    80-1 = 79    19.7 ± 28.3    19.7 ± 10.7
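The figures for the two data sets can be reproduced with a few lines of Python. This sketch assumes the sample SD (n - 1 denominator) and SEM = SD / √n; the helper name `summarize` is made up for illustration.

```python
import math
import statistics

def summarize(data):
    """Return range, mean, sample SD and SEM for a list of numbers."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)              # sample SD, n - 1 denominator
    sem = sd / math.sqrt(len(data))          # standard error of the mean
    return {"range": max(data) - min(data), "mean": mean, "sd": sd, "sem": sem}

set1 = [80, 70, 80, 5, 2, 3, 1]
set2 = [80, 6, 7, 30, 12, 2, 1]

s1, s2 = summarize(set1), summarize(set2)
for name, s in (("set 1", s1), ("set 2", s2)):
    print(f"{name}: range = {s['range']}, "
          f"mean ± SD = {s['mean']:.1f} ± {s['sd']:.1f}, "
          f"mean ± SEM = {s['mean']:.1f} ± {s['sem']:.1f}")
```

Both sets share the same range (79) even though their SDs differ, which is why the range alone says little about how the observations are spread.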
Percentiles
The percentile of a value is the percentage of a
distribution that is equal to or below that value
When to use percentiles?
To compare an individual value with a norm
e.g. to interpret physical growth charts
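As an illustration, the percentage of a distribution at or below a value can be read off a frequency table. The sketch below reuses the height table from the frequency distribution slide (upper class limit, number of students); the function name `percent_at_or_below` is an assumption for this example.

```python
# (upper class limit in cm, number of students), from the height table
height_table = [(125, 5), (130, 17), (135, 25), (140, 30), (145, 20), (150, 3)]

def percent_at_or_below(upper_limit, table):
    """Percentage of observations in classes ending at or below upper_limit."""
    total = sum(count for _, count in table)
    at_or_below = sum(count for limit, count in table if limit <= upper_limit)
    return 100 * at_or_below / total

p135 = percent_at_or_below(135, height_table)
p150 = percent_at_or_below(150, height_table)
print(f"{p135:.0f}% of students are 135 cm or shorter")
```

So a height of 135 cm sits at about the 47th percentile of this class.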
Interquartile range
Difference between the 25th and 75th percentiles, also
called the first and third quartiles
Contains the central 50% of observations
When to use interquartile range?
To describe the central 50% of a distribution
(regardless of shape)
For skewed distributions, better than SD for
describing the dispersion
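A sketch of the interquartile range using `statistics.quantiles` on the two data sets from the dispersion slide. Quartile conventions differ between textbooks and software; this uses Python's default ("exclusive") method, so other tools may give slightly different quartiles for such small samples.

```python
import statistics

def interquartile_range(data):
    """Return (Q1, Q3, IQR) using Python's default quantile method."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q1, q3, q3 - q1

set1 = [80, 70, 80, 5, 2, 3, 1]
set2 = [80, 6, 7, 30, 12, 2, 1]

iqr1 = interquartile_range(set1)
iqr2 = interquartile_range(set2)
print("set 1: Q1 = %s, Q3 = %s, IQR = %s" % iqr1)
print("set 2: Q1 = %s, Q3 = %s, IQR = %s" % iqr2)
```

Unlike the range, which is 79 for both sets, the interquartile range separates them clearly.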
Types of Distribution
• Normal (Gaussian)
  • Data are symmetrically distributed on both
  sides of the mean
  • Forms a bell-shaped curve in a frequency
  distribution plot
• Non-normal (Non-Gaussian)
  • Data are skewed to one side
  • Bimodal, Poisson, Rectangular
Normal (Gaussian) Distribution
[Figure: bell-shaped curve, Frequency against Parameter]
• Characteristics:
  • Symmetric
  • Unimodal
  • Extends to +/- infinity
  • Area under the curve = 1
  • Described by mean and SD
  • 95% of observations lie
  within 1.96 SD on either
  side of the mean
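The 1.96 SD figure can be checked with `statistics.NormalDist`, which models a normal distribution with a given mean and SD. This is a sketch on the standard normal curve (mean 0, SD 1); the same proportion holds for any mean and SD.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -1.96 SD and +1.96 SD
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the cut-off that leaves 2.5% in the upper tail
cutoff = standard_normal.inv_cdf(0.975)

print(f"area within ±1.96 SD = {coverage:.4f}")
print(f"cut-off for central 95% = {cutoff:.2f} SD")
```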
Non Normal Distribution
Some distributions fail to be symmetrical
If the tail on the left is longer than the right,
the distribution is negatively skewed (to the
left) e.g. No. of times a child with diarrhoea
passes stool
If the tail on the right is longer than
the left, the distribution is positively
skewed (to the right) e.g. No. of
alleles responsible for a
polymorphism
Can we know the shape of the distribution
without actually seeing it ?
• If the mean is smaller than 2 SD, the
observations are probably skewed.
• If the mean and median are equal, the
distribution is probably symmetric.
• If the mean is larger than the median, the
distribution is skewed to the right.
• If the mean is smaller than the median, the
distribution is skewed to the left.
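These rules of thumb can be sketched as a small function. The name `shape_hints` and the wording of the hints are assumptions for illustration; the rules are heuristics, not a formal test of skewness, and the mean < 2 SD rule only makes sense for measurements that cannot be negative.

```python
import statistics

def shape_hints(data):
    """Apply the rules of thumb to guess the shape of a distribution."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)          # sample SD (n - 1)
    hints = []
    if mean < 2 * sd:
        hints.append("mean < 2 SD: probably skewed")
    if mean > median:
        hints.append("mean > median: skewed to the right")
    elif mean < median:
        hints.append("mean < median: skewed to the left")
    else:
        hints.append("mean = median: probably symmetric")
    return hints

skewed = [80, 70, 80, 5, 2, 3, 1]        # data set from the dispersion slide
symmetric = [8, 9, 10, 11, 12]           # hypothetical symmetric data

hints_skewed = shape_hints(skewed)
hints_symmetric = shape_hints(symmetric)
print(hints_skewed)
print(hints_symmetric)
```

For the first data set the mean (34.4) is well below 2 SD (79.3) and above the median (5), so both rules point to a right-skewed distribution.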
