Stats Definitions
- Variable: characteristic of an item or individual
- Data: set of individual values associated with a variable
- Statistics: methods that help transform data into useful information
- Population: all the items/individuals about which you want to draw a conclusion
- Sample: portion of the population selected for analysis
- Population Parameter: summarizes the value of a specific variable of a population
- Sample Statistic: summarizes the value of a specific variable for sample data

Types of Variables
- Categorical
  Nominal (defined categories): nominal scale
  Ordinal (ordered categories): ordinal scale
- Numerical
  Discrete (counting)
  Continuous (measurement)
  Uses either interval scale (no true 0 point) or ratio scale (true 0 point)

Sources of Data
- Data distributed by organizations or individuals (eg weather report, financial statements)
- Survey
- Data collected by ongoing business activities (eg Big Data)
- Designed experiment (direct control over who gets treatment)
- Observational studies (no control)

Types of Samples
- Non-probability
  Judgement (get opinions of experts)
  Convenience (easy)
- Probability
  Simple Random (equal chance of being picked)
  Systematic (pick every kth person, where k = number of people / sample size)
  Stratified (divide pop into strata & select a sample that mimics its characteristics)
  Cluster (divide pop into clusters; each representative of pop)

Comparing Sampling Methods
- Simple Random/Systematic
  Simple to use
  May not be a good representation of the population's characteristics
- Stratified
  Ensures representation
- Cluster
  Cost effective
  Less efficient

Survey Errors
- Coverage Error/Selection Bias (exclude people)
- Nonresponse Error
- Sampling Error (always exists)
- Measurement Error (eg badly worded question, Hawthorne effect)

Organizing
- Categorical
  Summary Table (1 variable)
  Contingency Table (2 variables)
- Numerical
  Ordered Array (rank from min to max)
  Frequency Distribution
    o Class, frequency, frequency %, cum frequency, cum %
    o 5-15 classes
    o class interval = range / no. of groups desired
  Cumulative Distribution

Visualizing
- Summary
  Bar Chart
  Pie/Doughnut Chart
  Pareto Chart (bar chart with decreasing order of frequency + cumulative polygon)
- Contingency
  Side by Side Bar Chart
  Doughnut Chart
- Ordered Array
  Stem-and-Leaf Display
- Frequency/Cumulative Distribution
  Histogram (x-axis: midpoint, y-axis: freq/ rel freq/ freq %)
  Polygon (x-axis: midpoint, y-axis: freq %)
  Ogive/Cumulative % Polygon (x-axis: variable of interest, y-axis: cumulative %)
- 2 Numerical Variables
  Scatter Plot
  Time Sequence

Central Tendency
- Central Tendency: extent to which all data values group around a central value
- Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
- Median (position) = $\dfrac{n+1}{2}$
- Mode = most common value
- Should use both mean and median since mean is affected by extreme outliers

Variation
- Variation: amount of dispersion/scattering of values
- Range = Xmax - Xmin (ignores distribution of data & is sensitive to outliers)
- $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- $S = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$, $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- Coefficient of Variation: $CV = \dfrac{S}{\bar{x}} \times 100$
  Always in %
  Measures variation relative to the mean
  Can be used to compare the variability of 2 or more data sets with different units
- Z-score: $Z = \dfrac{x - \bar{x}}{S}$
  Extreme outlier if Z < -3.0 or Z > +3.0

Shape
- Shape: pattern of distribution
- Skewness: measures extent to which data values are not symmetrical
  Left-skewed (-ve): mean < median < mode
  Symmetric: mean = median = mode
  Right-skewed (+ve): mode < median < mean
- Kurtosis: describes the peakedness of the curve

5 Number Summary
- min, Q1, Q2, Q3, max
- Q1 (position) = $\dfrac{n+1}{4}$
- Q2 (position) = $\dfrac{n+1}{2}$
- Q3 (position) = $\dfrac{3(n+1)}{4}$
- IQR = Q3 - Q1

  Left-skewed        Symmetric          Right-skewed
  Q2-min > max-Q2    Q2-min = max-Q2    Q2-min < max-Q2
  Q1-min > max-Q3    Q1-min = max-Q3    Q1-min < max-Q3
  Q2-Q1 > Q3-Q2      Q2-Q1 = Q3-Q2      Q2-Q1 < Q3-Q2
  *Just need to fulfill 2 of the 3 conditions

Measures of Relationship Between 2 Variables
- Covariance: measures the strength of the linear relationship between X & Y
  $cov(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
  cov(X,Y) > 0: X & Y tend to move in the same direction
  cov(X,Y) < 0: X & Y tend to move in the opposite direction
  cov(X,Y) = 0: X & Y have no linear relationship (independent X & Y give cov = 0, but cov = 0 alone does not prove independence)
- Coefficient of Correlation: measures the relative strength of the linear relationship between X & Y
  $r = \dfrac{cov(X, Y)}{S_X S_Y}$
  *population coefficient of correlation = $\rho$, sample coefficient of correlation = r
  Closer to -1: stronger -ve relationship
  Closer to +1: stronger +ve relationship
  Closer to 0: weaker linear relationship

Assessing Probability
- A priori: based on prior knowledge
- Empirical: based on data collected
- Subjective: based on experience, opinion and analysis of a situation

Probability Formulas
- P(A U B) = P(A) + P(B) - P(A n B)
- $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
- If 2 events are independent, then P(A|B) = P(A)

Discrete Random Variables
- $\mu = E(X)$
- $\sigma^2 = E(X^2) - [E(X)]^2$

Binomial Distribution
- Characteristics:
  Fixed number of observations
  Each observation is categorized into success or failure
  Constant probability
  Observations are independent
- $P(X = x \mid n, \pi) = \dfrac{n!}{x!\,(n-x)!}\, \pi^{x} (1-\pi)^{n-x}$
- $\mu = E(X) = n\pi$
- $\sigma^2 = n\pi(1-\pi)$
- $\sigma = \sqrt{n\pi(1-\pi)}$
- Shape:
  $\pi$ < 0.5: right-skewed
  $\pi$ > 0.5: left-skewed

Poisson Distribution
- Characteristics:
  Probability that an event occurs in one area of opportunity is the same for all areas of opportunity
  No. of events in one area of opportunity is independent of the no. of events in other areas of opportunity
  Probability that 2 or more events occur in an area of opportunity approaches 0 as the area of opportunity shrinks
  Avg no. of events = $\lambda$
- $P(X = x \mid \lambda) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$
- $\mu = \lambda$
- $\sigma^2 = \lambda$
- $\sigma = \sqrt{\lambda}$

Normal Distribution
- $Z = \dfrac{X - \mu}{\sigma}$
- Empirical Rules (proportion of values within each interval):
  1. $\mu \pm \sigma$: 0.6826
  2. $\mu \pm 2\sigma$: 0.9544
  3. $\mu \pm 3\sigma$: 0.9973
- Evaluating Normality
  Construct charts/graphs (check for symmetry and bell shape)
  Compute descriptive stats
    o Mean ≈ Median ≈ Mode
    o IQR ≈ 1.33 $\sigma$
    o Range ≈ 6 $\sigma$
  Check theoretical properties
    o Empirical rule holds
    o ~80% lies within $\bar{x} \pm 1.28\sigma$
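
The descriptive checks listed under Evaluating Normality (mean ≈ median, IQR ≈ 1.33σ, range ≈ 6σ, ~80% within 1.28σ of the mean) can be sketched in Python as below. This is only an illustration: the function names and the sample data are hypothetical, the sample standard deviation S stands in for σ, and fractional quartile positions from the (n+1)/4 rule are handled by linear interpolation, which is an assumption of this sketch (some textbooks round the position instead).

```python
import math
import statistics


def value_at_position(sorted_values, position):
    """Return the value at a 1-based rank, interpolating between neighbours.

    Linear interpolation for fractional ranks is an assumption of this
    sketch; some textbooks round the rank instead.
    """
    n = len(sorted_values)
    position = min(max(position, 1), n)  # keep the rank inside 1..n
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


def normality_summary(values):
    """Descriptive statistics used for a rough normality check.

    Assumes at least two values that are not all identical, so that the
    sample standard deviation S is defined and non-zero.
    """
    data = sorted(values)
    n = len(data)
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

    q1 = value_at_position(data, (n + 1) / 4)
    q3 = value_at_position(data, 3 * (n + 1) / 4)
    iqr = q3 - q1
    data_range = data[-1] - data[0]

    # Z-scores relative to the sample mean and S.
    z_scores = [(x - mean) / s for x in data]
    share_within = sum(abs(z) <= 1.28 for z in z_scores) / n

    return {
        "mean": mean,
        "median": median,
        "s": s,
        "iqr": iqr,
        "range": data_range,
        "iqr_over_s": iqr / s,        # roughly 1.33 for normal data
        "range_over_s": data_range / s,  # roughly 6 for normal data
        "share_within_1_28_s": share_within,  # roughly 0.80 for normal data
    }


if __name__ == "__main__":
    # Hypothetical sample, for illustration only.
    sample = [12, 15, 17, 18, 20, 21, 21, 23, 24, 27, 30]
    for name, value in normality_summary(sample).items():
        print(f"{name}: {value:.4f}")
```

With small samples the ratios can sit far from 1.33 and 6 even for normal data, so they are best read alongside the charts rather than as a pass/fail test.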
Uniform Distribution
- Uniform Distribution: probability distribution that has equal probabilities for all possible outcomes
- Probability = base x height
- $f(x) = \dfrac{1}{b-a}$ if $a \le X \le b$
- Otherwise, $f(x) = 0$
- $\mu = \dfrac{a+b}{2}$
- $\sigma = \sqrt{\dfrac{(b-a)^2}{12}}$
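
As a quick numerical check of the distribution formulas in this sheet, the binomial and Poisson probabilities and the uniform "base x height" rule can be sketched in Python as follows. The function names and the example parameters (n = 4, π = 0.5, λ = 2, a = 0, b = 10) are hypothetical choices for illustration, not values taken from the sheet.

```python
import math


def binomial_pmf(x, n, pi):
    """P(X = x | n, pi) = n! / (x! (n - x)!) * pi^x * (1 - pi)^(n - x)."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x | lambda) = e^(-lambda) * lambda^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], and 0 otherwise."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_probability(low, high, a, b):
    """P(low <= X <= high) as base x height, clipping the base to [a, b]."""
    base = max(0.0, min(high, b) - max(low, a))
    height = 1 / (b - a)
    return base * height


def uniform_mean_sd(a, b):
    """Mean (a + b) / 2 and standard deviation sqrt((b - a)^2 / 12)."""
    return (a + b) / 2, math.sqrt((b - a) ** 2 / 12)


if __name__ == "__main__":
    # Binomial: the mean computed from the pmf should match n * pi.
    n, pi = 4, 0.5
    mean_from_pmf = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
    print("binomial mean from pmf:", mean_from_pmf, "vs n*pi:", n * pi)

    # Poisson: probability of exactly 3 events when lambda = 2.
    print("P(X = 3 | lambda = 2):", poisson_pmf(3, 2))

    # Uniform on [0, 10]: probability of landing between 2 and 5.
    print("P(2 <= X <= 5):", uniform_probability(2, 5, 0, 10))
    print("uniform mean, sd:", uniform_mean_sd(0, 10))
```

Summing `x * pmf(x)` over all outcomes, as in the binomial example, is one way to confirm that a pmf is consistent with the closed-form mean listed for that distribution.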