Reliability
Presenter: Erlwinmer Reyes Mangmang
Outline
THE CONCEPT OF RELIABILITY
- Sources of Error Variance
RELIABILITY ESTIMATES
- Test-Retest Reliability Estimates
- Parallel-Forms & Alternate-Forms Reliability Estimates
- Split-Half Reliability Estimates
- Other Methods of Estimating Internal Consistency
- Measures of Inter-Scorer Reliability
Outline
USING & INTERPRETING A COEFFICIENT OF RELIABILITY
- The Purpose of the Reliability Coefficient
- The Nature of the Test
- The True Score Model of Measurement & Alternatives to It
Definition of Reliability
- Layman
- Psychometrics
- Reliability coefficient
Simple analogy
X = T + E
X represents an observed score
T represents a true score
E represents error
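Not part of the original slides: a minimal Python sketch, using made-up numbers, that simulates the true score model X = T + E and estimates the reliability coefficient as the ratio of true score variance to total observed variance (the definition given in the note for slide #5).

```python
# Illustrative sketch (invented data): simulate X = T + E and estimate the
# reliability coefficient as true score variance / total observed variance.
import numpy as np

rng = np.random.default_rng(0)

n_examinees = 1_000
true_scores = rng.normal(loc=50, scale=10, size=n_examinees)   # T
random_error = rng.normal(loc=0, scale=5, size=n_examinees)    # E
observed_scores = true_scores + random_error                   # X = T + E

reliability = true_scores.var() / observed_scores.var()
print(f"Reliability (true variance / total variance): {reliability:.2f}")
# With SD(T) = 10 and SD(E) = 5, this is roughly 100 / 125 = 0.80
```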
Categories of error
•Random error
•Systematic error
Sources of Error Variance
Test Construction
Test Administration
Scoring
Interpretation
Test Construction
ITEM SAMPLING or CONTENT SAMPLING
Test Administration
•Test Environment
•Test-taker Variables
•Examiner-related Variables
Test Scoring and Interpretation
•Computer Scoring
•Subjective test
•Assessment purposes
Other sources of error
•Methodological error
“Reliability is not the ultimate fact in the book of the recording angel.”
- Stanley (1971)
Reliability Estimates
Test-Retest Reliability Estimates
• Using the same instrument to measure the same thing at two points in time
• Coefficient of stability
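An illustrative sketch, with hypothetical scores, of the test-retest estimate described above: the coefficient is simply the correlation between scores from the two administrations.

```python
# Hypothetical sketch: the test-retest reliability estimate is the correlation
# between scores from two administrations of the same test to the same group.
import numpy as np

time1 = np.array([12, 15, 9, 20, 18, 14, 11, 17])   # first administration
time2 = np.array([13, 14, 10, 19, 17, 15, 10, 18])  # second administration

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability estimate: {r_test_retest:.2f}")
```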
Parallel-Forms & Alternate-Forms Reliability Estimates
Parallel forms (of a test) - for each form of the test, the means and the variances of observed test scores are equal
Alternate forms (of a test) - simply different versions of a test that have been constructed so as to be parallel
Example: variables such as CONTENT & LEVEL OF DIFFICULTY
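A hedged sketch with made-up scores: check that two forms look roughly parallel (similar means and variances), then take the correlation between forms as the alternate-forms reliability estimate.

```python
# Illustrative sketch: for parallel forms, the means and variances of the two
# forms should be approximately equal; the correlation between forms serves
# as the alternate-forms reliability estimate.
import numpy as np

form_a = np.array([22, 18, 25, 30, 27, 21, 19, 24])
form_b = np.array([23, 17, 26, 29, 28, 20, 20, 25])

print("Form A mean / variance:", form_a.mean(), form_a.var(ddof=1))
print("Form B mean / variance:", form_b.mean(), form_b.var(ddof=1))
print("Alternate-forms reliability:", round(np.corrcoef(form_a, form_b)[0, 1], 2))
```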
Split-Half Reliability Estimates
• Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
• SPEARMAN-BROWN FORMULA
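A rough sketch of the split-half procedure with invented 0/1 item data: split the items odd versus even, correlate the half-test scores, then adjust with the Spearman-Brown formula. The doubled-length form 2r / (1 + r) is used here; the general form n·r / (1 + (n − 1)·r) predicts reliability for any change in test length, as mentioned in the note for slide #17.

```python
# Hypothetical sketch of a split-half estimate: split items odd vs. even,
# correlate the half-test scores, then apply the Spearman-Brown correction
# because each half is only half as long as the full test.
import numpy as np

# rows = examinees, columns = items scored 0/1 (made-up data)
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown corrected estimate
print(f"Half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```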
Other Methods of Estimating Internal Consistency
• KUDER-RICHARDSON FORMULAS
• CRONBACH'S ALPHA
• AVERAGE PROPORTIONAL DISTANCE (APD)
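An illustrative sketch of coefficient alpha computed from a made-up item-score matrix; for dichotomous (0/1) items the same formula is equivalent to KR-20. The APD is not shown here.

```python
# Illustrative sketch of Cronbach's alpha (equivalent to KR-20 for 0/1 items):
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: examinees x items matrix."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# made-up 0/1 item responses (rows = examinees, columns = items)
responses = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
])
print(f"Cronbach's alpha (KR-20 here): {cronbach_alpha(responses):.2f}")
```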
Measures of Inter-Scorer Reliability
• The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
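A minimal, hypothetical-data sketch of one simple inter-scorer estimate: the correlation between the scores assigned by two independent raters. For categorical ratings, an agreement index such as Cohen's kappa would be used instead.

```python
# Hypothetical sketch: one simple inter-scorer reliability estimate is the
# correlation between the scores assigned by two independent raters.
import numpy as np

rater_1 = np.array([4, 3, 5, 2, 4, 3, 5, 1])
rater_2 = np.array([4, 3, 4, 2, 5, 3, 5, 2])

print(f"Inter-scorer reliability (r): {np.corrcoef(rater_1, rater_2)[0, 1]:.2f}")
```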
USING AND INTERPRETING A COEFFICIENT OF RELIABILITY
Guide question…
How high should the coefficient of reliability be?
Answer:
1. “On a continuum relative to the purpose and importance of the decisions to be made on the basis of scores on the test”
2. As a rough grading analogy:
A: .95 or higher (important decisions)
B: .85 to .90
B-: .75 to .80
F: .74 and below (barely passing)
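Purely as a playful sketch of the grading analogy above, with the thresholds taken from the slide (values that fall between the slide's bands are left unlabeled):

```python
# Sketch of the slide's grading analogy for reliability coefficients.
def reliability_grade(r: float) -> str:
    if r >= 0.95:
        return "A (important decisions)"
    if 0.85 <= r <= 0.90:
        return "B"
    if 0.75 <= r <= 0.80:
        return "B-"
    if r <= 0.74:
        return "F"
    return "between bands on the slide's scale"

print(reliability_grade(0.87))  # -> "B"
```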
The purpose of the Reliability Coefficient
Nature of the test
•Homogeneous or heterogeneous
•Dynamic or static
•Restriction or inflation of range
•Speed test or power test
•Criterion-referenced tests
The true score model of measurement and alternatives to it
• Classical test theory
• Domain sampling theory and
generalizability theory
• Item response theory
End of slides
XU Advanced Psych Testing and Assessment 16-17
Thank you


Editor's Notes

  • #5 LAYMAN - a synonym of dependability or consistency. PSYCHOMETRICS - refers to consistency in measurement. In everyday conversation, reliability always connotes something positive; in the psychometric sense it refers only to something that is consistent. Reliability coefficient - an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
  • #7 RANDOM ERROR - sudden noise, a school rally, the sound of a passing truck. SYSTEMATIC ERROR - the use of a ruler that is a tenth of an inch longer than it should be, or the weighing scale on The Biggest Loser.
  • #9 One source of variance during test construction is ITEM SAMPLING or CONTENT SAMPLING. From the perspective of a test creator, a challenge in test development is to maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance.
  • #10 Three bullets on this slide. Test Environment - room temperature, level of lighting, ventilation & noise. Test-taker Variables - emotional problems, physical discomfort, lack of sleep & drugs or medication. Examiner-related Variables - physical appearance & demeanor, presence or absence of the examiner.
  • #11 COMPUTER SCORING - with the advent of modern technology, computer-scorable items have virtually eliminated error variance in scoring. SUBJECTIVE TESTS - a well-trained professional should score them as objectively as possible. ASSESSMENT PURPOSES - assessment, as we all know, is crucial, hence systematic and objective decision making must be observed (e.g., a group co-leader in group process). Next slide: Reliability Estimates.
  • #12 Methods that don't qualify under the different types of error variance.
  • #13 A quote before the Reliability Estimates
  • #14 It's hard to know the true variance of a certain test or assessment method, so the best we can do is to estimate it as closely as we can.
  • #15 COEFFICIENT OF STABILITY - when the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability.
  • #16 SIMILARITY - two administrations of the test with the same group are required. Test scores may be affected by factors such as motivation, fatigue, or any intervening events.
  • #17 SPEARMAN-BROWN - relates psychometric reliability to test length and is used by psychometricians to predict the reliability of a test after changing the test length.
  • #18 1. Kuder-Richardson - essentially, it lets you know whether the exam as a whole discriminated between students who mastered the subject matter and those who did not; KR-20 generally ranges between 0.0 and +1.0, but it can fall below 0.0 with smaller sample sizes. 2. Cronbach's alpha - reflects the degree of interrelatedness among items that measure the same construct. 3. The APD is a measure that focuses on the degree of difference that exists between item scores.
  • #19 NEXT OUTLINE NEXT SLIDE
  • #23 Restriction or inflation of range - is the correlation analysis restricted by the sampling procedure used? Is the range of variances employed appropriate to the objective of the correlational analysis?
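Not from the original notes: a small simulation, with made-up data, illustrating the restriction-of-range point in the note for slide #23. Correlating only a narrow slice of the score distribution shrinks the observed correlation relative to the full range.

```python
# Illustrative sketch (invented data) of restriction of range: keeping only
# the high scorers reduces the observed correlation between two measures.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.8 * x + 0.6 * rng.normal(size=5_000)    # population correlation near .80

full_r = np.corrcoef(x, y)[0, 1]
top = x > 1.0                                  # restrict to high scorers only
restricted_r = np.corrcoef(x[top], y[top])[0, 1]
print(f"Full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
```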