07.17 PA Refresher Course Revised
Testing vs Assessment
- Psychological Testing – the process of measuring psychology-related variables by means of devices/procedures designed to obtain a sample of behavior
- Psychological Assessment – the gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished using tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures
o Evaluation (final decision); the one reflected in the psychological report
o Integration: present ONLY in assessment
- Psychological Test – a device; used to quantify psychological variables; measures a construct
o Construct – indirectly observable; we believe that constructs exist and can be measured

Testing vs Assessment (comparison)
- Testing: to obtain a score; Assessment: to answer a referral question
- Testing: individual or by group; Assessment: highly individualized (clinical setting, observing the individual closely)
- Testing: highly objective (scored as is); Assessment: has some degree of subjectivity (maturity/clinical judgment)
- Testing: the tester is NOT key to the process; Assessment: the assessor is key to the process of selecting tests and/or other tools of evaluation and in drawing conclusions

Maximal Ability Test / Ability Test (the categories below sometimes overlap)
- Intelligence Test
o General potentiality (potential to solve problems, adapt to changing circumstances, and profit from experience)
o Concerned with the past and the future
o Binet-Simon Test: administered only on a one-on-one basis; for school children
- Aptitude test
o Focuses on informal learning/life experiences
o Measures the potential to learn
o Concerned with the future
- Achievement test
o Designed to measure accomplishment
o Given after learning/instruction has taken place
o Focuses on formal learning
o Concerned only with the past
o Measures what has been learned
- Interest test – likes/dislikes; does not tell whether you will excel in a certain area
- Diagnostic Test – used in screening, for diagnosis (clinical); in education, to identify strengths/weaknesses

Types of Psychological Tests
Personality Test (or typical-performance tests)
- Measure typical behavior (traits, temperaments, and dispositions)
- Related to the overt and covert dispositions of the individual
Structured (objective)
- Provides self-report statements
- Choose between two or more alternative responses
- Ex. NEO PI-R
Projective – ambiguous test stimulus; the response requirements are unclear

Typical Ability Test – no right or wrong answers
- Attitude
- Interest
- Personality

Army Alpha – dependent on the ability to read and write
Army Beta – for illiterate adults
- Contained tasks such as mazes and picture completion
The war created a demand for large-scale group testing
Robert Yerkes – the Army requested his assistance

Informed Consent
Necessary to cover:
- Risks and benefits
- How the obtained data will be stored
- Limits of confidentiality

For persons who are legally incapable of giving informed consent, psychologists nevertheless
- Provide an appropriate explanation
- Seek the individual's assent
- Obtain appropriate permission from a legally authorized person

Consent is obtained except when
- Testing is mandated by law or governmental regulations
- Testing is conducted as a routine educational, institutional, or organizational activity
- One purpose of the testing is to evaluate decisional capacity

Levels of Psychological Tests
Level A: can adequately be administered, scored, and interpreted with the aid of the manual and a general orientation
- Lowest level
- No license required
- Ex. teacher-made tests
Level B: require some technical knowledge of test construction and use and of supporting psychological and educational fields
- RPm
- Structured personality and intelligence tests
Level C: require substantial understanding of testing and supporting psychological fields together with supervised experience in the use of these devices
- RPsy
- Unstructured personality tests
- Individually administered intelligence tests

Flynn Effect
- The mean raw score on intelligence tests has been increasing over the years
- Not observable in less advanced countries
- Present in highly developed countries

Sampling Techniques
- Simple Random – every member has an equal chance of being selected
- Systematic – begins by listing all people in the population, then randomly picking a starting point on the list. The sample is then obtained by moving down the list, selecting every nth name.
- Stratified Random – identify the specific subgroups (or strata) to be included in the sample. Select equal-sized random samples from each of the pre-identified subgroups, using the same steps as in simple random sampling.
o There are different groups, and you make sure that each group is equally represented
o Recognizes that there are subgroups
o The approach is randomized recruitment
- Cluster – random selection of groups instead of individuals from a population; selecting groups, NOT individuals
- Convenience – use participants who are easy to get
o Weakest way of recruiting respondents
o Doesn't recognize subgroups
o Quota – a type of convenience sampling; identify specific subgroups to be included in the sample and then establish quotas for individuals to be sampled from each group; similar to stratified, but the manner of recruitment is convenient

Nuggets: Consider the profile of the target population; the sample should represent the interest of the population.
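The sampling techniques above can be made concrete with a short sketch. This is not part of the notes; it is a minimal Python illustration using only the standard library, with an invented population and year-level strata.

    import random

    # Hypothetical population of 100 people, each tagged with a year level (stratum).
    population = [{"id": i, "year": random.choice(["1st", "2nd", "3rd", "4th"])}
                  for i in range(100)]

    # Simple random sampling: every person has an equal chance of being selected.
    simple = random.sample(population, k=10)

    # Systematic sampling: list everyone, pick a random starting point,
    # then move down the list selecting every nth name.
    n = 10
    start = random.randrange(n)
    systematic = population[start::n]

    # Stratified random sampling: identify the subgroups (strata) first, then draw
    # an equal-sized simple random sample from each pre-identified subgroup.
    stratified = []
    for year in ["1st", "2nd", "3rd", "4th"]:
        stratum = [p for p in population if p["year"] == year]
        stratified.extend(random.sample(stratum, k=min(5, len(stratum))))

    # Cluster sampling would instead randomly select whole groups (e.g., sections),
    # and convenience/quota sampling would simply take whoever is easiest to reach.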
Assumptions of Psychological Testing
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test-related behavior
4. Tests and other measurement techniques have strengths and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society
- Results should not be interpreted in isolation
- Isolation = not considering other factors

Levels of Measurement
1. Nominal
- Consists of a set of categories that have different names
- Measurements label and categorize observations, but do not make any quantitative distinctions between observations
- The categories do not have a proper order; they are all on equal footing; the labels may be numerical
- Ex. favorite color, brand of shoes, gender, cellphone number, jersey no., plate no.
2. Ordinal
- Organized in an ordered sequence
- Measurements rank observations in terms of size or magnitude
- There is a proper way of arranging the observations
- The gaps between ranks are not equal
- Likert scale (per item): ordinal; Likert scale (total scores/overall): interval
- Ex. sizes of t-shirts, ranking in an exam, top 3 singers, severity of disease
3. Interval
- Intervals are of the same size
- Equal differences between numbers on the scale reflect equal differences in magnitude
- The zero point on an interval scale is arbitrary and does not indicate a zero amount of the variable being measured; zero doesn't mean absence
- Ex. Fahrenheit and Celsius
4. Ratio
- An interval scale with the additional feature of an absolute zero point
- Ratios of numbers do reflect ratios of magnitude
- Zero means the variable is truly absent
- Ex. number of students, pets, amount of water in a container, length, weight

Mean – can be distorted by extreme values (outliers)
Mode – the most recurrent number in the distribution
Median – not sensitive to outliers, which is why it is recommended as an alternative to the mean

Measures of Variability (the relative size/spread of the scores)
- Variance = SD²
- Range = highest score minus lowest score

Understanding the Standard Deviation
- IQ: average = 100, SD = 15
- 100 + 15 = 115 (1 SD above)
- 100 – 15 = 85 (1 SD below)
- Therefore, the average range for IQ is 85 to 115.
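As a quick check on the central tendency, variability, and IQ figures above, here is a small Python sketch (standard library only; the score list is invented for illustration).

    import statistics

    scores = [85, 90, 100, 100, 105, 110, 145]   # hypothetical scores; 145 is an outlier

    mean = statistics.mean(scores)       # 105; pulled upward by the outlier
    median = statistics.median(scores)   # 100; not sensitive to the outlier
    mode = statistics.mode(scores)       # 100; the most recurrent value

    score_range = max(scores) - min(scores)   # highest score minus lowest score
    sd = statistics.pstdev(scores)            # how scores scatter around the mean
    variance = sd ** 2                        # Variance = SD squared

    # Understanding the standard deviation with IQ (mean = 100, SD = 15):
    iq_mean, iq_sd = 100, 15
    average_range = (iq_mean - iq_sd, iq_mean + iq_sd)   # (85, 115)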
Characteristics of the normal curve
- The mean, median, and mode are at the center of the bell curve
- Symmetrical: the left and right portions contain 50% each; the entire curve covers 100% of the distribution
- Most of the test takers' scores are close to the average
- Low scores = close to the left tail; high scores = right tail
- It is not skewed
- The so-called normal range, within which approximately two-thirds of all scores fall, is located within one standard deviation above and below the mean.

Nuggets:
- Out of all the test takers, how many will fall within the average range? 68 out of 100 test takers.
- Rule:
o 1 SD = 68%
o 2 SD = 95%

Standard deviation – how the scores, in general, scatter around the mean
- Higher SD = the scores are more spread out

Z-scores
- The direction and degree that any given raw score deviates from the mean of a distribution, on a scale of standard deviation units
o Indicates how far an individual raw score falls from the mean of a distribution
- The z distribution has a mean of 0 and an SD of 1
- Z = 1.00 = 84th percentile
- Note: always compute the area to the left under the curve
- A positive z-score is obtained when the raw score is greater than the mean
- A negative z-score is obtained when the raw score is smaller than the mean
- We cannot compare test takers using raw scores; a raw score doesn't tell the whole story
- Convert the raw score to locate the score in a distribution
- Used for locating a score in a distribution and for comparing test takers
- Z-score = 0 = average
- Z-Score Formula: z = (raw score – mean) / SD

Nuggets:
- Some dislike the z-score because it can be negative; hence, misinterpretation.

Percentile Ranks – answer the question, "What percent of the scores fall below a particular score?"
- 99th percentile = the person outscored 99% of the test takers
- 1st percentile = outscored 1% of the test takers
- 50th percentile = outscored 50% of the test takers
- Percentile – what percent of the test takers you have outscored
- A z-score of 0 = the 50th percentile

The Woodworth Psychoneurotic Inventory (the Personal Data Sheet): for personal psychological adjustment
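A short Python sketch of the z-score and percentile-rank ideas above (standard library only; the IQ mean of 100 and SD of 15 come from the notes, and the raw score of 115 is just an example).

    from statistics import NormalDist

    def z_score(raw, mean, sd):
        # z = (raw score - mean) / SD, expressed in standard deviation units
        return (raw - mean) / sd

    iq = NormalDist(mu=100, sigma=15)

    z = z_score(115, mean=100, sd=15)     # 1.00 -> one SD above the mean
    percentile_rank = iq.cdf(115) * 100   # area to the LEFT of the score, ~84th percentile

    # A z-score of 0 sits exactly at the mean, i.e., the 50th percentile.
    assert round(NormalDist().cdf(0) * 100) == 50

    # Roughly two-thirds (about 68%) of scores fall within 1 SD of the mean (85-115).
    within_one_sd = iq.cdf(115) - iq.cdf(85)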
Normal VS Skewed Distribution

Positively Skewed Distribution
- Mean > Median > Mode
- The mean is greater than the median; the median is greater than the mode

Negatively Skewed Distribution
- Mode > Median > Mean
- The mode is greater than the median; the median is greater than the mean

Direction of the Relationship
- Positive relationship = direct relationship
- Negative relationship = indirect relationship

Pearson r vs Spearman's Rho (a code sketch appears after the reliability notes at the end of this section)
- Pearson r – used in correlating variables on interval and ratio scales
- Spearman's rho – used when one or both variables are of ordinal scaling
- Both are for correlation

Dependent t-test: comparing 2 observations from the same group of respondents
Independent t-test: comparing data from two different groups (for separate groups)
One-way ANOVA: deals with different groups
ANOVA Repeated Measures: dealing with the same group
ANOVA: deals with 3 or more groups/observations

Wechsler-Bellevue Intelligence Test: for adults

Educational Assessment
- Formative Assessment: an assessment aimed at facilitating learning and evaluating it
- Summative Assessment: an assessment that has a purely evaluative function

Classical Test Theory: observed score = true score plus error
- Error: pushes the observed score away from the true score
- Higher reliability = lower error = the observed score is closer to the true score

Parallel forms: similarity in the results obtained using two different forms of the test
o The concern is equivalence
o Computation of Pearson r for the results obtained from both forms of the same test
Inter-rater reliability: consistency in the ratings given by two or more evaluators
o The diagnosis should be the same among all raters
Test-retest reliability: does not use different forms of the test
o Practice effect – the reason scores go up on the second take
o Time sampling error
o The trait being measured does not change over time (used for stable observations)
o Traits, time; the concern is stability
o Establishes stability over time, which is why it is used only for traits that do not change over time
o Two administrations
Alternate-forms reliability: reduces the practice effect
o Content sampling error – the reliability of the test can be compromised or undermined if the items in the second form of the test do not match the items in the original form; there may be a discrepancy between the items in the two forms of the test
KR-20
o Used when computing internal reliability
o Used for dichotomous items
o Items have right or wrong answers
Coefficient alpha
o Used for tests composed of items with no right or wrong answers
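The Pearson r, Spearman's rho, t-test, and ANOVA notes earlier in this section can be tried out as follows. This is a minimal sketch, assuming SciPy is available (SciPy is not mentioned in the notes) and using made-up scores.

    from scipy import stats

    # Hypothetical pretest/posttest scores for the SAME ten respondents
    pretest = [12, 15, 9, 20, 18, 11, 14, 16, 10, 19]
    posttest = [14, 16, 12, 21, 20, 13, 15, 18, 12, 20]

    # Pearson r: both variables on an interval or ratio scale
    r, p = stats.pearsonr(pretest, posttest)

    # Spearman's rho: when one or both variables are ordinal (rank-based)
    rho, p_rho = stats.spearmanr(pretest, posttest)

    # Dependent t-test: two observations from the same group of respondents
    t_dep, p_dep = stats.ttest_rel(pretest, posttest)

    # Independent t-test: data from two different (separate) groups
    group_a = [78, 70, 88, 60, 65]
    group_b = [82, 74, 69, 85, 63]
    t_ind, p_ind = stats.ttest_ind(group_a, group_b)

    # One-way ANOVA: three or more separate groups
    group_c = [75, 80, 72, 70, 77]
    f, p_f = stats.f_oneway(group_a, group_b, group_c)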
Internal Consistency
- 1 administration, 1 set of tests
- High if all items in a test measure the same construct
- Cronbach's alpha
- McDonald's ω – a good alternative to Cronbach's alpha; a measure of internal consistency
Split-half reliability
Very high reliability = problematic = redundant
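A minimal sketch of the internal-consistency computations above, assuming the standard coefficient-alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the 1/0 responses are invented. With right/wrong items this same computation is the KR-20, and the last lines show the split-half idea on the same single administration.

    from statistics import pvariance, correlation   # correlation requires Python 3.10+

    # Hypothetical right/wrong (1/0) responses: 4 items answered by 5 examinees
    items = [
        [1, 0, 1, 1, 0],   # item 1, scored across the five examinees
        [1, 0, 1, 1, 1],   # item 2
        [0, 0, 1, 1, 0],   # item 3
        [1, 1, 1, 0, 0],   # item 4
    ]

    k = len(items)
    totals = [sum(person) for person in zip(*items)]   # total score per examinee

    # Coefficient alpha / KR-20: one administration, one set of items
    alpha = (k / (k - 1)) * (1 - sum(pvariance(item) for item in items) / pvariance(totals))

    # Split-half: split the single administration into two halves and correlate them
    odd_half = [items[0][p] + items[2][p] for p in range(len(totals))]
    even_half = [items[1][p] + items[3][p] for p in range(len(totals))]
    split_half_r = correlation(odd_half, even_half)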
Domain sampling – sampling items from the population of possible items that could be used in a test
The greater the number of items, the higher the reliability, because the coverage is broader; longer tests tend to yield more reliable results
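The claim that more items tend to yield higher reliability is usually quantified with the Spearman-Brown prophecy formula. The formula is not named in these notes, so treat this Python snippet as added context rather than part of the source material.

    def spearman_brown(r_current, length_factor):
        # Projected reliability if the test is lengthened `length_factor` times
        # with comparable items sampled from the same domain.
        return (length_factor * r_current) / (1 + (length_factor - 1) * r_current)

    # A test with reliability .60 gains reliability as comparable items are added:
    print(spearman_brown(0.60, 2))   # doubled length -> about 0.75
    print(spearman_brown(0.60, 3))   # tripled length -> about 0.82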
Face validity
- The assessment of validity from the perspective of the test taker
- Simplest and least scientific
- Demonstrated when, on its superficial appearance, a measure looks like it measures what it is supposed to measure
Content Validity
- Items are examined to check whether they accurately measure what they are supposed to measure
- Ensured through inspection of the test items
- Common in the educational setting; concerns the coverage of the test
- Established when test developers ask experts to rate the items included in a newly constructed test as useful, useful but not essential, or not useful
Criterion Validity
- Correlating with another measure; you stick with the same variable
- Correlating test scores with the scores/observations recorded using an external measure
- 3 types:
1. Concurrent – the other measure to correlate with is available in the present
2. Predictive
3. Known groups
Construct Validity: correlating a newly created test with another measure that quantifies a similar construct
- Convergent – look for a measure that is theoretically related to the one you are measuring
- Divergent – discriminant; two constructs that are different from/unrelated to one another
o The relationship should be weak

---End---