interest, attitudes, values, motives, temperament, non-cognitive aspects of personality
-structured personality instruments
when was early testing done?
greeks - 2500 years ago
chinese - 2000 years ago
who was credited with launching the testing movement - emphasis on sensory perception and intelligence?
francis galton
who was credited with founding the science of psychology - furthered intelligence testing?
wilhelm wundt
who expanded testing to include memory and other sample mental processes?
james mckeen cattell
what is binet-simon scale (1905)?
-assessed judgement, comprehension & reasoning
-ratio of mental age to chronological age (IQ)
what edition is the stanford-binet scale on?
fifth edition
world war 1 - group testing (army alpha & army beta)?
used to assign work roles/tasks
frank parsons - "father of guidance"?
promoted more systematic views of assessment and career
theoretical basis became a debate concerning the definition of?
intelligence
interest in assessment spread beyond intelligence - led to development of self-report personality inventories such as?
woodworth personal data sheet
rorschach inkblots developed in 1921
aptitude tests developed for?
selecting and classifying industrial personnel
stanford achievement test (1923) was?
first standardized achievement battery
the first edition of mental measurements yearbook was when?
1938
dissatisfaction with existing personality instruments led to:
projective techniques became popular
MMPI developed (early 1940s)
prevented individuals from "faking" results
standardized achievement tests well-established in public schools
multiple aptitude batteries appeared after 1940
criticisms of assessment began to emerge:
need for standards (APA)
need for centralized test publication
consolidation of testing agencies/publishers allowed for electronic scoring
increased accessibility of tests
examination and evaluation of testing and assessment - widespread public concern:
misuse of testing instruments
family educational rights and privacy act (1974):
mandated right to view educational records, including testing
required permission of parents for many types of testing
use of computers blossomed: administration, scoring, interpretation, computer-adaptive testing, report writing
customized testing based on the test-taker
revision of instruments in response to criticism
increase in cultural diversity and awareness of test bias
increasing use of authentic and portfolio assessment
educational settings - evaluation through multiple test instruments and assessments
in the 1900s what happened?
stanford-binet scale
world war 1 - group testing
frank parsons "father of guidance"
in the 1920s and 1930s what happened?
theoretical basis became a debate concerning the definition of intelligence
interest in assessment spread beyond intelligence - led to development of self-report personality inventories
aptitude tests developed for selecting and classifying industrial personnel
development of vocational counseling instruments
stanford achievement test
first edition of mental measurements yearbook
what happened in 1940s and 1950s?
dissatisfaction with existing personality instruments
standardized achievement tests well-established in public schools
criticisms of assessment began to emerge
what happened in 1960s and 1970s?
examination and evaluation of testing and assessment - widespread public concern
1970s grassroots movement for "minimum competency" testing for high school graduates
family educational rights and privacy act
increased use of computers in assessment
what happened in 1980's and 1990s?
use of computers blossomed: administration, scoring, interpretation, computer-adaptive testing, report writing
revision of instruments in response to criticism
increasing use of authentic and portfolio assessment
what happened from the 2000s to the present?
influences on technology and the internet
research on multicultural issues
achievement testing & no child left behind
increased interest in accountability and effectiveness data
revision of standards and DSM
what is nominal?
measurement scale in which numbers are assigned to attributes of objects or classes of objects solely for the purpose of identifying the objects
what is ordinal?
a scale on which data are shown simply in order of magnitude, since there is no standard measurement of differences
what is interval?
scale of measurement for which the difference between two points on the scale is meaningful.
what is ratio?
the ratio scale is similar to the interval scale in that it represents quantity and has equality of units; one difference is that it also has an absolute zero, with no numbers existing below zero
norm-referenced instruments -
individual's score is compared to performance of others who have taken the same instrument (norming group)
example : personality inventory
evaluating the norming group by size, sampling, and representation
criterion-referenced instruments -
individual's performance is compared to specific criterion or standard
example : third-grade spelling tests
how are standards determined? common practice, professional organizations or experts, empirically-determined
what is frequency distribution?
mathematical function showing the number of instances in which a variable takes each of its possible values
what is frequency polygon?
A graph made by joining the middle-top points of the columns of a frequency histogram
what is histogram?
diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval
what is mode?
most frequent score
what is median?
evenly divides scores into two halves (50% of scores fall above, 50% fall below)
what is mean?
arithmetic average of the scores
what are measures of variability?
range, variance, standard deviation
what are measures of central tendency?
mean, median, mode
what is range?
highest score minus lowest score
what is variance?
average of the squared deviations from the mean
what is standard deviation?
square root of variance
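as a rough illustration (not part of the original notes), a minimal python sketch of these measures, assuming the population formula for variance (dividing by N; many texts divide by N - 1 for samples):

```python
from collections import Counter
import math

scores = [82, 90, 75, 90, 68, 85, 90, 75]  # hypothetical raw scores

mean = sum(scores) / len(scores)             # arithmetic average
ordered = sorted(scores)
mid = len(ordered) // 2
# median: middle score, or average of the two middle scores
median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
mode = Counter(scores).most_common(1)[0][0]  # most frequent score

score_range = max(scores) - min(scores)      # highest score minus lowest score
variance = sum((x - mean) ** 2 for x in scores) / len(scores)  # mean squared deviation
sd = math.sqrt(variance)                     # square root of variance
print(mean, median, mode, score_range, variance, round(sd, 2))
```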
what is normal distribution?
a function that represents the distribution of many random variables as a symmetrical bell-shaped graph
what is skewed distribution?
a distribution in which most scores fall toward either the high or low end rather than the middle (in contrast to the bell-shaped normal curve, where most scores cluster in the middle and taper off toward the extremes)
what are types of scores?
percentile scores/percentile ranks
what are standard scores?
z scores, t scores, stanines, age/grade-equivalent scores
how can we interpret percentiles?
98th percentile - 98% of the group had a score at or below this individual's score
32nd percentile - 32% of the group had a score at or below this individual's score
if there were 100 people taking the assessment, 32 of them would have a score at or below this individual's score
units are not equal
useful for providing information about relative position in normative sample
not useful for indicating amount of difference between scores
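a short sketch (names hypothetical) of the "at or below" definition of percentile rank used above:

```python
def percentile_rank(score, norm_scores):
    """percentage of the norming group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

norm_group = list(range(1, 101))        # hypothetical norming group of 100 test-takers
print(percentile_rank(32, norm_group))  # 32.0 -> 32nd percentile
```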
what is z score?
how many standard deviations from the mean a value is
what is t score?
scores scaled to have a mean of 50 and a standard deviation of 10
what are stanines?
method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two
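the standard-score conversions above follow directly from the z score; a minimal sketch (values hypothetical, stanines clipped to 1-9 by convention):

```python
def z_score(raw, mean, sd):
    return (raw - mean) / sd                 # standard deviations from the mean

def t_score(z):
    return 50 + 10 * z                       # rescaled to mean 50, sd 10

def stanine(z):
    return max(1, min(9, round(5 + 2 * z)))  # nine-point scale, mean 5, sd 2

z = z_score(raw=115, mean=100, sd=15)        # hypothetical IQ-style score
print(z, t_score(z), stanine(z))             # 1.0 60.0 7
```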
what are problematic scores?
problematic because they do not reflect precise performance on an instrument
learning does not always occur in equal developmental levels
instruments vary in scoring
adequacy of norming group depends on:
clients being assessed
purpose of the assessment
how information will be used
examine methods used for selecting group
examine characteristics of norming group
methods for selecting norming group:
simple random sample
what is stratified sample?
population is divided into subgroups (strata) and members are sampled from each stratum, often in proportion to their share of the population
what is cluster sample?
naturally occurring groups (clusters) are randomly selected rather than individuals, and members of the chosen clusters are assessed
norming group characteristics -
assessment report -
goal is to gain experience giving a formalized assessment in a safe situation and then using the results to decide treatment interventions and therapeutic goals - less emphasis on the construction of the instrument
instrument review -
goal is to understand more about how a particular instrument is created - by whom, how, its psychometric properties, the intended use, and your best clinical opinion of the quality of the instrument based on what you know thus far
classical test theory
every observed score is a combination of true score/ability and error (observed = true + error)
reliability coefficient (e.g., 0.80) - the proportion of observed-score variance that is true-score variance rather than error variance
systematic versus unsystematic error
reliability only takes unsystematic error into account
often based on consistency between two sets of scores
statistical technique used to examine consistency (relationship between scores)
correlation coefficient: ranges from -1.00 to +1.00 (.00 indicates no correlation)
positive correlation - a relationship between two variables such that their values increase or decrease together
negative correlation - a relationship between two variables in which one variable increases as the other decreases
pearson product-moment correlation coefficient -
most common (most practical sense, +, -, large, small)
computed from the products of paired z scores across administrations, divided by the number of individuals
coefficient of determination -
the percentage of shared variance between two sets of data (the square of the correlation coefficient)
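a sketch of both ideas, assuming the z-score formulation of pearson r noted above (data hypothetical):

```python
import math

def pearson_r(xs, ys):
    """mean of the products of paired z scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

first = [10, 12, 14, 16, 18]          # scores on first administration (hypothetical)
second = [11, 11, 15, 15, 19]         # scores on second administration (hypothetical)
r = pearson_r(first, second)
print(round(r, 2), round(r ** 2, 2))  # r, and r squared = shared variance
```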
types of reliability -
test/retest, alternate/parallel forms, internal consistency measures
what is test/retest?
variation between first and second administration
what is alternate/parallel forms?
two versions of a test; can be administered closer together in time; eliminates memorized responses
what are internal consistency measures?
split-half - spearman-brown formula
coefficient alpha (cronbach's alpha) for non-dichotomous test items
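both formulas are short enough to sketch; this is an illustration under textbook definitions, not a scoring routine from the source:

```python
def spearman_brown(r_half):
    # corrects a half-test correlation up to full-test length
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k, n = len(items), len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

print(round(spearman_brown(0.70), 2))  # 0.82
# three 5-point items for four respondents (hypothetical)
print(round(cronbach_alpha([[4, 3, 5, 2], [4, 2, 5, 3], [3, 3, 4, 2]]), 2))  # 0.9
```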
nontypical situations -
typical methods for determining reliability may not be suitable for some instruments; in those cases:
be knowledgeable about reliability coefficients of other instruments in that area
examine characteristics of particular clients against reliability coefficients
consider the number of times reliability was tested and the types of reliability measures used
standard error of measurement -
provides estimate of the range of scores if someone were to take the instrument repeatedly (attempts to indicate an individual's 'true score')
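the usual formula is SEM = sd x sqrt(1 - reliability); a worked sketch with hypothetical numbers:

```python
import math

def sem(sd, reliability):
    # standard error of measurement
    return sd * math.sqrt(1 - reliability)

error = sem(sd=15, reliability=0.91)  # hypothetical IQ-style instrument
score = 110
print(round(error, 1))                # 4.5
print(round(score - error, 1), round(score + error, 1))  # ~68% band around the observed score
```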
validity -
concerns what an instrument measures and how well it does so
not something instrument "has or does not have"
informs counselor when it is appropriate to use instrument and what can be inferred from results
reliability is a prerequisite for validity
traditional categories of validity -
content related - items represent intended behavior
criterion-related - how instrument relates to an outcome (SAT)
construct - extent to which instrument measures theoretical or hypothetical construct
evidence based on test content -
degree to which the evidence indicates that items, questions, or tasks adequately represent intended behavior domain
central focus is typically on how the instrument's content was determined
content-related validation evidence should not be confused with face validity (instrument appears to be a good measure)
evidence based on response processes -
concerns whether individuals respond in a manner consistent with construct measured
e.g., having individuals think aloud as they respond
may also examine information processing differences by subgroup
evidence based on internal structure -
measuring one construct or the use of scales and subscales?
examining internal structure using factor analysis (comparison of subscale sets)
can also examine internal structure of instrument for different subgroups (differential item functioning) (example: women consistently answer a particular item correctly while men consistently answer it incorrectly)
evidence based on relations to other variables -
convergent evidence - related to other variables to which it should, in theory, be related
discriminant evidence - not related to other variables from which it should differ
prediction/instrument-criterion relationship -
concurrent validity - assesses current context, no gap in time - useful for diagnosing in session
predictive validity - gap in time between administration and collecting criterion evidence
regression - cousin of correlation
used to determine usefulness of a variable or a set of variables in relation to another meaningful variable we are trying to predict
regression line - instrument scores vs criterion - line of best fit
standard error of estimate -
arises because no instrument has perfect reliability or criterion-related validity
the margin of expected error in an individual's predicted criterion score as a result of imperfect validity
used to give clinicians a range for practical significance by accounting for error
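the usual formula is SEE = sd of the criterion x sqrt(1 - r^2); a worked sketch (numbers hypothetical):

```python
import math

def see(sd_criterion, r):
    # standard error of estimate: expected error in predicted criterion scores
    return sd_criterion * math.sqrt(1 - r ** 2)

error = see(sd_criterion=0.6, r=0.5)  # e.g., a GPA criterion, validity r = .50
predicted = 3.0
print(round(error, 2))                # 0.52
print(round(predicted - error, 2), round(predicted + error, 2))  # range around the prediction
```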
decision theory -
"group separation" or "expectancy tables"
do the scores of the instrument correctly differentiate into performance or diagnostic realms
a tally of how often the instrument is right and how often it is incorrect
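a minimal sketch of tallying hits and misses from a 2x2 expectancy table (counts hypothetical; the sensitivity/specificity labels are standard terms, not from the source):

```python
# rows = instrument decision, columns = actual outcome (hypothetical counts)
true_pos, false_pos = 40, 10   # flagged: actually has / lacks the condition
false_neg, true_neg = 5, 45    # not flagged: actually has / lacks the condition

total = true_pos + false_pos + false_neg + true_neg
hit_rate = (true_pos + true_neg) / total         # how often the instrument is right
sensitivity = true_pos / (true_pos + false_neg)  # correctly flags those with the condition
specificity = true_neg / (true_neg + false_pos)  # correctly clears those without it
print(hit_rate, round(sensitivity, 2), round(specificity, 2))  # 0.85 0.89 0.82
```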
validity generalization -
method of combining validation studies to determine if validity evidence can be generalized to new situations
must have a substantial number of studies
must use meta-analytic procedures (exploration of many studies and different research factors)
evidence based on consequences of testing -
examples of social consequences :
group differences on tests used for employment selection
group differences in placement in special education
desire to see context and system in which instrument was administered
counselors should consider both validation evidence and social implications
conclusions on different types of validation evidence :
gradual accumulation of evidence
no single study yields a clear decision that the construct is or is not present
consult validity evidence prior to interpreting results
counselor must evaluate the information to determine whether an instrument is appropriate for a given client under given circumstances
validation evidence should also be considered in informal assessments
item analysis -
examining and evaluating each item in an instrument (validity refers to the instrument as a whole)
useful in development and revision process
item difficulty (must consider other variables)
item discrimination - does the item differentiate among responders on behavior domain
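a sketch of both indices under common textbook definitions (difficulty = proportion correct; discrimination = upper-group minus lower-group difficulty); data hypothetical:

```python
def item_difficulty(responses):
    """proportion answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def item_discrimination(responses, totals, top_frac=0.27):
    """upper-lower index: difficulty among high scorers minus difficulty among low scorers."""
    n = max(1, int(len(totals) * top_frac))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    low, high = order[:n], order[-n:]
    return sum(responses[i] for i in high) / n - sum(responses[i] for i in low) / n

item = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]              # one item's 0/1 responses
totals = [88, 92, 54, 75, 60, 48, 83, 95, 52, 70]  # each examinee's total score
print(item_difficulty(item))              # 0.6
print(item_discrimination(item, totals))  # 1.0 -> item separates high from low scorers
```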
item response theory
focus is on each item and establishing items that measure ability or level of a latent trait
emphasis is on each individual item rather than the sum of the whole
particular interest in which items the persons answers correctly or affirmatively
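a sketch of an item characteristic curve, assuming the common two-parameter logistic model (the notes do not name a specific IRT model):

```python
import math

def p_correct(theta, a, b):
    """probability of a correct/affirmative response for trait level theta.
    a = item discrimination, b = item difficulty."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# hypothetical item: difficulty b = 0.5, discrimination a = 1.2
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, a=1.2, b=0.5), 2))
```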
selection of assessment instruments/strategies
determine what information is needed
analyze strategies for obtaining information
search assessment resources
evaluate assessment strategies
select an assessment instrument or strategy
determine information needed -
identify - information needed for specific client, general information clinicians in an organization need about clients
consider information already available
analyze strategies for obtaining information -
formal or informal techniques
consider which assessment method would be best suited to clients
consider professional limitations and which instruments the counselor can ethically administer and interpret
search assessment resources
mental measurements yearbook
educational testing service
tests in print
tests: a comprehensive reference for assessments in psychology, education and business
directory of unpublished experimental mental measures
evaluate assessment strategies
appropriate selection of norming group or criterion
interpretation of scoring materials
user qualifications (level a, b,c)
administering assessment instruments
read administration materials ahead of time
attend to time limits
know boundaries of what is acceptable
use administration checklist, if helpful
hand, computer, or internet scored (some can be self-scored)
before using computer scoring, investigate integrity of scoring service and steps used to develop program
some assessments require clinician judgement as part of scoring
scoring authentic / performance assessments
involve performance of "real" authentic applications, rather than proxies
objectivity in scoring is more difficult to achieve
scoring is enhanced if:
-assessment has a specific focus
-scoring plan is based on qualities that can be directly observed
-scoring is designed to reflect the intended target
-the setting for assessment is appropriate
-observers use checklists or rating scales
-scoring procedures have been field-tested before use
often one of the most important parts of assessment process
clients who receive test interpretation show greater gains than those who don't
tentative interpretations are more helpful than absolute ones
clients prefer individual interpretation
guidelines for communicating results -
know information in manual (especially validity information)
optimize the power of the test
use effective general counseling skills
develop multiple methods of explaining results
use visual aids to explain technical terms
use descriptive terms rather than numerical
provide range of scores and rationale for assessment
use probabilities rather than certainties, tentative interpretations rather than absolutes
discuss results in context of other information
involve clients in interpretation
monitor client reactions during interpretation
encourage client to ask questions
do not leave the client confused
communicating results to parents -
be prepared to answer questions and explain results
should understand testing used and symptoms of child's disorder
help parents adjust to diagnosis
be prepared to use variety of techniques
focus on active, coping approach
acknowledge parents' emotions
purpose: to disseminate assessment information to parents or other professionals
evaluate quality of reports before implementing suggested interventions
expect a comprehensive overview of the client and interpretation of results in a contextual manner
should be carefully crafted with attention to detail
be alert for typographical errors, use of vague jargon, careless mistakes