Definition of Reliability
The quality of a test such that it measures something consistently
Goal is to determine how much variability in test scores is due to errors in measurement and how much is due to variability in true test scores
Reliability is for a set of scores. SEM is for individual test items/scores
Observed Score and True Score
Observed: What you actually score on the test, as influenced by a number of variables
True: A 100% accurate reflection of what you really know, i.e., what the constructs are trying to measure. It is impossible to obtain a true score directly
The reliability coefficient ranges from .00 to 1.00. 0 = lack of reliability, 1 = total reliability. High-stakes testing tries to be around .9, while most valid tests shoot for .7 and above. The coefficient is the proportion of score variance that is due to true score variance. The remainder (for example, .3 when reliability is .7) is random measurement error.
It is the ratio of the true score variance to the total variance of the observed test scores
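A minimal sketch of that ratio, assuming the classical test theory view that observed variance is true-score variance plus uncorrelated error variance. The function name and the numbers are made up for illustration.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Ratio of true-score variance to total observed-score variance."""
    total_variance = true_variance + error_variance  # assumes observed = true + error
    return true_variance / total_variance

# Hypothetical numbers: 70 units of true-score variance, 30 of error variance.
r = reliability_coefficient(70.0, 30.0)
print(round(r, 2))      # share of variance due to true scores
print(round(1 - r, 2))  # share due to random measurement error
```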
Types of Reliability (4)
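Each entry names the type of reliability first and ends with the source of error variance it addresses (bold and italics, respectively, in the original cards)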
Test-retest- Used to determine whether a test is reliable over time. Correlate test scores from time 1 with test scores from time 2. Stability over time
Parallel (Alternate) Form- Used to determine if several forms of a test are reliable and consistent. Correlate scores on one form with scores on another. Equivalency of forms
Split-Half (Internal Consistency) Reliability- Used to determine if a test assesses one, and only one, dimension. Correlate each individual item with the total test score. Use Cronbach's alpha. Consistency among items
Interrater reliability- Used to determine consistency between raters of a test. Examine percentage of agreements between raters on open ended items. Scorer consistency
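The estimates above can be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: Pearson's r stands in for the test-retest and parallel-form correlations, Cronbach's alpha uses the usual variance-based formula, and interrater reliability is simple percent agreement. All function names and data are hypothetical.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per examinee."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # total score per examinee
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

def percent_agreement(rater_a, rater_b):
    """Share of responses on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Made-up scores for five examinees tested twice (test-retest).
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
print(pearson_r(time1, time2))

# Made-up ratings of four open-ended responses by two raters.
print(percent_agreement(["A", "B", "A", "C"], ["A", "B", "C", "C"]))
```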
Factors that contribute to reliability (test score) inconsistency
- Temporary but general characteristics of the participant: health, fatigue, emotional stress
- Specific characteristics: comprehension of the specific test task, techniques in dealing with test materials, attention fluctuation
- Testing situation: freedom from distraction, clarity of instructions, sex/race of examiner
- Luck in correctly guessing an answer. Momentary distractions (brain farts)
Standard Error of Measurement (SEM)
True Score + Error = Observed Score. The SEM estimates the typical size of that error
SEM is for the individual. Measures the difference between item scores for the one individual.
SEM = S × √(1 − r), where S = standard deviation of the test scores and r = reliability coefficient
Tracks the difference between total test scores
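A short sketch of the SEM calculation, assuming the standard classical test theory formula SEM = S * sqrt(1 - r) that the S and r legend on this card appears to refer to. The test values (SD of 15, reliability of .91, observed score of 100) are invented for illustration.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = S * sqrt(1 - r): S is the score SD, r the reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD of 15 and reliability of .91.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)

# Band of one SEM on either side of a hypothetical observed score of 100.
observed = 100.0
band = (observed - sem, observed + sem)
print(round(sem, 2), tuple(round(b, 2) for b in band))
```

A higher reliability shrinks the SEM, so the band around an individual's observed score gets narrower.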
Definition and Use of Sampling
Sampling- Ability to Generalize over time. Random selection allows for this the best
Sample- Actual observations from which we draw conclusions (usually the part of the population that is most easily accessible)
Population- Complete set of observations about which we draw conclusions. Using the entire population is known as a census
Probability vs Non-probability Sampling
- Probability- Each member of the sampling frame has a known probability of being selected. Not necessarily equal for everyone
- -- Simple random, systematic, stratified, cluster
- Non-probability- Individual selection probabilities are unknown
- -- Convenience, Quota, Purposive
General Method of Sampling
Define the observation unit, define the target population, define the boundaries of the population, define the sampling technique, obtain a sampling frame, select the sample
Types of Sampling: Simple Random Sampling
Every possible sample of size n has an equal chance of being selected from the population. The problem is that there is no control over the make-up of the sample. Other than size, you cannot impose any restrictions.
Ex:// Total 24, 9 females, 15 males. Take a random sample of 6: MMMMMF
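A minimal sketch of the example above using Python's standard `random.sample`. The member labels (F1..F9, M1..M15) and the seed are assumptions made only so the illustration is repeatable.

```python
import random

# Hypothetical frame matching the card: 9 females and 15 males.
frame = [f"F{i}" for i in range(1, 10)] + [f"M{i}" for i in range(1, 16)]

rng = random.Random(42)          # seeded only so the sketch is repeatable
sample = rng.sample(frame, k=6)  # every subset of 6 members is equally likely

males = sum(1 for person in sample if person.startswith("M"))
print(sample, f"{males} males, {6 - males} females")
```

Nothing in the draw constrains the male/female split, which is the lack of control over make-up the card describes.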
Types of Sampling: Systematic Sampling
- Selecting every kth element of the sampling frame, with a randomly selected starting point and a set selection interval.
- Ex:// Twenty people, start at #5 and then select every third
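The example above can be sketched as a list slice. The helper name `systematic_sample` is hypothetical, and drawing the random start from the first interval is an assumption for when no start is given.

```python
import random

def systematic_sample(frame, interval, start=None, rng=None):
    """Take every `interval`-th element, beginning at 1-based position `start`."""
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)  # random start within the first interval
    return frame[start - 1::interval]

people = list(range(1, 21))  # twenty people numbered 1..20
print(systematic_sample(people, interval=3, start=5))  # [5, 8, 11, 14, 17, 20]
```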
Types of Sampling: Stratified Sampling
The population is divided into sub-populations based on various characteristics. The categories are known as STRATA, and a sample is then randomly selected from each stratum.
Can be selected proportionally or weighted
Ex:// if freshmen GPA of 4.0 comprised 1% of the population, then for a sample size of 1000, you would select 10 students from the group of 4.0s
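A sketch of proportional allocation for the example above. The population size of 100,000 and the stratum names are assumptions chosen so that the 4.0 group is exactly 1%. Rounding each stratum separately can make the total differ slightly from the requested size in less tidy cases.

```python
import random

def proportional_stratified_sample(strata, sample_size, rng):
    """strata: dict mapping stratum name -> list of members."""
    population = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        n = round(sample_size * len(members) / population)  # proportional share
        chosen[name] = rng.sample(members, n)                # random draw within stratum
    return chosen

# Hypothetical population of 100,000 freshmen, 1% of whom have a 4.0 GPA.
strata = {
    "gpa_4.0": list(range(1_000)),
    "other": list(range(1_000, 100_000)),
}
picked = proportional_stratified_sample(strata, 1000, random.Random(7))
print({name: len(members) for name, members in picked.items()})
```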
Types of Sampling: Cluster Sampling
The clusters appear naturally in a population. Naturally forming groups. Population is divided into heterogeneous groups known as clusters.
After one or more cluster is selected, all members of the cluster are selected to participate. Often due to ease
Ex:// selecting a school in the area, then narrowing down to a sample of students (any students from an area high school)
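A minimal sketch of one-stage cluster sampling as described above: pick whole clusters at random, then keep every member of each selected cluster. The school names and rosters are invented.

```python
import random

# Hypothetical clusters: schools and their student rosters.
schools = {
    "North High": ["n1", "n2", "n3"],
    "South High": ["s1", "s2"],
    "East High": ["e1", "e2", "e3", "e4"],
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick clusters, then take all members of each picked cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in picked}

result = cluster_sample(schools, 1, random.Random(0))
print(result)
```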
Types of Sampling: Mid-cluster Sampling
- Two-stage sampling
- Cluster group is selected, then a random sample of groups is selected, then students are selected from that.
Ex.// School, all freshmen, then students with 3.9
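One way to read the two-stage idea is: select clusters first, then draw a random sample within each selected cluster instead of taking everyone. The sketch below assumes that reading. The names and data are hypothetical.

```python
import random

# Hypothetical clusters: schools and their freshman rosters.
schools = {
    "North High": ["n1", "n2", "n3", "n4", "n5"],
    "South High": ["s1", "s2", "s3"],
    "East High": ["e1", "e2", "e3", "e4"],
}

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters. Stage 2: random sample within each picked cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in picked
    }

stage_two = two_stage_sample(schools, n_clusters=2, n_per_cluster=2, rng=random.Random(3))
print(stage_two)
```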
Types of Sampling: Probabilistic Cluster Sampling
Random or stratified random. Selection probability is based on the size of the unit rather than being equal for all units. Better for weighting at the individual level.
Easiest to generalize but does not give equal representation.
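A sketch of size-based selection, assuming draws with replacement via the standard `random.choices` (real designs often sample without replacement). The school sizes are invented.

```python
import random

# Hypothetical units and their sizes (number of students).
school_sizes = {"North High": 1200, "South High": 300, "East High": 500}

def selection_probabilities(sizes):
    """Each unit's chance on a single draw is its share of the total size."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}

def pps_draw(sizes, n_draws, rng):
    """Draw units with probability proportional to size (with replacement)."""
    names = list(sizes)
    return rng.choices(names, weights=[sizes[n] for n in names], k=n_draws)

probs = selection_probabilities(school_sizes)
draws = pps_draw(school_sizes, 5, random.Random(11))
print(probs, draws)
```

Larger units are picked more often, which is why units do not get equal representation.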
Types of Sampling: Convenience Sampling
Based on a selection of elements from a pool of easily accessible elements (subject pools or volunteers)
- Volunteers tend to be liberal thinking, more educated, higher SES, and less representative of the general population
Types of Sampling: Quota Sampling
Begin with a matrix describing the target population. Collect data from persons representing the population. Similar to stratified sampling but with no randomness
Ex:// Ask participants to volunteer until you get every 'cell' filled. Address the percentages in the sample
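Filling the cells can be sketched as accepting volunteers in arrival order until each quota is met. The function name, the cells, and the volunteers are all hypothetical.

```python
def fill_quotas(volunteers, quotas):
    """volunteers: (name, cell) pairs in arrival order; quotas: cell -> count needed."""
    remaining = dict(quotas)
    accepted = []
    for name, cell in volunteers:
        if remaining.get(cell, 0) > 0:   # this cell still has room
            accepted.append((name, cell))
            remaining[cell] -= 1
        if not any(remaining.values()):  # every cell is filled
            break
    return accepted

quotas = {"female": 2, "male": 1}
volunteers = [("Ann", "female"), ("Bob", "male"), ("Cal", "male"),
              ("Dee", "female"), ("Eve", "female")]
print(fill_quotas(volunteers, quotas))
```

No random draw is involved: whoever volunteers first fills the cell, which is what separates this from stratified sampling.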
Types of Sampling: Purposive Sampling
Select on the basis of your knowledge of the population, its elements, and your research aims.
- - Often used for pilot testing or test manipulations
- - Have a particular person in mind and then find what most closely represents it
- Best used when members of subset are visible but enumeration is impossible (test leaders but not entire pop.)
Disproportionate Sampling --> Weighting
Occurs when a subgroup is oversampled, so the sample is not proportionate to the make-up of the total population. Weighting is then applied to correct for this
Ex:// AA are 25% of population, CC 75% of population. Test 5 of each. Multiply CC findings by 3
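A sketch of the weighting in this example, assuming each group's weight is its population share divided by its sample share. That gives CC three times the weight of AA, matching the card. The outcome values are made up.

```python
def design_weights(sample_counts, population_share):
    """Weight = population share / sample share, per group."""
    n_total = sum(sample_counts.values())
    return {g: population_share[g] / (sample_counts[g] / n_total) for g in sample_counts}

def weighted_mean(samples, population_share):
    """samples: group -> list of observed values."""
    weights = design_weights({g: len(v) for g, v in samples.items()}, population_share)
    numerator = sum(weights[g] * sum(values) for g, values in samples.items())
    denominator = sum(weights[g] * len(values) for g, values in samples.items())
    return numerator / denominator

population_share = {"AA": 0.25, "CC": 0.75}
samples = {"AA": [10] * 5, "CC": [20] * 5}  # hypothetical outcomes, 5 per group

weights = design_weights({"AA": 5, "CC": 5}, population_share)
print(weights)                                   # CC weight is 3x the AA weight
print(weighted_mean(samples, population_share))  # pulled toward the CC values
```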
Sampling Errors: Examples and Reasons
Not mistakes made by the researchers, but errors that arise from how the sample relates to the population.
Ex:// Sampling frames often do not include all members of the population, yet they do not indicate why those groups are excluded
Non-response bias- Often the differences are demographics and are accounted for through statistical modeling
Relationship between Reliability and Validity
You cannot have a valid test without it first being reliable. It has to measure the same thing over and over consistently (reliable), but what it is measuring may not be correct. This is where validity comes in.
Construct Validity- Purpose
Used to know if a TEST measures some underlying construct.
Correlates the set of test scores with some theorized outcome that reflects the construct for which the test is being designed.
Not limited to numerical support. Can be theory, interpretations, qualitative evidence
Construct Validity- Definition
Must begin with an explicit statement of the proposed interpretation of test scores.
A construct is a group of interrelated variables. It's all about the inferences and decisions we make: how the scores are going to be interpreted and used.
Validity then is a set of scores for specific constructs for use with interpretation.
Answers the question "Why am I using this instrument and what will the data show me?"
Use observations and then test for correlations between theoretical behaviors and actual behaviors.
Every use and purpose of the test needs to be validated. BUT validation is through scores, not through tests
Therefore if you determine that you do not have the validity evidence that you want (based on the 5 sources of evidence), then the test is not doing something that it should
Threats to Validation Process
Construct Irrelevant Variance- Something more than the construct is being tested
Construct Underrepresentation- Instrument does not fully represent or test all aspects of construct
Validation Sources of Evidence (5)
Test content, Response process, Internal Structure, Relation to other variables, Intended Consequences
Validation source of Evidence: Test Content
Matches with the instrument content (content match construct)
It can refer to themes, wording, format of items, tasks, as well as procedural guidelines for administration and scoring.
Validation source of Evidence: Response Process
Identified during either the developmental stages or during administration.
? Is the question requiring the person to do the action? (Apply, remember, etc.). Is the intended action being done?
Same question is also applied to the raters and judgers
Validation source of Evidence: Internal Structure
Identified during test development or adoption
How well the relationships among test items and test components conform to the constructs on which the test score interpretations are based.
? Is the item measuring the construct?
Conceptual framework is important. Can use Cronbach's alpha (CA) or split-half for evidence, along with correlation coefficients
Validation source of Evidence: Relations to other variables
- Convergent evidence- high correlation coefficients between test scores and other measures intended to assess the same or related constructs
- Discriminant evidence- test scores should not be related to measures of different constructs
Validation source of Evidence: Intended Consequences of Testing
? How accurately do tests predict criterion performance?
Have to wait and see. Do not know consequences so have to predict them and try to prevent. Usually deals with invalidity related to bias, fairness, distributive justice, etc.
Intended consequences: Shaping the curriculum, teaching to test
Unintended consequences: Narrowing curriculum, subpopulation differences, teaching to the test
Internal Validity: Definition and Purpose
Originates in experimental design and involves the interpretation/conclusions about a causal relationship (independent and dependent variables must be related and isolated from all other influences)
Concerned with the extent to which cause and effect relationship is isolated from competing influences
Threats to Internal Validity: Time Threats
Change occurs in the outcome variable measured over time within subjects, because of factors other than the independent variable (mostly with pre/post testing)
History, Maturation, Testing, Instrumentation
Threats to Internal Validity: History
A time threat. The situation in which some specific event occurs during the study, which in turn affects the results. It overpowers the treatment and occurs during treatment
Solution: Isolate participants in a lab setting, use a control group, use shorter time, select a dependent variable less prone to the effects of history
Threats to Internal Validity: Maturation
A time threat. Changes in participants. Any naturally occurring process within individuals over time that may cause a change in their performance. Could include fatigue, boredom, intellectual growth. Can NOT really be controlled
Solutions: Conduct study over a shorter period of time. Use a control group to compare maturation. Use group less prone to rapid maturation (NOT adolescents)
Threats to Internal Validity: Testing
(Test reactivity). A time threat. Testing on multiple occasions. Could be due to publication of a social indicator concerning the test. Participants are informed of this before the post-test
Solutions: Lengthen period between tests so they forget what is on the pre-test. Disguise use of pretest. Use control group so they have same reaction to published articles as experimental group
Threats to Internal Validity: Instrumentation
A time threat. Involves apparent changes instead of real changes. ONLY applies to pre/post test studies. Changes in instruments between pre/post tests (reviewers, points, criteria, structure, etc.)
Solutions: Standardized measurement procedure. Control group. Don't change things!
Threats to Internal Validity: Group Threats
Include rival explanations based on differences between groups (other than experimental ones intentionally created by the researcher). They are NOT due to treatment, they are due to group composition/make-up. The study has to have control/experimental groups
Statistical regression, selection, interactions with the selections
Threats to Internal Validity: Statistical Regression
Group threat. Extreme scores tend to regress toward the population mean on the post test. This operates to inflate the change scores of low groups and to deflate the change scores of high groups
Solutions: Avoid comparison of extreme groups, use highly reliable measures, use a control group for each extreme group.
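A small simulation can illustrate statistical regression. It assumes observed score = true score + random error, with no treatment at all between pretest and posttest. The distributions, group cut-offs, and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # seeded only so the sketch is repeatable

# Hypothetical examinees: stable true scores, independent error on each testing.
true_scores = [rng.gauss(100, 15) for _ in range(5000)]
pre = [t + rng.gauss(0, 10) for t in true_scores]
post = [t + rng.gauss(0, 10) for t in true_scores]  # no treatment applied

# Form extreme groups from the pretest: lowest and highest 10%.
ranked = sorted(range(len(pre)), key=lambda i: pre[i])
low, high = ranked[:500], ranked[-500:]

def mean_change(indices):
    """Average posttest-minus-pretest change for the given examinees."""
    return mean(post[i] - pre[i] for i in indices)

low_change = mean_change(low)
high_change = mean_change(high)
print(round(low_change, 1), round(high_change, 1))
```

The low group appears to improve and the high group appears to decline even though nothing changed, which is why a control group for each extreme group helps separate regression from a treatment effect.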
Threats to Internal Validity: Selection
Group threat. An effect may be due to the difference between the kinds of people in one experimental group as opposed to another. ALL quasi-experimental designs suffer from this, since their groups are NOT formed through random assignment.
Solutions: Random assignment, matching, check pretest equivalency (uses MANOVA or ANOVA)
Threats to Internal Validity: Interactions with selection
A group threat. Experimental groups mature at different rates, or treatment groups come from different settings, so they experience events differently.
Solutions: Not much, random assignment possibly
Threats to Internal Validity: Mortality
A group threat. Results from subjects leaving the study for different/systematic reasons.
Solution: Use a control group if the reasons for mortality are the same, shorter time interval between study's start and finish. Use pretests to compare scores of those who dropped out with those who did not
Threats to Internal Validity: Atypical Behaviors
- Group threats. Refers to actions of those assigned to control groups that receive less than favorable treatment.
- - Resentful demoralization: Reaction to getting no treatment
- - Diffusion or imitation of treatments- Control group talks with experimental groups. Finds out differences
- - Compensatory treatment- Administrators are reluctant to withhold treatment that may be favorable
Interactions between the groups or decisions made by those in power need to be limited. Monitor and separate, monitor and separate, monitor and separate.
External Validity: Definition and Purpose
The ability to generalize. Addresses the concern of how well the relationship between two variables generalizes across settings, samples, and times.
External Validity Threats (3)
- Interaction with persons (sample) and treatment
- Interaction of setting and treatment
- Interaction of time (history) and treatment
External Validity Threats: Person
Treatment effects only generalize to those who are selected in the same way as the present sample.
Solution: Random selection (probabilistic technique), ensure participation is convenient
External Validity Threats: Setting
Occurs when the treatment effects only generalize to those settings used in the study. Any other setting would not work.
Solution: Vary the settings and analyze the independent/dependent relationship within each setting
External Validity Threats: Time
Treatment effects only generalize to particular times, past or future. (Do not do it on or around special days)
Solution: Replicate the study at different times. Consult previous studies that may either confirm or refute relationships.