Validity is the most important characteristic of data from measures used in HR selection. It shows what selection measures assess and determines the kinds of conclusions we can draw from the data such measures produce.
- Definition: When we are concerned with the accuracy of judgements or inferences made from scores on selection measures, such as predictors, we are interested in their validity. In this sense, validity refers to the degree to which available evidence supports inferences made from scores on selection procedures.
- In HR selection, where selection procedure validity is concerned, we are most interested in evidence that supports inferences regarding a selection procedure's job relatedness. We want to evaluate our inferences; that is, we want to know how accurate the predictions we have made are.
One way to illustrate the process of making inferences from a measure's scores is to think of a common measure that many of us often make: a simple handshake. Think of the last time you met someone for the first time and shook that person's hand. How did the hand feel? Cold? Clammy? Rough? Firm grip? Limp grip? Did you make attributions about that person? E.g. if the person's hand was rough, did you conclude that they must do physical labour? If it was clammy and cold, that she must be nervous? If the grip was limp, that the person must not be assertive? If you had these or similar impressions, you drew inferences from a measure.
What is the evidence to support the inferences we have drawn? We study validity to collect evidence on inferences we can make from our measures.
With regard to selection, we want to know how well a predictor (such as a test) is related to criteria important to us. If a predictor is correlated with job performance criteria, then we can draw inferences from scores on the measure about individuals' future job performance in terms of these criteria.
E.g. if we have an ability test that is related to job performance, then scores on the test can be used to infer a job candidate's ability to perform the job in question. Because test scores are related to job performance, we can be assured that, on average, applicants who score high on the test will do well on the job.
Validity is not a property of the measurement procedure itself but of the inferences that can be made from scores on a measure. There is not just one validity; there can be many. The number will depend on the number of valid inferences to be made for the criteria available.
Validity is not an inherent property of the test; rather it depends on the inferences we can legitimately make from scores on the test. A legitimate inference is one in which scores on the selection measure are related to some aspect of job success or performance. In some cases, validity of inferences is expressed quantitatively; in other cases, judgmentally.
The research process we go through in discovering what and how well a selection procedure measures is called validation. In validation, we conduct research studies to accumulate information (that is, evidence) on the meaning and value of our inferences from a selection procedure.
- The results of this process represent evidence that tells us what types of inferences may be made from scores obtained on the measurement device. E.g. suppose that a manager believes that a master's degree is essential for satisfactory performance in a job involving technical sales. The manager is inferring that possession of the degree leads to adequate job performance and that lack of the degree results in unacceptable job performance.
- In validating the use of the educational credential as a selection standard, the manager attempts to verify the inference that the degree is a useful predictor of future job success. The manager is not validating the educational selection standard per se, but rather the inferences made from it. Therefore, many validities may relate to the standard. Thus validation involves the research processes we go through in testing the appropriateness of our inferences.
Reliability involves dependability, consistency and precision of measurement, which are important characteristics of any measurement device. However, it is possible to have a measure that is reliable but does not measure what we want for selection. E.g. a device could measure the colour of job applicants' eyes in a precise and dependable manner, but this would not predict applicants' job performance, because eye colour has no relation to how well people perform their jobs.
- Knowledge regarding what is being measured by this selection procedure and what it means for predicting job performance makes validity information the most important standard for judging the value of a selection procedure.
- Reliability and validity go hand in hand. We can have reliability without validity, but we cannot have validity without reliability. High reliability is a necessary but not a sufficient condition for high validity. The more reliability falls, the more likely it is that the maximum possible validity will also fall.
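The link between falling reliability and falling maximum validity can be made concrete with a small sketch. It assumes the classical test theory bound, which these notes do not state explicitly: the observed validity coefficient cannot exceed the square root of the product of the predictor's reliability and the criterion's reliability. The function name `max_validity` and the sample values are hypothetical.

```python
import math


def max_validity(predictor_reliability, criterion_reliability=1.0):
    """Upper bound on an observed validity coefficient.

    Assumption (classical test theory, not taken from the notes):
    r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy are the reliabilities
    of the predictor and the criterion.
    """
    for r in (predictor_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(predictor_reliability * criterion_reliability)


# Hypothetical values: the ceiling on validity drops as predictor reliability drops.
for rxx in (0.90, 0.70, 0.50):
    bound = max_validity(rxx, criterion_reliability=0.80)
    print(f"predictor reliability {rxx:.2f} -> maximum possible validity {bound:.3f}")
```

The bound is only a ceiling: a perfectly reliable measure (such as the eye colour device above) can still have a validity of zero.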
A validation study provides the evidence for determining the inferences that can be made from scores on a selection measure. Most often, such a study is carried out to determine the accuracy of judgments made from scores on a predictor about important job behaviours as represented by a criterion. A number of different strategies are used for obtaining this evidence to see whether these inferences are accurate and can be supported.
- The different validation strategies include: content validation, criterion-related validation (both concurrent and predictive) and construct validation.
- Additional strategies are validity generalization and synthetic validity.
Where possible, practitioners should rely on multiple validation strategies to support the use of a selection measure in employment decision making.
Content validation strategy
- The strategy is amenable to employment situations where only small numbers of applicants are actually being hired to fill a position. Such a situation is characteristic of many small businesses. When adequate amounts of predictor or criterion data are not available, the applicability of statistical procedures needed in other validation strategies is seriously curtailed.
- Adequate measures of job success criteria may not be readily available. For some jobs, such as those involving the delivery of services to clients, customers, etc., a suitable means for measuring employee success may be very difficult and prohibitively expensive to obtain. In such a context, content validation is a viable option because the need for quantitative measures of employee performance is minimized.
- Selection consultants employing a content validation strategy likely believe use of the strategy will lead to selection procedures that will enhance applicants' favourable perceptions of an organization's selection system.
- Content validation attempts to maximize the correspondence between the content of a job and content of selection procedures used for that job.
A selection measure has content validity when it can be shown that its content (items, questions, etc.) representatively samples the content of the job for which the measure will be used.
"Content of the job" is a collection of job behaviours and the associated knowledge, skills, abilities and other personal characteristics (KSAs) that are necessary for effective job performance. The behaviours and KSAs that represent the content of the job to be assessed are referred to as the job content domain. If a measure is to possess content validity, the content of the measure must be representative of the job content domain (this is true for both predictors and criteria).
The more closely the content of a selection procedure can be linked to actual job content, the more likely it is that a content validation strategy is appropriate.
- Content validation differs from other validation strategies in two important ways. First, the prime emphasis in content validation is on the construction of a new measure rather than the validation of an existing one. The procedures employed are designed to help ensure that the measure being constructed representatively samples what is to be measured, such as the KSAs required on a job.
- Second, the method principally emphasizes the role of expert judgement in determining the validity of a measure rather than relying on statistical methods. Judgements are used to describe the degree to which the content of a selection measure represents what is being assessed. Therefore, content validation is called a form of descriptive validity. With its emphasis on description, content validation contrasts with concurrent and predictive validation strategies, where the emphasis is on statistical prediction.
Face validity concerns only whether a measure appears to be measuring what is intended. A selection test has face validity if it appears to job applicants taking the test that it is related to the job.
Face validity is useful in helping to avoid applicants filing discrimination charges against an organisation. If a measure has more face validity, then applicants are going to be more accepting of the outcome.
Major aspects of content validation - there are a number of ways for examining the link between content of a job and content of a predictor:
1. Showing that the predictor is a representative sample of the job domain.
2. Demonstrating that predictor content measures the KSAs required for successful performance on the job.
3. Using subject matter experts to make judgements regarding the overlap between the KSAs required to perform well on a predictor and those KSAs needed for successful job performance.
Key elements of implementing a content validity strategy:
- 1. Conduct of a Comprehensive Job Analysis - Job analysis is the heart of any validation study. In particular, job analysis is the essential ingredient in the successful conduct of a content validation study. The results of a job analysis serve to define the job content domain. The job content domain consists of the work activities, the KSAs and any other worker characteristics needed to perform these work activities. The identified job content domain need not include all work activities and KSAs that compose the job. Only those work activities and KSAs deemed most critical to the job need to be considered. By matching the identified job content domain to the content of the selection procedure, content validity is established.
- A job analysis should result in the following products:
- A. A description of the tasks performed on the job.
- B. Measures of the criticality and/or importance of the tasks.
- C. A specification of KSAs required to perform these critical tasks.
- D. Measures of the criticality and/or importance of KSAs which include
- 1- An operational definition of each KSA
- 2 - A description of the relationship between each KSA and each job task
- 3 - A description of the complexity/difficulty of obtaining each KSA
- 4 - A specification as to whether an employee is expected to possess each KSA before being placed on the job or being trained.
- 5 - An indication of whether each KSA is necessary for successful performance on the job.
- E. Linkage of important job tasks to important KSAs. Linking tasks to KSAs serves as a basis for determining the essential tasks to simulate in a behaviourally oriented selection measure, such as work simulation, or for determining critical KSAs to tap in a paper and pencil measure such as a multiple choice test.
- Each important job task identified in the job analysis will likely require at least some degree of a KSA for successful task performance. Here, the KSAs required to perform these tasks are specified. Most often, these KSAs are identified by working with subject matter experts who have considerable knowledge of the job and the necessary KSAs needed to perform it. This step typically involves subjective judgment on the part of participants in identifying the important KSAs. Because inferences are involved, the emphasis in this step is on defining specific KSAs for specific job tasks. By focusing on specific definitions of tasks and KSAs, the judgments involved in determining what KSAs are needed to perform which tasks are less likely to be subject to human error.
- 2. Selection of experts participating in a content validity study - as we have noted, the application of content validity requires the use of expert judgment. Usually, these judgments are obtained from job incumbents and supervisors serving as subject matter experts (SMEs). Subject matter experts are individuals who can provide accurate judgments about the tasks performed on the job, the KSAs necessary for performing these tasks, and any other information useful for developing selection measure content. Because of their importance, it is essential that these judges be carefully selected and trained. In using SMEs in a content validation study, it is important to report their qualifications and experience. Ideally, members of protected classes (e.g. gender and race/ethnicity) should be represented among the SMEs. Details should also be given regarding their training and the instructions they received while serving as SMEs. Finally, information on how SMEs made their judgments in the validation study - e.g. as individuals or in a group consensus - should also be provided.
- 3. Specification of Selection Measure Content - Once the job tasks and KSAs have been appropriately identified, the items, questions, or other content that compose the selection measure are specified. This phase of content validation is often referred to as domain sampling. That is, the items, questions, or other content are chosen to constitute the selection measure so they represent the behaviours or KSAs found important for job performance. The content included is in proportion to the relative importance of the job behaviours or KSAs found important for job performance. Subject matter experts who are knowledgeable of the job in question review the content of the measure and judge its suitability for the job. Final determination of selection measure content depends on these experts' judgements. To aid the review of selection measure content, such as a multiple-choice test or structured interview, Richard Barrett proposed the Content Validation Form II (CVFII). Basically, the form consists of questions that serve as a guide or audit for analysing the appropriateness of selection measure content in a content validation study. Individuals knowledgeable of the job (for example, incumbents), working with a test development specialist, record their evaluations and the rationale behind them for each part of the test being reviewed. The questions on the CVFII are organised into three test review areas. The CVFII review areas and sample questions from each are as follows:
- A. Selection Procedure as a Whole - Is an adequate portion of the job covered by the selection procedure? Does the selection procedure, combined with other procedures, measure the most important content of the job?
- B. Item by Item Analysis - How important is the ability to respond correctly to successful job performance? Are there serious consequences for safety or efficiency if the applicant does not have the information or skills needed to respond correctly?
- C. Supplementary Indications of Content Validity - Are the language and mathematics demands of the test commensurate with those required by the job? Would applicants who are not test-wise be likely to do as well as they should?
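The domain sampling idea in step 3, where content is included in proportion to the relative importance of each behaviour or KSA, can be sketched as a simple allocation of test items. This is only an illustration: the function name `allocate_items`, the KSA labels and the importance weights are hypothetical, and the largest-remainder rounding rule is an assumption made here, not something the notes prescribe. The final content would still be reviewed by subject matter experts.

```python
def allocate_items(importance, total_items):
    """Split total_items across KSAs in proportion to their importance weights.

    Rounding uses the largest-remainder rule (an assumption for this sketch)
    so that the item counts always add up to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    if any(weight < 0 for weight in importance.values()):
        raise ValueError("importance weights must not be negative")
    total_weight = sum(importance.values())
    if total_weight <= 0:
        raise ValueError("at least one importance weight must be positive")

    # Exact (fractional) share of items for each KSA.
    exact = {ksa: total_items * w / total_weight for ksa, w in importance.items()}
    # Start from the whole-number part of each share.
    counts = {ksa: int(share) for ksa, share in exact.items()}
    # Hand out the leftover items to the largest fractional remainders.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for ksa in by_remainder[:leftover]:
        counts[ksa] += 1
    return counts


# Hypothetical importance ratings from a job analysis.
ratings = {"reading drawings": 5, "knowledge of mathematics": 3, "safety procedures": 2}
print(allocate_items(ratings, total_items=20))
```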
- Predictor Fidelity
- In developing a predictor to be used in selection (for example, a selection interview, a work simulation, or a multiple-choice test), one key issue is the fidelity or comparability between the format and content of the predictor and the performance domain of the job in which the predictor will be used. Two types of predictor/job performance fidelity must be addressed: (A) physical fidelity and (B) psychological fidelity. Physical fidelity concerns the match between how a worker actually behaves on the job and how an applicant for that job is asked to behave on the predictor used in selection. E.g. a driving test for truck driver applicants that requires driving activities that an incumbent truck driver must actually perform, such as backing up a truck to a loading dock, has physical fidelity. A typing test requiring applicants to use certain word processing software to type actual business correspondence that is prepared by current employees using the same software also has physical fidelity. Psychological fidelity occurs when the same knowledge, skills, and abilities required to perform the job successfully are also required on the predictor. E.g. a patrol police officer may have to deal with hostile and angry citizens. Asking police officer job applicants to write a statement describing how they would handle a hostile individual is not likely to have psychological fidelity. KSAs different from those required on the job, such as the ability to express one's self in writing, may be called for by the written statement. On the other hand, psychological fidelity might be found in a role-playing simulation in which police patrol applicants interact with an actor playing the role of an angry citizen. Finally, one other point should be made with regard to selection measure-job fidelity. This issue concerns the extent to which a test measures KSAs that are not required by the job under study. This problem can be particularly troublesome with multiple-choice (MC) tests.
The real issue with MC tests is whether they assess KSAs that are not required for successful job performance. E.g. candidates being assessed for promotion may be asked to memorize voluminous materials for a knowledge test, but on the job itself incumbents are not required to retrieve the information from memory to perform the job. Necessary information for performing the job can be obtained from available manuals or other reference sources. In that case, the KSAs covered by the selection procedure differ from those required to perform the job. In general, when the physical and psychological fidelity of the selection predictor mirrors the performance domain of the job, content validity is enhanced. Such a measure is psychologically similar because applicants are asked to demonstrate the behaviours, knowledge, skills, and abilities required for successful incumbent performance.
- 4. Assessment of Selection Measure and Job Content Relevance - Another important element in content validation is determining the relevance of selection measure content for assessing the content of the job. After a test or some other measure has been developed, an essential step in determining content validity is to have job incumbents judge the degree to which KSAs identified from the job analysis are needed to answer the test questions. These incumbents actually take the test and then rate the extent of test content-to-KSA overlap. A rating instruction such as the following can be used: To what extent are the knowledge, skills and abilities identified in the job analysis needed to answer these questions (or perform these exercises)?
- 1= not at all
- 2= to a slight extent
- 3= to a moderate extent
- 4= to a great extent
- A quantitative procedure called the content validity ratio (CVR) can be used to determine the extent to which KSAs are needed to answer each question. The CVR is an index computed on ratings made by a panel of experts (job incumbents and supervisors) regarding the degree of overlap between the content of a selection measure and content of the job. Each panel member is presented with the contents of a selection measure and asked to make ratings of these contents. E.g. the panel members are given items on a job knowledge test. Members judge each test item by indicating whether the KSA measured by the item is
- A - essential,
- B - useful but not essential, or
- C - not necessary for job performance.
- These judgements are used in the following formula to produce a CVR for each item on the test: CVR = (Ne - N/2) / (N/2), where Ne is the number of judges rating the test item as essential, and N is the total number of judges on the rating panel. The computed CVR can range from +1.00 (all judges rated the item essential) through 0.00 (exactly half of the judges rated it essential) to -1.00 (no judge rated it essential). Because it is possible for a CVR to occur by chance, it is tested for statistical significance using tables presented by Lawshe. Statistically significant items would suggest correspondence with the job. Nonsignificant items that most of the judges do not rate "essential" can be eliminated from the test. By averaging the CVR item indexes, a Content Validity Index (CVI) for the test as a whole can also be derived. The CVI indicates the extent to which the panellists believe the overall ability to perform on the test overlaps with the ability to perform on the job. Hence the index represents overall selection measure and job content overlap.
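The CVR and CVI arithmetic described above can be sketched in a few lines. The function names, the item labels and the panel counts below are hypothetical illustrations. The sketch does not reproduce Lawshe's significance tables, so it only computes the indexes and leaves the significance check to a lookup in those tables.

```python
def content_validity_ratio(n_essential, n_judges):
    """CVR = (Ne - N/2) / (N/2) for a single test item."""
    if n_judges <= 0:
        raise ValueError("the panel must contain at least one judge")
    if not 0 <= n_essential <= n_judges:
        raise ValueError("n_essential must lie between 0 and n_judges")
    half = n_judges / 2
    return (n_essential - half) / half


def content_validity_index(cvr_values):
    """CVI as the plain average of the item CVRs passed in."""
    cvr_values = list(cvr_values)
    if not cvr_values:
        raise ValueError("at least one CVR is needed")
    return sum(cvr_values) / len(cvr_values)


# Hypothetical panel of 10 judges; values are counts of "essential" ratings.
essential_counts = {"item_1": 10, "item_2": 8, "item_3": 5, "item_4": 2}
cvrs = {item: content_validity_ratio(ne, 10) for item, ne in essential_counts.items()}
for item, cvr in cvrs.items():
    print(f"{item}: CVR = {cvr:+.2f}")
print(f"CVI = {content_validity_index(cvrs.values()):+.2f}")
```

In practice the CVI would be computed after dropping items that fail the significance check, so the value printed here (which averages every item) is only illustrative.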
The content validation approach can have wide applicability in the selection of individuals for jobs requiring generally accepted KSAs (e.g. reading ability, knowledge of mathematics, ability to read drawings, etc.).
Where should the measure's contents come from? For the contents to be most representative, they should be derived from what incumbents actually do on the job.
Inappropriateness of content validation
- Because content validation is based principally on expert judgment, as long as a selection measure assesses observable job behaviors (e.g. a driving test used to measure a truck driver applicant's ability to drive a truck), the inferential leap in judging between what a selection device measures and the content of the job is likely to be rather small. However, the more abstract the nature of a job and the KSAs necessary to perform it, the greater the inferential leap required in judging the link between the content of the job and content of a selection measure.
- Where inferential leaps are large, error is more likely to be present. Therefore, it is much more difficult to accurately establish content validity for those jobs characterized by more abstract functions and KSAs than for jobs whose functions and KSAs are more observable. For these situations, other validation strategies, such as criterion-related ones, are necessary.
Job analysis and content validation
A central concept of content validity is that selection measure content must appropriately sample the job content domain. When there is congruence between the KSAs necessary for effective job performance and those KSAs necessary for successful performance on the selection procedure, then it is possible to infer how performance on the selection procedure relates to job success. Without a clearly established link between these two sets of KSAs, such an inference is unwarranted. Whenever the specific content of the measure and the KSAs required to perform the tasks of a job differ, then an inferential leap or judgment is necessary to determine whether the measure appropriately samples the job. How do we establish this job-KSA link with the selection-procedure-KSA link in a content validity study? A carefully performed, detailed job analysis is the foundation for any content validity study.
- The first inference point is from the job itself to the tasks identified as composing it. Where careful and thorough job analysis techniques focusing on job tasks are used, the judgments necessary for determining whether the tasks accurately represent the job will probably have minimal error.
- The second inference point is from the tasks of the job to the identified KSAs required for successful job performance. Here again, complete, thorough job analyses can minimize possible error.
- The third inference point is the most critical. It is at this point that final judgments regarding content validity of the selection measure are made. Here we are concerned with the physical and psychological fidelity between the measure and the job performance domain.
- Specifically, to make the inferential leap supporting content validity, we must address three important issues that contribute to physical and psychological fidelity:
- 1. Does successful performance on the selection measure require the same KSAs needed for successful job performance?
- 2. Is the mode used for assessing test performance the same as that required for job or task performance?
- 3. Are KSAs not required for the job present in our predictor?
- If we can deal with these issues successfully, our inferential leaps can be kept small.
- For jobs that are directly observable (e.g. typist), only small inferences may be required in judging the relation between what is done on the job, the KSAs necessary to do the job, and the KSAs assessed by the selection measure.
- For jobs whose activities and work processes are less visible and more abstract (e.g. executive), greater inferences must be made between job activities, applicant requirements for successful performance, and selection measure content. A greater inferential leap means errors are more likely and content validity is more difficult to establish. In such cases it is advisable to use another validation strategy, such as a criterion-related approach.
Uniform Guidelines - these recognise the limits of content validation and specify some situations where content validation alone is not appropriate; in these situations, other validation methods must be used. These situations include the following:
- 1. When mental processes, psychological constructs, or personality traits (such as judgment, integrity, dependability, motivation) are not directly observable but inferred from the selection device.
- 2. When the selection procedure involves KSAs an employee is expected to learn on the job.
- 3. When the content of the selection device does not resemble a work behavior; when the setting and administration of the selection procedure does not resemble the work setting.
As a validation method, content validation differs from criterion-related techniques in the following ways:
- A. In content validity, the focus is on the selection measure itself, while in criterion-related validity, the focus is on an external variable;
- B. Criterion-related validity is narrowly based on a specific set of data, whereas content validity is based on a broader base of data and inference;
- C. A statement of criterion-related validity is couched in terms of precise quantitative indices (prediction), whereas content validity is generally characterized by using broader, more judgmental descriptors.
Because content validation emphasizes judgmental rather than statistical techniques for assessing the link between the selection standard and indicators of job success, some writers question its use as a validation strategy.
- Their main criticism is that content validity is primarily concerned with inferences about the construction of content of the selection procedure rather than with predictor scores. Thus because validity of selection standards concerns the accuracy of inferences made from predictors, some have argued that content validity is not really validity at all.
- In contrast to this criticism of content validity, there are situations (as we saw earlier) in which content validation is the only practical option available. E.g. small sample sizes, such as when only a few individuals are eligible for promotion into a position, may necessitate a content validation approach. In addition, reliable criterion information may not be available because of existing lawsuits in an organization or union actions that prohibit the collection of job performance data.
Logically, it might seem reasonable to conclude that inferences made from a content-related strategy will overlap those inferences made from criterion-related strategies.
- For cognitive predictors (such as psychomotor and performance tests, selection interviews, biographical data scores, knowledge tests, work sample tests) that are positively correlated with one another as well as with a measure of job success, whether there is a match or mismatch between content of the predictors and content of the job (content validity) is unlikely to have a meaningful influence on criterion-related validity. However, for personality inventories, matching predictor and job content (that is, establishing content validity) might be an important determinant of criterion-related validity.
- Basically, content validity reflects only the extent to which KSAs identified for the job domain are judged to be present in a selection measure. Therefore, as our database of inferences regarding validity is built, it is essential we include both content and criterion-related evidence.
Criterion Related Validation Strategies
- Inferences about performance on some criterion from scores on a predictor, such as an ability test, are best examined through the use of a criterion-related validation study. Two approaches are typically undertaken when conducting a criterion-related study:
- a. A concurrent validation study
- b. a predictive validation study
- In some ways these approaches are similar, as information is collected on a predictor and a criterion, and statistical procedures are used to test for a relation between these two sources of data.
- Results from these procedures answer the question: can valid inferences about job applicants' performance on the job be made based on how well they performed on our predictor? Although concurrent and predictive strategies share a number of similarities, we have chosen to discuss these strategies separately in order to highlight their unique characteristics.
Concurrent validation (present employee method)
In a concurrent validation strategy, information is obtained on both a predictor and a criterion for a current group of employees. Because predictor and criterion data are collected at roughly the same time, this approach has been labeled "concurrent validation". Once the two sets of data have been collected, they are statistically correlated. The validity of the inference to be drawn from the measure is signified by a statistically significant relationship (usually determined by a correlation coefficient) found between the predictor and the measure of job success, or criterion.
Steps for concurrent validation. (my version)
- 1. Thorough job analysis.
- 2. Uncover critical tasks actually performed on the job.
- 3. Infer KSAs and other characteristics necessary for successful job performance from critical tasks.
- 4. After identifying the requisite KSAs, the next step is to select or develop those tests that appear to measure the relevant attributes necessary for job success.
- 5. Seek expert advice to ensure tests are appropriate.
- 6. Administer the selected tests to employees already working in the firm, telling them participation is voluntary.
- 7. As part of the job analysis, identify measures of job success that can serve as criteria. Collect criterion information such as performance appraisal ratings or objective measures such as number of errors made.
- 8. Analyse results using a statistical procedure (e.g. the Pearson product-moment correlation coefficient).
Steps for Concurrent Validation - Text book version
- 1. Conduct analyses of the job.
- 2. Determine relevant KSAs and other characteristics required to perform the job successfully.
- 3. Choose or develop the experimental predictors of these KSAs.
- 4. Select Criteria of job success.
- 5. Administer predictors to current employees and collect criterion data.
- 6. Analyze predictor and criterion data relationships.
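The final analysis step in both versions of the concurrent procedure, correlating predictor scores with criterion data, can be sketched as follows. The employee data are invented for illustration and the function names are hypothetical. The significance check uses the standard t statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which the notes do not mention; the resulting value would be compared against a critical value from a t table.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing whether r differs from zero (n - 2 degrees of freedom)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented data for 8 current employees: test scores and supervisor ratings.
test_scores = [52, 61, 47, 70, 66, 58, 75, 49]
performance_ratings = [3.1, 3.8, 2.9, 4.2, 3.6, 3.4, 4.5, 3.0]

r = pearson_r(test_scores, performance_ratings)
t = t_statistic(r, len(test_scores))
print(f"validity coefficient r = {r:.3f}, t = {t:.2f} on {len(test_scores) - 2} df")
```

The same calculation applies to a predictive study; only the timing of the criterion data collection and the source of the sample (applicants rather than incumbents) differ.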
Strengths and weaknesses of concurrent validation
- Strength - almost immediate information on the usefulness of a selection device.
- Weaknesses - several factors can affect the usefulness of concurrent validation: differences in job tenure or length of employment of the employees who participate in the study, the representativeness or unrepresentativeness of present employees relative to job applicants, certain individuals missing from the validation study, the motivation of employees to participate in the study, or employee manipulation of answers to some selection predictors.
Predictive validation - (future employee or follow up method)
Rather than collecting predictor and criterion data at one point in time (as in concurrent validation), predictive validation involves the collection of data over time. In the context of HR selection, job applicants rather than job incumbents are used as the source of data.
steps for predictive validation (my version)
- 1. Job analysis
- 2. Choice of criteria
- 3. Selection of tests
- 4. Test administration - tests are administered to job applicants rather than current employees.
- 5. Once administered, the measures are filed and applicants are selected on the basis of other available data (e.g. other tests, interviews, etc.).
- 6. Selection decision made.
- 7. After 6 months on the job, criterion data representing job success are collected on the hired applicants.
- 8. The two sets of scores are statistically correlated and examined for a possible relationship.
steps for predictive validation (textbook)
- 1. conduct analyses of the job.
- 2. Determine relevant KSAs and other characteristics required to perform the job successfully.
- 3. Choose or develop the experimental predictors of these KSAs.
- 4. Select criteria of job success.
- 5. Administer predictors to job applicants and file results.
- 6. After passage of suitable period of time collect criterion data.
- 7. Analyze predictor and criterion data relationships.
Strengths and weaknesses of Predictive validation
- Strengths -
- Because information is collected from applicants, their motivation may be higher and more realistic when completing the predictor measure.
- Differences between study participants and subsequent applicants with respect to job tenure are not an issue.
- Weaknesses -
- Takes longer to determine the validity of the results.
- If the organisation hires relatively few people a month, it may take many months to obtain a sufficient sample size to conduct a predictive validation study.
What inferences will the design of a criterion-related study permit?
Types of predictive validation design
- 1. Follow-up - Random Selection:
- Applicants are tested and selection is random; predictor scores are correlated with subsequently collected criterion data.
- 2. Follow-up - Present System:
- Applicants are tested and selection is based on whatever selection procedures are already in use; predictor scores are correlated with subsequently collected criterion data.
- 3. Select by Predictor:
- Applicants are tested and selected on the basis of their predictor scores; predictor scores are correlated with subsequently collected criterion data.
- 4. Hire and then Test:
- Applicants are hired and placed on the payroll; they are subsequently tested (e.g., during a training period), and predictor scores are correlated with criteria collected at a later time.
- 5. Personnel File Research:
- Applicants are hired and their personnel records contain references to test scores or other information that might serve as predictors. At a later date, criterion data are collected. The records are searched for information that might have been used and validated had it occurred to anyone earlier to do so.
Concurrent versus Predictive Validation Strategies
The general assumption is that a predictive validation design is superior to a concurrent one.
- A review of 99 published criterion-related validity studies showed a greater number of validation studies based on a concurrent validation design than on a predictive one. Minimal differences were found in the validation results of the two types of designs.
- Another review of validity estimates of ability tests revealed no significant differences in validity estimates derived from the two designs.
- For ability tests, results from these studies suggest that a concurrent validation approach may be just as viable as a predictive one.
- Predictive designs yield validity estimates roughly 0.05 to 0.1 lower than those obtained in concurrent designs for personality inventories, structured interviews, person-organization fit measures, and biographical data inventories.
Requirements for a Criterion-Related Validation Study
- Certain minimum requirements must be met for a criterion-related validation study.
- 4 requirements:
- 1. The job should be reasonably stable and not in a period of change or transition. The results of a study based on a situation at one point in time may not apply to the new situation.
- 2. A relevant, reliable criterion that is free from contamination must be available or feasible to develop.
- 3. It must be possible to base the validation study on a sample of people and jobs that is representative of people and jobs to which the results will be generalized.
- 4. A large enough, and representative, sample of people on whom both predictor and criterion data have been collected must be available. Large samples (more than several hundred) are frequently required to identify a predictor-criterion relationship if one really exists. With small samples, it may be mistakenly concluded that a predictor is not valid when in fact it is. The probability of finding that a predictor is significantly related to a criterion when it is truly valid is lower with small sample sizes than with large ones. Therefore, large sample sizes are essential.
Criterion-Related Validation over Time
- The predictive validity of some measures decays rapidly over time.
- One study found that the predictive validity of mental ability tests actually increased over time, the validity of job experience decreased, and the validity of dexterity tests remained the same.
- A properly developed and validated mental ability test should be valid for at least five years.
The courts and criterion-related validation
- Legal realities faced by employers and selection consultants:
- 1. Rather than considering empirical validity evidence, some courts prefer to judge validity on the basis of the format or content of the selection instrument (e.g. multiple-choice format or the content of an item).
- 2. Some courts were swayed by a test's legal history (e.g. Wonderlic Personnel Test) even though existing evidence was available on the validity of the test; others were influenced by the type of test used (e.g. general aptitude tests such as a vocabulary test).
- 3. Judges had different preferences with regard to the use of a predictive validation strategy versus a concurrent validation strategy for demonstrating selection procedure validity.
- 4. A statistically significant validity coefficient alone did not guarantee a judgement for the defendant; some courts also considered the utility of the selection measure. However, they differed as to what evidence is needed to demonstrate utility.
- 5. Judges differed on their willingness to accept statistical corrections (e.g. restriction of range corrections) to predictor scores. Some judges apparently believed that corrections by the defendant were misleading and done to make validity appear higher than it really was.
- The higher the validity coefficient, the better for selection practice and the better for legal defense.
Content versus criterion-related validation: Some requirements
The choice of a validation strategy implies that certain requirements must first be evaluated and met. Each of the validation strategies discussed up to this point has a particular set of requirements. These requirements must be met for a specific strategy to be viable. A review of these requirements provides a means for determining the feasibility of a particular validation methodology. The requirements listed here serve as considerations for deciding the feasibility of a particular validation approach; they are not complete technical requirements.
Content validation feasibility considerations
- 1. Must be able to obtain a complete, documented analysis of each of the jobs for which the validation study is being conducted, which is used to identify the content domain of the job under study.
- 2. Applicable when a selection device purports to measure existing job skills, knowledge, or behaviour. Inference is that content of the selection device measures content of the job.
- 3. Although not necessarily required, should be able to show that a criterion related methodology is not feasible.
- 4. Inferential leap from content of the selection device to job content should be a small one.
- 5. Most likely to be viewed as suitable when skills and knowledge for doing a job are being measured.
- 6. Not suitable when abstract mental processes, constructs or traits are being measured or inferred.
- 7. May not provide sufficient validation evidence when applicants are being ranked.
- 8. A substantial amount of the critical job behaviors and KSAs should be represented in the selection measure.
Criterion-related validation feasibility considerations.
- 1. Must be able to assume the job is reasonably stable and not undergoing change or evolution.
- 2. Must be able to obtain a relevant, reliable, and uncontaminated measure of job performance (that is, a criterion).
- 3. Should be based as much as possible on a sample that is representative of the people and jobs to which the results are to be generalized.
- 4. Should have adequate statistical power in order to identify a predictor-criterion relationship if one exists. To do so, must have:
- A. Adequate sample size;
- B. Variance or individual difference in scores on the selection measure and criterion
- 5. Must be able to obtain a complete analysis of each of the jobs for which the validation study is being conducted. Used to justify the predictors and criteria being studied.
- 6. Must be able to infer that performance on the selection measure can predict future job performance.
- 7. Must have ample resources in terms of time, staff and money.
Construct Validation strategy: tests our hypothesis. It is a research process involving the collection of evidence used to test hypotheses about relationships between measures and their constructs.
Instead of directly testing or using other information to predict job success, some selection methods seek to measure the degree to which an applicant possesses psychological traits called constructs. Constructs include intelligence, leadership ability, verbal ability, mechanical ability, manual dexterity, etc. Constructs deemed necessary for successful performance of jobs are inferred from job behaviors and activities as summarized in job descriptions. They are the job specifications part of job descriptions. Construct validity requires demonstrating that a statistically significant relationship exists between a selection procedure or test and the job construct it seeks to measure. For example, does a reading comprehension test reliably measure how well people can read and understand what they read?
Major steps for implementing a construct validation study as follows:
- 1. The construct is carefully defined and hypotheses are formed concerning the relationships between the construct and other variables.
- 2. A measure hypothesized to assess the construct is developed.
- 3. Studies testing the hypothesized relationships (formed in step 1) between the constructed measure and other, relevant variables are conducted.
Because construct validation may be conducted when no available measure exists, the "thing" or "construct" being validated requires a number of measurement operations. Results of studies such as the following are particularly helpful in construct validation.
- 1. Intercorrelations among the measure's parts should show whether the parts cluster into one or more groupings. The nature of these groupings should be consistent with how the construct is defined.
- 2. Parts of the measure belonging to the same grouping should be internally consistent or reliable.
- 3. Different measures assessing the same construct as our developed measure should be related with the developed measure. Measures assessing different constructs that are not hypothesized to be related to the construct of interest should be unrelated.
- 4. Content validity studies should show how experts have judged the manner in which parts of the measure were developed and how these parts of the measure sampled the job content domain.
Construct validation is a process of accumulating empirical evidence of what a selection measure measures. The more evidence we collect, the more assurance we have in our judgements that a measure is really doing what was intended.
Construct validation represents a much broader definition of validity than we might find in a single criterion-related or content validation study. Through accumulated evidence (which may come from other validation strategies, literature reviews, controlled experiments, etc.), we can answer what a selection measure assesses and how well it does so. Construct validation is still a developing issue. There is no complete, uniform agreement on the exact methods the strategy entails. Future clarification of the strategy will also clarify its application.
Empirical considerations in criterion-related validation strategies - Even when we have conducted content validation studies on a selection measure, at some point we probably will want to answer two important questions:
- 1. Is there a relationship between applicants' responses to our selection measure and their performance on the job?
- 2. If so, is the relationship strong enough to warrant the measure's use in employment decision making?
Computing validity coefficients - a validity coefficient is simply an index that summarizes the degree of relationship between a predictor and a criterion. Where does the validity coefficient come from? What does it mean?
- Ideally, we need at least several hundred people on whom both predictor and criterion data are available. Large sample sizes are essential.
- For each employee we have a predictor and a criterion score.
- A scattergram or scatterplot of data is used to visually inspect any possible relationships between predictor and criterion variables. Each point in the graph represents a plot of the pair of scores for a single sales person. Although a scattergram is useful for estimating the existence and direction of a relationship, it really does not help us specify the degree of relationship between our selection measure and job performance.
- A more precise approach is to calculate an index that summarizes the degree of any linear relationship. Most often the Pearson product-moment or simple correlation coefficient (r) is used to provide that index.
The correlation coefficient, or validity coefficient, summarizes the relationship between our predictor and criterion.
- A validity coefficient has two important elements:
- A: Its sign
- B: Its magnitude.
- The sign (+ or -) indicates the direction of a relationship, while the magnitude indicates the strength of association between a predictor and criterion. The coefficient itself can range from -1.00 through 0.00 to +1.00. As it approaches +1.00, there is a positive relationship between performance on a selection measure and a criterion, e.g. high predictor scores are associated with high criterion scores and low predictor scores with low criterion scores.
- As the coefficient moves towards -1.00, a negative or inverse relationship appears between scores on the predictor and the criterion.
- When the validity coefficient is not statistically significant or r is equal to 0.00, we conclude that no relationship exists between a predictor and a criterion.
- IF A VALIDITY COEFFICIENT IS NOT STATISTICALLY SIGNIFICANT THEN THE SELECTION MEASURE IS NOT A VALID PREDICTOR OF A CRITERION.
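As a rough illustration of where a validity coefficient comes from, here is a minimal Python sketch that computes the Pearson r for paired predictor and criterion scores. The data are made up, the function names are my own, and the p-value uses the Fisher z normal approximation rather than the exact test a statistics package would report.

```python
import math
from statistics import NormalDist


def pearson_r(predictor, criterion):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-tailed p-value via the Fisher z transformation.

    Only an approximation, and undefined when r is exactly +1 or -1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical toy data: far too few people for a real validation study.
inventory_scores = [10, 20, 30, 40, 50]
performance_ratings = [2, 3, 5, 4, 6]

r = pearson_r(inventory_scores, performance_ratings)
p = approx_p_value(r, len(inventory_scores))
print(f"validity coefficient r = {r:.2f}, approximate p = {p:.3f}")
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, as described in the points above.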
Importance of Large Sample Sizes
- The number of people on whom we have both predictor and criterion data for computing a validity coefficient is referred to as the sample size (or N) of a validation study. There are at least three reasons why it is absolutely essential to have as large a sample size as possible in calculating a validity coefficient:
- 1. A validity coefficient computed on a small sample (e.g. 20) must be higher in value to be considered statistically significant than a validity coefficient based on a large sample (e.g. 220).
- 2. A validity coefficient computed on a small sample is less reliable than one based on a large sample. That is, if we took independent samples of pairs of predictor and criterion scores and calculated the validity coefficient for each sample, there would be more variability in the magnitudes of the validity coefficients for small samples than if the sample sizes were large.
- 3. The chances of finding that a predictor is valid when the predictor is actually or truly valid are lower for small sample sizes than for large ones. A predictor may be truly valid, but the correlation coefficient may not detect it if the sample size on which the coefficient is based is small. The term statistical power is often used when describing a validation study. One way of thinking about statistical power is as the ability of a validation study to detect a correlation between a selection procedure and a criterion when such a relationship actually exists. Small samples in a validation study have less statistical power than large samples. E.g. if a criterion-related validation study has a sample size of 200, then there is about an 80 percent chance of identifying a validity coefficient of .20 or higher as being statistically significant. For a sample size of 100, the chances are roughly 50 percent.
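The first and third reasons can be checked numerically. The sketch below is my own illustration, not the textbook's method: it uses the Fisher z normal approximation and a two-tailed .05 significance level (both assumptions) to estimate the smallest r that reaches significance and the power to detect a true validity of .20.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant (two-tailed), by Fisher z approximation."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def power(true_r, n, alpha=0.05):
    """Approximate chance of a significant result when true validity is true_r."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.atanh(true_r) * math.sqrt(n - 3)
    # Probability that the observed z lands in either rejection region.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


for n in (20, 100, 200, 220):
    print(f"N = {n}: critical r = {critical_r(n):.2f}, "
          f"power for r = .20 is {power(0.20, n):.2f}")
```

Under these assumptions the approximation gives power of roughly 0.8 at N = 200 and roughly 0.5 at N = 100, in line with the figures quoted above, and the critical r at N = 20 is far higher than at N = 220.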
A validity coefficient can be computed for a small sample; if it is statistically significant, the predictor is considered valid. The coefficient itself is interpreted in exactly the same way as for a large sample.
What is wrong with a small sample? As the sample size decreases, the probability of not finding a statistically significant relationship between predictor and criterion scores increases. Therefore, we would be more likely to conclude (perhaps incorrectly) that a predictor is not valid and is useless in selection. Because sampling error is pronounced when small sample sizes are used in validation research, we might not detect the true validity of a predictor. Therefore, sample sizes should be as large as possible.
Interpreting Validity Coefficients - What precisely does the coefficient mean?
- If our predictor is useful, it should help to explain some of the differences in performance among individuals. By squaring the validity coefficient, we can obtain an index that indicates our test's ability to account for these individual performance differences. This index, called the coefficient of determination, represents the percentage of variance in the criterion that can be explained by variance associated with the predictor.
- A coefficient of determination of 0.64 (a validity coefficient of 0.80) is considered to be high.
- Only on relatively infrequent occasions do validity coefficients, especially for a single predictor, much exceed 0.50; a more common size of coefficient is in the range of 0.3 to 0.5. Thus coefficients of determination for many validity coefficients will range from roughly 0.10 to 0.25.
- In addition to the coefficient of determination, expectancy tables and charts can be used.
- Utility analysis can also be used. Its computation is far more complex than the methods just mentioned. Yet it offers perhaps the ultimate interpretation of a valid predictor and its impact in a selection program for managers in an organization. By translating the usefulness of a validity coefficient into dollars, utility analysis adds an economic interpretation to the meaning of a validity coefficient.
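The coefficient of determination is just the validity coefficient squared. A tiny Python sketch (the function name is my own) makes the arithmetic behind the figures above concrete:

```python
def coefficient_of_determination(validity_coefficient):
    """Proportion of criterion variance explained by the predictor (r squared)."""
    return validity_coefficient ** 2


for r in (0.30, 0.50, 0.80):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:.2f} -> r^2 = {r_squared:.2f} ({r_squared:.0%} of variance)")
```

So a typical validity coefficient of 0.30 to 0.50 explains only about 9 to 25 percent of the variance in the criterion.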
Prediction - a statistically significant validity coefficient is helpful in showing that, for a group of persons, a test is related to job success. However, the coefficient itself does not help us in predicting the job success of individuals. Yet the prediction of an individual's likelihood of job success is precisely what an employment manager wants. For individual prediction purposes, we can turn to linear regression and expectancy charts to aid us in selection decision making. These should be developed only for those predictors that have proven to have a statistically significant relationship with the criterion.
In using these methods, a practitioner is simply taking predictor information, such as test scores, and predicting an individual's job success, such as rated job performance, from this information. For each method, one key assumption is that we are utilizing information collected on a past or present group of employees and making predictions for a future group of employees.
Basically, linear regression involves determining how changes in criterion scores are functionally related to changes in predictor scores. A regression equation is developed that mathematically describes the functional relationship between the predictor and criterion. Once the regression equation is known, criterion scores can then be predicted from predictor information. In general, there are two common types of linear regression you are likely to come across: simple and multiple regression.
- Simple Regression: There is only one predictor and one criterion. A line, called a regression line, is fitted to the scores plotted on a scattergram. It summarizes the relationship between the inventory scores and the job performance ratings. The line is fitted statistically so that it is at a minimum distance from the data points in the figure. The regression line represents the line of best fit.
- The data points around the regression line and the validity coefficient are closely related. The validity coefficient represents how well the regression line fits the data. As the validity coefficient approaches + or - 1.00, the data points move closer to the line. If a validity coefficient equals + or - 1.00, then the data points will fall exactly on the regression line itself, and prediction will be perfect. However, as the coefficient moves away from + or - 1.00 (toward 0.00), the points will be distributed further from the regression line and more error will exist in our predictions.
- The intercept is the value where the regression line crosses the Y-axis and represents an applicant's predicted job performance if his or her sales ability inventory score were zero.
- The slope of the line is called a regression weight or regression coefficient because it is multiplied by the score on the predictor.
- The slope or regression weight represents the amount of change in the criterion variable per one-unit change in the predictor. Therefore, with a positive regression weight, for every one-unit increase in an applicant's inventory score we would expect an increase in job performance equal to the regression weight.
- A positive validity coefficient indicates a positive slope of the regression line and therefore a positive regression weight. A negative validity coefficient means a negative slope and negative regression weight.
- Once we have a regression line we can use it to predict our criterion scores.
- When the correlation between inventory scores and job performance is not a perfect 1.00, prediction will include some error. While some of the errors may seem quite large, the errors are smaller when the predictor is used than when no predictor is used.
We can use the standard error of estimate as an index for summarizing the degree of error in prediction. It is the standard deviation of the errors made in predicting a criterion from a selection predictor.
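To make simple regression concrete, here is a minimal least-squares sketch in Python. The scores are invented, the function names are mine, and the standard error of estimate uses N - 2 in the denominator, which is one common convention (an assumption, since the notes do not give the formula).

```python
import math


def fit_simple_regression(predictor, criterion):
    """Least-squares slope (regression weight) and intercept."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_error_of_estimate(predictor, criterion, slope, intercept):
    """Standard deviation of the prediction errors (N - 2 in the denominator)."""
    errors = [y - (intercept + slope * x) for x, y in zip(predictor, criterion)]
    return math.sqrt(sum(e ** 2 for e in errors) / (len(errors) - 2))


# Hypothetical toy data for a sales ability inventory and rated performance.
inventory_scores = [10, 20, 30, 40, 50]
performance_ratings = [2, 3, 5, 4, 6]

slope, intercept = fit_simple_regression(inventory_scores, performance_ratings)
see = standard_error_of_estimate(inventory_scores, performance_ratings, slope, intercept)
predicted = intercept + slope * 35  # predicted rating for an applicant scoring 35
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted rating = {predicted:.2f}, standard error of estimate = {see:.2f}")
```

The predicted rating is the intercept plus the regression weight times the applicant's score; the standard error of estimate summarizes how far actual ratings typically fall from such predictions.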
Multiple Regression - can be used to predict criterion scores for job applicants. Whereas the simple regression model assumes only one predictor, multiple regression assumes two or more predictors are being used to predict a criterion. If the additional predictors explain more of the individual differences in job applicants' job performance than would have been explained by a single predictor alone, our ability to predict a criterion will be enhanced. As our ability to predict improves (e.g. validity increases), we will make fewer errors in predicting an applicant's subsequent job performance.
- To obtain a predicted job performance score, we would simply substitute an individual's two predictor scores in the equation, multiply the two scores by their regression weights, sum the products, and add the intercept value to obtain predicted performance.
- The multiple regression approach has also been called the compensatory model. It is called this because different combinations of predictor scores can yield the same predicted criterion score. Therefore, if an applicant were to do rather poorly on one measure, he or she could compensate for this low score by performing better on the other measure. Examples of compensatory selection models include those frequently used as a basis for making admission decisions in some professional graduate schools.
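The compensatory idea is easy to see with numbers. In this sketch the regression weights and intercept are hypothetical values chosen for illustration, not estimates from any real study:

```python
def predict_performance(scores, weights, intercept):
    """Multiply each predictor score by its weight, sum the products, add the intercept."""
    return intercept + sum(w * s for w, s in zip(weights, scores))


# Hypothetical regression weights and intercept for two predictors.
WEIGHTS = (0.5, 0.25)
INTERCEPT = 1.0

applicant_a = (4, 8)  # lower on predictor 1, higher on predictor 2
applicant_b = (6, 4)  # higher on predictor 1, lower on predictor 2

print(predict_performance(applicant_a, WEIGHTS, INTERCEPT))
print(predict_performance(applicant_b, WEIGHTS, INTERCEPT))
```

Both applicants receive the same predicted criterion score even though their score profiles differ, which is what "compensatory" means here.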
Cross-Validation - Whenever simple or multiple regression equations are used, they are developed to optimally predict the criterion for an existing group of people. But when the equations are applied to a new group, the predictive accuracy of the equations will almost always fall. This "shrinkage" in predictive accuracy occurs because the new group is not identical to the one on which the equations were developed.
Because of the possibility of error, it is important that the equations be tested for shrinkage prior to their implementation in selection decision making. This checkout process is called cross-validation.
Two general methods of cross validation are used:
A - Empirical estimation
B - Formula estimation
- With empirical cross-validation, several approaches can be taken. In general, a regression equation developed on one sample of individuals is applied to another sample of persons. If the equation developed on one sample can predict scores in the other sample, the regression equation is "cross-validated". One common procedure of empirical cross-validation (the split-sample method) involves the following steps:
- 1. A group of people on whom predictor and criterion data are available is randomly divided into two groups.
- 2. A regression equation is developed on one of the groups (called the "weighting group")
- 3. The equation developed on the weighting group is used to predict the criterion for the other group (called the "holdout group").
- 4. Predicted criterion scores are obtained for each person in the holdout group.
- 5. For people in the holdout group, predicted criterion scores are then correlated with their actual criterion scores. A statistically significant correlation coefficient indicates that the regression equation is useful for individuals other than those on whom the equation was developed.
- Although the "split sample" method of cross-validation has been used, it can produce misleading results. Rather than splitting a sample into two groups, one recommendation is to collect data on a second, independent sample. Because finding a second sample for data collection is not easy, formula cross-validation can be used as an alternative to empirical cross-validation. Under this procedure, only one sample of people is used.
- In general, formula cross validation is more efficient, simpler to use, and no less accurate than empirical cross validation.
- Cross validity cannot be accurately estimated when small sample sizes are involved; ideally, the ratio of the number of people in the validation study to the number of predictors should be roughly 10:1. With this 10:1 ratio, Burket's cross-validation formula is recommended.
- Whatever the approach, cross-validation is essential. It should be routinely implemented whenever regression equations are used in prediction. Without it, you should be skeptical of regression equation predictions.
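The five split-sample steps listed earlier can be sketched in Python as follows. This is only an illustration of the empirical method with one predictor: the data are invented, the helper names are mine, and the fixed random seed is there just to make the split repeatable.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def split_sample_cross_validation(people, seed=0):
    """people: list of (predictor score, criterion score) pairs."""
    shuffled = people[:]
    random.Random(seed).shuffle(shuffled)            # step 1: random division
    half = len(shuffled) // 2
    weighting, holdout = shuffled[:half], shuffled[half:]
    slope, intercept = fit_line(*zip(*weighting))    # step 2: fit on weighting group
    predicted = [intercept + slope * x for x, _ in holdout]  # steps 3-4
    actual = [y for _, y in holdout]
    return pearson_r(predicted, actual)              # step 5: correlate in holdout


# Hypothetical data: criterion rises with the predictor, plus a fixed wobble.
people = [(x, 2 * x + (7 * x) % 5) for x in range(40)]
print(round(split_sample_cross_validation(people), 2))
```

A high correlation in the holdout group suggests the equation holds up for people other than those it was developed on; with real data, the drop from the weighting-group value is the shrinkage.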
Expectancy Tables and Charts: an expectancy table is simply a table of numbers that shows the probability that a person with a particular predictor score will achieve a defined level of success. An expectancy chart presents essentially the same data, except that it provides a visual summary of the relationship between a predictor and criterion. Both are useful for communicating the meaning of a validity coefficient and as an aid in predicting the probability of success of job applicants.
- Five steps of constructing expectancy tables/charts:
- 1. Individuals on whom criterion data are available are divided into two groups: superior performers and the others. Roughly half of the individuals are in each group.
- 2. For each predictor score, frequencies of the number of employees in the Superior Performers and the Other groups are determined.
- 3. The predictor score distribution is divided into fifths.
- 4. The number and percentage of individuals in the Superior performers group and Others group are determined for each 'fifth' of the predictor score distribution.
- 5. An expectancy chart that depicts these percentages is then prepared.
- Two types of expectancy charts exist: A - individual and B- Institutional.
- The individual chart shows the probability that a person will achieve a particular level of performance given his or her score on the test. Therefore the individual chart permits individual prediction.
- The institutional chart indicates what will happen within an organization if all applicants above a particular minimum score are hired. E.g. in our study of financial advisers, 77 percent of the applicants with a minimum score of 23 on the employment interview will be rated superior; 64 percent of those with a minimum score of 7 will be rated superior. By using the institutional chart, one can estimate what will happen in the organisation if various passing or cutoff scores are used for a selection measure.
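The five construction steps and the two chart types can be sketched in a few lines of Python. The employee data below are made up (they are not the financial adviser study), the function names are mine, and "superior" is defined here as being above the median criterion score, which is one way to get roughly half of the people in each group.

```python
from statistics import median


def individual_expectancy(people):
    """Percent of superior performers in each fifth of the predictor distribution.

    people: list of (predictor score, criterion score) pairs.
    Returns five percentages, lowest-scoring fifth first.
    """
    cutoff = median(c for _, c in people)        # step 1: superior vs. others
    ranked = sorted(people)                      # order people by predictor score
    n = len(ranked)
    percentages = []
    for i in range(5):                           # step 3: divide into fifths
        fifth = ranked[i * n // 5:(i + 1) * n // 5]
        superior = sum(1 for _, c in fifth if c > cutoff)  # steps 2 and 4
        percentages.append(100 * superior / len(fifth))
    return percentages


def institutional_expectancy(people, minimum_score):
    """Percent rated superior among everyone at or above a minimum predictor score."""
    cutoff = median(c for _, c in people)
    hired = [c for score, c in people if score >= minimum_score]
    return 100 * sum(1 for c in hired if c > cutoff) / len(hired)


# Hypothetical data: 20 employees with predictor scores 1..20.
criterion = [1, 3, 2, 6, 4, 5, 12, 7, 8, 13, 9, 14, 10, 15, 11, 16, 18, 17, 20, 19]
people = list(zip(range(1, 21), criterion))

print(individual_expectancy(people))
print(institutional_expectancy(people, minimum_score=13))
```

The first list corresponds to an individual chart (chance of being superior within each fifth of scores); the second number corresponds to one bar of an institutional chart (result of hiring everyone at or above a cutoff). Step 5, drawing the chart, is left out.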
Factors Affecting the Size of Validity Coefficients
The size of a validity coefficient is dependent on a variety of factors. Any number of factors may have an effect, but four seem to be predominant in determining the magnitude of a validity coefficient.
- 1. Reliability of criterion and predictor
- 2. Restriction of Range
- 3. Criterion Contamination
- 4. Violation of Statistical Assumptions
Reliability of criterion and predictor
To the extent that a predictor or criterion contains measurement error, it will be unreliable. The more error present, the more unreliable and unpredictable these variables will be. Any unreliability in either the criterion or the predictor will lower the correlation or validity coefficient computed between the two. If both predictor and criterion variables have measurement error, the error is compounded, and the validity coefficient will be lowered even further.
- Because of the negative effect of lowered reliability on validity, we should strive for high reliability of both the predictor and criterion to get an accurate assessment of what true validity may be.
- If the validity coefficient is restricted or attenuated by unreliability of the predictor or criterion (measurement error), it is possible to make statistical adjustments to see what validity would be if the variables had perfect reliability. This adjustment is referred to as correction for attenuation.
- Although unreliability in predictor scores can be corrected, in HR selection situations we have to use predictor data as they normally exist. Selection decisions are made with the actual predictor information collected. Therefore, correction for attenuation in predictor scores is typically not made. There is, however, a formula for correcting for unreliability in criterion data.
- In using the correction formula, accurate estimates of reliability are essential. Recommendations for correcting for unreliability include:
- 1. Report validity coefficients corrected for interrater reliability. However, this step assumes that accurate rating information on participants can be collected from more than one supervisor.
- 2. If employees have only one supervisor (which is often the case), collect rating data from peers of those employees in the validation study.
- 3. If for some reason peer ratings cannot be obtained, other less-than-ideal solutions include
- a) correcting validity coefficients based on meta-analytic estimates of interrater reliability
- b) computing coefficient alpha for the ratings (this tends to overestimate ratings reliability and therefore yields a conservative estimate of corrected predictor validity).
- It is important to use and interpret the results of reliability correction formulas with caution when accurate estimates of criterion reliability are unavailable.
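The notes mention a correction formula for criterion unreliability without writing it out. The standard correction for attenuation in the criterion divides the observed validity coefficient by the square root of the criterion's reliability; the Python sketch below applies it to hypothetical values (the function name is my own).

```python
import math


def correct_for_criterion_unreliability(observed_validity, criterion_reliability):
    """Estimated validity if the criterion were measured without error."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return observed_validity / math.sqrt(criterion_reliability)


# Hypothetical values: observed r = .30, with two different reliability estimates.
print(round(correct_for_criterion_unreliability(0.30, 0.60), 2))
print(round(correct_for_criterion_unreliability(0.30, 0.80), 2))
```

A higher reliability estimate produces a smaller correction, which is why an overestimate of ratings reliability gives a conservative corrected validity, and why the result is only as trustworthy as the reliability estimate fed in.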
Restriction of Range
One of the important assumptions in calculating a validity coefficient is that there is variance among individuals' scores on the criterion and predictor. By variance, we simply mean that people have different scores on the measures, that is, individual differences. When we calculate a validity coefficient, we are asking "Do these predictor and criterion score differences co-vary or move together?" That is, are systematic differences among people on the criterion associated with their differences on the predictor? If there is little variance or range in individuals' scores for one or both variables, then the magnitude of the validity coefficient will be lowered.
- Restriction in range is the term used to describe situations in which variance in scores on selection measures has been reduced. In selection practice, range restriction can occur in a number of circumstances. For instance, in a predictive validation study, direct restriction occurs when an employer uses the test being validated as the basis for selection decision making. Indirect restriction happens when the test being validated is correlated with the procedures used for selection. Later, when individuals' test scores are correlated with their criterion scores, the validity coefficient will be curtailed. Range restriction occurs because individuals scoring low on the test were not hired. Their test scores could not be used in computing the validity coefficient because criterion data were unavailable.
- Criterion scores may also be restricted. Restriction of criterion scores may occur because turnover, transfer, or termination of employees has taken place prior to the collection of criterion data. Performance appraisal ratings might also be restricted because raters did not discriminate among ratees in judging their job performance and gave them very similar ratings.
- Restriction of scores can happen for either the predictor or the criterion, or for both variables. Any restriction will lower computed validity. What we need to know is what validity would be if restriction had not occurred.
- A number of formulas are available to correct for range restriction in selection, but only for restriction on the predictor, not on the criterion.
- Cautionary actions to consider in range restriction include:
- 1. There are 11 types of range restriction that can occur; applying the wrong formula to the specific situation at hand can lead to an overestimate or an underestimate of true predictor validity.
- 2. Concerns remain regarding the specific range restriction corrections that should be made. Initial use of sound validation designs and measurement procedures can help mitigate some of these concerns and should be emphasized whenever possible.
- Correcting validity coefficients for both range restriction and criterion unreliability are steps that the Society for Industrial and Organizational Psychology has recommended.
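To make the idea of a range restriction correction concrete, here is a minimal sketch of one widely used formula, the correction for direct restriction on the predictor (often called Thorndike's Case II). It is only one of the several situations mentioned above, so it should not be read as the correction to use in every case. The function name and the standard deviations are hypothetical.

```python
import math


def correct_direct_range_restriction(restricted_validity: float,
                                     sd_unrestricted: float,
                                     sd_restricted: float) -> float:
    """Estimate validity in the full applicant group (direct restriction only).

    u is the ratio of the predictor's standard deviation among all applicants
    to its standard deviation among those actually hired.
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    r = restricted_validity
    return (r * u) / math.sqrt(1.0 - r ** 2 + (r ** 2) * (u ** 2))


# Hypothetical values: validity among hires is .30; test SD is 10 for all
# applicants but only 6 among the people who were hired.
corrected = correct_direct_range_restriction(0.30, 10.0, 6.0)
print(round(corrected, 2))
```

When the two standard deviations are equal (no restriction), the formula returns the observed coefficient unchanged, which is a quick sanity check on any implementation.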
If scores on a criterion are influenced by variables other than the predictor, then criterion scores may be contaminated. The effect of contamination is to alter the magnitude of the validity coefficient. One criterion frequently used in validation studies is a performance evaluation rating. We may want to know whether performance on a selection measure is associated with performance on the job. Performance ratings are sometimes subject to contamination or bias by extraneous variables such as the gender and ethnicity of ratees or raters, or by the job tenure of the persons being rated. If criterion ratings are influenced by variables that have nothing to do with actual job performance, then our obtained validity coefficient will be affected. In some cases, the validity coefficient will be spuriously high; in others, spuriously low.
An example of contamination: suppose several people perform the same tasks on the same kind of machine, but the machines differ in how they operate. An otherwise good criterion would then be contaminated, because it would also be measuring the differences among the machines.
When contaminating effects are known, they should be controlled either by statistical procedures such as partial correlation, by the research design of the validation study itself, or by adjustments to criterion data such as the computation of ratios. The reason for controlling contaminating variables is to obtain a more accurate reading of the true relationship between predictor and criterion.
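One of the statistical controls mentioned above, partial correlation, can be sketched as follows. The example treats job tenure as the contaminating variable and uses the standard first-order partial correlation formula; the function name and the three correlations are hypothetical values chosen for illustration.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with the contaminant z held constant.

    x = predictor (test score), y = criterion (performance rating),
    z = contaminating variable (here, job tenure).
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations: test-rating .40, test-tenure .30, rating-tenure .50
spuriously_high_case = partial_correlation(0.40, 0.30, 0.50)
print(round(spuriously_high_case, 2))  # lower than the observed .40

# If tenure is unrelated to the test but inflates rating variance,
# the observed .40 was an underestimate instead.
spuriously_low_case = partial_correlation(0.40, 0.00, 0.50)
print(round(spuriously_low_case, 2))   # higher than the observed .40
```

The two cases show why contamination can push the observed coefficient in either direction, depending on how the contaminant relates to the predictor.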
Violation of statistical Assumptions
Among others, one important assumption of a Pearson correlation is that a linear or straight-line relationship exists between a predictor and criterion. If the relationship is nonlinear, the validity coefficient will give an underestimate of the true relationship between the two variables.
When a relationship exists but the Pearson statistic will not detect it, other analyses are called for. If we simply computed the correlation without studying the scattergram, we could draw an incorrect conclusion. Prior to computing a validity coefficient, a scattergram should always be plotted and studied for the possibility of nonlinear association.
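The small sketch below illustrates the point with made-up data: a criterion that depends perfectly, but in an inverted-U fashion, on the predictor produces a Pearson coefficient of essentially zero. The data and the helper function are hypothetical and exist only to show the effect.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)


predictor = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Linear case: criterion rises steadily with the predictor
linear_criterion = [2 * x + 1 for x in predictor]

# Inverted-U case: performance peaks at mid-range predictor scores
curved_criterion = [20 - (x - 5) ** 2 for x in predictor]

r_linear = pearson_r(predictor, linear_criterion)
r_curved = pearson_r(predictor, curved_criterion)
print(round(r_linear, 3), abs(round(r_curved, 3)))
```

In the curved case the criterion is completely determined by the predictor, yet the coefficient suggests no relationship at all, which is exactly what a scattergram would have revealed.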
Utility analysis - conversion of validation study results into dollars-and-cents terminology in order to communicate them to managers.
Utility analysis can be used to translate the results of a validation study into terms that are important to and understandable by managers. Utility analysis summarises the overall usefulness of a selection measure or selection system. Using dollars-and-cents terms as well as other measures such as percentage increases in output, utility analysis shows the degree to which use of a selection measure improves the quality of individuals selected versus what would have happened had the measure not been used.
Preliminary work on utility analysis:
In general, the more complex the job, the greater the spread or variability in worker job performance. If the variability in worker productivity is large, then the usefulness or utility of valid methods to hire the best-performing workers will also be large.
Productivity distributions - a common conclusion is that good workers produce roughly twice as much work as poor workers.
How much is a good worker worth?
If we identify and hire applicants who are good workers using a valid selection procedure, what will be the dollar return from using the procedure?
- The percentage of employees who will be identified as successful following the use of a valid test depends on three factors:
- 1. Validity coefficient - the correlation of the test with a criterion.
- 2. Selection ratio- the ratio of the number of persons hired to the number of applicants available. (The smaller the ratio, the more favorable for the organization, because it can be more selective in who is hired.)
- 3. Base rate - the percentage of employees successful on the job without use of the test. (A low base rate suggests that with the current selection system, there is difficulty in identifying satisfactory workers. Use of a valid test should improve the identification of satisfactory employees)
- Utility analysis is difficult for managers to understand, and sole reliance on its results does not appear to guarantee that practitioners will be persuaded to adopt a particular selection program.
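The way the three factors listed above combine can be explored with a small simulation. This is only a sketch under stated assumptions: predictor and criterion scores are assumed to be bivariate normal, applicants are hired strictly top-down on the test, and "success" means scoring in the top base-rate fraction on the criterion. The function name, sample size, and seed are hypothetical choices, and this kind of relationship is what the published Taylor-Russell tables summarize.

```python
import math
import random


def simulated_success_rate(validity: float, selection_ratio: float,
                           base_rate: float, n_applicants: int = 20000,
                           seed: int = 0) -> float:
    """Proportion of hires who turn out successful, estimated by simulation."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        test_score = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Criterion correlates with the test at the stated validity
        performance = validity * test_score + math.sqrt(1.0 - validity ** 2) * noise
        applicants.append((test_score, performance))

    # Success cutoff: the performance level exceeded by base_rate of applicants
    ordered_performance = sorted(perf for _, perf in applicants)
    cutoff_index = min(int(round((1.0 - base_rate) * n_applicants)), n_applicants - 1)
    success_cutoff = ordered_performance[cutoff_index]

    # Hire the top scorers on the test
    n_hired = max(1, int(round(selection_ratio * n_applicants)))
    hired = sorted(applicants, key=lambda a: a[0], reverse=True)[:n_hired]
    n_successful = sum(1 for _, perf in hired if perf >= success_cutoff)
    return n_successful / n_hired


with_test = simulated_success_rate(validity=0.50, selection_ratio=0.20, base_rate=0.50)
without_validity = simulated_success_rate(validity=0.00, selection_ratio=0.20, base_rate=0.50)
print(f"valid test: {with_test:.2f}, invalid test: {without_validity:.2f}")
```

Rerunning the function with a smaller selection ratio or a higher validity shows the success rate among hires rising, while a test with zero validity leaves it near the base rate.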
Broader perspectives of validity:
Content, concurrent and predictive validity have been the strategies traditionally employed in validation studies involving predictors used in personnel selection. However, other approaches have been developed that take a broader view of selection measure validity. Two approaches:
Validity Generalization and Job component validity.
- Validity generalization relies on evidence accumulated from multiple validation studies that shows the extent to which a predictor that is valid in one setting is valid in another similar setting.
- Job component validity is a process of inferring the validity of a predictor based on existing evidence, for a particular dimension or component of job performance.
For many years selection specialists noted that validity coefficients for the same selection instruments and criteria measures varied greatly for validation studies performed in different organizational settings. This was true even when the jobs for which the selection program had been designed were very similar. It was concluded that the idiosyncrasies of jobs, organizations, and other unknown factors contributed to the differences in results that were obtained.
- Test validity does, in fact, generalize across situations. Many of the differences found in a test's validity for similar jobs and criteria across different validation studies are not due to situational specificity but rather to methodological deficiencies in the validation studies themselves.
- It was hypothesized that the deficiencies that accounted for the differences among the validity coefficients reported in validation studies were due to the following factors:
- 1. The use of small sample sizes (sampling error).
- 2. Differences in test or predictor reliability
- 3. Differences in criterion reliability
- 4. Differences in the restriction of range of scores.
- 5. The amount and kind of criterion contamination and deficiency.
- 6. Computational and typographical errors.
- 7. Slight differences among tests thought to be measuring the same attributes or constructs.
Validity generalization methods:
Validity generalization involves the statistical analyses of information accumulated from multiple validation studies involving a predictor whose results are combined to determine the overall effectiveness of the predictor in new employment settings or locations.
By combining data from multiple studies involving larger sample sizes than what would be available in a single validation study, a population validity coefficient can be derived to yield a more accurate picture of the true validity of the predictor.
- The major steps are:
- 1. Obtain a large number of published and unpublished validation studies.
- 2. Compute the average validity coefficient for these studies.
- 3. Calculate the variance of differences among these validity coefficients.
- 4. Subtract, from the amount of these differences, the variance due to the effects of small sample size.
- 5. Correct the average validity coefficient and the variance for errors that are due to other methodological deficiencies (that is, differences in criterion reliability, test reliability, and restriction in the range of scores).
- 6. Compare the corrected variance to the average validity coefficient to determine the variation in study results.
- 7. If the differences among the validity coefficients are very small, then validity coefficient differences are concluded to be due to the methodological deficiencies and not the nature of the situation. Therefore, validity is generalizable across situations.
- Validity generalizability has been reported to extend past mental ability testing to predictors such as biographical data, personality inventories, and assessment centers.
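Steps 2 through 4 above can be sketched in a few lines. This is a "bare-bones" version that removes only sampling error, using the commonly cited approximation for expected sampling error variance, (1 - mean r squared) squared divided by (average N - 1). The corrections in step 5 are left out, and the study data and function name are hypothetical.

```python
def bare_bones_validity_generalization(studies):
    """Summarize validity coefficients across studies, removing sampling error.

    studies: list of (sample_size, observed_validity) tuples.
    """
    total_n = sum(n for n, _ in studies)
    # Step 2: sample-size-weighted average validity coefficient
    mean_validity = sum(n * r for n, r in studies) / total_n
    # Step 3: weighted variance of the observed coefficients
    observed_variance = sum(n * (r - mean_validity) ** 2 for n, r in studies) / total_n
    # Step 4: variance expected from small samples alone
    average_n = total_n / len(studies)
    sampling_error_variance = (1.0 - mean_validity ** 2) ** 2 / (average_n - 1.0)
    residual_variance = max(0.0, observed_variance - sampling_error_variance)
    return {
        "mean_validity": mean_validity,
        "observed_variance": observed_variance,
        "sampling_error_variance": sampling_error_variance,
        "residual_variance": residual_variance,
    }


# Hypothetical validation studies: (sample size, validity coefficient)
studies = [(50, 0.10), (80, 0.35), (60, 0.20), (120, 0.30), (40, 0.45)]
summary = bare_bones_validity_generalization(studies)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In this made-up data set the coefficients range from .10 to .45, yet the variance expected from sampling error alone exceeds the observed variance, so nothing is left over to attribute to the situation.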
Conclusions from validity generalization studies
It is not necessary to conduct validity studies within each organization for every job. If the job of interest for the selection program is one of those for which validity generalization data have been reported, then the selection instruments reported in the validity generalization study can be used in the organization for selection.
This is because there are no organization effects on validity; therefore, the same predictor can be used across all organizations for the relevant job or jobs.
- To set up the selection program, it is only necessary to show that the job within the organization is similar to the job in the validity generalization study. If job similarity can be shown, then the selection measure can be used; a separate validation study is not needed. This reduces the time, effort and cost of establishing a valid selection program.
- Mental ability tests can be expected to predict job performance in most, if not all, employment situations. However, their usefulness as predictors depends upon the complexity of the job in question. Validity coefficients of these tests will be higher for jobs that are more complex than for less complex jobs. In other words, the more complex the job, the better mental ability tests will be able to predict job performance.
Criticisms of validity Generalization
Validity generalization studies apply correction formulas to the results of previous validity studies in order to correct the measurement deficiencies of these studies.
Conclusions regarding the validity of the predictor and generalizability of this validity estimate across organizations are based on the results of these correction formulas.
Ideally, these corrections should be made to each study in the validity generalization analyses using data supplied by that study. However, the validity studies usually do not report enough data to permit the corrections to be made in this manner. Instead, correction formulas use hypothetical values derived from other research work that are assumed to be appropriate for validity generalization analyses.
- Criticisms of validity generalization studies have focused on the appropriateness of the correction formulas. Many of these critical studies have used computer simulation as the method for examining appropriateness. In doing this, samples are generated from various populations of validation studies.
- A common finding of these simulation studies is that, under many conditions, the validity generalization correction formulas overestimate the amount of variance attributable to study deficiencies. The result of such overestimates may be that the proposition that organizational differences affect validity coefficients is rejected more often than would be appropriate.
- other criticisms are:
- 1. Lumping together good and bad studies.
- 2. File drawer bias - Journals are less likely to publish negative results, that is, results showing that a selection procedure is not valid.
- 3. Criterion unreliability -
- It has been concluded that these criticisms are concerned more with fine-tuning the method of validity generalization and do not detract significantly from its overall methodology and conclusions.
- The numerous validity generalization studies available suggest that the validity of a test is more generalizable across similar situations and similar jobs than had previously been thought. This conclusion should not be interpreted to mean that a test that is valid for one job will be valid for any other job. However, sufficient validity information on one job may be generalizable to other, very similar jobs. Ultimately, if validity generalization evidence for a test or some other predictor is available, a validation study may not be needed. Instead, it may be possible to use previous validation research to support the validity and use of the test for certain jobs.
Validity generalization Requirements
In using validity generalization evidence to support a proposed selection procedure, a user would need to take a number of steps. Assuming that an appropriate validity generalization study does not exist, the first step is to gather all relevant validation studies, both published and unpublished, that investigate the relationship between the proposed selection procedure and the relevant criteria. Once these studies have been collected and proper meta-analytic methods have been applied to the studies, the following conditions must be met:
- 1. The user must be able to show that the proposed selection procedure assesses the same knowledge, skill, ability, or other characteristic, or that it is a representative example of the measure used in the validity generalization study database.
- 2. The user must be able to show that the job in the new employment setting is similar (in job behaviours or knowledge, skills and abilities) to the jobs or group of jobs included in the validity generalization study database.
- If these conditions can be met, then the new employment setting shows evidence of selection procedure validity.
- To meet these conditions, specific supporting information is needed. This information includes:
- 1. Validity generalization evidence consisting of studies summarizing a selection measure's validity for similar jobs in other settings.
- 2. Data showing the similarity between the jobs for which the validity evidence is reported and the job in the new employment setting (an analysis of the job in the new employment setting is mandatory).
- 3. Data showing the similarity between the selection measures in other studies that compose the validity evidence, and those measures that are to be used in the new employment setting.
Job Component validity:
A validation strategy that incorporates a standardized means for obtaining information on the jobs for which a validation study is being conducted. Whereas validity generalization typically involves a more global assessment of a job, job component validity incorporates a more detailed examination of the job. The procedure involves inferring validity for a given job by analyzing the job, identifying the job's major functions or components of work, and then choosing tests or other predictors, based on evidence obtained through previous research, that predict performance on these major work components. Notice that selection procedure validity for a work component is inferred from existing validation research evidence rather than by directly measuring validity through the traditional methods we have discussed (e.g. criterion-related or content-related strategies). Two important assumptions underlie this strategy:
1. When jobs have a work component in common, the KSAs required for performing that component are the same across these jobs.
2. A predictor's validity for a KSA required for performing a work component is reasonably consistent across jobs.
- Major steps involved in conducting a job component validation study:
- 1. Conduct an analysis of the job using the Position Analysis Questionnaire (PAQ). The PAQ is a commercially available, paper-and-pencil questionnaire that contains a comprehensive listing of general behaviours required at work. A respondent (e.g. a job analyst) responds to the PAQ by indicating the extent to which these descriptions accurately reflect the work behaviours performed on the particular job.
- 2. Identify the major components of work required on the job. Once the PAQ has been used to analyze the job, the most important work behaviours or components of the job are identified.
- 3. Identify the attributes required for performing the major components of the job. Using ratings from experts, the developers of the PAQ established links between 76 job attributes (including those related to mental ability, perceptual ability, psychomotor ability, interest, and temperament) and the work behaviours listed on the PAQ for approximately 2,200 job titles in the US labour force. This expert ratings database serves as a basis for identifying which of the 76 attributes are needed to perform the most important work components of a new job being analyzed. For instance, when a new job is analyzed with the PAQ, results of the analysis show the importance of specific attributes, relative to the 2,200 jobs in the PAQ database, in performing the identified components of the job.
- 4. Choose tests that measure the most important attributes identified from the PAQ analysis. Strictly speaking, actual validity coefficients are not computed in job component validity. Predicted validity coefficients and scores for selected general aptitude tests are estimated from an existing database of test scores and PAQ information. These coefficients estimate what the validity of a test would have been had an actual criterion-related validation study been performed. The results show which ability tests are likely to be most useful and valid for selecting among applicants for the job being analyzed.
- The results of a job component validity analysis identify which tests to use for the job being analyzed.
Accuracy of Job component validity studies.
Researchers have reported that job component validity estimates were generally lower and more conservative than validity coefficients obtained in actual validation studies. One comparison set predicted job component validity coefficients for 51 clerical jobs against actual validity coefficients computed for similar jobs in a prior validation study. The researchers concluded that the job component validity procedure produced predicted validity coefficients that were very similar to those that had been determined statistically. They reasoned that if the job component validity procedure were used to infer validity of tests for clerical jobs, the conclusions reached would be similar to those that had been drawn from actual validation studies.
- Criticisms of job component validity strategy
- Although there is limited evidence to the contrary, they noted that the method has been less successful in predicting actual validity coefficients.
- The strategy has been relatively less useful in predicting psychomotor test data.
- The strategy has generally reported results for tests from the General Aptitude Test Battery that are available only to public employers. Results are available for only a relatively limited number of commercially available tests.
The method provides a unique approach for small organizations where sample sizes for criterion-related validity are inadequate, or for those jobs yet to be created or undergoing significant change.
Validation Options for Small Businesses:
We have reviewed a number of strategies for validating selection measures, all of which are suitable for use in organizations. However, some of these strategies are practical only for large organizations that have large numbers of job incumbents and applicants on whom validation data can be collected. Small businesses can pose particular methodological challenges for those contemplating a validation study. These challenges do not mean selection measure validation should be ignored. Selection measure validation is probably more critical for a small business than it is for a large one. Small businesses often hire applicants who possess job-specific KSAs and work habits and who must immediately begin a job without completing an extensive training program. Large businesses generally have the resources to compensate for a hiring mistake. In small businesses, one or two bad hires could be financially devastating (e.g. because of poor job performance, theft, a negligent hiring suit brought against the business, or even a discrimination lawsuit brought by an aggrieved applicant).
Small businesses with 15 or more employees are covered by many EEO laws. Because of this, validation evidence is essential - but what options are available?
- Content validity is one option - As noted earlier, criterion problems - and the smaller numbers of people available to participate in a criterion-related validation study in many organizations - are two reasons for the increased use of content validity strategies. When a small organization of 50 or so people is hiring only one or two people a year, ordinary empirical validation cannot be performed. Content validity, however, does not necessarily mean criterion-related validity.
- Assuming that validity generalization continues to mature as an acceptable means for validating predictors, it is a second option.
- Synthetic validity is another option. It is a logical process of inferring test validity for components of jobs. Whatever the approach, all synthetic validity methods tend to involve the following steps:
- A - Analyzing jobs to identify their major components of work.
- B - Determining the relationships of selection predictors with these job components using content, construct, or criterion-related validity strategies.
- C - Choosing predictors to use in selection based on their relationships with important job components.
- With synthetic validity we can collapse jobs together because we are studying a common work activity. In this sense, validity is "synthetic": we are not creating validity per se, but rather creating a situation that will permit a better estimate of validity. The major advantage of this is that selection measures are validated across several jobs in an organization, rather than for only one job.
The future of validation research
Changes in the workplace are occurring, and they will likely affect the assumptions we have made and the future of validation research.
- 1. Increasing numbers of small organizations without the resources (e.g. time or money) or technical requirements (e.g. large sample sizes) to undertake traditional validation strategies.
- 2. Increasing use of teams of workers rather than individuals.
- 3. Changes in the definitions of job success to include criteria such as organization and job commitment, teamwork, and quality of service delivered to customers.
- 4. The changing nature of work - jobs and the requirements for performing them are becoming more fluid, requiring job analytic methods that focus on broader work capacities rather than on molecular requirements of tasks.
- For small organizations, complex, expensive validation studies are not necessary.
- Ability tests, assessment centers, interviews and the like can be implemented in such small organizations and can produce an immediate payoff in their selection systems.